We're running a Moodle 2.9 install on a 16 core, 2.6GHZ, 48GB server with a separate database server backend. We've been seeing an issue recently where the page loads for things like the Assignment "Grade Submission" pages take 40+ seconds for courses with ~250 students.
It looks like most of this time is being spent in PHP. For example, loading the following url leads to teh stats below.
https://<url>/mod/assign/view.php?action=grading&id=<id>&tilast
43.990729 secs
RAM: 31.6MB
RAM peak: 35.9MB
Included 310 files
get_string calls: 2005
DB reads/writes: 354/1
DB queries time: 0.29162 secs
ticks: 4399 user: 4176 sys: 96 cuser: 0 csys: 0
Load average: 0.72
Session (core\session\database): 15.7KB
To get more insight, we enabled xhprof. That leads to the following stats for the same page show above (sorted by Excl. Wall Time, truncated to top-10):
Function Name | Calls | Calls% | Incl. Wall Time (microsec) | IWall% | Excl. Wall Time (microsec) | EWall% | Incl. CPU (microsecs) | ICpu% | Excl. CPU (microsec) | ECPU% | Incl. MemUse (bytes) | IMemUse% | Excl. MemUse (bytes) | EMemUse% | Incl. PeakMemUse (bytes) | IPeakMemUse% | Excl. PeakMemUse (bytes) | EPeakMemUse% |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
preg_match_all | 130,278 | 9.8% | 37,800,505 | 85.8% | 37,800,505 | 85.8% | 37,176,000 | 87.1% | 37,176,000 | 87.1% | 13,529,488 | 45.3% | 13,529,488 | 45.3% | 162,392 | 0.5% | 162,392 | 0.5% |
preg_replace | 245,679 | 18.5% | 1,723,947 | 3.9% | 1,723,947 | 3.9% | 1,792,000 | 4.2% | 1,792,000 | 4.2% | 1,086,893,344 | 3636.4% | 1,086,893,344 | 3636.4% | 366,320 | 1.1% | 366,320 | 1.1% |
filter_phrases | 414 | 0.0% | 40,951,657 | 93.0% | 1,473,154 | 3.3% | 40,136,000 | 94.0% | 1,044,000 | 2.4% | 8,774,000 | 29.4% | -1,022,890,400 | -3422.2% | 703,592 | 2.1% | 3,072 | 0.0% |
filter_urltolink::convert_urls_into_links | 414 | 0.0% | 1,047,420 | 2.4% | 646,368 | 1.5% | 992,000 | 2.3% | 312,000 | 0.7% | 13,008 | 0.0% | -103,655,240 | -346.8% | 719,064 | 2.1% | 202,792 | 0.6% |
mysqli::query | 350 | 0.0% | 340,713 | 0.8% | 340,713 | 0.8% | 28,000 | 0.1% | 28,000 | 0.1% | 61,488 | 0.2% | 61,488 | 0.2% | 1,248 | 0.0% | 1,248 | 0.0% |
preg_match | 24,095 | 1.8% | 211,461 | 0.5% | 211,461 | 0.5% | 232,000 | 0.5% | 232,000 | 0.5% | 37,248 | 0.1% | 37,248 | 0.1% | 10,384 | 0.0% | 10,384 | 0.0% |
Memcached::get | 508 | 0.0% | 192,168 | 0.4% | 192,168 | 0.4% | 64,000 | 0.1% | 64,000 | 0.1% | 5,552,480 | 18.6% | 5,552,480 | 18.6% | 4,974,888 | 14.8% | 4,974,888 | 14.8% |
intval | 125,856 | 9.5% | 48,559 | 0.1% | 48,559 | 0.1% | 156,000 | 0.4% | 156,000 | 0.4% | 840 | 0.0% | 840 | 0.0% | 0 | 0.0% | 0 | 0.0% |
cache::get | 3,836 | 0.3% | 447,205 | 1.0% | 41,477 | 0.1% | 328,000 | 0.8% | 36,000 | 0.1% | 14,430,960 | 48.3% | -2,111,032 | -7.1% | 6,917,056 | 20.5% | 1,008 | 0.0% |
When we watch the CPU usage, you can see the page load pegging a single core for the duration.
So it appears we're being limited by the PHP preg_match_all() performance on a single core. The other cores are pretty much unused during this time. Any idea what's going on here or how we might be able to fix it?