The first sign was that users started seeing the "Error: Database connection failed". That was because MySQL had reached its connection limit. top showed that mysqld was using all available CPU time, and running "show processlist" in MySQL shows a large number of threads executing the last query in get_user_access_sitewide() stuck in the "Copying to tmp table" state -- like this:
| 325 | moodleuser | localhost | moodle | Query | 0 | Copying to tmp table | SELECT sctx.path, ra.roleid, ctx.path AS parentpath, rco.capability, rco.permission FROM mdl_role_assignments ra JOIN mdl_context ctx ON ra.contextid=ctx.id JOIN mdl_context sctx ON (sctx.path LIKE CONCAT(ctx.path,'/%') ) JOIN mdl_role_capabilities rco ON (rco.roleid=ra.roleid AND rco.contextid=sctx.id) WHERE ra.userid = 11095 AND ctx.contextlevel <= 40 AND sctx.contextlevel <= 50 ORDER BY sctx.depth, sctx.path, ra.roleid |
While I do see some other queries in the processlist, there's usually only one or two of them, with up to 20 of those get_user_access_sitewide() queries. Sometimes, there are up to 5 _identical_ (i.e. with userid set to the same value) queries running simultaneously -- which seems weird?
The three tables involved (mdl_role_assignments, mdl_context, mdl_role_capabilities) contain 50516, 64108 and 7681 records respectively. There are 7097 user accounts in Moodle.
I've checked all sorts of things to try to make that query execute faster: running myisamchk, optimize table, analyze table, converting those three tables to InnoDB, increasing the size of the query cache, checking that all the right indexes exist (by comparing against a clean install). Nothing's really helped: I'm coming to the conclusion that the query is just plain slow, but that it's being run too often. Explain and MySQL profiling show that the query spends all its time "Copying to tmp table" because of the "order by".
What has helped a little has been adding more VCPUs to the virtual machine, but that's just buying a bit more headroom, not actually solving the problem. I'm still seeing those queries queuing up in the processlist but less often.
Does anyone have any ideas? Would converting to PostgreSQL fare better in this circumstance? I've been hesitant to try, because I'm more familiar with MySQL and the conversion seems quite complex -- and if PostgreSQL also struggles with the same query, then I've achieved nothing.
Environment: Moodle 1.9.5, PHP 5.2, MySQL 5.0.86, FreeBSD/i386 7.1-RELEASE
Hardware: 2/4 VCPUs, 3 GB RAM, VMware ESX 3.5 (the host is not heavily loaded)
Usage: 50-100 simulatenous users in peak times