As part of our efforts to improve overall system performance, our system administrators gave me this query, which usually takes 2-3 seconds to complete:
SELECT id, firstname, lastname, email
FROM mdl_user
WHERE username <> 'guest'
AND deleted = 0
AND confirmed = 1
AND id NOT IN (
    SELECT u.id
    FROM mdl_role_assignments r, mdl_user u
    WHERE r.contextid = xxxx
    AND u.id = r.userid
    AND r.roleid = 3)
ORDER BY lastname ASC, firstname ASC;
Does anyone know where this query comes from (which action would trigger it)? Thanks!
By the way, when I run this query on our staging server, it usually takes 1.5 to 2 seconds to complete (compared to the production server, where it takes 2-3 seconds). It typically returns 63423 rows. The mdl_user table currently has 126417 rows, and the mdl_role_assignments table has 153108 rows.
The url would be http://yourmoodle.com/admin/roles/assign.php?contextid=XXXXX&roleid=3
I didn't actually check that this is the case, but based on the query alone, that is my guess.
Two comments: the row counts in Wen's tables are small. I can't see why such a SELECT would be slow with relatively small tables (both under 1M rows) and passable indexes. Maybe the index statistics are stale? Maybe Wen is using a buggy MySQL version? (Who isn't?)
Maybe the sort buffers or temporary-table memory for MySQL are configured really low? It's common to have a 4GB DB server running the default config, which assumes... 128MB (MySQL ships with various my.cnf sample files; make use of them).
Depending on the DB version (and good indexes!), a left outer join against mdl_role_assignments with a WHERE ra.id IS NULL check might perform better.
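To make that rewrite concrete, here is a minimal sketch using Python's sqlite3 as a stand-in for MySQL (toy data, so no timing conclusions carry over); it only demonstrates that the NOT IN form and the LEFT JOIN ... IS NULL anti-join return the same rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mdl_user (id INTEGER PRIMARY KEY, username TEXT,
                       deleted INTEGER, confirmed INTEGER);
CREATE TABLE mdl_role_assignments (contextid INTEGER, roleid INTEGER,
                                   userid INTEGER);
INSERT INTO mdl_user VALUES (1,'alice',0,1),(2,'bob',0,1),
                            (3,'guest',0,1),(4,'carol',0,1);
-- bob (id 2) already holds role 3 in context 10
INSERT INTO mdl_role_assignments VALUES (10,3,2);
""")

# Original style: NOT IN with a subquery.
not_in = conn.execute("""
    SELECT id FROM mdl_user
    WHERE username <> 'guest' AND deleted = 0 AND confirmed = 1
    AND id NOT IN (SELECT userid FROM mdl_role_assignments
                   WHERE contextid = 10 AND roleid = 3)
    ORDER BY id""").fetchall()

# Anti-join style: LEFT JOIN plus an IS NULL filter on the joined side.
anti_join = conn.execute("""
    SELECT u.id FROM mdl_user u
    LEFT JOIN mdl_role_assignments r
           ON r.userid = u.id AND r.contextid = 10 AND r.roleid = 3
    WHERE u.username <> 'guest' AND u.deleted = 0 AND u.confirmed = 1
    AND r.userid IS NULL
    ORDER BY u.id""").fetchall()

print(not_in == anti_join)  # True: both return alice and carol
```

Which form is faster on a real server depends entirely on the optimizer and indexes, so it is worth running both under EXPLAIN.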
"IN" is evil!
Repeat: "IN" is super-evil!
Cause: the subquery is executed once for each record in the outer query, so things get worse and worse as the number of records in the outer query grows.
Note: this is a "simple" but basically true explanation.
Definitely I'd use "EXISTS" / an outer join / a boolean MINUS instead. All of them will execute the subquery only once. My personal favourite is EXISTS, mainly because it expresses in a natural (English) way what we are doing (which helps readability), while outer joins are a bit more "cryptic".
Disclaimer: Not tested at all!
SELECT id, firstname, lastname, email
FROM mdl_user u
WHERE username <> 'guest'
AND deleted = 0
AND confirmed = 1
AND NOT EXISTS (
    SELECT 1
    FROM mdl_role_assignments r
    WHERE r.contextid = xxxx
    AND r.roleid = yyyy
    AND r.userid = u.id)
ORDER BY lastname ASC, firstname ASC;
PS: Also, apart from all this... is there any reason in the original query to be selecting from the mdl_user table in the subquery? Seems a wasted join IMO; the user id is already in the mdl_role_assignments table.
is there any reason in the original query to be selecting from mdl_user
Yes. We want to build the list of candidate users to enrol for the SELECT box.
For really large sites, we SELECT a ton of users, bring them across the DB->PHP wire only to then decide we don't want them.
Is there a fancy "CASE" syntax that would allow us to return all the rows only if the rowcount is below a number? Pseudo-SQL known not to work follows:
SELECT CASE ( COUNT(s.id)
       THEN 1, null, ... (a "fake" single row full of nulls - to distinguish from a true empty result)
       ELSE s.id, s.username, ... (the actual rows)
       END
FROM ( SELECT - the select we have now - ) s
This won't work because CASE is evaluated per-row... If we could SELECT to a temp table (ah... if we could!) then we could retrieve the count cheaply and then retrieve the actual results if desired, without hitting the DB twice.
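One way to get close without hitting the DB twice, sketched below with sqlite3 as a stand-in: ask for one row more than the display limit with LIMIT; if the extra row comes back, the list is over the threshold, and we never counted or transferred the full result (MAX_USERS here is an illustrative constant, not a real Moodle setting):

```python
import sqlite3

MAX_USERS = 2  # illustrative threshold, not a real Moodle setting

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mdl_user (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO mdl_user VALUES (?, ?)",
                 [(i, "user%d" % i) for i in range(1, 6)])  # 5 users

# Fetch at most one row more than we are willing to display.
rows = conn.execute(
    "SELECT id, username FROM mdl_user ORDER BY username LIMIT ?",
    (MAX_USERS + 1,)).fetchall()

if len(rows) > MAX_USERS:
    # The sentinel extra row arrived: the real list is too big to show.
    print("There are too many users to show. Try searching instead.")
else:
    print(rows)
```

The server still does the filtering and part of the sort, but the wire transfer is capped at MAX_USERS + 1 rows instead of tens of thousands.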
In terms of why do we use the subselect... I think early versions of MySQL did better with it.
The original query sometimes took around 21xx ms to run, but most of the time it took around 9xx ms. Kinda weird. The "NOT EXISTS" query is pretty stable, though, always taking around 78x ms to run.
Oh, by the way, for Marin: we are running MySQL 5.0.51a on our production server. Are there any known bugs in this version that could cause performance issues? Thanks!
Wish I knew. I used to follow MySQL's changelog between 2000 (the 3.23.x series) and roughly 2004 (4.1 or so).
You want to read the changelogs of the MySQL versions that came after that, including the 5.1.x series. I haven't been keeping track...
But truly, the tables in the JOIN aren't large. There is a good chance that the answer lies in rebuilding the index stats and tuning my.cnf.
Nice to see that it's somewhat quicker. Also, it will get better and better as the number of users grows.
In any case, 780ms (or 952ms) to filter out (and fetch) more than 100,000 records doesn't sound like bad timing to me at all (just a subjective impression; it's a lot of data to move!).
About the MySQL version, I don't know if there is a specific bug affecting this, but after a quick look at http://dev.mysql.com/doc/mysqld-version-reference/en/key-changes-5-0.html it seems that you are really behind the current version (especially in security fixes). Perhaps it wouldn't be crazy to upgrade (after proper testing and so on, of course).
IMO it's more likely a matter of MySQL tuning/throughput limits than of bugs but, once again, that's a subjective impression.
By the way, for larger sites, specifically on this "Assign roles" page, the element on the right side will have too many users to show anyway, and will just display:
"There are too many users to show. Try searching instead."
In this case (large sites), it doesn't make sense to still run this query, as the result will not be displayed anyway (unless I missed something here). Am I correct on this? If so, then maybe we should add some logic to this page: if there are too many users (so "search" is needed instead), skip this query. What do people think? Thoughts or comments? Thanks!
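A minimal sketch of that gate, with sqlite3 standing in for the real database (MAX_USERS_PER_PAGE is a hypothetical name, not an actual Moodle setting): run a cheap COUNT(*) first, and skip the expensive SELECT entirely when the result could not be displayed anyway:

```python
import sqlite3

MAX_USERS_PER_PAGE = 3  # hypothetical threshold, not a real Moodle setting

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mdl_user (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO mdl_user VALUES (?, ?)",
                 [(i, "user%d" % i) for i in range(1, 11)])  # 10 users

# Cheap pre-check: count the candidates before running the big SELECT.
total = conn.execute("SELECT COUNT(*) FROM mdl_user").fetchone()[0]

if total > MAX_USERS_PER_PAGE:
    users = None  # skip the expensive query; the page shows the search hint
    print("There are too many users to show. Try searching instead.")
else:
    users = conn.execute(
        "SELECT id, username FROM mdl_user ORDER BY username").fetchall()
```

This is two round trips in the worst case, but the second (expensive) one only runs on the small sites where its result is actually shown.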
Within the mdl_user table there are fields for lastlogin, currentlogin, lastip, etc. These are constantly being updated.
Fields like these should really be in a separate table to avoid database contention.
The problem is that many common SQL implementations will not allow a write to the table while a read is taking place. So all such users are left hanging until the read is completed. This can be a big performance hit, and a good many expensive queries involve the mdl_user table.
So, if these fields were moved out of that table, the user table itself would be updated only infrequently (relatively) and there would be many fewer hangs on the user end while waiting for other queries to complete.
I suspect that this is the cause of many of the short-term page-loading delays that one sees on busy Moodle sites.
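The split suggested above could look something like this sketch (sqlite3 stand-in; mdl_user_access is a hypothetical table name, not an existing Moodle table): the volatile per-login fields live in a small side table keyed by userid, so logins write there while mdl_user itself stays read-mostly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mdl_user (id INTEGER PRIMARY KEY, username TEXT, email TEXT);
-- Hypothetical side table holding the frequently written fields.
CREATE TABLE mdl_user_access (
    userid INTEGER PRIMARY KEY REFERENCES mdl_user(id),
    lastaccess INTEGER,
    currentlogin INTEGER,
    lastip TEXT
);
INSERT INTO mdl_user VALUES (1, 'alice', 'alice@example.com');
INSERT INTO mdl_user_access VALUES (1, 0, 0, '');
""")

# A login now touches only the small side table, not mdl_user itself.
conn.execute("UPDATE mdl_user_access SET lastaccess = ?, lastip = ? "
             "WHERE userid = ?", (1234567890, '10.0.0.1', 1))

row = conn.execute("SELECT lastaccess, lastip FROM mdl_user_access "
                   "WHERE userid = 1").fetchone()
print(row)  # (1234567890, '10.0.0.1')
```

Queries that need both halves would pay one extra join, so the trade-off is fewer write locks on mdl_user versus slightly more complex reads.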
I don't believe you are correct on this. The user_lastaccess table records access to individual courses. And I can certainly tell you that the fields I mentioned do get updated with the latest 1.9 stable version.
The user table is updated directly, and at least three fields are continually being written to for every user: user.lastaccess, user.currentlogin, and user.lastip. user.lastlogin also seems to get updated frequently.
So this is a constant data-contention issue with the type of Moodle installations set up using many of the most common Moodle installation instruction sets. And it is easy to demonstrate: just run a lengthy query involving the user table (which includes most long queries) and everyone hangs.
This problem might be solved with different database engines, but I am not certain about that. I don't know of any that are immune to all database contention issues; I think which ones are affected has to do with which fields and rows the queries use.
I have created a tracker entry MDL-20613 for the issue.
If I'm not wrong, that was fixed a long time ago by introducing the LASTACCESS_UPDATE_SECS constant (currently 60 secs), so the user table only gets updated after that interval.
1 minute sounded like a good (enough) interval, both reasonably accurate and reducing contention a lot (at least on moodle.org, if I remember correctly).
In any case, IMO, it should be pretty easy to convert that constant into a new configuration variable, so people can play with it (1 min, 2 mins, 5 mins... 1 hour). After all, the accuracy of that info doesn't seem highly critical (since the logs already have all the details if needed).
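The throttle described above can be sketched in a few lines (plain Python for illustration; Moodle's real implementation is PHP, and only the 60-second constant comes from this thread):

```python
LASTACCESS_UPDATE_SECS = 60  # interval mentioned in the thread

def maybe_update_lastaccess(user, now):
    """Write the user record only if the stored timestamp is old enough."""
    if now - user["lastaccess"] < LASTACCESS_UPDATE_SECS:
        return False          # skip the write: less lock contention
    user["lastaccess"] = now  # in Moodle this would be an UPDATE on the table
    return True

user = {"lastaccess": 1000}
print(maybe_update_lastaccess(user, 1030))  # False: only 30s elapsed
print(maybe_update_lastaccess(user, 1070))  # True: 70s elapsed, write happens
```

Turning the constant into a config variable would just mean reading LASTACCESS_UPDATE_SECS from site settings instead of hard-coding it.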
The feature you describe helps somewhat, but not completely.
Say you have a long query, like what Wen mentions, or perhaps a large calculation, something done by a cron job, etc. A query might also last a number of seconds if there is an index problem in a table, and so on.
At that point, anyone trying to log in, or whose time to update comes up, will have to wait for that first query to finish; and any additional query that uses the user table will have to wait for the write to finish, if I understand database contention correctly.
Only updating the user table once a minute helps a little, but any reasonably sized site may have dozens of loggable activities a minute during active times, so the contention is not at all unlikely.
As a result, a single slow query can lock up all users for the duration (to say nothing of the typical user action of hitting refresh whenever this happens). It does not just affect the user updating their user record.
Now, you may not see this with some database engines, or with sites or courses that don't do computationally difficult tasks. But it is a common enough problem, in my experience, to suggest that we look at moving the frequently updated fields in the user table into a separate table.
many common sql implementations
Um. It is only an issue with one specific implementation. It is a popular one... but it is just one.
As Tim says, it's been fixed in 1.9 -- and now installs using isam/myisam get a pileup of processes contending for the lock on user_lastaccess, instead of on 'user'.
The problem has been shifted, which means that at least we can read the user table.
If log and user_lastaccess are actually being a big issue with lock contention (likely, but we need to measure it first) then there are a few options we could try. But they are not trivial... and they are not guaranteed to work either. That's why we need a measurable case...