Hi all. I need some advice on scaling a complex database query.
The AJAX Marking Block has matured a lot over the last year, and now uses a large, complex SQL query to get all of a teacher's unmarked work from across the whole site. I chose this strategy so that there's the potential to be very scalable as it's basically just a big SELECT, so the DB should be the most efficient way to do it. It works fine on medium-large installs, so for almost everyone it's working great, but yesterday I made a copy of our biggest university customer's database for testing and found it was woefully slow.
Results so far on 2.3:
- Medium sized install (55,000 users, 20,000 assignment submissions, 3000 forum posts, 500,000 question states): ~1.1 seconds - fine for an AJAX call.
- Monster install (45,000 users, 100,000 assignment submissions, 58,000 forum posts, 5,000,000 question states): >60 seconds. Not good.
One big difference is that the monster install is on MySQL 5.0.77, rather than the newer 5.5.x that the medium one is on. I'm running another test currently to see what difference that makes, but in the meantime, I've noticed that many of the fields I'm using in ON or WHERE clauses are not indexed. I'll be running test by adding these indexes over the next few days to see what a difference it will make, but I need to know whether it's a good idea to have the block put those indexes in place automatically if it turns out they help.
As an example, the sub-query to get unmarked essay questions for the quiz (which takes 84 seconds on the monster install with SQL_NO_CACHE specified, but around 1 second on the medium one) looks like this:
SELECT SQL_NO_CACHE quiz_attempts.userid, sub.id AS submissionid FROM mdl_quiz moduletable INNER JOIN mdl_quiz_attempts quiz_attempts ON moduletable.id = quiz_attempts.quiz INNER JOIN mdl_question_attempts question_attempts ON question_attempts.questionusageid = quiz_attempts.uniqueid INNER JOIN mdl_question_attempt_steps sub ON question_attempts.id = sub.questionattemptid INNER JOIN mdl_question question ON question_attempts.questionid = question.id INNER JOIN mdl_course_modules course_modules ON course_modules.instance = moduletable.id AND course_modules.module = 14 WHERE quiz_attempts.timefinish > 0
AND quiz_attempts.preview = 0
AND question_attempts.behaviour = 'manualgraded'
AND sub.state = 'needsgrading'
AND NOT EXISTS( SELECT 1 FROM mdl_question_attempt_steps st WHERE st.state IN ('gradedwrong' , 'gradedpartial', 'gradedright', 'mangrwrong', 'mangrpartial', 'mangrright')
AND st.questionattemptid = question_attempts.id)
This query has the following columns that are not indexed:
- quiz_attempts.timefinished
- quiz_attempts.preview
- question_attempts.behaviour
- question_attempt_steps.state
- quiz_attempts.uniqueid (Unique constraint - not sure if this is translated to an index or not)
I'm pretty sure there's no advantage in indexing the low-selecitivty columns like question_attempt_steps.state as there are so many duplicated columns, but the quiz_attempts.timefinished column may benefit. Should I add the index via the AJAX Marking Block upgrade script if it proves to make a difference or is this a bad idea?