To scale a complex query, I think I need my plugin to add indexes on core tables. What things should I consider?

To scale a complex query, I think I need my plugin to add indexes on core tables. What things should I consider?

by Matt Gibson -
Number of replies: 10

Hi all. I need some advice on scaling a complex database query.

The AJAX Marking Block has matured a lot over the last year, and now uses a large, complex SQL query to get all of a teacher's unmarked work from across the whole site. I chose this strategy so that there's the potential to be very scalable as it's basically just a big SELECT, so the DB should be the most efficient way to do it. It works fine on medium-large installs, so for almost everyone it's working great, but yesterday I made a copy of our biggest university customer's database for testing and found it was woefully slow.

Results so far on 2.3:

  • Medium sized install (55,000 users, 20,000 assignment submissions, 3000 forum posts, 500,000 question states): ~1.1 seconds - fine for an AJAX call.
  • Monster install (45,000 users, 100,000 assignment submissions, 58,000 forum posts, 5,000,000 question states): >60 seconds. Not good.

One big difference is that the monster install is on MySQL 5.0.77, rather than the newer 5.5.x that the medium one is on. I'm running another test currently to see what difference that makes, but in the meantime, I've noticed that many of the fields I'm using in ON or WHERE clauses are not indexed. I'll be running test by adding these indexes over the next few days to see what a difference it will make, but I need to know whether it's a good idea to have the block put those indexes in place automatically if it turns out they help.

As an example, the sub-query to get unmarked essay questions for the quiz (which takes 84 seconds on the monster install with SQL_NO_CACHE specified, but around 1 second on the medium one) looks like this:

SELECT SQL_NO_CACHE
    quiz_attempts.userid, sub.id AS submissionid
FROM
    mdl_quiz moduletable
        INNER JOIN
    mdl_quiz_attempts quiz_attempts ON moduletable.id = quiz_attempts.quiz
        INNER JOIN
    mdl_question_attempts question_attempts ON question_attempts.questionusageid = quiz_attempts.uniqueid
        INNER JOIN
    mdl_question_attempt_steps sub ON question_attempts.id = sub.questionattemptid
        INNER JOIN
    mdl_question question ON question_attempts.questionid = question.id
        INNER JOIN
    mdl_course_modules course_modules ON course_modules.instance = moduletable.id AND course_modules.module = 14
WHERE quiz_attempts.timefinish > 0 
AND quiz_attempts.preview = 0
AND question_attempts.behaviour = 'manualgraded'
AND sub.state = 'needsgrading'
AND NOT EXISTS( SELECT 1 FROM mdl_question_attempt_steps st WHERE st.state IN ('gradedwrong' , 'gradedpartial', 'gradedright', 'mangrwrong', 'mangrpartial', 'mangrright')
AND st.questionattemptid = question_attempts.id)

This query has the following columns that are not indexed:

  • quiz_attempts.timefinished
  • quiz_attempts.preview
  • question_attempts.behaviour
  • question_attempt_steps.state
  • quiz_attempts.uniqueid (Unique constraint - not sure if this is translated to an index or not)

I'm pretty sure there's no advantage in indexing the low-selecitivty columns like question_attempt_steps.state as there are so many duplicated columns, but the quiz_attempts.timefinished column may benefit. Should I add the index via the AJAX Marking Block upgrade script if it proves to make a difference or is this a bad idea?

Average of ratings: -
In reply to Matt Gibson

Re: To scale a complex query, I think I need my plugin to add indexes on core tables. What things should I consider?

by Tim Hunt -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers

Have you done an EXPLAIN, to find out how MySQL is processing the query?

The most common cause pathalogically bad query performance in MySQL is that for a certain class of queries, it will treat a subquery as a correlated subquery, when there is absolutely no need to do so. There are examples of that in the tracker.

Before you write any upgrade code, add the index directly in the DB, and see if it makes a difference. Also, if possible, test on a big Postgres install with and without the extra index.

For your query, the most likely candiate is the NOT EXISTS bit. I bet that is being executed as a correlated subquery. You could try replacing that with a LEFT JOIN.

Actually, why don't you just copy the code from mod/quiz/report/grading (get_question_state_summary). I have not heard any complaints about the performance there, and that does something similar.

Average of ratings: Useful (1)
In reply to Tim Hunt

Re: To scale a complex query, I think I need my plugin to add indexes on core tables. What things should I consider?

by Matt Gibson -

Hi Tim, thanks for your feedback!

Yes, I did do an EXPLAIN, but it wasn't immediately clear what was happening because there are 50 odd lines in it (for the whole query, not just the small part above) and I'm only just learning how to read EXPLAIN output. I did see a couple of DEPENDENT SUBQUERY bits, but I assumed they were the NOT EXISTS bits that are intended to just use the index and would therefore be super fast. I'll double check this.

I did come across an old bug in MySQL (http://bugs.mysql.com/bug.php?id=9090) that causes a problem correlated subqueries as you suggested, so I'm hoping that when I try the same thing on a higher version of MySQL, it'll be OK. I'll alter it to use LEFT JOIN to see what's happening. Annoyingly, importing and upgrading the database to MySQL 5.5 has so far taken 6 hours and isn't done yet, so I can't fully verify it yet.

I'll check out the quiz report code and see if I can adapt it for the query. Do you know if a MySQL data dump can be imported direclty into postgres?

In reply to Matt Gibson

Re: To scale a complex query, I think I need my plugin to add indexes on core tables. What things should I consider?

by Hubert Chathi -

One thing to remember about indexes is that MySQL (I'm not sure about others) will only use one index per table at a time.  So, for exampe, if you have an index on quiz_attempts.timefinished, and another index on quiz_attempts.uniqueid (BTW, yes, it looks like the unique constraint gets translated to an index), then MySQL will only use one index or the other; it won't use both.  In order for MySQL to use an index on both fields, you have to create an index on both fields.  And an index on two fields can also be used as an index on the only first field listed in the index, but not as an index on only the second field listed.

But I suspect that if you change the query to use a left join as Tim suggested, you will find that it will be much faster already, and you probably won't need the extra indexes.

Average of ratings: Useful (2)
In reply to Hubert Chathi

Re: To scale a complex query, I think I need my plugin to add indexes on core tables. What things should I consider?

by Matt Gibson -

Thanks, Hubert. I hadn't realised that at all.

The upgrade to 5.5 improved things a lot, but still has a big performance issue as the above query is 11.8 seconds.

I'm looking at making concatenated indexes now, because altering it to use a left join only took 1 second off, leaving it at 10 seconds. 

In reply to Matt Gibson

Re: To scale a complex query, I think I need my plugin to add indexes on core tables. What things should I consider?

by Hubert Chathi -

Can you post your new (LEFT JOIN) query, and post the result of EXPLAIN on that query?

In reply to Hubert Chathi

Re: To scale a complex query, I think I need my plugin to add indexes on core tables. What things should I consider?

by Matt Gibson -

I'll post a few different EXPLAINs for comparison. Here's the one for the above query taking 37 seconds:

id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY course_modules ref mdl_courmodu_mod_ix,mdl_courmodu_ins_ix mdl_courmodu_mod_ix 8 const 1209  
1 PRIMARY moduletable eq_ref PRIMARY PRIMARY 8 exeter-monster.course_modules.instance 1 Using index
1 PRIMARY quiz_attempts ref mdl_quizatte_uni_uix,mdl_quizatte_quiuseatt_uix,mdl_quizatte_qui_ix mdl_quizatte_qui_ix 8 exeter-monster.moduletable.id 23 Using where
1 PRIMARY question_attempts ref PRIMARY,mdl_quesatte_queslo_uix,mdl_quesatte_que_ix,mdl_quesatte_que2_ix mdl_quesatte_que2_ix 8 exeter-monster.quiz_attempts.uniqueid 5 Using where
1 PRIMARY question eq_ref PRIMARY PRIMARY 8 exeter-monster.question_attempts.questionid 1 Using index
1 PRIMARY sub ref mdl_quesattestep_queseq_uix,mdl_quesattestep_que_ix mdl_quesattestep_queseq_uix 8 exeter-monster.question_attempts.id 1 Using where
2 DEPENDENT SUBQUERY st ref mdl_quesattestep_queseq_uix,mdl_quesattestep_que_ix mdl_quesattestep_queseq_uix 8 exeter-monster.question_attempts.id 1 Using where
In reply to Hubert Chathi

Re: To scale a complex query, I think I need my plugin to add indexes on core tables. What things should I consider?

by Matt Gibson -

Here's a LEFT JOIN version, which is actually a bit faster at 14 seconds:

SELECT SQL_NO_CACHE
    quiz_attempts.userid, sub.id AS submissionid
FROM
    mdl_quiz moduletable
        INNER JOIN
    mdl_quiz_attempts quiz_attempts ON moduletable.id = quiz_attempts.quiz
        INNER JOIN
    mdl_question_attempts question_attempts ON question_attempts.questionusageid = quiz_attempts.uniqueid
        INNER JOIN
    mdl_question_attempt_steps sub ON question_attempts.id = sub.questionattemptid
        INNER JOIN
    mdl_question question ON question_attempts.questionid = question.id
        INNER JOIN
    mdl_course_modules course_modules ON course_modules.instance = moduletable.id AND course_modules.module = 14
        LEFT JOIN
    mdl_question_attempt_steps st ON st.state IN ('gradedwrong' , 'gradedpartial',
        'gradedright',
        'mangrwrong',
        'mangrpartial',
        'mangrright') AND st.questionattemptid = question_attempts.id
WHERE
    quiz_attempts.timefinish > 0 AND quiz_attempts.preview = 0 AND question_attempts.behaviour = 'manualgraded' AND sub.state = 'needsgrading' AND st.id IS NULL
          

And here's the EXPLAIN for it:

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE course_modules ref mdl_courmodu_mod_ix,mdl_courmodu_ins_ix mdl_courmodu_mod_ix 8 const 1209  
1 SIMPLE moduletable eq_ref PRIMARY PRIMARY 8 exeter-monster.course_modules.instance 1 Using index
1 SIMPLE quiz_attempts ref mdl_quizatte_uni_uix,mdl_quizatte_quiuseatt_uix,mdl_quizatte_qui_ix mdl_quizatte_qui_ix 8 exeter-monster.moduletable.id 23 Using where
1 SIMPLE question_attempts ref PRIMARY,mdl_quesatte_queslo_uix,mdl_quesatte_que_ix,mdl_quesatte_que2_ix mdl_quesatte_que2_ix 8 exeter-monster.quiz_attempts.uniqueid 5 Using where
1 SIMPLE question eq_ref PRIMARY PRIMARY 8 exeter-monster.question_attempts.questionid 1 Using index
1 SIMPLE sub ref mdl_quesattestep_queseq_uix,mdl_quesattestep_que_ix mdl_quesattestep_queseq_uix 8 exeter-monster.question_attempts.id 1 Using where
1 SIMPLE st ref mdl_quesattestep_queseq_uix,mdl_quesattestep_que_ix mdl_quesattestep_queseq_uix 8 exeter-monster.question_attempts.id 1 Using where; Not exists
In reply to Hubert Chathi

Re: To scale a complex query, I think I need my plugin to add indexes on core tables. What things should I consider?

by Matt Gibson -

Fixed it! I added an index like this:

CREATE INDEX amb_que_att_st_combo ON mdl_question_attempt_steps (state, questionattemptid)

and now it runs at 0.3 seconds. New EXPLAIN for the LEFT JOIN query:

id select_type table type possible_keys key key_len ref rows Extra  
1 SIMPLE sub ref mdl_quesattestep_queseq_uix,mdl_quesattestep_que_ix,amb_que_att_st_combo amb_que_att_st_combo 41 const 23928 Using where; Using index
1 SIMPLE question_attempts eq_ref PRIMARY,mdl_quesatte_queslo_uix,mdl_quesatte_que_ix,mdl_quesatte_que2_ix PRIMARY 8 exeter-monster.sub.questionattemptid 1 Using where  
1 SIMPLE quiz_attempts eq_ref mdl_quizatte_uni_uix,mdl_quizatte_quiuseatt_uix,mdl_quizatte_qui_ix mdl_quizatte_uni_uix 8 exeter-monster.question_attempts.questionusageid 1 Using where  
1 SIMPLE course_modules ref mdl_courmodu_mod_ix,mdl_courmodu_ins_ix mdl_courmodu_ins_ix 8 exeter-monster.quiz_attempts.quiz 1 Using where  
1 SIMPLE question eq_ref PRIMARY PRIMARY 8 exeter-monster.question_attempts.questionid 1 Using index  
1 SIMPLE moduletable eq_ref PRIMARY PRIMARY 8 exeter-monster.course_modules.instance 1 Using where; Using index
1 SIMPLE st ref mdl_quesattestep_queseq_uix,mdl_quesattestep_que_ix,amb_que_att_st_combo mdl_quesattestep_queseq_uix 8 exeter-monster.sub.questionattemptid 1 Using where; Not exists
In reply to Matt Gibson

Re: To scale a complex query, I think I need my plugin to add indexes on core tables. What things should I consider?

by Hubert Chathi -

Matt, that's a very interesting result.  Could you, just for curiosity's sake, try the query again without the combined index, but with a new index on mdl_question_attempt_steps, on just the state column?  If I'm reading the EXPLAIN correctly, then it should run just as fast.

As I understand EXPLAINs, the order that it's given represents the order that MySQL processes the tables.  So without the index (in the first EXPLAIN), for the given data set, MySQL decides that it's best to start by querying mdl_course_modules (course_modules) for module=14, then join with mdl_quiz (moduletable), and so on.  It doesn't get to mdl_question_attempt_steps (sub) until second-last.

With the new index, it starts by querying mdl_question_attempt_steps (sub) for state = 'needsgrading', and then proceding from there.  At this point, the 'state' part of the index is the only part that's used, since it isn't matching questionattemptid.

In reply to Matt Gibson

Re: To scale a complex query, I think I need my plugin to add indexes on core tables. What things should I consider?

by Matt Gibson -

Interestingly, the NOT EXISTS version runs faster at around 0.2 seconds:

id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY sub ref mdl_quesattestep_queseq_uix,mdl_quesattestep_que_ix,amb_que_att_st_combo amb_que_att_st_combo 41 const 23928 Using where; Using index
1 PRIMARY question_attempts eq_ref PRIMARY,mdl_quesatte_queslo_uix,mdl_quesatte_que_ix,mdl_quesatte_que2_ix PRIMARY 8 exeter-monster.sub.questionattemptid 1 Using where
1 PRIMARY quiz_attempts eq_ref mdl_quizatte_uni_uix,mdl_quizatte_quiuseatt_uix,mdl_quizatte_qui_ix mdl_quizatte_uni_uix 8 exeter-monster.question_attempts.questionusageid 1 Using where
1 PRIMARY course_modules ref mdl_courmodu_mod_ix,mdl_courmodu_ins_ix mdl_courmodu_ins_ix 8 exeter-monster.quiz_attempts.quiz 1 Using where
1 PRIMARY question eq_ref PRIMARY PRIMARY 8 exeter-monster.question_attempts.questionid 1 Using index
1 PRIMARY moduletable eq_ref PRIMARY PRIMARY 8 exeter-monster.course_modules.instance 1 Using where; Using index
2 DEPENDENT SUBQUERY st ref mdl_quesattestep_queseq_uix,mdl_quesattestep_que_ix,amb_que_att_st_combo mdl_quesattestep_queseq_uix 8 exeter-monster.question_attempts.id 1 Using where