I'm looking at a moodle postgresql database with about 4000 courses, 300K mdl_context records and 200 course categories. The following query gets executed on pretty much every page load I've seen so far and even operating off of fully cached data consistently takes around 200ms. Because of the frequency with which the query gets executed, with most other common queries falling into the sub-millisecond territory, I'd like to find a way to make it faster. I don't know that much can be done with indexing, as it's not a problem with seeking in any one of those tables. I'd think making a view out of the query and/or using pgmemcache would result in being able to run it substantially more quickly (but would require application code changes). Are there any suggested database optimizations that would be transparent to the application and would help this query run faster? Is it something that's already been addressed in current development (I'm using version 2.2)? Am I thinking it's a bigger problem than it actually is? Right now it seems to have a substantial impact on the number of req/s I can handle, but that may be due to testing against pages that aren't normally hit as heavily.
Here's the query:
SELECT c.id, c.sortorder, c.visible, c.fullname, c.shortname, c.category, cat.path AS categorypath , ctx.id AS ctxid, ctx.path AS ctxpath, ctx.depth AS ctxdepth, ctx.contextlevel AS ctxlevel, ctx.instanceid AS ctxinstance
FROM mdl_course c
LEFT JOIN mdl_context ctx ON (ctx.instanceid = c.id AND ctx.contextlevel = 50)
LEFT JOIN mdl_course_categories cat ON cat.id=c.category
WHERE c.id <> $1
ORDER BY c.sortorder ASC LIMIT 20 OFFSET 0
2012-04-20 22:50:47 UTC DETAIL: parameters: $1 = '1'
This looks like a useful observation. I think it is worth reporting as a performance bug in the Tracker (http://docs.moodle.org/dev/Tracker_introduction).
It might be worth testing to see if
SELECT * FROM mdl_course c ORDER BY c.sortorder ASC LIMIT 20
is also slow. Then test
SELECT * FROM mdl_course c WHERE c.id <> 1 ORDER BY c.sortorder ASC LIMIT 20
If both of those are fast, we can rewrite the SQL. If those are slow, then we need a fundamental rethink of what is being done here.
Hmm... the two queries you suggest take about 3.8 and 4.6 ms respectively. I'm attaching the EXPLAIN ANALYZE results. Will file a ticket in tracker tomorrow. Thanks.
Right, you can see that the query optimiser gets it totally wrong for the more complex query.
EXPLAIN ANALYZE SELECT * FROM mdl_course c WHERE c.id <> 1 ORDER BY c.sortorder ASC LIMIT 20;
It basically does a Sort Method: top-N heapsort on the courses table, which is fast.
For the original table, it does a Sort Method: quicksort of the entire courses table. Then does the left join on all rows of the courses table with the other tables, and only limits to 20 rows at the end.
Stupid thing. I wonder how we can make it realise that it only needs to find the top 20 rows of the courses table before doing the LEFT JOINs? I can't think of a way right now, but there must be a way. I wonder if it is worth reporting this example in the MySQL but database too?
Forgive my ignorance if I am COMPLETELY wrong with this... but wouldn't the following work?
SELECT courses.*,ctx.id AS ctxid, ctx.path AS ctxpath, ctx.depth AS ctxdepth, ctx.contextlevel AS ctxlevel, ctx.instanceid AS ctxinstance,cat.path AS categorypath
c.id, c.sortorder, c.visible, c.fullname, c.shortname, c.category
WHERE c.id <> $1
ORDER BY c.sortorder ASC LIMIT 20
) as courses
mdl_context ctx ON (ctx.instanceid = courses.id AND ctx.contextlevel = 50)
mdl_course_categories cat ON cat.id=c.category