Side-effects of replacing get_my_courses()

Side-effects of replacing get_my_courses()

by Elvedin Trnjanin -
Number of replies: 12
We have an export process that runs over all of our (<=50k) users that maps users to their Moodle courses. How that's done is we iterate over every user and get the users courses with the get_my_courses() function.

The problem is that process takes about 25 hours to complete, which has gone up from 8 hours before our fall and spring semesters. Since get_my_courses() can take about half an hour for 20k users that are not enrolled in an courses, the function had to be replaced.

This is what get_my_courses_and_get_them_fast() essentially looks like -

$courses = get_records_sql("select c.id, c.fullname, c.shortname, c.idnumber, c.visible, ra.roleid from mdl_role_assignments ra, mdl_context con, mdl_course c where ra.userid=$userid and ra.contextid=con.id and con.instanceid=c.id")

Not the most efficient query I'll admit, but it decreased the export time from 25 hours to 2-4 minutes. Although I haven't done thorough testing yet, it does seem to be correct for the bulk of our academic course users. I've noticed that some users are enrolled into our front page (Moodle course id = 1) which the query I'm using doesn't catch, but that's not exactly important.

What other courses will the query I'm using miss?
Average of ratings: -
In reply to Elvedin Trnjanin

Re: Side-effects of replacing get_my_courses()

by Tim Hunt -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers
Try

SELECT c.id, c.fullname, c.shortname, c.idnumber, c.visible, ra.roleid
FROM mdl_role_assignments ra
JOIN mdl_context con ON ra.contextid=con.id AND con.contextlevel = 50
JOIN mdl_course c ON con.instanceid=c.id
WHERE ra.userid = $userid


Actually, don't iterate over users, doing thousands of separate queries. Just do one query, and to avoid running out of memory, use get_recordset_sql. Something like.

SELECT u.id, u.firstname, u.lastname, u.idnumber, c.id, c.fullname, c.shortname, c.idnumber, c.visible, ra.roleid
FROM mdl_user u
JOIN mdl_role_assignments ra ra.userid = u.id
JOIN mdl_context con ON ra.contextid=con.id AND con.contextlevel = 50
JOIN mdl_course c ON con.instanceid=c.id
ORDER BY u.id, c.id


In reply to Tim Hunt

Re: Side-effects of replacing get_my_courses()

by sam marshall -
Picture of Core developers Picture of Peer reviewers Picture of Plugin developers
Tim's query is of course the right one, but just to note, the answer to the 'what am I missing' question is, you will not include students registered at category level. Most sites don't do that, so you are probably okay with it - but some might.
In reply to sam marshall

Re: Side-effects of replacing get_my_courses()

by Robert Russo -
The following query should grab all users from all contexts in all courses for all users.

(
SELECT u.id, u.firstname, u.lastname, u.idnumber, c.id, c.fullname, c.shortname, c.idnumber, c.visible, ra.roleid, u.id AS sortuserid, c.id AS sortcourseid
FROM mdl_user u
JOIN mdl_role_assignments ra ON ra.userid = u.id
JOIN mdl_context con ON ra.contextid = con.id
WHERE con.contextlevel =50
JOIN mdl_course c ON con.instanceid = c.id
)
UNION (

SELECT u.id, u.firstname, u.lastname, u.idnumber, c.id, c.fullname, c.shortname, c.idnumber, c.visible, ra.roleid, u.id AS sortuserid, c.id AS sortcourseid
FROM mdl_user u
JOIN mdl_role_assignments ra ON ra.userid = u.id
JOIN mdl_context con ON ra.contextid = con.id
WHERE con.contextlevel =40
JOIN mdl_course c ON con.instanceid = c.id
)
UNION (

SELECT u.id, u.firstname, u.lastname, u.idnumber, c.id, c.fullname, c.shortname, c.idnumber, c.visible, ra.roleid, u.id AS sortuserid, c.id AS sortcourseid
FROM mdl_user u
JOIN mdl_role_assignments ra ON ra.userid = u.id
JOIN mdl_context con ON ra.contextid = con.id
WHERE con.contextlevel =10
JOIN mdl_course c ON con.instanceid = c.id
)
ORDER BY sortuserid, sortcourseid

For a single user:

(
SELECT u.id, u.firstname, u.lastname, u.idnumber, c.id, c.fullname, c.shortname, c.idnumber, c.visible, ra.roleid, u.id AS sortuserid, c.id AS sortcourseid
FROM mdl_user u
JOIN mdl_role_assignments ra ON ra.userid = u.id
JOIN mdl_context con ON ra.contextid = con.id
WHERE u.username = 'rrusso'
AND con.contextlevel =50
JOIN mdl_course c ON con.instanceid = c.id
)
UNION (

SELECT u.id, u.firstname, u.lastname, u.idnumber, c.id, c.fullname, c.shortname, c.idnumber, c.visible, ra.roleid, u.id AS sortuserid, c.id AS sortcourseid
FROM mdl_user u
JOIN mdl_role_assignments ra ON ra.userid = u.id
JOIN mdl_context con ON ra.contextid = con.id
WHERE u.username = 'rrusso'
AND con.contextlevel =40
JOIN mdl_course c ON con.instanceid = c.id
)
UNION (

SELECT u.id, u.firstname, u.lastname, u.idnumber, c.id, c.fullname, c.shortname, c.idnumber, c.visible, ra.roleid, u.id AS sortuserid, c.id AS sortcourseid
FROM mdl_user u
JOIN mdl_role_assignments ra ON ra.userid = u.id
JOIN mdl_context con ON ra.contextid = con.id
WHERE u.username = 'rrusso'
AND con.contextlevel =10
JOIN mdl_course c ON con.instanceid = c.id
)
ORDER BY sortuserid, sortcourseid

If you are running this (like on the my page to speed up getting my coruses), you can modify the get_my_courses query like so:

(
SELECT u.id, u.firstname, u.lastname, u.idnumber, c.id, c.fullname, c.shortname, c.idnumber, c.visible, ra.roleid, u.id AS sortuserid, c.id AS sortcourseid
FROM mdl_user u
JOIN mdl_role_assignments ra ON ra.userid = u.id
JOIN mdl_context con ON ra.contextid = con.id
WHERE u.id = $USER->id
AND con.contextlevel = CONTEXT_COURSE
JOIN mdl_course c ON con.instanceid = c.id
)
UNION (

SELECT u.id, u.firstname, u.lastname, u.idnumber, c.id, c.fullname, c.shortname, c.idnumber, c.visible, ra.roleid, u.id AS sortuserid, c.id AS sortcourseid
FROM mdl_user u
JOIN mdl_role_assignments ra ON ra.userid = u.id
JOIN mdl_context con ON ra.contextid = con.id
WHERE u.id = $USER->id
AND con.contextlevel = CONTEXT_COURSECAT
JOIN mdl_course c ON con.instanceid = c.id
)
UNION (

SELECT u.id, u.firstname, u.lastname, u.idnumber, c.id, c.fullname, c.shortname, c.idnumber, c.visible, ra.roleid, u.id AS sortuserid, c.id AS sortcourseid
FROM mdl_user u
JOIN mdl_role_assignments ra ON ra.userid = u.id
JOIN mdl_context con ON ra.contextid = con.id
WHERE u.id = $USER->id
AND con.contextlevel = CONTEXT_SYSTEM
JOIN mdl_course c ON con.instanceid = c.id
)
ORDER BY sortuserid, sortcourseid


WHEW.
In reply to Robert Russo

Re: Side-effects of replacing get_my_courses()

by Robert Russo -
This may look complex, but it should be 2 or so orders of magnitude faster on large installations compared to using outer joins.
In reply to Robert Russo

Re: Side-effects of replacing get_my_courses()

by Robert Russo -
I must have been drinking when I wrote the above...as joins cannot happen below my happy where clauses.

The following query should grab all users from all contexts in all courses for all users.

(
SELECT u.id, u.firstname, u.lastname, u.idnumber, c.id, c.fullname, c.shortname, c.idnumber, c.visible, ra.roleid, u.id AS sortuserid, c.id AS sortcourseid
FROM mdl_user u
JOIN mdl_role_assignments ra ON ra.userid = u.id
JOIN mdl_context con ON ra.contextid = con.id
JOIN mdl_course c ON con.instanceid = c.id
WHERE con.contextlevel =50
)
UNION (

SELECT u.id, u.firstname, u.lastname, u.idnumber, c.id, c.fullname, c.shortname, c.idnumber, c.visible, ra.roleid, u.id AS sortuserid, c.id AS sortcourseid
FROM mdl_user u
JOIN mdl_role_assignments ra ON ra.userid = u.id
JOIN mdl_context con ON ra.contextid = con.id
JOIN mdl_course c ON con.instanceid = c.id
WHERE con.contextlevel =40
)
UNION (

SELECT u.id, u.firstname, u.lastname, u.idnumber, c.id, c.fullname, c.shortname, c.idnumber, c.visible, ra.roleid, u.id AS sortuserid, c.id AS sortcourseid
FROM mdl_user u
JOIN mdl_role_assignments ra ON ra.userid = u.id
JOIN mdl_context con ON ra.contextid = con.id
JOIN mdl_course c ON con.instanceid = c.id
WHERE con.contextlevel =10
)
ORDER BY sortuserid, sortcourseid

For a single user:

(
SELECT u.id, u.firstname, u.lastname, u.idnumber, c.id, c.fullname, c.shortname, c.idnumber, c.visible, ra.roleid, u.id AS sortuserid, c.id AS sortcourseid
FROM mdl_user u
JOIN mdl_role_assignments ra ON ra.userid = u.id
JOIN mdl_context con ON ra.contextid = con.id
JOIN mdl_course c ON con.instanceid = c.id
WHERE u.username = 'rrusso'
AND con.contextlevel =50
)
UNION (

SELECT u.id, u.firstname, u.lastname, u.idnumber, c.id, c.fullname, c.shortname, c.idnumber, c.visible, ra.roleid, u.id AS sortuserid, c.id AS sortcourseid
FROM mdl_user u
JOIN mdl_role_assignments ra ON ra.userid = u.id
JOIN mdl_context con ON ra.contextid = con.id
JOIN mdl_course c ON con.instanceid = c.id
WHERE u.username = 'rrusso'
AND con.contextlevel =40
)
UNION (

SELECT u.id, u.firstname, u.lastname, u.idnumber, c.id, c.fullname, c.shortname, c.idnumber, c.visible, ra.roleid, u.id AS sortuserid, c.id AS sortcourseid
FROM mdl_user u
JOIN mdl_role_assignments ra ON ra.userid = u.id
JOIN mdl_context con ON ra.contextid = con.id
JOIN mdl_course c ON con.instanceid = c.id
WHERE u.username = 'rrusso'
AND con.contextlevel =10
)
ORDER BY sortuserid, sortcourseid

If you are running this (like on the my page to speed up getting my coruses), you can modify the get_my_courses query like so:

(
SELECT u.id, u.firstname, u.lastname, u.idnumber, c.id, c.fullname, c.shortname, c.idnumber, c.visible, ra.roleid, u.id AS sortuserid, c.id AS sortcourseid
FROM mdl_user u
JOIN mdl_role_assignments ra ON ra.userid = u.id
JOIN mdl_context con ON ra.contextid = con.id
JOIN mdl_course c ON con.instanceid = c.id
WHERE u.id = $USER->id
AND con.contextlevel = COURSE_CONTEXT
)
UNION (

SELECT u.id, u.firstname, u.lastname, u.idnumber, c.id, c.fullname, c.shortname, c.idnumber, c.visible, ra.roleid, u.id AS sortuserid, c.id AS sortcourseid
FROM mdl_user u
JOIN mdl_role_assignments ra ON ra.userid = u.id
JOIN mdl_context con ON ra.contextid = con.id
JOIN mdl_course c ON con.instanceid = c.id
WHERE u.id = $USER->id
AND con.contextlevel = COURSECAT_CONTEXT
)
UNION (

SELECT u.id, u.firstname, u.lastname, u.idnumber, c.id, c.fullname, c.shortname, c.idnumber, c.visible, ra.roleid, u.id AS sortuserid, c.id AS sortcourseid
FROM mdl_user u
JOIN mdl_role_assignments ra ON ra.userid = u.id
JOIN mdl_context con ON ra.contextid = con.id
JOIN mdl_course c ON con.instanceid = c.id
WHERE u.id = $USER->id
AND con.contextlevel = SYSTEM_CONTEXT
)
ORDER BY sortuserid, sortcourseid



In reply to Robert Russo

Re: Side-effects of replacing get_my_courses()

by Elvedin Trnjanin -
Thanks for the information Tim, Sam, and Robert. For Tim's join query, I've tested it against my original query with adding "AND con.contextlevel = 50" and they're equivalent apart from the join query taking a few seconds longer. Robert's query (for single user) is equivalent as well, although it does perform a bit slower. Removing the UNIONs and putting the contextlevel check into one query improved performance of it by roughly 10%.

(
SELECT u.id, u.firstname, u.lastname, u.idnumber, c.id, c.fullname, c.shortname, c.idnumber, c.visible, ra.roleid, u.id AS sortuserid, c.id AS sortcourseid
FROM mdl_user u
JOIN mdl_role_assignments ra ON ra.userid = u.id
JOIN mdl_context con ON ra.contextid = con.id
JOIN mdl_course c ON con.instanceid = c.id
WHERE u.id = $userid
AND (con.contextlevel =50 or con.contextlevel =40 or con.contextlevel =30)
)
ORDER BY sortuserid, sortcourseid

Perhaps this is some limitation with the way the MySQL server I'm using is tuned or some obscure query optimization issue for large tables. For the longer (string length) queries, perhaps query optimization/compilation are culprits culprit as it may scale linearly for string length and is most noticable when using the same query over a large group of users.


In reply to Elvedin Trnjanin

Re: Side-effects of replacing get_my_courses()

by sam marshall -
Picture of Core developers Picture of Peer reviewers Picture of Plugin developers
Don't use the OR part. Robert was evidently still drinking when he wrote his revised queries smile

(Um, no offence intended.)

You can't use the OR for different contextlevels, because in contextlevel 40, the context instanceid is not a course ID, it's a category ID. Similarly in level 10 it's not the course id (it is a fixed site id). So in order to do this query correctly, you do need the UNION with three different queries.

If the code you are writing is just for your own system and you know you always give students/the people you care about roles at the course level, stick to courses and level 50.

If it's supposed to work in general across other Moodle installations then you simply are going to need a more complex query - there is no clear way to achieve this in Moodle 1.9 as roles in a location might not necessarily indicate any belonging to that location, but something that starts with get_users_by_capability (and will therefore be complex and horribly slow) is probably the right approach. An alternative approach is to include selected roles and have a configuration interface that defines which roles they are [ie for the user to define 'this role counts as belonging to the course'], then you can use something like the above union and if somebody has a role at category level, you count them as belonging to all courses in that category and subcategory.

As for your MySQL performance issues, yeah I guess those are just MySQL sucking as usual. In particular, expressing something as an (inner) JOIN rather than by putting tables in separated by commas and a WHERE clause shouldn't take any longer as far as I can see... but whatever.

--sam
In reply to sam marshall

Re: Side-effects of replacing get_my_courses()

by Elvedin Trnjanin -
Thanks for the information Sam, it'll be very useful for our documentation as well as I couldn't find something like this on MoodleDocs. Is there and I'm just not finding it?
In reply to Elvedin Trnjanin

Re: Side-effects of replacing get_my_courses()

by sam marshall -
Picture of Core developers Picture of Peer reviewers Picture of Plugin developers
I don't know, I rarely look at documentation ;)

Since we're mainly talking about details of the database structure (things like what table instanceid applies to), these *should* be documented in detail not in moodledocs but in the xmldb file itself. I made a thingy for viewing xmldb doc files, which produces html that documents each table and field. So in other words, if the 'comment' fields were completed, you should be able - wait why am I saying this, I can just look at it.

Ho and then again hum - here's the documentation produced for the context table in 1.9 (paste below). Looks like somebody could do with improving the comments in that install.xml. smile Hey, there's ONE comment in the whole table (at the top), and that one comment doesn't make any sense... good job... ;)

--sam

context

one of these must be set

Field Type Description
id int (10)
contextlevel int (10)
instanceid int (10)
path char (255)
depth int (2)

Keys

Name Type Field(s) Reference
primary primary id

Indexes

Name Type Field(s)
contextlevel-instanceid Unique contextlevel, instanceid
instanceid Not unique instanceid
path Not unique path





In reply to sam marshall

Re: Side-effects of replacing get_my_courses()

by sam marshall -
Picture of Core developers Picture of Peer reviewers Picture of Plugin developers
Just by the way in case anyone tries looking for it - the xmldb documentation display code is actually a 2.0 feature. But the 'comment' fields used to make that documentation have existed (well, mostly blank, it seems) since 1.8.
In reply to sam marshall

Re: Side-effects of replacing get_my_courses()

by Robert Russo -
I was the one who recommended using UNIONs, not ORs. Elvedin tried using the OR query, which as you stated, will not work like the union queries I specified.

smile
In reply to Robert Russo

Re: Side-effects of replacing get_my_courses()

by sam marshall -
Picture of Core developers Picture of Peer reviewers Picture of Plugin developers
Look again at your UNIONs. As given (in the post Elvedin replied to with the OR query), they'll do exactly the same thing wrong as the OR. smile

For example notice this chunk:

AND con.contextlevel = CONTEXT_COURSECAT
JOIN mdl_course c ON con.instanceid = c.id

This is an indication of what's wrong. A correct query on categories would be doing 'JOIN mdl_course_categories cc ON con.instanceid = cc.id' or whatever.

--sam