Scalability Coding Issues

by Mike Churchward -
Number of replies: 26
A conversation in the DMS discussion (http://moodle.org/mod/forum/discuss.php?d=13320&parent=64245) was lamenting the scaling issues with the MyDMS code. Apparently there are elements of the coding structure that cause significant performance problems when the number of users gets very high.

To that end, maybe we should post some guidelines that address this issue: what should and shouldn't be done in the code to make Moodle scalable to very large numbers of users? This would help all of us developers keep these concerns in mind as we change and add code.

mike
In reply to Mike Churchward

Re: Scalability Coding Issues

by David Scotson -

Scalability and performance are strange topics, the kind that can start near-apocalyptic flame wars because there is such a wide range of strongly held opinions.

I have personally read in-depth, well-argued articles advocating everything from learning assembly language, so you always know what your code is doing at the lowest level, through to using the highest-level language and abstractions possible and letting the virtual machine writers and Moore's Law take care of it.

But I think the one thing almost all of them agree on is that if you aren't measuring your app and finding the bottlenecks, then your optimisations will have no impact at all on overall application speed (though they might be breaking your code in odd corner cases and/or making it harder to debug).

The book Advanced PHP Programming by George Schlossnagle has some good advice on how to benchmark your website and find exactly where you need a speed-up.

In reply to David Scotson

Re: Scalability Coding Issues

by Mike Churchward -
Those are general programming concerns, for sure, but I was thinking more of the things that affect Moodle as the number of users, posts, activities, etc. gets very large.

These things can be difficult to test, as you need the numbers you are concerned about to test them. It can be easy to fool yourself into believing your code is working well by benchmarking smaller situations.

Things I can think of off the bat are:
- don't try to load an entire data table into an array without placing some limits on it. If you need to display something like a user table, make sure you use paged loads.
- if you have to pass large structures (arrays, objects) around, use 'pass by reference' arguments (rough sketch of both points below).

etc.
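
Something like this, for example (a rough sketch only, assuming Moodle's usual $CFG->prefix and the get_records_sql() helper; the LIMIT syntax shown is MySQL's):

    <?php
    // Rough sketch of both points.  config.php sets up $CFG and the DB layer.
    require_once('config.php');   // adjust the path to wherever config.php lives

    $page    = 3;    // which screenful the user asked for
    $perpage = 30;   // rows per screen
    $offset  = $page * $perpage;

    // Paged load: only fetch the columns and rows this screen actually needs.
    // (LIMIT syntax is MySQL's; Postgres wants "LIMIT 30 OFFSET 90".)
    $users = get_records_sql("SELECT id, firstname, lastname, email
                                FROM {$CFG->prefix}user
                            ORDER BY lastname ASC
                               LIMIT $offset, $perpage");

    // Pass by reference: PHP 4 copies arrays/objects passed by value, so big
    // structures are cheaper to hand around with '&'.
    function print_user_list(&$users) {
        if (empty($users)) {
            return;
        }
        foreach ($users as $user) {
            echo htmlspecialchars("$user->lastname, $user->firstname") . '<br />';
        }
    }
    print_user_list($users);
    ?>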

mike
In reply to Mike Churchward

Re: Scalability Coding Issues

by Martin Dougiamas -
OK, well let's have a bit of a jam on this ... here's a couple to start off ..

1) The user table and the course table are two in particular that can get very large, so be very careful to only get exactly what you need from them when you need it.

2) Cache any data that was expensive to produce (see the sketch below).

3) Check database indexes carefully.

...
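
A rough sketch of 1) and 2) (illustration only; the field list and the static-variable cache are just the simplest approaches, not an official API):

    <?php
    // 1) Only ask the big tables for the columns you actually need, never '*'.
    //    get_records() takes an explicit field list as its fifth argument.
    $categoryid = 1;   // example category
    $courses = get_records('course', 'category', $categoryid,
                           'fullname ASC', 'id, shortname, fullname');

    // 2) Cache data that was expensive to produce.  A static variable is the
    //    simplest per-request cache; anything needed across requests would
    //    have to go into a table, a file or a real cache.
    function count_site_teachers() {
        static $count = null;
        if ($count === null) {
            $count = count_records('user_teachers');   // the expensive call, done once
        }
        return $count;
    }
    ?>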
In reply to Martin Dougiamas

Re: Scalability Coding Issues

by John Papaioannou -
Right on the money, like Mike's suggestion of getting results paged.

Having read Schlossnagle's excellent book (even though I found it more useful as food for thought than as a practical handbook) and having some experience in algorithm theory (which is just fascinating wink), I'd like to say that:

Profiling and high-speed optimization are really not needed IMHO as long as the code is sensible, unless of course we're up in the thousands-of-users range. For example, Moodle itself could do a lot better if someone suddenly put a gun to my head and said "make this run twice as fast NOW", but it's really very good as it is (un-optimized, let's say, but sensibly coded).

What can really cripple an application is a) too much access to "expensive" data (DB and, to a lesser degree, filesystem operations) and b) really bad algorithm design. As long as you have these two in check, it's going to be decent at the least. Simple, yet effective.

The funny thing is that DMS has both of these flaws at this time. More specifically, the code responsible for the folder tree on the left just plain sucks. The reason: MyDMS's tree view was simply not up to standards (it didn't really give you any information, just a trail of breadcrumbs). I felt the need to code a solution as versatile as any OS's file manager (collapsible folders etc.), and indeed DMS can do that right now; it's just not implemented at the UI level. However, having to work on MyDMS's "still-being-rewritten" framework (without being able to rewrite it from scratch), plus time constraints, does that to you sometimes.

Jon
In reply to Martin Dougiamas

Re: Scalability Coding Issues

by Samuli Karevaara -
Check database indexes carefully.

The MySQL server status variable "Select_full_join" increments while using Moodle. According to the MySQL manual, this should be zero.

This means that some queries in Moodle do a join (in some cases it seems that a non-join WHERE clause is enough, too) on unindexed columns. One such query is in the function forum_get_child_posts(). It has "...WHERE p.parent = $parent AND p.userid = u.id..."

The forum_posts.parent and forum_posts.userid columns are not indexed. According to EXPLAIN, the number of rows examined drops (in my test query only, obviously, with an almost empty database) from a total of 28 rows with just the primary key to a total of three rows with indexes on those columns.

Can somebody test with a bigger database whether indexing these columns gives any real speedup? Or have these (along with some other columns) been left unindexed on purpose?
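
If somebody wants to try it, something along these lines on a test copy should be enough to compare EXPLAIN output before and after (the index names are just examples, and execute_sql() is Moodle's standard helper):

    <?php
    // Test-database experiment only: add the two candidate indexes, then re-run
    // the forum_get_child_posts() query through EXPLAIN and compare the 'rows' column.
    require_once('config.php');   // adjust the path as needed

    execute_sql("ALTER TABLE {$CFG->prefix}forum_posts
                     ADD INDEX forum_posts_parent_idx (parent)");
    execute_sql("ALTER TABLE {$CFG->prefix}forum_posts
                     ADD INDEX forum_posts_userid_idx (userid)");
    ?>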
In reply to Samuli Karevaara

Re: Scalability Coding Issues

by John Papaioannou -
Speaking of JOINs, I've been wondering for some time why we aren't using outer (left and right) joins and instead do cross (or inner) joins.

Explanation: assume table A has 1000 records, 3 of which have id equal to some value X. Table B has 1000 records, 5 of which have idAlpha equal to the same value X. Now, in order to join these tables to get all relevant results according to some other criterion, MySQL would produce a candidate recordset of one million records and reduce that to three (due to the WHERE clause) before returning it.

An outer join of the form

SELECT * FROM tablea LEFT JOIN tableb ON tablea.id = tableb.idAlpha WHERE tablea.somefield = somevalue

would produce a candidate recordset of 3 records before the WHERE clause kicks in.

Do I have my SQL facts wrong here or is something else going on?

For the record, I want to say that the best argument I can find for outer joins is that there is an application of mine floating on a web server somewhere that, in order to produce a simple "Top 10" display, executes a query some 4 lines long which joins 4 tables containing something like 60, 200, 400, and several hundred thousand records respectively. That query uses WHERE, GROUP BY and ORDER BY and produces results in about 0.05 seconds. The same query written with cross joins took several seconds to return.
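
To time the two shapes against each other, something like this would do (tablea/tableb are placeholders, not real Moodle tables; and remember a LEFT JOIN only returns the same rows as the cross join when every tablea row has a match in tableb):

    <?php
    require_once('config.php');

    // Placeholder queries: same intent, written as a cross join and as a LEFT JOIN.
    $cross = "SELECT a.*
                FROM {$CFG->prefix}tablea a, {$CFG->prefix}tableb b
               WHERE a.id = b.idalpha
                 AND a.somefield = 'somevalue'";

    $outer = "SELECT a.*
                FROM {$CFG->prefix}tablea a
           LEFT JOIN {$CFG->prefix}tableb b ON a.id = b.idalpha
               WHERE a.somefield = 'somevalue'";

    foreach (array('cross join' => $cross, 'left join' => $outer) as $label => $sql) {
        list($usec, $sec)   = explode(' ', microtime());   // PHP 4 style timing
        $records            = get_records_sql($sql);
        list($usec2, $sec2) = explode(' ', microtime());
        $elapsed = ($sec2 + $usec2) - ($sec + $usec);
        $count   = $records ? count($records) : 0;
        echo "$label: $count rows in " . round($elapsed, 4) . " seconds\n";
    }
    ?>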

Jon
In reply to John Papaioannou

Re: Scalability Coding Issues

by Mike Churchward -
I have to admit, I get confused when dealing with the different types of JOINs. Do you know of any good resources explaining them? I'd really like to approach them with my eyes a little more open.

mike
In reply to John Papaioannou

Re: Scalability Coding Issues

by Martín Langhoff -
Here is an example SQL optimization. We are just now deploying Moodle with 6K courses and 25K users. We lumped all the courses in one category, and performance was awful.

As it happens, get_courses() and get_courses_page() were using a query that under MySQL is not very efficient, and under Postgres is a disaster. As luck would have it, we use Postgres sad

There were several options for rewriting it. I settled for one that works better in both MySQL and Postgres and did not require major changes in the code. It wasn't the best option, but the best one implied doing things (UNIONs over subSELECTs) that MySQL 3.x doesn't support (MySQL 4 and Postgres do).

So I did a minor reorg of the query to use an explicit LEFT JOIN rather than an apparent inner join that ends up acting like a left join. Both databases liked it a lot better, and using EXPLAIN showed much better query plans.

With MySQL, the inefficient query took ~32 seconds; on Postgres it took several minutes. My replacement, while not ideal, brought that down to sub-second responses on both.
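
In the abstract, the change looks something like this (placeholder names, not the actual query from datalib.php; the swap is only safe because both forms return the same rows in our case):

    <?php
    // Before: comma-style join; the planners (MySQL 3.x especially, and Postgres
    // here) picked a poor join order for it.
    $sql = "SELECT c.id, c.shortname, x.somefield
              FROM {$CFG->prefix}maintable c, {$CFG->prefix}sidetable x
             WHERE x.mainid = c.id
               AND c.category = $categoryid";

    // After: the same intent written as an explicit LEFT JOIN; EXPLAIN showed
    // much better plans on both MySQL and Postgres.
    $sql = "SELECT c.id, c.shortname, x.somefield
              FROM {$CFG->prefix}maintable c
         LEFT JOIN {$CFG->prefix}sidetable x ON x.mainid = c.id
             WHERE c.category = $categoryid";
    ?>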

This will make it to CVS once it is tested more. In the meantime here are the two files I've patched:

http://lists.eduforge.org/cgi-bin/archzoom.cgi/arch-eduforge@catalyst.net.nz--2004/moodle--eduforge--1.3.3--patch-222/lib/datalib.php.diff?diff

http://lists.eduforge.org/cgi-bin/archzoom.cgi/arch-eduforge@catalyst.net.nz--2004/moodle--eduforge--1.3.3--patch-222/course/category.php.diff?diff

One major issue we have to deal with is queries requesting "all fields" (for instance c.*). These are especially wasteful -- most of the time you only need a few fields. And for queries that do SELECT DISTINCT c.*, it just kills the DB... why would it have to 'distinct' (which implies sorting over _every_ field) over fields like 'modinfo'?

Anyway, we are going through the paces, seeing some queries hit our DB server hard, and reworking them. Fun ;)

In reply to Martín Langhoff

Re: Scalability Coding Issues

by Martin Dougiamas -
Picture of Core developers Picture of Documentation writers Picture of Moodle HQ Picture of Particularly helpful Moodlers Picture of Plugin developers Picture of Testers
Maybe you didn't notice because of your separate Arch repository wink, but those functions in lib/datalib.php (MOODLE_14_STABLE) were already re-written a few weeks ago to remove the JOINs completely (using a bit of PHP instead). Your solution is nicer though, being all in the database - but it would be interesting to compare them speedwise.

That particular call to get_courses_page from course/category.php that uses c.* is much improved by listing explicit fields there, thanks! I've put that in CVS.
In reply to Martin Dougiamas

Re: Scalability Coding Issues

by Martín Langhoff -
Yay! You're moving fast!

You are right: in the rush, we weren't looking at the CVS repo. That's the kind of thing I overlook when the pressure is on.

The rewritten functions in datalib.php are indeed working *way* faster than those in 1.4.1, at least with MySQL. My version is still faster smile

The source of the problem was a poorly planned JOIN, rather than MySQL not knowing how to handle it. So I guess I'll try to make a bit of a case for more JOINs in SQL rather than more loops in PHP.

The more work we push onto the DB's back, the better off we are when dealing with large datasets. The DB (Postgres or MySQL) will be much more efficient at ORDERing, JOINing, doing UNIONs, etc. when we have millions of rows.

A reasonably well designed JOIN will get the database cracking, and give back to PHP exactly the records needed, in the required format.

Even if it is tricky to find the "right" way to write the JOIN, once you've got it, you are sorted. And you completely avoid the issue of selecting a gazillion rows into memory ;)
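
To make it concrete, here's the kind of swap I mean (sketch only, with the field lists trimmed down):

    <?php
    // Loops-in-PHP version: one extra query per course, plus a big array in memory.
    $courses = get_records('course', '', '', 'fullname ASC', 'id, fullname, category');
    if ($courses) {
        foreach ($courses as $course) {
            $cat = get_record('course_categories', 'id', $course->category); // extra query each time
            $catname = $cat ? $cat->name : '';
            echo "$course->fullname ($catname)\n";
        }
    }

    // Let-the-DB-do-it version: one JOIN, already ordered, only the fields needed.
    $sql = "SELECT c.id, c.fullname, cat.name AS category
              FROM {$CFG->prefix}course c
              JOIN {$CFG->prefix}course_categories cat ON cat.id = c.category
          ORDER BY c.fullname ASC";
    $courses = get_records_sql($sql);
    ?>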

And even if you don't have a large dataset, in any scenario where there are resource constraints (memory or CPU) the DB backend is an order of magnitude more efficient than PHP. In low-memory conditions, a couple of thousand rows selected into PHP can sour your day.

By relying on the DB, Moodle becomes more efficient both in these constrained conditions (low memory, slow CPU) and when scaling 'out' (splitting the DB from Apache) with big hardware.

There's no way we're going to get value out of a dual Opteron database server if all we ask it to do is 'get me all those records'. Once you have that kind of setup, the DB server had better serve the data sliced, diced, and in the right order ;)

By the way, I'm surprised there's so much activity on MOODLE_14_STABLE! Is 1.4.2 coming?
In reply to Martín Langhoff

Re: Scalability Coding Issues

by Eloy Lafuente (stronk7) -
Nice !!

Really, SELECT DISTINCT * statements over just ONE table don't make much sense, mainly because EVERY record in every table is already unique, since all tables should have the auto-numbered id field. Anyway, it seems there isn't too much of this in Moodle.

The case of (unneeded) SELECT * statements is different. Getting all the fields is a common practice in some areas of Moodle, and we shouldn't do it. The differences are really important in terms of speed !!

About using BOOLEAN queries and SUBSELECTs: it's a problem to maintain MySQL 3.x compatibility, since BOOLEAN mode is supported only from 4.0 and SUBSELECTs only from 4.1 (if I'm not wrong). Some day the DB requirements should change, but not now...

Some of that can be emulated using LEFT JOINs, but that isn't always the solution.
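
For example (just an illustration): "users with no student enrolment" needs a subselect in the obvious form, but MySQL 3.x can get the same answer with a LEFT JOIN:

    <?php
    // Subselect form -- needs MySQL 4.1+ (Postgres is fine with it):
    $sql = "SELECT u.id, u.firstname, u.lastname
              FROM {$CFG->prefix}user u
             WHERE u.id NOT IN (SELECT userid FROM {$CFG->prefix}user_students)";

    // LEFT JOIN emulation -- same rows, and it works on MySQL 3.x too:
    $sql = "SELECT u.id, u.firstname, u.lastname
              FROM {$CFG->prefix}user u
         LEFT JOIN {$CFG->prefix}user_students s ON s.userid = u.id
             WHERE s.id IS NULL";
    ?>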

About indexes, I think we could examine every table in Moodle. At the least, every FOREIGN KEY field should be indexed, plus UNIQUE keys in n:m tables. I'm not really sure how this currently stands in Moodle.

Anyway, all your work, Martin L, is amazing and, from my point of view, a MUST to check.

Ciao smile
In reply to Eloy Lafuente (stronk7)

Re: Scalability Coding Issues

by Mike Churchward -
We need to be careful here to consider the code as well. I've run into a lot of situations where I need to go back to the database to pull more fields from a record that had already been retrieved once, because the fields I needed weren't fetched the first time. I have to believe it's more efficient to get the fields the first time than to go back for them again.

This is where it would be good to have a consistent object model. Right now we do it with globals and session variables.
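
Even something this simple would help (just a sketch): let functions accept either an id or a record the caller already has in hand.

    <?php
    // Sketch: avoid a second trip to the database when the caller already
    // fetched the course record for some other reason.
    function print_course_heading($courseorid) {
        if (is_object($courseorid)) {
            $course = $courseorid;                              // reuse what we already have
        } else {
            $course = get_record('course', 'id', $courseorid);  // fall back to the DB
        }
        if ($course) {
            echo '<h2>' . htmlspecialchars($course->fullname) . '</h2>';
        }
    }
    ?>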

mike
In reply to Samuli Karevaara

Re: Scalability Coding Issues

by Mike Churchward -
I actually asked this earlier (and still don't know the answer), but what reasons are there for not indexing frequently used fields? I know there is a small disk space penalty, but is that the only reason?

mike
In reply to Mike Churchward

Re: Scalability Coding Issues

by Jason Cole -
I agree that we probably need more indexing, but I just found this in the MySQL manual:

Create only the indexes that you really need. Indexes are good for retrieval but bad when you need to store things fast. If you mostly access a table by searching on a combination of columns, make an index on them. The first index part should be the most used column. If you are always using many columns, you should use the column with more duplicates first to get better compression of the index.
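
So, purely as an illustration of that advice (not a proposal for an actual index): a combined index should lead with the column you filter on most.

    <?php
    // Illustration only: if most log queries filter on course first and then module,
    // the combined index should put course first.
    execute_sql("ALTER TABLE {$CFG->prefix}log
                     ADD INDEX log_course_module_idx (course, module)");

    // A query like this can then use the index for both conditions:
    $courseid = 2;   // example course id
    $entries = get_records_sql("SELECT id, time, userid, action, url
                                  FROM {$CFG->prefix}log
                                 WHERE course = $courseid AND module = 'forum'
                              ORDER BY time DESC");
    ?>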
In reply to Mike Churchward

Re: Scalability Coding Issues

by Jason Cole -
OK... we need to look at our joins and keys. As part of another project, I've been looking at tuning MySQL. I just ran a SHOW STATUS on my Moodle MySQL server and found some really interesting (and kinda scary) stats.

The MySQL server has been up 27 days... We've got about 300 users on 5 active courses right now... a relatively small load.

Since the server was restarted there have been 3 million queries on 2216 threads.

The three statistics that indicate we need to do better with joins and keys are:

| Handler_read_rnd         | 2629333   |
| Handler_read_rnd_next    | 107280278 |
| Select_full_join         | 14769     |

ALL of these should be very small numbers (Select_full_join should be 0).

Handler_read_rnd
The number of requests to read a row based on a fixed position. This will be high if you are doing a lot of queries that require sorting of the result. You probably have a lot of queries that require MySQL to scan whole tables or you have joins that don't use keys properly.
Handler_read_rnd_next
The number of requests to read the next row in the data file. This will be high if you are doing a lot of table scans. Generally this suggests that your tables are not properly indexed or that your queries are not written to take advantage of the indexes you have.
Select_full_join
The number of joins that do not use indexes. If this value is not 0, you should carefully check the indexes of your tables. This variable was added in MySQL 3.23.25.
So, now I need to go through the code, find all the joins and develop indexes. Fun, fun, fun...
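
In case anyone wants to follow along, this is roughly how I'm checking things (using Moodle's global ADOdb $db handle; the forum query is just an example suspect, and 42 is any real post id):

    <?php
    require_once('config.php');
    global $db, $CFG;

    // The counters that point at missing indexes / full joins.
    foreach (array('Handler_read_rnd', 'Handler_read_rnd_next', 'Select_full_join') as $var) {
        $row = $db->GetRow("SHOW STATUS LIKE '$var'");   // returns variable name + value
        print_r($row);
    }

    // Then ask MySQL how it plans a suspect query: 'ALL' in the type column
    // means a full table scan, i.e. an index is missing or unusable.
    $plan = $db->GetAll("EXPLAIN SELECT p.*, u.firstname, u.lastname
                           FROM {$CFG->prefix}forum_posts p, {$CFG->prefix}user u
                          WHERE p.parent = 42 AND p.userid = u.id");
    print_r($plan);
    ?>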
In reply to Jason Cole

Re: Scalability Coding Issues

by Michael Penney -

Hi Jason, do you have any way of relating these numbers to system resources?

e.g. this level of MySQL usage ~= x amount of system resources on y system type?

We have a few more users and courses here, and our sysadmin reports we haven't used more than ~5% of system resources at peak (and the Moodle shares the server with our campus and student news portals, which are both separate PostNuke sites, as well as a large scout database), which would suggest we have plenty of room to grow.

But what does that really mean in relation to growth? E.g., what amount of the system's hardware resources is currently being occupied by the MySQL numbers you are reporting?

In reply to Michael Penney

Re: Scalability Coding Issues

by Jason Cole -
Hi Michael,

I don't have the data that would relate usage to system resources. My system is overpowered for the load right now, so it's hard to get accurate stats. What we should do is have a test box run binary logging so we can see how long the queries take.

Even if the current configs and loads don't tax the server, these queries are taking longer than they should. So proper indexing would make the system seem snappier, even if the overall load didn't improve much. A slow query on an underutilized box is still a slow query.. smile

I also haven't tuned the MySQL server settings, as I'm just learning about much of this... but one indication that the system could be faster is Table_locks_waited = 33. That is a relatively small number, but it's not zero.

Everything else seems to be working fine. But I'd be nervous once we got up to a few thousand users.

Jason
In reply to Jason Cole

Re: Scalability Coding Issues - Indexes

by Ray Kingdon -
I found it informative to look at the ratio between the CPU usage of Apache and MySQL when looking at indexes. You can only do this when there is lots of data in a course and the page is taking some time to create. In my case I was looking at the Administration page of the Workshop module, which with about 500 submissions was taking about a minute to create. Looking at the CPU usage during the page creation, MySQL was taking 80% and Apache 20%. Adding two indexes switched the CPU usage around (Apache 80% and MySQL 20%) and cut the page creation down to about 10 seconds or so. A success for indexes, then.

However, adding more indexes (even ones which on paper you would think would help) did not affect that 80%/20% ratio and had no significant effect on the page creation time. (I've also found a similar situation in the Exercise module.)

The moral then IMO is indexes are important but use them sparingly and only when you can demonstrate a performance improvement.

In reply to Ray Kingdon

Re: Scalability Coding Issues - Indexes

by Jason Cole -
I'm not sure which is more impressive, the fact that you cut page creation time by 80% or that you have a workshop with 500 submissions!

You are absolutely right about some indexes being necessary and others not.
It all depends on the SQL search scheme...

Perhaps we should start creating a list of indexes that result in performance gains like the ones you describe?
In reply to Jason Cole

Re: Scalability Coding Issues - Indexes

by Martín Langhoff -
Jason said:
"Perhaps we should start creating a list of indexes that result in perfomance gains like the ones you describe?"

Why a list? Post a patch that adds the indexes ;)
In reply to Martín Langhoff

Re: Scalability Coding Issues - Indexes

by Ray Kingdon -
Eh? The indexes that Moodle uses currently (and there's quite a few) are "listed" in the mod/<module>/db/mysql.sql files.
In reply to Ray Kingdon

Re: Scalability Coding Issues - Indexes

by Martín Langhoff -
Ray,

that is exactly what I was referring to. We are working on adding to and refining the existing indexes for MySQL and Postgres.

I guess my answer to Jason's suggestion was not really /that/ polite. I should have pointed to the database schema files.

Bottom line is: if you find that adding indexes (or changing SQL in the Moodle code) improves performance demonstrably, then submit a patch, or if you have CVS access, commit the patch.
In reply to Samuli Karevaara

Re: Scalability Coding Issues

by Martín Langhoff -
A couple of points I would like to comment upon, given that (as part of the NZVLE/Eduforge project) we are doing a lot of work on database indexes, the structure of JOINs, etc. related to Moodle's scalability.

For those wondering about outer/inner JOINs, they will give you a _different_ result. If you can come up with several queries that give you the exact same logical result, yes, you should check which one is faster, and probably use that one.

With regards to the forums tables, yes, they are missing a few indexes. They are not the only ones ;)

With Martin Dougiamas' help, we had a look at a 'slow and strange queries' report from the database that runs behind moodle.org, and based on that info we have added a few indexes to the forum tables (in CVS now, committed by "mjollnir_", which is Penny's handle).

We are also adding other indexes and fixing a few database operations that either thrash the database or consume too much memory when dealing with many users. We have 25K users and 6K courses -- some test runs we have done had 87K users and 12K courses loaded.

One of the aspects we are looking into is replacing get_courses() and get_records() with something that returns a wrapped database handle, rather than a huge multidimensional array. Without this, Moodle's scalability is limited by PHP's memory_limit setting, and eventually available RAM.
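
Roughly the difference we are after (the ADOdb calls are real; the wrapper we would put around them is not written yet):

    <?php
    // Today: every matching row is copied into one big PHP array first.
    $users = get_records('user', 'deleted', 0, 'lastname ASC', 'id, lastname, email');

    // What we want: walk the ADOdb recordset row by row, so memory use stays
    // flat no matter how many rows match.  $db is Moodle's global ADOdb handle.
    global $db, $CFG;
    $rs = $db->Execute("SELECT id, lastname, email
                          FROM {$CFG->prefix}user
                         WHERE deleted = 0
                      ORDER BY lastname ASC");
    if ($rs) {
        while ($row = $rs->FetchRow()) {
            // process one row at a time here
        }
        $rs->Close();
    }
    ?>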

On a different front, we have started to add conditional begin/commit statements that improve the scalability of large INSERT/UPDATE/DELETE operations on Postgres (and potentially InnoDB). And we are looking at nested loops in PHP that should be replaced with JOINs.
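
The begin/commit part is roughly this shape ('conditional' because MyISAM just ignores transactions, while Postgres gets a real one; the helper names here are only illustrative, not what will land in CVS):

    <?php
    // Illustrative helpers: wrap a bulk operation in a transaction where the
    // backend supports one.  $CFG->dbtype and the global ADOdb $db are Moodle's.
    function sketch_begin_sql() {
        global $CFG, $db;
        if ($CFG->dbtype == 'postgres7') {   // MyISAM has nothing to begin
            $db->Execute('BEGIN');
        }
    }
    function sketch_commit_sql() {
        global $CFG, $db;
        if ($CFG->dbtype == 'postgres7') {
            $db->Execute('COMMIT');
        }
    }

    // Example bulk job: lots of INSERTs, one transaction instead of one per row.
    sketch_begin_sql();
    foreach ($newusers as $newuser) {        // imagine rows parsed from an upload file
        insert_record('user', $newuser);
    }
    sketch_commit_sql();
    ?>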

Our 'bleeding edge' work is done against 1.4.x (and is publicly available for those bold enough to use our Arch repo), and we are porting it to 1.5, so you'll be able to use it as part of the official Moodle release.
In reply to Martín Langhoff

Re: Scalability Coding Issues

by Martin Dougiamas -
Hmm, I can't see those forum fixes in CVS yet (but would love to!) ... can you or Penny please check them in?
In reply to Martin Dougiamas

Re: Scalability Coding Issues

by Martín Langhoff -
Should we commit those into MOODLE_14_STABLE? How frozen/active is that branch?

<offtopic>
We are still wrapping things up with Open Polytechnic's deployment. The plan is to go through our list of Arch commits, bringing the good ones into CVS HEAD (some we'll discuss before committing). The timeline is later this week or early next week.

Anything you want for a 1.4.2 release, let me know, and we'll merge it in. A few things (LDAP enrolments) need to mature a bit; I wouldn't put them in a stable branch yet.
</offtopic>
In reply to Martín Langhoff

Re: Scalability Coding Issues

by Martin Dougiamas -
Small, safe improvements and bug fixes (including speed boosts) are always welcome in the current STABLE branch.  Moodle 1.4.2 isn't far off.

LDAP enrolments etc are fine in 1.5, thanks.