Moodle 2.4 performance


Tim Hunt -
Number of replies: 10

The Open University is upgrading to Moodle 2.4 in June, and so we have been doing some load-testing.

Bad news: initially performance sucked.

The biggest problem was storing the MUC data on a shared network drive. Once we switched to the memcache back end, things got a lot better, but it was still slower than 2.3.

So then we started profiling and fixing performance bugs, and you can see the results in MDL-39443.

Because it would be good if more people did this, I wrote a blog post about how we have been doing it: http://tjhunt.blogspot.co.uk/2013/05/performance-testing-moodle.html.

Average rating: Useful (4)

Re: Moodle 2.4 performance

Tim Hunt -

A further update on this. We are still having problems in performance testing, but what we are seeing is weird.

It is not that the Moodle code is slower than it was before. The real problem is that on about 5% of page loads, we get a really slow time like 5 seconds (when the average is under one second). (Those 5% of really slow pages drag up the average page-load time.)
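
To make the arithmetic concrete, here is a small sketch of how a 5% tail can pull the mean up while the typical page stays fast. The 0.8 second figure for a normal page is an assumption for illustration; only the 5% and 5 second figures come from the post.

```python
from statistics import mean, median

# Hypothetical sample of 100 page loads: 95 normal, 5 slow (the 5% above).
normal = [0.8] * 95   # assumed typical page time in seconds
slow = [5.0] * 5      # the "really slow" time quoted in the post

times = normal + slow
avg = mean(times)     # pulled up by the slow tail
mid = median(times)   # unaffected by the slow tail

print(f"mean={avg:.2f}s median={mid:.2f}s")
```

With these assumed numbers the mean comes out just over one second even though 95% of pages take 0.8 seconds, which is why looking at the median or a high percentile alongside the mean helps when hunting this kind of problem.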

We were able to catch a couple of those slow pages in profiling runs, and on these slow pages, all the extra time seems to be taken up by memcache. The worst example we had was 18s for a course view.php page, and 15s of that was Memcache::get. Normally Memcache::get is around 0.1s.

So, something really weird is going on. Has anyone else seen memcache behave like that? Is there a fix?


Re: Moodle 2.4 performance

J S -

Hi

We have experienced slowness with memcached and versions of Moodle prior to 2.4. Can you provide the versions of memcache, web front end, and PHP you are running? We've been using memcached for session caching and found that at times "session_stub::__construct" takes an unusually large amount of time. We've been thinking this is a PHP issue, but we are not sure exactly.


Re: Moodle 2.4 performance

Tim Hunt -

I have been doing some reading today, and the memcached docs have good advice about debugging performance problems: http://code.google.com/p/memcached/wiki/Timeouts.

The following seem to be the common reasons why memcache performance goes to the dogs:

  • memcache server running out of RAM, and starting to swap / use virtual memory. This is really bad. Make sure memcache is configured to only use the RAM that is available.
  • memcache server running out of available connections. This causes any subsequent connections to time out, which is slow. If this is happening, your code is probably doing something wrong.
  • there is a problem on your network, which is slowing things down between your web server and your memcache server.
  • you are trying to store values bigger than the 1MB limit. Don't do that.
  • creating too many memcache objects on the client side (actually, calling add_server too often) is causing memory leaks.
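
A couple of the checks above can be automated from the reply to memcached's `stats` command. The following is only a rough sketch: the function names, the sample reply and the RAM figure are all made up for illustration, and it only looks at the `limit_maxbytes` and `listen_disabled_num` counters.

```python
ONE_MB = 1024 * 1024  # default item size limit mentioned in the list above


def parse_stats(raw: str) -> dict:
    """Turn the text reply to memcached's `stats` command into a dict of strings."""
    stats = {}
    for line in raw.splitlines():
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def find_problems(stats: dict, physical_ram_bytes: int) -> list:
    """Flag two of the failure modes listed above; both checks are rough heuristics."""
    problems = []
    if int(stats.get("limit_maxbytes", 0)) > physical_ram_bytes:
        problems.append("cache is allowed more memory than the machine has")
    if int(stats.get("listen_disabled_num", 0)) > 0:
        problems.append("server has hit its connection limit")
    return problems


def value_too_big(serialized: bytes, limit: int = ONE_MB) -> bool:
    """True if a serialized value is larger than the item size limit."""
    return len(serialized) > limit


# Made-up reply for illustration; a real one has many more STAT lines.
SAMPLE = (
    "STAT curr_connections 1020\r\n"
    "STAT listen_disabled_num 3\r\n"
    "STAT limit_maxbytes 8589934592\r\n"
    "END\r\n"
)

# Assumes a cache host with 4 GB of physical RAM.
problems = find_problems(parse_stats(SAMPLE), physical_ram_bytes=4 * 1024**3)
print(problems)
```

The size check is approximate, since the real limit also has to cover the key and some per-item overhead. Network trouble and client-side leaks do not show up in these counters and need to be looked for separately.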

Sorry, don't know what version of memcache we are running. Looking at their release notes, it seems like a very stable product, and so that does not matter.


Re: Moodle 2.4 performance

Matteo Scaramuccia -

Hi Tim,
my experience with Moodle 2.4 (and above) and memcached is limited right now to my home lab. Running the dev system on CentOS, I've got the impression that the memcached extension performs better than the memcache one. I compiled both from source because the pre-compiled binaries I found didn't perform well.
Unfortunately I missed the opportunity to keep running comparisons and tests to confirm those first impressions and then search for issues under load, given also that the two extensions have different APIs but quite similar MUC cache store implementations.

Which extension is installed in your environment?
I'll keep looking at this thread, which promises to be really interesting for Moodle performance (both Sessions and MUC): keep up the good work :-) !

Matteo


Re: Moodle 2.4 performance

Tim Hunt -

We are using memcache, not memcached. I don't know if we have tested both, but it seems fast enough now.

It seems like the problem may have been that we were creating too many connections to the memcache servers for each page load. (Slight oversight in some OU-specific code.)
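
The OU-specific code is not shown in this thread, so the following is only a generic illustration of the usual remedy: create one connection per distinct server list and reuse it for every cache lookup in the request. `FakeConnection`, `get_connection` and the host names are hypothetical stand-ins, not real memcache client calls.

```python
class FakeConnection:
    """Stand-in for a real memcache client; it only counts how often it is created."""

    created = 0

    def __init__(self, servers):
        FakeConnection.created += 1
        self.servers = servers


_connections = {}


def get_connection(servers):
    """Return one shared connection per distinct server list."""
    key = tuple(servers)
    if key not in _connections:
        _connections[key] = FakeConnection(servers)
    return _connections[key]


servers = ["cache1:11211", "cache2:11211"]  # hypothetical host names

# Simulate 50 cache lookups during a single page load.
for _ in range(50):
    get_connection(servers)

print(FakeConnection.created)
```

Without the shared dictionary, the same loop would open 50 connections for one page, which is the kind of pattern that can exhaust the server's connection limit under load.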

Our other problem was that our load-testing scripts kept posting to the same dozen forum threads, so those threads ended up with over 1000 posts. Strangely, displaying a thread containing 1000+ posts is a bit slow. Who would have thought it? Once we cleaned out the forum and ran the tests again, we had 2.4 running as fast as 2.3. Yay!

Good team effort by Rod, Derek, sam and me. Now we just have to hope that our testing environment is a good simulation of what will happen when we go live with this in two weeks' time. It should be.


Re: Moodle 2.4 performance

Michael Haskell -

Tim, we are planning on upgrading to either 2.4 or 2.5 this summer. In your experience, was it completely necessary to install and use memcache?

A couple of questions I'd really love to hear about:

1. Is your memcache node a separate server... specs, single node?

2. Are you running apache or nginx with php-fpm?

3. Do you have a general sense of 2.5 readiness?

It sounds like you're probably live already, but if you are interested I have a JMeter script to simulate quiz taking, which we used when we were thinking about upgrading to 2.4 earlier this year. It can simulate answering most of the quiz question types out there, and I'd like to get it out to the community if others are interested.

I didn't see any improvement for quizzes at that time, but maybe with memcache there would be gains.

 

 


Re: Moodle 2.4 performance

Matt Gibson -

We've done a bit of testing of 2.4 and found that under heavy load the DB becomes the bottleneck, seemingly because of disk IO for a few slow queries which are written to disk-based temp tables.

Our load test script (jMeter) is as follows:

  • Get a list of a user's courses via webservice
  • Log in to first course page
  • Wait 4 seconds
  • View my Moodle page
  • Wait 5 seconds
  • View a different course
  • Wait 10 seconds

This was done on a loop with 400 users and a 60 second ramp up.
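
As a rough sanity check on what this script asks of the servers, the request rate can be estimated from the numbers above. This sketch assumes each of the four steps is a single HTTP request (a real login or page view usually involves more) and that every step takes the same average response time; the 10 second figure is simply an assumed value for a heavily loaded run.

```python
USERS = 400
REQUESTS_PER_LOOP = 4      # webservice call + three page views, assumed one request each
THINK_TIME = 4 + 5 + 10    # seconds of scripted waiting per loop


def requests_per_second(avg_response_time: float) -> float:
    """Steady-state request rate once all users have ramped up."""
    loop_time = THINK_TIME + REQUESTS_PER_LOOP * avg_response_time
    return USERS * REQUESTS_PER_LOOP / loop_time


ideal = requests_per_second(0.0)    # upper bound: instant responses
loaded = requests_per_second(10.0)  # assumed 10 s per request under load

print(f"ideal={ideal:.1f} req/s, loaded={loaded:.1f} req/s")
```

Under these assumptions the script tops out at roughly 84 requests per second, and slows itself down to about 27 requests per second once responses take 10 seconds each, because every simulated user waits for a reply before moving on.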

The pattern we saw was a major spike in slow queries for the first minute, which then dropped off, with generally slow performance (~7-12 second page loads). This is on a very large pair of servers (front end: 16 CPU, 31GB; DB: 8 CPU, 16GB) on gigabit ethernet, with nothing else running. Despite turning quite a lot of the MySQL buffers up to 11 (notably tmp_table_size and max_heap_table_size, which don't make an impact even at 800M), I can't get the tables to stay in memory, and we have ~10% of temp tables written to disk on every run. The front end is not being stretched too much and the DB server is at 50% RAM and 20% CPU, suggesting it's disk IO holding it back. I have used mysqltuner.pl to try to adjust things and now the only thing it complains about is join_buffer_size. There are several thousand joins without indexes causing this warning, but it seems we can't do much about these (it's the ones in the capabilities code that do CONCAT() on the context paths). Even with join_buffer_size at 128M, it still complains.
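
For anyone wanting to track the same figure, the share of temp tables going to disk can be computed from MySQL's Created_tmp_tables and Created_tmp_disk_tables status counters, and the size limit for in-memory temp tables is the smaller of tmp_table_size and max_heap_table_size. The sketch below assumes the values have already been read (for example with SHOW GLOBAL STATUS and SHOW VARIABLES); the function name and the sample counter values are made up.

```python
def tmp_table_report(status: dict, variables: dict) -> dict:
    """Summarise temp-table behaviour from MySQL status counters and variables."""
    created = int(status["Created_tmp_tables"])
    on_disk = int(status["Created_tmp_disk_tables"])
    # In-memory temp tables are capped by whichever of these two is smaller.
    limit = min(int(variables["tmp_table_size"]),
                int(variables["max_heap_table_size"]))
    ratio = on_disk / created if created else 0.0
    return {"disk_ratio": ratio, "in_memory_limit_bytes": limit}


# Made-up counters chosen to match the ~10% figure; both buffers set to 800M.
status = {"Created_tmp_tables": "10000", "Created_tmp_disk_tables": "1000"}
variables = {
    "tmp_table_size": str(800 * 1024 * 1024),
    "max_heap_table_size": str(800 * 1024 * 1024),
}

report = tmp_table_report(status, variables)
print(report)
```

Because the counters are cumulative since server start, it is more useful to read them before and after a test run and compare the differences than to look at the absolute values.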

Does this look normal, and what can we do about it? Does the OU see a similar number of slow queries? Can we just throw RAM at it and turn the buffers up even higher? It feels like we've done this already and I'm not sure exactly how to tell how much RAM it would ultimately need or how big to set the buffers.

I should add that this is for a large DB with 125,000 users and 30,000 courses.

Here's an example of one of the slow queries, which takes 1.7 secs normally and 46 secs under load:

(SELECT ctx.path, rc.roleid, rc.capability, rc.permission
                     FROM mdl_role_capabilities rc
                     JOIN mdl_context ctx
                          ON (ctx.id = rc.contextid)
                     JOIN mdl_context pctx
                          ON (pctx.id = '1'
                              AND (ctx.id = pctx.id
                                   OR ctx.path LIKE CONCAT(pctx.path, '/%')
                                   OR pctx.path LIKE CONCAT(ctx.path, '/%')))
                LEFT JOIN mdl_block_instances bi
                          ON (ctx.contextlevel = 80 AND bi.id = ctx.instanceid)
                LEFT JOIN mdl_context bpctx
                          ON (bpctx.id = bi.parentcontextid)
                    WHERE rc.roleid = '7'
                          AND (ctx.contextlevel <= 50 OR bpctx.contextlevel < 50)
                   )
UNION
(SELECT ctx.path, rc.roleid, rc.capability, rc.permission
                     FROM mdl_role_capabilities rc
                     JOIN mdl_context ctx
                          ON (ctx.id = rc.contextid)
                     JOIN mdl_context pctx
                          ON (pctx.id = '357'
                              AND (ctx.id = pctx.id
                                   OR ctx.path LIKE CONCAT(pctx.path, '/%')
                                   OR pctx.path LIKE CONCAT(ctx.path, '/%')))
                LEFT JOIN mdl_block_instances bi
                          ON (ctx.contextlevel = 80 AND bi.id = ctx.instanceid)
                LEFT JOIN mdl_context bpctx
                          ON (bpctx.id = bi.parentcontextid)
                    WHERE rc.roleid = '5'
                          AND (ctx.contextlevel <= 50 OR bpctx.contextlevel < 50)
                   )
UNION
(SELECT ctx.path, rc.roleid, rc.capability, rc.permission
                     FROM mdl_role_capabilities rc
                     JOIN mdl_context ctx
                          ON (ctx.id = rc.contextid)
                     JOIN mdl_context pctx
                          ON (pctx.id IN ('35653','58299','71817','71818','71819','71820','71821','71822','71823','71824','71825','71826','71827','71828','71830','71831','71832','71833','71834','71835','71836','71837','71838','71839','71840','71841','71842','71843','71844','71845','71846','71847','71848','71849','71851','71852','71853','71854','71855','71856','71857','71858','71859','71860','71861','71862','71863','71864','71865','71866','71867','71868','71869','71870','71871','71872','71873','71874','71875','71876','71877','71966','71967','71969','71974','71977','71995','71999','72002','72008','72009','72013','72014','72015','72050','163422','377743','377745','377747','377748','377750','377751','377753','377755','377757','377758','377759','377760','377761','377762','377763','377764','377765','377766','377767','377768','377769','377770','377771','377772','377773','377774','377775','377776','377777','377778','377779','377780','377781','377782','377783','377784','377785','377786','377787','377789','377790','377791','377792','377793','377794','377795','377796','377797','377798','377799','377885','377886','377888','377893','377912','377914','377923','377924','377928','377929','377930','525948')
                              AND (ctx.id = pctx.id
                                   OR ctx.path LIKE CONCAT(pctx.path, '/%')
                                   OR pctx.path LIKE CONCAT(ctx.path, '/%')))
                LEFT JOIN mdl_block_instances bi
                          ON (ctx.contextlevel = 80 AND bi.id = ctx.instanceid)
                LEFT JOIN mdl_context bpctx
                          ON (bpctx.id = bi.parentcontextid)
                    WHERE rc.roleid = '10'
                          AND (ctx.contextlevel <= 50 OR bpctx.contextlevel < 50)
                   )
UNION
(SELECT ctx.path, rc.roleid, rc.capability, rc.permission
                     FROM mdl_role_capabilities rc
                     JOIN mdl_context ctx
                          ON (ctx.id = rc.contextid)
                     JOIN mdl_context pctx
                          ON (pctx.id IN ('35653','71285','71817','71818','71819','71820','71821','71822','71823','71824','71825','71826','71827','71828','71829','71830','71831','71832','71833','71834','71835','71836','71837','71838','71839','71840','71841','71842','71843','71844','71845','71846','71847','71848','71849','71851','71852','71853','71854','71855','71856','71857','71858','71859','71860','71861','71862','71863','71864','71865','71866','71867','71868','71869','71870','71871','71872','71873','71874','71875','71876','71877','71966','71967','71969','71974','71977','71995','71999','72002','72008','72009','72013','72014','72015','72050','163422','377743','377745','377747','377748','377749','377750','377751','377753','377754','377755','377757','377758','377759','377760','377761','377762','377763','377764','377765','377766','377767','377768','377769','377770','377771','377772','377773','377774','377775','377776','377777','377778','377779','377780','377781','377782','377783','377784','377785','377786','377787','377789','377790','377791','377792','377793','377794','377795','377796','377797','377798','377799','377885','377886','377888','377893','377912','377914','377918','377923','377924','377928','377929','377930','525948')
                              AND (ctx.id = pctx.id
                                   OR ctx.path LIKE CONCAT(pctx.path, '/%')
                                   OR pctx.path LIKE CONCAT(ctx.path, '/%')))
                LEFT JOIN mdl_block_instances bi
                          ON (ctx.contextlevel = 80 AND bi.id = ctx.instanceid)
                LEFT JOIN mdl_context bpctx
                          ON (bpctx.id = bi.parentcontextid)
                    WHERE rc.roleid = '15'
                          AND (ctx.contextlevel <= 50 OR bpctx.contextlevel < 50)
                   )ORDER BY capability
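
For readers unfamiliar with the path matching in this query, the three-part join condition on ctx and pctx selects contexts that are the same as, below, or above the given context. The sketch below restates that condition in Python, assuming context paths look like "/1/3/45" and are unique per context; the function name is made up for illustration.

```python
def related(ctx_path: str, pctx_path: str) -> bool:
    """Mirror of the join condition between ctx and pctx in the query above."""
    same = ctx_path == pctx_path                        # ctx.id = pctx.id
    descendant = ctx_path.startswith(pctx_path + "/")   # ctx.path LIKE CONCAT(pctx.path, '/%')
    ancestor = pctx_path.startswith(ctx_path + "/")     # pctx.path LIKE CONCAT(ctx.path, '/%')
    return same or descendant or ancestor


# Assumed example paths, not taken from the real database.
print(related("/1/3/45", "/1/3"))  # ctx is below pctx
print(related("/1", "/1/3"))       # ctx is above pctx
print(related("/1/30", "/1/3"))    # sibling-like path, no relation
```

The trailing "/" in each pattern is what stops "/1/30" from being treated as a child of "/1/3".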

 


Re: Moodle 2.4 performance

Petr Skoda -
The query above is usually executed only during login, the results are cached in user session. It should not cause any major problems unless your students are logging in repeatedly all the time.

However it is also used during each webservice request without any caching if I remember it correctly.

Please make sure to test performance without any webservice calls.