Problems with cron after upgrading to 2.1.x -> 2.2.x

Problems with cron after upgrading to 2.1.x -> 2.2.x

by Tim Hunt -
Number of replies: 3
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers

Cron and accesslib.php were both changed substantially in Moodle 2.2.

One bit in particular has been causing us problems since we upgraded from 2.1.x this morning. In cron-lib, there are now calls to

  • context_helper::cleanup_instances(); // Line 149 of lib/cronlib.php
  • context_helper::build_all_paths(false); // Line 151 of lib/cronlib.php
  • context_helper::create_instances(); // Line 183 of lib/cronlib.php

At least one of thase calls was bringing down our server, but we are not 100% sure which. We commented out all three lines of code, and now running cron does not bring down the server, and we don't really want to experiment enabling them one at a time, and see which crashes our live systemwink

 

So, the first purpose of this message is to ask: has anyone else seen this?

 

To go more specifically into what the problem was, the symptom we actually saw is that the DB query in user_accesstime_log ($DB->update_record_raw('user', $last); line 1783) was getting stuck for a long time, waiting for a lock on the user table. Since this happens on every page in the require_login function, (with a limit of only doing it once per minute for each user) lots of requests would back-up until the server got stuck.

It was only when we notced the the server was crashing every 15 minutes that we worked out that it must be something in cron getting the lock.

We are on Postgres here. We don't yet entirely understand why those context_helper calls lock the user table for so long. If we run bits of the SQL in isolation, it executes quite quickly. Looking at the logs, there is a possiblity that it is not the queries themselves that cause the problems, but instead it may be that the queries change enough rows that it triggers an auto-vacuum that locks a table for a long time. Unfortunately the information in the DB logs is inconclusive at the moment.

 

Another more general observation about 2.2. Before our upgrade, page loads were averagins about 0.35 seconds in 2.1.x. Now in 2.2.x they are averaging 0.5 seconds. Again, it is too early, so we have no idea why. Does anyone else have good comparative page-load times before and after an upgrade to 2.2?

Average of ratings: -
In reply to Tim Hunt

Re: Problems with cron after upgrading to 2.1.x -> 2.2.x

by Stephen Overall -

Tim:

I may have experienced this also, but I'm not certain (I apologize for my lack of expertise and hard data here).

My small and very humble five-course Moodle 2.2.2+ installation (VPS/Linux/MySQL) experienced its first-ever server crashes during my live student labs last week (4/2 & 4/4) after I had upgraded to Moodle 2.2.2+ (build 20120329) from the previous week's 2.2.2+ build. Server performance seemed normal for the first 45 minutes, then the site became gradually unresponsive and finally inaccessible ("The server may be busy..." error message) for 40-50 minutes. Then the site would become available and responsive for another 40 minutes until it would become unresponsive and then inaccessible again for a similar period. A long afternoon ensued.

I had been running cron.php every 15 minutes which roughly corresponded to the server freezes and freeings, so I thought of your post and have commented out the three offending lines you detailed (thank you for sharing). I have not experienced any server problems since, but then my college is on our spring break just now and my server has also not experienced any comparable load since last week so I can't say if this has actually solved the server crash problem. Hopefully it has. Any further information on this issue?

(Unrelated BTW: Since upgrading to 2.2.2+ (build 20120329) from the previous week's build my previously fully functional OU Wiki module has begun throwing a DB write error when attempting to save page changes. I've posted the issue at https://github.com/moodleou/moodle-mod_ouwiki )

In reply to Tim Hunt

Re: Problems with cron after upgrading to 2.1.x -> 2.2.x

by Dan Poltawski -

Hi Tim,

I've just been looking around this area whilst investigating some problems we are seeing with performance on Moodle.org.

One thing that intrigues me about your post though is this:

It was only when we notced the the server was crashing every 15 minutes that we worked out that it must be something in cron getting the lock.

From the looks the file, those lines in question only get called by a 20% random - occasional clean-up tasks. So I wouldn't expect you to see it every 15 minutes, though of course I know its mathematically possible. I just wondered if you noticed that aspect of these tasks?

The fact that these tasks are actually randomly called might be part of the problem here both in detecting a problem and with the weight of 'clean up tasks' which are heavy weight and have not run for some time due to the randomness.

In reply to Dan Poltawski

Re: Problems with cron after upgrading to 2.1.x -> 2.2.x

by Tim Hunt -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers

Yes, we spotted that contradictoin betweeb 'crash every 15 minutes' and 'suspect code only runs 20% of the time' later. We still don't entirely understand it.

Since we commented out those three lines of code, the servers have been running fine. Derek (Woolhead) is trying to work on a more long-term fix. If we get one, we plan to submit it for integration.

I am more interested by the 0.35s -> 0.5s increase in the average load times. I have no idea what is causing that.