backups taking a LONG time and killing the system resources.

backups taking a LONG time and killing the system resources.

by Adam Lambert -
Number of replies: 7
Ok, I am brand new at this, and an inheritor of all that came before me at this job, so bear with me a little please.

Moodle backups appear to be going into an infinite loop and thrashing the system almost to death, and running pretty much until I get tired of watching the system thrash and kill them before they complete.

cron.php is being called thusly every 5 minutes:
curl -s -o /dev/null http://www.myschool.com/school/admin/cron.php

Most of the time this goes off without a hitch, but when what I think is the backup module is triggered, I get two hung cronjobs and two apache processes that are just thrashing the box to death as I mentioned earlier.

# strace -p $runaway_apache_PID gets me thousands of pages of lstat64(); working it's way up my moodle directory tree to backuplib.php files and occasionally open()'ing one of them. There are only 16 such backuplib.php on the system - so this is an infinite loop of some kind - is my assumption.

Further indication of an infinite loop, is that this goes on for days (yes, days) if I don't stop it.

A little grep'ing around gets me a bunch of places where backuplib.php is include_once()'d or require_once()'d. Am I to assume that one of these is recursively calling itself?

My moodledata directory is big - about 2 gigs currently - but as this looks like an infinite loop, I'm fairly sure that has nothing to do with what I'm seeing here.

Anyone who's willing to help, I'm more than happy to provide any debugging info that may help with this. Just speak in small words, and pretend you're talking to a guy who knows nothing about Moodle, but tons about sysadmin stuff. smile

Thanks all.


Average of ratings: -
In reply to Adam Lambert

Re: backups taking a LONG time and killing the system resources.

by Adam Lambert -
Hrmmm... more interesting stuff to note.

I verified with the development guys that nobody has changed the backup*.php files, and what version of Moodle and PHP we're on (Moodle 1.7, PHP 5.1.6). The dev guys said we have made efforts to not change anything in Moodle other than cosmetic/HTML stuff, so that future upgrades wouldn't be too painful - so this is a fairly stock install.

I may have to retract my thoughts on it being an infinite loop. Some further use of strace and a few grep -v statements, gets me the debug output you'll see below (and to spare you having to read all of it to see what I find interesting: note that the ID changes each time - so we're not looking at an infinite loop apparently; as the data it's working on changes with each run through the loop).

A bit more grep, and I see that the loop we're in is the backup_log_info() function in backuplib.php. Apparently looping from either line 1286 or line 1293. While the code below may not indicate it (because I've grep'ed out the stuff that's reloading the various backuplib.php files), it is having to reload the backuplib.php file for EACH LINE of the below output which is going to the moodle.xml file that's being generated...

Here's the debug output, just for refernce..

write(60, " <ID>625061</ID>\n", 24) = 24
write(60, " <TIME>1170623267</TIME>\n", 32) = 32
write(60, " <USERID>439</USERID>\n", 29) = 29
write(60, " <IP>x.x.x.127</IP>\n", 31) = 31
write(60, " <MODULE>quiz</MODULE>\n", 30) = 30
write(60, " <CMID>2353</CMID>\n", 26) = 26
write(60, " <ACTION>close attempt</A"..., 39) = 39
write(60, " <URL>review.php?attempt="..., 43) = 43
write(60, " <INFO>1010</INFO>\n", 26) = 26
write(60, " </LOG>\n", 13) = 13
write(60, " <LOG>\n", 12) = 12
write(60, " <ID>625062</ID>\n", 24) = 24
write(60, " <TIME>1170623273</TIME>\n", 32) = 32
write(60, " <USERID>439</USERID>\n", 29) = 29
write(60, " <IP>x.x.x.127</IP>\n", 31) = 31
write(60, " <MODULE>quiz</MODULE>\n", 30) = 30
write(60, " <CMID>2353</CMID>\n", 26) = 26
write(60, " <ACTION>review</ACTION>\n", 32) = 32
write(60, " <URL>review.php?id=2353&"..., 55) = 55
write(60, " <INFO>1010</INFO>\n", 26) = 26
write(60, " </LOG>\n", 13) = 13
write(60, " <LOG>\n", 12) = 12
write(60, " <ID>625083</ID>\n", 24) = 24
write(60, " <TIME>1170623665</TIME>\n", 32) = 32
write(60, " <USERID>439</USERID>\n", 29) = 29
write(60, " <IP>x.x.x.127</IP>\n", 31) = 31
write(60, " <MODULE>quiz</MODULE>\n", 30) = 30
write(60, " <CMID>2353</CMID>\n", 26) = 26
write(60, " <ACTION>view</ACTION>\n", 30) = 30
write(60, " <URL>view.php?id=2353</U"..., 36) = 36
write(60, " <INFO>1010</INFO>\n", 26) = 26
write(60, " </LOG>\n", 13) = 13
write(60, " <LOG>\n", 12) = 12


Again, all help welcome and appreciated.

In reply to Adam Lambert

Re: backups taking a LONG time and killing the system resources.

by Adam Lambert -
Sorry to keep replying to myself, but my time to edit keeps running out. smile

In ./school/backup/backuplib.php line 1750, function backup_encode_absolute_links

follow on down to line 1774, and see if we don't see the code that's causing the issue here.

Best as I can tell, "include_once" is causing the file not to be reloaded every time, but PHP is going so far as to lstat64() the file (and every directory along the way in the path to the file - each as a separate call to lstat64()), and the file is being opened before it is being not loaded because of the include_once use.

I also note this from php.net - code was added to normalize the path to work-around Windoze: "This behaviour changed in PHP 5 - the path is normalized first so that C:\PROGRA~1\A.php is realized the same as C:\Program Files\a.php and the file is required just once."

I am strongly suspicious that this 'path normalizing' code in PHP5.1.x is the cause of the exorbiant amount of lstat64() calls (one for each directory in the path to backuplib.php - which in my case is quite a few). Each call to lstat64 causes disk I/O, and doing a gazillion disk I/O calls in a short period of time causes exorbiant system overhead to be generated.

So, guys (and gals) - now I have a more serious developer question: Me not knowing anything of significance about what is being done here - can someone point me to a good way to not have to do this include_once() call a few gazillion times during every backup run? And perhaps such changes might want to become part of Moodle proper - as I can't imagine I'm the only one running into this.

Thanks,


In reply to Adam Lambert

Re: backups taking a LONG time and killing the system resources.

by Rich Schumaker -
Adam,

There appears to be several issues with the backup system. We have not yet found a solution for our large course issues. Ours courses that have a large number of students and large amount of content won't back up either.

You may want to report your issue to http://tracker.moodle.org/secure/Dashboard.jspa.

Good luck - we haven't had much luck on this issue.
In reply to Adam Lambert

Re: backups taking a LONG time and killing the system resources.

by Martín Langhoff -
How large is your install? Moodle backups are not particularly scalable -- a solution you can use instead is to backup your DB (mysqldump or pg_dump) and your moodledata directories to tape or whatever you use for backup media.

When you get a request to restore just one course - run the restore to a temporary Moodle installation, and then you can let the teacher export the Moodle backup zipfile for just that course, and restore it onto the "production" Moodle system. Much more reliable and complete. And if you need a full restore, you have it.

Another good thing to do is to avoid using curl and hitting the cron script directly from the commandline PHP binary.

Hope that helps.
In reply to Martín Langhoff

Re: backups taking a LONG time and killing the system resources.

by Adam Lambert -
Thanks for the replies!

My Moodle install is probably pretty big - about 2 gigs in the moodledata directory, and about 200 megs of database.

I did go over and report it on the bug tracker website - maybe some good will come of it.

And if everyone else is having this problem, I may have found our solution (or at least got down to what needs to be changed in the code to fix the issue).


And a side note, in the meantime I always have rsync for 'real backups'. smile



In reply to Adam Lambert

Re: backups taking a LONG time and killing the system resources.

by Adam Lambert -
Work-around available - see here: http://tracker.moodle.org/browse/MDL-8700

Thanks,

In reply to Adam Lambert

Re: backups taking a LONG time and killing the system resources.

by Adam Lambert -
The fix brought my system load down during backup down into relatively sane territory (1.5 average - not bad on a quad processor box), and the time to complete a backup went from days to about 2 hours.

I still think there's lots of room for improvement, but this looks acceptable.

Thanks all.