Manually clearing automated backup status of RUNNING

Manually clearing automated backup status of RUNNING

by Bobby Siegfried -
Number of replies: 12

Ran into a strange one today. The cron run that was currently running automated backup job was accidentally terminated (15). Unfortunately, when trying to kick off the automated backup task again to pick up where it left off is failing with the following:

Execute scheduled task: Automated backups (core\task\automated_backup_task)
Checking automated backup status...RUNNING
automated backup are already running. Execution delayed
... used 2 dbqueries
... used 1.6667408943176 seconds
Scheduled task complete: Automated backups (core\task\automated_backup_task)

I ran the backup cleanup task to see if that would work, I also cleared caches and ran an strace on the scheduled_task command. No luck. I'm curious if anyone else has run into this and/or knows how to clear the running status so it will actually run again.

Thanks in advance for any insight!

Average of ratings: -
In reply to Bobby Siegfried

Re: Manually clearing automated backup status of RUNNING

by Ken Task -
Picture of Particularly helpful Moodlers

What does the backup logs look like?

/report/backups/index.php

Had a server that went south due to a 130+ Gig course that would never complete and that caused havoc with db tables related ... report above. Make sure the next backup date makes sense!

I ended up having to truncate all tables related to autobackups ... so it was like the server had never run autobackups to get it working again.   I also set a designated directory where backups were to be saved so I could actually see and count them! :|

You might install course sizes plugin ...

https://moodle.org/plugins/report_coursesize

to see if you've got some biggies! :|

'SoS', Ken


In reply to Ken Task

Re: Manually clearing automated backup status of RUNNING

by Bobby Siegfried -

Hi Ken,

Thanks for your response! Overnight, Moodle seems to have cleared the running status itself - however, the task is now failing when trying to pick up where it left off - and I *think* I have a theory... maybe?

So yesterday, when the task was running normally and interrupted (we think it was killed gracefully by accident as the last line in the cron log was "Terminated: 15". However, that did not clear the running status. After spending some time trying to research this, my understanding is that there is a status value set somewhere in the Moodle db, potentially for the individual course(s) that is(are) currently being backed up that results in the "running" status response. I'm not sure of this as of this morning since the status did clear overnight and now the task is failing when it attempts to backup the first course where I'm guessing it left off:

Skipping COMM-001-010-FL18-Media and Society (Not scheduled for backup until Sat, 29 Feb 2020 03:00:00 -0500) Backing up COMM-001-011-FL18-Media and Society... ... used 10324 dbqueries ... used 5.2991099357605 seconds Scheduled task failed: Automated backups (core\task\automated_backup_task),error/setting_already_added

Skipped courses appear first (there are a bunch) and then on the first actual course backup attempt I receive the above error. I'm wondering if there is a status value in the db somewhere or a temp file I should delete to resolve this error. Unfortunately, we have just over 18,000 courses in the system, so deleting all temp files and forcing a new backup of all courses is truly a very last resort (takes about 72 hours to complete a backup of all courses).

Thanks for the plugin recommendation, I will check that out. Most of my work is done from the command line because of how large our site is - I can't even load the course backup log from the web GUI because of how many courses we have.

Thanks so much again for your response.

In reply to Bobby Siegfried

Re: Manually clearing automated backup status of RUNNING

by Ken Task -
Picture of Particularly helpful Moodlers

Sounds like site is in dire need of cleanup.
Skipped courses could be no change ... depending upon settings of auto.
OR course hasn't been used or even invisible.

And because you can't call up that log via GUI, sounds like
you are in need of some cli scripts to see what's up.

I put mine in code/admin/cli/ and they are bash shell root access only.
one called 'getbackups' shared here below

mysql -u $mddbuser -p$mddbpass -e "use $mddbname;select id,filename,filearea,timecreated,contenthash from mdl_files where filename like '%.mbz';" > backups.txt;cat backups.txt
wc -l backups.txt

I sometimes (when troubleshooting) copy the .txt files out to a web accessible directory
restricted to IP address for viewing with browser.

In your moodledata/temp/backup/ successful backups leave a 0 byte .log file.
Any failed/probleamtic backups leave a larger than 0 byte .log file.
Find the largest non-zero .log file and cat it/grep it for clues.

Any contenthash named directories not active ... ie something being written to them ... are junk.  Worth checking what's in them for sometimes the fail is at the very end where the backup.mbz file has already been created and is a valid backup but system couldn't **copy** it to destination directory.  (which is why I use designated directories for auto).

Think you also need to install moosh - command line only tool.

It has some commands you might be able to use in relation to issue.

Like moosh -n course-info CID#

renders in text something like:

Course ID: 2
No of contexts: 4
Context by level:
    CONTEXT_MODULE (70):    4
Context by module:
    page (15):    2
    resource (17):    1
    url (20):    1
Number of role capability overwrites: 0
Role capability overwrites by context:
Number of local role assignments: 0
Locally assigned roles by context:
Number of enrolled users: 0
Users enrolled by role:
Number of groups: 0
Group statistics:
    Min number of members in a group:    0
    Max number of members in a group:    0
    Avg number of members in a group:    0
Course modinfo size: 10400
Number of sections: 2
Section statistics:
    Sections visible:    2
    Sections hidden:    0
    Min number of modules in a section:    0
    Max number of modules in a section:    4
    Avg number of modules in a section:    2
Number of grades: 0
Number of log entries: 121
Number of files: 1
Total file size: 61953
Cache build time: 0.016153812408447
===============================================

There is a moosh command for backup ... course-backup.

https://moosh-online.com/commands/

On one troubled site couldn't run autobackups.

Instead have a series of bash shell scripts looping through course ID numbers ( grouped by small courses, medium size courses, and mega large courses) saving to mounted data drive with tons of room - breaks apart the backups where there is a better chance they will finish!

Think Alain Raap in these forums found a way to offload auto backups of his large site to another server.  See if you can find Alain Raap postings in here!

Alain?  You still here?  Rattling your tree here to help! smile

'SoS', Ken


Average of ratings: Useful (2)
In reply to Ken Task

Re: Manually clearing automated backup status of RUNNING

by Bobby Siegfried -
Hi Ken,

This is fantastic information, thank you!

Just a little background that I should have included previously. We are running 3.7 first off. We have a lot of courses in the system because our data retention policy requires us to keep two years worth of courses in the production system. This is one of the reasons I do just about everything from the command line. Great suggestion about moosh as well! We use that all the time and it's amazing.

Thanks for the suggestion to check out moodledata/temp/backup - you were spot on. There are 3 course backups that were partially complete in that folder. Only one actually contains a backup.mbz folder, the other two never made it that far even. I'm wondering, if I delete those directories, will I be able to run the automated backups scheduled task successfully? Or is there still some record in the db that I need to search for and potentially update? Thoughts? Also an FYI, we run our automated backups and cron jobs on a separate server from the server that serves web traffic.

Thanks again for your time!
In reply to Bobby Siegfried

Re: Manually clearing automated backup status of RUNNING

by Ken Task -
Picture of Particularly helpful Moodlers

Hmmm ... the more we talk ... the more we hear about your setup! ;)

Cleaning up in moodledata/temp/backup/ is normally handled by cron job.  If you found a backup.mbz file in one of those contenthashed named directories it was a valid backup but the process couldn't *copy* it to destination as per settings for automated backups.   That .mbz file you probably couldn't cp either as there are limits on size of file to copy ... one could have 'mv'd it, however.

To get a handle when courses are next to backup ...

mysql> select * from mdl_backup_courses;

Your query above probably needs cli > to a txt file since you've got soooo many courses, but output looks like:

+----+----------+---------------+-------------+------------+---------------+
| id | courseid | laststarttime | lastendtime | laststatus | nextstarttime |
+----+----------+---------------+-------------+------------+---------------+
|  1 |        1 |    1559152201 |  1559152202 | 3          |    1583087400 |

Am guessing if one changed the 'laststatus' to some value other than 3 ... on server am looking at 3 is success, I guess. and changing the epoch time under nextstarttime to something in near future where could use something like:

watch  'ls -ltR /path/to/moodledata/temp/backup/[somecontenthash]'

of some in-cantation of ls.

Something about the courses that is causing so have you looked at them?

'SoS', Ken

Average of ratings: Useful (1)
In reply to Bobby Siegfried

Re: Manually clearing automated backup status of RUNNING

by Ken Task -
Picture of Particularly helpful Moodlers

Guess Alain Raap that solved similar problem busy ... soooo ...

" I updated the table with laststatus = 1, start/endtime with a valid timestamp (endtime after starttime) and nextstarttime with the timestamp for the next nightrun"

from:
https://moodle.org/mod/forum/discuss.php?d=383012#p1544803

'SoS', Ken

In reply to Bobby Siegfried

Re: Manually clearing automated backup status of RUNNING

by Ken Task -
Picture of Particularly helpful Moodlers

Thought about ...

"We have a lot of courses in the system because our data retention policy requires us to keep two years worth of courses in the production system"

Suggest that's a choice to keep on production server.  Don't think even GDPR dictates such courses need to be on XYZ server.  College on West Coast USA, for example, has a new Moodle every year ... what ever is latest/greatest ... and that's the active/currently used Moodle.  Something like mdl2020.tld.  Last years moodle is therefore retained automagically and only admins/teachers can access - it's an archive.  Maybe students too but in a kiosk kinda mode - can see but can't interact with old courses, etc.  Think they keep 5 such servers ... so a 2018, 2017, and a 2016 are also online.

You could do something similar ... all courses not currently active could be xferred to a 'mdlarchive.tld' box thus reducing autobackup processing.  The mdlarchive.tld box wouldn't need daily autobackup ... maybe just one run to have a course backup.  That server set to 'nochange' and 'nomail' just in case.

'spirit of sharing', Ken


In reply to Bobby Siegfried

Re: Manually clearing automated backup status of RUNNING

by Bobby Siegfried -
I'm happy to share about our setup! Our Moodle is connected to our CRM so courses/people/enrollments are all created/updated in real-time. We have an process where we archive courses after the active semester but having them in the same system makes it easy for us to make them available to students when requested and for faculty for a least a year and a half so they can reference and import content from a previous course. That happens a lot and the import is a better option for us for a number of reasons. That being said, of the 18,000+ courses in the system, automated backups only backs up changes which is less than 2,000 courses during a run. That's pretty manageable and works well. But my hesitation for resetting a backup table is simply because I don't want to trigger the system to run fresh backups for every course in the system.

So I took a look at the mdl_backup_courses table and the timestamps were actually all in order, but the laststatus was 0 (which indicates an error). I could manually change that value, but I was unsure if that would actually trigger the backup to restart. I say that because I did some additional research into the mdl_backup_controllers table and found those hashed directory names of each of the failed backups that I cleared from moodledata/temp/backup and each entry had a status of 800 (backup executing).

I'm thinking that if I change the status of 800 to 700 or simply delete those three records from the mdl_backup_controllers table, the process should run successfully again. What do you think?

Also simple little command I was using to show only non-.log files in moodledate/temp/backup: ls -lI *.log (that's a capital 'i' after the -l)
Here is the source I was reading to find more about the status codes: https://github.com/moodle/moodle/blob/MOODLE_37_STABLE/backup/backup.class.php
I'm sure you are aware of all of those, but it's always good to share.

I did check the courses with the bad statuses and they are all average courses, no strange config, not very large. I think it is all just related to the job being killed administratively in the middle of the task running and those courses just happened to be caught in the middle of a backup.
In reply to Bobby Siegfried

Re: Manually clearing automated backup status of RUNNING

by Ken Task -
Picture of Particularly helpful Moodlers

So much for suggesting and thinking out loud! smile

"I'm thinking that if I change the status of 800 to 700 or simply delete those three records from the mdl_backup_controllers table, the process should run successfully again. What do you think?"

Don't think that will do anything.   I have seen many course backups reach status of 900 or 1000 to finish successfully.   Think that table isn't really controlling anything but simply reporting.

The way I thought moodle cron/task worked ... if a task begun, a lock is placed on not only it but all things related to that task so that no other, what might conflicting process, can until it's finished.

Since moodle uses moodledata/temp/backup/ as the area to build resources for the building of a .mbz file there is one task for cleaning up temp file areas of which temp/backup/ would be one.

Since you say it takes a day or more to finish autobackups, can you set anything seen in task for cleaning up files/tables that you could set to run @ a time just a little after autobackups?

Did you read Alian's thread?  Thought he described how he offloaded autobackups to another server which wasn't a full clone of production site's code, but just enough to run the cron/task.

'SoS', Ken

In reply to Ken Task

Re: Manually clearing automated backup status of RUNNING

by Bobby Siegfried -
Got it working! Here's what I did last. Thank you for your suggestion of going back to the mdl_backup_courses table. That ended up being the right place. Alian's post was really helpful as well. I'm thinking of writing up some documentation for Moodle on this subject. The automated backup task feels like one of the most important tasks in the system, but the process is quite the mystery I'm finding! Below I'll share everything I did, even though it will recap some of our previous posts in hopes it helps someone else down the line.

So in full, when I first encountered the issue, one of the first things I did was run the automated backup cleanup task. It had run with the cron run, but I ran it again manually just to be extra sure. It completed successfully both times but that didn't resolve the issue.

Upon your suggestion, I took a look at moodledata/temp/backup/ to see if there were any unfinished backups (hashed directories that were not 0 byte .log files) and found 3. I removed those three directories. This did not resolve the issue.

Then I took a look in the mdl_backup_controllers table because I had seen some posts where it seemed like this table was more than just a reporting table - however as you said it does appear to be simply a reporting table. Nevertheless, I found some helpful information in the Moodle backup class code that actually describes the "status" codes well (Note we are running 3.7.x at this time so that's what this code is). I removed the 3 entries for the failed courses (had a status of 800 which is "currently executing") in case that was causing the backup task to fail - again not the case.

Finally, upon your suggestion and Alian's post that you shared, I located the entries in the mdl_backup_courses table for each of the 3 problem courses and found only two of them had "laststatus" set to 0, which I learned from another post of yours, Ken, means that the backup is "unfinished". More details on those status codes I found in your post and in the Moodle code: https://github.com/moodle/moodle/blob/MOODLE_37_STABLE/backup/util/helper/backup_cron_helper.class.php. I changed the status of both to 1 (status OK), but I did not change the "nextstarttime" at that point because I wanted to see what simply changing the laststatus value did. After updating the "laststatus" value, the automated backup task ran and completed successfully. Interestingly enough, the problem courses both reported a status of Skipping due to neither course being modified in the past 30 days (which is expected since these courses are both over a year old and not actively changing). After the first run of the automated backup task, the nextstarttime value was set to Saturday at 3:50am which is when the next scheduled run of automated backups is set for. I ran the task manually one more time and it completed successfully again, no more "error_setting_already_set". So it looks like the table mdl_backup_courses is the one to look at and potentially make changes to.

I'll update again after the task runs at its scheduled time on Saturday beginning at 3:50am and see if it runs into any further issues. My theory is that everything should be good at this point. Statuses in the related db tables are all good and the temp directories of the failed backup attempts have been removed.

Thank you Ken for some great insight here, your assistance has been tremendous.
Average of ratings: Useful (4)
In reply to Bobby Siegfried

Re: Manually clearing automated backup status of RUNNING

by Ken Task -
Picture of Particularly helpful Moodlers

Thanks for your kind words.   If I could rate your posting higher than +1 (useful) it would be +20! smile

'SoS', Ken

In reply to Ken Task

Re: Manually clearing automated backup status of RUNNING

by Bobby Siegfried -
I should also add we are running 3.7 and all plugins are up to date currently. smile