I need to come back to the developers, since I suspect a bug, a race condition in the processing of ad-hoc tasks, specially when they are slow.
The original thread
Has Ghostscript something to do with a half a million lock files? was about 500 k lock files which were created a long time ago, June/July 2021. After we deleted them on 23 Nov they didn't reappear. No problem so far.
But then on the 26 Nov morning the CPU load average took a steep uphill, passed 20 (in a
server with 40 cpu threads) and the Moodle notices page showed cron.php not run a whole day - although it was set to run, and was running, every 2 minutes.
Server Time: Fri, 26 Nov 2021 08:54:00 +0530
Execute scheduled task: Background processing for caches (core\task\cache_cron_task)
... started 08:54:00. Current memory use 18.9MB.
Cleaning up stale session data from cache stores.
... used 0 dbqueries
... used 0.009213924407959 seconds
Scheduled task complete: Background processing for caches (core\task\cache_cron_task)
Execute scheduled task: Automated backups (core\task\automated_backup_task)
... started 08:54:00. Current memory use 19.8MB.
Checking automated backup status...INACTIVE
... used 0 dbqueries
... used 0.20417094230652 seconds
Scheduled task complete: Automated backups (core\task\automated_backup_task)
[many more lines]
Execute scheduled task: Background processing for scheduled allocation (workshopallocation_scheduled\task\cron_task)
... started 08:54:01. Current memory use 41.7MB.
... no workshops awaiting scheduled allocation. ... used 1 dbqueries
... used 0.0008080005645752 seconds
Scheduled task complete: Background processing for scheduled allocation (workshopallocation_scheduled\task\cron_task)
Ran 0 adhoc tasks found at Fri, 26 Nov 2021 08:54:00 +0530
Cron script completed correctly
Cron completed at 08:54:01. Memory used 41.7MB.
Execution took 0.987506 seconds
--> Now here there is a huge binary blob!!!
--> followed by
Convert 1 submission attempt(s) for assignment 808
Convert 1 submission attempt(s) for assignment 808
Convert 1 submission attempt(s) for assignment 808
Convert 1 submission attempt(s) for assignment 808
[repeated close to 100 times]
... used 83686 dbqueries
... used 4059.3367171288 seconds
Scheduled task complete: Prepare submissions for annotation (assignfeedback_editpdf\task\convert_submissions)
Stopping processing of scheduled tasks as time limit has been reached.
Ran 0 adhoc tasks found at Fri, 26 Nov 2021 08:30:00 +0530
Cron script completed correctly
Cron completed at 09:37:50. Memory used 1.2GB.
Execution took 4070.286554 seconds
===
Notice that this log file started at 08:54:00 and reports a task which was aborted at 09:37:50 after consuming 68 minutes (4070 seconds). Which means that the process must have started at 08:30:00!
What does that mean? Was there a race condition? The binary data in the log file are typical for a lost or confused file descriptors. Is it possible that the same thing happened in June/July and created that sea of lock files? Or, the deleted lock files broke the logic of the Lock API.
This is an important production server of an academic institution which has become "strategically important" during the pandemic! It is not a server I maintain, just a trouble shooter.