disk or moodle component issues?

by Ángel Ayllón -

Hi,

Today the system was running well, without any particularly heavy load, until 13:02, when I saw in the analytics that the number of connected users was falling rapidly from 900 to fewer than 100. I went to the 4 web nodes and found that the load average was 700-800% of CPU on all of them. All of them had reached the maximum number of Apache workers, but there was plenty of free RAM.

I could not even log in myself; the login page would not load.

I checked that HAProxy was still balancing the requests that reached it and that requests were getting through to the database. Keepalived held the virtual IP configured in DNS, and there was connectivity between the web nodes and the memcached instances used for sessions and for the application cache (MUC).

Around 13:20 I restarted everything (keepalived, HAProxy, memcached, Apache) and rebooted the whole controller node (HAProxy / keepalived / memcached), but once everything was back up the situation was the same. Some 20 minutes after the restart, without explanation, everything returned to normal at around 13:48.

Some unusual things I found later in the nodes' php_errors logs from that time:


[09-Jun-2020 13:04:59 Europe/Madrid] PHP Warning: unlink(/moodle/moodledatacv/2019_2020/cache/core_component.php): No such file or directory in /moodle/www/moodlecv/2019_2020/lib/classes/component.php on line 295

[09-Jun-2020 13:09:08 Europe/Madrid] PHP Warning: include(): Failed opening '/moodle/moodledatacv/2019_2020/cache/core_component.php' for inclusion (include_path='/moodle/www/moodlecv/2019_2020/lib/pear:.:/usr/share/pear:/usr/share/php') in /moodle/www/moodlecv/2019_2020/lib/classes/component.php on line 259

[09-Jun-2020 13:02:45 Europe/Madrid] PHP Warning: Use of undefined constant CACHE_DISABLE_ALL - assumed 'CACHE_DISABLE_ALL' (this will throw an Error in a future version of PHP) in /moodle/www/moodlecv/2019_2020/lib/classes/component.php on line 254

[09-Jun-2020 13:02:45 Europe/Madrid] PHP Warning: realpath() expects parameter 1 to be a valid path, string given in /moodle/www/moodlecv/2019_2020/lib/classes/component.php on line 640

[09-Jun-2020 13:02:45 Europe/Madrid] PHP Warning: is_dir() expects parameter 1 to be a valid path, string given in /moodle/www/moodlecv/2019_2020/lib/classes/component.php on line 649


In /var/log/messages there are disk messages like these (the moodledata partition is on OCFS2):


Jun 9 13:09:35 moodle2017-n2 kernel: (/usr/sbin/httpd,29666,4): ocfs2_check_dir_for_entry: 2058 ERROR: status = -17
Jun 9 13:09:35 moodle2017-n2 kernel: (/usr/sbin/httpd,29666,4): ocfs2_mknod: 492 ERROR: status = -17
Jun 9 13:09:35 moodle2017-n2 kernel: (/usr/sbin/httpd,29666,4): ocfs2_create: 672 ERROR: status = -17
Jun 9 13:09:42 moodle2017-n2 kernel: (/usr/sbin/httpd,29783,5): ocfs2_check_dir_for_entry: 2058 ERROR: status = -17
Jun 9 13:09:42 moodle2017-n2 kernel: (/usr/sbin/httpd,29783,5): ocfs2_mknod: 492 ERROR: status = -17
Jun 9 13:09:42 moodle2017-n2 kernel: (/usr/sbin/httpd,29783,5): ocfs2_create: 672 ERROR: status = -17
Jun 9 13:11:05 moodle2017-n2 kernel: (/usr/sbin/httpd,29873,1): ocfs2_rename: 1666 ERROR: status = -2
Jun 9 13:11:06 moodle2017-n2 kernel: (/usr/sbin/httpd,29664,4): ocfs2_rename: 1666 ERROR: status = -2
Jun 9 13:11:06 moodle2017-n2 kernel: (/usr/sbin/httpd,29786,1): ocfs2_rename: 1666 ERROR: status = -2

The last messages of this kind are from 13:48, when the problem ended.
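
For reference, the status values in these kernel messages are negated Linux errno codes: -17 is EEXIST ("File exists") and -2 is ENOENT ("No such file or directory"). The following is only a minimal sketch, assuming standard Linux errno numbering, for tallying such lines by function and error name; the `summarize` helper and the regular expression are illustrative and not part of any OCFS2 or Moodle tooling.

```python
import errno
import re
from collections import Counter

# Matches the "ocfs2_function: line ERROR: status = -N" tail of a kernel message.
PATTERN = re.compile(r"(ocfs2_\w+):\s*\d+\s+ERROR:\s+status\s*=\s*-(\d+)")


def summarize(lines):
    """Count (function, errno name) pairs found in kernel log lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            func, code = match.group(1), int(match.group(2))
            # Fall back to the raw number if the code is unknown on this platform.
            counts[(func, errno.errorcode.get(code, str(code)))] += 1
    return counts


sample = [
    "Jun 9 13:09:35 moodle2017-n2 kernel: (/usr/sbin/httpd,29666,4): ocfs2_create: 672 ERROR: status = -17",
    "Jun 9 13:11:05 moodle2017-n2 kernel: (/usr/sbin/httpd,29873,1): ocfs2_rename: 1666 ERROR: status = -2",
]

for (func, name), count in summarize(sample).items():
    print(func, name, count)
```

Read this way, the create calls failed because the target already existed and the renames failed because the source was already gone, which may point to several httpd processes or nodes working on the same file at once, though that is an interpretation rather than something the logs prove.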

Thanks,

In reply to Ángel Ayllón

Re: disk or moodle component issues?

by Howard Miller -
OCFS2? You may well be on your own there...

Might this be some sort of file locking issue?
In reply to Ángel Ayllón

Re: disk or moodle component issues?

by Andrew Lyons -
Cor... ocfs2. I tried that many years ago but found that it was far too slow and unstable at the time, and we reverted to ZFS. That would have been over 10 years ago now.

It does look like an infrastructure issue, especially given that it resolved itself on all nodes at once without explanation. My guess would be that the ocfs2 service was doing something, or that something caused it to become extremely unresponsive. I imagine that the 700-800% load average would have been predominantly I/O. It _could_ be a DoS attack, but in that case you would also see obvious signs in your haproxy/httpd logs.
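
On Linux the load average also counts tasks stuck in uninterruptible (usually disk) sleep, so a stalled filesystem can push it up without the CPUs being busy. One rough way to check whether load is mostly I/O is to compare two samples of the aggregate `cpu` line in /proc/stat. This is only a sketch under that assumption; the helper names `cpu_times` and `iowait_share` are made up for illustration.

```python
def cpu_times(stat_text):
    """Return the first eight counters of the aggregate 'cpu' line in /proc/stat.

    Field order: user, nice, system, idle, iowait, irq, softirq, steal.
    """
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:9]]
    raise ValueError("no aggregate cpu line found")


def iowait_share(before, after):
    """Fraction of CPU time spent in iowait between two /proc/stat samples."""
    deltas = [b - a for a, b in zip(cpu_times(before), cpu_times(after))]
    total = sum(deltas)
    return deltas[4] / total if total else 0.0


# Usage on a web node (Linux only), sampling five seconds apart:
#   import time
#   first = open("/proc/stat").read()
#   time.sleep(5)
#   second = open("/proc/stat").read()
#   print(f"iowait share: {iowait_share(first, second):.0%}")
```

A high iowait share during an incident like this one would support the disk theory; a low one would suggest looking elsewhere, for example at the haproxy/httpd logs.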

There is one bug in there which we should address in Moodle:

[09-Jun-2020 13:02:45 Europe/Madrid] PHP Warning: Use of undefined constant CACHE_DISABLE_ALL - assumed 'CACHE_DISABLE_ALL' (this will throw an Error in a future version of PHP) in /moodle/www/moodlecv/2019_2020/lib/classes/component.php on line 254

As I say, this looks very much like a "disk went away" infrastructure issue and not an application issue.

Andrew