Hi,
Today the system was running normally, without any particularly heavy load, until 13:02, when I saw in analytics that the number of connected users was dropping rapidly from 900 to fewer than 100. I checked the 4 web nodes and found that the load average was at 700-800% CPU on all of them. All of them had reached the maximum number of Apache workers, but there was plenty of free RAM.
I could not even log in myself; the login page would not load.
I checked that HAProxy was still balancing the incoming requests and forwarding requests to the database. Keepalived was holding the virtual IP that is configured in DNS, and there was connectivity between the web nodes and both the session memcached and the application MUC.
Around 13:20 I restarted everything (keepalived, HAProxy, memcached, Apache) and rebooted the whole controller node (HAProxy / keepalived / memcached), but once everything was back up the situation was the same. Some 20 minutes after the restart, with no explanation, everything returned to normal at around 13:48.
Afterwards I found some unusual entries from that period in the nodes' php_errors logs:
[09-Jun-2020 13:04:59 Europe/Madrid] PHP Warning: unlink(/moodle/moodledatacv/2019_2020/cache/core_component.php): No such file or directory in /moodle/www/moodlecv/2019_2020/lib/classes/component.php on line 295
[09-Jun-2020 13:09:08 Europe/Madrid] PHP Warning: include(): Failed opening '/moodle/moodledatacv/2019_2020/cache/core_component.php' for inclusion (include_path='/moodle/www/moodlecv/2019_2020/lib/pear:.:/usr/share/pear:/usr/share/php') in /moodle/www/moodlecv/2019_2020/lib/classes/component.php on line 259
[09-Jun-2020 13:02:45 Europe/Madrid] PHP Warning: Use of undefined constant CACHE_DISABLE_ALL - assumed 'CACHE_DISABLE_ALL' (this will throw an Error in a future version of PHP) in /moodle/www/moodlecv/2019_2020/lib/classes/component.php on line 254
[09-Jun-2020 13:02:45 Europe/Madrid] PHP Warning: realpath() expects parameter 1 to be a valid path, string given in /moodle/www/moodlecv/2019_2020/lib/classes/component.php on line 640
[09-Jun-2020 13:02:45 Europe/Madrid] PHP Warning: is_dir() expects parameter 1 to be a valid path, string given in /moodle/www/moodlecv/2019_2020/lib/classes/component.php on line 649
In /var/log/messages there are disk messages like these (from the OCFS2 moodledata partition):
Jun 9 13:09:35 moodle2017-n2 kernel: (/usr/sbin/httpd,29666,4):ocfs2_check_dir_for_entry:2058 ERROR: status = -17
Jun 9 13:09:35 moodle2017-n2 kernel: (/usr/sbin/httpd,29666,4):ocfs2_mknod:492 ERROR: status = -17
Jun 9 13:09:35 moodle2017-n2 kernel: (/usr/sbin/httpd,29666,4):ocfs2_create:672 ERROR: status = -17
Jun 9 13:09:42 moodle2017-n2 kernel: (/usr/sbin/httpd,29783,5):ocfs2_check_dir_for_entry:2058 ERROR: status = -17
Jun 9 13:09:42 moodle2017-n2 kernel: (/usr/sbin/httpd,29783,5):ocfs2_mknod:492 ERROR: status = -17
Jun 9 13:09:42 moodle2017-n2 kernel: (/usr/sbin/httpd,29783,5):ocfs2_create:672 ERROR: status = -17
Jun 9 13:11:05 moodle2017-n2 kernel: (/usr/sbin/httpd,29873,1):ocfs2_rename:1666 ERROR: status = -2
Jun 9 13:11:06 moodle2017-n2 kernel: (/usr/sbin/httpd,29664,4):ocfs2_rename:1666 ERROR: status = -2
Jun 9 13:11:06 moodle2017-n2 kernel: (/usr/sbin/httpd,29786,1):ocfs2_rename:1666 ERROR: status = -2
The last messages of this kind are from 13:48, when the problem ended.
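For anyone reading these kernel lines: the negative status values are Linux errno codes, so -17 is EEXIST ("File exists") and -2 is ENOENT ("No such file or directory"). Create calls failing with EEXIST and renames failing with ENOENT might be consistent with several httpd processes touching the same cache files at once, which would also fit the unlink()/include() warnings on core_component.php, but that is only a guess, not a confirmed diagnosis. Below is a minimal sketch for tallying these errors per OCFS2 function across a whole log. It assumes the "(comm,pid,cpu):function:line ERROR: status = N" layout shown above; `decode_status` and `summarize` are hypothetical helper names, not part of any existing tool.

```python
import errno
import os
import re
from collections import Counter

# Assumed layout (as in the lines above):
#   ... kernel: (/usr/sbin/httpd,29666,4):ocfs2_mknod:492 ERROR: status = -17
# Extra spaces around the separators are tolerated, since copied logs often gain them.
OCFS2_RE = re.compile(
    r"\(\s*(?P<comm>[^,]+?)\s*,\s*(?P<pid>\d+)\s*,\s*(?P<cpu>\d+)\s*\)\s*:\s*"
    r"(?P<func>ocfs2_\w+)\s*:\s*(?P<line>\d+)\s+ERROR:\s*status\s*=\s*(?P<status>-?\d+)"
)


def decode_status(status: int) -> str:
    """Hypothetical helper: map a negative kernel status to 'NAME (message)'."""
    code = -status
    name = errno.errorcode.get(code, f"errno {code}")
    return f"{name} ({os.strerror(code)})"


def summarize(lines):
    """Hypothetical helper: count OCFS2 errors per (function, decoded status)."""
    counts = Counter()
    for line in lines:
        match = OCFS2_RE.search(line)
        if match:
            key = (match.group("func"), decode_status(int(match.group("status"))))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # A few lines taken from the excerpt above; real use would pass the log file instead.
    sample = [
        "Jun 9 13:09:35 moodle2017-n2 kernel: (/usr/sbin/httpd,29666,4):ocfs2_mknod:492 ERROR: status = -17",
        "Jun 9 13:09:35 moodle2017-n2 kernel: (/usr/sbin/httpd,29666,4):ocfs2_create:672 ERROR: status = -17",
        "Jun 9 13:11:05 moodle2017-n2 kernel: (/usr/sbin/httpd,29873,1):ocfs2_rename:1666 ERROR: status = -2",
    ]
    for (func, status), count in sorted(summarize(sample).items()):
        print(f"{count:4d}  {func:28s} {status}")
```

Run against the whole file (for example `summarize(open("/var/log/messages"))` on each node), this would show whether the errors were limited to the 13:02-13:48 window or also appear at other times.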
Thanks,