Moodle is down with error messages

by Aly Kong -

We run Moodle 3.10 on CentOS 7.4 with Apache 2.4.29 and PHP 7.2.16, using the event MPM and php-fpm.

We set up a cron job to restart httpd at midnight every day.

Recently, Moodle has been going down several hours after the daily httpd restart, and we cannot even log in to CentOS via SSH. We can do nothing except power off the VM and reboot it.

There are a lot of error messages showing "AH03490: scoreboard is full, not at MaxRequestWorkers.Increase ServerLimit." In another case, the Apache log recorded a lot of error messages: "AH01067: Failed to read FastCGI header" and "AH01075: Error dispatching request to :"

Does anyone know the cause of this problem? Please advise. Thanks a lot!


In reply to Aly Kong

Re: Moodle is down with error messages

by Ken Task -

Comment: it sounds like you were having issues and decided to run a cron job to fix them. Only, the decision to restart the web service every night didn't fix them.
It's very rare that one cannot ssh into a CentOS 7 server even if all other services are having issues.

Is this an all in one server?   Apache/MySQL/Moodle on one box?
Specs?   Memory/Disk etc.
What does the top of top look like?

Think I'd advise removing that cron job and digging into the problems.
For one, reduce what apache mods you load.  If a module isn't really necessary for a moodle don't use it.

Install Apache2Buddy ...
https://github.com/richardforth/apache2buddy.git
It's a Perl script and, like MySQLTuner, it looks at the Apache config and makes recommendations based upon usage ... rebooting clears usage, so you'd want a longer period of usage to get a better picture of what is going on with the server.

Also, assume you are keeping the OS up to date ... just 2 days ago:
Jan 28 06:24:58 Updated: httpd-tools-2.4.6-98.el7.centos.6.x86_64
Jan 28 06:24:59 Updated: httpd-2.4.6-98.el7.centos.6.x86_64
Jan 28 06:25:08 Updated: httpd-devel-2.4.6-98.el7.centos.6.x86_64
there was an update to Apache!

'SoS', Ken


In reply to Ken Task

Re: Moodle is down with error messages

by Aly Kong -
Yes, this is an all-in-one server. MySQL is on a different virtual disk but runs in the same CentOS VM as Apache and Moodle. Splitting the database from Apache is not an easy task for us, although we know the advantages. I am also worried that access to a database mounted from another VM over NFS would be slow.

The server has 133GB of physical RAM in total, with 121GB free. The Moodle folder is on the sda virtual disk, which is 36GB in size with 32GB already used. I wonder if this could cause a problem.

I have run Apache2Buddy and mysqltuner. mysqltuner recommends optimizing the tables to free 645MB of space. Overall, the results look normal.

ServerLimit is 32 and MaxRequestWorkers is 800. As the incidents do not happen at peak hours, I estimate the number of Apache requests in progress should not exceed 200, so I can't understand why the error message "AH03490: scoreboard is full..." would appear. I checked server-status and found only 9~10 symbols, including "W" and "R", when Moodle is running normally. I may have to check it again when the incident happens.
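
A rough sanity check on those two values, as a minimal sketch only. It relies on the event MPM rule that ServerLimit must be at least MaxRequestWorkers divided by ThreadsPerChild. ThreadsPerChild is not stated anywhere in this thread, so the Apache default of 25 is used here purely as an assumption, and the function names are made up for illustration.

```python
import math

# Values quoted in the post above.
SERVER_LIMIT = 32
MAX_REQUEST_WORKERS = 800
# Assumption: ThreadsPerChild is not given in the thread; 25 is Apache's default.
THREADS_PER_CHILD = 25


def min_server_limit(max_request_workers: int, threads_per_child: int) -> int:
    """Fewest child processes needed to host max_request_workers threads."""
    return math.ceil(max_request_workers / threads_per_child)


def spare_process_slots(server_limit: int, max_request_workers: int,
                        threads_per_child: int) -> int:
    """Process slots left over for children that are still shutting down."""
    return server_limit - min_server_limit(max_request_workers, threads_per_child)


needed = min_server_limit(MAX_REQUEST_WORKERS, THREADS_PER_CHILD)
spare = spare_process_slots(SERVER_LIMIT, MAX_REQUEST_WORKERS, THREADS_PER_CHILD)
print(f"processes needed: {needed}, spare slots: {spare}")
```

If ThreadsPerChild really is 25, there would be no spare process slots at all. Children that are still finishing old requests after a graceful restart keep their scoreboard slots, so the scoreboard could fill up long before 800 requests are in flight. Whether that is what happened here would need confirming from server-status during an incident.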
In reply to Aly Kong

Re: Moodle is down with error messages

by Aly Kong -
My colleague said he saw "Database is overload" in the Moodle web GUI when the incident happened.

Below is part of the mysqltuner result.

[!!] Total fragmented tables: 3

-------- Analysis Performance Metrics --------------------------------------------------------------
[--] innodb_stats_on_metadata: OFF
[OK] No stat updates during querying INFORMATION_SCHEMA.

-------- Security Recommendations ------------------------------------------------------------------
[OK] There are no anonymous accounts for any database users
[!!] failed to execute: SELECT CONCAT(user, '@', host) FROM mysql.user WHERE (IF(plugin='mysql_native_password', authentication_string, password) = '' OR IF(plugin='mysql_native_password', authentication_string, password) IS NULL) AND plugin NOT IN ('unix_socket', 'win_socket', 'auth_pam_compat')
[!!] FAIL Execute SQL / return code: 256
[OK] All database users have passwords assigned
[!!] failed to execute: SELECT CONCAT(user, '@', host) FROM mysql.user WHERE CAST(IF(plugin='mysql_native_password', authentication_string, password) as Binary) = PASSWORD(user) OR CAST(IF(plugin='mysql_native_password', authentication_string, password) as Binary) = PASSWORD(UPPER(user)) OR CAST(IF(plugin='mysql_native_password', authentication_string, password) as Binary) = PASSWORD(CONCAT(UPPER(LEFT(User, 1)), SUBSTRING(User, 2, LENGTH(User))))
[!!] FAIL Execute SQL / return code: 256
[!!] User 'root@%' does not specify hostname restrictions.
[!!] There is no basic password file list!

-------- MyISAM Metrics ----------------------------------------------------------------------------
[!!] Key buffer used: 18.2% (3M used / 16M cache)
[OK] Key buffer size / total MyISAM indexes: 16.0M/45.0K
[!!] Read Key buffer hit rate: 50.0% (6 cached / 3 reads)

-------- InnoDB Metrics ----------------------------------------------------------------------------
[--] InnoDB is enabled.
[--] InnoDB Thread Concurrency: 0
[OK] InnoDB File per table is activated
[OK] InnoDB buffer pool / data size: 28.0G/11.3G
[OK] Ratio InnoDB log file size / InnoDB Buffer pool size: 3.4G * 2/28.0G should be equal to 25%
[OK] InnoDB buffer pool instances: 28
[--] Number of InnoDB Buffer Pool Chunk : 224 for 28 Buffer Pool Instance(s)
[OK] Innodb_buffer_pool_size aligned with Innodb_buffer_pool_chunk_size & Innodb_buffer_pool_instances
[OK] InnoDB Read buffer efficiency: 99.98% (1996259851 hits/ 1996626792 total)
[!!] InnoDB Write Log efficiency: 45.67% (889941 hits/ 1948635 total)
[OK] InnoDB log waits: 0.00% (0 waits / 1058694 writes)

General recommendations:
Control warning line(s) into /var/log/mariadb/mariadb.log file
Control error line(s) into /var/log/mariadb/mariadb.log file
Run OPTIMIZE TABLE to defragment tables for better performance
OPTIMIZE TABLE `moodle`.`mdl_files`; -- can free 61 MB
OPTIMIZE TABLE `moodle`.`mdl_notifications`; -- can free 480 MB
OPTIMIZE TABLE `moodle`.`mdl_question_attempts`; -- can free 104 MB
Total freed space after theses OPTIMIZE TABLE : 645 Mb
Restrict Host for user@% to user@SpecificDNSorIp
Adjust your join queries to always utilize indexes
Variables to adjust:
join_buffer_size (> 320.0K, or always use indexes with JOINs)

Please advise whether the MySQL database is the cause of the problem. Thanks a lot.
In reply to Aly Kong

Re: Moodle is down with error messages

by Ken Task -

With 133GB physical RAM you should not be having memory issues!

MySQLtuner
Do optimize tables!   It's not really the size/space regained, it's *speed*!

Databases run best when you can fit *all* of it in memory.
See buffer pool/data size and buffer pool instances in the settings.

Do I think there is a database issue ..... well, yes!

[!!] FAIL Execute SQL / return code: 256

Needs investigating, don't you think?

scoreboard - never found that useful.

How many other little things do you have running that take away from processing DB for Moodle or serving out a client browser request in the Moodle?

All 'little things' add up ... but this is the issue:

Recently, Moodle has been going down several hours after the daily httpd restart, and we cannot even log in to CentOS via SSH. We can do nothing except power off the VM and reboot it.

Even if DB/Apache are struggling you should be able to ssh into the server!!!!!

What is the VM Server?   VMWare, VirtualBox, ??????   Is the moodle server the *only* guest OS on that VM Server?   Fact that you can't reach the Moodle until you restart the VM, does sound like a VM issue to me!

'SoS', Ken


In reply to Ken Task

Re: Moodle is down with error messages

by Aly Kong -
I realize that we recently deleted the accounts of 1600 graduated students. This may be why the tables need defragmenting.

[OK] InnoDB buffer pool / data size: 28.0G/11.3G
[OK] Ratio InnoDB log file size / InnoDB Buffer pool size: 3.4G * 2/28.0G should be equal to 25%
[OK] InnoDB buffer pool instances: 28
So the whole database should already fit in memory.

We also suspect that the ESET antivirus may be causing problems by putting a heavy workload on the database.

It is strange that we can't log in to the Moodle server even at the VMware console.

The VM server is VMware ESXi. The Moodle server is the only guest OS on that server.
In reply to Aly Kong

Re: Moodle is down with error messages

by Aly Kong -
We also found a lot of error messages like the ones below in the messages log when the incident happened.
moodle kernel: eset_rtp(ertp_wait_for_result): wait fro scanner timeout, id: -l, path: /run/systemd/system/session-......
moodle kernel: eset_rtp(ertp_wait_for_result): wait fro scanner timeout, id: -l, path: /tmp/MY0lu9nz (deleted), size: ....
moodle kernel: eset_rtp(ertp_wait_for_result): wait fro scanner timeout, id: -l /mnt/sdb1/moodledata/sessions/sess_gr8og497....

That's why we think the ESET antivirus is also problematic.
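
To see which parts of the filesystem the scanner is timing out on, the log lines can be tallied by top-level directory. This is only a sketch: the sample lines are the three posted above, truncated exactly as posted, and the regular expression and function names are made up for illustration. It assumes the first absolute path after the eset_rtp(...) prefix is the scanned file.

```python
import re
from collections import Counter

# Lines copied from the post above, truncated exactly as posted.
SAMPLE_LOG = """\
moodle kernel: eset_rtp(ertp_wait_for_result): wait fro scanner timeout, id: -l, path: /run/systemd/system/session-......
moodle kernel: eset_rtp(ertp_wait_for_result): wait fro scanner timeout, id: -l, path: /tmp/MY0lu9nz (deleted), size: ....
moodle kernel: eset_rtp(ertp_wait_for_result): wait fro scanner timeout, id: -l /mnt/sdb1/moodledata/sessions/sess_gr8og497....
"""

# First absolute path that follows the eset_rtp(...) prefix on a line.
PATH_RE = re.compile(r"eset_rtp\([^)]*\):.*?(/\S+)")


def top_level_dir(path: str) -> str:
    """Return the first path component, e.g. '/mnt' for '/mnt/sdb1/x'."""
    return "/" + path.split("/")[1]


def timeouts_by_top_dir(log_text: str) -> Counter:
    """Count scanner-timeout lines per top-level directory."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PATH_RE.search(line)
        if match:
            counts[top_level_dir(match.group(1))] += 1
    return counts


for directory, count in timeouts_by_top_dir(SAMPLE_LOG).most_common():
    print(directory, count)
```

Run against the full messages log instead of the three sample lines, a tally like this might show whether the timeouts cluster on session files, temporary files, or something else.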
In reply to Aly Kong

Re: Moodle is down with error messages

by Ken Task -

Yep, ESET AV also!  Scanning a user's session files?

/mnt/sdb1/moodledata/sessions/

Any way to limit that scanning to just moodledata/filedir/?

What does df -h look like?   moodledata on a mounted device/point?

Happen to admin a K12 moodle on a VMWare box - just the guest OS (CentOS 7). Don't know version/flavor of VMWare, but the moodledata on that server is in a /home on  /dev/mapper/centos-home  2.0T  1.1T  921G  54% /home

OS + DB server and DB's + Apache on /dev/mapper/centos-root   50G   15G   36G  29% /

BenchMark Plugin reports a 72 last time I ran it - just the other day.  Has a 3.9.highest for production and a 4.1.1 sandbox they are exploring.

but only 6874 users and 243 courses - they used to have all campuses on it but now only HS.

What's the difference between /mnt/sdb1/ and a /dev/mapper/ in VMWare lingo?

'SoS', Ken


In reply to Ken Task

Re: Moodle is down with error messages

by Aly Kong -

Below is the result of df -h:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        37G   33G  4.2G  89% /
/dev/sdc1       197G   58G  130G  31% /mnt/sdc1
/dev/sdb1       2.5T  465G  1.9T  20% /mnt/sdb1

sda2 is moodle folder, sdc1 is database, sdb1 is moodledata
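
As a small illustration of reading that output, the sketch below flags any filesystem at or above a warning level. The 85% threshold and the function name are assumptions chosen for the example, not a Moodle or CentOS rule, and the parser assumes mount points contain no spaces.

```python
# Output copied from the post above.
DF_OUTPUT = """\
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        37G   33G  4.2G  89% /
/dev/sdc1       197G   58G  130G  31% /mnt/sdc1
/dev/sdb1       2.5T  465G  1.9T  20% /mnt/sdb1
"""

# Assumption: 85% is an arbitrary warning level picked for this example.
WARN_PERCENT = 85


def nearly_full(df_text, warn_percent=WARN_PERCENT):
    """Return (mount point, use %) for filesystems at or above warn_percent."""
    rows = [line.split() for line in df_text.splitlines() if line.strip()]
    flagged = []
    for fields in rows[1:]:  # rows[0] is the header line
        use_percent = int(fields[4].rstrip("%"))
        if use_percent >= warn_percent:
            flagged.append((fields[5], use_percent))
    return flagged


print(nearly_full(DF_OUTPUT))
```

With the numbers posted, only the root filesystem is flagged. On an all-in-one server that is also where logs and /tmp usually live, so it may be worth watching.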

VMware is ESXi 6.7

ESET antivirus is scheduled to scan all files under the root directory tree /. Is that a problem?

Our benchmark reports 50 points.

Our moodle has about 3500 courses and 5000 users.

Please share some sources of online information about fixing Log4j. Thanks a lot.


In reply to Aly Kong

Re: Moodle is down with error messages

by Ken Task -

Uhhh ... you need to sign up for CentOS list:
https://lists.centos.org/mailman/listinfo/centos-announce
so you can keep up with fixes/patches/etc. for your OS!

As root, run: yum check-update
and let's see what your server says!

Am almost certain VMWare has something similar!

Log4j - if I recall correctly - affected Moodle only if one were using Solr for Moodle Search - and involved Java/SDK.
But there were still some OS fixes/patches for it that should be acquired.  Maybe a Moodle security expert can confirm that ... or clarify.

Also appears you are running MariaDB and not MySQL - while MariaDB was supposed to be a drop-in replacement for MySQL, there are some differences.   What do you get for mysql -V as root?

And you might want to refer to:

http://www.syndrega.ch/blog/#php-and-dbms-compatibility-of-major-moodle-releases

'SoS', Ken

In reply to Aly Kong

Re: Moodle is down with error messages

by Ken Task -

What's total number of users?

What's total number of courses?

Yes, removing 1600 users - and all of their work - would cause table fragmentation.

Optimize the tables.

I take it that issues related to Log4j were addressed ... that could have affected both VMware and CentOS.


In reply to Aly Kong

Re: Moodle is down with error messages

by Mark Johnson -
"AH03490: scoreboard is full, not at MaxRequestWorkers.Increase ServerLimit."

This means that httpd has reached the limit for the number of workers it can use to serve requests at once. This is a configured setting in your httpd config (look at the values of ServerLimit and MaxRequestWorkers), so when a new request comes in, it cannot respond.

This message combined with the fact that you cannot SSH in suggests to me that your server lacks sufficient memory to serve the number of requests it is configured to handle, and at the same time it is receiving more than this number of requests at a time.

One useful way to see this is with mod_status. Once it's enabled (it may already be), visiting http://yoursite/server-status will show the "scoreboard" that it's talking about. You can see the status of each request, and that might help you understand why you are running out of workers. For example, if there are lots of "W"s it may be taking a long time for your site to generate and send a response, so you can investigate why that is. If there are lots of "K"s, it's keeping connections open after the request has been served, blocking additional requests, so you might want to reduce your KeepAlive timeout.
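
For anyone who would rather count the symbols than eyeball them, mod_status also serves a machine-readable variant at /server-status?auto, which includes a "Scoreboard:" line. The sketch below tallies that line. The sample text is hypothetical and made up for illustration, and the function name is an assumption. The state names are the ones shown in the key on the server-status page.

```python
from collections import Counter

# Meaning of each scoreboard character, as listed on the server-status page.
STATE_NAMES = {
    "_": "Waiting for Connection",
    "S": "Starting up",
    "R": "Reading Request",
    "W": "Sending Reply",
    "K": "Keepalive (read)",
    "D": "DNS Lookup",
    "C": "Closing connection",
    "L": "Logging",
    "G": "Gracefully finishing",
    "I": "Idle cleanup of worker",
    ".": "Open slot with no current process",
}

# Hypothetical sample of ?auto output, made up for illustration.
SAMPLE_AUTO = """\
BusyWorkers: 5
IdleWorkers: 3
Scoreboard: WW_K_G_R....
"""


def scoreboard_counts(auto_text: str) -> Counter:
    """Count each worker state on the 'Scoreboard:' line, if present."""
    for line in auto_text.splitlines():
        if line.startswith("Scoreboard:"):
            return Counter(line.split(":", 1)[1].strip())
    return Counter()


for state, count in scoreboard_counts(SAMPLE_AUTO).most_common():
    print(f"{count:4d}  {state}  {STATE_NAMES.get(state, 'unknown')}")
```

Besides "W" and "K", a large number of "G" entries would be worth noting, since workers that are gracefully finishing still occupy scoreboard slots.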

It would also be useful to install some monitoring software that can log metrics on your server, such as CPU, Memory, Disk usage, and the number of requests being served. This will help you understand how the issues are related.
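
Short of a full monitoring package, even a tiny logger run every minute can leave a trail to read after a forced reboot. The sketch below is one possible shape, assuming a Linux host where /proc/meminfo and /proc/loadavg are available. The sample values, timestamp, and function names are hypothetical, and the parsing functions take text as input so they can be tried without touching the server.

```python
# Hypothetical sample contents of /proc/meminfo and /proc/loadavg.
SAMPLE_MEMINFO = """\
MemTotal:       16000000 kB
MemFree:         2000000 kB
MemAvailable:    9000000 kB
"""
SAMPLE_LOADAVG = "1.50 0.75 0.40 2/345 6789\n"


def parse_meminfo(text: str) -> dict:
    """Map each /proc/meminfo key to its numeric value (kB for most keys)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def csv_row(timestamp: str, meminfo_text: str, loadavg_text: str) -> str:
    """Build one 'timestamp,load1,MemAvailable_kB' line for a log file."""
    mem = parse_meminfo(meminfo_text)
    load1 = float(loadavg_text.split()[0])  # 1-minute load average
    return f"{timestamp},{load1},{mem['MemAvailable']}"


print(csv_row("00:05", SAMPLE_MEMINFO, SAMPLE_LOADAVG))
```

On a real server the two text arguments would come from reading the /proc files, and each row would be appended to a file on a disk with free space.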
