Help! Database server CPU maxed out

Help! Database server CPU maxed out

by Ben Kahn -
Number of replies: 14

Hello, I run a self-hosted Moodle at a small University. I took over admin duties last Spring and have spent as much time as I can familiarize myself with the architecture but I did not originally build it out.  We are entering the 5th year of running the same instance. It was originally 3.0 or 3.1, upgraded from 3.2->3.4 last year, then I recently upgraded it from 3.4 -> 3.6 a few weeks ago. We run Linux servers on VMWare: three webservers behind a load balancer, data on a shared NTFS mount, a memcached server,  and a database with a separate read-only slave db.

Everything has been stable but today on the first day of classes the site is running somewhat sluggishly. Looking at metrics the webservers are fine but the database server is pegged at or near 100% since 9am this morning when usage started really ramping up. Looking at htop on the database server I can see there is one MySQL process that is using almost all of the CPU (see screenshot below)

I restarted the MySQL service and have rebooted the VM, hoping that this was a rogue runaway process, but neither has helped. I have some thoughts but am really hoping those in the community with more technical experience might be able to provide some insights.

  • Is there a good way to identify what the MySQL process is? Maybe it's related to a plugin or something specific. I can log-in to the database and do SHOW FULL PROCESSLIST but not sure if I can get more information about that specific process to see what tables it's writing to.
  • I admit I don't know much about memcached - is it possible the upgrade broke caching and this is putting extra strain on the DB? Is there an easy way to tell if caching is working or not? I am using the same config.php file after the upgrade. Is there somewhere else in Moodle GUI I should be looking?
  • Is it possible something else in the upgrade broke that would cause something like this? Anything I should be looking at?
  • Are we crazy for keeping a big Moodle install around for this many years without a fresh install?

Thanks in advance for any advice...if I left out any info that would be useful to know I'm sure I can get it. I've been enjoying learning more about administering Moodle with this community's help over the last year and hope this is a learning moment for me.


Average of ratings: -
In reply to Ben Kahn

Re: Help! Database server CPU maxed out

by Ken Task -
Picture of Particularly helpful Moodlers

A little more background gathered from OP's other postings ...

"5,000-ish University users) Moodle install with a lot of plugins
there is never a time when at least 100-200 users aren't logged in
"

Questions:

Have student numbers increased this year?

DB server on a separate dedicated server ... same server as
memcached server?

MySQL or MariaDB? What version?

Have you ever run MySQLTuner.pl?  Tweaked the config of the DB server?

Same config.php you've mentioned ... I take it there have been tweaks to that.

How many addons do you have and are all compat with 3.6.highest?
Any locally customized/developed plugins?
What theme are you running?
Is cron job set to execute/run every minute?   Anything in cron or task complaining ... failing to run?

Is Autobackup running?

How are these dedicated servers networked?  Load balanced web front end with moodledata (all of it?) on backended NFS share ... private IP on a another segment of network?    Not trying to be nosey ... but networking does come before application and could be a source of bottlenecks.

'SoS', Ken


In reply to Ken Task

Re: Help! Database server CPU maxed out

by Ben Kahn -

Ken, you're a legend...here are the responses I can get out there quickly...

Questions:

Have student numbers increased this year?

Not substantially...and as much as I'd like to, I can't blame increased user count (orange is this year so far vs last year at the beginning of Fall term in blue)

chart showing user count in moodle

DB server on a separate dedicated server ... same server as memcached server?

 Everything is on its own dedicated server each for moodle mysql1 (main) for mysql2 (backup) and for memcache.

 Running htop on memcache server shows very low CPU utilization.

 MySQL or MariaDB? What version?

MariaDB 5.5.65

Have you ever run MySQLTuner.pl? Tweaked the config of the DB server?

Have not but sounds like I need to as part of my next steps. I can also request a CPU resource increase on the VM resource but don't want to waste people's time if the culprit is an out-of-control process.

Same config.php you've mentioned ... I take it there have been tweaks to that.

It has some customization of course. But I did not change anything when upgrading from 3.4 to 3.6. just did a copy from the old code folder to the new

How many addons do you have and are all compat with 3.6.highest?

Any locally customized/developed plugins?

60 additional plugins…most are rested up to 3.6 but there are a few old ones kicking around that list something like " 2.7 and up"...A few (2 or 3) custom plugins remain despite my efforts to get off of them. if there is a way to trace processes in htop back to specific plugins i'd love to hear ideas. 

What theme are you running?

Vanilla Boost theme

Is cron job set to execute/run every minute?

Yup

 Anything in cron or task complaining ... failing to run? 

 Don't think so but will check the logs

Is Autobackup running?

Nope

How are these dedicated servers networked? Load balanced web front end with moodledata (all of it?) on backended NFS share ... private IP on a another segment of network? Not trying to be nosey ... but networking does come before application and could be a source of bottlenecks.

Have to admit I'm not as familiar with the network topology. That's handled by another team. All of moodledata is on an NTFS mount that is available in the Linux filesystem for the webservers. As far as I know all components are on the same subnet, behind the campus firewall and can all talk to each other but only the webservers can access the internet.


In reply to Ben Kahn

Re: Help! Database server CPU maxed out

by Ken Task -
Picture of Particularly helpful Moodlers

Do install and run MySQLTuner on node 1 of web front end and on DB server itself.   One involves networking ... web -> DB server .. the other ... does not.  You will see two different outputs with recommendations which shouldn't be applied blindly without researching further.

Number of students ... nothing like a query of mdl_user table to know exact info.  Doing that from mysql client from web server is realistic to what goes on via GUI, me thinks.  You might be surprised at how many accounts there really are.

Am sure you are aware that there is a ball-park forumula for config of DB server ... think it was 20 concurrent users = 1Gig memory on DB server - which really means connections from mdl to DB server.    Tuner will show some info related.

Plugins ... do any/all have a task to clean up?  Any of them 'heavy' processing?  Really don't know how to measure that other than running via CLI or run now and at the same time trying to run something on DB server in realtime to see what (if anything) spikes.

There is a report plugin called Bench Mark ... author doesn't recommend running on production servers, but ... measures performance of many things ... quickly.  As long as you don't run during prime time might give you some more info/insight into where to look for bottlenecks

Hidden ... not linked in admin menus .. .is a thing called Health Check.  Not something you run during prime time.  /admin/tool/health/index.php it is heavy processing but might also help you discover how well your quiz bank is.

Using DB for sessions or files?  Use files.  Related to moodledata ...

If all of moodledata is on NFS consider using NFS mount only for filedir ... the other directories ... like sessions/localcache/temp ... need to be 'fast' ... and nothing faster than on the web servers.  Which brings a thought about the load balancing ... when a student connects does that connection become 'sticky' ... stays with that LB instance?

Cron jobs can be run from a separate server now ... might consider adding another instance in the LB setup and make that instance dedicated to cron jobs.

There is also number of devices ... typical college campus?  Users might have their iphone/smartphone connected as well as their laptop if on campus.  All wireless and from different locations in your WAN.

There is also how courses are designed and used ... multimedia served from moodle or from dedicated/internal servers for videos/audio's? Moodle never was a steaming server.

Beginning of new Academic session in your situation just like first day of school for a K12 entity ... heavy hits ... then kinda settles down.

Hey!  I know there are other folks in these forums that could contribute here ... feel free to jump in guys/gals! smile

'SoS', Ken


Average of ratings: Useful (2)
In reply to Ben Kahn

Re: Help! Database server CPU maxed out

by Jerry Lau -

How much ram is on the db server?

If possible, could you copy and paste the .cnf file you have (change the name of your db and path to executables for privacy - use any dummy path ie. /var/opt/ourserver/dbexample, etc.)

Are there automated snapshots on your VM platform that may be taking up space.

Your cpu should at least be 4 core I think

In your moodle, I noticed you said you copied the 3.4 code and to 3.6 folder? How did you do it exactly. Did you actually do a proper upgrade? I have notes that I can share with you offline that I use to do an annual upgrade. Its in the moodle docs but mine is a more fine-tuned to what I want to do..

do you have to have a large question bank for 3.4? This would need to be cleaned up before upgrading to 3.6. We had about 14 million unused question banks that took a toll on our system to an unusable state

Let's start with these ....





Average of ratings: Useful (1)
In reply to Jerry Lau

Re: Help! Database server CPU maxed out

by Ben Kahn -
Hi Jerry, thanks for your reply! Here are some answers to your q's:

How much ram is on the db server?

6GB. Currently not seeing utlization over 2GB, even when the CPU is fully pegged. Makes me think some tweaking is needed to cache more/bigger requests. I will research but welome any suggestions.

If possible, could you copy and paste the .cnf file you have (change the name of your db and path to executables for privacy - use any dummy path ie. /var/opt/ourserver/dbexample, etc.)

Here is the contents of /etc/my.conf - I don't see any private/connection info so maybe there is another local config file to look at? The files in the my.cnf.d directory appear to be left at defaults.

#
# This group is read both by the client and the server
# use it for options that affect everything
#
[client-server]
#
# include all files from the config directory
#
!includedir /etc/my.cnf.d
[client]
default-character-set = utf8mb4
[mysqld]
server-id = 1
log-bin = mysql-bin
binlog-ignore-db = “mysql”
# Added by Tony Box 2/17/15 to fix bug here http://bugs.mysql.com/bug.php?id=39489
#
log_bin_trust_function_creators = 1
expire_logs_days        = 7
max_binlog_size         = 300M
sync_binlog             = 1
binlog_format=row
max_connections         = 1000
max_allowed_packet      = 64M
key_buffer_size         = 128M
thread_cache            = 16384
query_cache_limit       = 64M
query_cache_size        = 256M
character-set-server = utf8mb4 collation-server = utf8mb4_unicode_ci skip-character-set-client-handshake tmp_table_size = 128M max_heap_table_size = 256M table_cache=1000 innodb_file_format = Barracuda innodb_file_per_table = 1 innodb_large_prefix [mysql] default-character-set = utf8mb4


Are there automated snapshots on your VM platform that may be taking up space.

There are automated backups but they only run at night and are stored on an NTFS mount, I don't think they are causing any issue.

Your cpu should at least be 4 core I think

We are doing an emergency maintenance window this evening to double the VM for 2 to 4 cores. Hopefully this gives us some breathing room.

In your moodle, I noticed you said you copied the 3.4 code and to 3.6 folder? How did you do it exactly. Did you actually do a proper upgrade? I have notes that I can share with you offline that I use to do an annual upgrade. Its in the moodle docs but mine is a more fine-tuned to what I want to do.

Sorry if that was unclear. I just meant I copied the config.php file from the old Moodle3.4 code folder to the new Moodle3.6 code folder, then ran the upgrade. So the process was:

• Site in CLI maintenance mode
• Rename Moodle folder to Moodle-bak
Git clone 3.6 to newly created folder named Moodle
• Copy config.php from Moodle-bak to Moodle
• Run CLI upgrade

I've been meaning to write up my notes for upgrade (with 60 plugins managed as Git submodules), would love to compare notes.

do you have to have a large question bank for 3.4? This would need to be cleaned up before upgrading to 3.6. We had about 14 million unused question banks that took a toll on our system to an unusable state

Hmmmm…I have no idea! I haven't looked into this. I'm sure we had lots of old crap in our quiz banks…. Lots of folks importing courses from past semesters too, and probably multiplying old quiz questions…. Any suggested reading on this before I start Googling? This sounds like something that could definitely be cleaned up.

I will post back here after the CPU bump to let folks know how that effects things.
In reply to Ben Kahn

Re: Help! Database server CPU maxed out

by Howard Miller -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers
If you didn't tune MySQL (MariaDB) then it normally is configured to limit the amount of memory it will use. That could be your problem right there... you simply aren't allowing the db the resource it needs.

For a Moodle site or any significance, database tuning is pretty much mandatory.

So, I'm going to say what everybody else has said... run mysqltuner. This is the vital next step. There's really very little point discussing this much further until you've established that MySQL has been set up to use the available resources on its server. Please excuse my trademark bluntness

PS. If you don't have it (or an equivalent) run munin on your Moodle servers. If covering the basics on your site doesn't clear this up then we/you are going to need all the information we can find. 

PPS. Just a thought - you're not running sessions in the database are you? Don't. It overloads the database on anything more than a trivial site. 
Average of ratings: Useful (1)
In reply to Jerry Lau

Re: Help! Database server CPU maxed out

by Ben Kahn -
As an update: I worked with my infrastructure team to increase CPU cores on the DB server from 2 to 6. That required about 5 minutes of downtime to turn the server off, change a setting and bring it back up but it instantly helped performance quite a bit. Utilization is now sitting well below 100%. It is still not using more than 25% or so of available memory (6GB). When I have time later in the semester I will definitely revisit tuning the DB with mysqltuner and possibly look at cleaning up old quiz question as well. Thanks for the advice, everyone!
In reply to Ben Kahn

Re: Help! Database server CPU maxed out

by Visvanath Ratnaweera -
Picture of Particularly helpful Moodlers Picture of Translators
Noted that Ken is talking about NFS and you about a "shared NTFS mount". Is that a typo?
In reply to Visvanath Ratnaweera

Re: Help! Database server CPU maxed out

by Ben Kahn -
Visvanath, I may be a bit sloppy in my terminology...my school runs Windows fileshares, and one of these is mounted to all of the Linux Moodle webservers and used as the Moodle Data folder. So, the filesystem on the actual drive I think is NTFS. But it is mounted to Linux as a network file share.
In reply to Ben Kahn

Re: Help! Database server CPU maxed out

by Visvanath Ratnaweera -
Picture of Particularly helpful Moodlers Picture of Translators
Hi Ben

Rereading the discussion I do not think you were sloppy, not in the original post. There you wrote "We run Linux servers on VMWare: three webservers behind a load balancer, data on a shared NTFS mount, a memcached server, and a database with a separate read-only slave db." The others saw NFS perhaps because attaching a NTFS filesystem mounted on a Windows Server through SMB/CIFS to a Linux machine to be the $moodledata file area is unthinkable to them. I didn't expect it either, to be honest.

Now before doing anything else please run the Moodle Benchmark and post the result here. You can take the recent discussion I/O File Performance as a sample.
In reply to Ben Kahn

Re: Help! Database server CPU maxed out

by Ken Task -
Picture of Particularly helpful Moodlers

Again, install and run mysqtuner.   Can run it even when site is not in maintenance mode.  It is not a moodle plugin.

In output pay attention to:

Performance Metrics
Maximum reached memory usage:
Maximum possible memory usage:
Highest usage of available connections:
Aborted connections

InnoDB Metrics
nnoDB buffer pool / data size
InnoDB buffer pool instances:

and the recommendations at the end.

'SoS', Ken

Average of ratings: Useful (1)
In reply to Ken Task

Re: Help! Database server CPU maxed out

by Ben Kahn -
Definitely will be doing this Ken. Just waiting for my "Scaling up" of adding more CPU cores to be done and will go from there. Thanks.
In reply to Ben Kahn

Re: Help! Database server CPU maxed out

by Jerry Lau -
Ben... run this query and see what you get... is the database on its own server or shared with moodle application. 6GB seems awful small to me... it should be on its own db with at least 16GB with 4-core CPU on same network segment.


SELECT count(*) as NumerOfRandomQuestions from  yourprefix_question where id in (SELECT q.id FROM  yourprefix_question AS q LEFT JOIN _quiz_slots AS qslots ON q.id = qslots.questionid WHERE qslots.questionid IS NULL AND q.qtype = 'random');

*yourprefix is your moodle table's prefix name


There was bug introduced in version 3.4 but fixed in version 3.4.8 (tell us which version of moodle 3.4.x you had before you upgraded to 3.6.x).

We also upgraded from 3.4.8 to 3.6.5 a couple of months ago and its performing very well now.. thanks to new disk storage type by our network team.. (flash storage). Snapshots now takes seconds instead of minutes and about 90% faster..
In reply to Ben Kahn

Re: Help! Database server CPU maxed out

by Jerry Lau -

Did you run the query I suggested and what was the result of it?



SELECT count(*) as NumerOfRandomQuestions from  yourprefix_question where id in (SELECT q.id FROM  yourprefix_question AS q LEFT JOIN _quiz_slots AS qslots ON q.id = qslots.questionid WHERE qslots.questionid IS NULL AND q.qtype = 'random');

*yourprefix is your moodle table's prefix name