I upgraded Moodle from 1.9.12 to 2.3, then migrated the database from MyISAM to InnoDB. It ran fine for 3 days.
Yesterday I got "Error: Database connection failed". Munin shows that memory usage climbed rapidly over those 3 days until all of my RAM and swap were used up (see attachment).
I need your help.
I set max_connections in my.cnf to 500, which is higher than MaxClients in php.ini, and I installed php-apc. After restarting the server, memory usage dropped and now grows only gently. But CPU usage rises rapidly and is quickly used up by a few php processes (as seen in top).
The top command shows:
How are you running PHP?
It doesn't look like you are using apache with mod_php, which would be the most common way of running Moodle.
I think your problem is not really related to Moodle or MySQL but to the environment you are running it in.
I ran "ps -ef|sort -n -k4" and found that the problem is cron.php, which runs every 15 minutes.
So I disabled the cron job, and everything went back to normal.
But how should I deal with the cron job?
Did you find the cause of the problem?
At first glance, it is an obvious memory leak. As Simon pointed out, the system software (LAMP) is the first place to check.
But given the way memory gradually builds up (Munin graph), and because the problem vanishes once you deactivate the cron job, the cron jobs evidently take too long to finish, or never finish at all. Look at the very long total CPU times of those processes (column TIME in top)!
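If you want to see those stray processes without reading through top, something like the following might help. It is only a sketch and assumes a Linux procps-style ps; filter_cron is a helper name I made up for illustration. ELAPSED is the wall-clock age of the process and TIME is the CPU time it has consumed.

```shell
# Keep only lines that mention cron.php. The [c] bracket stops grep
# from matching its own command line in the process listing.
filter_cron() {
    grep '[c]ron\.php'
}

# Columns: PID, wall-clock age, cumulative CPU time, full command line.
ps -eo pid,etime,time,args | filter_cron || echo "no cron.php processes running"
```

Any cron.php entry whose ELAPSED value is well beyond the 15-minute interval is a run that overlapped with the next one.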
Talking of long cron jobs, the Moodle cron job is not meant to be parallelized. If a second cron job starts before the first one is finished, it'll confuse Moodle.
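One common way to keep a second run from starting while the first is still going is a small lock wrapper around the cron command. The sketch below is my own illustration, not something Moodle ships: run_exclusive and the lock path are hypothetical names, and it relies only on mkdir being atomic.

```shell
# run_exclusive LOCKDIR COMMAND [ARGS...]
# Runs COMMAND only if LOCKDIR can be created. mkdir either creates the
# directory or fails, so two overlapping invocations cannot both proceed.
run_exclusive() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 75
    fi
    "$@"
    status=$?
    rmdir "$lockdir"    # release the lock once the command has finished
    return "$status"
}

# Hypothetical use in the cron entry (placeholder paths as elsewhere in this thread):
# run_exclusive /tmp/moodle-cron.lock /PATH/TO/php /PATH/TO/cron.php
```

Be aware that if the process is killed hard, the lock directory stays behind and has to be removed by hand. Where the flock utility from util-linux is available, it avoids that problem because the kernel releases the lock when the process dies.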
How long does a single cron job take on your machine? Deactivate the Moodle cron job first, then leave enough time for those stray php processes to exit, or reboot the machine. Then run cron.php from the command line, once during a quiet hour and once during a busy hour. Your output will be something like this:
# su - OWNER-OF-THE-APACHE-PROCESS
$ /PATH/TO/php /PATH/TO/cron.php
Starting activity modules
Processing module ...
Cron script completed correctly
Execution took N seconds
What is your N?
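If you would like to compare several runs, a tiny wrapper that records the wall-clock time of each one could look like this. It is only a sketch; timed_run is a name I made up, and the paths are the same placeholders as above.

```shell
# timed_run COMMAND [ARGS...]
# Runs COMMAND, then prints one summary line with the finish time,
# the exit status and the elapsed wall-clock seconds.
timed_run() {
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "$(date '+%Y-%m-%d %H:%M') exit=$status seconds=$((end - start))"
    return "$status"
}

# Hypothetical use:
# timed_run /PATH/TO/php /PATH/TO/cron.php
```

Comparing the seconds value from a quiet hour with one from a busy hour shows whether a run fits inside the 15-minute interval.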