Moodle Cluster Apache crashes

by Thomas Haines -
Number of replies: 8
I've been banging my head against the wall trying to figure this out. We run a Moodle cluster consisting of a load balancer, 3 Apache/PHP nodes (we've since added two more to try to fix this), and 2 replicated MySQL servers. All are running CentOS 5.2 on brand new Dell hardware. We host about 12 Moodle instances, ranging from very minimal usage to one that handles about 3,000 logins/day. Our moodledata is currently on an NFS share, but originally each node was connected to a SAN, sharing a LUN via Oracle Cluster File System 2 (OCFS2). When the Apache nodes began locking up we suspected OCFS2, so we shut that down and moved our moodledata to an NFS share.

We ran this for months without an issue, but when our heavy users neared the end of the semester and load increased drastically, our Apache nodes began freezing up. We can't even SSH into them. We thought at first that we had overestimated the load that the Dell R200 could handle, so we added another R200 and a 1950 (all specs below). This helped relieve the problem, but the crashes continued to occur. There is no way that the 3 R200s can't handle the load. Not to mention it was the 1950 that began locking up.

Some settings I've tinkered with in the Apache config file:

Timeout 20
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 5

StartServers 16
MinSpareServers 16
MaxSpareServers 64
ServerLimit 512
MaxClients 512
MaxRequestsPerChild 0


Specs:

Apache/PHP nodes x3:
Dell R200
2Ghz Dual Core
4GB Ram
Centos 5.2

MySQL
Dell 2950
Dual Core 1.6GHZ
16GB RAM
Storage on ISCSI SAN

Any help would be greatly appreciated. I'm sure there's plenty of relevant info I am forgetting, so please ask.

Tom Haines
In reply to Thomas Haines

Re: Moodle Cluster Apache crashes

by Elvedin Trnjanin -

If the nodes lock up so hard that you can't even SSH in, it looks like your web servers are running out of memory and going to swap (assuming you have swap space allocated, which you should).

I would set MaxRequestsPerChild to some large value less than 10000 because of memory leaks. Killing off Apache child processes periodically reclaims the leaked memory, at least temporarily. Also, your MaxSpareServers and MaxClients are a bit high. With the prefork MPM every concurrent client is served by its own child process, so MaxClients 512 allows up to 512 Apache children per node. You'd run out of memory before you even got to a fourth of that.

It's also safe to decrease the KeepAliveTimeout to a handful of seconds. It'll help you not DoS yourself if you don't decrease your MaxClients, ServerLimit, and MaxSpareServers.
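
The sizing argument above can be made concrete with a rough back-of-the-envelope calculation. This is only a minimal sketch, not an official Apache formula: the function name, the 512 MB held back for the OS, and the 40 MB per-child figure are illustrative assumptions. Only the 4GB of RAM comes from the specs posted in this thread.

```python
def estimate_max_clients(total_ram_mb, reserved_mb, avg_child_rss_mb):
    """Rough upper bound on prefork MaxClients so that all children fit in RAM.

    total_ram_mb      -- physical memory on the node
    reserved_mb       -- memory held back for the OS and other daemons (assumption)
    avg_child_rss_mb  -- typical resident size of one httpd child (measure it!)
    """
    if avg_child_rss_mb <= 0:
        raise ValueError("avg_child_rss_mb must be positive")
    usable_mb = total_ram_mb - reserved_mb
    if usable_mb <= 0:
        return 0
    # Integer division: a partial child does not count.
    return int(usable_mb // avg_child_rss_mb)


if __name__ == "__main__":
    # 4GB node from the posted specs; reserve and child size are assumed values.
    print(estimate_max_clients(4096, 512, 40))
```

Because RSS also counts memory shared between children, a figure obtained this way is probably on the conservative side. Treat it as a starting point and verify under real load.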

In reply to Elvedin Trnjanin

Re: Moodle Cluster Apache crashes

by Thomas Haines -
Swap space is allocated.

Thanks. I am recalculating those apache settings, but MaxClients was at the default 256 when this began, and I was receiving "Too Many Connections" errors.

Here is my updated httpd.conf:

Timeout 5
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 5


StartServers 8
MinSpareServers 5
MaxSpareServers 20
ServerLimit 54
MaxClients 54
MaxRequestsPerChild 4000



And what's more:

[root@mc-node-1 moodlelogs]# ps -ylC httpd --sort:rss
S UID PID PPID C PRI NI RSS SZ WCHAN TTY TIME CMD
S 0 13861 1 0 78 0 9056 62871 - ? 00:00:00 httpd
S 48 13880 13861 2 75 0 34448 84384 semtim ? 00:00:05 httpd
S 48 13865 13861 1 76 0 34668 83919 - ? 00:00:04 httpd
S 48 13876 13861 3 76 0 34752 83869 - ? 00:00:08 httpd
S 48 13869 13861 3 75 0 35464 84384 semtim ? 00:00:09 httpd
S 48 13864 13861 2 76 0 35644 84121 semtim ? 00:00:06 httpd
S 48 13866 13861 3 75 0 35928 84183 - ? 00:00:09 httpd
S 48 13867 13861 2 75 0 36344 84637 - ? 00:00:08 httpd
S 48 13874 13861 1 75 0 36432 84377 - ? 00:00:04 httpd
S 48 13868 13861 2 75 0 36948 84443 semtim ? 00:00:07 httpd
S 48 13884 13861 3 75 0 38992 84662 - ? 00:00:08 httpd
S 48 13875 13861 1 75 0 41412 85198 - ? 00:00:05 httpd
S 48 13870 13861 3 75 0 42820 85456 - ? 00:00:08 httpd
S 48 13882 13861 2 75 0 47720 87457 198427 ? 00:00:06 httpd
S 48 13863 13861 3 75 0 49540 87453 semtim ? 00:00:08 httpd
S 48 13883 13861 3 75 0 55032 88733 semtim ? 00:00:08 httpd
S 48 13881 13861 2 75 0 73984 93631 semtim ? 00:00:05 httpd
R 48 13879 13861 5 76 0 135040 109018 - ? 00:00:13 httpd


My largest Apache child is 135MB. Doesn't that seem like it's way too large?
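
For anyone wanting to turn a listing like the one above into numbers, here is a minimal sketch that summarises the RSS column. It assumes the column layout shown in this post (a header row containing RSS and PPID, values in KB) and treats the process with PPID 1 as the Apache parent to be excluded. The function name and the embedded sample, which is just a few rows copied from the listing above, are for illustration only.

```python
# A few rows copied from the ps listing above (RSS is in KB).
PS_OUTPUT = """\
S UID PID PPID C PRI NI RSS SZ WCHAN TTY TIME CMD
S 0 13861 1 0 78 0 9056 62871 - ? 00:00:00 httpd
S 48 13880 13861 2 75 0 34448 84384 semtim ? 00:00:05 httpd
S 48 13881 13861 2 75 0 73984 93631 semtim ? 00:00:05 httpd
R 48 13879 13861 5 76 0 135040 109018 - ? 00:00:13 httpd
"""


def summarize_rss(ps_text):
    """Return count/min/max/mean RSS (KB) of httpd children in `ps -yl` output."""
    lines = ps_text.strip().splitlines()
    header = lines[0].split()
    rss_col = header.index("RSS")
    ppid_col = header.index("PPID")

    rss_kb = []
    for line in lines[1:]:
        fields = line.split()
        # Assumption: the parent httpd is the one whose PPID is 1; skip it.
        if fields[ppid_col] == "1":
            continue
        rss_kb.append(int(fields[rss_col]))

    if not rss_kb:
        return {"count": 0, "min_kb": 0, "max_kb": 0, "mean_kb": 0.0}
    return {
        "count": len(rss_kb),
        "min_kb": min(rss_kb),
        "max_kb": max(rss_kb),
        "mean_kb": sum(rss_kb) / len(rss_kb),
    }


if __name__ == "__main__":
    stats = summarize_rss(PS_OUTPUT)
    print(stats["count"], "children, largest",
          round(stats["max_kb"] / 1024, 1), "MB")
```

The mean is more useful than the maximum for sizing MaxClients, but a single outlier like the 135MB child is worth tracking down separately.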
In reply to Thomas Haines

Re: Moodle Cluster Apache crashes

by Martín Langhoff -

Search this forum on how to calculate MaxClients properly. And you'll want to run sysstat to get forensic data.

"My largest Apache child is 135MB. Doesn't that seem like it's way too large?"

Yes. Are you running any custom code? If you enable performance stats logging, you can then grep the output (which should go to the error log) to find out which script (or part of Moodle) is allocating all that memory.

A likely suspect could be Moodle's cron. Are you running it via Apache? (Hint: that's a big no-no on large sites.)

In reply to Martín Langhoff

Re: Moodle Cluster Apache crashes

by Thomas Haines -
"A likely suspect could be moodle's cron. Are you running it via apache? (hint: that's a big no-no on large sites)."

No, Moodle cron is handled through the PHP CLI, on a separate management node.

Are you referring to performance stats logging in Moodle or Apache?

I've reconfigured a bit, and the largest process is running at 75MB which is still way too large.

Also, what is the recommended MPM for Moodle? I am using prefork, the default, but worker looks more enticing as it appears to be able to handle more with a smaller memory footprint.
In reply to Thomas Haines

Re: Moodle Cluster Apache crashes

by Martín Langhoff -
Good to hear it's not cron!

I mean perf stats from Moodle. Set them in config.php; have a look at the sample values proposed in config-dist.php. If you google around, you'll also find a Perl or Ruby script I've written to summarise the output. Run that from cron daily (and email the result to your sysadmins) for a good heartbeat of the hot areas (bottlenecks) of your Moodle install.

As for MPM, use prefork. In theory, the core PHP code is thread-safe (writing C code for threaded execution involves some tricky coding practices; google for "reentrant code" for the gory details). However, no one is using it threaded, so there will probably be bugs lurking there, and the PHP extensions we use are explicitly not thread-safe either (GD and precompilers, for example). So you'd have to run a PHP with no extensions (I don't even know if the mysql/postgres extensions are thread-safe) and run the risk of being the pioneer that hits the thread-safety bugs. And thread-safety bugs are notoriously hard to diagnose and resolve.

The memory stats that top shows for your Apache processes are also somewhat misleading. For example, if you use APC, mmcache or any other precompiler with shared-memory tricks, the shared memory is also listed there. It's very hard to tell with precision what's shared and what's not. If you are using a recent Linux kernel, use ps_mem.py to get a usable number: it reads the smaps info and summarises it for you.
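
To illustrate the shared-versus-private distinction, here is a minimal sketch in the same spirit that totals the Private_* and Shared_* fields of a process's /proc/PID/smaps. It is not ps_mem.py's actual algorithm. The function names are hypothetical, and the sample text is made up for demonstration rather than taken from a real httpd process.

```python
import re

# Matches the four per-mapping fields we care about, e.g. "Private_Dirty:  400 kB".
SMAPS_FIELD = re.compile(
    r"^(Private_Clean|Private_Dirty|Shared_Clean|Shared_Dirty):\s+(\d+)\s+kB$"
)

# Made-up sample in the layout of /proc/<pid>/smaps (illustrative values only).
SAMPLE = """\
00400000-0044d000 r-xp 00000000 08:01 123 /usr/sbin/httpd
Size:                308 kB
Rss:                 200 kB
Shared_Clean:        180 kB
Shared_Dirty:          0 kB
Private_Clean:        20 kB
Private_Dirty:         0 kB
7f0000000000-7f0000100000 rw-p 00000000 00:00 0
Size:               1024 kB
Rss:                 512 kB
Shared_Clean:          0 kB
Shared_Dirty:        112 kB
Private_Clean:         0 kB
Private_Dirty:       400 kB
"""


def summarize_smaps(smaps_text):
    """Sum private and shared resident memory (KB) across all mappings."""
    totals = {"private_kb": 0, "shared_kb": 0}
    for line in smaps_text.splitlines():
        match = SMAPS_FIELD.match(line.strip())
        if not match:
            continue
        name, kb = match.group(1), int(match.group(2))
        key = "private_kb" if name.startswith("Private") else "shared_kb"
        totals[key] += kb
    return totals


def summarize_pid(pid):
    """Read /proc/<pid>/smaps (Linux only; usually needs root for other users)."""
    with open(f"/proc/{pid}/smaps") as handle:
        return summarize_smaps(handle.read())


if __name__ == "__main__":
    print(summarize_smaps(SAMPLE))
```

The private total is the part that really multiplies with each additional Apache child; the shared total is paid for roughly once, however many children map it.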
In reply to Martín Langhoff

Re: Moodle Cluster Apache crashes

by Thomas Haines -
Prefork it is. Thanks a lot.

Here's a dumb question: These servers are running 64-bit Linux / PHP. That would probably account for the extra RAM usage on those boxes compared to my 32-bit Moodle Installs, would it not?

As far as a caching solution, I have eAccelerator on my list, and I'm also considering a Squid box in front of the Load Balancer. But, I agree that caching will help significantly.
In reply to Thomas Haines

Re: Moodle Cluster Apache crashes

by Thomas Haines -
I installed eAccelerator on each of the nodes and... WOW. I am very impressed with the performance gains.
In reply to Thomas Haines

Re: Moodle Cluster Apache crashes

by Dan Poltawski -
You don't mention using a PHP accelerator (e.g. eAccelerator/APC). These can buy you *massive* performance improvements with Moodle, so if you are not using one I'd highly recommend it.

Dan