Have just moved my Moodle 1.5 into the production environment and am having huge issues with CPU usage maxing out with only 15+ logins.
Dell PowerEdge 1600SC Rack Mount
- Intel Xeon 3.2GHz 1MB Cache
- 2GB RAM
- 73GB Ultra 360 HD in RAID 1 config
I am new to server admin and am unsure as to where I should start looking.
1. Server runs fine with very small load. Last Friday had 30 teachers log in and server CPU went to 98% usage. Most got the error:
"Error: Database connection failed.
It is possible that the database is overloaded or otherwise not running properly."
I am assuming this is due to my server CPU maxing out.
2. Used top in Unix and am having issues identifying where my CPU load is being generated. The httpd requests were running at 5% and the other 90% of CPU didn't seem to be listed as being used by any other process.
Langcache is on
Langmenu is off
dbsessions are off
Any suggestions on how to properly diagnose this are more than welcome.
1. I think CPU is one problem, the other problem is memory capacity. Moodle developers are discussing using application/session variables to reduce memory load (see more at http://moodle.org/mod/wiki/view.php?id=2935&page=view/Application%2FSession+Variables)
3. Your problem was already discussed at: http://moodle.org/mod/forum/discuss.php?d=25179. I think they found the solution to this problem.
That's all that I can share with you. Good luck!
I would have thought that kind of processor and memory should easily handle 100 logins. I have looked into persistent logins with no luck. I am looking into other suggested solutions and will keep all informed.
Am looking for what I can look at to help diagnose the issue as I am new to this backend admin side of things
1) "Error: Database connection failed."
This is due to DB connection persistence ... see ->
2) 95% ... Have you activated a PHP cache like turck_mmcache? This will help your processor and performance.
The langcache depends on having cron configured correctly. Is the cron running ok?
Bear in mind that the homepage and course page are very heavy on the server -- if you are asking users to login to test-drive Moodle, ask them to use a variety of pages and facilities for a better picture of normal usage. Needless to say, "logs" pages are also heavy ;)
You mention that top is showing high usage, but not showing user processes with high usage. Do you have a high percentage of "wait" or "sys" states? This would be an indication of slow IO, which can mean a bad or outdated SCSI/IDE/SATA driver, a slow filesystem type, or even something like... a slow disk. Have you tried testing the disk with hdparm or bonnie?
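To make the "wait" and "sys" question concrete, here is a minimal sketch in Python that samples the aggregate `cpu` line of /proc/stat twice and reports what share of the interval went to each state. It assumes a Linux kernel that exposes /proc/stat; very old kernels list fewer fields (no iowait), which the sketch tolerates. The function names and the 5-second interval are illustrative choices, not something from the thread.

```python
import time

# Order of the counters on the aggregate "cpu" line of /proc/stat.
# Older kernels expose fewer fields, so we zip against whatever is present.
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(stat_text):
    """Return {state: ticks} from the aggregate 'cpu' line of /proc/stat text."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return dict(zip(FIELDS, (int(v) for v in parts[1:])))
    raise ValueError("no aggregate 'cpu' line found")


def state_percentages(before, after):
    """Percentage of elapsed CPU time spent in each state between two samples."""
    deltas = {k: after[k] - before.get(k, 0) for k in after}
    total = sum(deltas.values())
    if total <= 0:
        return {k: 0.0 for k in deltas}
    return {k: 100.0 * d / total for k, d in deltas.items()}


def sample(path="/proc/stat"):
    """Read one snapshot of the CPU counters."""
    with open(path) as fh:
        return parse_cpu_line(fh.read())


if __name__ == "__main__":
    first = sample()
    time.sleep(5)  # illustrative interval
    second = sample()
    shares = state_percentages(first, second)
    for state, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{state:8s} {pct:5.1f}%")
```

If iowait or system dominates while user stays low, that points toward IO or the kernel rather than Apache, PHP or MySQL.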
I'd recommend you get sysstat installed, because following top is quite hard. Sysstat captures the stats every 5' and then you can run sar -A and see a million metrics taken over time.
turck_mmcache is required for high concurrency, as is tuning your database correctly. You'll want to have persistence ON but before you do that, you'll want to use the forum search to find my posts about having MaxClients tuned correctly. If you turn persistence ON without first tuning Apache and the DB correctly, it'll be far worse.
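As a rough illustration of what "tuned correctly" can mean, the sketch below does the two sanity checks in Python: how many Apache children fit in RAM, and whether MySQL's max_connections covers Apache's MaxClients (with persistent connections each Apache child may hold one DB connection open). The 2048 MB figure is the server from this thread; the 512 MB reserved for MySQL and the OS and the 20 MB per Apache child are assumptions for the example, so measure your own. The function names are hypothetical.

```python
def max_clients_for_memory(ram_mb, reserved_mb, apache_child_mb):
    """Largest MaxClients whose children fit in the RAM left after the reservation."""
    if apache_child_mb <= 0:
        raise ValueError("apache_child_mb must be positive")
    return max(0, (ram_mb - reserved_mb) // apache_child_mb)


def check_pairing(max_clients, max_connections):
    """Return a list of warnings about the Apache/MySQL limits (empty if none)."""
    warnings = []
    if max_connections < max_clients:
        warnings.append(
            "max_connections (%d) is below MaxClients (%d): under load, "
            "some Apache children will be refused a database connection"
            % (max_connections, max_clients)
        )
    return warnings


if __name__ == "__main__":
    # Assumed figures: 512 MB kept back for MySQL and the OS, 20 MB per child.
    limit = max_clients_for_memory(ram_mb=2048, reserved_mb=512, apache_child_mb=20)
    print("MaxClients should not exceed about", limit)
    for warning in check_pairing(max_clients=limit, max_connections=100):
        print("WARNING:", warning)
```

The point of capping MaxClients by memory is to keep the box out of swap; the point of the pairing check is to avoid the "Database connection failed" error when Apache accepts more requests than MySQL will serve.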
Edit: if people are using chat, you'll want to configure the chat daemon.
Martin, as always you're a brilliant resource. Thanks again.
My issues seem to rise on many fronts. And the forums here, while getting larger and larger to sift through, are proving just as useful.
I have found my issues included but are not limited to:
- not pairing MaxClients and max_connections in Apache and MySQL
- not properly checking/repairing tables regularly
- "heavy" blocks on the homepage (i.e. calendar or tree)
Good news is that by looking into these issues it seems I may have repaired my troubles. Not sure until it is properly load tested (any Open Source options here?), but so far am showing far better stats.
You can see it now and tell me how fast it is for you at:
I will try and document all these into a FAQ of sorts for other troubleshooters.
I can assure you that your problem is *not* hardware--your CPU and memory are great. It is somewhere in software settings and tuning. Look below and see the results of a very similar server installation which achieved 60 simultaneous users doing very heavy audio quizzes. Sorry I can't help you with the tuning, we had a server specialist set up the LAMP+PHP accelerator.
Our school did an English placement test (Moodle Quiz Module, ver 1.4.4 with 30 audio questions) this past April 2005 for all 1100 entering freshmen to our university. If you check an earlier report, you will notice in April 2004, I used a single slow Mac as a server--and yes, another thread shows us OS X and MySQL don't love each other so well--no problem with Linux and MySQL. This is what we did to make our test a complete success.
- switched to cheap Linux servers (see specs below)
- divided the students into 5 sessions of up to 250 students each
- used four servers (one server per 55 students)
- Dell small business server--no OS, no monitor/keyboard
- Dual Xeon 2.8GHz CPU
- 2GB Memory
- 80GB hard drive
- Internal LAN--servers in another building on campus
- Fedora Core 1, Apache 2.0.50, PHP 4.3.1, MySQL 4.1.10a
- Ioncube PHP accelerator for PHP 4.3.0
I can understand it if you are not root, the SELinux patches are applied, you're on Solaris, on AIX, or perhaps you are in a rented "virtual server". If you're running a vanilla 2.4 or 2.6 kernel, I'm tempted to think you have a problem somewhere. I'd say probably sar (sysstat) and iostat can help you there.
BTW: get rid of all the crud you don't need
this issue is starting to drive me insane.
Followed the old maxim last night of "If at first you don't succeed, kill it and start again"
Having followed this to the letter by formatting and reinstalling from scratch, I thought my issues would have been solved. BUT NO!!
Instead, rather than maxing out my server just freezes...stops dead in its tracks. It does this with user load and without. I have used the ab test with the following results but don't really know how to read these reports. So I am posting them here to see if any of you gurus have any ideas.
The test was with 100 connections and 500 concurrent users on the url montenet.monte.nsw.edu.au/index.php (my Moodle page)
Included in this screenshot is top running on the right and mytop running on the bottom.
Give it -n 1000 -c 10 for a more reasonable test. It'll hit the same URL 1000 times, with a concurrency of 10. Having concurrency greater than the number of hits doesn't make sense at all. And having concurrency over 50 doesn't make sense either ;)
You have to consider that ab takes concurrency very literally at the connection level. Even with thousands of users, we rarely see more than perhaps 20 really concurrent connections.
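For anyone unsure what -n and -c actually mean, here is a small Python sketch of the same idea: a fixed total number of requests, issued by a fixed number of workers at a time. It is not ab and makes no attempt to match its output; `run_load` and `fetch_url` are hypothetical helper names, and the URL in the example is a placeholder you would replace with your own Moodle page.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_url(url, timeout=30):
    """Fetch the URL once and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
        return resp.status


def run_load(fetch, total_requests, concurrency):
    """Call fetch() total_requests times, at most `concurrency` at once."""
    if concurrency > total_requests:
        raise ValueError("concurrency must not exceed the number of requests")

    def timed(_):
        start = time.perf_counter()
        try:
            fetch()
            ok = True
        except Exception:
            ok = False  # connection errors and HTTP errors count as failures
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed, range(total_requests)))

    times = [elapsed for _, elapsed in results]
    return {
        "requests": total_requests,
        "failed": sum(1 for ok, _ in results if not ok),
        "mean_s": sum(times) / len(times),
        "max_s": max(times),
    }


if __name__ == "__main__":
    url = "http://localhost/index.php"  # placeholder, point it at your own site
    print(run_load(lambda: fetch_url(url), total_requests=1000, concurrency=10))
```

Keeping the concurrency low and the request count high, as suggested above, gives steadier timings than a short burst with hundreds of simultaneous connections.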
In the end it was my Linux install. We used Mandrake 10.1 for our OS and guess what? It was released with a dodgy kernel. Since top does not list the kernel itself as a process, the CPU usage could not be traced to any process.
Having finally discovered this fact we installed the newly patched Mandrake 10.2, and after following many of the suggestions on this forum, now have our server happily serving pages to all our users over the last week at an average page load time of 0.23 seconds.
Thanks to all who posted in this forum. Your help was invaluable.
So you know, I had been watching your situation with interest, since I might be talking about scalability in the near future (I hope I will be). I find the scenarios and their resolutions posted here are as informative as the technical underpinnings.
Your posting is appreciated.