First of all, I'd like to describe our scenario:
Dedicated physical servers, running Proxmox
1 VM - Linux (Ubuntu) + MySQL 5.7 - 8 cores, 16GB RAM
Approximately 1,000 students
The /moodle and /moodledata directories are stored on NFS storage (NetApp) with very high performance.
Almost all the time everything runs great and the performance is very good, but we have a big problem:
Four times during the course the users have to take a test, and all of them access Moodle simultaneously; sometimes the web server then crashes, although the MySQL server continues working like a charm.
The load average on the web server reaches more than 100, and then everything becomes unstable.
What can I do to solve this problem? I have some ideas, but I really don't know which is the best approach:
-Creating a tmpfs to store caches
-Clustering several web servers
-Improving the VM "hardware" even further
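For the tmpfs idea, a minimal sketch of what such a mount could look like (mount point, size and ownership are illustrative assumptions, not measured recommendations; uid/gid 33 is www-data on Debian/Ubuntu):

```
# /etc/fstab -- RAM-backed directory for Moodle's file caches (illustrative)
tmpfs  /var/cache/moodle  tmpfs  rw,size=2g,mode=0770,uid=33,gid=33  0  0
```

After `mount /var/cache/moodle`, Moodle's local cache directory can be pointed at that path in config.php. Note that the contents vanish on reboot, which is fine for caches.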
Thank you guys
While not the only thing to check, it's a place to start: install mysqltuner.pl on the web server and on the DB server.
Run the tuner from the web server (that takes into account the networking between the web server and the DB server).
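For reference, a typical invocation from the web server might look like this (host, user and password are placeholders; the download URL is the MySQLTuner-perl GitHub repository):

```
# fetch the script, then point it at the remote DB server
wget -O mysqltuner.pl https://raw.githubusercontent.com/major/MySQLTuner-perl/master/mysqltuner.pl
perl mysqltuner.pl --host db.example.com --user root --pass 'secret'
```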
Check your web server logs ... do you see any 'MySQL server has gone away' entries in the error logs?
With 1,000 users connecting at one time, the DB server will have to allow at least that number of connections + 1 (max_connections).
I also take it that the DB server is configured not to do a reverse DNS lookup on every connection (skip-name-resolve).
BTW, the tuner will show 'dropped connections'. It will also show whether you can tweak the DB server's settings so that it uses as much memory as it can; obviously, the more held in memory, the less I/O and swap.
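Put together, the settings above live in the DB server's my.cnf; the values below are illustrative assumptions, to be sized against your own RAM and connection count:

```
[mysqld]
max_connections         = 1100    # headroom above ~1,000 simultaneous users + 1
skip-name-resolve                 # no reverse DNS lookup per connection
innodb_buffer_pool_size = 10G     # keep the working set in RAM: less I/O and swap
```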
You probably are in the range of use that might require some sort of web front end.
'spirit of sharing', Ken
My immediate reaction was: if you have a dedicated server, why add a virtualization layer to it? You'd be better off with the default LAMP stack on Debian with the usual tuning and caching.
But the above may not be your current bottleneck. The next suspicious thing is the NFS, which serves both(!) /moodle and /moodledata. Why? The main reason to put /moodledata on the network is when you run a cluster of web servers. Search this forum for NFS and you'll see the problems people have. And BTW, if you use file-system-based session files, experiment with database sessions.
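For the database-sessions experiment, the switch lives in Moodle's config.php; these settings appear in config-dist.php (the timeout value here is an illustrative assumption):

```php
// config.php -- keep sessions in the database instead of files on NFS
$CFG->session_handler_class = '\core\session\database';
$CFG->session_database_acquire_lock_timeout = 120; // seconds
```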
You have a big problem in that the crash is not easy to replicate: you need 1,000 users taking an online exam. You would have to simulate that with JMeter or some other benchmark tool, but projecting the results back to real use is not very accurate.
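A headless JMeter run against a prepared test plan could look like this (the plan file name and the user-count property are hypothetical):

```
# -n: non-GUI, -t: test plan, -J: set a property, -l: results log
jmeter -n -t exam_plan.jmx -Jusers=1000 -l results.jtl
```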
That said, the web server's CPUs maxing out is a good indication. What are those processes doing: are they actually running, or waiting for something, I/O for example?
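To tell "running" apart from "waiting on I/O", here is a small sketch that reads the load average and counts processes in uninterruptible sleep; it is Linux-specific (it walks /proc), and tools like vmstat or top show the same thing:

```python
import os

# 1-, 5- and 15-minute load averages, the same numbers `uptime` shows
load1, load5, load15 = os.getloadavg()
print(f"load average: {load1:.2f} {load5:.2f} {load15:.2f}")

# Count processes in uninterruptible sleep ("D" state): these are not
# burning CPU, they are stuck waiting on I/O (a slow NFS mount, swap, ...)
blocked = 0
if os.path.isdir("/proc"):  # Linux-only; harmless no-op elsewhere
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open(f"/proc/{pid}/stat") as f:
                data = f.read()
            # the state is the first field after the ")" that closes the
            # command name (the name itself may contain spaces)
            state = data.rsplit(")", 1)[1].split()[0]
            if state == "D":
                blocked += 1
        except (OSError, IndexError):
            pass  # process exited mid-scan, or unexpected format
print(f"processes blocked in D state: {blocked}")
```

A high load average combined with many D-state processes points at I/O (NFS) rather than CPU as the real bottleneck.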
If you ask me for ideas, I would say the culprit is an overly complicated installation. My pointers may not help in a big way; I stand by the native Debian LAMP from my first paragraph.
Do you have a proper cache server (Redis or similar)?
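If a Redis server is available, pointing Moodle's sessions at it is a config.php change; these settings come from config-dist.php (host and port are placeholders for your own Redis instance):

```php
// config.php -- PHP sessions in Redis instead of files
$CFG->session_handler_class = '\core\session\redis';
$CFG->session_redis_host    = '127.0.0.1';
$CFG->session_redis_port    = 6379;
```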
I'm sorry, but I've never met an NFS setup that is "high performance". By default Moodle caches to files in moodledata, and that usually causes terrible performance issues when using NFS.
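The file caches can be moved off NFS without moving moodledata itself; $CFG->localcachedir is a standard setting in config.php (the path here is an illustrative assumption):

```php
// config.php -- per-node cache on fast local disk instead of NFS
$CFG->localcachedir = '/var/cache/moodle';
```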
As Visvanath says, I also think you are making life complicated for yourself, although there may be reasons that are not clear to us.