We are experiencing periods of slowness with our Moodle 1.9.15 installation, and identifying the bottleneck is not easy. We have quite a complex setup, as we serve 15,000 students, with nearly 3,000 courses.
First, our setup:
Incoming requests are handled by a BigIP F5 load balancer. We have tried various load balancing algorithms, but in the end we settled on a plain round-robin rotation.
The F5 passes requests to one of our five web servers. These are all Linux VMs, each with 4GB of RAM, running Red Hat Enterprise Server 5.8.
The MySQL database is on a separate Linux server with 8GB of RAM.
File storage is accessed via an NFS Linux server with 8GB of RAM, connecting to an IBM XIV storage system.
When the slowness occurs, running "top" on the web VMs shows I/O wait figures of up to 90% of CPU time. Normally we see wait figures of 1-10%, with occasional spikes to 30-40% if a big request happens. Once a slow period starts (usually between 10am and 2pm), Moodle logins and page loads can take 30 seconds or more.
We are trying to identify the source of the latency. Our Linux systems manager has been investigating, and he says that read requests to the XIV disk system show only 0-2 msec latency, but the file system VM host machine shows considerable read latency.
Is anyone else running Linux VM systems and seeing this sort of problem?
I'm not too familiar with Moodle 1.9, but there are a few things regarding general filesystem access times you can look at. I was going to point out that NFS is naturally slow, but it shouldn't have the ridiculous latency you're describing. If your sysadmin says the storage latency is fine but the machines themselves are showing it, you have a few things to look at. First, look at general load on the storage machine and try to identify whether the CPU or the disks are the problem there. Tools like iotop, hdparm, iostat, and sar are helpful here. You may also want to investigate IRQ striping at some point, but I don't think that is something to look at this early.
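Before reaching for the heavier tools, it can help to measure the iowait figure directly rather than eyeballing top. A minimal sketch in plain POSIX shell, sampling the aggregate CPU line of /proc/stat twice (the one-second window is arbitrary, and this is a rough percentage that ignores the irq/steal fields):

```shell
# Sample the aggregate "cpu" line of /proc/stat, one second apart.
# Fields after "cpu": user nice system idle iowait irq softirq ...
read -r _ u1 n1 s1 i1 w1 rest < /proc/stat
sleep 1
read -r _ u2 n2 s2 i2 w2 rest < /proc/stat

total=$(( (u2 + n2 + s2 + i2 + w2) - (u1 + n1 + s1 + i1 + w1) ))
iow=$(( w2 - w1 ))

# Rough percentage of the sampled interval the CPUs spent waiting on I/O
echo "iowait: $(( 100 * iow / total ))%"
```

iostat and sar report the same figure alongside per-device statistics, which is what you need to pin the wait on a particular device rather than the box as a whole.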
Once you have some disk benchmarks, you can go from there. Do you know whether the disks are SSDs or spinning platters, and what links connect them to the machine? Also, do you know which I/O scheduler you are using? These seem somewhat arbitrary questions, but they can make a big difference to disk speeds.
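On the scheduler question: it's exposed per device under /sys, as is a rotational flag that answers the SSD-versus-platter question. A quick sketch (the `|| true` guards are just so a VM with no visible block devices doesn't abort a script):

```shell
# Active I/O scheduler per device - the bracketed entry is the one in use,
# e.g. "noop anticipatory deadline [cfq]" means cfq is active
grep -H '' /sys/block/*/queue/scheduler 2>/dev/null || true

# 1 = spinning platter, 0 = SSD or a virtual disk reporting non-rotational
grep -H '' /sys/block/*/queue/rotational 2>/dev/null || true

# To try another scheduler at runtime (as root), e.g. deadline on sda:
# echo deadline > /sys/block/sda/queue/scheduler
```

Runtime changes like that last line don't survive a reboot; make them permanent on the kernel command line once you've found a setting that helps.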
I hope this helps,
I think you have a generic "stuff is going slow" problem; it's not really a Moodle problem as such.
How busy is the machine hosting the 5 webserving VMs? Where are you storing the VMs themselves, and how are you accessing them? On the XIV via fibrechannel or something else?
When you run iostat (say, iostat -n 5), which device is it that things are waiting on? Or is it the NFS filesystem they are waiting on? 'iostat -n' will display info about that.
I'm not familiar with the XIV, but if it's NFS that things are waiting on, maybe you can serve NFS straight off the XIV, if the XIV can do that.
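If it does turn out to be the NFS mounts things are waiting on, the client-side mount options are worth a look too. A hedged example fstab line for a moodledata share - the server name, paths and transfer sizes here are placeholders to test against your own workload, not recommendations:

```
nfsserver:/export/moodledata  /var/moodledata  nfs  rw,hard,intr,noatime,rsize=32768,wsize=32768  0 0
```

The options actually in effect on a running client show up in /proc/mounts, which can differ from what fstab asked for.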
What is the virtualisation technology you are using?
If it's KVM, your VMs may well respond to putting 'elevator=noop' on their kernel command line and using the virtio devices (instead of emulated SCSI disks and e1000 NICs). If your VMs' disks are backed by raw disk devices, 'cache=none' on the KVM command line will help. Ideally, use raw disk devices rather than files as the storage for your VMs. Add all those together and you will see an improvement, in my experience.
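To make those three changes concrete - hedged examples only, as the exact syntax depends on your KVM/libvirt version, and the kernel version, device names and paths here are invented. The guest's grub entry gets the elevator option:

```
kernel /vmlinuz ro root=/dev/vda1 elevator=noop
```

And a libvirt disk stanza combining virtio, cache=none and a raw logical volume (rather than an image file) looks something like:

```
<disk type='block' device='disk'>
  <!-- hypothetical LV path; the point is raw block storage, not a file -->
  <driver name='qemu' type='raw' cache='none'/>
  <source dev='/dev/vg0/webvm1'/>
  <target dev='vda' bus='virtio'/>
</disk>
```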
If you are running on VMWare, make sure all the VMWare tools are installed. And then call VMWare.
Stupid question, but the machines aren't just swap-thrashing, are they? If they are, they'll be stuck in I/O wait for sure.
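It's cheap to rule that out next time it happens. A minimal check in plain shell, reading /proc/meminfo (values are in kB):

```shell
# How much swap is configured and how much is in use right now
swap_total=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
swap_free=$(awk '/^SwapFree:/ {print $2}' /proc/meminfo)
echo "swap in use: $(( swap_total - swap_free )) kB of $swap_total kB"
```

Swap merely being occupied isn't the problem; the tell is sustained paging - constantly non-zero si/so columns in 'vmstat 5' output while the %wa figure is high.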
What else have you done to monitor the situation? You should at least be running something like 'munin' to see what's going on.
I'm not clear from your description whether MySQL is also on a VM. At the risk of stating the obvious, a database server on slow I/O can be nasty.
You mention five front-ends, but 4GB of RAM each isn't particularly generous. If Apache isn't well tuned you could easily be going off into swap - again, munin would tell you that at a glance.
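The back-of-envelope check for that: average Apache child size times MaxClients has to fit in RAM with headroom left for the OS and page cache. A rough sketch - the process name `httpd` and the 512 MB headroom are assumptions to adjust for your distribution and workload:

```shell
# Average resident size of the Apache children, in kB (0 if none running)
avg_kb=$(ps -C httpd -o rss= 2>/dev/null | awk '{sum += $1; n++} END {print n ? int(sum / n) : 0}')

# Memory available to Apache after reserving headroom for the OS/page cache
headroom_kb=524288   # 512 MB - an assumed figure
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
usable_kb=$(( total_kb - headroom_kb ))

if [ "$avg_kb" -gt 0 ]; then
    echo "suggested MaxClients ceiling: $(( usable_kb / avg_kb ))"
else
    echo "no httpd processes found - substitute your Apache process name"
fi
```

On Debian-flavoured systems the processes are named apache2 rather than httpd, and mod_php pushes the per-child figure up considerably, which is why prefork Moodle front-ends eat RAM.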
- Virtualisation is slow on disk I/O
- NFS is a bad place for moodledata, even worse for session files.
That said, many have reported severe performance problems in Moodle 2.3. Use the 'Advanced search facility' mentioned in the introduction to those discussions.
And the inconvenient question I always ask: why virtualisation? Yours seems to be a major institution. After investing that much in virtualisation, you are having performance problems. Have you compared the performance of five 4 GB web VMs plus an 8 GB database VM against one state-of-the-art 24 GB machine? Or against two machines - one 16 GB web server and one 8 GB database server, connected directly through a 1 or 10 Gbit/s cross-over? (All Linux, of course!)
I keep banging on about this. While I fully understand that VMs provide a bunch of operational advantages, what they do not provide is the best performance you could possibly extract from the hardware. And that is often exactly what you need when running Moodle - a big, resource-hungry application.
This doesn't necessarily apply to you, but there does seem to be a sheep mentality going on with VMs. Time and time again, I encounter users whose requirements start with using a VM, rather than ending up there after some thought.
Additionally, high network throughput causes issues due to how the Linux kernel and scheduler handle I/O interrupts on high-performance network cards, especially RDI or e1000x based cards.
Having a single large system just means you'll run into obsolete tech faster. Having a well-structured virtualised environment makes upgrades, performance testing and tuning significantly easier. I'll be writing a bit about this in the next few weeks from a moodle.org perspective too.
If anyone has run into scaling or performance issues which cannot be sorted out by a good VM setup, feel free to comment at http://docs.moodle.org/dev/Talk:Cluster_Performance on it. I can also go into more detail on how to get a performant VM for Moodle in this thread if the OP can provide some more information on what type of setup they have and what the actual problems are, but it sounds like generic disk tuning at this point.
Thanks to all for this helpful discussion. Last weekend we moved the disk store to our new SONAS storage and that seems to have fixed the I/O Wait issue. Re VMs, we will increase them to 8GB RAM, but our IT Dept are very strong on the VM solution so we're unlikely to change to physical servers. Sigh...
Thanks again - cheers, Gregor