We recently moved to new hardware and started using NFS for moodledata in preparation for clustering for failover (the NFS server is in a partition on the database server).
While it does not appear to be significant for end users (no complaints yet), page load is now 20-50% slower. I am also seeing strange performance spikes that I have never seen before, specifically in the number of uninterruptible processes on the webserver, which can be seen in the graph below (committed RAM also spikes at the same time).
On both database and webserver there is also an equivalent increase in forking and threading. During these periods the system is slow for about 5-10 minutes, but does not crash.
1) NFS configuration - What mount options do you recommend?
Our config is
xxx.xxx.xxx.xx:/moodledata /rmoodledata nfs rw,sync,hard,intr 0 0
2) Memcached - Where to install?
We are currently using the file system for sessions. I plan to use memcached but I am uncertain whether to use a standalone server for memcached or to install it on the webserver(s) as described here: https://moodle.org/mod/forum/discuss.php?d=279280
Sessions are my first priority (php is logging errors) but I also intend to use memcached for the application cache to hopefully improve the database performance (work day average 450 queries per second / average 7 slow queries per second and max. concurrent connections reached in last week of uptime 1393).
PHP 5.5, OPcache v7.0.3 - Apache2.4 (mpm_prefork, xsendfile, deflate) - mysql 5.5
Our servers are potentially very powerful (56 cores, 256G RAM, Solid State Disks on the database server (which also houses the NFS server)) and even during these performance spikes we are using a fraction of the available resources (webserver: max 20% CPU and 33% RAM. So little of the database (and NFS) CPU is used that munin can't graph it, max active RAM 21G).
I would appreciate any advice for improving performance and recommendations for failover replication as this is all new to me.
In terms of usage we currently have approximately 25 - 30 000 daily distinct logins (1/10th of total logins) and approximately 700-900 daily quiz attempts but for the bulk of our users Moodle is a file server (over 40 000 daily resource views).
Uptake of Moodle is increasing at a formidable rate and by 2016 we will need to serve 50 000 users.
here are some thoughts:
- NFS: on the client side you should start using noatime, which could seriously change your performance. You could also try rsize=8192,wsize=8192,tcp,timeo=14 to see if you gain something more, and add _netdev so that the device is mounted only after the network has been brought up, just for cleanliness;
- NFS: on the server side, is async required? On the client side you're using sync. Also, unless something else is strictly required, all_squash is better for security reasons;
- Memcache: a slow $CFG->dataroot file system slows down MUC (https://docs.moodle.org/28/en/Caching), so you should set up memcached for the application cache first. Then, if you want to improve sessions too, you could use memcached for them as well (https://docs.moodle.org/28/en/Session_handling#Memcached and MDL-46552), but using a separate instance (MDL-45724). Where should you install memcached? It depends on your cluster setup: ideally on a dedicated virtual IP (HA), so that Moodle sees that address regardless of where the service is actually running (see also: https://moodle.org/plugins/view.php?plugin=cachestore_memcachedcluster). In your current setup you should install it on your web server, since the network round trip is almost nil compared to the one required to talk to the database server. Note: until MDL-45375 is solved, do not share these memcached instances with anything other than a single Moodle instance.
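As a rough illustration of the session part of the advice above, this is what the memcached session settings in Moodle's config.php might look like. The setting names are the ones documented in config-dist.php; the address and port are assumptions (a second memcached instance running on the web server itself, on a hypothetical port 11212, so that it stays separate from the instance used for the application cache), so adjust them to your own setup:

```php
// Fragment of config.php, placed before the require of lib/setup.php.
// Assumption: a dedicated memcached instance for sessions listens on
// 127.0.0.1:11212 (hypothetical port; memcached's default is 11211).
$CFG->session_handler_class = '\core\session\memcached';
$CFG->session_memcached_save_path = '127.0.0.1:11212';
$CFG->session_memcached_prefix = 'memc.sess.key.';
// Seconds to wait for the session lock before giving up.
$CFG->session_memcached_acquire_lock_timeout = 120;
// Seconds after which a session lock expires (ignored by old PECL memcached versions).
$CFG->session_memcached_lock_expire = 7200;
```

The application cache store, by contrast, is added through the admin interface (Site administration > Plugins > Caching > Configuration) rather than in config.php.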
Give https://docs.moodle.org/28/en/Server_cluster a read ("See also" included e.g. http://www.severalnines.com/blog/clustering-moodle-multiple-servers-high-availability-and-scalability), you should find some other useful hints e.g.:
- $CFG->localcachedir: this is a nice option;
- $CFG->tempdir: this is nice too, but there is currently an issue, MDL-44874, which prevents it from being usable on just any setup.
You should also consider planning an upgrade to the latest 2.8 to get the latest bits in terms of clustering support, e.g. MDL-42071.
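A minimal sketch of the $CFG->localcachedir hint, assuming a hypothetical directory on each web server's local disk (the path is only an example; the directory must be writable by the web server user and must not be reachable via the web):

```php
// Fragment of config.php, placed before the require of lib/setup.php.
// Assumption: /var/local/cache is a directory on the web server's own disk,
// not on the NFS share. It does not need to be shared between cluster nodes.
$CFG->localcachedir = '/var/local/cache';
```

The idea is that caches which are safe to keep per node are then read from local disk instead of going over NFS to $CFG->dataroot.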
I will try out your NFS recommendations today and do more reading on memcached...
Thanks again, your guidance is much appreciated.
Oh Matteo! - I made your changes to the NFS mount options and the page load time immediately improved! It is now equivalent to that of our previous hardware which housed moodledata on the webserver.
I cannot thank you enough!
Next stop memcached...
I have been monitoring the much improved performance during a heavy load* - using NFS Client mount options: xxx.xxx.xxx.xx:/moodledata /moodledata nfs rw,noatime,sync,rsize=8192,wsize=8192,tcp,timeo=14,intr 0 0
1) Disk IOs: Significant increase (see pic)
2) NFS: Significant increase in read requests on both client and server
3) Page load time: Reduction of 20-50%
4) Apache processes, load average and uninterruptible processes stabilised
5) Reduction in webserver CPU usage
* For an indication of what currently passes as a heavy load for us here are some of yesterday's usage stats:
Number of Logins: 29 973
Unique logins: 15 250
Resource views: 58 520
Quiz attempt: 1 432
Quiz attempt continue: 8 712
Course views: 113 763
Thanks for your assistance Matteo