performance problems solved in 3.1

by Dominique LALOT -
Number of replies: 4

Hello

We were having problems with our infrastructure. At that time we had around 200 users. Our setup:

load balancer: HAProxy
two backends: Ubuntu 14.04
one database: MySQL 5.5

NFS server: Dell FS8600

Everything is virtualized under VMware. We have no problems with our other virtualized servers, whether Windows or Linux.

History:

We migrated from 2.8 to 3.1 in July and everything was fine. But in September the students came back and it was awful.

Sometimes it took 5 minutes to get a page, with the load at 200.

We checked everything and tested lots of possibilities:

HAProxy: transparent or not

CPU/RAM: added to the backends and the SQL server

/var/www/moodle on local disks

We also installed an Ubuntu 16.04 machine with PHP 7, 8 GB of RAM and 4 vCPUs: same problem.

We then spent some time on caching. The sessions had been in memcached from the beginning (defined in config.php). We noticed that it is possible to add another cache store using memcached. At first we put just a few caches on this store. Things got better, but we lost sessions. So we set up two memcached instances, one for sessions and one for the cache store, and everything was fine.
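
A toy Python model (not Moodle code) of why sharing one memcached instance between sessions and a cache store can lose sessions. It assumes that purging the cache store flushes the whole instance; `ToyMemcached`, `simulate` and the key names are made up for this illustration.

```python
class ToyMemcached:
    """In-memory stand-in for one memcached instance (not a real client)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def flush_all(self):
        # memcached can delete single keys or everything,
        # but not "all keys belonging to one application".
        self._data.clear()


def simulate(session_store, cache_store):
    """Store a session and a cached value, purge the cache, return the session."""
    session_store.set("sess:alice", {"userid": 42})
    cache_store.set("cache:strings:en", "cached language strings")
    cache_store.flush_all()  # assumed effect of a cache purge
    return session_store.get("sess:alice")


shared = ToyMemcached()
lost = simulate(shared, shared)                  # one instance for both roles
kept = simulate(ToyMemcached(), ToyMemcached())  # one instance per role

print("shared instance, session after purge:", lost)
print("separate instances, session after purge:", kept)
```

With one shared instance the session is gone after the purge; with two instances it survives, which matches what we saw.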

It took three of us around two weeks. During the panic it was impossible to say whether a particular script was at fault: a few hours later you could take the same URL and it was fast. We were logging performance stats to error_log, but nothing obvious showed up.

Conclusion: it seems that if you run in cluster mode with NFS, it is mandatory to use two memcached instances, one for sessions and one for the cache. NFS traffic dropped from 6000 requests/sec to very little. No more CPU or load problems, and pages are fast. What is still very bad is serving user avatars through pluginfile.php (sometimes 5 seconds or more).

Hope this helps

Dom

Attachment cpu-day.png
Attachment nfs_client-day.png
In reply to Dominique LALOT

Re: performance problems solved in 3.1

by David Monllaó -

Hi Dominique,

Just for reference, this is documented in https://docs.moodle.org/31/en/Caching#Memcached (under "Important implementation notes") and in https://docs.moodle.org/31/en/Session_handling#Memcached (under "Notes").

In reply to David Monllaó

Re: performance problems solved in 3.1

by Dominique LALOT -
Hi David,
We had a setup (hardware, load balancer) that was working in 2.8. We upgraded and it was a nightmare. We were obliged to create a memcached cache store (a second instance, as sessions were already in memcached). And of course, in those two weeks, we read a lot of docs.
The performance page says nothing about the need to create another cache store. If I remember correctly, we just put the userdata and string caches on that new memcached. Suddenly, everything was OK.
Then we put many other caches on that store.
I believe something should be written about cluster mode. What we encountered is probably due to NFS; in other setups you probably don't notice it.
What is funny is that the total amount of data in the memcached store is less than 10 MB compressed.
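
For anyone who wants to check the same figures on their own instance, here is a rough Python sketch that reads them with the `stats` command of the memcached text protocol. The function names and the sample numbers are invented for the example; `bytes`, `curr_items`, `get_hits` and `get_misses` are standard memcached stat names.

```python
import socket


def parse_stats(raw):
    """Turn the text reply to memcached's `stats` command into a dict."""
    stats = {}
    for line in raw.splitlines():
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def fetch_stats(host, port, timeout=2.0):
    """Send `stats` to a memcached server and read until the END marker."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"stats\r\n")
        buf = b""
        while not buf.endswith(b"END\r\n"):
            chunk = conn.recv(4096)
            if not chunk:
                break
            buf += chunk
    return parse_stats(buf.decode("ascii", errors="replace"))


def summarize(stats):
    """Report stored megabytes, item count and hit ratio from parsed stats."""
    hits = int(stats.get("get_hits", 0))
    misses = int(stats.get("get_misses", 0))
    total = hits + misses
    return {
        "stored_mb": int(stats.get("bytes", 0)) / (1024 * 1024),
        "items": int(stats.get("curr_items", 0)),
        "hit_ratio": hits / total if total else None,
    }


# Sample reply with made-up numbers, so the sketch runs without a server.
sample = (
    "STAT curr_items 1200\r\n"
    "STAT bytes 5242880\r\n"
    "STAT get_hits 900\r\n"
    "STAT get_misses 100\r\n"
    "END\r\n"
)
print(summarize(parse_stats(sample)))
```

Against a live server you would call something like `summarize(fetch_stats("127.0.0.1", 11211))`, with the host and port replaced by those of your own cache instance.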
Two more graphs show memcached working and the iowait stopping. On the others, you can also see the load and the NFS I/O dropping from 6000/sec to 200/sec.
Attachment cpu-day.png
Attachment memcached_multi_commands-day.png
In reply to Dominique LALOT

Re: performance problems solved in 3.1

by Jeff White -
My personal opinion is that NFS should not be used in a clustered environment, but if you have to use it, use it as little as possible. I would check out OCFS2 or GFS2. Your system is running faster because it's writing less to NFS and frequent requests are being handled directly in RAM (memcached).


From load-testing experience, I would recommend putting your session memcached on a dedicated server (a small VM would be fine). If you have session memcached on the web nodes, your response times will be great because session handling is local, but as soon as those servers max out, sessions are going to start dropping. It's one thing for the website to be slow, but it's a whole new level of angry customer when they are kicked off.



In reply to Jeff White

Re: performance problems solved in 3.1

by Dominique LALOT -

Hi Jeff,

We put memcached on our load balancer. HAProxy is already running there, so we don't need another VM. We don't use keepalived for the moment, as we have had no crash for years.

We also put nginx on that server to serve some static pages. What is still slow is fetching the users' avatars. pluginfile.php seems very slow even for an image of less than 1 KB: it can take several seconds.
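
To put numbers on this, a small Python sketch along these lines can time repeated requests for the same avatar URL. It is only an illustration: the function names are invented, and it assumes the URL can be fetched without logging in (otherwise a session cookie would have to be added to the request).

```python
import statistics
import time
import urllib.request


def http_fetch(url):
    """Download the URL body; raises on HTTP or network errors."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def time_fetches(url, runs=5, fetch=http_fetch, clock=time.perf_counter):
    """Fetch the same URL several times and return the seconds per request."""
    timings = []
    for _ in range(runs):
        start = clock()
        fetch(url)
        timings.append(clock() - start)
    return timings


def report(timings):
    """Summarize timings as min / median / max seconds."""
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }


# Usage (replace the placeholder with a real avatar URL from your site):
#   print(report(time_fetches("<avatar URL served by pluginfile.php>", runs=10)))
```

Comparing the result with the same measurement for a static file served by nginx would show how much of the delay comes from pluginfile.php itself.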

The good news: while trying to track down the problem, we set up an Ubuntu 16.04 machine, and I can see that PHP 7 consumes less CPU (50%!).