Application Slowness and unreachability when 1000 concurrent students attempting the exam || Moodle

by Kunal Gholap -

We run a university Moodle site on a single AWS EC2 server that hosts both the application and the MySQL DB. During normal learning sessions we keep the server at c5.4xlarge (16 CPU, 32GB RAM). Usage sits at about 20-25% and everything works fine.

But last week the university scheduled exams in which 1000 students take the exam at the same time in the first session and another 1000 in the next session. Anticipating this load, we increased the server specs to c5.9xlarge (32CPU, 72GB RAM). Even so, the application became slow and at times unreachable when the exams started. When we checked, CPU usage was only 15-20%.

So I am asking you all to help me understand the cause of the problem and how I can resolve it.

Moodle version- Moodle 3.8.3+ (Build: 20200512) Version 2019111803.01 

php version- 7.2

Mysql Version- 5.7.32-0ubuntu0.18.04.1

Please find the php.ini and mysql.cnf configuration that was in use during the exams.

In reply to Kunal Gholap

Re: Application Slowness and unreachability when 1000 concurrent students attempting the exam || Moodle

by Visvanath Ratnaweera -
Hi

1000 candidates taking an MC exam at the same time is a heavy load. I don't know the Amazon cloud, but I wonder whether the solution is as simple as moving from AWS package A to package B.

There is a huge knowledge base on this topic in the form of previous discussions in the Hardware and performance forum. The compiled documentation is linked from the forum header.
In reply to Kunal Gholap

Re: Application Slowness and unreachability when 1000 concurrent students attempting the exam || Moodle

by Tim Hunt -
When in the process of running the quiz does the problem occur? Is it in the middle of the quiz when students are responding to questions? Or at the end when they all submit, or at the start when they all try to start their quiz at the same time? Or, is it even before that when everyone tries to log in and navigate to the quiz?

1000 simultaneous students attempting a quiz is perfectly possible (according to MoodleMoot presentations I have seen given by Moodle partners). However, as Visvanath says, it is not something to do casually. It needs to be set up properly, e.g. with well-configured caching and more than one web server (so the database on a separate server), etc.
In reply to Kunal Gholap

Re: Application Slowness and unreachability when 1000 concurrent students attempting the exam || Moodle

by Alex Rowe -
You can't just raise the specs of the server and do nothing else to the application stack then expect it to work.

Once you have raised the CPU/RAM, you will then need to re-tune your Apache/NGINX/PHP/MySQL/etc to take advantage of all the new memory and CPU.

I can see from your MySQL config that it's not tuned, and I'm guessing the same is true for PHP. The most these default configs seem to be able to handle with Moodle is a bit over 100 concurrent users.

There is a lot of information in these forums if you search (and also available online) on how to properly tune MySQL (InnoDB buffers especially) and Apache/PHP (look for PHP-FPM specifically for 1000 users).

You would also want to look at storing PHP sessions in something like Redis, and at putting the Moodle MUC caches in Redis as well for extra caching.

The other issue would be the constant changing of your AWS instance size. If you tune the configuration for 72GB of RAM and then reduce the instance to 32GB, it may crash because the services try to commit more memory than is now available.

You may need to re-think your infrastructure in AWS if you want to stay on this path by first separating Web and DB to different servers, then splitting the Web tier into a load balancer with the ability to auto scale the web servers. When you have more users coming in, then scale the front end servers up and reduce it when there is no need.
In reply to Alex Rowe

Re: Application Slowness and unreachability when 1000 concurrent students attempting the exam || Moodle

by Kunal Gholap -
Hi Alex,

I really appreciate your detailed response. We had set up AutoScaling (2 instances behind a load balancer) with the database pointing to an RDS DB, and Redis for performance. We have this setup specifically for exams. But even with this we faced slowness issues.

As you mentioned, our main bottleneck is tuning PHP/Apache/MySQL for the number of concurrent users. We lack that tuning. We increased many parameters in the MySQL and php.ini configuration, but it didn't help. We are still working on finding a solution. If you or anyone else on the forum who is a PHP/MySQL expert could point us to the right tuning for a load of, say, 1000 concurrent users, I would highly appreciate it.

Thank You!
In reply to Kunal Gholap

Re: Application Slowness and unreachability when 1000 concurrent students attempting the exam || Moodle

by Alex Rowe -
There is no real magic setting to apply and get it working for 1000 users. Getting to be able to handle 1000 concurrent students in a quiz is quite an ask and normally needs a robust set up with lots of additional monitoring to work out where your issue could be.

It may be PHP, or MySQL, or sessions, or disk, or network, it's hard to know without proper monitoring.

Moodle should be on the web front end servers, running PHP with Opcache enabled. If using Apache, use mpm_event. For PHP, use PHP-FPM. There are a lot of guides online about how to properly configure/tune Apache and PHP-FPM. The basic approach is to work out how much RAM you have available, then divide it by how much RAM a single PHP process uses; that gives you your max PHP workers number. The Apache max workers setting isn't as important, depending on your set up.
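As a rough illustration of that division, here is a minimal Python sketch. The figures are made-up assumptions, not measurements: an 8 GB server, 2 GB held back for the OS and web server, and 80 MB per PHP process. Measure your own PHP process size before trusting any number like this.

```python
def max_php_workers(total_ram_mb, reserved_mb, per_process_mb):
    """Estimate how many PHP-FPM workers fit in RAM.

    total_ram_mb:   RAM on the web server
    reserved_mb:    RAM held back for the OS, web server, etc. (assumed)
    per_process_mb: average memory of one PHP worker (measure this yourself)
    """
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    usable_mb = max(total_ram_mb - reserved_mb, 0)
    # Integer division: a partial worker does not count.
    return usable_mb // per_process_mb


# Hypothetical numbers for illustration only.
workers = max_php_workers(total_ram_mb=8192, reserved_mb=2048, per_process_mb=80)
print(workers)  # 76
```

The result would be a starting point for the PHP-FPM max workers setting, to be adjusted after watching real memory usage under load.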

MySQL should be on a separate server, with the query cache disabled and the InnoDB buffer pool large enough to hold the total size of your DB. Also make sure that all the buffers added together don't overcommit your memory. Disks should be SSD because of the number of tmp tables. You could use the MySQLTuner perl script, or search for how to configure and tune MySQL InnoDB properly.
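To make the buffer pool reasoning concrete, here is a small Python sketch. Everything in it is an assumption for illustration: the function name, the 20% growth allowance, and the example sizes. It only checks whether a buffer pool big enough for the whole DB fits in what is left of the RAM.

```python
def suggest_buffer_pool_mb(db_size_mb, total_ram_mb, other_usage_mb, growth_percent=120):
    """Return (suggested_pool_mb, whole_db_fits).

    db_size_mb:     total size of data plus indexes
    total_ram_mb:   RAM on the database server
    other_usage_mb: RAM needed by the OS, per-connection buffers, etc. (assumed)
    growth_percent: 120 means "DB size plus 20% headroom" (assumed)
    """
    wanted_mb = db_size_mb * growth_percent // 100
    available_mb = max(total_ram_mb - other_usage_mb, 0)
    # Never suggest more than what is actually available.
    return min(wanted_mb, available_mb), wanted_mb <= available_mb


# Hypothetical 20 GB database on a 64 GB server: the whole DB fits.
print(suggest_buffer_pool_mb(20480, 65536, 8192))  # (24576, True)
# The same database on an 8 GB server: it cannot be held in memory.
print(suggest_buffer_pool_mb(20480, 8192, 2048))   # (6144, False)
```

When the second value is False, the server will be reading from disk during the exam, which is one reason a larger DB instance can help.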

Redis for MUC and PHP sessions is also necessary. Use separate Redis instances for the two if you can, but that is not essential.

You could also look at front end caching some web content like theme assets, images, css, js etc to further reduce the load of your servers.
In reply to Alex Rowe

Re: Application Slowness and unreachability when 1000 concurrent students attempting the exam || Moodle

by Kunal Gholap -
Noted. I appreciate the suggestions; we will take them into consideration for our next exams.

One more thing I want to discuss: I checked Apache's error log during the exam and found an error message saying "zend_mm_heap corrupted". I searched for this issue and found that it can be resolved by increasing the output buffering, so I increased it from 4kb to 1000kb.

At the same time I also found the error logs below.



What could be the resolution for this child pid segmentation fault error?

In reply to Kunal Gholap

zend_mm_heap corrupted

by Visvanath Ratnaweera -

P.S. This may or may not be related to your original problem. In case this has to be split, I gave this sub-thread a new Subject.
In reply to Visvanath Ratnaweera

Re: zend_mm_heap corrupted

by Kunal Gholap -
Thanks for all the suggestions I have been receiving on this forum.

We are now planning an exam for this week. A total of 3000 students will take this exam, with 1000 students taking it at the same time.

Based on the previous failure and your guidance on this forum, we will run the exam on the architecture below.

We will start with 5 EC2 servers, each with 2CPU and 8GB RAM, in an AutoScaling setup that can grow to 7 servers if usage goes high. All of this is behind an Application Load Balancer.
For the database, we will be using the AWS RDS MySQL service (db.m5.large). We set the max connection limit to 10000.
We have also set up a Redis service for caching and performance.

Now the only thing we are concerned about is the php.ini config file.
Below are the parameters we have set in php.ini on each server.

php.ini
max_execution_time = 1200
max_input_time = 1200
memory_limit = 4G
post_max_size = 2G
upload_max_filesize = 600M
max_file_uploads = 20

Team, based on our student count and our architecture, should we keep the php.ini parameters above? Do any of them need to change? If so, please let us know what should be changed.

Thanks!
In reply to Kunal Gholap

Re: zend_mm_heap corrupted

by Alex Rowe -
The php.ini file really has no direct impact on performance.

These settings really are only saying that a PHP script can run for 20 minutes before it gets killed, the script can use up to 4GB of memory (way too high), a file up to 600MB can be uploaded and the posted data to the server can be 2GB and up to 20 files.
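To show why 4GB per script is way too high, here is a small Python sketch that multiplies the memory limit by the number of PHP workers. The helper names are made up for this example, and the 20 workers figure is only an assumed worker count; the 4G value and the 8GB server size are from the post above.

```python
# Multipliers for php.ini shorthand sizes (K, M, G).
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value):
    """Convert a php.ini shorthand size such as '4G' or '600M' to bytes."""
    value = value.strip().upper()
    if not value:
        raise ValueError("empty size")
    if value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)  # plain number of bytes


def worst_case_php_memory(memory_limit, workers):
    """Bytes used if every worker hit its memory limit at the same time."""
    return parse_size(memory_limit) * workers


# 4G memory limit with an assumed 20 PHP workers on an 8 GB server.
worst_gb = worst_case_php_memory("4G", 20) // 1024 ** 3
print(worst_gb)  # 80
```

In this worst case the workers could ask for 80 GB on a server that has 8 GB. Real scripts rarely reach the limit, but the limit should still be low enough that a few runaway scripts cannot exhaust the server.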

The only thing to configure in the php.ini file for performance is the opcache settings. They should probably live in a conf.d directory rather than in php.ini itself, but that's just semantics.

Also, a high DB connection limit doesn't improve your performance. For roughly 1000 quiz users, you might only have 200 active database connections. MySQL allocates memory buffers per connection, so without a sensible upper limit on your database connections you can easily overcommit your RAM and crash the server or make its performance worse.
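A quick Python sketch of that overcommit risk follows. The 3 MB per connection is purely an assumed figure for illustration (the real value depends on your sort, join and read buffer settings); the 10000 limit and the 8GB db.m5.large are from the posts above, and 200 is the rough active connection count mentioned here.

```python
def worst_case_connection_memory_mb(max_connections, per_connection_mb):
    """Memory MySQL could need if every allowed connection used its full buffers."""
    return max_connections * per_connection_mb


RAM_MB = 8192            # db.m5.large, 8GB RAM
PER_CONNECTION_MB = 3    # assumed per-connection buffer total, not a measured value

for limit in (10000, 200):
    needed = worst_case_connection_memory_mb(limit, PER_CONNECTION_MB)
    status = "overcommits RAM" if needed > RAM_MB else "fits in RAM"
    print(limit, needed, status)
```

With these assumed numbers the 10000 limit could ask for about 30000 MB on an 8192 MB server, while a limit near the real number of active connections needs about 600 MB and leaves the rest for the InnoDB buffer pool.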

Your 5 Application servers with 8GB ram will need to be properly configured with PHP opcache, PHP-FPM and Apache or NGINX (or other web server). This will make sure you can have the max number of PHP processes running to handle incoming requests. The default is 5 for PHP-FPM but a good number may be 20 (for example). Google how to set this number properly based on PHP process size.

I don't know how much you can configure AWS RDS databases, but you need to make sure that you have your DB in memory which is normally controlled by InnoDB buffer settings. There are lots of good guides on how to set these properly.
A 2 core, 8GB server may not be powerful enough but I don't have any information on your DB size.

If you can provide more information on your configs or requirements, we may be able to provide some further assistance.

When you start to get to 1000 or more quiz users, other aspects matter too: which plugins are installed, when their scheduled tasks run, whether there are other underlying performance issues, and so on. These can only be shown with proper monitoring.

Set up really (really) good monitoring before you run your tests and exams to show you exactly where the bottleneck is. It could be shared disk, DB server, caching, app servers, etc.
In reply to Alex Rowe

Re: zend_mm_heap corrupted

by Kunal Gholap -
Hi,

We conducted the exam with 2000 concurrent users this week, and it finally went smoothly. Thanks for all your suggestions and support.

Our architecture was an AutoScaling setup with 5 servers running at the start (each with 2CPU, 8GB RAM). Traffic is distributed equally by the load balancer. For the RDS DB we made one change: we migrated our DB from m5.large (2CPU, 8GB) to r5.2xlarge (8CPU, 64GB RAM). We also have Redis for caching and performance.

This is one solution, but upgrading our RDS DB made it costly. So we are now looking for a cost-effective database architecture that can handle 2000-3000 concurrent requests. For our next exam under similar load we will keep everything else the same and look for a more cost-effective database architecture.

Any suggestions will be appreciated.