I have a testing server where I generated a large course and populated it with 1000 users. To test performance, I decided to do things to the gradebook, starting by populating it: I downloaded the grades, inserted random numbers for all 100 grade items and 1000 users, and re-imported them. The import has been chugging along for 6 hours now and I have no idea how far it has gotten. It appears to be a pretty intense process, but it keeps rolling along and nothing has shown any signs of breaking.
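In case anyone wants to reproduce this kind of bulk import, here is a rough sketch of how random grades can be generated for re-import. The column names below are made up for illustration; a real Moodle grade export has its own header layout with identifying columns that must be kept intact for the import to match users.

```python
import csv
import random

def write_random_grades(path, num_users=1000, num_items=100, max_grade=100):
    """Write a CSV with one row per user and a random score for each grade item.

    The column layout here is illustrative only; a real Moodle grade export
    includes identifying columns (first name, email address, etc.) that must
    be preserved for the re-import to match rows to enrolled users.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["username"] + [f"item{i}" for i in range(1, num_items + 1)])
        for u in range(1, num_users + 1):
            row = [f"user{u}"] + [random.randint(0, max_grade) for _ in range(num_items)]
            writer.writerow(row)

# 1000 users x 100 items = 100,000 individual grades.
write_random_grades("grades.csv")
```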
I'm curious: has anyone created any sort of delayed gradebook-calculation plugin, or something that can defer the recalculation of grades until after all the data has been updated, or otherwise limit the impact of grade operations on the rest of the site?
My database and web server are on separate systems. I'm using a Percona database, which matches our production system, and the web server is just standard-issue Apache on CentOS with stock configuration, since I was mainly trying to see how I could stress the database and (so far unsuccessfully) reproduce a database problem we had recently during final exams.
You are working on a testing environment consisting of two separate servers for the web and the database. Are they dedicated servers? If so, what is the hardware? Or are they virtual servers? What are their specs? How are they interconnected: through the general LAN, or do they have a separate dedicated LAN connection?
You said that "the webserver is just a standard-issue Apache on Centos, stock configurations". From the database side we know it is Percona Server for MySQL. What about the rest of the stack?
So you've created an artificial course with 100 gradable activities, enrolled 1000 users, and populated the gradebook. You exported something (the course? the grades?) and tried to re-import it, and the job wasn't finished even after 6 hours? Could you explain your steps in more detail?
Are you monitoring your servers? Have you taken screenshots of their output?
P.S. Do you think that the problem is related to any of your previous issues, like this one https://moodle.org/mod/forum/discuss.php?d=348510#p1405962 ?
The total time to import the grades was 6.5 hours. But the import was successful.
I'm not that good at sysadmin stuff, so I'll do my best to answer your questions. I used Eucalyptus to set up the web server. The web server is an instance with CentOS 7, 2 CPUs, 1024 MB of memory, and a 10 GB disk, with an 8 GB volume attached for the backups and dataroot files. It runs PHP 7. The Percona database is a cluster of one; it is equivalent to what we run in production at UCSB, but it's not within the production cluster. I think it's MySQL 5.6. I thought there was a query_cache_size of 128 MB set, but the graph doesn't seem to show that. The data has to move across the network, halfway across campus, but the sysadmin put in a proxy of some sort somewhere to try to improve performance (although it seemed to help only a little).
Here are a couple of snapshots of the MySQL monitoring. I did not monitor the web server. If my math is correct, I inserted 100,000 grades (100 grade items × 1000 users). As I think one of the graphs shows, the import starts out doing a huge number of selects, then begins the inserts, and then the updates gradually increase as the inserts decrease.
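The select/insert/update pattern in the graphs can also be measured without a graphing tool, by sampling the server's status counters twice and diffing them. A minimal sketch of the arithmetic (the snapshot dicts here are hard-coded with made-up numbers; in practice each would come from a `SHOW GLOBAL STATUS LIKE 'Com_%'` query against the database):

```python
def counter_deltas(before, after, keys=("Com_select", "Com_insert", "Com_update")):
    """Return the per-counter change between two status snapshots.

    `before` and `after` are dicts mapping status variable names to values,
    as reported by SHOW GLOBAL STATUS. These counters only ever increase,
    so the difference is the number of statements of each kind executed
    during the sampling interval.
    """
    return {k: int(after.get(k, 0)) - int(before.get(k, 0)) for k in keys}

# Made-up numbers illustrating the early phase of the import: mostly selects.
before = {"Com_select": 1000, "Com_insert": 50, "Com_update": 10}
after = {"Com_select": 9000, "Com_insert": 300, "Com_update": 20}
print(counter_deltas(before, after))
```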
I'm going to try a JMeter test to see what happens if 400 people take a quiz with all those grades in there. There wasn't much impact when there weren't any grades.
I mainly thought that this was interesting information.
Your "sysadmin stuff" is fine. In any case, your setup is more modern than mine!
The question we are trying to answer is whether the grader report scales well. Right now, in your setup, it obviously does not. But before making a case for the developers to look at, you have to be sure that a) your infrastructure is not weak and b) it is not poorly set up.
About a): Your web server is a VPS with 2 CPUs and 1 GB of RAM, which is definitely low-end. It is possible that the performance of the web server is not critical for this case; as the performance graphs show, there was a huge load on the database, so the database is the bottleneck. You didn't give the resources of the database server, but I get the feeling that your IT team runs it. (Why else would it be at the other end of the campus?)
My estimation is that the infrastructure is low-end.
About b): You also did not mention how the 8 GB volume for the dataroot is attached. The responsiveness of the dataroot is critical for Moodle! Also, the fact that database traffic goes through the campus network (with or without a proxy) is a weak point. The two servers need to be back-to-back on their own Gbit segment.
So on this point, too, I am not convinced the setup is sound.
I don't claim that Moodle handles big grader reports efficiently. There were some discussions in this forum on similar topics:
- Moodle grader report editing extremely slow
- New Moodle Slow saving grade changes
If you want to pursue that argument, you need to collect data on the database queries the operation generated (DBMSs have special tools for that) and file a bug report.
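For a Percona/MySQL 5.6 server, the slow query log is the usual starting point for that kind of data collection. A sketch of the relevant `my.cnf` settings (the thresholds here are illustrative, and Percona also ships `pt-query-digest` for summarizing the resulting log):

```ini
[mysqld]
# Log statements slower than 1 second; lower this to catch more queries.
slow_query_log        = 1
slow_query_log_file   = /var/log/mysql/slow.log
long_query_time       = 1
# Optionally also log queries that scan without using an index (can be noisy).
log_queries_not_using_indexes = 1
```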
It's interesting that you say the database and web servers need to be back-to-back on their own segment. That is how we used to have it set up: both were on the same hypervisor, whatever that is. But we changed this recently and noticed a drop in performance. We changed over to a database cluster using Percona, and at that time we separated the two systems. We also did not change any of the Percona configuration defaults, since the advice out there is to accept the defaults. My tests were intended to see whether we needed to change some of those defaults.
I'm sure my set-up here is pretty low end. My test was simply to try to put some load on the database and see if any tweaking to the configuration would have an effect. But there was no effect with any of the config changes. Your comments lead me to believe that the sysadmins need to look elsewhere.
My other goal was to "break" the database, which I was not able to do even with an upload of 100,000 grades (this was a Moodle-generated course, so I don't think the gradebook is misconfigured).
Thank you for your help. I'll point the systems guys to this thread and maybe it will help them.
Yours seems to be a big institution with its own sophisticated IT infrastructure. Yes, it is sensible for the "systems guys" to get acquainted with Moodle!
Re-reading our discussion, it would have been more efficient if you had told us at the beginning what you wanted to achieve rather than what you were doing. Your original post clearly said you wanted to _test the performance_ [of your test setup] by importing a huge gradebook, and you came across prohibitive execution times. The cause has been identified as the underpowered (and overcomplicated?) test rig. Obviously, just "tweaking" is not going to help.
Your other goal, 'to "break" the database', from the previous post is not clear to me. Do you mean dividing the work between two database servers, or crashing the database server? The former is not easily achieved. As for the latter, your database administrators should know.
Sorry I was on vacation.
As far as "breaking" the database goes: we had a real-life situation where a large lecture-hall class was taking its final exam and the database locked up when several of the students reached the time limit. There was row-locking on the database that resulted in numerous students receiving database errors during their quiz.
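If that lock-up happens again, the InnoDB tables in `information_schema` can show who is blocking whom while it is in progress. A diagnostic sketch for MySQL/Percona 5.6 (in 5.7+ these tables were superseded by `sys.innodb_lock_waits` and `performance_schema`):

```sql
-- Which transactions are waiting on a row lock, and who holds the blocking lock?
SELECT
  w.requesting_trx_id,
  w.blocking_trx_id,
  rt.trx_query AS waiting_query,
  bt.trx_query AS blocking_query
FROM information_schema.INNODB_LOCK_WAITS w
JOIN information_schema.INNODB_TRX rt ON rt.trx_id = w.requesting_trx_id
JOIN information_schema.INNODB_TRX bt ON bt.trx_id = w.blocking_trx_id;
```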
I attempted to set up a test that might reproduce this, but it did not. I thought that if the gradebook had to recalculate a lot of grades at the moment the quiz closed, that might trigger a similar row-locking problem, but it did not. (This is the reason I was trying to upload all those grades that took 6 hours.) Beyond that, we were hoping to trigger some other kind of database problem, but I wasn't able to get anywhere near the limits of the system using JMeter. We simply wanted to see the database reach its limits, to see whether any settings changes would help us avoid a repeat of the problem we had with the quiz. In the end, nothing I could get JMeter to do caused enough stress, and none of the settings changes the systems guys tried improved anything.
In the end, I think our performance problems aren't going to be figured out by JMeter and a test system so much as by our old tried-and-true incremental changes on production (i.e. we make a small change on production and see if there's an improvement, then make another small change).