performance: normal load + quiz + grading assignments

performance: normal load + quiz + grading assignments

Justin Haaga -
Колькасьць адказаў: 38
Been running moodle in production for just over 2 months. Just upgraded to
moodle 1.9.3 (20081029) 2 weeks ago, before that was on prior 1.9.3 version.

School-
300 students, 60 courses
concurrent - aprx 30-50

Config is as follow:
apache2, php5, xcache (webserver/moodledata) - Xeon 2ghz quad core, 8GB mem
mysql 5.0, win 2003 server (db server) - Dual Xeon 2.33ghz, 4GB

Performance has been solid since install over 2 months ago, recently (shortly after upgrade to 20081029) we have had a problem with high cpu utlization on the apache2 service. It seems to be related to when multiple quizes are generated, taken and graded ontop of regular traffic.

My question is this:
What is to be expected in terms of performance with the following:
1) Normal concurrent sessions (30)
2) 1 quiz to a group of 20 students
3) Teacher grading assisgnments

Would one expect the resources to max out in my configuration? Memory is fine, disk is fine, mysql server resources fine, it's just the apache2 service that peaks out the CPU which then causes exterme slowness rendering pages.

both servers have been tuned per moodle documentation.



Сярэдняе рэйтынгаў: -
У адказ на Justin Haaga

Re: performance: normal load + quiz + grading assignments

Justin Haaga -
Here's some more information, this performance check was ran on the dev moodle box what has inferior hardware:

Moodle System Benchmarks
Note for Moodle 1.7 and above:
Disable the record cache (Site Admin -> Server -> Performance) for realistic results!
Processor performance
Function calls 909000 966000
Regular expression replaces over 1KB of text 13300 12800
Disk performance
16KB files read from disk (cache) 10500 6500
16KB files written to disk (cache) 3600 1700
Database performance
Get_record calls on the course table 500 1300
Insert_record calls on the course table 80 80
Update_record calls on the course table 70 100
Maximum concurrent users (approx): 52

Even these numbers are pretty adequate for my needs, however, there's been two instances where the apache2 service has been able to max out all 4 core processors. How could this be possible with 20-30 concurrent connection?

Is the quiz module that CPU intense?
У адказ на Justin Haaga

Re: performance: normal load + quiz + grading assignments

Justin Haaga -
more info, these results are from the production box:
Moodle System Benchmarks
Note for Moodle 1.7 and above:
Disable the record cache (Site Admin -> Server -> Performance) for realistic results!
Processor performance
Function calls 1422000 1495000
Regular expression replaces over 1KB of text 13000 13000
Disk performance
16KB files read from disk (cache) 13900 13900
16KB files written to disk (cache) 4500 4500
Database performance
Get_record calls on the course table 970 1030
Insert_record calls on the course table 330 330
Update_record calls on the course table 80 80
Maximum concurrent users (approx): 77

Again, this is a dual server config, webserver is a separate box and mysql a seperate box. So any ideas why the apache2 service would be maxing out?

Also, anyone know a way to gather active concurrent connections?
У адказ на Justin Haaga

Re: performance: normal load + quiz + grading assignments

Sam Black -
I'm experiencing similar problems with my service.

Every time I give a quiz, I get all sorts of Internal Server Errors and Timeout Request Errors. My hosting provider has told me that I'm maxing out on my CPU cycle usage every time I give a quiz.

My guess is that there is something pretty intensive that was done with quizzes when they went from 1.8 to 1.9 as I never had the problems before I made the upgrade to 1.9 last fall.

I'm hoping for things to be a little cleaner in 2.0. And, for our dev folks, I'm really not meaning that disrespectfully. If I had some free time, I'd be more than happy to poke my head around inside and lend a hand myself.
У адказ на Sam Black

Re: performance: normal load + quiz + grading assignments

Tim Hunt -
Выява Core developers Выява Documentation writers Выява Particularly helpful Moodlers Выява Peer reviewers Выява Plugin developers
That is an interesting observation. I recall any major changes in the inner workings of the quiz in 1.9. At least not anything that would have a major impact on performance, just bug fixes.

Oh, except that one of the changes was MDL-15494, which made the quiz check to see if an error occurred, rather than just assuming that every call to update the database succeeded and carrying on blindly. That might be revealing a problem that was previously happening, but not noticed.

Any time you can spend poking around, and trying to work out what is actually happening would be greatly appreciated. This is very hard to reproduce in a development environment.
У адказ на Tim Hunt

Re: performance: normal load + quiz + grading assignments

Sam Black -
Well, it's not ALL quizzes. Just the ones I put a password on and have delivered in a secure window -- the ones I count as quiz and exam grades, instead of just review and practice. I've been using my Moodle to support my classes for close to 5 years now and am very VERY pleased. It's only been this school year, after I upgraded to 1.9 over the summer, that problems started creeping up. And, more specifically, it's been since Thanksgiving that it's gotten almost out of control. The worst meltdown was for our semester exams. And you should have been in the room. I'm surprised there wasn't some sort of news story with all the flailing of arms and gnashing of teeth that went on those two days.

I'm almost at the point of deciding to back up everything I can over spring break and then wipe everything clean and start anew, just to see if there's something stuck in its craw. But from what my webhost's tech support folks are telling me, it's those "secure" quizzes that are sending my site's CPU usage off the charts.

As for poking around, PHP is a language I'm just starting to play around with. I'm actually a high school computer programming teacher with experience in Java, C++, Pascal, FORTRAN, among other languages. What's one more, right? cool
У адказ на Sam Black

Re: performance: normal load + quiz + grading assignments

Tim Hunt -
Выява Core developers Выява Documentation writers Выява Particularly helpful Moodlers Выява Peer reviewers Выява Plugin developers
What is very difficult to know is whether the significant thing is the change 1.8 -> 1.9, or just that your database is getting bigger with time, which will gradually place more and more strain on the server.

I don't really understand why secure mode/password would make such a significant difference.

If you know real programming languages (particularly ones with similar syntax like Java and C++) then PHP is easy to pick up, it is just horrible to work with in comparison падморгваньне

The quiz forum is probably the best place to post if you want help following the code. There is some docs about it in the Moodle docs wiki, although of varying degrees of out-of-date-ness.
У адказ на Tim Hunt

Re: performance: normal load + quiz + grading assignments

Eric Hagley -
Also check if your school has software checking for web site security. My school put in some new security that checked websites for malicious content and the quiz secure mode sent the server in a spin as soon as that software was introduced. Prior to it and after the test of it had finished - no problems. I hope they don't start using it again!
У адказ на Eric Hagley

Re: performance: normal load + quiz + grading assignments

Justin Haaga -
Here's some more information:

The database is not the bottleneck. The bottleneck is the apache2 threads maxxing out the CPU which then causes all request to get backed up until these threads are complete.

From what I can tell, it's when the quizzes get submitted and are graded is the actual CPU intense part. It depends on how long the quiz is, how many questions that are multiple choice ect, and when the students submit the quiz. If students finish at the same time this can be a problem.

Is this normal behaviour? And if so, how would we go about running the quiz module on a seperate server?

I've upped the apache started threads to help absorb the load, and while my benchmark values go up, it does not really help with the quiz module.

<IfModule mpm_prefork_module>
ServerLimit 350
StartServers 180
MinSpareServers 180
MaxSpareServers 190
MaxClients 300
MaxRequestsPerChild 2000
</IfModule>

У адказ на Justin Haaga

Re: performance: normal load + quiz + grading assignments

Tim Hunt -
Выява Core developers Выява Documentation writers Выява Particularly helpful Moodlers Выява Peer reviewers Выява Plugin developers
What you say matches other things I have heard about Moodle, and more specifically quiz, performance.

So, it is 'normal', even though it is undesirable, but there are no easy solutions.

Tuning your system as much as possible will help. Running a quiz on a separate server: not so easy.

A very low-tech, but effective answer, is to try to make sure that all your students don't start a timed quiz at exactly the same time, so they don't all finish at the same time. I remember a post in the quiz forum some time ago, from someone who was running proctored English tests for 300 students at a time, and they had a proctor walking round the exam hall getting people to start one after the other with a small gap between each.

One day, we will need to get to the bottom of this problem. But as I say, it is not easy.
У адказ на Tim Hunt

Re: performance: normal load + quiz + grading assignments

Justin Haaga -
Thanks Tim. So this is normal? I am surprised more people do not complain of this as in our situation, it makes our moodle site unavailable until the apache processes can free up on resources.

It can be unpredictable because not all quizzes seem to generate the same type of result, it may of course factor on when the students take the quiz and when they submit - if of course, they submit all at the same time then this is no good.

Another note I should mention: I am virtualizing the linux server with vmware esx 3.5. It's shared with 1 other Windows sever which has low resource needs. I don't see this being a problem because I do have dedicated resource pools assigned and I can see on the vmware console that CPU resources are maxxing out which correlate with what I see on TOP. I've spent a great deal of time tweaking the server from the apache end to the mysql end, so I don't think there is much more I can do there.

I think at this point my plan is to setup another virtulize server and setup a seperate resource pool for it. Unforutantly, we don't have a SAN, but I can share off the volume of moodledata for this new server to plugin to, as well as the database, that's on a seperate physical server anyway. We will still have the problem of the quiz module becoming unavailable due to saturated resources, however the entire moodle site in general will be available since I now will be able to control how much resources they can take.

I did see your thread on the other forum. Hopefully we can make some imporvements in the year.
У адказ на Justin Haaga

Re: performance: normal load + quiz + grading assignments

Martin Dougiamas -
Выява Core developers Выява Documentation writers Выява Moodle HQ Выява Particularly helpful Moodlers Выява Plugin developers Выява Testers
Apart from different database formats ... you could try and make sure the quizzes are split up into less questions per page ... this will spread out the database accesses into more but smaller chunks.

And obviously you could get more CPU, more RAM, faster disks ... высунуты язык
У адказ на Tim Hunt

Re: performance: normal load + quiz + grading assignments

Justin Haaga -
Here's an update, I think we have discovered the problem. The cause is still unknown. Perhaps you can shed some light Tim.

Problem:
On occasion the quiz module will cause a lock on the database which will then cause all the apache threads to keep rolling until that lock is released, this is what causes the high CPU resources and essentially they are thrown for a loop. Here is what we see on the mysql side (see image)

mysqlpt2.th.jpgmysqlpt2.th.jpg
Any thoughts on this?
Укладаньне mysql.jpg
У адказ на Justin Haaga

Re: performance: normal load + quiz + grading assignments

Tim Hunt -
Выява Core developers Выява Documentation writers Выява Particularly helpful Moodlers Выява Peer reviewers Выява Plugin developers
Sorry, I don't know much about MySQL inner workings. Too a first approximation, the quiz issues a bunch of simple SQL queries throughout processing, just like other parts of Moodle. So I don't know why this might cause MySQL to choke, but if have time to investigate this at all, and perhaps find what is going on, that would be great. Do you have slow query logging turned on, or anything? It would be interesting to find out if it is one particular query that hangs, or if it looks more random than that.
У адказ на Justin Haaga

Re: performance: normal load + quiz + grading assignments

Taylor Judd -
We ran into a similar problem with mysql and quiz issue back with 1.5. It may or may not be the same problem. Basically whenever there was heavy use of a quiz table it, sections of it would become corrupt. This corruption made mysql hang and spike CPU usage as it searched for records without indexs(the source of corruption). We would run myisamchk(http://dev.mysql.com/doc/refman/5.0/en/myisamchk.html) on the table, repair it, and it would be fine for a few days/weeks. Then we'd get another spike in usage and the indexes would become corrupt again causing further problems. It is at this point we decided to switch our engine to INNODB. While INNODB has a higher overhead due to its transactions based nature, the benefit from ACID compliance and lack of corruption more than make up for this. Our database which is now 25 gigs, ~5 related to quiz, has been humming along with less than 20% CPU usage even during peak times.

My suggestion is to first check for corruption if you haven't already. If you find it corrupt, repair it. If it continues to become corrupt consider a switch to INNODB. For larger installs and loads, heavy on update/insert, it is much more stable.

Let me know if this helps.
У адказ на Taylor Judd

Re: performance: normal load + quiz + grading assignments

Tim Hunt -
Выява Core developers Выява Documentation writers Выява Particularly helpful Moodlers Выява Peer reviewers Выява Plugin developers
Postgres might be an even better option than INNODB. Anyway, very useful to know that INNODB makes a difference.

By the way, if anyone wants to play dangerous games: Hacking_the_Moodle_2.0_database_transfer_script_to_convert_a_Moodle_1.9_site. That is very much 'use at your own risk'.
У адказ на Justin Haaga

Re: performance: normal load + quiz + grading assignments

Martín Langhoff -
That makes a bit more sense -- mysql locking mechanism on isam/myisam is very problematic for "multiple updater" scenarios.

It'd be interested to hear more about what insert/update is causing things to bottleneck. With that info... perhaps we can come up with a few alternatives...

As Tim said, Pg can be much better when you have high-resource-contention situations. In those cases, everybody suffers (some more gracefully, like Pg and InnoDB). so it is a good idea to try and disentangle the knot anyway...
У адказ на Martín Langhoff

Re: performance: normal load + quiz + grading assignments

Tim Hunt -
Выява Core developers Выява Documentation writers Выява Particularly helpful Moodlers Выява Peer reviewers Выява Plugin developers
When you submit a quiz, three tables have to be updated (in addition to log and user_lastaccess).

mdl_question_states - one row added for each question in the quiz. Regrettably, the row is first added, then later updated, in most cases.

mdl_question_sessions - there is one row here for each question that has to be updated.

mdl_quiz_attempts - there is one row here for the whole attempt. I think to code is organised, so that it only gets updated once for the whole submission.

Now the rows in these tables for each student are separate, so Postgres or InnoDB's row level locking will work nicely, while myisam will have the students contending with each other horribly. We don't use transactions at all, which probably helps myisam, but is probably not really a good idea, since people tend to care quite a lot about the integrity of their quiz data.

I think the writes to _states and _sessions are interleaved.

I would be interested to see the sequence of writes that a quiz submissions does. I have a fairly good idea from knowing the code, but I have never actually bothered to look in the database logs to see it (blush).

The current code is pretty delicate. That is, it works reliably only because I have fixed lots of bugs as dedicated users found and reported them, every since the last major quiz refactor in Moodle 1.6. (And fixed the regressions I introduced while doing so падморгваньне)


To make things significantly better, I think we need a major refactoring of this code, which I have been starting to plan in detail, but with no specific time-scale for implementation in mind: Development:Changing_the_Moodle_Question_Engine
У адказ на Tim Hunt

Re: performance: normal load + quiz + grading assignments

Martín Langhoff -
Reading the DB logs has surprised me a few times падморгваньне

I agree with your suspicions... the INSERT/UPDATE sequence on mdl_question_states is the most suspect operation... isam/myisam should hold reasonably well under 'INSERT storms' as long as there aren't many indexes on the table. IIRC it's UPDATEs that hit badly...
У адказ на Martín Langhoff

Re: performance: normal load + quiz + grading assignments

Justin Haaga -
Hi Guys few updates:

mysql is already using innodb with binlog enabled. We suspected there might be a bug on the mysql 5.0 windows version that we were running so we move the database over to the webserver (debian) running mysql 5.0.32. Same problem happened.

We are analyzing the apache and mysql logs so we'll get back with more info if we find anything.

What is weird is after it hangs for 10 mins or so it then is fine, those 40 students can jump back in and retake the quiz like there is no problems.

On the linux machine saw the mysql service with relatively low CPU usage (20%) but apache processes were going crazy, I took 1 screen shot but on avg you would see 20-40 apache threads open all maxing the cpu resource.

So is this really normal or not? I can't imagine that it is because if it was the quiz module is really useless for any quiz over 20-30 people. When this occurs it brings the whole entire site down because there is no processor available. I would imagine we should see alot more posts on the forum with complaints if this was the case?

Also, I should note that we can apache bench and seige 150-200 concurrent connections with no problems - so it's for sure something with the quiz module specific and not any optimization items.


We did find this in the logs:

[Thu Feb 26 08:59:01 2009] [error] [client 10.225.5.94] ADODB Error: Lost connection to MySQL server at 'reading authorization packet', system error: 0
[Thu Feb 26 08:59:15 2009] [error] [client 10.225.6.8] ADODB Error: Lost connection to MySQL server at 'reading authorization packet', system error: 104, referer: https://lms.neumont.edu/
[Thu Feb 26 08:59:16 2009] [error] [client 10.225.5.196] ADODB Error: Lost connection to MySQL server at 'reading authorization packet', system error: 104, referer: https://lms.neumont.edu/login/index.php
[Thu Feb 26 08:59:53 2009] [error] [client 10.10.7.20] ADODB Error: Lost connection to MySQL server at 'reading authorization packet', system error: 104


After a quick google search I found this MySQL bug report which appears to be exactly the problem we are having:
http://bugs.mysql.com/bug.php?id=28359

The solution is to increase the "connect_timeout" value of mysql so that the server will wait longer before dropping the connection. This makes sense, if apache suddenly gets lots of hits, it is going to take longer for each apache thread to respond to the mysql server. So mysql drops the connection, but each apache thread continues to wait.

We are going to try connect_timeout = 15

Also by default in the config.php the $CFG->dbpersist = false; Would it help in anyway to try to turn this on?
Укладаньне top.jpg
У адказ на Justin Haaga

Re: performance: normal load + quiz + grading assignments

Justin Haaga -
persistent connections and setting the connect_timeout value did not help. Problem still persists.

40 students taking a quiz with 5-10 questions (mostly multiple choice).

Is this really normal?!
У адказ на Justin Haaga

Re: performance: normal load + quiz + grading assignments

Taylor Judd -
It may be related or unrelated. We just had an issue with quiz where one course of 100 students was bringing down our site of 30,000. The issue was tracked to a bug with Quiz and the Gradebook specifically around calculations.

There is a moodle bug and I can't find it right now where if you have a custom calculation in the gradebook the table mdl_grade_grades gets populated with 1000s of rows of superfluous data. When a quiz then runs it tries to join this grade_grades table with others and causes a large spike in load thus causing both quiz and the site to suffer.

First check and see if you have any calculations on that quiz item in your gradbook. If you do the following may be helpful:

We're working on a larger fix. But in the meantime we isolated a specific problem query:
SELECT DISTINCT go.userid FROM grade_grades go JOIN grade_items gi ON gi.id = go.itemid LEFT OUTER JOIN grade_grades g ON (g.userid = go.userid AND g.itemid = 4965) WHERE gi.id <> 4965 AND g.id IS NULL;

We changed one of the joins to get a significant performance bump:
SELECT DISTINCT go.userid FROM grade_grades go STRAIGHT_JOIN grade_items gi ON gi.id = go.itemid LEFT OUTER JOIN grade_grades g ON (g.userid = go.userid AND g.itemid = 4965) WHERE gi.id <> 4965 AND g.id IS NULL;

Check your mysql slow log and mdl_grade_grades table to see if you are suffering from the same problem. We're looking at fixing the core issue of the superfluous data. If you want to test the mod you can find the query in \lib\grade\grade_item.php

Cheers.
У адказ на Taylor Judd

Re: performance: normal load + quiz + grading assignments

Justin Haaga -
hmm that is interesting... we are doing some calculations in the gradebook. I'm searching around for that bug report but can't find it, if its not too much trouble could you locate it for me?

Here's what I see in the query_log (anything over 3 secs)

# User@Host: moodle[moodle] @ localhost []
# Query_time: 11 Lock_time: 0 Rows_sent: 15 Rows_examined: 15
SELECT * FROM mdl_grade_grades WHERE itemid=1977 AND userid IN (425,62,408,394,410,404,419,68,69,397,27,395,393,413,418);
# Time: 090227 7:57:05

# User@Host: moodle[moodle] @ localhost []
# Query_time: 5 Lock_time: 0 Rows_sent: 0 Rows_examined: 0
SELECT * FROM mdl_grade_grades WHERE itemid=2402 AND userid IN (451);
# Time: 090227 7:59:26

# User@Host: moodle[moodle] @ localhost []
# Query_time: 18 Lock_time: 0 Rows_sent: 0 Rows_examined: 0
SELECT * FROM mdl_grade_grades WHERE itemid=2101 AND userid IN (451);
# Time: 090227 8:02:00


from 8:02 and on the query_times just start going crazy and the timeouts begin.

Not sure this data represents the problem you are describing. Also, if that was our case I would expect the mysql process to be pegged, where in our case the apache processes are the ones causing the resources to max.

I actually am taking full mysql logs so i'll dig though them in more detail to see if it reveals anything.

Taylor:
Your instructors run quizzes often and if so other than the problem you've had has it cause very high cpu resource consumption?
У адказ на Justin Haaga

Re: performance: normal load + quiz + grading assignments

Taylor Judd -
Hi Justin,

Our instructors do use quiz heavily. Our major example is one professor who has ~1300 students and allows for unlimited retakes of weekly quizzes. We average around 8000 attempts for each of these quizzes and as is no surprise most of these attempts come in the hours before the quiz is due.

We've had multiple quiz issues over the years from the recent one involving calculations to previous problems including myisam corruption of quiz tables. In most of those cases the problem is database related but often manifests itself in the apache workers spiking CPU as they try to connect to the database and get backedup.

None of that though is likely to help with your problem.

How large is your mdl_grade_grades table? Number of records? During one of your spikes do you do a show full processlist? If so notice any locking problems or recurring quarries either creating large temp tables or otherwise eating resources? If the problem is database bound and your running INNODB then a show innodb status can also be useful.

If you're suffering our problem find the id of the item in mdl_grade_items that relates to calculations on your quiz. You can do this by looking at the URL when viewing the gradebook item.
Check that it is the right item:
select * from mdl_grade_items where id=XXXXX;
Then do:
select count(*) form mdl_grade_grades where itemid=idfrom-mdl_grade_items;
Depending on the quiz setup the count should equal the number of actual attempts at the quiz. If it is higher than that then it is populating with superfluous data. In our case our 114 students in this course generated 30,000+ grade entries. After we cleaned it up we were down to 114 entries.


To look at your specific possibly problematic quarries you can do:

mysql>explain SELECT * FROM mdl_grade_grades WHERE itemid=1977 AND userid IN (425,62,408,394,410,404,419,68,69,397,27,395,393,413,418);
This will show you if its using indexes, temp tables, or other problem areas.

I did look through our slow query log and see the queries you are speaking of. However they occur very rarely for us. (I have 16 instances... other slow queries occur over 1600 times in the same time span.) I'm wondering if this is the source of your problem? Just how often does this query appear in the slow query? Do you see it in the process list when experiencing the problem?

A final diagnostic suggestion. If apache processes are hanging you might try to strace them.
http://linux.die.net/man/1/strace
http://tldp.org/LDP/LGNET/132/vishnu.html
http://www.opensourcetutor.com/2007/08/04/troubleshooting-apache-with-mod_status-strace-apachetop-iftop-top/

Those explain a few different ways of using strace. Our most common use is simply finding the PID that is the oldest in our apache server, likely the culprit of the slowdown, and attaching strace:
strace -p -s 3000
It creates a lot of data, most of which I don't follow, but if you look close enough you can often find the source of the problem.

Hope some of this helps.
У адказ на Tim Hunt

Re: performance: normal load + quiz + grading assignments

Wen Hao Chuang -
Hi Tim:

Our server had a hiccup few days ago, due to a course that has around 100 students taking the quiz that has 60 questions (49 multiple choice questions and 11 description type of questions), and possibly submitting the quiz at the same time (or at least very close to each other). As the instructor designed the quiz in a way that it would only open for less than 2 hours (and I guess most students were not able to finish all the questions with that time frame), so most of them might have waited until the last minute to submit (or just let the quiz to auto submit for them), thus created a performance problem.

I took a look at our MySQL server "slow query" log, and this is what I found:

(a)
# Time: 090421 10:10:38
# User@Host: cranberry[cranberry] @ xxx.sfsu.edu [xxx.xxx.xxx.xxx]
# Query_time: 50 Lock_time: 0 Rows_sent: 0 Rows_examined: 0
INSERT INTO mdl_question_states ( ATTEMPT, QUESTION, ORIGINALQUESTION, SEQ_NUMBER, ANSWER, TIMESTAMP, EVENT, GRADE, RAW_GRADE, PENALTY ) VALUES ( 529087, 88901, 0, 1, '279771', 1240333751, 2, 0, 0, 0.1 );
=> This query took 50 seconds!

(b)
# Time: 090421 11:24:54
# User@Host: cranberry[cranberry] @ xxx.sfsu.edu [xxx.xxx.xxx.xxx]
# Query_time: 8 Lock_time: 0 Rows_sent: 0 Rows_examined: 0
INSERT INTO mdl_cache_text ( MD5KEY, FORMATTEDTEXT, TIMEMODIFIED ) VALUES ( '
a9eb4bd4759d77b5a5dfb91145926f44', 'Unlike a storage warehouse, a distribution center is primarily for?', 1240338286 );
# User@Host: cranberry[cranberry] @ xxx.sfsu.edu [xxx.xxx.xxx.xxx]
# Query_time: 235 Lock_time: 0 Rows_sent: 0 Rows_examined: 0
DELETE FROM mdl_grade_items_history WHERE timemodified < 1208802056;
=> This query took 235 seconds!

Some facts:

1. Most (but not all) of the slow query happened during that specific time frame seem to be related to inserting into the mdl_question_states table.

2. Our mdl_question_states table has grown to be about 2.5GB in size (data_length 1.5GB, index_length 1GB), and the mdl_grade_grades_history table is 3.2GB (data_length 1.5GB, index_length 1.7GB).

3. On our live site, we added an extra index on "originalquestion" based on MySQL SUPPORT's recommendation (more details see MDL-16484). This has helped with the performance for the past couple of months, but it seems that this tweak did not solve the root problem.

4. For (b) "DELETE FROM mdl_grade_items_history" I suspect this is related to MDL-13982 and I'm about to do some testing about this. But for (a), I'm still looking into the codes to get to the bottom of this.

My questions:

1. Tim have you written any testing tool that would quickly populate the mdl_question_states, mdl_question_sessions, and mdl_quiz_attempts tables for testing purposes? For the quiz codes do you usually test it against a really large database (e.g. more than 100,000 questions in the mdl_question table, > 3GB of mdl_question_states table, etc.)?

2. Have anyone looked into the issue why inserting a new row into mdl_question_states (while it's big) would create some quiz performance problems? Any thoughts or comments?

3. Have anyone done such testing to have a quiz with > 60 questions and > 100 students, and just let moodle to auto-submit all the questions for them at the last minutes? Does the current quiz code handle this gracefully?

Any input would be much appreciated!

Wen
У адказ на Wen Hao Chuang

Re: performance: normal load + quiz + grading assignments

Tim Hunt -
Выява Core developers Выява Documentation writers Выява Particularly helpful Moodlers Выява Peer reviewers Выява Plugin developers
Fact 3: Actually, that column is not used for anything, I think you will find that it is always contains NULL. You can just drop it. (It is gone in Moodle 2.0 - MDL-17608 - in Moodle 1.9, just adding the index is probably safer.)

Question 1: I haven't, but Jamie Pratt did when he was working on the new quiz reports. See cvs://contrib/tools/generators/qagenerator/ (May be Moodle 2.0 only, but probably quite easy to back-port)

Question 2: You are on INNODB right, I thought that dealt with most of the lock contention problems, but perhaps not. Are you sure you can trust what MySQL is telling you? I find
Query_time: 50 Lock_time: 0 Rows_sent: 0 Rows_examined: 0
very hard to believe. I think it is much more likely that most of that time is spent waiting for a lock (perhpas the lock on the sequence that generates the unique ids for the id column? I am guessing).

Question 3: No I have never done this, but this sort of testing is quite high on my list of things I must do when I get time (right behind the very long list of more urgent things журбота). However, there is a limit to what the the quiz can do if the database takes 50 seconds to do a simple insert_record into a table. Of course, it does not have to be me. Any testing that you or anyone else can do is a very valuable contribution.
У адказ на Tim Hunt

Re: performance: normal load + quiz + grading assignments

Wen Hao Chuang -
Hi Tim, thanks for the info about the qagenerator, I just tried it but it was giving me some errors out of the box so I'm trouble-shooting the issues. Once I get it to work I think this would be a very useful testing tool. Thanks!

I started to look into the quiz submission codes, and found that in the /mod/quiz/attempt.php starting from around line #221 (/// Load all the questions and states needed by this script), this could be problematic when you have a really big mdl_question_states table (ours has 60 million rows and approaching 3GB in size). This also might be related to the questionslib.php for the function of :

question_process_responses()
save_questions_session()

I'm still doing some testing, but here are some thoughts for brain-storming purposes:

1. Would it be possible to refactor the quiz codes a little bit to make sure that the INSERT to mdl_question_states table would only happen when inserting necessary info is absolutely necessary (final states that is)? Right now it just seems that INSERT happen a lot.

2. Another possible solution is maybe use a temp table to store some of these INSERTS?

I think maybe short term solution for us is to try to trim down the mdl_question_states table. But I will have to look into how to do this without affecting other quiz tables. Any thoughts or comments on this?

Thanks! smile
У адказ на Wen Hao Chuang

Re: performance: normal load + quiz + grading assignments

Tim Hunt -
Выява Core developers Выява Documentation writers Выява Particularly helpful Moodlers Выява Peer reviewers Выява Plugin developers
I think you already know I am intending to do a major refactor of the quiz when I get the time (Development:Changing_the_Moodle_Question_Engine). Since I am currently in the middle of the huge Navigation 2.0 changes, that is all I will be able to do for Moodle 2.0, so Moodle 2.1 is the earliest we will get a refactored quiz, but that is, of course, up to my lords and masters at that OU.


I don't understand what is wrong with large database tables. The whole point of databases is that they can store large amounts of data with good scalability when necessary, and here we are not really storing any unnecessary data. I think we have the right indexes so that everything we do with those question tables will be no worse than O(log n) in the table size.

1. I agree the code is quite hard to follow, but that does not mean it is inefficient*. Do you have any actual evidence of unnecessary inserts?

2. No. I guess you suggest this because some people have found that works for the log table. Well, those are completely different cases. The log table is written far more often than it is read. During a quiz attempt, the question states has to be both read and written all the time during a quiz attempt.

You already use INNODB tables, right? It is the case that MYISAM lock-the-whole-table-on-write is a disaster for high-load quizzes.


* I think David Mudrak got it spot on in this blog post about the workshop module: "Before I learned how the calculation actually works, I proposed to rewrite it from scratch. Interesting. Is it just me tending to push own solutions instead of trying to understand someone else?" I have definitely been guilty of that.

У адказ на Tim Hunt

Re: performance: normal load + quiz + grading assignments

Wen Hao Chuang -
Hi Tim, first thanks again for everything. You have been big help for our trouble-shooting.

We still suspect that by trim down the size of the mdl_question_states table (at least by clearing out the "middle-states" events), this should help with our current quiz performance problem in some way. So here are my new questions:

1. Here is the result after I did

select event, count(*) from mdl_question_states group by event;
on our DB:

event count(*)
0 8082439
2 2203020
3 501154
6 7667252
7 5435
8 38855
9 14285

Now, based on the /lib/questionlib.php that I'm reading, it seems that only the event 0 is the "starting state" and event 8 (and 9) are the "end state". For event 2 (The student has requested that the responses should be saved but not submitted or validated), and event 6 (Moodle has graded the responses. A CLOSE event can be changed to a CLOSEANDGRADE event by Moodle), if we are to trim down the table based on the "timestamp" column and "event" column, those event 2 and event 6 rows should be safe to delete, without any side-effect or affecting other quiz tables. Am I correct on this? Thought or comments?

2. For event 3 ~ 7, do we really need to keep all these info in this table? I guess if we don't have time to do a total refactoring, at least we could do something so that those "miiddle events" will not be recorded in this particular table or will be removed once certain event (state) has been reached. After all, this table right now is the 2nd largest table in our LIVE DB (the largest table is mdl_log). There should be some sort of quick-n-dirty fix or work-around type of thing that we could do for the 1.9.x quiz codes so that we could keep this table (rate of growth in size that is) under better control. Thoughts or comments?

Thanks again for everything! усьмешка

Wen

У адказ на Wen Hao Chuang

Re: performance: normal load + quiz + grading assignments

Tim Hunt -
Выява Core developers Выява Documentation writers Выява Particularly helpful Moodlers Выява Peer reviewers Выява Plugin developers
I really don't know what would happen if you tried that.

I expect you might have to renumber the values in the seq_number column.

But, it seems like really dangerous thing to try.



Here is an alternative to consider, that is also hugely dangerous and scary, but seems more sensible to me:
  1. Create question_states_archive table, with exactly the same columns and indices as question_states.
  2. Write a cron job that find quiz attempts that have been closed for more than n days, and moves all the associated rows from question_states to question_states_archive
  3. Modify every single query that touches from the question_states table so that, if you are part of an active quiz attempt, then you use the current code, but if you are running a report that may include data from more than n days after the quiz closed, use a UNION of the question_states_archive and question_states tables (I wonder if INNODB handles views well enough that you could use a view? then you would only have to change question_states to all_question_states_view in the query if some condition was true.)
This proposal would let you move 90% of the rows out of the current states table, whereas your proposal only achieves about 50%.

My grep finds 77 references to question_states table outside the upgrade code.
У адказ на Eric Hagley

Re: performance: normal load + quiz + grading assignments

Wen Hao Chuang -
Hi Eric, just curious about what type of new security (software?) your school put in (that check websites for malicious content) that would cause the quiz secure mode to send the server in a spin? Do you have more details about this and did you finally able to resolve this? Also did you get a chance to file a bug report about this on Moodle bug tracker? Thanks! smile

Wen
У адказ на Justin Haaga

Re: performance: normal load + quiz + grading assignments

Justin Haaga -
Some interesting info Wen. We are still suffering from the problem and I have not found a solid solution. By having the instructors stagger the quizzers has help the load a little bit, but the weird thing is that a quiz can be given out with 40 students and work fine, then one with 10 causes the lock-ups. I have not been able to find a consistent way of testing this which makes recreating the problem very difficult.

My plan of attack is going to be setting up a haproxy by deploying some left over low-end servers (or some virtual servers) and setting up a cluster of apache2 servers. This should help alleviate the site becoming unavailable, but not really solve the underlying issue. Hopefully, the mysql db won't be the bottleneck with this config, if so then I will cluster that as well.

I'm really hoping 2.0 will have improvements, or 2.1...but we probably won't move to that for a while until it's matured a little, so it could be a year or more before we hit that point.

stay tuned...

btw, still on 1.9.3 What version are you running Wen?
У адказ на Justin Haaga

Re: performance: normal load + quiz + grading assignments

Jonathan Moore -
Justin have you tried running a PHP accelerator? That is the first thing I recommend when I hear that CPU utilization is running high for the web process. Both eAccelerator and APC seem to work well with Moodle.
У адказ на Jonathan Moore

Re: performance: normal load + quiz + grading assignments

Justin Haaga -
Yup running eAccelerator on debian 5.0 lenny now. I can run a benchmark on the system and it preforms really well responding to php/mysql requests pulling random pages, but whatever cause the quiz module to freak out makes my apache processes just get caught in a loop trying to process the requests.
У адказ на Justin Haaga

Re: performance: normal load + quiz + grading assignments

Wen Hao Chuang -
Hi Justin, sorry to get back to you a bit late (just came back from a vacation!). We are running a heavily customized 1.9.0 codebase here at SF State (with selective patches that concerns us). We will be upgrading to 1.9.5 codes in late July this year. Hope this helps!

Wen
У адказ на Wen Hao Chuang

Re: performance: normal load + quiz + grading assignments

Justin Haaga -
OK good to know.

Update on this- a webfarm of some p4 servers is in place using HAproxy and lighttpd. Separate mysql server and NFS for moodledata. Rsync is used to sync up the www dir to all webservers. Evidence thus far is we got alot of power. The issue I think we had will go away with this config. Though, the underlying issue will remain (perhaps a 1.9.3 still?).

Goes live in a few weeks so I'll post an update in a few months.

У адказ на Justin Haaga

Re: performance: normal load + quiz + grading assignments

Justin Haaga -
Well, my follow up and the news is great. Problems are 100% subsided. Http cpu load still peaks when the quizzes are ran but with the combo of 5 web servers spread and lighttpd method of processing php (fastcgi) there are no problems of multiple quizzes ran with other normal traffic load.

Here's a quick diagram of my configuration. Basically here's break down of the config:

Server#1-modern HP DL360 server, quad core 2.0ghz xeon, 12GB memory, 5 drives @ 10k sas
Running vmware ESX 3.5 update 4
haproxy - 2 cpus, 1.5gb memory, 20gb hd - ubuntu 8.0.4 w/ vmtools - no paravirtulization caused locks on kernel
nfs server - 2 cpus, 2gb memory, 150gb hd- ubuntu 8.0.4 w/ vmtools -paravirtulization caused locks on kernel
mysql server / backup webserver - 2 cpus, 2gb memory, 20gb hd- ubuntu 8.0.4 w/ vmtools -paravirtulization caused locks on kernel

web farm:
6 year old IBM 305 eservers, 1 p4 2.4ghz, 1-2gb memory, single sata 7200 rpm 80GB hd
debian lenny 5.0

The old configuration was:
same server#1 but:
webserver/nfs-moodledata - 4cpus, 8gb memory, 150gb hd, debian 4.0/5.0

server#2
windows 2003 server - mysql - xeon dual 2.33ghz cpus, 4gb memory, 300gb SCSI 15k

I think a big difference is apache vs lighttpd, but also obviously being able to spread the load over 4 different servers, even though the total processing power is about the same. P4 vs a Xeon...in theory the xeon should be faster, but it appear the OS software can still be the bottleneck.

If you got some old server laying around, or even virtualized a few descent server you can have similar HA configuration. Oh and secondary HA proxy is off the the side as a old pos eserver also.


note: mis print on the diagram it's not apache, final setup is lighthttpd, though apache is configure as a backup. But benchmarks proved lighttpd was much more efficient and faster.

snip from apache vs lighttpd in this config:

apache2-

Server Software: Apache/2.2.9
Server Hostname:
Server Port: 443
SSL/TLS Protocol: TLSv1/SSLv3,DHE-RSA-AES256-SHA,1024,256

Document Path: /
Document Length: 50825 bytes

Concurrency Level: 250
Time taken for tests: 33.324 seconds
Complete requests: 700
Failed requests: 0
Write errors: 0
Total transferred: 36709423 bytes
HTML transferred: 36135271 bytes
Requests per second: 21.01 [#/sec] (mean)
Time per request: 11901.542 [ms] (mean)
Time per request: 47.606 [ms] (mean, across all concurrent requests)
Transfer rate: 1075.76 [Kbytes/sec] received

lighthttpd
Server Software: lighttpd/1.4.19
Server Hostname:
Server Port: 443
SSL/TLS Protocol: TLSv1/SSLv3,AES256-SHA,1024,256

Document Path: /
Document Length: 50825 bytes

Concurrency Level: 250
Time taken for tests: 13.606 seconds
Complete requests: 700
Failed requests: 0
Write errors: 0
Total transferred: 36342698 bytes
HTML transferred: 35862831 bytes
Requests per second: 51.45 [#/sec] (mean)
Time per request: 4859.177 [ms] (mean)
Time per request: 19.437 [ms] (mean, across all concurrent requests)
Transfer rate: 2608.53 [Kbytes/sec] received

this configuration is a little overkill for us at the moment, but we had the hardware so not really hurting us.



Укладаньне webfarm.jpg
У адказ на Justin Haaga

Re: performance: normal load + quiz + grading assignments

Jared Chapman -

I am having this problem now. Is the solution really to get 5 servers?

 

This entry appears in my slow query log 3858 times in 1 hour

# Fri Mar 22 17:00:01 2013
# Query_time: 4.097737  Lock_time: 0.000000 Rows_sent: 1  Rows_examined: 0
SET timestamp=1363993201;
SELECT GET_LOCK('livegrea_moodle-mdl_-session-44834', 120)