performance impact of massive use of question attempt step data

performance impact of massive use of question attempt step data

by Jean-Michel Védrine -
Number of replies: 9

Hello Tim,

Do you have any idea if I use a lot of different question attempt step data will it have a performance impact ?

And when I say a lot, I really mean a lot, don't forget each formulas question can have an indefinite number of global variables, of local variables of subquestions, of answers per subquestion, of trials, ... !

For instance just my small tests from this morning trying to use separate data have created thousands of mdl_question_attempt_step_data records !!

Of course in Moodle 1.9 all this was serialized (no choice) so all the functions to serialize/ unserialize, recreate data structures from serialized strings are already done ! should I continue that way an only use one string per attempt ?

Should I separate each variable only serializing what is not a string (for instance the global variables instanciation must surely be serialized in a string like x0=1;y0=2;vx=4;vy=-4;tr=12;ta=18;)

Should I take a mid way using a fixed number of data serializing all similar values for all subquestions in one data for instance ?

I have absolutely no idea. All the question attempt step data is unknown territory for me. I don't even have a clear idea what names begining with -_ (or is it _- ?) are for ? I must clearly re-read what you have written in docs and the comments in the code !

Average of ratings: -
In reply to Jean-Michel Védrine

Re: performance impact of massive use of question attempt step data

by Tim Hunt -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers

The prefixes are easy to explain:

If the data comes directly from the user as part of a form submission, there is no _ prefix. If it was created in Moodle them there must be an _ prefix.

If the data belongs to the question type then there is no - prefix. If it was created in Moodle then there is a - prefix.

So, answer is a variable that came from the bit of the form output and processed by the question type. -submit is a variable that came from the bit of the form output and processed by the behaviour. _choiceorder is a variable that was computed internally be they question type (probably in start_attempt) and stored for later use. -_try is a variable that was computed internally by the behaviour, and stored for later use.

I don't think you need to worry even up to 100 data items. Remember that you only have to store the variable definitions in the first step. I think it would be unlikely to have more than 20 data variables, and 20 inputs that takes the student 4 tries to get right.

I have to say that load_questions_usage_by_activity is currently using a very simplistic approach, and I have in mind some ideas about how it could be made more efficient. It just hast no proved necessary to try to optimise it yet.

In reply to Tim Hunt

Re: performance impact of massive use of question attempt step data

by Janet Smith -

I've been running Moodle 2.2 for just under 6 months and I am concerned by the size of the question_attempt_steps and question_attempt_steps_data tables. They seem to be growing at an alarming rate.

Currently question_attempt_steps is just under 8 million rows and question_attempt_steps_date is just under 11 million rows. I have a large Moodle site which sees a significant amount of traffic, but it is not a huge site by any regard.

Is there any plan to manage the data in these tables to prevent them from continuing to grow? I'm concerned that over time the size of them will start to interfere with my ability to backup the database in a timely manner.

In reply to Janet Smith

Re: performance impact of massive use of question attempt step data

by Tim Hunt -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers

Number of rows is not a good metric. How many MB is that table? How does that compare with other DB talbes in your site?

I just can't get excited about this. Databases are good at handling lots of data.

In reply to Tim Hunt

Re: performance impact of massive use of question attempt step data

by Janet Smith -

Thank for the quick reply.  You might be right that this is a nonissue.  However these 3 tables represent about 30% of the database:

Mdl_question_attempts = 1.6GB
Mdl_question_attempt_steps = 1.3GB
Mdl_question_attempt_step_data = 1.4 GB
Total Moodle db size = 15GB

Is that to be expected?

Average of ratings: Useful (1)
In reply to Janet Smith

Re: performance impact of massive use of question attempt step data

by Paul An -

I am running into the same issue. We have thousands of questions and for each question attempt, it store a big question summary in question_attempt table which is pretty much same as the question. It also stores the complete option text in the right_answer column.

These two columns seem to be the main culprits that are boating the db up. Does anyone know what these two columns are used for? Is it safe to modify the code to store nothing in these two column?

thank you!

In reply to Paul An

Re: performance impact of massive use of question attempt step data

by Marcus Green -
Picture of Core developers Picture of Particularly helpful Moodlers Picture of Plugin developers Picture of Testers

Can you describe your issue further. How many students taking how many questions (simultaneously) using how much RAM (on web server and database server) with what size of CPU and what performance issues are you experiencing.  If you are using Moodle 2.4 have you taken a look at the page at

/moodle/cache/admin.php

 

In reply to Marcus Green

Re: performance impact of massive use of question attempt step data

by Paul An -

Hi Marcus,

 Thank you for responding.  I am evaluating moodle for our certification products and we plan to host it on shared hosting servers. We have thousands of questions and thousands of users. So we expect to run up to a million question_attempts in a short time.

  So while running simulation on a local desktop system where everything is on the same box, it seems slow. I also looked at the database structure because backing up user data also needs to be done regularly and any redundant data will impact the backup jobs and size of archive files.
 
I am not sure what is the purpose of these two columns and whether they can be eliminated (or at least store empty in these columns.) 
 
I am looking at the cache settings page right now and I am not sure how it relates to this table. 
 
So basically, I would like to understand the data that is generated during its usage before we productionalize it. Understanding this table is where I am at right now.
 
thank you!
In reply to Paul An

Re: performance impact of massive use of question attempt step data

by Tim Hunt -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers

Those columns are used in the quiz reports. Specfically Quiz -> Results -> Responses.

If you are just using multiple choice questions, then these columns are somewhat redundant, but once you get to more interesting question types like Calculated or STACK, then they are necessary.

If you don't need the quiz responses report, then you can safely do something like

UPDATE mdl_question_attempts SET questionsummary = '', rightanswer = '', responsesummary = ''

to wipe out all that data. That should not break anything other than the one report, but you should test that.

Average of ratings: Useful (1)
In reply to Tim Hunt

Re: performance impact of massive use of question attempt step data

by Paul An -

Thank you!