Richard Lobb
Regrading of adaptive quiz questions in CodeRunner
 

[This was originally an email to Tim Hunt, reposted here at his request so he can answer in a public forum. It relates to a question-type plug-in I've written called CodeRunner, for which the answer is program code to be evaluated against a set of test cases. Moodle Version: 2.5]

Hi Tim

I wonder if you'd mind helping me with a small design question relating to my CodeRunner question type, please?

First a bit of (slightly embarrassing) background, if you don't mind.

You may recall that I always use CodeRunner in an adaptive mode, as students need to check the code of each question as they go. Since I wish to record the results of all the test cases, I modified my question::grade_response() method to return all the test case output in addition to the usual grade and question state. I then subclassed qbehaviour_adaptive to qbehaviour_adaptive_adapted_for_coderunner so that it stored those test results via a call to $pendingstep->set_qt_var(...). This allows me to render the full results output whenever the page gets drawn or redrawn. So far so good. But ...
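
Roughly, the behaviour override looks like this (a simplified sketch rather than the exact source; the '_testoutcome' qt-var name is illustrative, and the three-element return from grade_response is the modification just described):

class qbehaviour_adaptive_adapted_for_coderunner extends qbehaviour_adaptive {
    public function process_submit(question_attempt_pending_step $pendingstep) {
        $response = $pendingstep->get_qt_data();
        $prevresponse = $this->qa->get_last_qt_data();
        if ($this->question->is_same_response($response, $prevresponse)) {
            return question_attempt::DISCARD; // The check discussed below.
        }
        // grade_response has been modified to return the full test outcome
        // as a third element, in addition to the usual grade and state.
        list($fraction, $state, $testoutcome) =
                $this->question->grade_response($response);
        // Store the test results so the renderer can redisplay them
        // whenever the page gets drawn or redrawn.
        $pendingstep->set_qt_var('_testoutcome', serialize($testoutcome));
        $pendingstep->set_behaviour_var('_try',
                $this->qa->get_last_behaviour_var('_try', 0) + 1);
        $pendingstep->set_behaviour_var('_rawfraction', $fraction);
        $pendingstep->set_state($state);
        // (Penalty and fraction bookkeeping, as in qbehaviour_adaptive,
        // omitted for brevity.)
        return question_attempt::KEEP;
    }
}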

I also attempted to re-use any existing test results if the question answer had not changed. I've now discovered, a year or two later, that that bit of my logic was badly broken and in fact I was never re-using test results. Rather, the usual 3 lines of code in the default behaviour's process_submit, viz

if ($this->question->is_same_response($response, $prevresponse)) {
    return question_attempt::DISCARD;
}

were ensuring that grade_response didn't get called on multiple submits of the same answer. However, grade_response was being called when the entire quiz was finally submitted, since process_finish doesn't carry out the same check for an unchanged answer. I didn't notice even this regrading cost, because it turns out that the cost of grading C or Python programs isn't much greater than the cost of rebuilding the response page in PHP. However, that serendipity does not apply to Matlab programs, because Matlab is a monstrous resource hog. Consequently, when I ran a couple of Matlab lab tests, each with 90 or so students, the server ground to a near standstill at the end of each test, though it did come alive again and all was well in the end (thankfully).

With regard to re-use of previous results, two other issues need to be mentioned.

  1. If the question author changes the test cases or test data, the is_same_response check prevents regrading. Hence, in order to trigger a regrade during an adaptive quiz sitting, one has to do something silly to the answer, like adding a space character at the end of a line, rather than just clicking Check.
  2. In a web programming course in which the question "answer" is just a link to the real answer (a student website, which my test code then inspects), the same problem occurs: students have to do something silly to the link to force regrading, rather than just clicking the Check button.

I'm now trying to figure out my best strategy for dealing with all the above, which brings me to my three questions for you, please:

  • There's a bit of code in the default adaptive behaviour's process_finish function that says:

        if ($laststep->has_behaviour_var('_try')) {
            // Last answer was graded, we want to regrade it. Otherwise the answer
            // has changed, and we are grading a new try.

I don't understand this. Why do we want to regrade the question in this case, but not within process_submit?

  • As far as I can see, process_submit is called (in the context of an adaptive question) only for a specific question when that question's 'check' button is clicked, whereas process_finish is called for every question when the student finally submits and closes their quiz. Do I have this right and can I depend on that behaviour? Are there any other situations when either of those methods gets called?
  • [Related to the above two questions ...] What will go wrong if I simply regrade the question whenever a student clicks Check (i.e., I remove the is_same_response test from process_submit), and, on final quiz submission, regrade if and only if the answer has changed or hasn't yet been graded?

Hope those questions aren't too dumb, and thanks for hanging in there till the end.

-- Richard

 

 
Tim at Lone Pine Koala Sanctuary
What is going on in adaptive behaviour process_finish?

There are really four major issues here, and I think I will answer them one at a time in separate posts:

1. The subtlety about how the adaptive behaviour works in process_finish (your first question).

2. The problem with not being able to use $step->set_qt_var.

3. What to do about this, which I will answer by telling you what we did in STACK.

4. Your other two questions.

So, process_finish. Let us proceed by considering a simple example question: "What is 1+1?". A student may interact with it as follows:

  1. Start quiz.
  2. Input 3 and click Check.
  3. Read feedback. Input 2 and click Check.
  4. Later, get to the end of the quiz, and click Submit all and finish. In this case, the student's latest response has already been graded, so we don't need to grade it again.

Compare that to this less common scenario, which must nonetheless be handled correctly.

  1. Start quiz.
  2. Input 3 and click Check.
  3. Read feedback. Input 2 but don't click Check yet. Just click Next, expecting to come back to this question.
  4. Get to the end of the quiz, and click Submit all and finish without going back to the question. In this case, the student's latest response has not yet been graded, so we must grade it now.

The code you highlight distinguishes these two scenarios and handles them correctly and efficiently.

To really understand this, there is no substitute for staring at the code.

If you are trying to implement a new question type or question behaviour, you are really strongly advised to write plenty of tests like walkthrough_test.php. The STACK question type has a particularly fine set (https://github.com/maths/moodle-qtype_stack/tree/master/tests) that has more than paid for itself since it was written.
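
For instance, a minimal walkthrough-style test of my two scenarios might look like this (a sketch using the question engine's test helpers and the standard shortanswer test question; the exact assertions would need adjusting for a real question type):

class adaptive_walkthrough_sketch_test extends qbehaviour_walkthrough_test_base {
    public function test_check_then_submit_all_and_finish() {
        // The standard shortanswer test question; correct answer 'frog'.
        $q = test_question_maker::make_question('shortanswer');
        $this->start_attempt_at_question($q, 'adaptive', 1);

        // Scenario: student enters a wrong answer and clicks Check.
        // The try is graded, but the attempt stays open.
        $this->process_submission(array('answer' => 'cat', '-submit' => 1));
        $this->check_current_state(question_state::$todo);

        // Student types the right answer but does not click Check, then
        // the quiz is submitted: process_finish must grade the response.
        $this->process_submission(array('answer' => 'frog'));
        $this->quba->finish_all_questions();
        $this->assertTrue(
                $this->quba->get_question_state($this->slot)->is_graded());
    }
}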

 
Richard Lobb
Re: What is going on in adaptive behaviour process_finish?
 

Firstly, thanks for the wonderfully thorough response, Tim. There's lots of great stuff to ponder in there.

However, for this particular part of the question, I didn't explain myself clearly. I'm not at all surprised that the question needs to be graded if an existing response has been edited but not submitted. What puzzles me is that process_finish, unlike process_submit, doesn't have the call to is_same_response to see if the same answer is being submitted. Consequently, grade_response gets called regardless. It is this fact that caused such a high load at the end of my lab tests using Matlab questions. I have just checked, with breakpoints in the code, that this is what happens, at least with my subclass of the adaptive behaviour. Have I broken something, or am I misunderstanding yet again, or is it a bug?

Richard

 
Tim at Lone Pine Koala Sanctuary
Re: What is going on in adaptive behaviour process_finish?

Just to make sure I understand, using my previous example, the situation you think is wrong is:

  1. Start quiz.
  2. Input 2 and Click check.
  3. Later, get to the end of the quiz, and click Submit all and finish. This calls grade_response with answer => 2 a second time, which is unnecessary.

Is that what you mean? If so, I agree it is a slight bug. Please report it in the Moodle Tracker.

 
Tim at Lone Pine Koala Sanctuary
Why is there no way for question types to use $step->set_qt_var?

When I implemented the question engine (for Moodle 2.1, MDL-20636), I built all the necessary back-end storage so that question types would have a place to store the results of expensive computations during grading, etc.

Except that, when I was designing the API for question types to talk to the rest of the question engine, I omitted to provide any way for the question type to access that storage.

The short explanation for this is that I made a mistake.

There is, however, a longer explanation for why I have not hurried to fix this mistake. The point is that if you write purely functional code, then it is much easier to get it right. Now, in fact, question types are not purely functional; in particular, start_attempt and apply_attempt_state are allowed to manipulate the class's state. But most of the important API that question behaviours use (grade_response, is_same_response, is_gradable_response, is_complete_response) is stateless. Those methods just take one or more responses represented as arrays, and compute a particular result based on them.
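
For reference, the core of those declarations (the real ones, with full doc comments, are in question/type/questionbase.php) shows that the methods see nothing beyond response arrays:

// Sketch of the relevant interface, doc comments elided: each method
// receives only response arrays, with no access to the attempt's steps.
interface question_automatically_gradable_sketch {
    public function is_same_response(array $prevresponse, array $newresponse);
    public function is_complete_response(array $response);
    public function is_gradable_response(array $response);
    /** @return array (fraction, question_state) */
    public function grade_response(array $response);
}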

The statelessness makes things much easier for people creating behaviours (which do have to worry about the order things happen in, and about states). As I said above, stateless methods make it much easier to reason about the correctness of question types, and they also make it easier to unit-test these critical methods.

So, I am reluctant to change this. In the next post I will explain how, nonetheless, you can avoid re-computing expensive calculations. At least, I will explain what we did in STACK.

The other reason for not changing the question_type API to fix this now is that I am very reluctant to make a non-backwards-compatible change here unless I really have to. However, that is not absolutely a blocker. We could make only a minimal change to the current API by passing a class that implements ArrayAccess where we currently pass an array to the question_type methods, and that class could also implement ->set_qt_var, passing it through to the pending step. The downside of doing this (in addition to the point above about stateless methods being better) is that the ArrayAccess interface is a lot slower than a real array.
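
Such a wrapper might look something like this (a hypothetical sketch; no class like this actually exists in Moodle):

class question_response_wrapper implements ArrayAccess {
    /** @var array the raw response data. */
    protected $response;
    /** @var question_attempt_pending_step the step to store qt vars in. */
    protected $pendingstep;

    public function __construct(array $response,
            question_attempt_pending_step $pendingstep) {
        $this->response = $response;
        $this->pendingstep = $pendingstep;
    }

    public function offsetExists($key) {
        return array_key_exists($key, $this->response);
    }

    public function offsetGet($key) {
        return $this->response[$key];
    }

    public function offsetSet($key, $value) {
        $this->response[$key] = $value;
    }

    public function offsetUnset($key) {
        unset($this->response[$key]);
    }

    // The one extra method: pass writes through to the pending step, so
    // question types can store the results of expensive computations.
    public function set_qt_var($name, $value) {
        $this->pendingstep->set_qt_var($name, $value);
    }
}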

 
Richard Lobb
Re: Why is there no way for question types to use $step->set_qt_var?
 

Yep, all understood. Thanks.

 
Tim at Lone Pine Koala Sanctuary
How to handle expensive computations in your question type

So, if your question type needs to do expensive calculations as part of the grading process, how can you avoid repeating the same costly calculations?

I had to solve this while working on STACK, a question type that wraps around the Maxima computer-algebra system. Sending some stuff to Maxima and getting the results back is slow, so we want to compute any given thing once, and then re-use the same answer whenever we get the same expression to evaluate.

When thinking about the problem, I came to realise that we don't actually want to cache things as part of the student's attempt. If two students get an identical question and submit an identical response, then we don't need to re-compute: we have done that computation already. So, $pendingstep->set_qt_var is the wrong place to cache information anyway.

The code-flow looks like this:

[ other parts of the question engine ]
           |
           |   Calls to qtype methods like grade_response, render, etc.
           v                      / assembles bits of the
[ qtype external API methods ]   {  question and response
        (*)|                      \ and determines a calculation
           |   Call to underlying expensive computation
           v
[ core computation-performing code ]

The point is that the place you want the cache is (*). That is, you just want to cache the raw calculation. Deriving the calculation to perform from the student's response, and from which qtype API method was called, is relatively fast, and it is OK to repeat it whenever an API method is called.

This is also a place where it pays to write pure functional code. That is, the results of the back-end calculation should depend only on the inputs to that specific function call. Pure functions are the easiest thing to cache (or memoize, as a functional programmer would call it).

In STACK, this is implemented in the connector classes here: https://github.com/maths/moodle-qtype_stack/tree/master/stack/cas. In particular, https://github.com/maths/moodle-qtype_stack/blob/master/stack/cas/connector.dbcache.class.php is a nice use of the adaptor pattern to do the caching. It wraps a real CAS connection, and for each request it first tries to get the answer from the cache; if it is not there, it calls the underlying connection and caches the result before returning it.
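
In outline, the adaptor idea is this (an illustrative sketch, not STACK's actual classes; the real cache is in the database, not in memory):

interface cas_connection {
    public function evaluate($expression);
}

class cached_cas_connection implements cas_connection {
    protected $realconnection;
    protected $cache = array();

    public function __construct(cas_connection $realconnection) {
        $this->realconnection = $realconnection;
    }

    public function evaluate($expression) {
        $key = sha1($expression);
        if (!array_key_exists($key, $this->cache)) {
            // Cache miss: make the expensive call once and remember it.
            $this->cache[$key] = $this->realconnection->evaluate($expression);
        }
        return $this->cache[$key];
    }
}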

Having said that you should make the back-end calculation a pure function, that is actually a lie. In the STACK case, one naively thinks that the computation we should perform is just a function of the question, the random seed, and the student's response. In fact, the result also depends on the version of Maxima used, and on which version of STACK's Maxima code libraries was used. What we have done to handle this is to get STACK to return the version number of the things used in the calculations. Then we can compare that against the version that was expected, and warn if there is a mismatch.

(Heading off-topic, here is a good quote about functional programming from Philip Wadler (1998), 'Why no one uses functional languages', ACM SIGPLAN Notices:

"Advocates of functional languages claim they produce an order of magnitude improvement in productivity. Experiments don't always verify that figure - sometimes they show an improvement of only a factor of four. Still, code that's four times as short, four times as quick to write, or four times easier to maintain is not to be sniffed at. So why aren't functional languages more widely used?"

The paper gives some good answers to that.)

 
Richard Lobb
Re: How to handle expensive computations in your question type
 

Yes, I think your STACK solution is beautiful. However, since I've already used $pendingstep->set_qt_var, and the probability of two students submitting byte-for-byte identical code is pretty slim (if they do, someone's almost certainly cheating, and very blatantly), I don't think it's a big win for me to change the code at this stage. I'm keeping it in mind for the ultimate beautification programme when everything else is perfect :-)

 
Tim at Lone Pine Koala Sanctuary
Re: Regrading of adaptive quiz questions in CodeRunner

And finally ...

As far as I can see, process_submit is called (in the context of an adaptive question) only for a specific question when that question's 'check' button is clicked, whereas process_finish is called for every question when the student finally submits and closes their quiz. Do I have this right and can I depend on that behaviour? Are there any other situations when either of those methods gets called?

You are right, but this is just a convention. It is all down to the behaviour itself. Anything that happens to a question gets translated into a call to $behaviour->process_action, and it is up to the behaviour to respond appropriately. Here is the code in qbehaviour_adaptive: https://github.com/moodle/moodle/blob/56cc9b387ed095a18a6ee4df724dabf490f93df6/question/behaviour/adaptive/behaviour.php#L89

has_behaviour_var('submit') means that the Check button was pressed. has_behaviour_var('finish') means that something like Submit all and finish in the quiz happened.
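
From memory, the dispatch at that link is essentially:

public function process_action(question_attempt_pending_step $pendingstep) {
    if ($pendingstep->has_behaviour_var('comment')) {
        return $this->process_comment($pendingstep);
    } else if ($pendingstep->has_behaviour_var('finish')) {
        return $this->process_finish($pendingstep);
    } else if ($pendingstep->has_behaviour_var('submit')) {
        return $this->process_submit($pendingstep);
    } else {
        return $this->process_save($pendingstep);
    }
}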

What will go wrong if I simply regrade the question whenever a student clicks Check (i.e., I remove the is_same_response test from process_submit), and, on final quiz submission, regrade if and only if the answer has changed or hasn't yet been graded?

Nothing will go wrong, I think. What is important is to think about what behaviour will work best for the students trying to use this tool to learn.

However, do not think that this way of forcing repeated evaluation is the right way to solve the "author changes the test case or test data" scenario. It isn't.

When the question is altered, the right way to get attempts updated to take account of the new definition is to run a regrade (in Results -> Grades). This re-plays the sequence of student responses to each question to re-generate the marks, etc. Of course, if you have caching going on, you will need to clear the cache between editing the question and running the regrade.

For the "'answer' is just a link to the real answer" problem, well the problem there is that you have broken a fairly fundamental assumption built into question engine, that the students real answer will be submitted, and stored, so that it can be later re-played, and re-graded, etc. In fact, this might be a case for ->set_qt_var. What I mean is that the grading process really falls into two phases when what is submitted is a link to something in the real world, rather than the response itself.

  1. Sample the linked thing, to determine the properties that will be important in grading.
  2. Do the grading based on the result of the sampling.

The only way to fit this into Moodle's question system is to do the sampling when the student clicks Check, and store the result using ->set_qt_var. Then all the grading can proceed from the stored response as usual. Of course, there is no way to do this right now.

Well, the one way you could do this right now is to do the sampling in the student's web browser, in JavaScript, and write the sampled information into hidden inputs that are then submitted to Moodle.

 
Richard Lobb
Re: Regrading of adaptive quiz questions in CodeRunner
 

Yet again, many thanks. Good thoughts. [I really should have looked at the original behaviour class, rather than my subclass, before asking that question.] However, I think for now I'll do a forced regrading on Check, regardless of whether the answer is the same as before. That solves most of my problems. I'll consider whether I should add a special flag to my question type to say "always regrade", so that I regrade such questions on final submission too. The only time I really have to avoid regrading already-graded questions - at least with the current sandbox and scale of questions - is at the end of a test, when all the quizzes get submitted.

 
Tim at Lone Pine Koala Sanctuary
Re: Regrading of adaptive quiz questions in CodeRunner

Yes. With luck you should be able to do this by changing your implementation of is_same_response to (almost) always return true.

 