Numerical question answer processing.


by Tim Hunt -
Number of replies: 29
In my question engine re-write, I have just got to the bit where I have to re-implement unit processing in numerical questions. Well, I did not think I would have to re-implement it, until I looked at the existing code.

The current code does roughly this:
  1. Remove all white-space from the student's answer.
  2. If the student's answer contains a single ',' and no '.', convert the ',' to a '.'. Otherwise, remove all ','s.
  3. Use a regular expression to check that the answer is a valid number, followed by an optional unit.
I don't like this for several reasons.
  • Stripping all whitespace may corrupt some units - though I can't quite think of a specific example at the moment.
  • The bit with ',' is a valiant attempt to support Europeans who like to write 3,142 instead of 3.142 (and we must support that). However, I think it is terrible that '3,000,000' is treated as 3000000; '3,000' is treated as 3; '3,000.' is treated as 3000.
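
To make the complaint concrete, here is a rough Python reconstruction of the three steps listed above. It is only a sketch written from that description, not the actual Moodle code; the function name legacy_parse and the regular expression are assumptions.

```python
import re

# Assumed pattern: an optionally signed number, then whatever is left as the unit.
NUMBER_THEN_UNIT = re.compile(
    r'^([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)(.*)$'
)

def legacy_parse(answer):
    """Return (number, unit), or None if the answer does not look numeric."""
    s = re.sub(r'\s+', '', answer)           # 1. remove all whitespace
    if s.count(',') == 1 and '.' not in s:   # 2. a lone ',' becomes a decimal point
        s = s.replace(',', '.')
    else:                                    #    otherwise every ',' is dropped
        s = s.replace(',', '')
    m = NUMBER_THEN_UNIT.match(s)            # 3. number followed by optional unit
    if m is None:
        return None
    return float(m.group(1)), m.group(2)

print(legacy_parse('3,000,000'))  # (3000000.0, '')
print(legacy_parse('3,000'))      # (3.0, '')
print(legacy_parse('3,000.'))     # (3000.0, '')
```

The three calls reproduce the inconsistency described in the second bullet: the same thousands-style input is read three different ways depending on how many commas and dots it contains.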
This was getting rather long, let me split it into separate posts. More in a minute ...


In reply to Tim Hunt

Re: Numerical question answer processing.

by Tim Hunt -
So, then I started thinking about how it should work, but it is tricky, so I decided to look what answers our students had actually given in response to numerical questions in the past. Therefore I ran the following query:

SELECT qst.answer, COUNT(1) as num
FROM mdl_question_states qst
JOIN mdl_question q ON qst.question = q.id
WHERE q.qtype = 'numerical' AND qst.answer <> ''
GROUP BY qst.answer
ORDER BY num DESC

It turns out that we have had a total of 189514 responses (because of things like adaptive mode, there will be several responses stored per question) which group into 6123 distinct responses. (Note that this whole analysis ignores numerical questions that were selected randomly.)

The most popular response is 0, which was recorded 3868 times!

The huge majority of responses are given without using , as a thousand separator. (I am in the UK, remember.)

The first negative answer is -157000, recorded 2619 times.

The first weird entry is 'none', given 1639 times. 'None' scores 514, 'None.' 68 times, 'NONE' 35 times

The first answer with a decimal point is 24.8, given 851 times, so for some reason most Numerical questions at the OU have integer answers!

The first answer with a unit is 71%, given 174 times. (71 without a unit given 222 times). % is by far the most common unit entered.

The next most popular unit is £, starting with £2.50 recorded 22 times, but there are plenty more that start £. Of course, Moodle cannot cope with units like that.

'pass' was given 91 times. '?' 74 times, 'xxxxxx' 50 times, 'dunno' 40 times, 'xxxxx' 40 times, 'xxxxxxx' 36 times, '-' 32 times.

'a' was given 63 times. I have no idea what is going on there. That was given in answer to a lot of different questions, and having looked at them there is nothing to lead a student to enter a. (Ah! perhaps the student was just putting in rubbish to see the feedback.)

The first response that uses , as a thousand separator is -157,000 54 times, then 157,000 40 times. (Clear what is going on there!) Presumably those ended up being marked wrong, even though the student was right. Oh dear.

'one' was recorded 39 times, some other written out numbers were also given, but rarely.

'completed' 34 times, 'Completed' 28 times.

'i' 32 times - looking at some questions, I don't think that is anything to do with complex numbers.

'nil' 28 times

The first time there is an explicit '+' sign is +100 given 16 times.

Down in the tail, among the rare answers are all sorts of strange things like 'half past two', '2:30' also features. The first rude word, which I will not repeat, was recorded 12 times.

The first time you see a 'real' unit it is among those answers given 12 times. It is 'Angstroms', by the way.

There are a few places where the student has written something like '105000 debit'. I guess seeing that, the teacher could decide to make that a unit with multiplier -1.

Even further down, you start to see people trying to write negative numbers like (1234). Bloody accountants! You also start seeing things like '4.32 million'.

The first time you see a ',' being used as a decimal point by OU students is '1,42' given only 4 times.

I will add that, having looked at some of these questions, a lot of them have the warning

Enter your answer to the nearest whole £ into the blank space provided below.

[Note: Do not enter commas or spaces or a £ sign within the number you enter.]

at the top, which I would guess is affecting student behaviour.
Average of ratings: Useful (2)
In reply to Tim Hunt

Re: Numerical question answer processing.

by Tim Hunt -
OK, so what do I conclude from all that.
  1. Supporting ',' as a decimal point is not worth doing among UK students.
  2. Supporting ',' as a thousand separator is not essential, but probably is worth it.
  3. Stripping spaces from the answer is irrelevant. Probably best not to.
  4. We should support units like £ in future, that come before the number.
  5. We should consider supporting (1234) meaning -1234 in future.
Of course, it is vital to support ',' as a decimal point for languages that need it.


Now, as part of the translation system, each language that Moodle supports defines the correct decimal point and thousand separator characters. Therefore I propose that numerical questions should work like this.
  1. Take the student's answer and strip off all leading and trailing whitespace.
  2. Remove all occurrences of get_string('thousandssep') that are followed by a digit.
  3. Replace get_string('decsep') with '.'.
  4. Use a regular expression to separate the number part from the unit, based on the formal spec for PHP floating point numbers.
  5. Trim any leading or trailing whitespace from the unit.
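
The five steps above can be sketched in Python as follows. This is an illustration only: decsep and thousandssep are passed in as plain parameters standing in for the get_string() lookups, and the regular expression is loosely modelled on PHP's floating point syntax rather than copied from any real code.

```python
import re

# Assumed pattern: optional sign, digits with optional fraction, optional exponent,
# then everything that remains is treated as the unit.
NUMBER = re.compile(
    r'^([+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?)(.*)$', re.DOTALL
)

def parse_answer(answer, decsep='.', thousandssep=','):
    """Return (number, unit), or None if no leading number is found."""
    s = answer.strip()                                         # step 1
    if thousandssep:
        s = re.sub(re.escape(thousandssep) + r'(?=\d)', '', s)  # step 2
    if decsep != '.':
        s = s.replace(decsep, '.')                             # step 3
    m = NUMBER.match(s)                                        # step 4
    if m is None:
        return None
    return float(m.group(1)), m.group(2).strip()               # step 5

print(parse_answer(' -157,000 '))                                 # (-157000.0, '')
print(parse_answer('1.234,5 cm', decsep=',', thousandssep='.'))   # (1234.5, 'cm')
```

Note that with the English separators an answer like '1,5' is read as 15 under these rules, which is the trade-off accepted in conclusion 1 above.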
Does that seem sensible? Can anyone foresee any problems with this approach?

Drat! yes. I can already. At the point where you try to regrade a question, you don't know the current language, so get_string('thousandssep') won't return the right thing. Ah! so I just need to store the user's language when they start the question.

In reply to Tim Hunt

Re: Numerical question answer processing.

by Tim Hunt -
Hmm. Should we try supporting input like 3x10^8, 3*10^8, 3*10**8, ...?
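
For what it is worth, recognising those spellings need not be much code. The Python sketch below is one possible approach, not anything from the question engine: it rewrites "mantissa x 10 ^ exponent" into E notation and lets float() do the rest. The name parse_scientific and the exact set of accepted multiplication signs are assumptions.

```python
import re

# Mantissa, a multiplication sign (x, X, * or ×), the base 10, then ^ or **.
POWER_OF_TEN = re.compile(
    r'^\s*([+-]?(?:\d+(?:\.\d*)?|\.\d+))\s*[xX*×]\s*10\s*(?:\^|\*\*)\s*([+-]?\d+)\s*$'
)

def parse_scientific(answer):
    """Return a float for '3x10^8'-style input or plain numbers, else None."""
    m = POWER_OF_TEN.match(answer)
    if m:
        # Rewrite as "<mantissa>e<exponent>" so float() does the conversion.
        return float(m.group(1) + 'e' + m.group(2))
    try:
        # Caveat: Python's float() also accepts strings such as 'inf' and 'nan'.
        return float(answer)
    except ValueError:
        return None

print(parse_scientific('3x10^8'))      # 300000000.0
print(parse_scientific('3*10**8'))     # 300000000.0
print(parse_scientific('1.2*10**-3'))  # 0.0012
```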

Also, I am now thinking we should only process thousand separators that come before any decimal point.
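
A minimal Python sketch of that refinement, assuming the separators arrive as plain parameters (the name strip_thousands is made up for illustration): split the string at the first decimal separator and only touch the part in front of it.

```python
import re

def strip_thousands(s, decsep='.', thousandssep=','):
    """Remove thousand separators, but only in front of the first decimal separator."""
    head, sep, tail = s.partition(decsep)
    # Same "followed by a digit" rule as before, applied to the integer part only.
    head = re.sub(re.escape(thousandssep) + r'(?=\d)', '', head)
    return head + sep + tail

print(strip_thousands('1,234,567.89'))  # 1234567.89
print(strip_thousands('0.123,456'))     # 0.123,456
```

The second call shows the point of the change: a ',' after the decimal point is left alone, so the answer can later be rejected rather than silently read as 0.123456.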
In reply to Tim Hunt

Re: Numerical question answer processing.

by Tim Hunt -
The problem with trying to support units like £ is that some people write £-100, and others write -£100. Hmm.
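
One way out, sketched in Python under the assumption that the set of prefix units is known in advance, is to allow the sign on either side of the unit but not on both. The pattern and the name parse_prefixed are illustrative only.

```python
import re

# Optional sign, a prefix unit, optional sign, then an unsigned number.
PREFIXED = re.compile(
    r'^\s*([+-]?)\s*([£$€])\s*([+-]?)\s*(\d+(?:\.\d*)?|\.\d+)\s*$'
)

def parse_prefixed(answer):
    """Return (number, unit) for answers like '£-100' or '-£100', else None."""
    m = PREFIXED.match(answer)
    if m is None:
        return None
    outer, unit, inner, digits = m.groups()
    if outer and inner:
        return None  # reject a sign on both sides, e.g. '-£-100'
    sign = -1.0 if '-' in (outer, inner) else 1.0
    return sign * float(digits), unit

print(parse_prefixed('£-100'))  # (-100.0, '£')
print(parse_prefixed('-£100'))  # (-100.0, '£')
```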
In reply to Tim Hunt

Re: Numerical question answer processing.

by Joseph Rézeau -
Excel writes this as -£100. I would suggest having a look at how Excel (or OOCalc) write those figures, according to the various languages.
In reply to Joseph Rézeau

Re: Numerical question answer processing.

by Tim Hunt -
Well, anyway, ignoring that problem for a moment, I think I have some code that works with just sensible units like m and cm.

The code is http://github.com/timhunt/Moodle-Question-Engine-2/blob/new_qe/question/type/numerical/questiontype.php#L442 and the unit tests are in http://github.com/timhunt/Moodle-Question-Engine-2/blob/new_qe/question/type/numerical/simpletest/testanswerprocessor.php.

If anyone feels like code-reviewing that, or suggesting some more test cases, I would be most grateful.
In reply to Tim Hunt

Re: Numerical question answer processing.

by Jeff Forssell -
Loved your analysis of student answers! Thanks for that.

And my heartfelt sympathy for entering this formidable jungle of formats! I've been trembling at the edge for ages. :-(

I have a bunch of my hopes for various recognizable formats in Numerical questions which I've described here: http://docs.moodle.org/en/Numerical_question_units_and_intervals

I support recognizing 1.2*10**-3 1,2*10^+02 as numerical answers (even though I encourage students to use the compact 1.2E+2 style which works directly).

I understand wanting to avoid the thousand comma if there is no decimal. And personally I don't have that problem in Sweden. But it seems a shame to give wrong for 152,000 if 152000 is right. Sometimes I wonder about built-in JS feedback for things that aren't recognized as "numerical" by our engine that pops up a description of allowed formats.

Even though I don't need it often I think allowing rational numbers like "2 3/4", where the space is needed, would be good to support (as does Excel).
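
As a rough illustration of why the space matters, here is a small Python sketch (the name parse_fraction is hypothetical) that accepts an optional whole part, a space, and then numerator/denominator.

```python
import re
from fractions import Fraction

# Optional sign, optional "whole<space>", then "numerator/denominator".
MIXED = re.compile(r'^\s*([+-]?)(?:(\d+)\s+)?(\d+)\s*/\s*(\d+)\s*$')

def parse_fraction(answer):
    """Return an exact Fraction for '2 3/4' or '1/3' style input, else None."""
    m = MIXED.match(answer)
    if m is None:
        return None
    sign, whole, num, den = m.groups()
    if int(den) == 0:
        return None  # avoid division by zero
    value = Fraction(int(whole or 0)) + Fraction(int(num), int(den))
    return -value if sign == '-' else value

print(parse_fraction('2 3/4'))  # 11/4
print(parse_fraction('23/4'))   # 23/4
```

Without the space, "23/4" is a different number, so whitespace inside the answer cannot simply be stripped if this format is to be supported.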


In reply to Tim Hunt

Re: Numerical question answer processing.

by Pierre Pichet -
After the weekend, home should stay the most peaceful place on earth :-)

Pierre
In reply to Tim Hunt

Re: Numerical question answer processing.

by Pierre Pichet -
I downloaded the latest code and did a clean reinstall, but I cannot create questions?
I could on the December installation...

Pierre

P.S. I had to delete opaque directory to complete the installation.
In reply to Pierre Pichet

Re: Numerical question answer processing.

by Pierre Pichet -

Fatal error: Call to private question_state::__construct() from context 'question_state' in C:\moodle\moodle\site\tim\question\engine\lib.php on line 268
When trying to create questions (in a course) or even when trying to run the test in question/type/numerical

change private to public
public function __construct() {
}
Seems to work...

Pierre
In reply to Pierre Pichet

Re: Numerical question answer processing.

by Pierre Pichet -
This morning (the last posts were written at 1:00 AM local time), I traced the preview problems to new database tables that I had to create.
As this is the weekend ;-), more news tomorrow or Tuesday.

Pierre

In reply to Pierre Pichet

Re: Numerical question answer processing.

by Pierre Pichet -
Curiosity is a strong driving force ;-)
Database set up, first test OK.

Pierre
In reply to Pierre Pichet

Re: Numerical question answer processing.

by Pierre Pichet -
Line 372 of engine/datalib:
delete_records_select('question_usages quba', $where);
gives the SQL
DELETE FROM mdl_question_usages quba WHERE quba.id = 14

which MySQL does not understand (I also tried it in MySQL Query Browser).

Pierre


In reply to Pierre Pichet

Re: Numerical question answer processing.

by Tim Hunt -
Grrr! I had forgotten that bit of MySQL stupidity.

I am, of course, using Postgres for development. I ought to test on MySQL myself some time.

Anyway, thanks for the bug report. I think I have now fixed it. Please can you test. Thanks.
In reply to Tim Hunt

Re: Numerical question answer processing.

by Pierre Pichet -
I downloaded it and the test is OK.
The attempts are deleted correctly.
I tested your unit detection by setting a unit E (i.e. Einstein ;-) )
unit E-1 multiplier 10
unit E+1 multiplier 1
unit E1 multiplier 1

the good numerical answer was 65

0.65E+2 was OK
6.5E1 was bad as E1 multiplier is 1
65E1 was OK as E1 multiplier is 1
6.5E+1 was bad as E1 multiplier is 1
65E+1 was OK as E1 multiplier is 1
6.5E-1 was OK as E1 multiplier is 10
6500E-2 was OK, as it should be, using E notation.

All this works correctly with the code: if a unit is recognized, it is substituted by its multiplier before the answer is graded as a number.
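
The behaviour observed above can be mimicked with a few lines of Python. This is only a model of what the test results suggest (a recognised unit suffix wins over E notation, and the number is multiplied by the unit's multiplier); it is not the engine's code, and the name apply_units is invented.

```python
# Unit table copied from the test above: unit -> multiplier.
UNITS = {'E-1': 10.0, 'E+1': 1.0, 'E1': 1.0}

def apply_units(answer, units=UNITS):
    """Return the numeric value after unit substitution, or None if unparseable."""
    s = answer.strip()
    # Try longer units first so 'E+1' is checked before any shorter suffix.
    for unit in sorted(units, key=len, reverse=True):
        if s.endswith(unit):
            try:
                return float(s[:-len(unit)]) * units[unit]
            except ValueError:
                return None
    try:
        return float(s)  # no unit recognised: plain number, E notation included
    except ValueError:
        return None

for response in ['0.65E+2', '6.5E1', '65E1', '6.5E+1', '65E+1', '6.5E-1', '6500E-2']:
    print(response, apply_units(response) == 65.0)
```

Run against the right answer of 65, this prints True, False, True, False, True, True, True, matching the OK/bad results listed above.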

All tests were done in preview.
Incidentally, the bug in engine/lib.php line 262 is real; the function should be set to public to work with preview:

public function __construct() {
}

Pierre



In reply to Pierre Pichet

Re: Numerical question answer processing.

by Tim Hunt -
Pierre. That is just an evil thing to do. Torturing the poor code I wrote ;-)

Still, interesting to find out what happens when you try it.

However, for now I am not worried if units like E+/-1 work. I did consider it briefly, but I don't think there are any real units like that.

Also, I am sure that the constructor of the question_state class should be private. What error are you getting?
In reply to Tim Hunt

Re: Numerical question answer processing.

by Pierre Pichet -
Just click on preview of a numerical question:
Fatal error: Call to private question_state::__construct() from context 'question_state' in C:\moodle\moodle\site\tim\question\engine\lib.php on line 269
Change it to public and everything works fine.
I just copied your entire code, except for calculated.

Pierre


This is not related to calculated which does not work for another reason.
In reply to Pierre Pichet

Re: Numerical question answer processing.

by Tim Hunt -
That error is weird. It is saying it is trying to call question_state::__construct from within the question_state class. That should be allowed by private.

Which version of PHP are you using?

In reply to Tim Hunt

Re: Numerical question answer processing.

by Pierre Pichet -

PHP Version 5.2.4

Pierre

In reply to Pierre Pichet

Re: Numerical question answer processing.

by Pierre Pichet -
Incidentally, looking at the 2.0 requirements, I need to migrate to a newer version ;-)

Pierre
In reply to Pierre Pichet

Re: Numerical question answer processing.

by Tim Hunt -
Ah, there is a bug in my code, and there is a bug in more recent versions of PHP that means that my buggy code works for me, even though it shouldn't! I reported it as http://bugs.php.net/bug.php?id=50911

The correct fix is to change the constructor to protected. I'm about to commit that fix.
In reply to Tim Hunt

Re: Numerical question answer processing.

by Pierre Pichet -
"I don't think there are any real units like that"
I agree, but units can be quite tricky.
Just an example: the units of the universal gas constant (from Wikipedia) ;-)


Values of R        Units
8,314472           J·mol⁻¹·K⁻¹
0,0820578437       L·atm·K⁻¹·mol⁻¹
8,20574587×10⁻⁵    m³·atm·K⁻¹·mol⁻¹
62,3637            L·Torr·K⁻¹·mol⁻¹
83,14472           L·mbar·K⁻¹·mol⁻¹
1,987              cal·K⁻¹·mol⁻¹
6,132439833        lbf·ft·K⁻¹·g·mol⁻¹
10,7316            ft³·psi·°R⁻¹·lb-mol⁻¹
8,63×10⁻⁵          eV·K⁻¹·atom⁻¹
0,7302             ft³·atm·°R⁻¹·lb-mole⁻¹


cm and m are simple ones, but just think of area (cm²) or volume (cm³).
How can a student enter these correctly?

This is why my proposal for 2.0 includes a multichoice option for units, where the teacher can edit these units in correct HTML format.
Ideally the unit field should be an HTML editor... or the instruction text could contain the complex formula with a letter as a substitute for the choice.

Pierre

P.S. Another case in chemistry, again from Wikipedia:

For a chemical reaction where substance A and B are reacting to produce C, the reaction rate has the form:

Reaction: A + B → C
\frac{d[C]}{dt} = k(T)[A]^{m}[B]^{n}

k(T) is the reaction rate constant that depends on temperature.

[C] is the concentration of substance C in moles per volume of solution assuming the reaction is taking place throughout the volume of the solution (for a reaction taking place at a boundary it would denote something like moles of C per area).

The exponents m and n are called orders and depend on the reaction mechanism. They can be determined experimentally.

A single-step reaction can also be written as

\frac{d[C]}{dt} = A e^{-\frac{E_a}{RT}} [A]^{m}[B]^{n}

E_a is the activation energy and R is the gas constant. Since at temperature T the molecules have energies according to a Boltzmann distribution, one can expect the proportion of collisions with energy greater than E_a to vary with e^{-E_a/RT}. A is the pre-exponential factor or frequency factor.

The Arrhenius equation gives the quantitative basis of the relationship between the activation energy and the reaction rate at which a reaction proceeds.

The units of the rate coefficient depend on the global order of reaction:

  • For order zero, the rate coefficient has units of mol·L⁻¹·s⁻¹
  • For order one, the rate coefficient has units of s⁻¹
  • For order two, the rate coefficient has units of L·mol⁻¹·s⁻¹
  • For order n, the rate coefficient has units of mol^(1-n)·L^(n-1)·s⁻¹
In reply to Tim Hunt

Re: Numerical question answer processing.

by Pierre Pichet -
You get a more complex analysis because you are trying to reproduce the number decoding used by spreadsheets or other software that needs to handle numbers with regionalization.
The task is also greater because units could also be present in the string, which is the case in the current 1.9 code on which you are working.

In 2.0 units should be separated from the number using 2 text input elements, which will simplify the task (MDL-20296).


Pierre


P.S. Is the decoding used in spreadsheets copyrighted?

In reply to Tim Hunt

Re: Numerical question answer processing.

by Pierre Pichet -
As you are re-engineering the question handling to give robustness and flexibility, we should consider that teachers often ask for as much flexibility in e-learning as they have in a classroom.
Some want strict number rendering and others just the right numerical value.

In my proposal for units handling, I tried to address all the requests I have seen in the forum.

Could you build a first solution where we can afterward add supplementary features like
  • fraction as 1/3,
  • various number formats chosen by the teacher.
  • penalty applied to format like for units if not in the format asked by the teacher
  • etc.
?

Pierre
In reply to Pierre Pichet

Re: Numerical question answer processing.

by Ray Lawrence -
Yes, rather than trying to program this for all eventualities could the format be built into the construction of the question i.e. selected by the teacher?

The default option(s) presented might be driven by the language in use for the course (although I suppose the rest need to be available too).

Another neat option (although probably too much to ask) would be to give the teacher the option to add a message in the question specifying the required format for the student's entry into the answer field.
In reply to Ray Lawrence

Re: Numerical question answer processing.

by Pierre Pichet -
"Another neat option (although probably too much to ask) would be to give the teacher the option to add a message in the question specifying the required format for the student's entry into the answer field."

The current code in Moodle HEAD (before the engine rebuilding) offers such an option.
See MDL-20296

Pierre

In reply to Tim Hunt

Re: Numerical question answer processing.

by Pierre Pichet -
Just a "Saturday"-like comment ;-)

Your proposal seems OK at first sight.


"We should support units like £ in future, that comes before the number."

This is why I proposed to be able to set the unit before or after the number.
This will allow questions where the responses can be given in $ or £ with different multiplier.

Pierre


P.S. Can your solution handle a US student who follows an Open University course?



In reply to Tim Hunt

Re: Numerical question answer processing.

by Phil Butcher -
Tim,
 
1    splitting the number from the unit = good
2    how to handle decimal and thousands separators in numbers = good (previous method v. bad)
3    whether to handle 1x10^24? For simplicity I'd stick with 1e24 and get students to use it. We only ask for scientific format with 'x10' and a real superscript when we're testing this as a learning outcome. Of course I would like 1x10²⁴ (i.e. real superscripts) but I suspect implementing superscript input is a different question type.
4    analyse units separately = good
5    I haven't looked at how complex the units go.
 
But my over-riding comment would be that this is such a major change that I've always thought that we'd need a new question type. The reason is that the questions our academics write fall into two categories:
    - require just a number and we specify the units the students should use
    - or require a number and units and we specify that the units must be provided and then we only mark as right if both are present and correct and we comment if one or the other is wrong. Once we ask for, and test, units then it does become necessary to allow for 1 kg or 1000 g but things rapidly become very complex.
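
To illustrate the second category, here is a hedged Python sketch of grading the number and the unit separately, so that 1 kg and 1000 g are both right and the feedback can say which part was wrong. The conversion table, function name and feedback strings are all invented for the example.

```python
# Hypothetical conversion table: unit -> factor that converts to kilograms.
TO_KG = {'kg': 1.0, 'g': 0.001}

def grade(number, unit, expected_kg, tolerance=1e-9):
    """Return (mark, feedback) for a response already split into number and unit."""
    if unit not in TO_KG:
        # Unknown or missing unit: see whether the bare number would have been
        # right in any accepted unit, so the feedback can point at the unit.
        number_ok = any(
            abs(number * factor - expected_kg) <= tolerance
            for factor in TO_KG.values()
        )
        if number_ok:
            return 0.0, 'The number is right, but the unit is missing or wrong.'
        return 0.0, 'Incorrect.'
    if abs(number * TO_KG[unit] - expected_kg) <= tolerance:
        return 1.0, 'Correct.'
    return 0.0, 'The number and unit together do not match the expected value.'

print(grade(1, 'kg', 1.0))    # (1.0, 'Correct.')
print(grade(1000, 'g', 1.0))  # (1.0, 'Correct.')
print(grade(1, 'lb', 1.0))    # (0.0, 'The number is right, but the unit is missing or wrong.')
```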
 
I believe that we hardly use the units at all as a multiplier.
 
So once the question engine rewrite is done for what is currently available there are further ideas I'd like to pursue.
 
Phil 
In reply to Phil Butcher

Re: Numerical question answer processing.

by Pierre Pichet -
Thanks for your comments.

In the current code you can
  1. use the units and a multiplying factor
  2. ignore units by just leaving the first unit field empty
    1. this option is used in cloze questions
The proposal already in HEAD, which will migrate to the new engine, just extends the first option, i.e.
  1. use the units and a multiplying factor
    1. the student writes the unit, which can be either
      1. at left of the number
      2. at right of the number
    2. the teacher prepares a list of units to choose from as a multichoice subquestion
      1. at left of the number
      2. at right of the number
  2. ignore units by just leaving the first unit field empty
    1. this option will continue to be used in cloze questions
The new options are set through a more complex question-creation interface inside the same qtype.

Having a single qtype allows the teacher to create a copy of the question and change the unit handling at will.
However, there is no visible sign (such as a question type icon) of the unit handling in the question bank interface.
It is left to the teacher to devise a personal way of recording the unit handling, for example in the question name.

Pierre


P.S. the old questions remain valid