Numerical question answer processing.


Tim Hunt -
Number of replies: 29
In my question engine re-write, I have just got to the bit where I have to re-implement unit processing in numerical questions. Well, I did not think I would have to re-implement it, until I looked at the existing code.

The current code does roughly this:
  1. Remove all white-space from the student's answer.
  2. If the student's answer contains a single ',' and no '.', convert the ',' to a '.'. Otherwise, remove all ','s.
  3. Use a regular expression to check that the answer is a valid number, followed by an optional unit.
I don't like this for several reasons.
  • Stripping all whitespace may corrupt some units - though I can't quite think of a specific example at the moment.
  • The bit with ',' is a valiant attempt to support Europeans who like to write 3,142 instead of 3.142 (and we must support that). However, I think it is terrible that '3,000,000' is treated as 3000000; '3,000' is treated as 3; '3,000.' is treated as 3000.
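
To make the complaint concrete, here is a minimal Python sketch of the three legacy steps as described above. It is an illustration only, not the actual Moodle PHP code, and the function name legacy_normalise is hypothetical.

```python
import re

def legacy_normalise(answer: str) -> str:
    """Mimic the legacy steps described in the post (illustrative only)."""
    s = re.sub(r"\s+", "", answer)            # 1. remove all white-space
    if s.count(",") == 1 and "." not in s:     # 2. a lone ',' is read as a decimal point
        s = s.replace(",", ".")
    else:                                      #    otherwise every ',' is discarded
        s = s.replace(",", "")
    return s

for raw in ("3,000,000", "3,000", "3,000."):
    print(raw, "->", float(legacy_normalise(raw)))
```

The three inputs come out as 3000000.0, 3.0 and 3000.0, which is the inconsistency the post objects to.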
This was getting rather long, let me split it into separate posts. More in a minute ...


In reply to Tim Hunt

Re: Numerical question answer processing.

Tim Hunt -
So, then I started thinking about how it should work, but it is tricky, so I decided to look at what answers our students had actually given in response to numerical questions in the past. Therefore I ran the following query:

SELECT qst.answer, COUNT(1) as num
FROM mdl_question_states qst
JOIN mdl_question q ON qst.question = q.id
WHERE q.qtype = 'numerical' AND qst.answer <> ''
GROUP BY qst.answer
ORDER BY num DESC

It turns out that we have had a total of 189514 responses (because of things like adaptive mode, there will be several responses stored per question) which group into 6123 distinct responses. (Note that this whole analysis ignores numerical questions that were selected randomly.)

The most popular response is 0, which was recorded 3868 times!

The huge majority of responses are given without using , as a thousand separator. (I am in the UK, remember.)

The first negative answer is -157000, recorded 2619 times.

The first weird entry is 'none', given 1639 times. 'None' scores 514, 'None.' 68 times, 'NONE' 35 times.

The first answer with a decimal point is 24.8, given 851 times, so for some reason most Numerical questions at the OU have integer answers!

The first answer with a unit is 71%, given 174 times. (71 without a unit given 222 times). % is by far the most common unit entered.

The next most popular unit is £, starting with £2.50 recorded 22 times, but there are plenty more that start £. Of course, Moodle cannot cope with units like that.

'pass' was given 91 times. '?' 74 times, 'xxxxxx' 50 times, 'dunno' 40 times, 'xxxxx' 40 times, 'xxxxxxx' 36 times, '-' 32 times.

'a' was given 63 times. I have no idea what is going on there. That was given in answer to a lot of different questions, and having looked at them there is nothing to lead a student to enter a. (Ah! perhaps the student was just putting in rubbish to see the feedback.)

The first response that uses , as a thousand separator is -157,000 54 times, then 157,000 40 times. (Clear what is going on there!) Presumably those ended up being marked wrong, even though the student was right. Oh dear.

'one' was recorded 39 times, some other written out numbers were also given, but rarely.

'completed' 34 times, 'Completed' 28 times.

'i' 32 times - looking at some questions, I don't think that is anything to do with complex numbers.

'nil' 28 times

The first time there is an explicit '+' sign is +100 given 16 times.

Down in the tail, among the rare answers are all sorts of strange things like 'half past two'; '2:30' also features. The first rude word, which I will not repeat, was recorded 12 times.

The first time you see a 'real' unit it is among those answers given 12 times. It is 'Angstroms', by the way.

There are a few places where the student has written something like '105000 debit'. I guess seeing that, the teacher could decide to make that a unit with multiplier -1.

Even further down, you start to see people trying to write negative numbers like (1234). Bloody accountants! You also start seeing things like '4.32 million'.
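
As an illustration of the accountants' notation, here is a minimal Python sketch that rewrites '(1234)' as '-1234' before any further parsing. The function name and the choice to leave everything else untouched are assumptions, not anything in the existing code.

```python
import re

# Unsigned number wrapped in parentheses, e.g. "(1234)" or "( 12.5 )".
PAREN_RE = re.compile(r"^\(\s*([0-9]+(?:\.[0-9]+)?)\s*\)$")

def unwrap_accounting_negative(raw: str) -> str:
    """Return '-1234' for '(1234)'; return other input stripped but unchanged."""
    s = raw.strip()
    m = PAREN_RE.match(s)
    return "-" + m.group(1) if m else s
```

Input that is already signed, such as '(-5)', is deliberately left alone so that a later step can reject it.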

The first time you see a ',' being used as a decimal point by OU students is '1,42' given only 4 times.

I will add that, having looked at some of these questions, a lot of them have the warning

Enter your answer to the nearest whole £ into the blank space provided below.

[Note: Do not enter commas or spaces or a £ sign within the number you enter.]

at the top, which I would guess is affecting student behaviour.
In reply to Tim Hunt

Re: Numerical question answer processing.

Tim Hunt -
OK, so what do I conclude from all that?
  1. Supporting ',' as a decimal point is not worth doing among UK students.
  2. Supporting ',' as a thousand separator is not essential, but probably is worth it.
  3. Stripping spaces from the answer is irrelevant. Probably best not to.
  4. We should support units like £ in future, that come before the number.
  5. We should consider supporting (1234) meaning -1234 in future.
Of course, it is vital to support ',' as a decimal point for languages that need it.


Now, as part of the translation system, each language that Moodle supports defines the correct decimal point and thousand separator characters. Therefore I propose that numerical questions should work like this.
  1. Take the student's answer and strip off all leading and trailing whitespace.
  2. Remove all occurrences of get_string('thousandssep') that are followed by a digit.
  3. Replace get_string('decsep') with '.'.
  4. Use a regular expression to separate the number part from the unit, based on the formal spec for PHP floating point numbers.
  5. Trim any leading or trailing whitespace from the unit.
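
The five steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the decsep and thousandssep parameters stand in for get_string('decsep') and get_string('thousandssep'), the function name parse_answer is hypothetical, and the regular expression is one reading of PHP's floating point literal syntax with an optional sign added.

```python
import re

# Modelled on PHP's float literal grammar (digits, optional fraction, optional
# exponent), plus an optional leading sign. Whatever follows is the unit.
NUMBER_RE = re.compile(
    r"^([+-]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?)(.*)$"
)

def parse_answer(raw, decsep=".", thousandssep=","):
    """Return (value, unit), or None if the answer does not start with a number."""
    s = raw.strip()                                              # step 1
    if thousandssep:
        # step 2: drop separators only when a digit follows them
        s = re.sub(re.escape(thousandssep) + r"(?=[0-9])", "", s)
    if decsep != ".":
        s = s.replace(decsep, ".")                               # step 3
    m = NUMBER_RE.match(s)                                       # step 4
    if m is None:
        return None
    return float(m.group(1)), m.group(2).strip()                 # step 5

print(parse_answer(" -157,000 "))
print(parse_answer("1.234,5 cm", decsep=",", thousandssep="."))
```

Passing the separators in as arguments, rather than looking them up inside the function, also makes it possible to supply a stored language's separators at regrade time.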
Does that seem sensible? Can anyone foresee any problems with this approach?

Drat! yes. I can already. At the point where you try to regrade a question, you don't know the current language, so get_string('thousandssep') won't return the right thing. Ah! so I just need to store the user's language when they start the question.

In reply to Tim Hunt

Re: Numerical question answer processing.

Tim Hunt -
Hmm. Should we try supporting input like 3x10^8, 3*10^8, 3*10**8, ...?

Also, I am now thinking we should only process thousand separators that come before any decimal point.
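
One way to restrict thousand-separator removal to the part before the decimal point is sketched below in Python. The function name and parameters are hypothetical, and it assumes the decimal separator appears at most once in a valid answer.

```python
import re

def strip_thousands(s, decsep=".", thousandssep=","):
    """Remove thousand separators only from the part before the decimal separator."""
    head, sep, tail = s.partition(decsep)     # split at the first decimal separator
    head = re.sub(re.escape(thousandssep) + r"(?=[0-9])", "", head)
    return head + sep + tail                  # the fractional part is left untouched
```

With this, '1,234,567.89' becomes '1234567.89', while the stray comma in '1.5,000' survives and can be rejected (or treated as a unit) by the later number/unit split.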
In reply to Tim Hunt

Re: Numerical question answer processing.

Tim Hunt -
The problem with trying to support units like £ is that some people write £-100, and others write -£100. Hmm.
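
A minimal Python sketch of one way to accept the sign on either side of a leading currency symbol. The set of symbols, the function name, and the decision to reject two signs are all assumptions for illustration.

```python
import re

# Optional sign, currency symbol, optional sign, then an unsigned number.
PREFIX_RE = re.compile(r"^([+-]?)\s*([£$€])\s*([+-]?)\s*([0-9]+(?:\.[0-9]+)?)$")

def parse_prefixed(raw):
    """Return (value, unit) for input like '£-100' or '-£100', else None."""
    m = PREFIX_RE.match(raw.strip())
    if m is None:
        return None
    outer, unit, inner, digits = m.groups()
    if outer and inner:
        return None                      # two signs, e.g. '-£-100', is ambiguous
    sign = -1.0 if "-" in (outer, inner) else 1.0
    return sign * float(digits), unit
```

Both '£-100' and '-£100' then yield the same value, so the marking does not depend on which convention the student follows.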
In reply to Tim Hunt

Re: Numerical question answer processing.

Joseph Rézeau -
Excel writes this as -£100. I would suggest having a look at how Excel (or OOCalc) write those figures, according to the various languages.
In reply to Joseph Rézeau

Re: Numerical question answer processing.

Tim Hunt -
Well, anyway, ignoring that problem for a moment, I think I have some code that works with just sensible units like m and cm.

The code is http://github.com/timhunt/Moodle-Question-Engine-2/blob/new_qe/question/type/numerical/questiontype.php#L442 and the unit tests are in http://github.com/timhunt/Moodle-Question-Engine-2/blob/new_qe/question/type/numerical/simpletest/testanswerprocessor.php.

If anyone feels like code-reviewing that, or suggesting some more test cases, I would be most grateful.
In reply to Tim Hunt

Re: Numerical question answer processing.

Jeff Forssell -
Loved your analysis of student answers! Thanks for that.

And my heartfelt sympathy for entering this formidable jungle of formats! I've been trembling at the edge for ages. :-(

I have a bunch of my hopes for various recognizable formats in Numerical questions which I've described here: http://docs.moodle.org/en/Numerical_question_units_and_intervals

I support recognizing 1.2*10**-3 and 1,2*10^+02 as numerical answers (even though I encourage students to use the compact 1.2E+2 style which works directly).
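
A minimal Python sketch of how such forms could be rewritten into the compact E style before the normal number parsing runs. The function name and the accepted multiplication signs are assumptions. The decimal separator is left untouched, so this step can run before or after separator handling.

```python
import re

# Matches "x10^8", "*10^8", "*10**8" (and "×10^8"), with optional spaces and sign.
SCI_RE = re.compile(r"\s*[xX×*]\s*10\s*(?:\^|\*\*)\s*([+-]?[0-9]+)")

def to_e_notation(s: str) -> str:
    """Rewrite the first 'times ten to the power' form in s as an 'e' exponent."""
    return SCI_RE.sub(lambda m: "e" + m.group(1), s, count=1)

print(to_e_notation("3x10^8"))        # 3e8
print(to_e_notation("1.2*10**-3"))    # 1.2e-3
```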

I understand wanting to avoid the thousand comma if there is no decimal. And personally I don't have that problem in Sweden. But it seems a shame to give wrong for 152,000 if 152000 is right. Sometimes I wonder about built-in JS feedback for things that aren't recognized as "numerical" by our engine that pops up a description of allowed formats.

Even though I don't need it often I think allowing rational numbers like "2 3/4", where the space is needed, would be good to support (as does Excel).
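
A minimal Python sketch of parsing such mixed fractions with the standard library's fractions module. The function name is hypothetical, and applying the sign to the whole mixed number is an assumption.

```python
import re
from fractions import Fraction

# Optional sign, optional whole part followed by white-space, then num/den.
MIXED_RE = re.compile(r"^([+-]?)(?:([0-9]+)\s+)?([0-9]+)\s*/\s*([0-9]+)$")

def parse_mixed_fraction(raw):
    """Return a Fraction for input like '2 3/4' or '3/4', else None."""
    m = MIXED_RE.match(raw.strip())
    if m is None or int(m.group(4)) == 0:
        return None                       # not a fraction, or zero denominator
    sign, whole, num, den = m.groups()
    value = Fraction(int(whole or 0)) + Fraction(int(num), int(den))
    return -value if sign == "-" else value
```

The space is what separates '2 3/4' (two and three quarters) from '23/4', which is why stripping all white-space up front would break this format.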


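In reply to Tim Hunt

This forum post has been removed

The content of this forum post has been removed and can no longer be accessed.
In reply to Tim Hunt

This forum post has been removed

The content of this forum post has been removed and can no longer be accessed.
In reply to Deleted user

This forum post has been removed

The content of this forum post has been removed and can no longer be accessed.
In reply to Deleted user

This forum post has been removed

The content of this forum post has been removed and can no longer be accessed.
In reply to Deleted user

This forum post has been removed

The content of this forum post has been removed and can no longer be accessed.
In reply to Deleted user

This forum post has been removed

The content of this forum post has been removed and can no longer be accessed.
In reply to Deleted user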

Re: Numerical question answer processing.

Tim Hunt -
Grrr! I had forgotten that bit of MySQL stupidity.

I am, of course, using Postgres for development. I ought to test on MySQL myself some time.

Anyway, thanks for the bug report. I think I have now fixed it. Please can you test. Thanks.
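In reply to Tim Hunt

This forum post has been removed

The content of this forum post has been removed and can no longer be accessed.
In reply to Deleted user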

Re: Numerical question answer processing.

Tim Hunt -
Pierre. That is just an evil thing to do. Torturing the poor code I wrote ;-)

Still, interesting to find out what happens when you try it.

However, for now I am not worried if units like E+/-1 work. I did consider it briefly, but I don't think there are any real units like that.

Also, I am sure that the constructor of the question_state class should be private. What error are you getting?
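In reply to Tim Hunt

This forum post has been removed

The content of this forum post has been removed and can no longer be accessed.
In reply to Deleted user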

Re: Numerical question answer processing.

Tim Hunt -
That error is weird. It is saying it is trying to call question_state::__construct from within the question_state class. That should be allowed by private.

Which version of PHP are you using?

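In reply to Tim Hunt

This forum post has been removed

The content of this forum post has been removed and can no longer be accessed.
In reply to Deleted user

This forum post has been removed

The content of this forum post has been removed and can no longer be accessed.
In reply to Deleted user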

Re: Numerical question answer processing.

Tim Hunt -
Ah, there is a bug in my code, and there is a bug in more recent versions of PHP that means that my buggy code works for me, even though it shouldn't! I reported it as http://bugs.php.net/bug.php?id=50911

The correct fix is to change the constructor to protected. I'm about to commit that fix.
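In reply to Tim Hunt

This forum post has been removed

The content of this forum post has been removed and can no longer be accessed.
In reply to Tim Hunt

This forum post has been removed

The content of this forum post has been removed and can no longer be accessed.
In reply to Tim Hunt

This forum post has been removed

The content of this forum post has been removed and can no longer be accessed.
In reply to Deleted user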

Re: Numerical question answer processing.

Ray Lawrence -
Yes, rather than trying to program this for all eventualities could the format be built into the construction of the question i.e. selected by the teacher?

The default option(s) presented might be driven by the language in use for the course (although I suppose the rest need to be available too).

Another neat option (although probably too much to ask) would be to give the teacher the option to add a message in the question specifying the required format for the student's entry into the answer field.
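In reply to Ray Lawrence

This forum post has been removed

The content of this forum post has been removed and can no longer be accessed.
In reply to Tim Hunt

This forum post has been removed

The content of this forum post has been removed and can no longer be accessed.
In reply to Tim Hunt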

Re: Numerical question answer processing.

Phil Butcher -
Tim,
 
1    splitting the number from the unit = good
2    how to handle decimal and thousands separators in numbers = good (previous method v. bad)
3    whether to handle 1x10^24? For simplicity I'd stick with 1e24 and get students to use it. We only ask for scientific format with 'x10' and a real superscript when we're testing this as a learning outcome. Of course I would like 1x10²⁴ (i.e. real superscripts) but I suspect implementing superscript input is a different question type.
4    analyse units separately = good
5    I haven't looked at how complex the units go.
 
But my over-riding comment would be that this is such a major change that I've always thought we'd need a new question type. The reason is that the questions our academics write fall into two categories
    - require just a number and we specify the units the students should use
    - or require a number and units and we specify that the units must be provided and then we only mark as right if both are present and correct and we comment if one or the other is wrong. Once we ask for, and test, units then it does become necessary to allow for 1 kg or 1000 g but things rapidly become very complex.
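
To illustrate the second category, here is a minimal Python sketch that marks the number and the unit separately and comments on whichever is wrong. The unit table, tolerance and feedback strings are hypothetical, chosen only to show the 1 kg versus 1000 g case.

```python
# Hypothetical multipliers that convert each accepted unit to the base unit (kg).
UNITS = {"kg": 1.0, "g": 0.001}

def grade(value, unit, expected, tolerance=1e-9, units=UNITS):
    """Return (is_correct, feedback) for a response that must include a unit."""
    if unit == "":
        return False, "You must give a unit with your answer."
    if unit not in units:
        return False, "The unit is not one that is accepted here."
    if abs(value * units[unit] - expected) <= tolerance:
        return True, "Correct."
    return False, "The unit is acceptable but the number is wrong."

print(grade(1, "kg", 1.0))
print(grade(1000, "g", 1.0))
print(grade(1, "g", 1.0))
```

Even this tiny table shows where the complexity comes from: every accepted unit needs a multiplier, and the feedback has to distinguish a wrong number from a wrong or missing unit.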
 
I believe that we hardly use the units at all as a multiplier.
 
So once the question engine rewrite is done for what is currently available there are further ideas I'd like to pursue.
 
Phil 
In reply to Phil Butcher

This forum post has been removed

The content of this forum post has been removed and can no longer be accessed.