Testing CBM

by Isabelle Langeveld -
Number of replies: 28

I would really like to test the Certainty Based Marking option in the 2.1 Quiz Module. How do I go about that? Do I have to download something? I am totally new to testing matters, but I am very curious about the way CBM is going to be handled. I want to compare it to Tony GM's code, which I have already installed.

In reply to Isabelle Langeveld

Re: Testing CBM

by Jean-Michel Védrine -

Hello Isabelle,

If you only want to test CBM, you can go to http://qa.moodle.net which is running the latest code. You will not need to install anything, so it's easy, but be warned that any work you do will be erased after a while: every hour (on the hour) the database and files are reset.

I am sure Tim is very interested in comments from users familiar with CBM.

In reply to Jean-Michel Védrine

Re: Testing CBM

by Tim Florian -

Where is it on the site? I took the quizzes and did not see CBM questions.

In reply to Tim Florian

Re: Testing CBM

by Jean-Michel Védrine -

You need to log in first as a teacher and create a quiz with CBM (you don't need to add many questions); after that you can log in as a student and see the student side.

CBM is part of what is now called a behaviour. (Sorry, that's the developer doc; the user doc is not ready, I think. See also Tim's discussion on the same page about whether a behaviour is a quiz property or a question property.)

For instance at quiz creation you can choose "deferred feedback with CBM" or "immediate feedback with CBM".

You need to test different behaviours to see which suit your needs.

It has the added benefit that you can use all these behaviours with pre-existing questions :)

In reply to Jean-Michel Védrine

Re: Testing CBM

by Isabelle Langeveld -

This works absolutely great. The feedback is presented so clearly. And working with variable marks per question is not a problem at all.

But the difference with Tony's code is that you cannot raise your score by choosing the right answer AND choosing high certainty. If the mark is 1 and you choose high, you still only get 1 point, not 3. This way it is very easy to end up with a total score of -3, which I don't really like. I would have to set every question to a mark of 3 to adjust the score. I think you should get a bonus if you are right AND you are very certain.

Can you explain why you have chosen this scoring method? Or is it something I can influence when I set up a quiz?

I very much like the UI for 2.1. The colors make it so easy to read the feedback page. I also like the column to the left of the question with all the marking information.

I also like the possibility of adding Hints. This is new too! I am so pleased.

Tim, keep up the good work! Are you still releasing by the end of June?

In reply to Isabelle Langeveld

Re: Testing CBM

by Tim Hunt -

The constraint is really the Moodle gradebook. It wants 'maximum grade' to mean maximum grade. So if, with CBM, the maximum possible mark for a question is 3, then that is how it needs to be set up in the quiz: set each question to be marked out of 3.

Now, I realise that makes the psychology slightly different: if you are not very certain but get the question right, your mark is presented as 1 out of 3, as opposed to getting the question right with full certainty and earning three marks for a one-mark question. I don't know what effect, if any, that has.

I am a mathematician by training, and that part of me knows that the numbers are the same however you present them. Anyway, the answer to your question is that I did what I did to fit the general pattern that the rest of Moodle follows.

And the plan is still to release Moodle 2.1 by the end of this week. I had better get back to fixing bugs ;-)

In reply to Tim Hunt

Re: Testing CBM

by Isabelle Langeveld -

Indeed, the psychology is different but the numbers are the same. It will be a matter of explaining to the students how to interpret their grades.

If this is the 'general pattern' and everybody is okay with this method I will follow...

In reply to Jean-Michel Védrine

Re: Testing CBM

by Tony Gardner-Medwin -

This trial site is very useful! Thanks.

There are some issues with the implementation of CBM that I think will be important to deal with in order to avoid confusing people.

(1) A simple one - the criterion for choosing the highest certainty option should be P>80% not P>85% (i.e. if you estimate a probability of being correct >80%, then you will on average get a better mark by choosing C=3 rather than C=2). This is illustrated in graphs at http://www.ucl.ac.uk/lapt/ , or you can solve the equations if you prefer!
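In case it helps anyone checking this, the crossover falls straight out of the expected-mark equations; a quick sketch (just illustrating the arithmetic of the standard 1/2/3 and 0/-2/-6 scale, not code from Moodle or LAPT):

```python
# Expected CBM mark at each certainty level, as a function of the
# probability p that your answer is correct.
# Standard scale: 1/2/3 marks if right, 0/-2/-6 if wrong.
SCHEME = {1: (1, 0), 2: (2, -2), 3: (3, -6)}

def expected_mark(certainty, p):
    """Average mark at this certainty level if P(correct) = p."""
    right, wrong = SCHEME[certainty]
    return p * right + (1 - p) * wrong

def best_certainty(p):
    """The certainty level with the highest expected mark at probability p."""
    return max(SCHEME, key=lambda c: expected_mark(c, p))
```

Setting expected_mark(3, p) equal to expected_mark(2, p) gives 9p - 6 = 4p - 2, i.e. p = 0.8, so C=3 is the better bet above 80% (and the C=1/C=2 crossover comes out at p = 2/3).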

(2) Judging how likely you are to be correct (P, above) only makes sense if your response is being marked right or wrong. It doesn't make sense as a single judgment if there are several parts to your answer, as in MCQs with more than one response allowed, or in multiple matching Qs (as it happens, the first two examples on the trial site). Ideally (as in LAPT) multiple response Qs like these would ask for a certainty judgment for each separate response - you may be quite sure of the capital of France and very unsure about the capital of Estonia, or vice versa. Without introducing that complication I think it is probably best not to use CBM at all for this type of multiple response Q. I'm sure this could be managed easily in the code, though unfortunately I haven't yet got the 2.1 code to run on my Windows PC to take a look.

(3) I rather go along with Isabelle's concern about giving people .33, .67 or 1 mark instead of 1, 2, 3 if there are default settings (=1) for the grade on each question. It's important that the mark scheme be easily understood and transparent. In my Moodle implementation (see http://www.ucl.ac.uk/lapt/moodle19/moodle) I prefer to simply override any weight settings for individual Qs when CBM is switched on, and mark every response on the 1,2,3 / 0,-2,-6 scale. I label the response buttons C=1,2,3 to help with this, and the appropriate probabilities are shown if you hover over the buttons. It is much better (in my view) that a student should see "Mark = -2" and know immediately "yeah, wrong, and I did say I was fairly sure - why didn't I think?" instead of seeing "Mark = -1.33": "What's that? Oh, I see, I was wrong, and since it's weighted x2 and I said I was fairly sure, I would have got 2/3 of 2 if I was right - that's 1.33, but I wasn't right so it's -1.33. Got it. Now what was the question again?". Since CBM is most valued in formative assessment (self-tests), to help learning, it's very important to minimise any distractions from thinking about the Q&A and the lessons to be learned.

I haven't looked at the 2.1 gradebook tables yet in relation to CBM. If people go along with the simplicity arguments I advocate, then one can easily handle these tables for maximum value, as I have tried to do in my CBM code for 1.9 and 2.0 - for example, marking both in relation to the entire quiz and in relation to the subset of Qs that a student may have chosen to test herself/himself on.

In reply to Tony Gardner-Medwin

Re: Testing CBM

by Tim Hunt -

1. Doh! I can't believe I calculated this wrong the first time. Will fix. (MDL-28108)

 

2. I agree that Moodle 2.1, by allowing all possible combinations of options, allows some silly combinations. Therefore we are relying on teachers to use only the sensible combinations.

If you wanted to change the code so that CBM only applies to the dichotomous items in your test, you would need to hack the code a bit. Two ways you could do that:

a. override the make_behaviour method in some question classes, to return the deferredfeedback behaviour when deferredcbm is asked for: see https://github.com/timhunt/moodle/blob/master/question/type/questionbase.php#L641

b. change the deferredcbm behaviour to work exactly like deferredfeedback for non-dichotomous items. (You can probably work out whether an item is dichotomous using the get_possible_responses function, or some other heuristic.)

 

3. Similarly, for now, I will leave it to teachers to set this up right.

If you want a simple code change to make it work the way you think it should, I would suggest that in quiz_add_quiz_question, in https://github.com/timhunt/moodle/blob/master/mod/quiz/editlib.php#L147, you add a 3 * to the line

$instance->grade = $DB->get_field('question', 'defaultmark', array('id' => $id));

if $quiz->preferredbehaviour includes 'cbm'. That is, help the teacher set things up right by default, but don't change anything more complex than that.

In reply to Tim Hunt

Re: Testing CBM

by Isabelle Langeveld -

I think I'll just try it out first as it is with the teachers in the organisation I'm setting this up for to see if they get it with a simple explanation beforehand.

In reply to Isabelle Langeveld

Re: Testing CBM

by Isabelle Langeveld -

Hi,

I upgraded one of my sites to 2.1 and tried out the CBM options with a quiz I had already made in 2.0. I found that the option CBM with deferred feedback works fine, but the option CBM with immediate feedback does not. The feedback just does not appear.

What could be wrong?

In reply to Isabelle Langeveld

Re: Testing CBM, immediate feedback

by Isabelle Langeveld -

Plain 'immediate feedback' doesn't work either...

In reply to Isabelle Langeveld

Re: Testing CBM, immediate feedback

by Tim Hunt -

On the quiz settings form, what are the 'During the attempt' Review options set to?

In reply to Tim Hunt

Re: Testing CBM, immediate feedback

by Isabelle Langeveld -

And yes, that's where the problem lay. I checked the boxes under 'During the attempt' and now it works fine. Thanks for your patience with me.

Just to check: it would be rather useless to combine CBM with multiple tries, wouldn't it? Because if your first try is incorrect and you were unsure, you would be pretty sure on the next try on the same question ;-)

In reply to Isabelle Langeveld

Re: Testing CBM, immediate feedback

by Tim Hunt -

I agree.

In reply to Tim Hunt

Re: Testing CBM, immediate feedback

by Isabelle Langeveld -

I've found that it is very easy to score a negative grade on a test if you work with CBM. I got a -0.4 on a quiz maximum of 10.

This grade does not appear in the Gradebook; it just shows a 0.0. Is this how you meant it to be? Or is this unexpected?

In reply to Isabelle Langeveld

Re: Testing CBM, immediate feedback

by Jean-Michel Védrine -

Hello Isabelle,

You remember Tim's response: "The constraint is really the Moodle gradebook. It wants 'maximum grade' to mean maximum grade."

You have discovered another constraint of the gradebook: it wants all grades to be at least 0. No negative grades are allowed in the gradebook.

Maybe none of your students will object :D

In reply to Jean-Michel Védrine

Re: Testing CBM, immediate feedback

by Tim Hunt -

The question is, should we apply the same constraint in the quiz (min total grade for an attempt is 0)?

In reply to Tim Hunt

Tests

by Manish Verma -

If someone does get a negative total mark, then it should be shown, at least to the teacher role, in the quiz/gradebook. That way the teacher knows exactly where the student stands and can act on that.

In reply to Manish Verma

Re: Tests

by Jean-Michel Védrine -

That would not be a small change to the gradebook code!

There is no chance such a change happens on a stable branch like 2.0 or 2.1! But if enough people agree with your idea, you can create a tracker issue for a future release of Moodle.

Tim's question was about what is displayed in the quiz module, which is quite another story.

My personal opinion is that the quiz module's display should not be limited by the gradebook's constraints, but of course, once in the gradebook, all grades must obey the gradebook's rules.

In reply to Jean-Michel Védrine

Re: Tests

by Isabelle Langeveld -

Thanks guys, I had not seen the response about the gradebook showing a 0 when the score is, say, -4. I think a -4 should show up as a -4, for the teacher as well as the student.

But the real question is: do you want your students to score -4?

I had a lot of quizzes tested by the teachers; they tested each other's quizzes. They made Qs with variable marking - or grading, what's the right term?

Here is a quiz with 5 Qs. I made a table to show you what happens with different grading systems in this quiz. In the first column I put the number of each Q, its value, and whether the answer given was correct. Whether you use the grading system of the LAPT-OU or Tim's grading system, this student scores a 0, yet he has only 1 Q incorrect! That Q happens to have a value of 3 and he chose C=3, but even if you set every Q to 1 point, the student fails the quiz.

| Q no (value), result | Score System OU | Score System Hunt | Each Q value 1, system OU | Each Q value 1, milder system | Variable value, mild grading |
|---|---|---|---|---|---|
| 1 (3) incorrect, C=3 | -18 | -6 | -6 | -3 | -6 (3) |
| 2 (2) correct, C=3 | 6 | 2 | 3 | 3 | 4 (2) |
| 3 (1) correct, C=3 | 3 | 1 | 3 | 3 | 3 (1) |
| 4 (1) correct, C=3 | 3 | 1 | 3 | 3 | 3 (1) |
| 5 (2) correct, C=3 | 6 | 2 | 3 | 3 | 6 (2) |
| End score | 0 on 27 | 0 on 9 | 6 on 15 | 9 on 15 | 10 on 30 |

My client is dealing with students who are not used to online learning. Most of them are not very keen on taking the course; it's obligatory to keep their license. The client is afraid of massive protest, and with reason, I think.

What to do?

  1. I want to keep the CBM.
  2. I am thinking of setting all the Qs to a value of 1 (the question developers have great difficulty distinguishing levels of complexity!). NB: can I set all the already-imported Qs to 1 in one go?
  3. I want to make the grading (punishment) milder.

This would give the result in the 'Each Q value 1, milder system' column of the table.

But this of course means messing with the code. My techie partner is not delighted, because of future upgrades.

What do you think?

In reply to Isabelle Langeveld

Re: Tests

by Tim Hunt -

Be very careful changing the scoring rules. You need to draw a graph like the one two-thirds of the way down http://www.ucl.ac.uk/lapt/ to make sure your scoring rules 'work': there needs to be a range of certainty probabilities where each level of certainty is 'best'.

If you are really sure you want to change the scoring rules, then the easy way is to edit the code at https://github.com/timhunt/moodle/blob/master/question/behaviour/behaviourbase.php#L603. The safe way (with regard to future upgrades) is to make a new behaviour plugin (question/behaviour/gentlecbm or something; docs at http://docs.moodle.org/dev/Developing_a_Question_Behaviour).
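That 'works' condition can also be checked numerically: for each probability, compute which certainty level has the highest expected mark, and make sure every level is best somewhere. A rough sketch (this is not the Moodle behaviour API, and too_gentle is a made-up example of a scheme that fails the test):

```python
# Each scheme is a list of (mark_if_right, mark_if_wrong) pairs,
# one pair per certainty level, lowest certainty first.

def best_level(scheme, p):
    """Index of the certainty level with the highest expected mark at p."""
    expected = [p * right + (1 - p) * wrong for right, wrong in scheme]
    return expected.index(max(expected))

def levels_used(scheme, steps=1000):
    """The set of levels that are optimal for at least one probability."""
    return {best_level(scheme, i / steps) for i in range(steps + 1)}

standard = [(1, 0), (2, -2), (3, -6)]    # every level is best somewhere
too_gentle = [(1, 0), (2, -1), (3, -1)]  # C=2 is never the best choice
```

With the standard scheme all three levels come out optimal over some range; with the too_gentle scheme the middle level is never the best choice, so students could safely ignore it.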

 

The other key point is that you must give your students some practice (formative) tests during the course, before they have to do one that counts towards passing. It takes practice to be able to judge your own certainty accurately on the three-point scale. People worry about whether CBM penalises certain people (e.g. that women are naturally more timid, so they will be at a disadvantage) - in fact we have seen that in the thread above, if I recall correctly. If you look at the CBM scoring rules, you will see that they penalise both over-confident and under-confident people: to maximise your score you have to assess your certainty accurately. The research shows that the first time someone encounters CBM, these natural biases do affect the scores, but after some practice all students learn to deal with the system, and there is no bias.

In reply to Tim Hunt

Re: Tests

by Isabelle Langeveld -

Thanks for your swift comments guys. I have to think and talk it over.

About practice: the students do get a lot of practice, because most of them take 18 (!) quizzes with CBM. And the quizzes are really for practice, to help understand and memorize the material, not for passing/failing the course - I used the wrong words. We work with a 10-point scale for the end score on a quiz. If a student has 4/5 Qs right, he should expect to score at least a 6. If he scores a 5 with his 6 CBM marks, it feels like 'insufficient'/failing.

I think you're also in for these kinds of problems if your quiz only has 5 Qs. The effect of 1 Q that you answer wrong with C=3 is heavier than if you do this once in a quiz with 10 Qs. Or is this nonsense? (I have had very traumatic experiences with math in high school and still start to sweat when it's about figures and calculus.)

Wouldn't it be great to show the quiz result with AND without CBM? If it's just for practice... Impossible to program, probably.

About rounded grades: I really think this is important. Showing people that their grade is adjusted to 0.32 is unnecessarily complicated.

So what to do: if we put the LAPT code back in, instead of Tim's code, would all the imported questions go to weight = 1 in one go?

In reply to Isabelle Langeveld

Re: Tests

by Tim Hunt -

It would not be impossible to program "showing the quiz result with AND without CBM" - at least in certain places.

The place it would be easiest to do is at the top of the quiz review page. It could be part of the table that displays various summary information about the attempt, like the total mark and the time taken.

As well as the score without CBM, the other obvious thing to display is, for each certainty level, the number of questions answered at that level, and the percentage of them that were right.

 

It is something I would like to do when I get the time, but I don't know when that will be.

To implement this, what you need is a new function/method in the behaviour which takes the attempt object ($quba) and returns an array of extra data to include in the table. (If you look at the code at https://github.com/moodle/moodle/blob/master/mod/quiz/review.php#L163 you will see that the table is already built as an array of data that is then displayed.)
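The summary itself would be trivial to compute once the data is out of the attempt. A sketch of the counting logic (plain (certainty, correct) pairs stand in for whatever the real behaviour method would extract from $quba; the function name is made up):

```python
from collections import defaultdict

def certainty_summary(responses):
    """responses: iterable of (certainty_level, was_correct) pairs.

    Returns, per certainty level, how many questions were answered
    at that level and what percentage of them were right.
    """
    counts = defaultdict(lambda: [0, 0])  # level -> [answered, correct]
    for level, correct in responses:
        counts[level][0] += 1
        counts[level][1] += int(correct)
    return {level: {"answered": n, "percent_right": 100.0 * right / n}
            for level, (n, right) in sorted(counts.items())}
```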

In reply to Isabelle Langeveld

Re: CBM Tests

by Tony Gardner-Medwin -

Isabelle - the things you want are implemented in the CBM code I wrote for Moodle 2.0, but as Tim says, it would still require significant work to set them up in 2.1. I did warn you about this a couple of months ago. The overall CB score for your example student would indeed be around 60%, as you felt the students would expect (depending in detail on what type of Qs they are - e.g. it would be 61% if they were MCQs with 5 options). Students would get a full breakdown in 2.0 (or LAPT) showing how they performed on Qs answered at the different certainty levels.

Running things on LAPT (www.ucl.ac.uk/lapt) is entirely outside Moodle, but gives well optimised feedback to both students and staff. At UCL and Imperial College in London we run CBM tests by using simple links within Moodle to the LAPT software, which does away with any problems when there are upgrades to Moodle. Nevertheless, it would be nice to get CBM code as student-friendly as possible inside Moodle, so that people can try it out with just a flip of a switch within their quizzes. Your comments about 2.1 are very helpful, but maybe you aren't using the best system at present for your students.

In reply to Tony Gardner-Medwin

Re: CBM Tests

by Isabelle Langeveld -

Well Tony we'll talk about this soon.

I must say I found the feedback with LAPT too difficult to show to the student in the VLE I'm working on now.

Tim, hopefully you find the time soon to show those results in the table. And maybe you can change the marking to round numbers - that does look so much better, I think.

In reply to Isabelle Langeveld

Re: CBM Tests

by Tony Gardner-Medwin -

You say:  "I must say I found the feedback with LAPT too difficult to show to the student in the VLE I'm working on now."

The immediate feedback with CBM (mark = 1, 2 or 3 if correct; 0, -2 or -6 if incorrect) is simple, so that is not, I think, what you're referring to here. I guess you are talking about the overall scores on a test - which, incidentally, in a formative self-test are I think of relatively minor importance, because (unlike the immediate question feedback) they don't contribute much to learning - just to the student feeling good or bad.

I agonise a lot about how best to present overall scores, and I think if you use CBM it is important to understand the issues. In a sense the problem lies with conventional scores, since guesses can give conventional marks that are perceived as quite high. In a typical exercise with MCQs and T/F Qs (like your "Basis 2"), chance might give on average 38% correct (3.8 / 10). Since CBM rewards correct guesses less than knowledge, it reduces this effect: guesses (acknowledged with C=1) would give on average only 13% of the maximum possible score (3.8 / 30). There is an immediate problem: if you present CBM scores in this way (as Tim does in the code for Moodle 2.1), then students may feel that CBM is bad because it always tends to score them lower. A typical student on a typical exercise gets (from LAPT data) about 70% correct and an average CBM mark of about 1.2 (40% of the maximum possible).

One simple way of bringing the scores more in line and making the comparison more psychologically positive (as currently in the code for Moodle 2.0) is to calculate the CBM percentage in relation to "all correct at C=2". This means that a typical CBM score becomes around 60% rather than 40%, and the maximum possible score (all correct at C=3) is 150%. The idea of 150% jars with some people but can be seen simply as a bonus for not only getting everything right, but also knowing you could justify being sure about everything.

An alternative strategy to bring the scores more in line (used in LAPT) is first to convert conventional scores to a more sensible scale (denoted "percent knowledge" or "percent above chance") in which guesses yield 0% on average and all correct gives 100%. Typical scores (and typical pass marks) then convert to around "50% knowledge". CBM scores treated the same way (0% for guesses at C=1 to 100% for all correct at C=3) are more in line, and can be brought still closer for average students across the whole range of ability by using an equation that boosts the lower scores a bit while leaving the maximum at 100%. The near equivalence of these two types of score on average is shown in the graph at http://www.ucl.ac.uk/lapt/laptlite/sys/lpscoring.htm and is both important for anyone involved in standard setting and constructive in its psychological impact on students: students can see whether their insight into the reliability of their answers is better or worse than for other typical students getting the same percentage of questions correct.
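To make the comparison concrete, here is the arithmetic with the typical figures quoted above (just the two simple rescalings; the score-boosting equation is not reproduced here):

```python
# Typical student (LAPT data): ~70% of answers correct, average CBM
# mark ~1.2, on an exercise where blind guessing averages ~38% correct.

def percent_knowledge(fraction_correct, chance):
    """Conventional score rescaled: guessing -> 0%, all correct -> 100%."""
    return 100.0 * (fraction_correct - chance) / (1.0 - chance)

def cbm_percent_of_max(avg_mark):
    """CBM average mark relative to the maximum (all correct at C=3)."""
    return 100.0 * avg_mark / 3.0

def cbm_percent_of_c2(avg_mark):
    """CBM average mark relative to 'all correct at C=2' (max is 150%)."""
    return 100.0 * avg_mark / 2.0
```

With these numbers, 70% correct against 38% chance comes out at about 52% knowledge, and an average CBM mark of 1.2 is 40% of the maximum but 60% of the C=2 baseline - the more psychologically positive presentation.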

In reply to Isabelle Langeveld

Re: Tests

by Tony Gardner-Medwin -

Hi Isabelle,  Several points:

(1) I agree you shouldn't weight Qs differently when you use CBM. Too complicated, ill-defined, and it doesn't help with the use of self-tests for learning. I think Moodle 2.1 should (like my code for 1.9 and 2.0: www.ucl.ac.uk/lapt/moodle19/moodle/mod/forum/discuss.php?d=2 ) ignore weight settings on individual Qs when using CBM and treat all Qs the same (wt=3 in Tim's 2.1 code, I think), so that marks come out as whole numbers. There are serious dangers otherwise, as you suggest. This would lead to your column 4. This student got 4/5 answers (80%) correct and claimed to be pretty sure (C=3: P>=80%) that each of these answers was correct. Fair enough! S/he gets 6 CBM marks on 5 Qs (4x3 + 1x(-6)), but would clearly have done better by recognising uncertainty on what sounds like it may have been a particularly difficult Q that she got wrong. She would have got 10 marks by rating this one C=2 (or 12 marks with C=1, if P(correct) < 67%). Weighting Qs differently is always a contentious issue: What's the capital of France? What's the capital of Uzbekistan? Should you weight the 2nd higher because on the whole it would have required more study to know it, or the 1st because (with all respect to Uzbekistanis) it's often more important in life to know the answer? The essential thing with CBM is that the student focuses on identifying which answers are or aren't (given his or her knowledge) reliable. In feedback you can then maybe point them at why it's really important to know about Tashkent (or Paris!).

(2) I'm not sure if your client wants to use this to help students challenge themselves and learn, or as pass/fail assessment. As Tim says, no way should you be doing the second without the first, using a marking scheme that - however inspired! - is unpractised. If you wean the students in with the first strategy, you will have the best chance of getting them (students and client) to see the value of CBM. If the client wants to use CBM for assessment, then they can always use conventional grading strategies alongside it and assure the students that they will always pass if they get more than XX% of answers right, even if they are hopeless at identifying which bits of their knowledge are reliable. I wouldn't like to employ somebody like that myself - but that is the kind of thing it's up to your client to judge! LAPT offers an algorithm for an overall CB score that helps equate standards with and without taking account of certainty ratings. (This is also in the code available for Moodle 1.9 and 2.0 but not, I think, 2.1.)

(3) Re negative scores: a negative overall score means the student has serious misconceptions, about the content or about the reliability of their knowledge. Or perhaps the Qs or As are just bad. The student has done worse on the test than a monkey would, since monkeys (with zero content knowledge, only able to make total guesses) would rapidly learn that they accumulate more peanuts in the long run by always pressing C=1, guaranteeing a non-negative score. I tend to truncate scores at zero, since I don't think it's constructive to tell the student anything more quantitative than that their performance is worse than expected for total guesses.

Good luck! And say if you want to use LAPT, which still has many advantages for CBM!
Tony
