How the heck did you work that out?

by Mark Drechsler -
Hi all,

Getting my head around the calculations that happen within the 2.0 workshop, and for the most part I've got it sorted...

... except for how the 'Grade for assessment' is calculated.

Consider the example below:

[Screenshot: "Huh?"]

I get that for the grades received, the score is a weighted average of the scores received, scaled to the maximum mark (80) - no problem. It seems, though, that the grades given are calculated using the same process but the total is left out of 80 rather than being scaled back to 20, so the maximum mark truncates at 20, giving everyone an impressive score for their assessment efforts...

Any clues on what I'm doing wrong?

Settings are as follows:

[Screenshot: "Settings"]

Loving the new interface on the whole - if I can just get my head around this calculation bit I'm sorted!

Mark.

In reply to Mark Drechsler

Re: How the heck did you work that out?

by Mark Drechsler -
Ok,

I've made a fundamental mistake - I think.

Am I right in saying that the score in the final column, 'Grade for assessment', is a reflection of the accuracy of that student's assessments of the other submissions? So getting 20/20 means that their assessment of the other students' submissions was very close to the 'correct' mark?

If so, then is there a formula somewhere which explains this in more detail? Is it relative to the teacher's mark, the mean of the student marks or a combination of both?

If this is correct, then I'd love to know the formula which calculates that final column. I've tried changing the 'Comparison of assessment' value and it seems to cause significant variations in the final results, which is probably right, except that I have no idea what the underpinning maths is that's driving it all, and it's sending me batty...

All guidance welcome!

Mark.
In reply to Mark Drechsler

Re: How the heck did you work that out?

by David Mudrák -

Hi Mark

First of all, thanks for the testing. I will try to explain using your screenshot as the example.

I can see there are three students participating in this workshop. Each of them was asked to assess the work submitted by the other two. In Workshop 2.0, every participant gets two grades in the course gradebook. The first one is the grade for the submission, which measures the quality of the submitted work. In your workshop, this grade for submission can be out of 80. The second grade is the grade for assessment, which tries to estimate the quality of the assessments the participant gave to their peers. This grade for assessment (also known as the "grading grade") is calculated by the artificial intelligence hidden within the workshop, as it tries to do the teacher's job. In your workshop, the grade for assessment is out of 20.

Now, let us look at the figures displayed. Student1 submitted her work called "fgbgfhy". This submission was assessed by her peers - Student2 and Student3. Student2 gave 51/80 and Student3 gave 80/80. The total grade for Student1's submission is the average, that is (51 + 80) / 2 = 65.5, displayed rounded - you can increase the number of decimals displayed to get more precise values. The values from which the total grades are calculated are shown in bold for every participant.
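
To make that arithmetic concrete, here is a rough Python sketch of the averaging step (my own illustration - the function name and the equal default weights are assumptions, not the actual Workshop code):

```python
# Sketch of the grade-for-submission step described above
# (illustration only, not the real Moodle Workshop code).

def grade_for_submission(peer_grades, weights=None):
    """Weighted mean of the grades received from peers, on the 0..80 scale."""
    if weights is None:
        weights = [1] * len(peer_grades)           # equal weights by default
    return sum(g * w for g, w in zip(peer_grades, weights)) / sum(weights)

# Student1's submission got 51/80 and 80/80 from her two peers:
print(grade_for_submission([51, 80]))              # 65.5, displayed rounded
```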

Student1 was asked to assess two submissions. She gave 40/80 to Student2's submission and 59/80 to Student3's submission - see the column "Grades given". The symbols < and > should help you see whether a grade comes from a peer or is given to a peer.

Now, during the Grading evaluation phase, the workshop tried to calculate the grades for assessment. These grades are shown in parentheses () in the table. In your workshop, every student got the full 20/20 for all their assessments, and so their average is 20 again. So the total grade for assessment for Student1 (in the last column) is calculated as the average of two grades of 20. Look at the second-to-last column - Grades given - for Student1. There are 40(20) and 59(20). You should read this as: Student1 gave a grade of 40, and the grade for this assessment was calculated to be 20. The same for 59(20). Only the bold values - 20 - are used for the average calculation in the last column.
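
The last column is then just the same kind of averaging applied to the bracketed grading grades - another tiny sketch, again only an illustration:

```python
# The total grade for assessment is the mean of the grading grades earned
# for each assessment the student made (illustration only).

def total_grade_for_assessment(grading_grades):
    """Mean of the per-assessment grading grades, on the 0..20 scale."""
    return sum(grading_grades) / len(grading_grades)

# Student1's two assessments were 40(20) and 59(20); only the bold 20s count:
print(total_grade_for_assessment([20, 20]))        # 20.0
```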

The last thing to explain is why the workshop decided that all participants should have the full grade for assessment, that is 20/20. There is not a single formula, but the process is deterministic and I am going to describe it in the Workshop documentation ASAP. In short, Workshop picks one of the assessments as the best one - the one closest to the mean of all assessments - and gives it a 100% grade. Then it measures the "distance" of every other assessment from this best one and gives it a lower grade the more it differs from the best one (given that the best one represents the consensus of the majority of assessors). The parameter of the calculation is how strict we should be, that is how quickly the grades fall off as assessments differ from the best one.
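
A deliberately simplified, single-number sketch of that idea follows. Bear in mind it is only an illustration: the real evaluator compares individual criterion responses (as explained later in this thread), and the linear fall-off and the strictness parameter here are my assumptions, not the actual formula.

```python
# Simplified illustration of "Comparison with the best assessment":
# pick the assessment closest to the mean as the best one (100%), then lower
# the others' grades the further they are from it. The linear penalty and the
# strictness factor are assumptions for illustration only.

def grading_grades(assessments, strictness=1.0, max_grade=20, scale=80):
    mean = sum(assessments) / len(assessments)
    best = min(assessments, key=lambda a: abs(a - mean))   # closest to the mean
    result = []
    for a in assessments:
        distance = abs(a - best) / scale                   # 0..1 distance from best
        penalty = min(1.0, strictness * distance)
        result.append(round(max_grade * (1 - penalty), 1))
    return best, result

print(grading_grades([40, 44, 48]))   # best = 44; the other two lose a little
```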

But in your case, you have just two assessments per submission, so Workshop cannot decide which of them is "correct". Imagine you have two reviewers - Alice and Bob. They both assess Cindy's submission. Alice says it is rubbish and Bob says it is excellent. There is no way to decide this. So the workshop simply says: OK, you are both right, and I will give you both a 100% grade for your assessment. That is why all your students have 20/20. To fix it, you have two options:

  • Either you provide an additional assessment so that the number of assessors (reviewers) is odd and the workshop is able to pick the best one. Typically, you as the teacher come and provide your own assessment of the submission to judge it.
  • Or you may decide that you trust one of the reviewers more. For example, you know that Alice is much better at assessing than Bob is. In that case, you can increase the weight of Alice's assessment, let us say to "2" (instead of the default "1"). For the purposes of the calculation, Alice's assessment will be treated as if there were two reviewers with exactly the same opinion, and therefore it is likely to be picked as the best one.

You can combine these two options, obviously.
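
To see how a higher weight tips the calculation, here is a small sketch (again only an illustration; the real code works per criterion, but the idea of counting a weight-2 reviewer twice is the same):

```python
# Illustration of assessment weights: a reviewer with weight 2 counts as two
# reviewers giving exactly the same assessment when the mean - and therefore
# the "best" assessment - is worked out.

def weighted_mean(assessments):
    """assessments: list of (grade, weight) pairs."""
    return sum(g * w for g, w in assessments) / sum(w for _, w in assessments)

# Alice (trusted, weight 2) gives 20/80, Bob (weight 1) gives 70/80:
mean = weighted_mean([(20, 2), (70, 1)])
print(round(mean, 1))   # 36.7 - Alice's 20 is closer to the mean than Bob's 70,
                        # so her assessment is picked as the best one
```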

I hope this helped to clarify a bit. See http://docs.moodle.org/en/Workshop_module for more information, I am going to update it soon.

P.S. By the way, in Workshop 1.x this case of exactly two assessors with the same weight is not handled properly and leads to wrong results, as only one of them is lucky enough to get 100% and the second gets a lower grade.

P.P.S. Note that the calculation of grades for assessment is now pluggable in the workshop, so more advanced/different methods may be implemented in the future, even custom ones.

P.P.P.S. It is good to highlight that there is nothing special about teacher assessments. If you want or need to, just set a higher weight for your own assessment (as the teacher's one).

In reply to David Mudrák

Re: How the heck did you work that out?

by Mark Drechsler -
Hi David,

Thanks for taking the time to respond to this - the pieces of the puzzle are slowly coming together! Thanks in particular for the description of what happens when there are only two participants - makes sense.

I've got another case which doesn't seem to agree with your description though. Check out the screenshot below:

[Screenshot: "Still don't quite get it"]

I've set this up to be a little easier to look at. My three students have submitted 'poor', 'average' and 'good' responses, and gone through the process of marking each other's work. Reta has not done well in assessing the work of others - she gave the poor response 72/80 and the good response 28/80, and so her marks for assessing the submissions are poor as you'd expect.

When I look at the allocation of the 'best' score for both Britney and Ryan it all looks fine - the markers who are close have got good marks, and Britney has marked way off so she is penalised - all good. It's the responses for Reta's work which are difficult for me to understand though. She has been allocated three scores - 40, 44 and 48 - so I'd have assumed that the score of 44 given by Mark would be the closest to the mean and hence the 'best' score, and so Mark would get 20 for his assessment of the piece, probably with both Britney and Ryan getting 18 or 19 out of 20. See what I mean?

Could be another statistical anomaly given that there are only three submissions, but would be good to know so I can explain these sorts of interesting results when asked.

The beers are on me when we eventually meet, by the way - I can't wait to show off the Workshop to as many people as I can, as soon as I understand some of these calculation issues.

Mark.
In reply to Mark Drechsler

Re: How the heck did you work that out?

by David Mudrák -

Britney has marked way off so she is penalised

I guess you meant Reta is penalised, right?

To explain the grades for assessment of Reta's work in detail, I would need to know what grading strategy was used and what the assessment form looks like. The point is that the grading evaluation component (the workshop subplugin that calculates the grades for assessments - currently only mod/workshop/evaluation/best is implemented) does not compare the given grades themselves; it compares the responses to all the assessment form dimensions (criteria). Then it calculates the distance between two assessments, using variance statistics.

Let me demonstrate it with an example. Let us say you use the Number of errors grading strategy to peer-assess research essays. This strategy uses a simple list of criteria, and the reviewer (assessor) just checks whether the given proposition is true or false. Let us say you define the assessment form using three criteria:

  1. Does the author state the goal of the research clearly? (yes/no)
  2. Is the research methodology described? (yes/no)
  3. Are references properly cited? (yes/no)

Let us say the author gets 100% grade if all criteria are passed (that is answered "yes" by the assessor), 75% if only two criteria are passed, 25% if only one criterion is passed and 0% if the reviewer gives "no" for all three statements.

Now imagine the work by Daniel is assessed by three colleagues - Alice, Bob and Cindy. They all give individual responses to the criteria in order:

  • Alice: yes / yes / no
  • Bob: yes / yes / no
  • Cindy: no / yes / yes

As you can see, they all gave a 75% grade to the submission. But Alice and Bob also agree in their individual responses, while Cindy's responses are different. The evaluation method "Comparison with the best assessment" tries to imagine how a hypothetical 100% fair assessment would look. In the Workshop 2.0 specification, I refer to it as "how would Zeus assess this submission?" and we estimate it would be something like this (we have no other way):

  • Zeus: 66% yes / 100% yes / 33% yes

Then it tries to find the assessments that are closest to this theoretically objective assessment. It finds that Alice and Bob are the best ones and gives a 100% grade for assessment to them. Then it calculates how far Cindy's assessment is from the best one. As you can see, Cindy's responses match the best one in only one criterion of the three, so Cindy's grade for assessment will not be very high, I guess.
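
If it helps, the per-criterion comparison can be sketched in a few lines of Python. The 0/1 encoding of yes/no and the simple squared-distance measure below are my assumptions for illustration; the real mod/workshop/evaluation/best plugin uses variance statistics and its own normalisation.

```python
# Per-criterion comparison for the Number of errors example above.
# Responses are encoded as 1 (yes) and 0 (no); illustration only.

responses = {
    "Alice": [1, 1, 0],
    "Bob":   [1, 1, 0],
    "Cindy": [0, 1, 1],
}

# "Zeus": the hypothetical fair assessment is the per-criterion average.
zeus = [sum(col) / len(col) for col in zip(*responses.values())]
print(zeus)   # roughly [0.67, 1.0, 0.33] - the 66% / 100% / 33% "yes" above

# Distance of each assessor from Zeus; the smallest distances mark the "best".
for name, resp in responses.items():
    distance = sum((r - z) ** 2 for r, z in zip(resp, zeus))
    print(name, round(distance, 2))
# Alice 0.22, Bob 0.22, Cindy 0.89 - Alice and Bob tie as the best (100%),
# and Cindy's grade for assessment falls with her distance from the best one.
```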

The same logic applies, with appropriate adjustments, to all other grading strategies. The conclusion is that the grade given by the best assessor does not need to be the one closest to the average, as the assessments are compared at the level of individual responses, not the final grades.

I guess the documentation page I wanted to work on can simply link to this thread, as the key issues are explained here :-) But I like this style of support by answering questions; it is more constructivist than writing and reading some Workshop manual. And it reminds me of the wonderful times I spent thinking about all these aspects, sketching formulas and meditating on what assessment actually is and how teachers do their job.

You may be interested in http://docs.moodle.org/en/Development:Workshop_2.0_specification#Grade_for_assessment for technical details of the calculation.

In reply to David Mudrák

Re: How the heck did you work that out?

by Mark Drechsler -
Ahem, yes - Reta was the one I meant :-)

This is brilliant - it makes perfect sense now. I was working on the assumption that it was the overall 'Zeus' grade which was used to determine how the rest of the grades were calculated, rather than each individual component.

Thanks so much for taking the time to explain - if you need a hand documenting any of this then let me know, I'd be happy to help.

Cheers,

Mark.