Dear Jayesh, Joshua, Tim,

We use Moodle for high-stakes online exams at our university, run in secure environments with Safe Exam Browser (~40 exams with ~5000 grades given). We would very much appreciate an ordering question type with improved scoring of partially correct answers; because of the current scoring, we do not use the existing ordering question type in exams at all.

We conducted a brief literature review on ordering question types, but could not find any empirical validation studies of scoring schemes for ordering questions (we searched research databases and, e.g., Haladyna & Rodriguez, 2013).

Absent research-based recommendations, I would also support the "pairwise comparison" scoring method, with the following reasoning:

An ordered set of N elements contains (N(N-1)/2) pairwise comparisons. In a set ordered by a candidate, any number of these pairwise comparisons may be correct or incorrect. Thus, I would agree that the number of correct pairwise comparisons in the ordered set should be the basis for a metric for awarding subpoints.

example 1: (1, 2, 3, 4, 5) -> 100%

example 2: (5, 4, 3, 2, 1) -> 0%

example 3: (5, 1, 2, 3, 4) -> 60% (6 of the 10 pairs are in the correct order; only the 4 pairs involving item 5 are wrong)

etc.
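To make the counting concrete, here is a minimal sketch in Python of the pairwise comparison score described above. The function name `pairwise_score` and its signature are my own choices for illustration, not anything taken from the existing question type.

```python
from itertools import combinations


def pairwise_score(response, correct_order):
    """Fraction of item pairs that the response places in the correct relative order."""
    # Rank of each item in the correct order (position 0 = first).
    rank = {item: i for i, item in enumerate(correct_order)}
    # combinations() yields each pair (a, b) with a appearing before b in the response.
    pairs = list(combinations(response, 2))
    if not pairs:
        return 1.0  # fewer than two items: nothing can be out of order
    correct = sum(1 for a, b in pairs if rank[a] < rank[b])
    return correct / len(pairs)


key = [1, 2, 3, 4, 5]
print(pairwise_score([1, 2, 3, 4, 5], key))  # 1.0
print(pairwise_score([5, 4, 3, 2, 1], key))  # 0.0
```

This simple version checks all N(N-1)/2 pairs directly, which should be more than fast enough for the handful of items in a typical ordering question.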

A random-guess score of 50% correct is not a problem per se from a psychometric point of view (it simply needs to be accounted for when defining the grading scale), but it typically finds little acceptance with examiners and students. To resolve this, I would suggest offering an optional transformation of the calculated %correct score, similar to the 'kprime' scoring method for multiple true-false questions (Krebs, 1997):

method 1: Linear mapping:

Pairwise comparison scores from 50%correct to 100%correct are mapped linearly onto 0% to 100% of the points;

scores below 50%correct are not awarded any points

i.e.: points_awarded = (%correct - 50%) * 2

This has the advantage over squaring of being a linear transformation, with the expected value of random guess scores being 0.

method 2: All or nothing (see Tim's post)

method 3: Manual

Examiners can define how many (sub)points they want to award for any number of %correct pairwise comparison ranges.
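As a rough sketch of how the three options could look in Python, taking the pairwise score as a fraction between 0 and 1: all function names here are hypothetical, and the `bands` structure for method 3 (a list of lower bound and points pairs) is purely my assumption about how an examiner might enter the ranges.

```python
def linear_rescale(pct_correct):
    """Method 1: map scores in [0.5, 1.0] linearly onto [0.0, 1.0]; lower scores earn 0."""
    return max(0.0, (pct_correct - 0.5) * 2)


def all_or_nothing(pct_correct):
    """Method 2: full credit only for a completely correct order."""
    return 1.0 if pct_correct == 1.0 else 0.0


def manual_bands(pct_correct, bands):
    """Method 3: return the points of the highest band whose lower bound is reached.

    `bands` is a hypothetical examiner-defined list of (lower_bound, points) tuples.
    """
    awarded = 0.0
    for lower_bound, points in sorted(bands):
        if pct_correct >= lower_bound:
            awarded = points
    return awarded


print(linear_rescale(1.0))   # 1.0
print(linear_rescale(0.75))  # 0.5
print(linear_rescale(0.4))   # 0.0

# Illustrative bands only; real values would be chosen by the examiner.
example_bands = [(0.6, 0.25), (0.8, 0.5), (1.0, 1.0)]
print(manual_bands(0.9, example_bands))  # 0.5
```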

Scoring methods I think should not be used:

- Number of items in the right position:

This method leads to absurd results: (5, 1, 2, 3, 4) -> 0 points; while (5, 4, 3, 2, 1) -> 0.2 points

- Other position based methods:

e.g.: points = (max_distance - distance)/max_distance

(5, 1, 2, 3, 4) -> 33%, while also (5, 2, 3, 4, 1) -> 33% *[the former is clearly the better solution]*

- Subsequence based methods, e.g. [(subsequence length - 1)/(n-1) ]:

These methods do not offer any advantages over pairwise comparisons. They do, however, have a disadvantage: they reward getting the details right but do not reward getting the bigger picture right:

(3, 1, 4, 2) -> 33%, while also (4, 1, 3, 2) -> 33% *[the former is clearly the better solution]*

(2, 1, 4, 3, 6, 5, 8, 7, 10, 9) -> 22.2%, while (6, 7, 8, 9, 10, 1, 2, 3, 4, 5) -> 44.4% *[the former is the better solution]*

- Any non-linear transformations:

In contrast to linear transformations, non-linear transformations are hard to justify, explain and communicate to students, examiners or an appeals board (!). Furthermore, from a psychometric point of view they are a very complex and hotly debated topic and require proper justification.
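For anyone who wants to check the position-based examples in the list above, here is a small Python sketch. I am assuming that "distance" means the summed absolute displacement of each item from its correct position, and that max_distance is the displacement of the fully reversed order; the function names are mine. Under these assumptions the figures quoted above come out the same.

```python
def exact_position_score(response, correct_order):
    """Fraction of items sitting at exactly their correct position."""
    hits = sum(1 for r, c in zip(response, correct_order) if r == c)
    return hits / len(correct_order)


def displacement_score(response, correct_order):
    """(max_distance - distance) / max_distance, using summed absolute displacement."""
    target = {item: i for i, item in enumerate(correct_order)}
    distance = sum(abs(i - target[item]) for i, item in enumerate(response))
    n = len(correct_order)
    max_distance = n * n // 2  # displacement of the fully reversed order (assumption)
    if max_distance == 0:
        return 1.0  # a single item cannot be misplaced
    return (max_distance - distance) / max_distance


key = [1, 2, 3, 4, 5]
print(exact_position_score([5, 1, 2, 3, 4], key))          # 0.0
print(exact_position_score([5, 4, 3, 2, 1], key))          # 0.2
print(round(displacement_score([5, 1, 2, 3, 4], key), 3))  # 0.333
print(round(displacement_score([5, 2, 3, 4, 1], key), 3))  # 0.333
```

I have left the subsequence based methods out of this sketch, since their exact definition varies.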

These are my thoughts. I hope they are helpful, and I would like to thank you, Jayesh, very much for sharing your project and for offering the opportunity to give input via the forum.

Best,

Tobias