Adaptive Quiz: CAT (Computer-Adaptive Testing) implementation for Moodle

Activities ::: mod_adaptivequiz
Maintained by Adam Franco, Vitaly Potenko
Create tests that efficiently measure users' abilities by adapting question difficulty to the current estimate of each user's ability.

The Adaptive Quiz activity enables a teacher to create tests that efficiently measure the takers' abilities. Adaptive tests consist of questions selected from the question bank, each tagged with a difficulty score. The questions are chosen to match the estimated ability level of the current test-taker. If the test-taker answers a question correctly, a more challenging question is presented next. If the test-taker answers a question incorrectly, a less challenging question is presented next. This technique produces a sequence of questions that converges on the test-taker's effective ability level. The test stops when the test-taker's ability has been determined to the required accuracy.

The Adaptive Quiz activity uses the "Practical Adaptive Testing CAT Algorithm" by B.D. Wright published in Rasch Measurement Transactions, 1988, 2:2 p.24 and discussed in John Linacre's "Computer-Adaptive Testing: A Methodology Whose Time Has Come." MESA Memorandum No. 69 (2000).

This Moodle activity module was created as a collaborative effort between Middlebury College and Remote Learner. It was later adopted by Vitaly Potenko, who keeps it compatible with new Moodle versions and enhances it with new features.

Below you'll find short documentation on the plugin to explain its essential concepts and flows.

The Question Bank

To begin with, questions to be used with this activity are added or imported into Moodle's question bank. Only questions that can automatically be graded may be used. As well, questions should not award partial credit. The questions can be placed in one or more categories.

This activity is best suited to determining an ability measure along a unidimensional scale. While the scale can be very broad, the questions must all provide a measure of ability or aptitude on the same scale. In a placement test, for example, questions low on the scale that novices are able to answer correctly should also be answerable by experts, while questions higher on the scale should only be answerable by experts or by a lucky guess. Questions that do not discriminate between takers of different abilities will make the test ineffective and may produce inconclusive results.

Take for example a language placement test. Low-difficulty vocabulary and reading-comprehension questions would likely be answerable by all but the most novice test-takers. Likewise, high-difficulty questions involving advanced grammatical constructs and nuanced reading comprehension would likely be answered correctly only by advanced, high-level test-takers. Such questions would all be good candidates for use in an Adaptive Test. In contrast, a question like "Is 25¥ a good price for a sandwich?" would not measure language ability but rather local knowledge. It would be as likely to be answered correctly by a novice speaker who has recently been to China as it would be answered incorrectly by an advanced speaker who comes from Taiwan, where a different currency is used. Such questions should not be included in the question pool.

Questions must be tagged with a 'difficulty score' using the format 'adpq_n' where n is a positive integer, e.g. 'adpq_1' or 'adpq_57'. The range of the scale is arbitrary (e.g. 1-10, 0-99, 1-1000), but should have enough levels to distinguish between question difficulties.
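
As an illustration of the tag format only, here is a minimal Python sketch that reads a difficulty score from a list of tag names. The function name and the list-of-strings input are assumptions made for this example; this is not the plugin's own code.

```python
import re

# Matches the whole tag: the literal prefix "adpq_" followed by digits only.
DIFFICULTY_TAG = re.compile(r"adpq_(\d+)")


def difficulty_from_tags(tags):
    """Return the difficulty from the first 'adpq_n' tag, or None if absent.

    Hypothetical helper: 'tags' is assumed to be a plain list of tag names.
    Zero is accepted here because the example ranges above include 0-99.
    """
    for tag in tags:
        match = DIFFICULTY_TAG.fullmatch(tag.strip())
        if match:
            return int(match.group(1))
    return None


print(difficulty_from_tags(["grammar", "adpq_57"]))  # 57
print(difficulty_from_tags(["grammar", "adpq_hard"]))  # None
```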

The Testing Process

The Adaptive Test activity is configured with a fixed starting level. The test will begin by presenting the test-taker with a random question from that starting level. As described in Linacre (2000), it often makes sense to have the starting level be in the lower part of the difficulty range so that most test-takers get to answer at least one of the first few questions correctly, helping their morale.

After the test-taker submits their answer, the system calculates the target question difficulty it will select next. If the last question was answered correctly, the next question will be harder; if the last question was answered incorrectly, the next question will be easier. The system also calculates a measure of the test-taker's ability and the standard error for that measure. A next random question at or near the target difficulty is selected and presented to the user.

This process of alternating harder questions following correct answers and easier questions following wrong answers continues until one of the stopping conditions is met. The possible stopping conditions are as follows:

  • there are no remaining easier questions to ask after a wrong answer
  • there are no remaining harder questions to ask after a correct answer
  • the standard error in the measure has become small enough to stop
  • the maximum number of questions has been reached
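
The loop and its four stopping conditions can be sketched in Python as follows. This is a simplified illustration, not the plugin's implementation: it moves to the nearest harder or easier level that still has unused questions instead of computing a target difficulty, it assumes the starting level has at least one question, and it assumes the standard error check only applies once the minimum number of questions has been asked. All function and parameter names are hypothetical.

```python
import math
import random


def standard_error(right, wrong):
    """Standard error in logits; infinite until both counts are non-zero."""
    if right == 0 or wrong == 0:
        return math.inf
    return math.sqrt((right + wrong) / (right * wrong))


def next_level(remaining, level, correct):
    """Nearest harder level (after a correct answer) or easier level (after a
    wrong one) that still has unused questions, or None if there is none."""
    if correct:
        harder = [lv for lv, qs in remaining.items() if lv > level and qs]
        return min(harder) if harder else None
    easier = [lv for lv, qs in remaining.items() if lv < level and qs]
    return max(easier) if easier else None


def run_attempt(pool, answers_correctly, start_level, min_questions,
                max_questions, se_to_stop, rng=None):
    """Simulate one attempt. 'pool' maps level -> list of question ids and
    'answers_correctly(level, question)' returns True or False."""
    rng = rng or random.Random()
    remaining = {lv: list(qs) for lv, qs in pool.items()}
    level, right, wrong, asked = start_level, 0, 0, []
    while True:
        # Pick a random unused question at the current level.
        candidates = remaining[level]
        question = candidates.pop(rng.randrange(len(candidates)))
        correct = answers_correctly(level, question)
        asked.append((level, question, correct))
        right += 1 if correct else 0
        wrong += 0 if correct else 1
        if len(asked) >= max_questions:
            return asked, "maximum number of questions reached"
        if (len(asked) >= min_questions
                and standard_error(right, wrong) <= se_to_stop):
            return asked, "standard error small enough"
        level = next_level(remaining, level, correct)
        if level is None:
            return asked, ("no harder questions left" if correct
                           else "no easier questions left")


# Ten levels with five questions each; the simulated taker answers
# levels 1-5 correctly and levels 6-10 incorrectly.
pool = {lv: [f"q{lv}_{i}" for i in range(5)] for lv in range(1, 11)}
asked, reason = run_attempt(pool, lambda lv, q: lv <= 5, start_level=3,
                            min_questions=1, max_questions=20, se_to_stop=0.5)
print(len(asked), reason)
```

Note that se_to_stop is expressed in logits in this sketch, not as a percentage.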


Attempt graph


Test Parameters and Operation

The primary parameters for tuning the operation of the test are:

  • the starting level
  • the minimum number of questions
  • the maximum number of questions
  • the standard error to stop

Relationship between Maximum Number of Questions and Standard Error

As discussed in Wright (1988), the formula for calculating the standard error is given by:

Standard Error (± logits) = sqrt((R+W)/(R*W))

where R is the number of right answers and W is the number of wrong answers. This value is on a logit scale, so we can apply the inverse-logit function to convert it to a percentage scale:

Standard Error (± %) = ((1 / ( 1 + e^( -1 * sqrt((R+W)/(R*W)) ) ) ) - 0.5) * 100
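
The two formulas above translate directly into Python. This is a sketch of the arithmetic as stated here, with hypothetical function names; it is not taken from the plugin's source. The formula is undefined when R or W is zero, so the sketch rejects that case.

```python
import math


def standard_error_logits(right, wrong):
    """Standard Error (± logits) = sqrt((R+W)/(R*W))."""
    if right <= 0 or wrong <= 0:
        raise ValueError("needs at least one right and one wrong answer")
    return math.sqrt((right + wrong) / (right * wrong))


def standard_error_percent(right, wrong):
    """Inverse-logit of the logit standard error, as a ± percentage."""
    se = standard_error_logits(right, wrong)
    return (1 / (1 + math.exp(-se)) - 0.5) * 100


print(f"{standard_error_percent(5, 5):.2f}")    # 15.30
print(f"{standard_error_percent(50, 50):.2f}")  # 4.98
```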

Looking at the Standard Error function, it is important to note that it depends only on the difference between the number of right and wrong answers and the total number of answers, not on any other features such as which answers were right and which answers were wrong. For a given number of questions asked, the Standard Error will be smallest when half the answers are right and half are wrong. From this, we can deduce the minimum standard error possible to achieve for any number of questions asked:

  • 10 questions (5 right, 5 wrong) → Minimum Standard Error = ± 15.30%
  • 20 questions (10 right, 10 wrong) → Minimum Standard Error = ± 11.00%
  • 30 questions (15 right, 15 wrong) →  Minimum Standard Error = ± 9.03%
  • 40 questions (20 right, 20 wrong) →  Minimum Standard Error = ± 7.84%
  • 50 questions (25 right, 25 wrong) →  Minimum Standard Error = ± 7.02%
  • 60 questions (30 right, 30 wrong) →  Minimum Standard Error = ± 6.42%
  • 70 questions (35 right, 35 wrong) →  Minimum Standard Error = ± 5.95%
  • 80 questions (40 right, 40 wrong) →  Minimum Standard Error = ± 5.57%
  • 90 questions (45 right, 45 wrong) →  Minimum Standard Error = ± 5.25%
  • 100 questions (50 right, 50 wrong) →  Minimum Standard Error = ± 4.98%
  • 110 questions (55 right, 55 wrong) →  Minimum Standard Error = ± 4.75%
  • 120 questions (60 right, 60 wrong) →  Minimum Standard Error = ± 4.55%
  • 130 questions (65 right, 65 wrong) →  Minimum Standard Error = ± 4.37%
  • 140 questions (70 right, 70 wrong) →  Minimum Standard Error = ± 4.22%
  • 150 questions (75 right, 75 wrong) →  Minimum Standard Error = ± 4.07%
  • 160 questions (80 right, 80 wrong) →  Minimum Standard Error = ± 3.94%
  • 170 questions (85 right, 85 wrong) →  Minimum Standard Error = ± 3.83%
  • 180 questions (90 right, 90 wrong) →  Minimum Standard Error = ± 3.72%
  • 190 questions (95 right, 95 wrong) →  Minimum Standard Error = ± 3.62%
  • 200 questions (100 right, 100 wrong) →  Minimum Standard Error = ± 3.53%

What this listing indicates is that for a test configured with a maximum of 50 questions and a "standard error to stop" of 7%, the maximum number of questions will always be encountered first and stop the test. Conversely, if you are looking for a standard error of 5% or better, the test must ask at least 100 questions.

Note that these are best-case scenarios for the number of questions asked. If a test-taker answers a lopsided run of questions right or wrong, the test will require more questions to reach a target standard error.
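
To check a planned configuration against these best-case figures, the following Python sketch searches for the smallest number of questions at which an evenly split result reaches a target standard error, and compares a balanced result with a lopsided one. The helper names and the search limit are assumptions made for this example.

```python
import math


def se_percent(right, wrong):
    """Standard error as a ± percentage for R right and W wrong answers."""
    se_logits = math.sqrt((right + wrong) / (right * wrong))
    return (1 / (1 + math.exp(-se_logits)) - 0.5) * 100


def min_questions_for(target_percent, limit=1000):
    """Smallest total at which the most balanced split reaches the target,
    or None if no total up to 'limit' does."""
    for total in range(2, limit + 1):
        right = total // 2
        if se_percent(right, total - right) <= target_percent:
            return total
    return None


print(min_questions_for(5.0))  # 100
print(min_questions_for(7.0))  # 51

# Same number of questions, but a lopsided result has a larger error.
print(f"{se_percent(25, 25):.2f}")
print(f"{se_percent(40, 10):.2f}")
```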

Minimum Number of Questions

For most purposes this value can be set to 1, since the standard error to stop will generally set a baseline for the number of questions required. Set it higher than the minimum number of questions needed to reach the standard error to stop only if you wish to ensure that all test-takers answer additional questions.

Starting Level

As mentioned above, this will usually be set in the lower part of the difficulty range (about 1/3 of the way up from the bottom) so that most test-takers will be able to answer one of the first two questions correctly and get a morale boost from their correct answers. If the starting level is too high, low-ability users will be asked several questions they can't answer before the test begins asking them questions at a level they can answer.
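
As a rough aid, the "about 1/3 of the way up" guideline can be turned into a level number like this. The helper is hypothetical and only a starting point; adjust the result to suit your question pool.

```python
def suggested_starting_level(lowest, highest):
    """Level roughly one third of the way up from the bottom of the range."""
    return lowest + round((highest - lowest) / 3)


print(suggested_starting_level(1, 10))   # 4
print(suggested_starting_level(1, 100))  # 34
```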

Scoring

As discussed in Wright (1988), the formula for calculating the ability measure is given by:

Ability Measure = H/L + ln(R/W)

where H is the sum of all question difficulties answered, L is the number of questions answered, R is the number of right answers, and W is the number of wrong answers.

Note that this measure is not affected by the order of answers, just the total difficulty and the number of right and wrong answers. This measure depends on the test algorithm presenting alternating easier/harder questions as the user answers wrong/right and may not be applicable to other algorithms. In practice, this means that the ability measure should not be greatly affected by a small number of spurious right or wrong answers.

As discussed in Linacre (2000), the ability measure of the test taker aligns with the question-difficulty at which the test-taker has a 50% probability of answering a question correctly.

For example, given a test with levels 1-10 and a test-taker that answered every question 5 and below correctly and every question 6 and up wrong, the test-taker's ability measure would fall close to 5.5. Remember that the ability measure does have error associated with it. Be sure to take the standard error amount into account when acting on the score.
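
The ability formula and the levels 1-10 example above can be checked with a short Python sketch. It applies the formula literally to level numbers and assumes exactly one question was asked per level; how the plugin maps levels onto the logit scale is not covered here, so that step is left out. The function name and the (difficulty, correct) input shape are assumptions made for this example.

```python
import math


def ability_measure(responses):
    """Apply Ability Measure = H/L + ln(R/W) to (difficulty, correct) pairs."""
    total = len(responses)                       # L
    right = sum(1 for _, correct in responses if correct)  # R
    wrong = total - right                        # W
    if right == 0 or wrong == 0:
        raise ValueError("needs at least one right and one wrong answer")
    difficulty_sum = sum(difficulty for difficulty, _ in responses)  # H
    return difficulty_sum / total + math.log(right / wrong)


# One question per level 1-10; levels 1-5 answered right, 6-10 wrong.
responses = [(level, level <= 5) for level in range(1, 11)]
print(ability_measure(responses))  # 5.5
```

Here H = 55, L = 10 and R = W = 5, so the measure is 55/10 + ln(1) = 5.5. Reordering the responses gives the same result, since only the sums and counts enter the formula.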


Contributors

Adam Franco (Lead maintainer): Former maintainer

Comments

  • Vitaly Potenko
    Fri, 26 Jan 2024, 9:34 PM
    Hi Ahsan,
    I'd suggest to create an issue in the plugin's tracker (find the link in the plugin's description above). It's much more convenient to run the discussion there. Thank you!
  • Vitaly Potenko
    Fri, 26 Jan 2024, 9:39 PM
    Hi Sanvidha,
    The quiz-takers are not supposed to see any detailed reports. They can see only the general ability measure value (though even this is arguable and can be configured per each quiz). This is a debatable question and is covered in those articles linked in the plugin's description. Anyway, there are no plans to introduce reports for students in the plugin, CAT is not intended to provide such reports as first thing.
  • Kennedy Kinyua
    Fri, 9 Feb 2024, 3:10 AM
    Requesting for adaptive quiz plugin for Moodle 4.2?
  • Vitaly Potenko
    Fri, 9 Feb 2024, 10:41 PM
    Hi Kennedy, yeah, the work is running.
  • Paulo Paclibar
    Mon, 11 Mar 2024, 10:49 AM
    Hi @Vitaly. Just want to ask if this plugin will also work on SCORM package? Is it possible to make a scorm package a Computer adaptive test?
  • Paulo Paclibar
    Mon, 11 Mar 2024, 3:29 PM
    @Vitaly
    Also, How are we going to set the standard error if we want to stop the test if the test taker answered 85 correct answers and vice versa, will stop also with 85 wrong answers? We are using 150 questions overall and has 4 levels of questions. We followed the rule on this documentation but it is always stopping on the 40th question. and sometimes, it is not stopping at all.
  • Vitaly Potenko
    Tue, 12 Mar 2024, 4:28 AM
    Hi Paulo,
    Regarding SCORM - no, it's not possible, it's an entirely different thing.
    As for the quiz config - please, check the 'The Testing Process' section in the plugin's description above, in particular, where it says 'The possible stopping conditions are as follows:..' and the bullet list below. There you'll find out when the testing process stops.
    Also, check the 'Test Parameters and Operation' section. You should have configured the minimum amount of questions and the maximum amount of questions. The maximum amount is when the test may stop (one of the possible conditions).
    So, in general, the test may stop when the standard error for the ability estimation is within the configured boundaries, or when the maximum amount of administered (that is, presented to user) questions has been reached.
    You cannot stop the test when 'the taker answered 85 correct/wrong' questions, this is not how it works.
  • Paulo Paclibar
    Tue, 12 Mar 2024, 1:32 PM
    Hi @Vitaly,
    Thank you for your response. I think I kinda getting it now but I still need guidance with this. Correct me if I'm wrong:
    1. Is the test stopping on the 40th question because it is out of question to pull on that level? If that's the case, I need to upload more questions per level.
    2. How the test would stop if the test taker reach the maximum correct answer? Is that possible to set or the test will just go until the maximum question reached even if the test taker is answering the questions correctly?
  • Vitaly Potenko
    Tue, 12 Mar 2024, 5:55 PM
    @Paulo
    1. It may stop because you might have set the 'maximum questions' parameter to 40 in the adaptive quiz settings. Check what value you have there.
    2. There's no such stopping condition as 'maximum correct answer'. No matter how the taker is answering the questions, the test will stop 1) when the standard error value matches the one you configure in the quiz settings, 2) the maximum number of questions has been reached, again, you configure it in the quiz settings.

    You can see the stoppage reason for each taker's attempt when viewing quiz report as a manager. It'll give you understanding of why the test has stopped in this particular attempt.

    I'd recommend to follow those links in the plugin's description above to learn how CAT works, in particular the one with Linacre (2000) title.
  • Paulo Paclibar
    Tue, 12 Mar 2024, 6:12 PM
    @Vitaly
    Before I further read the study by Linacre, I will just ask if what you mean on your answer is that the questions will just continue until the end if the test taker is answering the questions without any wrong answer? Is that right?
  • Vitaly Potenko
    Wed, 13 Mar 2024, 1:05 AM
    @Paulo
    It doesn't matter how the taker answers questions - correct or wrong, the stoppage conditions always remain the same, even if all questions are answered correctly or all are answered wrong.
  • Paulo Paclibar
    Wed, 13 Mar 2024, 12:38 PM
    @Vitaly
    Thank you so much for your responses. It is a great help.
  • Vitaly Potenko
    Fri, 5 Apr 2024, 2:00 AM
    New version has just been released. Now the new version is used across all Moodle versions from 4.1 to 4.3 (and is expected to be compatible with the upcoming 4.4 as well and become an early bird there, but we'll see). Also, see the release notes to know more about the fixes included. Cheers!
  • Daniel Dshajani
    Mon, 15 Apr 2024, 10:37 PM
    Hi Vitaly, many thanks for maintaining this plugin! Maybe you can assist - I am trying to show the end user a text based on their ability after completing the test. So for example, if ability is 1, show text A, if ability is 2, show text B. Maybe I am missing something? Many thanks!
  • Vitaly Potenko
    Tue, 16 Apr 2024, 4:28 AM
    Hi Daniel, currently the plugin isn't capable of this. However, this is a great idea which other users may benefit from as well. There's some thinking on how feedback for test-takers can be enhanced in future, perhaps, what you described can be considered as well.