How to clean up a site with massively duplicated question categories?


by Visvanath Ratnaweera -
Number of replies: 21

Hi all

I am faced with the question of how to clean up a site with massively duplicated question categories. I am talking about roughly 400 courses, with the number of questions varying from 500 to 9,500! Almost all courses have duplicate question categories. A screenshot from one of the worst courses is attached.

I can't figure out how this happened. It has been happening over years, I believe, at every semester roll-out. The teachers have "template" courses, a kind of originals, from which they pull resources, including quizzes, into the running courses with the Sharing Cart. Every teacher owns a course category (Manager role) and can duplicate courses at will. The group leaders are Managers of the parent category, where there are other "template" courses with resources and question categories - the originals of those multiplied question categories. The teachers run multiple classes for the same or related subjects and share the same pool of question categories. Yeah, the same "test" is being run with random questions every time.

Now my question is: how do I clean up this mess? The automated course backups (on PHP CLI) have started failing for the bigger courses, and the number of failing backups goes up on a weekly basis.

Even drastic measures are possible, like starting with a completely new site on the new 4.5 LTS at the beginning of the next academic year and leaving this 4.1 LTS site as an archive. But a) a new beginning via course backup/restore is pointless if the backups carry the mess forward; b) even if we start with "lean" courses, it is pointless if this growth continues there.

I tend towards that solution, even to the point of investing in manual migration, provided we stop this growth in the new site.

Current site: Moodle 4.1 LTS, always latest update. PHP had been 7.4 until end of December 2024, now running 8.1. MariaDB has been upgraded from 10.4 to 10.11 also in December. 

I am aware of the many running discussions on similar problems and of some notable tracker issues. Still, I prefer to make a fresh start here without mixing this up with those cases.

Attachment Screenshot_2025-02-03_21-43-04.png
In reply to Visvanath Ratnaweera

Re: How to clean up a site with massively duplicated question categories?

by Visvanath Ratnaweera -
I was re-reading the forums on duplicating question categories and found these in one place, on a slide Tim presented in the MUA Town Hall on Moodle 4.6 and its quiz question sharing feature (attached to the post):
- MDL-41924 Minimise questions included with quiz backups (or duplication)
- MDL-38870 Questions duplicated during import/restores
- MDL-75854 Duplicate questions in system wide question bank after course restore into Moodle 4
- MDL-69306 Duplicated quiz uses the same questions from the question bank: can confuse teachers if they expect questions were copied too

I'm not saying that all are related to my issue. I'm trying to understand.
In reply to Visvanath Ratnaweera

Re: How to clean up a site with massively duplicated question categories?

by koen roggemans -
Hi Visvanath,

You missed MDL-61267 ("Remaining data relating to qtype_random instances should be removed because the random question feature is now implemented differently") in your overview of things to check.

75% of the questions in my database were random questions - a data type that doesn't exist anymore in the 4.x branch of Moodle. One course had nearly 1M of them. Cleaning up these questions solved my problem. My server didn't crash, but backup, restore and duplicating quizzes had stopped working for a lot of courses. They work again now.
Average of ratings: Useful (4)
In reply to koen roggemans

Re: How to clean up a site with massively duplicated question categories?

by Visvanath Ratnaweera -
Thanks Koen! That operation eliminated 12 k 'random' questions. No big relief compared to the total of 2 M questions we have. Still something.
In reply to koen roggemans

Re: How to clean up a site with massively duplicated question categories?

by Jerry Krop -
In my situation, this operation resulted in the deletion of 9 million questions. I initially believed this would resolve the issue; however, accessing the course question bank remains problematic. The loading time is excessively long, ultimately leading to a ‘time out’ error.
In reply to Jerry Krop

Re: How to clean up a site with massively duplicated question categories?

by Visvanath Ratnaweera -
That simply means that the cause of your problem is not the cause of my problem. I understand, you must be desperate. But entering this (wrong, closed) discussion from all sides will only confuse everybody. If you don't know, it is known as thread hijacking and frowned upon. Please start a new thread or message the moderator to split your thread as suggested here!
In reply to Jerry Krop

Re: How to clean up a site with massively duplicated question categories?

by Przemek Kaszubski -
What I've been doing with some of our problematic courses is to create trash quizzes and move categories with unused questions there - I'll be doing more of this as long as that option remains available (it will NOT be after the upgrade to Moodle 5 - cf. https://moodle.org/mod/forum/discuss.php?d=466039 and https://tracker.moodle.org/browse/MDL-84575 ).
Such operations slim down the course-level question bank and make it more usable.
My own approach to quizzes these days is to keep "shared questions" to the necessary minimum. If a quiz requires its own restricted set of questions that should not be shared with other quizzes, move these questions into the quiz's local question bank.

PS. Sometimes, verifying whether a question is used or not can be tricky - cf. this "fix" https://tracker.moodle.org/browse/MDL-77625 ("Repeated quiz restoration to same course references original course questions"), which: (1) names the problem and aspires to prevent it from happening, but does not cure pre-existing cases unless you engage in deep-level manual fiddling [which some of the discussion in that ticket tackles]; (2) apparently spawned more problems than it solved. Ergo, I'm now waiting for this issue to be finally resolved - https://tracker.moodle.org/browse/MDL-83541 ("error_question_answers_missing_in_db when duplicate or restore quiz") - and will then decide how to proceed before the next academic year.
Average of ratings: Useful (2)
In reply to Visvanath Ratnaweera

Re: How to clean up a site with massively duplicated question categories?

by Visvanath Ratnaweera -
And the sizes of the tables:
+----------------------------------+----------+
| table_name                       | SIZE_MB  |
+----------------------------------+----------+
| mdl_question                     | 10034.89 |
| mdl_question_answers             |  3103.63 |
| mdl_files                        |  1655.77 |
| mdl_logstore_standard_log        |  1621.50 |
| mdl_qtype_match_subquestions     |  1331.30 |
| mdl_question_attempts            |  1287.50 |
| mdl_book_chapters                |   948.66 |
| mdl_question_attempt_steps       |   847.48 |
| mdl_grade_grades_history         |   792.75 |
| mdl_question_attempt_step_data   |   723.83 |
| mdl_qtype_essay_options          |   296.11 |
| mdl_question_bank_entries        |   207.95 |
| mdl_grade_grades                 |   200.45 |
| mdl_question_versions            |   200.09 |
+----------------------------------+----------+
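
To put these sizes in perspective, here is a minimal Python sketch that sums the question-related tables from the list above. The rule used to decide what counts as "question-related" (the table name contains "question" or starts with "mdl_qtype_") is an assumption made for illustration, and it lumps attempt data together with the question bank itself.

```python
# Sizes in MB, copied from the table in this post (only the listed tables).
TABLE_SIZES_MB = {
    "mdl_question": 10034.89,
    "mdl_question_answers": 3103.63,
    "mdl_files": 1655.77,
    "mdl_logstore_standard_log": 1621.50,
    "mdl_qtype_match_subquestions": 1331.30,
    "mdl_question_attempts": 1287.50,
    "mdl_book_chapters": 948.66,
    "mdl_question_attempt_steps": 847.48,
    "mdl_grade_grades_history": 792.75,
    "mdl_question_attempt_step_data": 723.83,
    "mdl_qtype_essay_options": 296.11,
    "mdl_question_bank_entries": 207.95,
    "mdl_grade_grades": 200.45,
    "mdl_question_versions": 200.09,
}


def is_question_table(name: str) -> bool:
    """Assumption: question tables contain 'question' or start with 'mdl_qtype_'."""
    return "question" in name or name.startswith("mdl_qtype_")


def question_share(sizes: dict[str, float]) -> tuple[float, float, float]:
    """Return (question MB, total MB, question fraction) for the given tables."""
    total = sum(sizes.values())
    question_total = sum(mb for name, mb in sizes.items() if is_question_table(name))
    return question_total, total, question_total / total


question_mb, total_mb, share = question_share(TABLE_SIZES_MB)
print(f"question-related: {question_mb:.2f} MB of {total_mb:.2f} MB ({share:.1%})")
```

On these figures the question-related tables account for roughly 18 GB of the 23 GB listed.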
In reply to Visvanath Ratnaweera

Re: How to clean up a site with massively duplicated question categories?

by Visvanath Ratnaweera -

The DB crashed on two consecutive days during the peaks. Both times it was the same: MariaDB starts consuming memory and blocks the whole server, throwing "site not available" errors! Hopefully I have caught it by configuring MariaDB differently. Still, I am direly in need of answers to the questions in the OP.

In reply to Visvanath Ratnaweera

Re: How to clean up a site with massively duplicated question categories?

by Visvanath Ratnaweera -
I believe it is MDL-38870. I left a comment in the tracker and would appreciate any insight into this.
In reply to Visvanath Ratnaweera

Re: How to clean up a site with massively duplicated question categories?

by Rick Jerz -

I have been watching this discussion.

Which of these features would you prefer? (As far as I know, these are not available.)

  1. Show me a list of all questions that are not connected to any course quizzes, and then allow me to either pick which ones I want to delete, or provide a button to "delete all."  (You had better know what you are doing before you "delete all.")
  2. Find any duplicated question and point the quiz question to whichever duplicated question is deemed the "master." Then, delete the duplicate. In this case, would you want to be the one to pick the "master" question, or are you fine letting Moodle pick it?
  3. (Similar to #2) Find any question that is nearly the same as another, and let me decide which one is the correct version of the question.  (Some would want an AI engine to decide, but I wouldn't.)
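
As an illustration of item 2, here is a minimal Python sketch, not an existing Moodle tool, that groups exact duplicates by a fingerprint and proposes the lowest id in each group as the "master". The choice of fields (qtype, name, questiontext), the normalisation, and the lowest-id rule are all assumptions; a real tool would also have to compare answers, grading options and files before treating two questions as the same.

```python
import hashlib
import re
from collections import defaultdict


def normalise(text: str) -> str:
    """Drop HTML tags, collapse whitespace and lowercase, so cosmetic edits do not matter."""
    without_tags = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", without_tags).strip().lower()


def fingerprint(question: dict) -> str:
    """Hash the fields that, by assumption, define 'the same question'."""
    parts = [
        question["qtype"],
        normalise(question["name"]),
        normalise(question["questiontext"]),
    ]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def find_duplicates(questions: list[dict]) -> dict[int, list[int]]:
    """Map each proposed master id (lowest id in its group) to the ids of its exact copies."""
    groups = defaultdict(list)
    for question in questions:
        groups[fingerprint(question)].append(question["id"])
    result = {}
    for ids in groups.values():
        if len(ids) > 1:
            ordered = sorted(ids)
            result[ordered[0]] = ordered[1:]
    return result


# Hypothetical sample rows; real data would come from the question table.
SAMPLE = [
    {"id": 101, "qtype": "multichoice", "name": "Ohm 1", "questiontext": "<p>State Ohm's law.</p>"},
    {"id": 205, "qtype": "multichoice", "name": "Ohm 1", "questiontext": "<p>State  Ohm's law.</p>"},
    {"id": 309, "qtype": "multichoice", "name": "Ohm 1", "questiontext": "<p>State Ohm's law briefly.</p>"},
    {"id": 412, "qtype": "essay", "name": "Ohm 1", "questiontext": "<p>State Ohm's law.</p>"},
]

duplicates = find_duplicates(SAMPLE)
print(duplicates)
```

Only questions 101 and 205 are grouped here: 309 has different wording and 412 a different question type, so both stay separate.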
In reply to Rick Jerz

Re: How to clean up a site with massively duplicated question categories?

by Visvanath Ratnaweera -
Rick, thanks for the thoughts.

The operations you mentioned, are they on the (Moodle) GUI? I know, if properly programmed and tested, they are the safest. But the question is, do they exist? If they exist, then the next question is, are they scalable? Don't forget the numbers. I am talking of 200+ courses with varying numbers of duplicated question categories. The worst ones have close to 10k questions each, in I don't know how many categories! (See the number of pages in the screenshot in my post at the top.) I don't know how many question categories are used and how many are not.
 
But now that the DB server has started crashing, I am ready to walk on fire. :-(
In reply to Visvanath Ratnaweera

Re: How to clean up a site with massively duplicated question categories?

by Rick Jerz -
No, the three features do not exist anywhere. However, I am just trying to help figure out what we would want to happen. Perhaps I am at a different level. Perhaps you (and others) would want "Can you just clean my question bank?" Maybe my three items address more of the "how" issue.

Item #1, my first sentence, the "show me" might be doable with some custom SQL.
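
A rough idea of what such a "show me" query could look like, run here against an in-memory SQLite database rather than a real Moodle site. The tables are heavily simplified stand-ins for the Moodle 4.x question tables (no prefix, only the columns the query needs), and `random_source_categories` is a hypothetical helper table standing in for the categories that random quiz slots draw from. Treat it as a sketch under these assumptions, to be tried on a clone only.

```python
import sqlite3

# Simplified stand-ins for Moodle 4.x tables (assumption: minimal columns only).
SCHEMA = """
CREATE TABLE question (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE question_bank_entries (id INTEGER PRIMARY KEY, questioncategoryid INTEGER);
CREATE TABLE question_versions (id INTEGER PRIMARY KEY, questionbankentryid INTEGER, questionid INTEGER);
CREATE TABLE question_references (id INTEGER PRIMARY KEY, questionbankentryid INTEGER);
CREATE TABLE question_attempts (id INTEGER PRIMARY KEY, questionid INTEGER);
-- Hypothetical helper table: categories that random quiz slots draw from.
CREATE TABLE random_source_categories (questioncategoryid INTEGER);
"""

# A question counts as unused if no quiz slot references its bank entry,
# no random slot draws from its category, and nobody has ever attempted it.
UNUSED_SQL = """
SELECT q.id, q.name
  FROM question q
  JOIN question_versions qv ON qv.questionid = q.id
  JOIN question_bank_entries qbe ON qbe.id = qv.questionbankentryid
 WHERE NOT EXISTS (SELECT 1 FROM question_references qr
                    WHERE qr.questionbankentryid = qbe.id)
   AND NOT EXISTS (SELECT 1 FROM random_source_categories rsc
                    WHERE rsc.questioncategoryid = qbe.questioncategoryid)
   AND NOT EXISTS (SELECT 1 FROM question_attempts qa
                    WHERE qa.questionid = q.id)
 ORDER BY q.id
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO question VALUES (?, ?)", [
        (1, "In a quiz slot"), (2, "Orphan A"), (3, "In a random pool"),
        (4, "Attempted earlier"), (5, "Orphan B"),
    ])
    conn.executemany("INSERT INTO question_bank_entries VALUES (?, ?)",
                     [(1, 10), (2, 10), (3, 20), (4, 30), (5, 30)])
    conn.executemany("INSERT INTO question_versions VALUES (?, ?, ?)",
                     [(n, n, n) for n in range(1, 6)])
    conn.execute("INSERT INTO question_references VALUES (1, 1)")
    conn.execute("INSERT INTO random_source_categories VALUES (20)")
    conn.execute("INSERT INTO question_attempts VALUES (1, 4)")
    return conn


def unused_questions(conn: sqlite3.Connection) -> list[tuple[int, str]]:
    """Return (id, name) of questions that nothing in the demo data refers to."""
    return conn.execute(UNUSED_SQL).fetchall()


unused = unused_questions(build_demo_db())
print(unused)
```

The random-pool check matters on a site like the one described in this thread: questions drawn only through random slots are never referenced individually, so a query that looks at direct references alone would wrongly report them as unused.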
In reply to Rick Jerz

Re: How to clean up a site with massively duplicated question categories?

by Visvanath Ratnaweera -
I know that no such tools exist. My question was: if such tools were to be created, would they be GUI tools? I was worried about the number of clicks required to clean up the massive number of duplicates. SQL, directly manipulating the database, would be more targeted but also dangerous. Of course the operation will be done on a clone first.
In reply to Visvanath Ratnaweera

Re: How to clean up a site with massively duplicated question categories?

by Rick Jerz -
Cleaning up a cluttered question bank is complex, so a GUI tool would be the way to go. However, I envision that it will still take a lot of clicks and manual intervention. My question remains: if you have what appears to be a dozen of the same question, how will you know if some have minor changes, and how would you know if it is safe to delete one?
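
One way to approach the "minor changes" part of that question is to score the similarity of two question texts and route anything that is close but not identical to a human. The sketch below uses Python's standard `difflib.SequenceMatcher`; the 0.9 threshold is an arbitrary assumption that would need tuning, and comparing every pair this way gets slow once a course holds thousands of questions.

```python
from difflib import SequenceMatcher


def classify_pair(text_a: str, text_b: str, threshold: float = 0.9) -> str:
    """Label two question texts as 'identical', 'near-duplicate' or 'different'."""
    # Collapse runs of whitespace so pure spacing changes count as identical.
    a = " ".join(text_a.split())
    b = " ".join(text_b.split())
    if a == b:
        return "identical"
    ratio = SequenceMatcher(None, a, b).ratio()
    # Assumption: anything at or above the threshold needs a human decision.
    return "near-duplicate" if ratio >= threshold else "different"


print(classify_pair("What is the capital of France?", "What is  the capital of France?"))
print(classify_pair("What is the capital of France?", "What is the capital of france?"))
print(classify_pair("What is 2 + 2?", "Name the largest planet in the solar system."))
```

Only the "identical" pairs would be candidates for automatic merging; the "near-duplicate" ones are exactly the cases where a teacher has to say which version is correct.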
In reply to Rick Jerz

Re: How to clean up a site with massively duplicated question categories?

by Visvanath Ratnaweera -
> Cleaning up a cluttered question bank is complex, so a GUI tool would be the way to go. However, I envision that it will still take a lot of clicks..

Depends on how many clicks in total. If one click per question, then the top 10 contenders alone will give me roughly 80k clicks!
 
> and manual intervention.
 
Sure! I am looking for those very instructions.

> My question remains: if you have what appears to be a dozen of the same question, how will you know if some have minor changes, and how would you know if it is safe to delete one?

I don't know. That is why I am asking!
In reply to Visvanath Ratnaweera

Re: How to clean up a site with massively duplicated question categories?

by Visvanath Ratnaweera -
mysqltuner reports:
-------- Storage Engine Statistics -----------------------------------------------------------------
[--] Status: +Aria +CSV +InnoDB +MEMORY +MRG_MyISAM +MyISAM +PERFORMANCE_SCHEMA +SEQUENCE
[--] Data in Aria tables: 32.0K (Tables: 1)
[--] Data in InnoDB tables: 72.2G (Tables: 6315)
[OK] Total fragmented tables: 0

Although it is a beefy server, it doesn't have 72 GB to give the DBMS. And why should it need that much, just because the question categories exploded?
In reply to Visvanath Ratnaweera

Re: How to clean up a site with massively duplicated question categories?

by Visvanath Ratnaweera -
The database crashes (exits) regularly when the quiz load ramps up. A DB restart helped, but this cannot be normal. See https://moodle.org/mod/forum/discuss.php?d=464421#p1870340
 
I've posted detailed information in the original post. If more required, please ask!
In reply to Visvanath Ratnaweera

Re: How to clean up a site with massively duplicated question categories?

by Jerry Krop -
Hi Visvanath, I have the same issue. I have a platform with 13 million questions. The large number of records makes it impossible to duplicate a course, as it would take an enormous amount of time. Additionally, when trying to use the question bank, the system cannot handle the volume and returns a ‘time out’ error.

I have read several threads related to your issue, but I am still unsure if it is possible to clean records in the database. Do you know if the development team has any solutions for this?
Average of ratings: Useful (1)
In reply to Jerry Krop

Re: How to clean up a site with massively duplicated question categories? [NOT SOLVABLE]

by Visvanath Ratnaweera -
No! If I knew, I wouldn't have asked. I'm in disbelief. Apparently there are no answers, or any answer will come too late for me - when a server crashes regularly, two weeks are an eternity. I hope the information posted in the thread will ultimately help you to find a solution to your version of the problem. AFAIC the thread is CLOSED and marked NOT SOLVABLE.
In reply to Visvanath Ratnaweera

Re: How to clean up a site with massively duplicated question categories? [NOT SOLVABLE]

by Nathan Lind -
If this is the Tracker, you will want to create steps to reproduce the problem in the current versions of Moodle: MDL-38870
In reply to Nathan Lind

Re: How to clean up a site with massively duplicated question categories? [NOT SOLVABLE]

by Visvanath Ratnaweera -
You must be talking to Jerry. For me the discussion is CLOSED.

@the moderator: Maybe, it'd be better to split the discussion at Jerry's post with a changed subject, at a minimum taking the [NOT SOLVABLE] out. Then I will close the original discussion replying to my last post.