Moodle in English: Specifications for bulk extractions in Moodle

Hello,

I am co-leading a working group on Moodle archiving, composed of archivists, instructional engineers, and IT specialists from French universities. We are working together on a methodology for extracting materials from the platform, because some material on Moodle needs to be retained for its historical value. Currently, the archivists retrieve it manually. We would like to automate this process by creating a plugin that would function as a website scraper. We could then configure the scraper to retrieve only the HTML pages that interest us, without having to retrieve each element individually. Unfortunately, we do not have the capability to develop this tool internally. We would therefore like to share the technical specifications we have put together, in the hope that someone might take the project on or offer ideas for development or workarounds.

Thank you and have a great day everyone!

Aurelia Ducci-Bouvier

archivist (INSA LYON)

GT_Moodle_Specifications_bulk_extractions_Moodle_EN.pdf

Average of ratings: -

Re: Specifications for bulk extractions in Moodle

by Michael Hughes - Wednesday, 9 April 2025, 23:52

Hi Aurelia,

I think this is a fascinating proposal...but I'm wondering if web scraping is the way to go.

Any given Moodle site is always experienced by a "user", so your scraper agent would need to be a "user" of some sort (presumably a student...but that's not a given), and it would be subject to the rules that apply to that user. This could mean that there is content your user isn't eligible to see, and someone would need to ensure that the account is given all of the "correct" setup for the content you'd like to archive.

On top of this a scraper's going to pick up all of the Moodle UI in addition to the actual content.
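To illustrate the UI problem, here is a minimal sketch of keeping only the text inside a page's main content region and discarding navigation and footer markup. It assumes the content sits in an element with id "region-main", as in the pages I have looked at; themes can differ, so treat that id (and the function name `extract_main_text`) as assumptions. It uses only Python's standard `html.parser`, and it relies on well-formed HTML (an unclosed tag would throw off the depth counter).

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class MainRegionExtractor(HTMLParser):
    """Collect text found inside the element whose id equals region_id."""

    def __init__(self, region_id="region-main"):
        super().__init__(convert_charrefs=True)
        self.region_id = region_id  # assumed id of the content container
        self.depth = 0              # 0 = outside the region, >0 = nesting depth inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get("id") == self.region_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and text:
            self.chunks.append(text)


def extract_main_text(html, region_id="region-main"):
    """Return the text of the main region, one text node per line."""
    parser = MainRegionExtractor(region_id)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Made-up page for illustration only.
SAMPLE_PAGE = (
    '<html><body><nav id="nav">Dashboard<br>My courses</nav>'
    '<section id="region-main"><h2>Week 1</h2>'
    '<p>Read <b>chapter 2</b>.</p><img src="x.png"></section>'
    '<footer>Get the mobile app</footer></body></html>'
)

if __name__ == "__main__":
    print(extract_main_text(SAMPLE_PAGE))
```

Even with this kind of filtering, anything a block or activity renders inside the main region (buttons, completion toggles and so on) would still come along, so it reduces the noise rather than removing it.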

You wouldn't need to create the scraper as a "Moodle" plugin either, in a traditional sense it would simply follow all of the links *that it can see* and index the content.
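A rough sketch of what "follow all of the links that it can see" means in practice: a breadth-first crawl that stays on one site and only ever learns about pages linked from pages it was allowed to load. The `fetch_links` callable is a stand-in for "log in as some user, load the page, return its hrefs"; it, the example URLs, and the `STUDENT_VIEW` data are all hypothetical.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def crawl(start_url, fetch_links, max_pages=1000):
    """Breadth-first crawl limited to the start URL's host.

    fetch_links(url) must return the hrefs visible to the crawling
    user on that page; pages it cannot see are never discovered.
    """
    site = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for href in fetch_links(url):
            # Resolve relative links and drop any #fragment.
            target = urljoin(url, href).split("#", 1)[0]
            if urlparse(target).netloc != site or target in seen:
                continue
            seen.add(target)
            queue.append(target)
    return visited


# Hypothetical link structure as seen by a student account.
# A teacher-only page would simply not appear in these lists.
BASE = "https://moodle.example"
STUDENT_VIEW = {
    BASE + "/course/view.php?id=2": [
        "/mod/page/view.php?id=10",
        "/mod/quiz/view.php?id=11#top",
        "https://other.example/library",
    ],
    BASE + "/mod/page/view.php?id=10": ["/course/view.php?id=2"],
}

if __name__ == "__main__":
    start = BASE + "/course/view.php?id=2"
    for page in crawl(start, lambda u: STUDENT_VIEW.get(u, [])):
        print(page)
```

The crawl result is entirely determined by what the chosen account is shown, which is the limitation described above: hidden activities, group-restricted content, or teacher-only pages never enter the queue.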

Creating a *plugin*, however, does mean that in theory you'd have direct access to the raw content in the Moodle database, and you wouldn't need to do any "scraping". However, your plugin would somehow need to be aware of how every plugin (that you're interested in) represents its data internally...

Even if the scraper is able to grab all of the HTML, CSS and related files, I suspect that it still wouldn't capture the backend processes that a "standalone" copy would need in order to "function" (i.e. anything that makes further calls via AJAX). From what I can see of the Quiz archiver, it essentially takes the quiz attempt and rewrites the attempt data into a "new" static format, which means it has some understanding of how the quiz module works internally (this is based on just a short squint at the code).

It may be worth looking at Moodle's features around Subject Access Requests (these support the GDPR requirement to be able to give a specific user all of *their* data). There could be some interesting approaches in that subsystem that could be mapped over to your requirements, which are (as I understand it) not about the "user's" content but about more arbitrary content.

Every Moodle module is "obliged" to provide in its privacy code (https://moodledev.io/docs/5.0/apis/subsystems/privacy) an "export_user_data()" function. As far as I'm aware there isn't an "export_activity_for_archive()" function, but maybe there should be, so that each module is inherently responsible for implementing a representation of its activities in a form that is suitable for archival purposes...
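To make the idea a bit more concrete, here is a language-neutral sketch (written in Python for brevity; a real Moodle implementation would be PHP) of what such a per-module contract could look like. To be clear, nothing here exists in Moodle: `export_activity_for_archive()`, `ArchiveExporter`, `ArchiveRecord`, and the shape of the activity dictionaries are all hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ArchiveRecord:
    """Static, self-contained representation of one activity (hypothetical)."""
    module: str
    title: str
    html: str
    files: list = field(default_factory=list)


class ArchiveExporter(ABC):
    """Contract each module would implement for its own activities."""
    module = ""

    @abstractmethod
    def export_activity_for_archive(self, activity: dict) -> ArchiveRecord:
        ...


class PageExporter(ArchiveExporter):
    """Example implementation for a simple 'page' style activity."""
    module = "page"

    def export_activity_for_archive(self, activity):
        return ArchiveRecord(module=self.module,
                             title=activity["name"],
                             html=activity["content"])


def export_course(activities, exporters):
    """Export every activity that has an exporter; report the rest."""
    by_module = {e.module: e for e in exporters}
    records, skipped = [], []
    for activity in activities:
        exporter = by_module.get(activity["module"])
        if exporter is None:
            skipped.append(activity["name"])
        else:
            records.append(exporter.export_activity_for_archive(activity))
    return records, skipped


if __name__ == "__main__":
    course = [
        {"module": "page", "name": "Syllabus", "content": "<p>Welcome</p>"},
        {"module": "quiz", "name": "Quiz 1"},
    ]
    records, skipped = export_course(course, [PageExporter()])
    print([r.title for r in records], skipped)
```

The point of the design is that knowledge of each module's internals stays inside that module, and the archiving tool only needs to know which modules could not be exported.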

Archival of work submitted through the assignment activity is an interesting one, as there's a question as to whether "you" would have the copyright to hold a copy of that work. I'm assuming that if the "you" is the institution that owns the Moodle, it may have a copyright assignment made as part of the student's submission, but that could vary between institutions. For instance, we have the position that the student retains the copyright to their submission by default, so any further use of it needs to be clarified. (I just mention this because the feature for student-submitted work may be well served by a switch to turn it on or off, depending on the executing institution's situation.)

I think option 2 has its own issues as well, as you'd be capturing just one user's playback, or potentially many users' playbacks, but I don't see that you'd get all potential user playbacks, so it would always be representative rather than definitive.

Anyway, I'm not sure how useful this is, but I look forward to hearing more about this activity!

Average of ratings: Useful (1)

Re: Specifications for bulk extractions in Moodle

by Séverin TERRIER - Thursday, 10 April 2025, 15:19

Hi,

Just pointing out that this discussion is also open in French.

Séverin

Average of ratings: -

General developer forum

Empowering educators to improve our world
