Manually creating an mbz archive for import into Moodle - files.xml structure

by Jerome Di Pietro -

Having unpacked an .mbz Moodle archive (from a course backup), I can work out how to 'manually' create a template for new courses.

My aim is to use this to automate the process of importing courses (and simple course files such as PDFs) from a bespoke VLE into Moodle.


I'm stuck when it comes to including files (course content). Does anyone know much about this, or know where I can find documentation?


On a test export of a course, with a single PDF resource, I see that the .mbz archive contains a 'files' folder and within that, a folder named 'b2' and inside that, a file called 'b2d7bad08bf3ccc8d1971a992b5d73445ab1ffe8' with no file extension: 

/files

/files/b2/

/files/b2/b2d7bad08bf3ccc8d1971a992b5d73445ab1ffe8
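
The layout above can be sketched as a small Python helper. This is only an illustration based on this one export: the function name `backup_file_path` is made up, and it assumes that the stored name is always 40 lowercase hex characters and that the subfolder is always its first two characters.

```python
import re
from pathlib import PurePosixPath

# Assumption: stored names look like the one in this export (40 lowercase hex characters).
_HEX40 = re.compile(r"[0-9a-f]{40}")


def backup_file_path(contenthash: str) -> PurePosixPath:
    """Return files/<first two characters>/<contenthash>, relative to the archive root."""
    if not _HEX40.fullmatch(contenthash):
        raise ValueError(f"unexpected contenthash format: {contenthash!r}")
    return PurePosixPath("files") / contenthash[:2] / contenthash


print(backup_file_path("b2d7bad08bf3ccc8d1971a992b5d73445ab1ffe8"))
# files/b2/b2d7bad08bf3ccc8d1971a992b5d73445ab1ffe8
```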


files.xml is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<files>
  <file id="18973878">
    <contenthash>b2d7bad08bf3ccc8d1971a992b5d73445ab1ffe8</contenthash>
    <contextid>1676311</contextid>
    <component>mod_resource</component>
    <filearea>content</filearea>
    <itemid>0</itemid>
    <filepath>/</filepath>
    <filename>cp-300-400_safety-case-study.pdf</filename>
    <userid>256</userid>
    <filesize>402914</filesize>
    <mimetype>application/pdf</mimetype>
    <status>0</status>
    <timecreated>1461082918</timecreated>
    <timemodified>1461082920</timemodified>
    <source>cp-300-400_safety-case-study.pdf</source>
    <author>Jerome Di Pietro</author>
    <license>allrightsreserved</license>
    <sortorder>1</sortorder>
    <repositorytype>$@NULL@$</repositorytype>
    <repositoryid>$@NULL@$</repositoryid>
    <reference>$@NULL@$</reference>
  </file>
  <file id="18973879">
    <contenthash>da39a3ee5e6b4b0d3255bfef95601890afd80709</contenthash>
    <contextid>1676311</contextid>
    <component>mod_resource</component>
    <filearea>content</filearea>
    <itemid>0</itemid>
    <filepath>/</filepath>
    <filename>.</filename>
    <userid>256</userid>
    <filesize>0</filesize>
    <mimetype>$@NULL@$</mimetype>
    <status>0</status>
    <timecreated>1461082918</timecreated>
    <timemodified>1461082920</timemodified>
    <source>$@NULL@$</source>
    <author>$@NULL@$</author>
    <license>$@NULL@$</license>
    <sortorder>0</sortorder>
    <repositorytype>$@NULL@$</repositorytype>
    <repositoryid>$@NULL@$</repositoryid>
    <reference>$@NULL@$</reference>
  </file>
</files>



By adding a pdf extension to 'b2d7bad08bf3ccc8d1971a992b5d73445ab1ffe8' I can open the PDF file normally. 



After blindly testing the Export/Restore of a course with multiple files, I can tell that you need to create a subfolder for each file you want to import. Importantly, the subfolder name needs to be the first two characters of the contenthash file name.

So, reversing the process described above, to import two PDFs into my template course I removed their .pdf file extensions, renamed them 'a1randomhash1' and 'a2randomhash2' and placed each one in its respective folder:

files/a1/a1randomhash1
files/a2/a2randomhash2


For this import, files.xml was changed to be as follows:

<?xml version="1.0" encoding="UTF-8"?>
<files>
  <file id="19005298">
    <contenthash>a1randomhash1</contenthash>
    <contextid>1679208</contextid>
    <component>mod_resource</component>
    <filearea>content</filearea>
    <itemid>0</itemid>
    <filepath>/</filepath>
    <filename>cp-300-400_safety-case-study.pdf</filename>
    <userid>256</userid>
    <filesize>402914</filesize>
    <mimetype>application/pdf</mimetype>
    <status>0</status>
    <timecreated>1461082918</timecreated>
    <timemodified>1461082920</timemodified>
    <source>cp-300-400_safety-case-study.pdf</source>
    <author>Jerome Di Pietro</author>
    <license>allrightsreserved</license>
    <sortorder>1</sortorder>
    <repositorytype>$@NULL@$</repositorytype>
    <repositoryid>$@NULL@$</repositoryid>
    <reference>$@NULL@$</reference>
  </file>
  <file id="19005299">
    <contenthash>da39a3ee5e6b4b0d3255bfef95601890afd80709</contenthash>
    <contextid>1679208</contextid>
    <component>mod_resource</component>
    <filearea>content</filearea>
    <itemid>0</itemid>
    <filepath>/</filepath>
    <filename>.</filename>
    <userid>256</userid>
    <filesize>0</filesize>
    <mimetype>$@NULL@$</mimetype>
    <status>0</status>
    <timecreated>1461324990</timecreated>
    <timemodified>1461324990</timemodified>
    <source>$@NULL@$</source>
    <author>$@NULL@$</author>
    <license>$@NULL@$</license>
    <sortorder>0</sortorder>
    <repositorytype>$@NULL@$</repositorytype>
    <repositoryid>$@NULL@$</repositoryid>
    <reference>$@NULL@$</reference>
  </file>
  <file id="19005391">
    <contenthash>a2randomhash2</contenthash>
    <contextid>1679217</contextid>
    <component>mod_resource</component>
    <filearea>content</filearea>
    <itemid>0</itemid>
    <filepath>/</filepath>
    <filename>Other dummy PDF file.pdf</filename>
    <userid>256</userid>
    <filesize>187262</filesize>
    <mimetype>application/pdf</mimetype>
    <status>0</status>
    <timecreated>1461325379</timecreated>
    <timemodified>1461325380</timemodified>
    <source>Other dummy PDF file.pdf</source>
    <author>Jerome Di Pietro</author>
    <license>allrightsreserved</license>
    <sortorder>1</sortorder>
    <repositorytype>$@NULL@$</repositorytype>
    <repositoryid>$@NULL@$</repositoryid>
    <reference>$@NULL@$</reference>
  </file>
  <file id="19005392">
    <contenthash>da39a3ee5e6b4b0d3255bfef95601890afd80709</contenthash>
    <contextid>1679217</contextid>
    <component>mod_resource</component>
    <filearea>content</filearea>
    <itemid>0</itemid>
    <filepath>/</filepath>
    <filename>.</filename>
    <userid>256</userid>
    <filesize>0</filesize>
    <mimetype>$@NULL@$</mimetype>
    <status>0</status>
    <timecreated>1461325379</timecreated>
    <timemodified>1461325380</timemodified>
    <source>$@NULL@$</source>
    <author>$@NULL@$</author>
    <license>$@NULL@$</license>
    <sortorder>0</sortorder>
    <repositorytype>$@NULL@$</repositorytype>
    <repositoryid>$@NULL@$</repositoryid>
    <reference>$@NULL@$</reference>
  </file>
</files>


What I don't understand is why there is a second 'file' node in files.xml for each file resource.


Each second node shares its contextid with the node before it (1679208 and 1679217 respectively) but seems to describe an empty file. There are only 2 file resources in this test course, not 4, so each pair is obviously related somehow, but I can't figure out why a single file needs two nodes in the XML.

<file id="19005298">
    <contenthash>a1randomhash1</contenthash>
    <contextid>1679208</contextid>
	...
<file id="19005299">
    <contenthash>da39a3ee5e6b4b0d3255bfef95601890afd80709</contenthash>
    <contextid>1679208</contextid>
	...

<file id="19005391">
    <contenthash>a2randomhash2</contenthash>
    <contextid>1679217</contextid>
	...
<file id="19005392">
    <contenthash>da39a3ee5e6b4b0d3255bfef95601890afd80709</contenthash>
    <contextid>1679217</contextid>


Can anyone help, offer any advice, or point me to any tools for automating the process of importing courses (and course files) into Moodle?


Any help much appreciated!

In reply to Jerome Di Pietro

Re: Manually creating an mbz archive for import into Moodle - files.xml structure

by Eric Merrill -

The second entry in each pair is an entry for the "folder" the file is in, "/". You can explore this further by making a folder in the resource, placing the file in that folder, and then backing it up.

In reply to Jerome Di Pietro

Re: Manually creating an mbz archive for import into Moodle - files.xml structure

by Davo Smith -

Firstly, the hashes are not at all random, they are a hash of the content of the file in question (so that identical files uploaded multiple times are stored only once on the server, not multiple times).

Secondly, the extra file in each case is the directory in which the file is stored. Every file area in Moodle has its own directory structure. In 99% of cases (not scientifically analysed) there are no subdirectories and so each file area just has one directory '/' and one file in it.

You should carefully read through:

https://docs.moodle.org/dev/File_API_internals

before attempting to manually edit the file part of a backup in this way (and even then, I'd tread very carefully - if the hash is wrong, it could lead to all sorts of confusion within the Moodle file handling).
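
As a rough illustration of the pairing described above (one entry for the file itself, one for its '/' directory), here is a minimal Python sketch that builds both `<file>` nodes. The helper names (`file_fields`, `add_file_node`, `add_resource`) are made up for this example, and the fixed values (`mod_resource`, `content`, `allrightsreserved`, the `$@NULL@$` marker) are simply copied from the export posted at the top of this thread. It only covers files.xml and has not been tested against a real restore.

```python
import xml.etree.ElementTree as ET

NULL = "$@NULL@$"  # empty-value marker as it appears in the posted export
EMPTY_SHA1 = "da39a3ee5e6b4b0d3255bfef95601890afd80709"  # contenthash of the "." entries


def file_fields(contenthash, contextid, filename, userid, filesize, mimetype,
                timestamp, source, author, license_name, sortorder):
    """Child elements of one <file> node, in the order seen in the export."""
    return {
        "contenthash": contenthash, "contextid": contextid,
        "component": "mod_resource", "filearea": "content",
        "itemid": 0, "filepath": "/", "filename": filename,
        "userid": userid, "filesize": filesize, "mimetype": mimetype,
        "status": 0, "timecreated": timestamp, "timemodified": timestamp,
        "source": source, "author": author, "license": license_name,
        "sortorder": sortorder, "repositorytype": NULL,
        "repositoryid": NULL, "reference": NULL,
    }


def add_file_node(parent, file_id, fields):
    """Append one <file id="..."> node with a child element per field."""
    node = ET.SubElement(parent, "file", id=str(file_id))
    for tag, value in fields.items():
        ET.SubElement(node, tag).text = str(value)
    return node


def add_resource(parent, first_id, contextid, contenthash, filename,
                 filesize, mimetype, userid, author, timestamp):
    """Append the real file entry, then the entry for its '/' directory."""
    add_file_node(parent, first_id, file_fields(
        contenthash, contextid, filename, userid, filesize, mimetype,
        timestamp, filename, author, "allrightsreserved", 1))
    add_file_node(parent, first_id + 1, file_fields(
        EMPTY_SHA1, contextid, ".", userid, 0, NULL,
        timestamp, NULL, NULL, NULL, 0))


files = ET.Element("files")
add_resource(files, 18973878, 1676311,
             "b2d7bad08bf3ccc8d1971a992b5d73445ab1ffe8",
             "cp-300-400_safety-case-study.pdf", 402914,
             "application/pdf", 256, "Jerome Di Pietro", 1461082918)
print(ET.tostring(files, encoding="unicode"))
```

Using consecutive ids for the two nodes just mirrors the posted export; whether Moodle requires that is not something this sketch establishes.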

In reply to Davo Smith

Re: Manually creating an mbz archive for import into Moodle - files.xml structure

by Jerome Di Pietro -

I see I have some bedtime reading! Many thanks for the cautionary words, re. manually changing the contenthash values.

Can I just check with you: is "getting the SHA1 hash of a file's content" just the output of this PHP function?
http://php.net/manual/en/function.sha1-file.php


Many thanks.

In reply to Jerome Di Pietro

Re: Manually creating an mbz archive for import into Moodle - files.xml structure

by Paul Nicholls -

Hi Jerome,

Yes, the sha1_file() function is how Moodle gets the file's content hash.  For reference, the content hash for the "folder" entries (filename is ".") can be obtained by passing an empty string to sha1() - as that value doesn't change, you can safely calculate it once and store the result for reuse in subsequent folders.
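
For anyone scripting this outside PHP, the same two values can be reproduced with a short Python sketch. This assumes a plain SHA-1 over the file's raw bytes, as described above; the Python function is named `sha1_file` here only to mirror the PHP one.

```python
import hashlib

# SHA-1 of the empty string: the contenthash seen on the "." folder entries.
EMPTY_SHA1 = hashlib.sha1(b"").hexdigest()


def sha1_file(path, chunk_size=65536):
    """Hex SHA-1 of a file's raw bytes, read in chunks to keep memory use low."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


print(EMPTY_SHA1)  # da39a3ee5e6b4b0d3255bfef95601890afd80709
```

Running `sha1_file()` on the extracted file from the original post should give back its name, b2d7bad08bf3ccc8d1971a992b5d73445ab1ffe8, if the file is intact.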


-Paul

In reply to Paul Nicholls

Re: Manually creating an mbz archive for import into Moodle - files.xml structure

by Jerome Di Pietro -

Much appreciated. Thanks for sharing your expertise.