General developer forum

more efficient use of disc space - VM

 
Gary Prosser
 

Context: I work with both dedicated physical servers running 10+ Moodle sites each and with clients who want a Moodle install on a VM. In the former case, when doing upgrades (where copying the moodledata folder is advised), the moodledata copy alone can take 40+ minutes. In the latter case, disc space is a significant cost factor.

Example case: one install has a moodledata folder of 21G. Assuming my SQL is correct, 8.4G of that is duplicated files (a test based on mdl_files.contenthash), perhaps arising from copied courses or from the same resource being used in multiple courses.
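The exact query is not given in the thread, so the following is only a guess at the kind of check described: a minimal sketch that sums filesize once per mdl_files record and once per distinct contenthash, then takes the difference. The table here is a toy in-memory stand-in with made-up rows, not a real Moodle database. Note that the result only counts bytes referenced by more than one record; whether those bytes are actually duplicated on disc depends on how the file store lays files out.

```python
import sqlite3

# Toy stand-in for mdl_files with only the columns this check needs.
# The rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mdl_files ("
    "id INTEGER PRIMARY KEY, contenthash TEXT, filename TEXT, filesize INTEGER)"
)
rows = [
    ("aaa111", "intro.pdf", 500),
    ("aaa111", "intro.pdf", 500),  # same content referenced from a second course
    ("bbb222", "quiz.xml", 200),
]
conn.executemany(
    "INSERT INTO mdl_files (contenthash, filename, filesize) VALUES (?, ?, ?)",
    rows,
)

# Bytes counted once per record (every row contributes its filesize).
total_by_record = conn.execute(
    "SELECT SUM(filesize) FROM mdl_files"
).fetchone()[0]

# Bytes counted once per distinct contenthash.
total_by_content = conn.execute(
    "SELECT SUM(filesize) FROM ("
    "  SELECT contenthash, MAX(filesize) AS filesize"
    "  FROM mdl_files GROUP BY contenthash"
    ")"
).fetchone()[0]

# Bytes referenced by more than one record.
shared_bytes = total_by_record - total_by_content
print(total_by_record, total_by_content, shared_bytes)
```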

Could the resource file upload process check for duplication (via contenthash) and, in mdl_files, use the same pathnamehash for the new record?

And on deletion of a course or resource, could the process check whether another file record uses the same contenthash and, if so, leave the file in place while still removing the mdl_files record?
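The behaviour asked about here can be sketched as a small content-addressed store: each upload adds a record, the stored blob is keyed by a hash of its content, and deleting a record removes the blob only when no other record still points at it. This is a toy in-memory model under stated assumptions, not Moodle code; `ToyFileStore` and its methods are hypothetical names used only for illustration.

```python
import hashlib


class ToyFileStore:
    """Hypothetical in-memory model; not part of Moodle."""

    def __init__(self):
        self.blobs = {}    # contenthash -> bytes (stands in for files on disc)
        self.records = {}  # record id -> contenthash (stands in for mdl_files rows)
        self._next_id = 1

    def add(self, data: bytes) -> int:
        contenthash = hashlib.sha1(data).hexdigest()
        # Store the bytes only if this content has not been seen before.
        self.blobs.setdefault(contenthash, data)
        record_id = self._next_id
        self._next_id += 1
        self.records[record_id] = contenthash
        return record_id

    def delete(self, record_id: int) -> None:
        # Always remove the record itself.
        contenthash = self.records.pop(record_id)
        # Remove the blob only if no remaining record references it.
        if contenthash not in self.records.values():
            del self.blobs[contenthash]


store = ToyFileStore()
first = store.add(b"lecture notes")
second = store.add(b"lecture notes")  # same content, second record
blobs_after_upload = len(store.blobs)

store.delete(first)
blobs_after_first_delete = len(store.blobs)

store.delete(second)
blobs_after_second_delete = len(store.blobs)
```

In this model two uploads of the same content leave one blob, and the blob survives until the last record that references it is deleted.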


 
Rex Lorenzo
Re: more efficient use of disc space - VM

Moodle stores only one copy of a file, even if it is uploaded more than once in one or several courses.

See https://docs.moodle.org/dev/File_API_internals#File_storage_on_disk
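As a rough illustration of why identical uploads end up as a single file on disc, here is a minimal sketch that assumes the layout described on that page: the contenthash is a SHA-1 of the file content, and the file lives under `filedir` in two levels of subdirectories named after the first characters of the hash. The function names and the `/var/moodledata` path are made up for the example.

```python
import hashlib
from pathlib import PurePosixPath


def content_hash(data: bytes) -> str:
    # Assumption: contenthash is the SHA-1 hex digest of the file content.
    return hashlib.sha1(data).hexdigest()


def filedir_path(dataroot: str, contenthash: str) -> PurePosixPath:
    # Assumed layout: <dataroot>/filedir/<chars 1-2>/<chars 3-4>/<full hash>
    return (
        PurePosixPath(dataroot)
        / "filedir"
        / contenthash[0:2]
        / contenthash[2:4]
        / contenthash
    )


payload = b"same lecture notes"

# The same bytes uploaded to two different courses map to the same path,
# so there is only one place on disc for them to be stored.
path_course_a = filedir_path("/var/moodledata", content_hash(payload))
path_course_b = filedir_path("/var/moodledata", content_hash(payload))
print(path_course_a == path_course_b)
```

Because the path depends only on the content, a second upload of the same bytes has nowhere new to go.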

 
Visvanath Ratnaweera
Re: more efficient use of disc space - VM
Hi Gary

You said:
> one install has a moodledata folder of 21G - assuming my sql is correct 8.4G is duplicated files (this test based on mdl_files.contenthash)

Any documentation of your method?

> perhaps arising from copied courses or same resource in multiple courses.

As Rex already pointed out, the repository is made to keep only a single copy of a file across the whole site. That said, there have been cases of Moodle wasting disk space. Before diving into that, we need to be certain that your Moodle is not behaving the way it is supposed to.
 
Gary Prosser
Re: more efficient use of disc space - VM
 

Thanks all for the information and explanations.

I had not understood that the location of a file in the moodledata folder is determined by the contenthash. That misunderstanding led to my incorrect claim of duplicated files.


 
Davo
Re: more efficient use of disc space - VM

And just to add to what others have already stated: contenthash is used to determine where the file is stored within moodledata, whereas pathnamehash is a hash of the identifiers in mdl_files and is used to uniquely identify that instance of the file in the Moodle code. The id field is avoided because it would change if the file was deleted and recreated, whereas the pathnamehash would be unchanged if an updated version of a file was uploaded to the same location in Moodle with the same filename.
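The difference between the two hashes can be sketched as follows. The exact string that Moodle hashes for pathnamehash is an assumption here (a path built from contextid, component, filearea, itemid, filepath and filename), and the function names and sample values are made up for illustration.

```python
import hashlib


def content_hash(data: bytes) -> str:
    # Depends only on the bytes of the file.
    return hashlib.sha1(data).hexdigest()


def pathname_hash(contextid, component, filearea, itemid, filepath, filename):
    # Assumption: the identifying fields are joined into one path-like string.
    # Depends only on where the file sits, never on its content.
    full = f"/{contextid}/{component}/{filearea}/{itemid}{filepath}{filename}"
    return hashlib.sha1(full.encode("utf-8")).hexdigest()


# An updated version uploaded to the same place with the same filename:
# the content changes, the identity does not.
location = (42, "mod_resource", "content", 0, "/", "notes.pdf")
old_version = (pathname_hash(*location), content_hash(b"draft"))
new_version = (pathname_hash(*location), content_hash(b"final"))

# The same content used in another context (for example a copied course):
# the identity changes, the content hash does not.
other_location = (99, "mod_resource", "content", 0, "/", "notes.pdf")
copied = (pathname_hash(*other_location), content_hash(b"final"))

print(old_version[0] == new_version[0], old_version[1] == new_version[1])
print(new_version[0] == copied[0], new_version[1] == copied[1])
```

So records that share a contenthash but have different pathnamehash values are separate uses of one stored file, not separate copies of it.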

 