How to remove unused files from the moodledata directory?

by Zoran Jančić -
Number of replies: 9

We recently reset all courses to reduce disk usage. The reset process should remove the student files uploaded through the assignment activities (that's what takes up most of our disk space). When we make a backup of a course now, it is much smaller than before (for example, 300 MB instead of 6 GB). Disk usage, on the other hand, is the same as before. Most of the disk usage is in the moodledata/filedir directory. How can I delete the files that are no longer used? The system is Debian Linux 7.1 with Moodle 3.1.14 (Build: 20180910), PHP 5.6.38, MySQL 5.5.33.

It looks like the reset process didn't delete the files that it should have. I already deleted the cache, but without any major change in disk usage. I also considered the File Trash plugin, but it supports Moodle only up to version 2.9 and is not suitable for 3.1 and newer versions.


tnx,
Z.

In reply to Zoran Jančić

Re: How to remove unused files from the moodledata directory?

by Howard Miller -

The delete process is run in the background by a scheduled task. You need to make sure your cron is running and you need to wait. Depending on the file type it can take several days. It was a bit of a weird design decision but there you go. 

If you are desperate, you can short-circuit the process using Moosh (https://moosh-online.com/)

Average of ratings: Useful (1)
In reply to Howard Miller

Re: How to remove unused files from the moodledata directory?

by Zoran Jančić -

I executed the cron script manually. It ran without errors, but disk usage is still the same. The following scheduled tasks, which may be relevant, were executed via cron:

\core\task\cache_cleanup_task

\core\task\cache_cron_task

\core\task\context_cleanup_task

\core\task\file_temp_cleanup_task

\core\task\file_trash_cleanup_task

\tool_recyclebin\task\cleanup_course_bin

\tool_recyclebin\task\cleanup_category_bin


I installed Moosh and went through the commands.
moosh file-delete --flush didn't help.

I'm currently running moosh file-datacheck. It hasn't shown anything except a lot of dots for the last 30 minutes, but it is still running.

I also tried what is suggested here: https://moodle.org/mod/forum/discuss.php?d=204515 but that didn't help at all, although cron again executed without errors.

In reply to Zoran Jančić

Re: How to remove unused files from the moodledata directory?

by Howard Miller -

Again, the scheduled task only deletes files that have been tagged for deletion for more than a certain amount of time. It's on the order of days.

I'm surprised that the moosh command didn't work.

Are you *sure* that these are not duplicate files in use somewhere else? If you have made a copy of a course and then... time passes and you forget it's a copy... delete it. You will see no reduction in disk space used. That's because there was no increase when the course was copied. Identical files are stored only once.
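
To make the "stored only once" point concrete: Moodle names each stored file after the SHA-1 hash of its contents and places it under filedir in two levels of subdirectories taken from the first four characters of that hash. Below is a minimal sketch of that mapping. The function name filedir_path_for and the /path/to/moodledata placeholder are illustrative assumptions, not part of Moodle.

```shell
#!/bin/bash
# filedir_path_for FILE FILEDIR
# Hypothetical helper: prints where content identical to FILE would be
# pooled, using the layout FILEDIR/<chars 1-2>/<chars 3-4>/<sha1 of content>.
filedir_path_for() {
    hash=$(sha1sum "$1" | cut -d' ' -f1)
    printf '%s/%s/%s/%s\n' "$2" \
        "$(printf '%s' "$hash" | cut -c1-2)" \
        "$(printf '%s' "$hash" | cut -c3-4)" \
        "$hash"
}

# Example call (placeholder paths):
# filedir_path_for upload.mp4 /path/to/moodledata/filedir
```

Two files with identical content produce the same path, which is why copying a course does not grow filedir and deleting only one of the copies does not shrink it.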

Average of ratings: Useful (1)
In reply to Howard Miller

Re: How to remove unused files from the moodledata directory?

by Zoran Jančić -

OK, here's the thing… We have 188 courses with lots of assignment activities where students have been uploading their multimedia-based assignments for 6 years. Over the years, that's a lot of multimedia uploaded by students, so the moodledata directory grew to 172 GB. We reset all courses a week ago. We picked a few courses to check the backup size before and after the reset, and noticed that each of them had a backup about 10 times bigger before the reset. After we reset all courses, we made a backup of all of them. The total size of all the course backup files is 15 GB, but the moodledata directory is still 172 GB. Today, a full week after we reset all courses, it is still 172 GB. I ran cron.php manually today, just to be sure that cron executes, and it did, but the moodledata directory is still 172 GB. I'm running out of ideas. Is there any particular moosh command that could help identify files in the moodledata directory that are not in use anymore (not referenced anywhere in the database)? I couldn't find any. There are commands for different things but not for that.

I appreciate any help.

Regards,
Zoran

In reply to Zoran Jančić

Re: How to remove unused files from the moodledata directory?

by Ken Task -

First, I do admin a site that has multimedia courses (team taught ... students in and out of the course, etc.)

Those courses get very large and it's very important to reset properly.   When running a reset, all the option sections should be 'turned down' (i.e., expanded so they are displayed).   Assignments is one of those sections: if it is NOT turned down so that one can see the actions to be taken on reset, the defaults are not what one would want to 'clean up' a course.

What's in moodledata/trashdir/?

du -h /path/to/moodledata/trashdir

Moosh ... hmmmm ... must have missed it ... there are two that might help ...

file-dbcheck

Check that all files recorder in the DB do exist in Moodle data directory.


moosh file-dbcheck

file-list

Search and list files from mdl_files table. The argument should be a valid SQL WHERE statement. Interesting columns of possible search criterias are: contextid, component, filearea, itemid, filepath, filename, userid, filesize, mimetype, status, timecreated, timemodified.

You can also use some special values:

    course=NNN to list all files that relate to a course


See the moosh commands page for more info on file-list command.
https://moosh-online.com/commands/

'spirit of sharing', Ken


Average of ratings: Useful (1)
In reply to Ken Task

Re: How to remove unused files from the moodledata directory?

by Zoran Jančić -

You helped me a lot! Thank you very much! 

The file-dbcheck command description is misleading though. It says "Check that all files recorder in the DB do exist in Moodle data directory". It actually works the other way around: it checks that all files in the Moodle data directory exist in the DB records. ...or it's just my bad English :)

So, I found a lot of files. The question is: how reliable is Moosh? Is it perfectly safe to delete from the disk all the files that Moosh claims do not exist in the Moodle DB?

In reply to Zoran Jančić

Re: How to remove unused files from the moodledata directory?

by Ken Task -

Good question!!!  But then again, how can one trust anything?

Soooo ... make a full backup (tar ball) of moodledata/filedir/ off into some partition where you have room.

Along with that, export a db query of mdl_files to a text file and save that to the same location as your backup.

Below all on one line:

mysql -u [SUPERUSER] -p'[PASSWORD]' -e "use [DBNAME];select contenthash,filename,filearea,filesize from mdl_files;" > filesinmdfiledir.txt; cat filesinmdfiledir.txt

That will create a filesinmdfiledir.txt file wherever you execute the command.

Do the same query with only contenthash and output to contenthash.txt file.
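
One possible way to produce that file, as a sketch only: ask mysql for the contenthash column in batch mode without the header line, then sort the result. The function name dump_contenthashes is made up for illustration, [SUPERUSER] and [DBNAME] are the same placeholders as in the command above, and the mdl_ table prefix is assumed. DISTINCT and sort -u drop repeated hashes, since several mdl_files rows can share one contenthash.

```shell
#!/bin/bash
# dump_contenthashes DBUSER DBNAME
# Hypothetical helper: prints every distinct contenthash, one per line.
# -B gives plain batch output, -N suppresses the column-name header.
# -p without a value makes mysql prompt for the password.
dump_contenthashes() {
    mysql -u "$1" -p -B -N \
        -e "SELECT DISTINCT contenthash FROM mdl_files" "$2" | sort -u
}

# Example call (placeholders as above):
# dump_contenthashes [SUPERUSER] [DBNAME] > contenthash.txt
```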

contenthash.txt file will look like:

fff95bc3223987c470d8b7136c80a81b2e52d598
fff9bbe073fb17a518a3f6655ae183c4a714db5e
fff9cf91a21b1245339f6796f9311643aab6b8a3
fffa63b4213c9d3bc41b062462ca539bd5c5f7f4
fffd0e4b6b8ff3c5b695c5a0247b886d6df68d9a
ffff33f1741c7642b97874bbfcfa57a0c5aa51c3
ffff49b90e578acd220d82274209d44eac5eb23c
ffffc0f1973d4b72070be30da3377382f69c8887
ffffc0f1973d4b72070be30da3377382f69c8887

You then could loop the contents of contenthash.txt file through find.

#!/bin/bash
# Look up every contenthash from the DB export in filedir.
for i in $(cat /path/to/contenthash.txt)
do
    echo "Contenthash in queue: $i"
    # Prints the full path if the file exists, or a "No such file" error if not.
    find /path/to/moodledata/filedir/*/*/"$i"
done
echo 'Done!'

Then look through the output for errors ... i.e. contenthashes for which find didn't locate a file.

Above queries/scripts offered as is ... feel free to edit! :)

'spirit of sharing', Ken


Average of ratings: Useful (2)
In reply to Ken Task

Re: How to remove unused files from the moodledata directory?

by Ken Task -

Should have added that the scripts/queries above might peg your CPU to 9x% (depending) ... they are working on something very massive.   So don't run them during prime usage time for your Moodle ... after hours ONLY.

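Also, one might want to direct these lines to a text file (a separate one, so that the contenthash.txt list being read is not appended to; 2>&1 also captures find's "not found" errors):

    echo "Contenthash in queue: $i" >> findresults.txt;
    find /path/to/moodledata/filedir/*/*/"$i" >> findresults.txt 2>&1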

Text files created will be very large.

Am certain a better bash shell programmer could make what has been offered better/more efficient, etc. :)
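
As one possible take on "more efficient", and aimed at the original question (files on disk that the database no longer references), here is a sketch that compares the two sorted lists once instead of running find per hash. It assumes GNU find, a contenthash.txt with one hash per line, and the usual filedir layout of two directory levels above each file. list_orphans is a hypothetical name, and the function only lists candidates; it deletes nothing.

```shell
#!/bin/bash
# list_orphans FILEDIR HASHLIST
# Hypothetical helper: prints the names of files under FILEDIR whose name
# (the content hash) does not appear in HASHLIST.
list_orphans() {
    on_disk=$(mktemp)
    in_db=$(mktemp)
    # Pooled files sit exactly three levels below FILEDIR (xx/yy/<hash>).
    find "$1" -mindepth 3 -maxdepth 3 -type f | sed 's|.*/||' \
        | LC_ALL=C sort -u > "$on_disk"
    LC_ALL=C sort -u "$2" > "$in_db"
    # comm -23 keeps lines found only in the first list: on disk, not in DB.
    LC_ALL=C comm -23 "$on_disk" "$in_db"
    rm -f "$on_disk" "$in_db"
}

# Example call (placeholder paths):
# list_orphans /path/to/moodledata/filedir /path/to/contenthash.txt > orphans.txt
```

Anything it prints is only a candidate. The hash list is a snapshot, so files uploaded after the query was run would also show up; check the result against a fresh query and keep the tar ball backup before removing anything.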

'spirit of sharing', Ken


Average of ratings: Useful (2)