Strange NFS problem in rare case of deleting/creating same file

Strange NFS problem in rare case of deleting/creating same file

by sam marshall -
Number of replies: 6
Picture of Core developers Picture of Peer reviewers Picture of Plugin developers

I think there is probably a super-rare issue with the Moodle filesystem when using NFS, but am unable to satisfactorily reproduce it - that is, I can reproduce it every time with a specific piece of (custom, local) code but can't reproduce it with a simplified test. So I'm giving up on the issue and doing a workaround but thought I would post here in case anyone else has encountered it.

The issue is that an AJAX script of ours sometimes gets the following errors:

Warning: filesize(): stat failed for .../moodledata/filedir/93/bf/93bf66a8d4ca85ce4ab4d357169e8a24871415a7 in .../lib/filestorage/file_storage.php on line 1763
Warning: sha1_file(.../moodledata/filedir/93/bf/93bf66a8d4ca85ce4ab4d357169e8a24871415a7): failed to open stream: No such file or directory in .../lib/filestorage/file_storage.php on line 1766

Here are the lines of code:

        if (file_exists($hashfile)) {
            if (filesize($hashfile) === $filesize) { // 1763
                return array($contenthash, $filesize, false);
            }
            if (sha1_file($hashfile) === $contenthash) { // 1766

Looking at this, you can see that file_exists is returning true, but then the following errors indicate the file does not actually exist.

The error only occurs on a system with two webservers using NFS for a shared moodledata. There is no need to make simultaneous requests to trigger the error - you can trigger it by making one request, waiting until it completes, then making another a second later. (The error doesn't always happen, but it reliably happens after <10 iterations, usually only three or four.)

So the error is consistent with the supposition (I found this information online) that NFS by default caches information about file existence for 3 seconds, but does not cache if you actually open the file, so therefore the file_exists can return true but actually opening it for sha_file will fail, if the file was deleted by a different server.

A certain way of using our script [if you change something then change it back] causes it to modify the contents of a file between two values, i.e. the file contents is A, then using the script it gets set to B, then using it again gets set back to the previous contents A. This will have the effect of repeatedly deleting a file (A) and then creating it again within a short time period.

All quite straightforward so far, right? ;) I was going to file it in the Moodle Tracker and submit a patch...

...But the story now comes to a crashing halt right here, because basically I've been trying to make a short test script that does the exact same thing as my real AJAX script, and calling it in the same way (using a javascript test harness and an iframe) on the same 2-server system where the real one fails, but I cannot for the life of me get that one to fail.

So I think my speculation about the cause of these errors is probably wrong and something else is involved. I've run out of time, but if anyone else has any suggestions, do let me know. smile

--sam

Average of ratings:Useful (1)
In reply to sam marshall

Re: Strange NFS problem in rare case of deleting/creating same file

by sam marshall -
Picture of Core developers Picture of Peer reviewers Picture of Plugin developers

Just to add one other data point - I've changed my automated reproduce JavaScript and can now confirm that:

- the error DOES occur if the delay (between the script which deletes the file and the one that then recreates it) is 14.1 seconds.

- the error DOES NOT occut if the delay is 15.1 seconds.

So probably there's a 15 second timeout affecting this. (I don't know what the settings are on our NFS servers.)

--sam


In reply to sam marshall

Re: Strange NFS problem in rare case of deleting/creating same file

by Sergio Rabellino -
Picture of Particularly helpful Moodlers Picture of Plugin developers

You should try with actimeo=1 as NFS mount option.

You can find an interesting explanation here: https://www.sebastien-han.fr/blog/2012/12/18/noac-performance-impact-on-web-applications/

hope this helps.

In reply to Sergio Rabellino

Re: Strange NFS problem in rare case of deleting/creating same file

by sam marshall -
Picture of Core developers Picture of Peer reviewers Picture of Plugin developers

Thanks. I actually already found that article - it's an interesting option and probably would solve the problem, but it does seem to have a performance cost, which seems unfortunate for such a rare case!

The thing really confusing me is how I wasn't able to reproduce the problem with a simple test case...

--sam

In reply to sam marshall

Re: Strange NFS problem in rare case of deleting/creating same file

by Sergio Rabellino -
Picture of Particularly helpful Moodlers Picture of Plugin developers

Are you sticking your client to a chosen web server ? This could help because if you bounce between the two servers for every request, i believe it's sure that you will encounter inconsistencies in a NFS environment.

Then you have two choices, abandon NFS and acquire a parallel Filesystem (GFS or anything else) which does not suffer of latency in file management (but they do locking on synchronization resources, so you'll have some computational costs if compared to NFS) or stick your client sessions to a single web server, balancing the clients between the two  web servers.

The actimeo=1 (or a bigger value) can help, but theoretically do not solve the latency problem in the NFS.


hope this helps.

In reply to sam marshall

Re: Strange NFS problem in rare case of deleting/creating same file

by sam marshall -
Picture of Core developers Picture of Peer reviewers Picture of Plugin developers

And for the record, here is the workaround code I ended up using (this is in my local code, in the point just before we are going to try to save the new file [which might be identical to a previous one that was recently deleted by a separate request]):

        // WORKAROUND (TFS 137036): On our multi-server systems there appears to be some kind of
        // NFS caching issue when it tries to save the new file, where it caches existence but not
        // whether the file is open or not. (This is probably a general Moodle filesystem issue
        // but I was unable to reproduce it except in this code.) To work around, let's try to
        // open the file, as this causes the cache to be updated.
        $contenthash = sha1($newcontent);
        $filepath = $CFG->dataroot . '/filedir/' . substr($contenthash, 0, 2) . '/' .
                substr($contenthash, 2, 2) . '/' . $contenthash;
        if ($fp = @fopen($filepath, 'rb')) {
            fclose($fp);
        }

This solves the problem, presumably because after attempting the fopen, the file_exists call later on now returns false.

--sam

Average of ratings:Useful (2)
In reply to sam marshall

Re: Strange NFS problem in rare case of deleting/creating same file

by Albert Ramsbottom -

Sounds to me like you are not load balancing with session affinity


What LB are you using

Albert