Allowing modules to handle file requests

Allowing modules to handle file requests

by sam marshall -
Number of replies: 15
Picture of Core developers Picture of Peer reviewers Picture of Plugin developers
Hi,

This is my first post so let me just say hi. I'm a developer for the Open University, which will shortly be using Moodle. Currently, we are working to build in features so that Moodle can take over course websites from our previous system. In the process we're having to make various changes, some of which affect Moodle core. Where these are generally beneficial, we'd like to contribute them to the community if desired. This post is about one such thing.

Basically the way file download HTTP requests work (as of the version I have) is a bit limited. File requests go through moodle/files.php which then applies security checks and sends the file. This means that if a module needs to add or reduce security checks for a specific class of file, it has to edit files.php. For example, I have a module that, once you've uploaded files to one course, it's possible to share them to other (specific) courses without duplicating the physical file. But this doesn't work without modifying files.php because if a user on course 4 requests the file /3/moddata/mymodule/whatever.dat they won't be allowed. (This is just an example, I also want to restrict people from downloading it in specific cases.)

Rather than modifying files.php every time there's a situation like this, or every time I change the module's logic for whether you are allowed to download the file or not, I decided to modify it once smile My solution does the following:
  • For file requests of pattern /courseid/!via/modulename/..., ask module which file to actually send. (The exclamation character was added because it's legal in URLs and unlikely to be used at the beginning of a genuine existing foldername. I commonly use it to indicate a 'system' magic name.)
  • For file requests of pattern /courseid/moddata/modulename/... (existing standard module data folder), check whether module uses this new mechanism. If it does, deny the request (otherwise handle as before).
  • All other requests handled as before.
So basically, if your module implements the right function modulename_process_file_request, then requests for anything within its moddata are automatically rejected. Instead, all requests must go through the !via URL. (If it doesn't implement that function then everything continues as at present.)

modulename_process_file_request takes as parameters the course ID and the rest of the path components (listed as ... above). It returns a path to the real file (most likely to something in that moddata folder, but could be anywhere) or FALSE if the given arguments don't match a file that the user is allowed to have.

Does that sound sensible and useful? (It works for me smile

If so, I will port the changes (a few lines to files.php and two new functions to filelib.php) into the current moodle HEAD code [we are working on an older version due to various stupid code infrastructure problems], and work up an appropriate patch.

(BTW I also made some other minor changes to filelib.php, basically just allowing reverse lookup of mimeinfo via mime type as well as extension. Those aren't really a big deal though.)
Average of ratings: -
In reply to sam marshall

Re: Allowing modules to handle file requests

by Matt Oquist -
Hi Sam,

I ran into exactly this problem last night.  I'm working on a new portfolios module, and I'm storing the portfolio artifacts in a file hierarchy under $CFG->dataroot.  I thought to myself "...and now I'll just use files.php to...oh, crap."

So since last night I've been hacking at the file_manager module to add some structure and APIs to the code; I've uploaded my first-crack changes to that code in a MyFiles discussion thread if you're at all interested in it.  As you probably know, the MyFiles block allows you to upload and share files among users.  As I mentioned in my latest post in the above thread, there's no reason sharing can't be accomplished on a course level, as well.  (Oh -- looking at that thread, I didn't end up mentioning this, though I meant to.)

I've also been informed that Martin Dougiamas himself is redesigning access-controlly sorts of things, so presumeably there will be a totally capable and cool authentication/access-control/sharing framework coming up real soon.

So I'm personally interested in what you're doing with files.php, though I'm already hoping to use MyFiles to solve my specific portfolio-related problems.  (I really need the fine-grained access control.)  If the file_manager block provided generic APIs so that any other module can create its own file hierarchy under dataroot and provide download URLs with appropriate access controls, wouldn't that meet your needs, as well?
In reply to Matt Oquist

Re: Allowing modules to handle file requests

by sam marshall -
Picture of Core developers Picture of Peer reviewers Picture of Plugin developers
I hadn't looked at it, but I read the thread. Yes, it probably would achieve the desired results, but using separate download code means you may not benefit from other improvements to file.php; it also creates multiple points of failure from a security perspective. I presume the security issue is why there's a single file.php in the first place.

I'd initially been planning to just implement download code within my module, thus sidetracking file.php in a similar way. Then I thought, hang on, why not make file.php not suck? smile I think this way of doing it fits well with the existing module interface, such as it is. Actually, it could also be used to remove most of the existing module-specific hacks in file.php (e.g. for assessment module) at a later date, although I personally certainly don't plan to change anything there.

I think we (OU development team) may get Moodle cvs access... if we do, and Martin Dougiamas (or someone) doesn't object, I'll put this code in there. He might have another plan for achieving the same thing though in which case, well, we'll just keep this code in our local tree. We already have quite a lot of core changes [i.e. not in a module or other separated code area] in our local tree, though, and we've only been developing on it for a month or so! It would be nice to reduce the difference.
In reply to sam marshall

Re: Allowing modules to handle file requests

by Petr Skoda -
Picture of Core developers Picture of Documentation writers Picture of Peer reviewers Picture of Plugin developers
Hello Sam!

I did all the security hacks and most other improvemens in file.php during the last year. The present code is not optimal at all, but I have many ideas to improve it. The only problem is that I do not have enough free time to implement it all by myself.

I have described it a bit here - seems we are trying to achieve the same goals smile

Anyway I am staying home these days, tired of being ill (some sort of flu) - I guess I could gather some more info about this problem and propose a solution...

skodak
In reply to Petr Skoda

Re: Allowing modules to handle file requests

by Petr Skoda -
Picture of Core developers Picture of Documentation writers Picture of Peer reviewers Picture of Plugin developers
Here is my first attempt:
  • module files are excpected to be in "dataroot/modfiles" - separation of mod files and course files is needed very badly; we would need to migrate old data and rewrite backups
  • access control is done from mod/xxxx/lib.php function xxxx_validate_modfile(); please note that the file location is not harcoded in URL, modules can change path in $options object smile
  • future DMS should handle only course files (served by file.php)
TO DO:
  1. add xxxx_validate_modfile()
  2. migrade files from "dataroot/courseid/moddata/x/..." to "dataroot/modfiles/coursemoduleid/..."
  3. fix backups - the hardest part!!
  4. add new admin script for browsing of module files
  5. write docs for module authors
What do you think about it? Should I continue or wait till 2.0?

<?php // $Id$
      // This script fetches files from the dataroot/modfiles directory
      // Syntax:      modfile.php/cmid/filename.ext
      // Workaround:  modfile.php?file=/cmid/filename.ext

    require_once('config.php');
    require_once('lib/filelib.php');

    // extract relative path components
    $args = explode('/', trim(get_file_argument('modfile.php'), '/'));
    if (count($args) < 2) { // always at least course module id and file name
        error('No valid arguments supplied');
    }
    $cmid = (int)array_shift($args);
    $relativepath = implode('/', $args);

    if (!$cm = get_record_sql("SELECT cm.*, m.name FROM {$CFG->prefix}course_modules cm, {$CFG->prefix}modules m
                               WHERE cm.id = '$cmid' AND cm.module = m.id")) {
         error('Invalid Course module ID');
    }

    if (!$course = get_record("course", "id", $cm->course)) {
        error("Course is misconfigured");
    }

    // security: login to course if necessary
    if ($course->id != SITEID) {
        require_login($info->id);
    } else if ($CFG->forcelogin) {
        require_login();
    }

    session_write_close(); // unlock session asap

    // module is free to change any of these defaults
    $options = new object();
    $options->allow = true;
    $options->relativepath = $relativepath;
    $options->pathname = $CFG->dataroot.'/modfiles/'.$cm->id.'/'.$relativepath;
    $options->filename = $args[count($args)-1];
    $options->lifetime = 600; //ten minutes
    $options->filter = false;
    $options->pathisstring = false;
    $options->forcedownload = true;

    ///execute access control from module
    @include_once($CFG->dirroot.'/mod/'.$cm->name.'/lib.php');
    $accessfunction = $cm->name.'_validate_modfile';
    if (function_exists($accessfunction)) {
        $options = $accessfunction($cm, $course, $options);
        if (!$options->allow) {
            error('Access not allowed');
        }
    } else {
        error('Access not allowed');
    }

    if (!$options->pathisstring) {
        if (!file_exists($options->pathname)) {
            not_found($course->id);
        }
        if (is_dir($options->pathname)) { //do not serve directory nodes!!
            not_found($course->id);
        }
    }

    // ========================================
    // finally send the file
    // ========================================
    send_file($options->pathname, $options->filename, $options->lifetime, $options->filter, $options->pathisstring, $options->forcedownload);

    function not_found($courseid) {
        global $CFG;
        header('HTTP/1.0 404 not found');
        error(get_string('filenotfound', 'error'), $CFG->wwwroot.'/course/view.php?id='.$courseid); //this is not displayed on IIS??
    }
?>

In reply to Petr Skoda

Re: Allowing modules to handle file requests

by Eloy Lafuente (stronk7) -
Picture of Core developers Picture of Documentation writers Picture of Moodle HQ Picture of Peer reviewers Picture of Plugin developers Picture of Testers
Hey skodak,

could you explain why we must move everything from dataroot/courseid/moddata/activityname/moduleid to dataroot/modfiles/coursemoduleid/ ? (sure there is a good reason but I cannot see it...). blush

Ciao smile

In reply to Eloy Lafuente (stronk7)

Re: Allowing modules to handle file requests

by Petr Skoda -
Picture of Core developers Picture of Documentation writers Picture of Peer reviewers Picture of Plugin developers
Hi Eloy,

IMO the separation simplifies many things in the long term view:
  1. DMS system does not need to handle access to mod files - for example course files can be handled by perl scripts; I suggest we move coursefiles to "dataroot/coursefiles/courseid/..." too to clean up the dataroot dir.
  2. it stops people from linking to mod data files - there should be no links to modfile.php in database, because backups can not handle them
  3. we could later share module instances in several courses
  4. simplifies coding and security - mod author decides about access control
skodak



In reply to Petr Skoda

moving modfilea out of course files is not good

by Ne Nashev -
In our practise we some times need to correct files in moddata directory: it may be image files attached to forum posts by our students - when this images is too large and make thread dificult to read. Or it may be post folder with attached zip - there are some times good way to make more then one attached file, thanks to post that display in attachment corner all files in his directory. I think exists mach more situations when access to moddata folder in cource files is wery useful!

Anyway, I think what take out acess-checking functions from file.php and to modules is good work and need to be maked, but it is not too simple question...
In reply to Ne Nashev

Re: moving modfilea out of course files is not good

by Petr Skoda -
Picture of Core developers Picture of Documentation writers Picture of Peer reviewers Picture of Plugin developers
Hi!

One of the reasons to move modfiles out is to probihit such unauthorised corrections in moddata made by teachers. Only admins should be allowed to do so. If there is any missing functionality, it can be IMO implemented in modules - this should be much, much more teacher friendly wink


skodak
In reply to Petr Skoda

Re: moving modfilea out of course files is not good

by Ne Nashev -
In a big organizations no time to admin look in cources and make any corrections - thay only look to properly working of site engine.


It's bad circle - while i have file access to attachments a hav'nt any reasons to make correspond feachure requests to modules, but you can't leave me without file access while do'nt provide repleacement feachures in modules %)

May be leave it as is? Румянец

in any time you can 1) add admin (site-wide) or teacher (cource-wide - it's better) setting - to hide or not hide moddata folder in cource files with request to think twisely before turning on visibility and making direct updates in module data and make feachure requrest to disallow this turning in future and 2) make protected module data visible only for module - and leave non-module protected data visible any-way
In reply to sam marshall

Re: Allowing modules to handle file requests

by Matt Oquist -
Yep.  I'm agreeing with you completely.  Also, the MyFiles block is really going to take some work to get in line with the specified Moodle coding guidelines, and also to get in line with code maintainability and extensibility.  mixed  I've dropped that pursuit and am, at this very moment, implementing a generic access control module that will probably never be used by anything but the Portfolios module that I'm building.  But hey, it's generic anyway!

So I'm very interested in your file.php changes now.
In reply to sam marshall

Re: Allowing modules to handle file requests

by Matt Oquist -
Hi Sam,

I'm ready for your version of file.php now.  smile

Could you please upload it as an attachment so I can hack with it a bit?
In reply to Matt Oquist

Re: Allowing modules to handle file requests

by sam marshall -
Picture of Core developers Picture of Peer reviewers Picture of Plugin developers
OK, since I haven't been working on the latest version due to technical difficulties here, it won't cleanly patch etc - so rather than attaching the files I will just paste in the added bits.

This first part goes immediately after the require_login check and before 'only editing teachers can access backups':

    // security: prevent direct requests to data for modules that handle
    // their own file requests. This check occurs here rather than lower
    // down, so that the modules themselves can return paths within their
    // moddata. Note that we strtolower components of the path here in
    // order to avoid attacks that depend on case-insensitive filesystems;
    // the 'real' paths are case-sensitive.
    if(count($args)>=3 && strtolower($args[1])=='moddata' &&
      module_handles_file_requests(strtolower($args[2]))) {
        error('Access not allowed');
    }
   
    // Allow modules to handle certain file requests by dealing with
    // remaining parameters and then returning an 'actual' filename
    // to send (this allows for modules to protect downloads
    // themselves). URL syntax:
    // /coursecode/!via/modulename/arbitrary/parameters/for/module
    if(count($args)>=3 && $args[1]=='!via') {
        $pathname=process_module_file_request(
            $args[2],$course->id,array_slice($args,3));
        // Update args array to match new file so that
        // following security still works if needed 
        if(strpos($pathname,$CFG->dataroot)===0) {
            $relativepath=substr($pathname,strlen($CFG->dataroot));
            $args = explode('/', trim($relativepath, '/'));
        } else {
            $args=array();
        }
    }


And here are the two new functions in filelib.php:


/**
 * Processes a request for a file that should be handled via a particular module,
 * which might point to any particular file it wishes to be served.
 *
 * Relies on a module function modulename_process_file_request, which takes
 * as parameters $courseid and $args, and should return either a full path to the
 * file or else FALSE to generate error.
 * @param string $modulename Name of module (already checked to contain only safe
 *   characters, but might not actually exist)
 * @param int $courseid Course ID (already checked, exists)
 * @param array $args Additional arguments in the request (any path components
 *   that came after the /coursecode/!via/module/ bit)
 * @return string Full path to actual file that should be served (otherwise will
 *   call error)
 */
function process_module_file_request($modulename,$courseid,$args) {
    // Check for ordinary characters to avoid security hole by accessing
    // stupid filenames
    if(!preg_match('/^[a-z0-9]+$/',$modulename)) {
        error("Invalid module name in !via file request");
    }
   
    // Get lib.php for module
    global $CFG;
    $modlib = "$CFG->dirroot/mod/$modulename/lib.php";
    if (file_exists($modlib)) {
        include_once($modlib);
    } else {
        error("The $modulename module is missing important code! ($modlib)");
    }
   
    // Find function
    $function = $modulename."_process_file_request";
    if(!function_exists($function)) {
        error("The $modulename module does not provide file access.");       
    }
   
    // Call function to handle request
    $filepath=$function($courseid,$args);
    if($filepath) {
        return $filepath;
    } else {
        error("The $modulename module does not offer a file for that request.");
    }
}

/**
 * Returns true if modules handle their file requests by including a
 * modulename_process_file_request function.
 * @param string $modulename Name of module (not yet checked to
 *   contain only safe characters, but might not actually exist)
 * @return bool True if the module has the given function, false if
 *   it doesn't (or the name wasn't a valid module name)
 */
function module_handles_file_requests($modulename) {
    // The reason this returns false all over the place rather than
    // giving errors is to support legacy behaviour in case this URL
    // isn't to a real module or something weird.
   
    // Check for ordinary characters to avoid security hole by accessing
    // stupid filenames
    if(!preg_match('/^[a-z0-9]+$/',$modulename)) {
        return false;
    }
   
    // Get lib.php for module
    global $CFG;
    $modlib = "$CFG->dirroot/mod/$modulename/lib.php";
    if (file_exists($modlib)) {
        include_once($modlib);
    } else {
        return false; // Not really a module, so doesn't handle requests
    }
   
    // Find function
    $function = $modulename."_process_file_request";
    return function_exists($function);   
}



There you go. Sorry for the delay in response but I was busy with other things!

As for concerns about reducing security, this is less likely to reduce security than using a separate file.php as there are still checks that take place (or can be added) after the module has updated the filepath. It's true that there are some checks that modules can bypass this way, and with injudicious acceptance of the $args parameter, security risks are possible. But frankly since anyone who needed to do anything like this before probably ended up making their own file.php or editing the existing one willy-nilly, this system is a big improvement.

(Also conceptually it is a flaw in the architecture if modules can't control file requests, since modules are supposed to be independent... I know there are lots of other flaws but this one was easy to fix. smile
In reply to sam marshall

Re: Allowing modules to handle file requests

by Gustav W Delius -
Sam, I very much agree with your proposal. In fact it sounds very close to what we are using here at the University of York. Take a look at Alex's description of our hack and the philosophy behind it at http://moodle.org/mod/forum/discuss.php?d=14058#67655

I would very much like to see your new file.php go into Moodle CVS, then we can stop having to upgrade our patch each time we upgrade to a new version.
In reply to sam marshall

Re: Allowing modules to handle file requests

by Eloy Lafuente (stronk7) -
Picture of Core developers Picture of Documentation writers Picture of Moodle HQ Picture of Peer reviewers Picture of Plugin developers Picture of Testers
Hey Sam,

it sounds like a clever idea, although I can imagine some potential (just potential) problems that I would want to share with you:

- Currently, file.php, although is really restrictive (only files from the same course...) it's the unique point of access to moodledata internals. Above you confirmed that creating a new, specialised file.php to serve your module data wasn't a proper solution (duplication of code, security risks..) but immediately I thought that your alternative (to delegate the decision to one "modulename_process_file_request" function will be also risky, because, if I'm not wrong it will return one full path that will be served by file.php without limits at all. So, imagine one buggy function in one module. It would grant access to all the readable filesystem! Uhm, I know this is hypothetical, but possible.

- I suppose that, for now, all you want to do is to use that "hack" to allow your module to "take the control" of what it can serve and I'm not really sure if it's a good idea because you could develop one module able to serve files from other modules and it would have big implications in modularity and actions like backup/restore because mixing data isn't a good idea from my experience. The same is applicable to data belonging to the same module but in other courses. I'm sure that we will losing encapsulation possibilities if such type of cross access is allowed.

- My initial feeling points to the creation of one proper system to store and handle all those files that could be used by more than one activity in Moodle (call it DMS, repository... or whatever you want). Instead of to delegate the logic to each module, I think we should build it from core, allowing modules to see, use and share files across it. I know this is a long task but, if added now, your hack could cause problems later (in my opinion).

- Anyway, the best way to discuss about it is to see the code. Can we take a look to it?

Only my initial and personal opinion. Please don't think it's negative. The idea is really clever but perhaps my previous considerations could help to modify them a bit (or totally). I don't know...wink

Ciao smile
In reply to sam marshall

Re: Allowing modules to handle file requests

by Bernard Boucher -
Hi all,
it looks like an old problem if I understand it correctly.

A couple years ago I made a simple ( but not optimised or updated ) suggestion that may help to approach or solve the problem:

The idea is to check the last Moodle log when the user request a resource: if the module ( resource, quiz, ... ) in the last log make a reference to the requested resource then send the resource to the user even if the user is not registred in the resource's course.

That way , the security of orphan files ( not linked yet to a module ), guessed filenames or linked files of hidden modules, dated quiz, group , activities locked ( conditionnal activities ), Michael Course Program block, ... are all respected with a simple file.php modification.

If the user don't have access to a module for any reason, it must not have access to its resources.
Inversely, a log trace of an accessed module is sufficient to grant access to its resources anywhere in Moodledata because the access to that module was granted by Moodle and it respect all Moodle normal restrictions.


I hope it is appropriate and that it may help,

Bernard