New file management

by Petr Skoda -

I feel we need to make some radical changes in the handling of uploaded files (file.php).

Current problems:

  • course data and module data are mixed
  • missing access rights - students can guess filenames and access files in data directory
  • backup does not relink shared files between courses
  • no place to store user data - student portfolios, etc.

I propose to add modfile.php and userfile.php. The function of the individual scripts would be:

file.php
Proper DMS with access rights (user, group, time, ...). Teachers have administrative access and decide who can read/write. Data location $CFG->dataroot.'/coursedata/coursenumber/'. It could be linked by course number or course IDnumber (alternative linking preserved between courses during backup/restore).
modfile.php
Module data, links generated by module code only. Only admins can browse directories. Access rights defined by modules in function xxxx_modfile_check() from xxx/lib.php. Data location $CFG->dataroot.'/moddata/moduleinstancenumber/'.
userfile.php
Storage area for user portfolios, user shared files. There are already several hacks that could use it. The main benefit would be better integration of third party extensions in the future. It will take some time before DMS is introduced into file.php.
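
To make the proposed split concrete, here is a minimal sketch of how a request could be mapped to one of the three storage areas. It is written in Python purely as an illustration, not as Moodle PHP. The `coursedata` and `moddata` directory names come from the proposal above; the `userdata` name, the `AREAS` table and the `resolve()` helper are assumptions made for this example.

```python
from pathlib import PurePosixPath

# Serving script -> storage area under the data root.
# "coursedata" and "moddata" are from the proposal; "userdata" is an assumed name.
AREAS = {
    "file.php": "coursedata",
    "modfile.php": "moddata",
    "userfile.php": "userdata",
}


def resolve(dataroot, script, owner_id, relative):
    """Return the storage path for a request, rejecting unsafe relative paths.

    owner_id is the course number, module instance number or user id,
    depending on which script handles the request.
    """
    requested = PurePosixPath(relative)
    # Refuse absolute paths and any attempt to climb out of the area.
    if requested.is_absolute() or ".." in requested.parts:
        raise ValueError("unsafe path: %r" % relative)
    base = PurePosixPath(dataroot) / AREAS[script] / str(owner_id)
    return str(base.joinpath(*requested.parts))
```

With this layout, `resolve("/var/moodledata", "modfile.php", 31, "7/essay.doc")` points inside `moddata/31/`, so the access rules for module data never have to reason about course files.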

I think the transition would be quite fast and painless, and it could solve many present and future problems. I wanted to keep this post short, but I am already working on a longer PDF document. Let me know if you like this idea.

skodak

In reply to Petr Skoda

Re: New file management

by Martín Langhoff -
Great idea!

It would be a really good thing if the logic for access control were self-contained, rather than delegated to the modules. Serving heavy files with PHP is a major bottleneck for high-volume sites, so I'm interested in maintaining a mod_perl port of it if possible.

(Why mod_perl, you ask? Because it allows us to intercept the authorization phase of Apache, check the cookie, permissions, anything, and tell apache whether to allow or deny. The actual serving of the file is done by Apache's core. It's way faster, consumes minimal memory, and supports byteranges and other oddities.)
In reply to Martín Langhoff

Re: New file management

by Petr Skoda -
I agree, there could be several alternative implementations of file.php. If we separate module data, it should be much easier.

Small sites could use some lightweight PHP DMS solution; large sites need something more robust.

In reply to Petr Skoda

Re: New file management

by John Papaioannou -
+1 for this as well!

One point to note: as far as I have seen from the DMS in contrib/, the permissions system needs to be designed to allow easy and powerful administration of access rights. Even the most fine-grained access control system doesn't help much if you have hundreds or thousands of files on your site.
In reply to John Papaioannou

Re: New file management

by W Page -
Hello Petr!!

I think I understand what you are proposing to improve the file system. Just wanted to ask about the following.
  • Jon had the files encased [if that is a correct term to use??] in code after upload into the DMS he worked on. Will the process you are proposing be able to do that as well so it is not so easy to identify files?
  • Is there some way to have the following placed automatically within each file name for assignments/activities:
    • a student's/teacher's name?
    • a version number?
    • submission date?

Thanks in advance for your response.

WP1



In reply to W Page

Re: New file management

by Petr Skoda -
Hi WP!

1/ I do not understand the first question. Maybe I can explain it a bit more. The three scripts are only interfaces; you can implement the actual storage in any way you like. The problem with file.php is that it is doing too many things, so it is very hard to improve it further. If we separate module data and course data, we can immediately improve the security of module data and deploy a much simpler DMS for ordinary course data.

2/ Modfile.php should hold only code common for all modules - some basic security checks and sending of files to users. All other logic should be moved into modules. Yes, it should be possible to implement those changes in assignment module itself.

Modules should be able to override:
  • access control
  • proposed file name for saving
  • caching directives
  • file content
  • byteserving
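
As a rough illustration of that division of labour, the sketch below keeps the common checks in one place and hands everything else to a per-module callback. It is Python rather than Moodle PHP, and every name in it (`FileDecision`, `CHECKS`, `register`, `serve`, the `<owner id>/<file name>` path layout) is hypothetical; the callback only stands in for the proposed xxx_modfile_check() function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class FileDecision:
    allowed: bool
    download_name: Optional[str] = None  # proposed file name for saving
    cache_seconds: int = 0               # caching directive


# Hypothetical registry: module name -> that module's access check.
CHECKS: Dict[str, Callable[[int, str], FileDecision]] = {}


def register(module):
    """Decorator that records a module's check function."""
    def wrap(fn):
        CHECKS[module] = fn
        return fn
    return wrap


@register("assignment")
def assignment_check(user_id, path):
    # Assumed layout: "<owner user id>/<file name>"; only the owner may read.
    owner, _, name = path.partition("/")
    if owner != str(user_id):
        return FileDecision(allowed=False)
    # The module also decides the name offered when the file is saved.
    return FileDecision(True, download_name="user%d_%s" % (user_id, name))


def serve(module, user_id, path):
    """Common code: basic security checks, then delegate to the module."""
    if ".." in path.split("/"):
        return FileDecision(False)
    check = CHECKS.get(module)
    if check is None:
        return FileDecision(False)  # modules without a check are denied
    return check(user_id, path)
```

Here `serve("assignment", 7, "7/essay.doc")` is allowed and proposes `user7_essay.doc` as the download name, while the same path requested by user 8 is refused. Denying modules that register no check is a design choice of this sketch, not something the proposal specifies.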

skodak
In reply to Petr Skoda

Re: New file management

by W Page -
Hi Petr!

Thank you for the clarification and additional information.

What I was referring to in the first point was the following. If one uploaded a file into the DMS Jon worked on and looked at the file from the server end all one would see is each file represented by random numbers. If one looked at the files from within the DMS interface, the file names were present.

For example after a file was uploaded into the DMS
  • From the DMS interface - scienceinquiry_010.doc
  • From FTP - 5674839284765

scienceinquiry_010.doc and 5674839284765 represent the same file. The random numbers make it more difficult for someone to identify files from the server end.
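
A minimal sketch of that idea, assuming nothing about how Jon's DMS actually implements it: the file is stored under a random token, and only an index (a database table in practice, a dictionary here) knows the original name. The `OpaqueStore` class and its methods are invented for this illustration, which is in Python rather than PHP.

```python
import secrets


class OpaqueStore:
    """Keeps files under random names; the index maps them to display names."""

    def __init__(self):
        self._index = {}  # stored name -> display name (would be a DB table)
        self._blobs = {}  # stored name -> content (would be files on disk)

    def put(self, display_name, data):
        stored = secrets.token_hex(8)
        while stored in self._index:  # regenerate on the unlikely collision
            stored = secrets.token_hex(8)
        self._index[stored] = display_name
        self._blobs[stored] = data
        return stored

    def listing(self):
        """What the DMS interface shows."""
        return sorted(self._index.values())

    def on_disk(self):
        """What someone browsing the server (e.g. via FTP) would see."""
        return sorted(self._blobs)
```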

WP1
In reply to Petr Skoda

Re: New file management

by Samuli Karevaara -
There was talk about having WebDAV support for Moodle, but it ran into some problems with the core of Moodle file handling (?). Petr, could you explore a possibility to have another option of a data folder for WebDAV accessed files?

This way the teachers (mainly, why not students also) could maintain their documents more or less directly from their own computers, with for example Novell NetDrive or similar.
In reply to Samuli Karevaara

Re: New file management

by Petr Skoda -
Hi Samuli!

I am not planning WebDAV support myself. If we move module data out of the general file area, the implementation could be easier.
In reply to Samuli Karevaara

Re: New file management

by Joseph Rézeau -
I'm rather disappointed that WebDAV support seems to have been abandoned at the moment (is this the case?). From a teacher's point of view, WebDAV support makes updating one's online courses so much easier.

I experienced the change from the old, cumbersome "zip - upload - unzip" process to the click and put (from within e.g. Dreamweaver) or drag and drop (from Windows Explorer) process in WebCT about 2 years ago. And now that my institution is moving from WebCT to Moodle, I am back to the old cumbersome system. Please, please, before talking about DMS systems, etc., can we agree that providing a DMS system for students to upload the occasional file to their assignment folder is very different from the needs of a teacher who has to upload dozens of files (including images, etc.) daily?

If WebCT could provide WebDAV access, why can't Moodle do it? (I know nothing of the technicalities).

Joseph_R
In reply to Joseph Rézeau

Re: New file management

by Michael Penney -
> If WebCT could provide WebDAV access, why can't Moodle do it? (I know nothing of the technicalities).

Hi Joseph, probably the main reason is that WebDAV would be useful to only a small number of Moodle sites (though these would be the larger sites); most Moodle users are on hosts where WebDAV is not going to be supported any time soon.

Of course the second reason is $$, WebCT currently has more of it. If your institution is saving $7,000-$10,000 this year by moving to Moodle (or >$100,000 if you chose Moodle over Vista), can you spend that this year on adding WebDAV to Moodle?

If so, we or one of the other programming groups working on Moodle can probably get WebDAV going for you. As more large institutions with dedicated servers and support staff make the move to Moodle, hopefully some of them will start spending some of their savings on adding features like WebDAV that would be useful for enterprise class customers.
In reply to Michael Penney

Re: New file management

by Joseph Rézeau -

Hi Michael.

Thanks for your prompt reply to my query. I will pass on the message to the "powers-that-be" in my institution and see what comes of it.

I'm afraid, however, that the $$$ saved from making the move from WebCT to Moodle this year will mostly go down the drain, or has already been spent on some other budget.

Joseph_R

PS Until you mentioned it I had no idea I belonged with the group of "enterprise class customers"...
In reply to Joseph Rézeau

Re: New file management

by Martin Dougiamas -
WebDAV has always been possible; it just takes a lot of manual work from the administrator. All you need to do is set up WebDAV under your Apache server, and then map directories directly onto the course file directories in Moodle, so that individual teachers can access the files there via WebDAV.

If you forward your license fee that you spent last year on WebCT to moodle.com I'm sure I can find a way to make it more seamless.

(Edit: I just saw Michael's reply above after writing this, but I'll leave it here anyway.)

In reply to Joseph Rézeau

Re: New file management

by Martín Langhoff -
Don't be disappointed. It is perfectly doable, but hasn't seen enough interest/funding (edit: yet!).

The catch with WebDAV support is that it's tricky to implement Moodle's "access restrictions" (authentication/authorization) so that WebDAV doesn't become a means to circumvent them. The good news is, we are maintaining a mod_perl module that provides *only* authentication/authorization controls on behalf of Apache and then gets out of the way and lets Apache serve the file. It's pretty simplistic at this stage, but as the Moodle side evolves, we expect to update the mod_perl version to match.

Given that Apache2 has an excellent WebDAV module, you could use that WebDAV module with our authentication/authorization handler. It won't work out of the box, but it's 90% there. Feel free to grab it from contrib and enhance/customize it inhouse or to help fund its development.

I sure want to see this module more widely used and refined, as it opens new doors to Moodle scalability, making static file serving almost as efficient as a pure Apache setup.
In reply to Martín Langhoff

Re: New file management

by Martin Dougiamas -
I'm not jumping up and down to start relying on mod_perl modules ... I'm sure it can work very well for your installation, but I don't want to add all those dependencies to Moodle itself.

Has anyone tried the PHP WebDAV server in PEAR? It looks promising.
In reply to Martin Dougiamas

Re: New file management

by Martín Langhoff -
Midgard is using the PEAR module to implement WebDAV access to database-stored objects, with good results apparently.

I wasn't expecting Moodle to depend on mod_perl just yet either (*); the mod_perl trick is purely about higher performance. For those reading about it for the first time, it isn't an official bit of code.

* - depending on mod_perl won't be a big issue once we have ported Moodle itself ;)
In reply to Petr Skoda

Re: New file management

by Petr Skoda -
Transition scenario:
  1. change $CFG->dataroot to $CFG->cdataroot in all Moodle code (with exceptions)
    • add $CFG->cdataroot = $CFG->dataroot; to lib/setup.php
    • search & replace; do not change filters, sessions, users, etc.
  2. move course files to the coursedata directory and change $CFG->cdataroot
    • bump up the version number and, during upgrade, move all data to the coursedata directory
    • $CFG->cdataroot = $CFG->dataroot.'/coursedata';
  3. add modfile.php, patch modules and backup routines to use the moddata directory
    • patch modules to use the new modfile.php and move module data to $CFG->mdataroot = $CFG->dataroot.'/moddata';
    • no need to hurry; start with Assignment
  4. userfile.php can wait until somebody comes up with an extension that needs it
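
The "move course files to the coursedata directory" step could look roughly like the sketch below. It is a Python illustration, not the actual upgrade code, and it rests on one assumption that should be checked against a real installation: that course directories are exactly the top-level directories of the data root whose names are numeric course ids. The `move_course_dirs()` name is made up for the example.

```python
import shutil
from pathlib import Path


def move_course_dirs(dataroot):
    """Move numeric course directories into <dataroot>/coursedata.

    Returns the names of the directories that were moved, so the upgrade
    step can log them. Running it a second time moves nothing.
    """
    root = Path(dataroot)
    target = root / "coursedata"
    target.mkdir(exist_ok=True)
    moved = []
    # Take a sorted snapshot first so the loop is not affected by the moves.
    for entry in sorted(root.iterdir()):
        # Assumption: a directory named with digits only is a course directory;
        # sessions, users, filter caches and so on are left where they are.
        if not (entry.is_dir() and entry.name.isdigit()):
            continue
        destination = target / entry.name
        if destination.exists():
            raise FileExistsError("already migrated: %s" % destination)
        shutil.move(str(entry), str(destination))
        moved.append(entry.name)
    return moved
```

After the move, pointing $CFG->cdataroot at the new directory is the only configuration change the rest of the code needs to see.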

The hardest task would be the backups, the rest is quite easy.

In reply to Petr Skoda

Re: New file management

by Martin Dougiamas -
Overall I like the idea of better logical separation for the file areas (ie scripts).

I don't think the directories need to be restructured to that degree though. It would be nice to keep all the data for a course in one place, for example.
In reply to Martin Dougiamas

Re: New file management

by Mike Churchward -

I've been considering this and I'd like to throw these ideas out...

What if we used the resource table for all uploaded files on Moodle? The resource table is already set up to handle uploaded files and is identified by course. It already has useful information such as a name (independent of the file name), a description and a type.

Every file that is added to Moodle (via the file interface) would be added as a resource to the course it was added to (or site, for site files).

What would this do?

  • It would allow us to use the resource description information and course protection.
  • We could further use the course_module table to check for visibility and group modes.

Now, as a start to file access protection, add a new table: resource_access. This table would contain records with:

  • id: primary key
  • resourceid: the resource this record is for
  • roleid: 0=everyone, 1=editing teacher, 2=teacher, 3=student (further ids can be used in future role schemes)
  • userid: can be used for a specific user
  • groupid: can be used for a specific group
  • read: 0 or 1
  • write: 0 or 1
  • create: 0 or 1 (only relevant to resources that allow this, i.e. directories)
  • delete: 0 or 1
  • ... more ...

For this table, no record for the resource means status quo. The file is available as it is now. This could be changed down the road to be locked down rather than opened up.

The roleid '0' could be used to restrict/grant access to everyone, but leave the standard Moodle rules intact for teachers and administrators. A value of -1 would indicate that one of the other fields (userid, groupid) is being used.

The roleid's 1 through 3, allow granting/restricting by specific role, and if we add more roles, we can add more records.

Userid and groupid would be used to specifically provide access by user or group.

The key thing here is that absence of a record means access stays as is. You only provide a record when you need to.
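
To show how such records could be evaluated, here is a small Python sketch (not Moodle code). The field names follow the table described above. The order in which records are consulted (specific user, then group, then role, then everyone) is an assumption of this example, since the proposal does not fix a precedence, and `AccessRecord` and `can_read()` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AccessRecord:
    resourceid: int
    roleid: int = -1               # 0=everyone, 1..3=role, -1=use userid/groupid
    userid: Optional[int] = None
    groupid: Optional[int] = None
    read: int = 0
    write: int = 0


def can_read(records, resourceid, userid, roleid, groupids):
    """Return True/False from access records, or None when no record applies.

    None means "status quo": the caller falls back to the existing rules.
    """
    rows = [r for r in records if r.resourceid == resourceid]
    # Assumed precedence: the most specific kind of record wins.
    tiers = (
        lambda r: r.roleid == -1 and r.userid == userid,
        lambda r: r.roleid == -1 and r.groupid in groupids,
        lambda r: roleid > 0 and r.roleid == roleid,
        lambda r: r.roleid == 0,
    )
    for matches in tiers:
        hits = [r for r in rows if matches(r)]
        if hits:
            return any(r.read == 1 for r in hits)
    return None
```

For example, with one record denying read to everyone on resource 10 and another granting read to user 42, `can_read` returns True for user 42, False for any other user, and None for a resource that has no records at all.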

Now, file.php could use all of this to grant/restrict access. It would check:

(currently)

  • File is valid.
  • Logged in to course.
  • For site, logged into site (if necessary).

(new)

  • Any access records (look up the resource id by the filename).

We could also check the course module records for group mode and group access as well as visibility.

When this system is added to an existing installation, a script would run through all courses, searching for files in the course data directories that don't already have resource entries and create them.

It might also be smart to implement a file naming convention that uses the resource id in the filename to speed up looking for access records. This would mean we wouldn't have to search for the filename in the resource table.
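
One possible shape for such a convention, shown as a Python sketch: prefix the stored file name with the resource id so that file.php can recover the id without searching by name. The `<id>_<name>` format and both helper names are assumptions made for this illustration only.

```python
def stored_name(resource_id, filename):
    """Build the on-disk name, e.g. 57 and 'week_1.pdf' -> '57_week_1.pdf'."""
    return "%d_%s" % (resource_id, filename)


def parse_stored_name(stored):
    """Split a stored name back into (resource id, original file name).

    Returns None for names that do not follow the convention, such as
    files uploaded before the convention was introduced.
    """
    prefix, sep, rest = stored.partition("_")
    if not sep or not prefix.isdigit():
        return None
    return int(prefix), rest
```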

Sorry. I've been a little out of touch with the 1.5 plans for this type of thing and roles. If this is moot, let me know. I figured since you posted here a couple of days ago, we still needed solutions for this.

Thoughts?

mike

In reply to Mike Churchward

Re: New file management

by Mike Churchward -
(Further thoughts...)

Files that are added as files - not resources - could still be added as resources without being a course module. An entry can exist in the resource table with the assigned course without adding it to the course module table. This would prevent a file resource (not a file resource module) from being displayed in a course section or in the resource module index. We might actually be able to add it as a course module as well (with section zero), and use the visibility field and groupmode fields.

Using this method would give us a fairly simple way of modifying file.php to provide protection over how a file gets served up.

mike
In reply to Mike Churchward

Re: New file management

by Mike Churchward -
Is this of interest? Should I pursue this, or are there other plans?

mike
In reply to Mike Churchward

Re: New file management

by Michael Penney -
How would this work in a course? Eg if all files are loaded as resources in a course, would they show up in a topic/week?

One of the current problems is with image files included as part of a web page composed in lesson, book, html resource, etc. One can hide a resource, lesson, book, etc. and (for resources) that blocks browse (via filename guessing w/ file.php) access to data (word, pdf, etc.) files in 1.5, but not to images included in a html page.

Should we even try to implement access rights on embedded images? It seems to me this could get pretty slow if every single image in a book, multipage resource, etc. has its access checked each time the HTML file is viewed.

Also how would your system work with user (student) specific files in a portfolio?


In reply to Michael Penney

Re: New file management

by Mike Churchward -
> How would this work in a course? Eg if all files are loaded as resources in a
> course, would they show up in a topic/week?

Not if they were just files. To show up in a topic, an activity or a resource has to be a course module. There is a separate table for this. As a side effect, if a resource (or activity, for that matter) exists in its own table but not in the course module table, it won't show up in the course page listings, but it's still there with the proper course reference. This system would take advantage of that.

With images, if they are accessed via 'file.php' then this system would affect them. There would be at least one database access for each image (to get the resource record), which could be a bottleneck. But if we change the way we do this, we could use index references instead of filenames, so the database access could be on a primary key (which is supposed to be as fast as a file access, right?).

As for student files, if we get a standard way of accessing files that includes privilege checking, we could work on developing the rest.

I think a key thing here is to use a style similar to a unix directory access system. That way, looking at a directory of files requires just checking the directory access record. Access to a specific file would use the file's access.
In reply to Petr Skoda

Re: New file management

by Michael Penney -
Hi Petr, as MyFiles installs as a block, should it use 'userfile.php'?


In reply to Michael Penney

Re: New file management

by Petr Skoda -
Hi all moodlers!

Sorry for being out for quite a long time. I will get back to coding in about two weeks from now.

skodak