RFC - Remote object repositories -- consolidating implementations

by Martín Langhoff -

The "repositories in moodle" scene has been quite exciting lately... actually I hadn't realized just how interesting things had gotten until I had a chat with MartinD today. Wow. As far as I can see we have:

  • MartinD wrote resource/type/hive
  • Eloy wrote resource/type/ims
  • Helen (and Tom?) wrote mod/object which turned into resource/type/object and now has been folded into Eloy's resource/type/ims.
  • Matt Oquist has been working on a full-blown repository API
  • Jun Yamog and myself have been working on a simple repository API, reusing Eloy & Helen's code in resource/type/ims but with a plugin API approach to the "repository".

And last, but not least...

  • In 1~2 weeks we enter the FREEZE for 1.6. We should be stable. And full of features. And solve famine too.

Heh. How's that for fun?

(Have I got the stories right? I might have gotten lost in the twists and turns. Hope I haven't accused anyone of other people's commits.)

Anyway, I am looking at trying to consolidate some of those implementations, trying to target the 1.6 freeze, a simpler UI for users, and something that generally makes sense internally, at least in the sense that we don't paint ourselves into a corner. Being a bit naive in some aspects is not bad if the path forward remains clear...

Here's my proposal -- with an offer to get it done in time for the freeze. I want to know if it works for everyone, and I am pretty sure my plan has some holes in it... and that you'll kindly point them out wink

  • I can work with Jun to refactor the resource/type/ims plugin to split off the repository handling code, turning that into a repository plugin for the simple repo API. So resource/type/ims focuses on dealing with IMS packages, and not with repositories.
  • Similarly, we can get the current Hive plugin to conform to the simple repo API trivially. Actually, with some minor changes to the API.
  • The standard file selection dialog is the point where repo searches/browsing happen (I think Matt's code is similar in this respect), so all modules that use the file selection dialog get repo searching/browsing automagically.
  • Still, there are a few extra tricks we can teach mod/scorm, especially around Hive repos.

This will effectively merge the work that Helen/Eloy have been doing on the mod/resource side with the work Jun has done on the repository handling side -- focussing on the strengths of each implementation. One thing I don't want to do is to lose any important functionality.

Now, this doesn't take Matt's work into account, which is somewhat unfair and entirely due to my lack of knowledge about it. I am not so familiar with his work; from the little I could understand of MartinD's description, it has a different set of assumptions about the repo backends than the simple API Jun has worked with, and I don't really know what its status is.

Matt, if you are reading, can you fill us in? MartinD mentioned that it is meant to provide an FS virtualization of sorts, and it sounded to me that it expects the backends to be rich, responsive and available, but I am probably wrong on all counts wink In any case, what I am interested in understanding is:

  • How ready it is
  • Whether you can help us fit it in our plans instead of the simple repo API
  • Even better, whether you see a way forward using a simple API today, and growing/evolving that API into your full-blown implementation towards 2.0. That would be the best of the three worlds, I suspect.

(Also: I'll post a description of the work Jun and I have been doing so this makes more sense).

Phew. Quite a bit of stuff to discuss. Naturally, MartinD is concerned that we'll be doing all this crazy stuff just before freeze, and that it shouldn't happen on HEAD because it may not be so stable in time. So anything we do should happen on a branch. Something like MOODLE_16_REPOSCRAZYWORKJUSTBEFOREFREEZE should do the trick wink

The main difference from the simple API I am talking about is that Matt's work assumes the repository looks filesystem-ish -- whereas this simple API has a few required calls, and is perfectly happy with repositories/backends that are not always up, or reliable, or browseable. We are extending it to support browsing optionally (as Hive offers the feature), but it works equally well with repos that are Debian-archive-like.

In reply to Martín Langhoff

Re: RFC - Remote object repositories -- consolidating implementations

by Martín Langhoff -

Some background for those following at home...

A bit more background on the OSLOR implementation -- which I know hasn't been discussed much in the public space sad

  • MartinD tells me that repo is not a good directory name. Shall be renamed to repository wink
  • When designing it, we planned to support a range of backends with different features and limitations. Our thinking was in supporting Hive-like "smart" backends as well as Debian-repository-like "stupid" backends.
  • It provides infrastructure for repo plugins to maintain their own db tables if needed (similar to modules).
  • Each repo plugin defines its config page (similar to auth and enrol plugins), all reachable under /admin.
  • A single Moodle install can have many backend repos configured (as a debian system can have many archives in sources.list)
  • It is up to the plugin how tight/loose the integration with the backend is. "Stupid-backend" plugins don't really need the backend to be there.
  • Once the file/resource is found, we copy it to the course directory in moodledata and pass the ball to the module. This means that the existing modules don't need much rework to take advantage of it. MartinD tells me this is a significant difference with Matt's work. In any case, smartish backends like Hive sometimes provide just a URL rather than the actual file. Not sure if all modules will like this without some changes wink
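
To make that last bullet concrete, here is a minimal, self-contained sketch of the fetch-then-copy flow. The names (stub_repository, repo_fetch_to_course) are illustrative assumptions, not the actual OSLOR API:

```php
<?php
// Self-contained sketch of the flow in the last bullet above. The names
// (stub_repository, repo_fetch_to_course) are illustrative assumptions,
// not the actual OSLOR API.

class stub_repository {
    // A "stupid" backend: hands back the object content directly.
    public function fetch_object($id) {
        return (object) array('filename' => "$id.xml",
                              'content'  => "<manifest id=\"$id\"/>");
    }
}

// Copy a fetched object into the course directory and return the local path;
// existing modules need no rework because they just see a local file. A
// "smart" backend (e.g. Hive) could instead set $object->url and skip the copy.
function repo_fetch_to_course($repo, $objectid, $coursedir) {
    $object = $repo->fetch_object($objectid);
    if (isset($object->url)) {
        return $object->url;   // URL-only objects are passed through as-is
    }
    if (!is_dir($coursedir)) {
        mkdir($coursedir, 0777, true);
    }
    $dest = $coursedir . '/' . $object->filename;
    file_put_contents($dest, $object->content);
    return $dest;
}

$path = repo_fetch_to_course(new stub_repository(), 'demo42',
                             sys_get_temp_dir() . '/course1');
```

Modules that expect a local file keep working unchanged; only the URL-returning case (the last bullet's caveat) would need module-side awareness.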

The OSLOR work on Moodle has a complementary part in work we have done on GNU EPrints, which is a great (Learning) Digital Object Repo, with metadata support and other niceties. We have taught EPrints to create a published archive that looks a lot like a Debian archive (possibly the most popular Digital Object Repository today by any standard). The main differences are that we don't care about 'architecture' and Packages is now an XML file full of IMSPackage/SCORM-ish metadata.

The plugin for that is a bit like apt-cache for those familiar with the Debian infrastructure. It fetches the index and performs all the searching locally -- optionally it can fetch all the objects too, so it is a boon for disconnected Moodle setups.

If you want to fetch the OSLOR code, you can get a snapshot in tar.gz or tar.bz2 format from the gitweb page or (even better) install a recent GIT and Cogito and do

  cg-clone http://locke.catalyst.net.nz/git/moodle.git#mdl-oslor

Feel free to ask any questions about it. I have surely forgotten something...

END_OF_SPAM wink

In reply to Martín Langhoff

Re: RFC - Remote object repositories -- consolidating implementations

by Ger Tielemans -
The moodle resource plugin for:
Ariadne/Edna/CGIR/LRC Federated repositories?
It also works under 1.6, and it gives me especially nice EDNA examples.
In reply to Martín Langhoff

Re: RFC - Remote object repositories -- consolidating implementations

by Markus Knierim -
Hi Martin,

Just a quick comment from a humble Moodle user/admin desperately waiting for a repository/DMS component wink: IMHO, the most basic and critical feature of a native Moodle repository is the ability to share resources (uploaded files) across courses by making them browsable/linkable via Add a resource >> Link to a file. As far as I know, none of the current development efforts (myDMS, Filemanager/MyFiles, Matt Oquist's repository) include this function, except for MartinD's Hive plugin. Of course, this would include the ability to grant access permissions by course (as in Matt Oquist's repository). As you can guess, I'm strongly in favor of linking to repository files rather than copying them to the course directory.

Thanks for all your effort! approve
Markus
In reply to Martín Langhoff

Re: RFC - Remote object repositories -- consolidating implementations

by Eloy Lafuente (stronk7) -
Hi moodlers,

I really think that the repository API is a must so it should be implemented soon and, more important, once (i.e. it must be simple, functional, robust and stable along Moodle evolution).

Also, it should provide all the interfaces to work against different types of "conceptual" repositories (plain repositories, tree-organised repositories...) and offer them to Moodle in a coherent way (i.e. independent of the repository, "moodleized" presentation).

I must recognise that I've become a bit exhausted (it's really late here tongueout ) when I've seen Matt's specs list in this discussion, but I'm pretty sure he knows more about it than me, so such ideas/implementation could be a good start.

One important (in my opinion) point that should be really well defined is what every type of repository does. Some of them could simply expose "things" to Moodle (browse, search, upload/download, synchronise/update) while others could also "play" the "thing" (without the need to fetch "things" from them at all). And if both these types of repositories must be supported (it's only a personal vision) then, perhaps, the API should also enable a new sort of actions like "configure", "play", "get grades"... (just typing some quick actions). Especially if we are moving to the Moodle Community Hub after 1.6, where Moodle itself will act as a repository for other "friends".

So, yes, yes, yes. Or, if you prefer it, +10 for it. cool

About the IMS CP resource type: after some conversations with Helen, we decided to maintain my "simpler" resource type for 1.6 (without the local repository of materials) and to add their IMS CP module to contrib until it was ready to do the switch. So, currently in HEAD, there isn't any IMS CP repository code at all (just some display improvements on my TODO list are pending for it to be 100% ready). So, if the API arrives, the correct path would be to use it directly, and the current "local repository" code would simply become a repository plug-in, sure!

I'll keep connected to this discussion. Really interesting! cool

Ciao smile

P.S.: Although not directly related, some words to think about: metadata, transportability, moodlets, permissions, replicate changes... wink
In reply to Eloy Lafuente (stronk7)

Re: RFC - Remote object repositories -- consolidating implementations

by Matt Oquist -
stronk7 says: Also, it should provide all the interfaces to work against different types of "conceptual" repositories (plain repositories, tree-organised repositories...) and offer them to Moodle in a coherent way (i.e. independent of the repository, "moodleized" presentation).

Do you think we're getting at that goal with our discussion of browsable vs. only-searchable repositories? What other kind of UI do you picture a repository having?


In reply to Eloy Lafuente (stronk7)

Re: RFC - Remote object repositories -- consolidating implementations

by Matt Oquist -
Oh, man. My further-edited reply got eaten. sad

Here's another go at it:

stronk7 also says: And if both these types of repositories must be supported (it's only a personal vision) then, perhaps, the API should enable also a new sort of actions like "configure", "play", "get grades"...

That sounds good. It makes sense to me that we should define a set of APIs that includes all those sorts of actions, and we can retain both robustness and flexibility if we implement all those APIs in the base class (repository_base in my implementation) to return appropriate values on behalf of plugin subclasses that don't implement all the APIs. Then all the APIs can be used throughout Moodle without messily checking for their existence, and each plugin only needs to implement the set of APIs to do its own specific job for its repository type.

Maybe this part is obvious to everyone, but I thought it was worth mentioning here.

Hmm, alternatively, maybe we should have an I'm-not-implemented return value as well/instead. It would be nice if it was, say, '', because that evaluates == false but does not === false, so callers could make the distinction as necessary. Maybe some APIs should just return 'true' even if they're not implemented, but others should return our I'm-not-implemented value, whatever it is.
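
A hedged sketch of that sentinel idea (repository_base and simple_search() here are simplified stand-ins, not the actual classes): '' loosely equals false but is not identical to it, so callers can separate "not implemented" from "implemented but failed".

```php
<?php
// Sketch of the "I'm-not-implemented" sentinel idea: '' evaluates == false
// but not === false, so callers can tell "not implemented" apart from
// "implemented but failed". repository_base and simple_search() here are
// simplified stand-ins, not the actual classes.

define('REPO_NOT_IMPLEMENTED', '');

class repository_base {
    // Default stub: subclasses that support searching override this.
    public function simple_search($query) {
        return REPO_NOT_IMPLEMENTED;
    }
}

class failing_repository extends repository_base {
    public function simple_search($query) {
        return false;   // implemented, but the search genuinely failed
    }
}

function describe_result($result) {
    if ($result === false) {
        return 'search failed';
    }
    if ($result == false) {   // loose comparison catches the '' sentinel
        return 'search not implemented';
    }
    return 'got results';
}
```

One caveat worth noting: an empty result set (array() or 0) also evaluates == false, so a real implementation might compare against the sentinel itself with === rather than relying on loose comparison.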


In reply to Matt Oquist

Re: RFC - Remote object repositories -- consolidating implementations

by Julian Ridden -
Gee, times like this I really wish I knew more technically, so I could actively participate in this discussion.

So while I can add nothing technically, there were a couple of points I would like to make.

Stronk is right, this needs to be put in (in some form) sooner rather than later. There is a huge demand (mine included) for a centralised searchable repository for resources that are used across multiple courses. Now these may include learning objects, multimedia, help files, you name it.

I have also been following Matt's portfolio/repository work with great interest, and it is shaping up to be a fine piece of code.

But, all said and done, we need some kind of centralised storage system. And I am not talking "Site Files" here. The question of how to meet such a wide range of requirements is certainly beyond my ability. But I would say, as always, listen to the community. What are the users wanting? Let's not overcomplicate this... but, at the same time, let's make sure we put good groundwork in place for standards (woohoo API).

Let me know when we get to interface design, and then finally I might be of some real use smile
In reply to Matt Oquist

Re: RFC - Remote object repositories -- consolidating implementations

by Martín Langhoff -

Matt, in terms of API implementation, Moodle has gotten a lot of mileage from not having the API calls in a base class. So you can ask all the time, trivially, method_exists(). Similarly, if you look around, the code does function_exists() quite often to discover what modules can do.

If we want to have a base class with some non-API helper methods, we can have it. And if we want to have a 'template' class for reference, we can have it too.

Having API methods in a baseclass has a very nasty side-effect in FOSS projects. Assume N number of strangers have implemented plugins that conform to your API, but those are kept private, are in contrib and you don't care for them. And then you add a new optional method -- the moment that you add it to the base class, it is implemented for all those plugins, silently, whether they like it or not -- 99% of the cases the plugins will break. sad

It is a popular strategy, even for APIs, when you are dealing with in-house projects, where everyone is within shouting distance. And that's ok. But in FOSS projects I'm not so excited about magic default inherited methods...

BTW, I recently found this entry in wikipedia: http://en.wikipedia.org/wiki/Action_at_a_distance_%28computer_science%29

And there I got to read about the Law of Demeter ("an object should only interact with other objects near itself"), and in a sense, in a FOSS space, all the "distances" are far greater.
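
The discovery style described above can be sketched like this (the plugin classes are hypothetical): optional capabilities are simply absent from plugins that don't support them, so a later addition to a shared base class cannot silently "implement" them for existing plugins.

```php
<?php
// Sketch of the method_exists() discovery style described above. The plugin
// classes are hypothetical; the point is that optional capabilities are simply
// absent from plugins that don't support them, so adding a method to a shared
// base class later cannot silently "implement" it for existing plugins.

class debianarchive_repository {
    public function simple_search($query) {
        return array("archive result for '$query'");
    }
    // No browse() method: this backend is searchable but not browseable.
}

class hive_repository {
    public function simple_search($query) {
        return array("hive result for '$query'");
    }
    public function browse($path) {          // optional capability
        return array('folderA', 'folderB');
    }
}

// Trivial capability check, in the same spirit as the function_exists()
// checks used elsewhere to discover what modules can do.
function repo_can_browse($repo) {
    return method_exists($repo, 'browse');
}
```
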

In reply to Martín Langhoff

Re: RFC - Remote object repositories -- consolidating implementations

by Matt Oquist -
Martín Langhoff contributes: Matt, in terms of API implementation, Moodle has gotten a lot of mileage from not having the API calls in a base class. So you can ask all the time, trivially, method_exists(). Similarly, if you look around, the code does function_exists() quite often to discover what modules can do.

Point taken. I have no particular commitment to one way or the other, and now we have a decision made. smile

Initially it was for my own development process that I went ahead and put the API calls in my base class, so I had a useful place to list APIs without defining them yet.  But then I started to think it might be a good way to do things permanently -- and now I see why it probably isn't a good way.
In reply to Eloy Lafuente (stronk7)

Re: RFC - Remote object repositories -- consolidating implementations

by Martín Langhoff -

About the IMS CP resource type, after some conversations with Helen, we decided to maintain my "simpler" resource type for 1.6 (without the local repository of materials) and to add their IMS CP module to contrib until it was ready to do the switch.

I think that there's a lot of overlap with Helen's code, and that if we just had a plugin that dealt with the repository being a local directory, we would be roughly meeting her needs. But I'd have to ask her... Helen? smile

In reply to Martín Langhoff

Re: RFC - Remote object repositories -- consolidating implementations

by Helen Foster -
Hi, and apologies for the delay in replying - these extra-long posts take time to read through. thoughtful

To briefly explain, the object module was developed in response to a clearly defined need in UK Further Education (and more recently in UK Adult and Community Learning) for a repository solution for over 800 hours of Government-funded e-learning materials, known as the NLN materials.

We were committed to developing the object module into a repository resource type for Moodle 1.6, and in response to feedback requesting the ability for teachers to upload packages, we included Eloy's IMS CP code in it; however, this feature is currently completely separate from the central repository.

As Eloy mentioned, our repository resource type may be found in cvs:/contrib/ims. It would be great if you could make use of it. approve
In reply to Eloy Lafuente (stronk7)

Re: RFC - Remote object repositories -- consolidating implementations

by Bhupinder Singh -

Hi All,

As a functional user, I would like to suggest that it would be nice if the repository were truly generic and could store files varying in size from under 1 MB to greater than 100 MB, with the ability to store them in varied formats (format-independent).

This I am adding from the perspective of use of the repository and Moodle in Healthcare Setting for education  and content delivery.

Best Of Luck

Garry

In reply to Martín Langhoff

Re: RFC - Remote object repositories -- consolidating implementations

by Matt Oquist -
Wow - this certainly is exciting.

How ready it is -- I think it's close to being a workably complete API, but nobody (including me! (to my knowledge)) has carefully reviewed the entire API, and I'm sure it has some weaknesses that need to be changed. I changed a few things when I modified my portfolio module to use my repository, but a good, careful review of the API would be a very good thing.

DEMO: You can check out my repository work already running at http://portfolio.spdc.org/portfolio/. Just log in and poke around the "File keeper" block. If you want admin access just email me after you create your own account (and if I don't already know who you are, I might ask to know a little about you first). Admins have the additional abilities to create arbitrary files/folders in the top-level repository folder (just like root access to the root directory) and change resource ownership.

Whether you can help us fit it in our plans instead of the simple repo API -- Yes, I can do that. I agree that starting with the simple API and evolving into a more-capable one may be the best approach. (I just spoke with my sponsors, and I have funding to work on this. Yay!) Here's the only cramp on my involvement -- I'm presenting the portfolio module to Massachusetts educators on March 20th, and I definitely need to move that effort significantly forward by that time. So if I want to be working like an insane person, I can surely do both! big grin

I did try to make relatively few assumptions about the backends, but I'm sure some unintentional assumptions snuck in, too. Firstly, I (like you, I believe) started with a base class that can be extended by plugins for each type of supported backend.
  • filesystem-ish: Yes, that tallies with my primary assumption about which API methods each plugin will implement. Ahh - in my in-code documentation about the API, I didn't make a distinction between APIs each backend should implement and APIs each may implement. I've added appropriate comments now, and I'll commit to CVS soon so it's public. But here's what I was thinking each backend should implement:
    • new_resource($type=0, $path='', $resourceid=0, $interactive=false, $new_resource_data='', $new_version=false, $inherit_access=false);
      • $type: REPO_TYPEFILE, REPO_TYPEFOLDER, REPO_TYPEURL, etc. Right here we see an assumption that backends won't be adding additional resource-types that the base doesn't know about... but at the moment I can't think of any problems that would arise if a plugin did add its own custom resource-type. Interesting.
      • $path: optional -- /filesystem/like/path within the repository; i.e., each repository is like a DOS drive, with its own "root directory". This is a path to an existing non-folder resource (if the backend supports versioning or overwriting old resources) or else the parent folder of the new resource. $path always takes precedence over ID in every API that has them both.
      • $resourceid: optional -- specifies a resource by ID instead. The ID is stored in the object as $blah->nativeid, and should be set by the repository plugin to any ID that the backend may associate with the resource. Hmmm. Possibly bad assumption: All backends will have IDs for resources. (Removing this assumption should be feasible since almost all the APIs take in $path and an ID.)
      • $interactive: bool -- Whether to display errors, warnings, etc. interactively.
      • $new_resource_data: assoc. array of data for a new resource (name and content for a file, name and URL for a URL, just name for a folder)
      • $new_version: bool -- helps to handle some simple checking about filetypes. This parameter should probably be dropped; it won't hurt anything for plugins that don't support versioning, but it's probably just unnecessary/confusing paranoid overkill.
      • $inherit_access: bool -- whether or not to inherit the ownership and access_controls of the parent folder. Backends without ownership and ACLs can just ignore this, of course.
    • new_file($resource='', $new_file='', $interactive=false, $source_path='', $inherit_access=false)
      • $resource: object -- parent folder (or old file version)
      • $new_file: optional, object/array -- new file data with displayname and content
      • $interactive: bool -- same as above
      • $source_path: optional, for copying legacy Moodle files in from $CFG->dataroot
      • $inherit_access: bool -- same as above
    • create_folder($folder='', $new_folder_name='', $interactive=false, $make_parents=true, $preexisting_ok=false, $inherit_access=false)
      • $folder: object -- parent folder
      • $new_folder_name: string
      • $interactive: bool -- same as above
      • $make_parents: bool -- if necessary, create parent folders
      • $preexisting_ok: bool -- return "true" if folder already exists
      • $inherit_access: bool -- same as above
    • read($fileobj='', $expected_type=0, $add_parents=true, $interactive=false)
      • $fileobj: object -- resource we're going to read (should probably be "$resourceobj" or just "$resource")
      • $expected_type: optional -- expected REPO_TYPE* of $fileobj
      • $add_parents: bool -- whether to add a runtime-only Parent folder when reading a folder's contents
      • $interactive: bool -- same as above
    • delete_resource($path='', $fileid=0)
      • $path: optional -- in-repository /path/to/doomed/resource (takes precedence over $fileid)
      • $fileid: optional -- $blah->nativeid of the doomed resource. This should probably be $resourceid instead.
    • get_file_object($path='', $nativeid=0, $expected_type=0)
      • $path: optional -- in-repository /path/to/resource (takes precedence over $nativeid)
      • $nativeid: optional -- $blah->nativeid of resource
      • $expected_type: optional -- REPO_TYPE* that we expect to get (return false if specified and non-matching)
    • get_file_path($nativeid=0, $crumbs=false)
      • $nativeid: $blah->nativeid of resource whose path we want
      • $crumbs: bool -- generate path as a breadcrumbs list for the header
      • The native plugin I wrote has an additional parameter, but that parameter is specific to that plugin.
  • few required calls -- The preceding list is relatively short, I think. One thing I have not done is go through my repository_base class carefully to be absolutely sure I never assume that other methods are also implemented. Let me be more specific: There are ~10 other methods that must exist for each plugin, but they can be empty and simply return appropriate values. Since each plugin provides its own controls to the user interface, these unimplemented methods should only be called when the APIs are used by other code in Moodle.
  • perfectly happy with repositories/backends that are not always up, or reliable -- The plugin itself is assumed to be responsible for establishing a connection to the (possibly remote) backend, so if the connection isn't up or can't be established, the plugin will have to deal with that (and probably just return "false", etc.)
  • perfectly happy with repositories/backends that are not always...browseable -- same as above, but I should note that the repository_base class provides only an architecture within which each plugin can implement its own browsing methods. The repository_base method is list_index(), and the plugin methods follow:
    • list_index($path='', $ids=array(), $recurse_levels=0, $form_action='', $hiddens=array(), $return=false, $headers=true, $columns=array(), $menu_names='', $exclusive_type=0)
      • $path: in-repository /path/to/resource
      • $ids: array -- list of resource IDs of resources to display
      • $recurse_levels: int -- number of levels to recurse down under folder(s)
      • $form_action: <form action="$form_action">
      • $hiddens: assoc. array -- print hidden form inputs
      • $return: bool - return the index or print immediately
      • $headers: bool - print headers
      • $columns: array -- ordered list of names of columns to display in the index table
      • $menu_names: array -- ordered list of menus to display for the index
      • $exclusive_type: REPO_TYPE* of a single resource type if we only want to display files, folders, URLs, etc.
    • file_display_index($table='', $file='', $baseurl='', $formname='', $columns='', $recurse=0)
      • Used to add one or more rows to an "index" table that displays a group of resources (primarily used for browsing through folders). This should probably be named "resource_display_index()".
      • $table: object -- for print_table()
      • $file: object -- file for which to display index row(s)
      • $baseurl: URL to which the form will submit
      • $formname: form in which table will be printed
      • $columns: ordered array of names of columns in the index table
      • $recurse: int -- how many levels down to recurse if $file is a folder
    • index_menus($formname='', $menu_names='', $return=false)
      • Used to create menus that apply to the entire index table.
      • $formname: form in which the menus will be included
      • $menu_names: ordered array of names of menus to create (right now I'm using 'groupaction', 'add', and 'search' in the native plugin, but this is entirely plugin-dependent)
      • $return: bool -- return the menu data instead of printing immediately
    • file_action_menu($file='', $formname='', $return=true)
      • Used to create an action menu that applies to a single resource. (This should probably be "resource_action_menu()".)
      • $file: object -- file for which we're creating a menu
      • $formname: name of the form this will submit to
      • $return: bool -- return the menu data instead of printing immediately
I hope this gives a better picture of what I've done and what I've had in mind.
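
As a concrete (and heavily simplified) illustration of a few of the required calls listed above, here is a hypothetical in-memory plugin sketch. The signatures are trimmed down from the full parameter lists, and a plain array stands in for a real backend:

```php
<?php
// A heavily simplified, hypothetical sketch of a plugin implementing a few of
// the required calls listed above. An in-memory array stands in for a real
// backend, and the signatures are trimmed down from the full parameter lists.

define('REPO_TYPEFILE',   1);
define('REPO_TYPEFOLDER', 2);

class memory_repository {
    private $resources = array();   // path => resource object
    private $nextid = 1;            // backend-assigned IDs

    public function new_resource($type, $path, $data) {
        $res = (object) array(
            'nativeid' => $this->nextid++,
            'type'     => $type,
            'path'     => $path,
            'name'     => $data['name'],
            'content'  => isset($data['content']) ? $data['content'] : null,
        );
        $this->resources[$path] = $res;
        return $res;
    }

    public function get_file_object($path = '', $nativeid = 0, $expected_type = 0) {
        foreach ($this->resources as $res) {
            // $path always takes precedence over the ID, as in the API above
            $hit = ($path !== '') ? ($res->path === $path)
                                  : ($res->nativeid == $nativeid);
            if ($hit) {
                if ($expected_type && $res->type != $expected_type) {
                    return false;   // found it, but it's not the expected type
                }
                return $res;
            }
        }
        return false;
    }

    public function delete_resource($path = '', $resourceid = 0) {
        $res = $this->get_file_object($path, $resourceid);
        if ($res === false) {
            return false;
        }
        unset($this->resources[$res->path]);
        return true;
    }
}

$repo = new memory_repository();
$repo->new_resource(REPO_TYPEFILE, '/docs/a.txt',
                    array('name' => 'a.txt', 'content' => 'hello'));
```
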

If we want to head toward an evolutionary approach from the simple toward the complex, then we'd best
  1. review the APIs at either end of the spectrum to be sure of our starting and ending points, and then
  2. craft our starting set to be extensible to the ending set by only adding parameters to existing methods and adding new methods.
It's possible that we could then use a simplified(?) version of the repository_base class to start with, and grow toward a more complex ending repository_base.

I have a paper due tomorrow (I'm a grad student) that I really need to start on, so I should stop with this for now. But tomorrow I could possibly take more of a look at the simplified API...nah, I'll look now.

The following are the class methods you list in the above-attached PDF:
  • cron: Great idea to fetch updated package/resource lists regularly. (The native plugin can just "return true".)
  • simple_search: Also a good idea, more mature than what I slapped together for searching. Hmm. Alternatively, perhaps the search form could provide separate text boxes for each field? Then the backend doesn't have to parse the text. Is there a reason to support the complex text parsing? (I know lots of repositories use things like au:foo ti:"bar baz".) If we have a class method that returns a list of supported search fields, then the generic/simple search form can simply disable (or disappear) the unsupported ones.
  • advanced_search_form: great. Perhaps this method simply prints out additional search fields after the simple search form is include()d?
  • advanced_search_results: This might be doable with the current way I have columns defined and displayed in the index tables, if a tabular format is sufficient here. repository_base->list_index() will call down into the plugin's index-displaying methods, and those methods put whatever is appropriate into the index table. Perhaps if we add a parameter to repository_base->list_index(), then we can specify a "level" or "type" of display that plugin->file_display_index() should insert into the table. If a tabular format is inappropriate here, then a separate method is appropriate.
  • fetch_object: Yep. I called it get_file_object(). ("get_resource_object()" would be a better name.)
  • object_path: Yep. I called it get_file_path(). ("get_resource_path()" would be better.) We might be making different assumptions about what these paths are; as I mentioned above I'm picturing something like DOS disks, where each repository has its own independent filesystem tree, and all the paths that get passed around (or that I expected to be passed around) in the repository API are already associated with a particular repository.
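
On the simple_search point above, a hedged sketch of parsing the au:foo ti:"bar baz" style into a field => term map (the syntax handling and function name are illustrative, not any repository's actual format):

```php
<?php
// Hypothetical parser for au:foo ti:"bar baz" style queries; splits the text
// into a field => term array that a backend's search method could consume.

function parse_search_query($query) {
    $fields = array();
    // Match field:"quoted term" or field:bareword
    preg_match_all('/(\w+):("[^"]*"|\S+)/', $query, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $term = $m[2];
        if ($term[0] === '"') {
            $term = trim($term, '"');   // strip the surrounding quotes
        }
        $fields[$m[1]] = $term;
    }
    return $fields;
}
```

A generic search form could then grey out whichever of these fields a given plugin's supported-fields list doesn't include, as suggested above.
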
With all that said, the only significant things I know my repository implementation needs are paging and backup. Right now it attempts to return/display everything under a folder every time, and that really breaks when you have 10,000 things in a folder. If we get these taken care of and we do some serious testing (and it holds up sufficiently under testing), I think we could consider using this implementation now and integrating the other repository work with it. Here are further thoughts along those lines:
  • It shouldn't be too difficult to fit what you've already done in OSLOR into my repository infrastructure. Based on my reading of your PDF, anyway, the simple APIs should fit right in.
  • Just to note again -- paging must be added. I didn't even think of it, which is a huge oversight. This will involve changes to the repository_base class and my "native" plugin.
  • Just to note again -- backup must be added. At least the native plugin stores all its folders and files on the actual HDD, so those will get backed up by normal system backups. But the metadata and ACL stuff has no backup yet. This will involve changes to the repository_base class and my native plugin.
  • Just to note again -- testing must be done. I've obviously tested things myself, and I think a couple other people have poked around at things casually, but this needs to be seriously hammered at. I've done just a tiny bit of performance testing.
    • I added 10,000 files to a folder and immediately removed them, because what was the point? Without paging, the repository was useless to display that folder so I could do anything with it.
    • I shared a file to 3,000+ individual users, and then un-shared from all of them at once, through the user interface. IIRC, this completed, but it took a few minutes. (Only 50(?) users at a time are displayed in the right-hand column, just like the create-new-admins interface, so sharing to 3000+ users simultaneously -- through the user interface -- is not possible.)
  • Performance tuning (of my "native" plugin) -- I have not made the slightest attempt to make this efficient. I wanted to make it work first, and increase complexity for the sake of efficiency after I had it working. But I think there is quite a bit of low-hanging fruit here; there are several places we could just add an array parameter to a method and get it (and the database) to process lots of resources all at once. This functionality would replace PHP loops that do database lookups, call other methods, and do database writes for multiple resources. Sloooooooow.
  • We must keep in mind that adopting the "native" plugin for my repository architecture requires us also to adopt the access_control class that I've implemented. Of course, I think it's a fantastic piece of work wink, but others may disagree or at least have reservations. It attempts to provide completely generic ACLs that can be applied to absolutely anything (even concepts, roles, etc.) in Moodle. There is some slightly out-of-date documentation of the access_control class API here. There is also a flowchart of how it does access-checking here. (If the flowchart isn't perfectly up-to-date, it's close.)
    • At the moment, the access_control class supports ownership and access_control (of whatever type -- read, write, rename, etc., plugins can add their own) by userid, by groupid, and by courseid.  We're planning to add a system by which expiring access keys can be created, so that non-Moodle users can be sent URLs containing access keys that grant them a specific access for a given amount of time.  ("Click here to view my portfolio.  You will have access until March 25th...")
    • I'm planning to add configurability so the administrator can grant/deny access for students/teachers to publish/share (etc.) by userid, groupid, or courseid.  Put more simply, and by example, I'm thinking that a school might not always want kids to be able to share their files to one another, or to entire groups or courses, or by key.  Also, some schools may want to disable (especially) access-by-key altogether, to make sure that only Moodle users can access data on the site.
OK, now I need to go work on that paper.
In reply to Matt Oquist

Re: RFC - Remote object repositories -- consolidating implementations

by Matt Oquist -
Thinking more about your Debian-like-archives comment (and not, as you can see, about my paper), the create_folder() API isn't at all necessary as long as all the other class methods behave appropriately when passed-in $paths may include such folder names.  If you have a flat archive that only provides a search-based UI, then you just strip the filename off the end of any $path that you receive, and implement create_folder() { return true; }.

We would also need to add a mechanism by which a repository plugin can specify whether or not it supports browsing, or better yet, what method should be called when the user selects the repository for "browsing".  Maybe a non-browsing plugin simply overrides repository_base->list_index() with a pass-through call to the searching interface... or maybe we add a new required method ("main_display()") which is then a pass-through to whatever primary method of browsing/searching/whatever each plugin supports.  Oh - I guess we have lots of parameter-list-related questions to answer with these solutions.  But there's my brain-dump, which will hopefully inspire us toward better ideas.
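The "main_display() as a pass-through" brain-dump above could be sketched like so. This is illustrative Python, not Moodle code; `main_display`, `list_index`, and `search_form` are the names floated in the discussion, and the dispatch rule (browse if the plugin implements it, otherwise fall through to search) is an assumption.

```python
# Hypothetical sketch: repository_base picks browsing or searching based on
# what the plugin actually implements. Python's hasattr() stands in for
# PHP's method_exists().

class RepositoryBase:
    def main_display(self):
        # Prefer browsing when the plugin supports it; otherwise fall
        # through to the search interface.
        if hasattr(self, "list_index"):
            return self.list_index()
        return self.search_form()

    def search_form(self):
        return "search form"

class BrowsablePlugin(RepositoryBase):
    def list_index(self):
        return "folder index"

class SearchOnlyPlugin(RepositoryBase):
    pass  # no list_index(): browsing falls through to searching

print(BrowsablePlugin().main_display())   # folder index
print(SearchOnlyPlugin().main_display())  # search form
```

A flat, search-only archive then never has to fake a folder hierarchy; it simply omits the browsing method.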
In reply to Matt Oquist

Re: RFC - Remote object repositories -- consolidating implementations

by Matt Oquist -
Still not thinking about my paper...

My repository code is all updated in CVS, with some minor clarifying comments in repository_base.php about the API, and some bug fixes to recently added features.

The demo system is also updated with the newest repository code. Please try to break it (the "File keeper" block) and let me know how you did it. smile
In reply to Matt Oquist

Re: RFC - Remote object repositories -- consolidating implementations

by Martín Langhoff -

Wow.

This is a really complete (and complex) API. I'll answer a few of your points. A few clarifications on our API first...

  • cron: Great idea to fetch updated package/resource lists regularly. (The native plugin can just "return true".)

We want to be able to use method_exists() to query for optional methods.
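That convention -- the core calls an optional hook only when a plugin defines it -- looks roughly like this. A sketch under assumptions: `cron` is the optional method named above, `run_repository_cron` is invented here, and Python's `hasattr()` stands in for PHP's `method_exists()`.

```python
# Hypothetical sketch of the method_exists() convention for optional methods.

class DumbArchivePlugin:
    pass  # no cron(): nothing to refresh, and nothing to declare

class CatalogPlugin:
    def __init__(self):
        self.refreshed = False

    def cron(self):
        # e.g. re-fetch the remote package/resource list on a schedule
        self.refreshed = True

def run_repository_cron(plugins):
    for p in plugins:
        if hasattr(p, "cron"):  # if (method_exists($p, 'cron')) in PHP
            p.cron()

catalog = CatalogPlugin()
run_repository_cron([DumbArchivePlugin(), catalog])
print(catalog.refreshed)  # True
```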

  • simple_search: Also good idea, more mature than what I slapped together for searching. Hmm. Alternatively, perhaps the search form could provide separate text boxes for each field? (...)

Simple search means we cannot assume the backends know about fields or complex metadata. It's up to each backend.

  • advanced_search_form: great. Perhaps this method simply prints out additional search fields after the simple search form is include()d?

Advanced search is completely in the hands of the plugin. This is a bit on purpose wink It could have a super-AJAX browse/search interface. Or redirect elsewhere (with SSO, to another webpage/webserver/whatever). Or open an ActiveX control (yuck).

We are trying to support "repositories" that are not smart, or that work mostly in a disconnected fashion. The plugin does a lot of work in those cases. In the case of smart repos, the plugin can be lighter.

(...)

  • object_path: Yep. I called it get_file_path(). ("get_resource_path()" would be better.) We might be making different assumptions about what these paths are;

I think we are -- our model (so far!) is that in the last stage, the plugin copies the file into the course directory. We are looking (thinking of Hive) at letting it return just a URL that the module can use. That'll need some changes in the modules so that they know what to do with it wink

Reading your spec, I am starting to wonder whether we are dealing with two very different concepts of repository. Perhaps we can bring them together, and yet, there could be some differences that mean that it makes sense to do both things (but find different names for them).

The repo API you are working on has some assumptions that if I understand them right are:

  • Work with a DMS that is online -- that is, available all the time. If the DMS is offline, files are not available.
  • Assume the DMS is filesystem-ish. (offtopic, but I'm curious: how do you do non-memory-bound fopens() from the DMS?)
  • Assume you can write to the DMS as to a FS

In our repo stuff the assumptions are that

  • The repo can be a stupid distribution repo, just like a Debian archive. (And you don't need it to use the content).
  • The repo can be smart like Hive is. (And you need it online to use the content).
  • The API has to be super-simple. Most moodle APIs have just a couple of calls and I think that's part of the magic.
  • Writing to the repo is via repo-specific mechanisms, not necessarily fs-like. Some repos can let you push content into an "inbox" area, where the QA team picks it up and decides what to do with it.

Matt, what are your thoughts?

(I am starting to think that we could end up doing something that bridges the gaps if we can push more of the browsing work away from the API and onto the plugin. I have to run now, but stay tuned.)

In reply to Martín Langhoff

Re: RFC - Remote object repositories -- consolidating implementations

by Matt Oquist -
> We want to be able to use method_exists() to query for optional methods.
I'm on-board with this now.

> Simple search means we cannot assume the backends know about fields or complex metadata. It's up to each backend.
*nod*

> Advanced search is completely in the hands of the plugin.
*nod*

> Reading your spec, I am starting to wonder whether we are dealing with two very different concepts of repository.
I think we are. I looked at "Site files", files.php, My Files, and the needs of my portfolio project, and thought that Moodle needed a single, common, extensible API that could handle all file-storage needs. Along the way it made sense to implement generic ACLs, too, so I did. I was really thinking of it as a fully-featured filesystem that you use through Moodle (API and UI).

I had some vague ideas in mind about how remote/disconnected repositories would work, but I didn't spend any time working out any of the details (I was/am on a tight schedule and wasn't sure I was going to succeed with even my more modest goals).

In a sense, as I look now at the needs of the sort of repository you're looking at, it makes sense to me that your repository would actually cache files and/or metadata in the repository I implemented. And it was my intention all along to provide a way for courses, forums, and everything else in Moodle to stop messing with anything on the HDD directly, and use the repository system instead. (...in part because the "native" plugin I included with it comes with so many features that courses, forums, etc. could use, such as versioning and user-group-course-aware ACLs; it should be very easy to share a file from one course to another)

  • Work with a DMS that is online -- that is, available all the time. If the DMS is offline, files are not available.
    • Not addressed directly by me or my code, but not true. Plugins can do caching; I imagined that if we integrate your OSLOR work with my repository your plugin will cache files/metadata in the native repository somewhere. (And access to all of this can be locked down so that nobody can get to it through the UI.)
  • Assume the DMS is filesystem-ish. (offtopic, but I'm curious: how do you do non-memory-bound fopens() from the DMS?)
    • That's a good point I've neglected to raise. I have two calls to fopen(); one in the read() method and one in the new_file() method. Neither deals in the slightest with the possibility of enormous files.
  • Assume you can write to the DMS as to a FS
    • Yes/no. There is a logical separation between the repository_base() class and my native plugin, and now that I know the Moodle Way is to use method_exists(), I'll change the repository_base to check before it calls any class method. Plugins don't have to implement any of these writing methods if they don't want to.
    • I should clarify; now that we're having this conversation I'm seeing more of the diversity in repositories than I had imagined before. So my above list of "should-implement" APIs can really be eliminated entirely, because each plugin can implement only what it needs to use. Maybe there's a read() method, or maybe there's just a play() method, etc.
  • The repo can be a stupid distribution repo, just like a Debian archive. (And you don't need it to use the content).
    • You're just referring to the caching, right? I intended to leave caching possibilities open, even though I didn't need to address the issue for the single plugin I was doing to start with.
  • The repo can be smart like Hive is. (And you need it online to use the content).
    • I've never actually seen or touched Hive. I should. But yes, the plugin I implemented is naturally online, since it's local (and just for now I've assumed (1) that it will be there, and (2) that its ID in the database is effectively #defined to be 1).
The API has to be super-simple. Most moodle APIs have just a couple of calls and I think that's part of the magic.
  • Hmm. Point taken.
  • I formerly worked on operating systems, so I now see your point that this API that I thought was quite simple (especially given the complexity of its task) is perhaps not. I think a lot of the complexity comes from the general nature of the code; it's trying to be all things to all callers.
  • Assuming that, at some point, we want feature-rich repository functionality in Moodle, do you think that can be achieved with a significantly simpler API than what I'm proposing? If so, how?
Writing to the repo is via repo-specific mechanisms, not necessarily fs-like. Some repos can let you push content into an "inbox" area, where the QA team picks it up and decides what to do with it.
  • *nod* Given that we're now looking at method_exists() checking and we're not going to assume that any APIs exist, is this still a problem? Callers, including the repository_base itself, won't be calling new_file() if it doesn't exist, and we can call send_file(), or submit_file(), or whatever() instead.
I, uh, still need to get to that paper. I have <12 hours before I need to leave for class. blush (But this is so much more interesting right now!)
In reply to Matt Oquist

Re: RFC - Remote object repositories -- consolidating implementations

by Martín Langhoff -

I, uh, still need to get to that paper. I have <12 hours before I need to leave for class. (But this is so much more interesting right now!)

Good luck with that. I'm feeling guilty that I'm distracting you with this random stuff... I mean -- I should wait 12hs before posting more here wink

Thanks for being so patient with all this API discussion. It is definitely heading in interesting directions.

As you say, we are ending up stretched to do too many things for too many scenarios. Being all things to all sorts of people is kind of a bad spec... perhaps we can design something that does all we need, and yet keep all the parts simple.

I am now thinking along the lines of building from the OSLOR spec. Right now, from a user's perspective, there's a "file from repositories" button right next to the "file from coursefiles" button. This opens a dialog with the simple search box, and links to advanced search for the repo instances that support it. For repos that support browsing, we should offer that too.

The Hive plugin (not part of the oslor API yet) does that already, and we intend to port it to the oslor API. The API should ask if (method_exists($repo, 'browse_url')) { $url = $repo->browse_url(); }. If a plugin supports it, we provide that link to the user. Browsing itself is not handled by Moodle's infrastructure -- it may be something fs-like, but it could be something wild and wacky.

(Of course we can offer good support to build fs-like browsing there, useful functions, a sample implementation, etc. Leaving plugin writers free at that point is more interesting -- there's so much you can do with DMSs that have rich metadata and annotations... I'm sure stuff will emerge that we would have never thought of.)

Actually. I'll go further and say: advanced search and browse could be folded into one same thing. It's up to the plugin author to define. The only thing I don't like about that plan is that I don't know what words to put in the link.

In terms of what happens when you have picked the file from the repo, there are 3 things that today's plugins are returning:

  • The repo plugin gives us a url (hive does this)
  • The repo plugin puts a file in coursefiles and tells us the path
  • The repo plugin provides an open filehandle
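The three return shapes above can be normalised by the caller into one dispatchable answer. This is a sketch only -- `resolve_pick` and the `(kind, value)` convention are invented here for illustration, and real code would also have to handle the credentials/fetching caveats discussed below.

```python
# Hypothetical sketch: normalise what a repo plugin hands back -- a URL
# (the Hive case), a path into coursefiles, or an open filehandle -- into
# a (kind, value) pair the calling module can dispatch on.

import io

def resolve_pick(result):
    """Classify a repository plugin's return value."""
    if isinstance(result, str) and result.startswith(("http://", "https://")):
        return ("url", result)       # plugin gave us a URL
    if isinstance(result, str):
        return ("path", result)      # file already copied into coursefiles
    if hasattr(result, "read"):
        return ("handle", result)    # plugin gave us an open filehandle
    raise TypeError("unsupported repository result")

print(resolve_pick("https://hive.example/obj/42")[0])  # url
print(resolve_pick("course/1/lecture.pdf")[0])         # path
print(resolve_pick(io.BytesIO(b"data"))[0])            # handle
```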

You'll notice I'm not listing the "we read the file into a variable and return that" option. Of course we can have it, but I'd really try not to. It is not something we can really do in a scalable fashion. Beyond the obvious limitations in being memory-bound, the Apache/PHP combo has a nasty memory handling model that means that we really pay for our memory handling sins sad

(It's a long story. If you are interested in a full read, check out Stas Bekman's mod_perl guide, which has a juicy section on memory handling. mod_perl and mod_php have the same issues.)

Now, if we get a url, we pass the url to the page that invoked the whole file selection dialog thing. I am not sure if this is kosher with all modules and file selection dialogs. What should backup/restore do if it gets a URL? I suspect the API should provide a reliable means of fetching a real file -- this sounds easy but note that the system serving that URL may be expecting cookies or other credentials that need to be emulated by Moodle.

And if we get a URL, we can't use filters on it. Not too bad, but a repo plugin with this limitation should mention it. If we get a file, or a filehandle of course we can do filters on it.

I am realising some of this may depend on fopen() supporting URLs. Hmmmm.

If the plugin does give us a real file, things get very easy for module writers, content filters, etc. What we are really losing from a rich DMS point of view is the 'always offer latest version' feature. But we can offer that in other ways.

If we maintain a table of files that exist in moodledata but are actually "tracked" by a repo plugin, we can teach file.php to check this table before serving a file, and perhaps instantiate the repo plugin and ask about the file's freshness. The trick here would be to make this fast -- letting plugins give us a cache lifetime, for instance, so that file.php can just serve the file without asking during the cache validity lifetime.
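The cache-lifetime trick might look like this. Purely a sketch: the record layout, `cache_lifetime`, and `should_ask_plugin` are all made up here to illustrate the idea that file.php only instantiates the plugin once the cached copy has aged out.

```python
# Hypothetical sketch of the "cached files" table: serve straight from disk
# while the plugin-supplied cache lifetime holds, and only ask the repo
# plugin about freshness after it expires.

import time

class TrackedFile:
    def __init__(self, path, cache_lifetime, fetched_at):
        self.path = path
        self.cache_lifetime = cache_lifetime  # seconds, supplied by the plugin
        self.fetched_at = fetched_at          # when we last fetched/verified

def should_ask_plugin(record, now=None):
    """True only when the cached copy is older than its lifetime."""
    now = time.time() if now is None else now
    return (now - record.fetched_at) > record.cache_lifetime

rec = TrackedFile("coursefiles/cachedfiles/notes.pdf",
                  cache_lifetime=3600, fetched_at=1000.0)
print(should_ask_plugin(rec, now=2000.0))  # False: serve from disk, no plugin call
print(should_ask_plugin(rec, now=5000.0))  # True: check freshness upstream
```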

File moves/renames/deletions via the filelib interfaces should update this table too. A gotcha there is sysadmins moving files by hand sad (give them a smack!) Or the files could live in a special subdirectory under course files (cachedfiles) where filelib doesn't allow renames/creation.

Backups can then safely skip those files as they are cached and non-authoritative. Perhaps they should save the record from this "cached files" table.

What do you think so far? As you can see, I am keen on cheating here and there to still be able to use the real FS, because it is a sure way to keep things simple, fast, scalable, etc. And it makes things easier when we reuse other GPL code.

(I haven't dealt with file creation and writing yet. But my brain is fried now... so tomorrow...)

In reply to Martín Langhoff

Re: RFC - Remote object repositories -- consolidating implementations

by Matt Oquist -
> Good luck with that. I'm feeling guilty that I'm distracting you with this random stuff... I mean -- I should wait 12hs before posting more here wink

Ha!  I'm accountable for my own procrastination, whether for noble and worthy reasons or not.  big grin

But, I am going to hold off on a complete response to your post at this point to complete the paper.
In reply to Martín Langhoff

Re: RFC - Remote object repositories -- consolidating implementations

by Matt Oquist -
You'll notice I'm not listing the "we read the file into a variable and return that" option. Of course we can have it, but I'd really try not to. It is not something we can really do in a scalable fashion. Beyond the obvious limitations in being memory-bound, the Apache/PHP combo has a nasty memory handling model that means that we really pay for our memory handling sins sad

I think this definitely needs to be available, even if we strongly discourage its use. In fact, we could just cap the size at something [configurable!] reasonable and tiny if we want.

I'm using it because the portfolio module reads in submitted assignments from the database, and online-type assignments get dumped into files. These are almost always going to be tiny, and this is a one-time occurrence for each submission of an online-type assignment into the portfolio system. I think this is a reasonable use of fopen().

Additionally, we can add another filetype if we like: REPO_TYPEFILEDB. This would be a file stored directly in the database instead of on the filesystem; some callers of the [native, I guess] repository API might find this useful. The downside is that there isn't a real file on the filesystem corresponding to what users see in the repository through Moodle, but the performance should be better. Again, we can cap the size of this at something reasonable and configurable.
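The "cap it at something configurable" compromise is simple to express. A sketch: `read_contents` and `MAX_IN_MEMORY` are hypothetical names, but the behaviour -- allow whole-file reads only up to a site-configurable limit, so online-assignment-sized content works while huge files are refused -- is the one proposed above.

```python
# Hypothetical sketch: in-memory reads are permitted, but only up to a
# configurable cap; anything larger must go through a file handle instead.

import os
import tempfile

MAX_IN_MEMORY = 64 * 1024  # configurable cap, e.g. from site settings

def read_contents(path, limit=MAX_IN_MEMORY):
    """Return the whole file as bytes, refusing files over the cap."""
    if os.path.getsize(path) > limit:
        raise ValueError("file too large for in-memory read; use a handle")
    with open(path, "rb") as fh:
        return fh.read()

# Tiny files (like an online-assignment submission) read fine.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"essay text")
    small = tmp.name
print(read_contents(small))  # b'essay text'
os.unlink(small)
```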

  • The repo plugin gives us a url (hive does this)
  • The repo plugin puts a file in coursefiles and tells us the path
  • The repo plugin provides an open filehandle
Just FYI, my APIs typically return file object instances from the file_base and file_native classes I defined. Otherwise they often return nativeIDs of files -- that is, $file->nativeid, assuming it's stored in another database somewhere and has a unique ID on that system.

I suspect the API should provide a reliable means of fetching a real file -- this sounds easy but note that the system serving that URL may be expecting cookies or other credentials that need to be emulated by Moodle.

This is something I assumed was a requirement from the beginning. It should be up to the plugin to take care of all of this. I was thinking that there should/could be authentication data cached in $USER so that a user only needs to authenticate to a remote repository once/session.

I'm not sure how much of that was targeted toward the 1.6 API and how much was for 2.0, so I'll stop responding at this point assuming the rest was definitely 1.6. (Because surely file.php will be gone by 2.0, etc.)
In reply to Matt Oquist

Re: RFC - Remote object repositories -- consolidating implementations

by Robert Brenstein -
I am an outsider to all these discussions, but the subject of a central repository is quite dear to us. One thing which may be implicit but does not seem to be mentioned explicitly (unless I missed it) is the ownership of files. I mean that the repository, while being central, must allow us to set (optional) restrictions on file access for certain people or courses, so it can be used for files that are truly site-wide but also for files that are available only to specific teachers/courses/student groups; for example, files that a single teacher puts there to use in multiple courses but doesn't want other teachers to access.
In reply to Matt Oquist

Re: RFC - Remote object repositories -- consolidating implementations

by Martin Dougiamas -
I totally think we can fit both types of repository (read-write and read-only) in the one API.
In reply to Martín Langhoff

Re: RFC - Remote object repositories -- consolidating implementations

by Matt Oquist -
non-memory-bound fopens()
Now that the basic functionality is in there (proof-of-concept out of the way), I've been paying more attention to efficiency, and at this point there are two calls to fopen() in repository_native.class.php.
  1. fopen() is called when new_file() or new_resource() is called and the new file's contents are passed in memory.  This has to be written to the disc, so fopen() is called.  (New files can also be added by uploading, or by passing in an absolute FS path.)
  2. When read() is called for a file, if the $return_handle parameter is set to 'true' (which is the default) then fopen() is called for the specified file and the handle is returned.  (This parameter has been added since I posted the APIs above.)  This way callers can go on to call fgetc(), fgetcsv(), or whatever they like.  Of course, we can't stop them from getting the file's path through get_file_path() and calling file_get_contents() themselves if that's what they really want.
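The reason returning a handle matters is that the caller can then process a large file in fixed-size chunks. A sketch of that calling pattern (not Moodle code; `copy_in_chunks` is invented here, and in-memory buffers stand in for real file handles):

```python
# Hypothetical sketch: with the handle that read($return_handle = true)
# hands back, a caller can stream a large file without ever holding the
# whole thing in memory.

import io

def copy_in_chunks(src, dst, chunk_size=8192):
    """Stream src into dst chunk by chunk; return total bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return total
        dst.write(chunk)
        total += len(chunk)

src = io.BytesIO(b"x" * 20000)  # stands in for the returned file handle
dst = io.BytesIO()
print(copy_in_chunks(src, dst))          # 20000
print(dst.getvalue() == src.getvalue())  # True
```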

In reply to Matt Oquist

Re: RFC - Remote object repositories -- consolidating implementations

by Peter Campbell -
This is SO exciting! I just poked around in the "File Keeper" at http://portfolio.spdc.org/portfolio/

What's the likelihood that this will be part of 1.6?

Awesome work!
In reply to Peter Campbell

Re: RFC - Remote object repositories -- consolidating implementations

by Matt Oquist -
Thanks - I'm glad you like it.  smile

Definitely not in 1.6, but I'm hoping we can carefully evolve this into something acceptable for 2.0.
In reply to Martín Langhoff

Re: RFC - Remote object repositories -- consolidating implementations

by Martin Dougiamas -
Let me just make it very clear to everyone what we are facing here before this gets too exciting and confusing.

For Moodle 1.6 release, a very small modification to rationalise two repository-related things that are already in Moodle 1.6 dev. It's too late to do anything major for this release, especially with many features still incomplete. In fact, if it can't be done by the end of this week then I'm happy to leave it as is.

For Moodle 2.0, a complete and radical new repository API that makes a number of repository plugins available from ALL file points in Moodle (resources, SCORM, assignments, forum attachments, etc.):
  • read-write access to a DMS (defaulting to the server filesystem) for all users (not just teachers) with Moodle holding only links to that DMS.
  • read-only access (with optional local caching) to external repositories like MERLOT, NLN, eprints etc.
The API is actually very thin, just providing the hooks, really.  Most of the code will be in the plugins.
In reply to Martin Dougiamas

Re: RFC - Remote object repositories -- consolidating implementations

by Matt Oquist -
> For Moodle 1.6 release, a very small modification to rationalise two repository-related things that are already in Moodle 1.6 dev.

Thanks for the clarification; this is a relief. After the Feb. 3 dev meeting, I had the distinct impression that what MartinD just said was true: Moodle would make a quantum leap forward in DMS for 2.0, but that was quite a ways into the future. So I wrote a bunch of code with the intention that it would 1) meet the immediate needs of my project, 2) move the overall process forward, and 3) have several months for us all to ruminate on it, test it, and make it more efficient before there would ever be a serious question about making it part of the core.

> The API is actually very thin, just providing the hooks, really. Most of the code will be in the plugins.

Are you thinking that the base would provide some forms (such as file upload, simple searching, maybe browsing) in order to keep everything looking similar, or that each plugin (that supports each of these actions) will completely implement each of these itself?

> read-only access (with optional local caching) to external repositories like MERLOT, NLN, eprints etc.

Interesting; must it be read-only? Why couldn't some of these plugins support writing, as well?
In reply to Matt Oquist

Re: RFC - Remote object repositories -- consolidating implementations

by Martin Dougiamas -
> Why couldn't some of these plugins support writing, as well?

Some definitely would, that's why it was the first point of the two I made in the post you were replying to.  big grin
In reply to Martin Dougiamas

Re: RFC - Remote object repositories -- consolidating implementations

by Matt Oquist -
>> Why couldn't some of these plugins support writing, as well?

> Some definitely would, that's why it was the first point of the two I made in the post you were replying to.  big grin

tongueout OK, to be more specific, I had the impression from your post that external repositories would be read-only, but obviously that's not what you meant.  approve
In reply to Martin Dougiamas

Re: RFC - Remote object repositories -- consolidating implementations

by Dirk Herr-Hoyman -
I've been watching from the sidelines, but did want to chime in just a bit.
This does seem like a reasonable approach to remote repositories.
A little bit in 1.6 and more in 2.0. 2.0 is a *major* release, after all.

Part of my perspective comes from looking quite hard at requirements for a repository API in the Sakai project (if someone really wants to see them, I can certainly share). I'm seeing Matt's API hitting those requirements at a 90% level. I like the 90-10 rule here: don't try for 100% of the requirements or you'll never get done.

A pluggable architecture is the key. This was the conclusion we came to also.

In reply to Martin Dougiamas

Re: RFC - Remote object repositories -- consolidating implementations

by Martín Langhoff -

100% with you on the 1.6 goals. Only that now we have to work fast ... Can "this week" extend a little bit into next one? wink

In terms of achieving...

For Moodle 2.0, a complete and radical new repository API

If the code we are discussing is ready before the freeze, we can mark it clearly as a beta/experimental API. We will surely learn from how repo plugin developers use it, and evolve it into the world-rocking 2.0 API.

In terms of write access, I haven't come up with anything too smart. Jun is looking at porting the Hive plugin to the oslor API, and we definitely have a plan that doesn't break the 'upload to repository' functionality I see it has.

defaulting to the server filesystem

I think we'll be able to

  • keep direct access to dataroot for simplicity & performance
  • be able to designate a repository directory inside dataroot
  • still do all the repository magic

The API is actually very thin, just providing the hooks, really. Most of the code will be in the plugins.

Agreed. Though I am thinking Matt's work will be a boon to people writing those plugins.

Matt? How did that paper go? thoughtful

In reply to Martín Langhoff

Re: RFC - Remote object repositories -- consolidating implementations

by Matt Oquist -
  • keep direct access to dataroot for simplicity & performance
    • Is this desirable?  Do you want anything other than the repository API accessing the filesystem?  Why?  I ask because my understanding was that all access to the filesystem was intended to go through the repository API, and I designed my work with that in mind.
  • be able to designate a repository directory inside dataroot
    • That's what my implementation does.  It creates a 'repository' directory under $CFG->dataroot/SITEID/moddata/.
> Matt? How did that paper go?
Heh; thanks for asking.  I got an extension and will submit via email in a couple of hours.  It'll be alright.  (And best of all, I can really focus on the portfolio project for the next two weeks.)
In reply to Matt Oquist

Re: RFC - Remote object repositories -- consolidating implementations

by Martín Langhoff -

I got an extension and will submit via email in a couple of hours.

Cool! Another chance to procrastinate! wink

MartinL: keep direct access to dataroot for simplicity & performance

Matt: Is this desireable?

Well, I don't claim to speak for MartinD here. As far as I can see, it is desirable for DMSs to be FS-like (or whatever they want to be like) at the UI level.

However, the internal code needs to be able to deal with the filesystem -- the real one -- unmediated or with minimal mediation. We do a lot of stuff that does not make sense to put anywhere but on the filesystem. And we have to keep a tricky balance in terms of scalability: we do have sites with tens of thousands of users, often serving huge files (PDFs, mp3s).

For examples of that, you can see Skodak's work on byteserving. We also used to have a couple of situations where fileserving was memory bound if you had just the "right" php.ini settings, and we had several bugs filed with regards to that. People are serving large files with Moodle -- and it can be quite efficient at it.
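Range-request handling is the heart of byteserving. As a minimal sketch (a hypothetical helper in Python for illustration, not Moodle's actual code), parsing a `bytes=` Range header lets a server stream just a slice of a large file instead of slurping the whole thing into memory:

```python
def parse_range_header(range_header, file_size):
    """Parse a simple 'bytes=start-end' Range header so a large file
    can be served in chunks. Returns an inclusive (start, end) byte
    range, or None if the header is absent or malformed."""
    if not range_header.startswith("bytes="):
        return None
    spec = range_header[len("bytes="):].split("-")
    start = int(spec[0]) if spec[0] else None
    end = int(spec[1]) if len(spec) > 1 and spec[1] else None
    if start is None and end is None:
        return None
    if start is None:                    # suffix range: the last N bytes
        start = max(file_size - end, 0)
        end = file_size - 1
    elif end is None or end >= file_size:
        end = file_size - 1
    return (start, end)
```

With this a 200 MB PDF can be sent in bounded-memory chunks, which is why the php.ini memory-bound bugs mentioned above went away once byteserving landed.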

OTOH, if a DMS is so FS-like that it makes sense to use it there, chances are that it actually has a kernel module (or Windows driver) that makes it look like a native FS anyway. And few DMSs do that, and Moodle "supports" it like a champ wink

Abstracting FS calls is hard (not news to you) and I suspect it will get us in trouble in the long run. We won't be able to reuse code easily (I mean -- other GPL projects code), and we'll be beset on all sides by the law of leaky abstractions ...

Moodle uses the FS for all it's worth... doing things like calling exec('/usr/bin/zip', ... parameters) to zip up course directories of perhaps a couple hundred megs into huge zipfiles. I just don't know how to virtualize, at the PHP level, the FS that /usr/bin/zip sees sad and it just seems too hard to do right to that level...

... and it is not that meaningful to our end users.

I think we can extend the DMS layer to be FS-like when we are facing users. To a degree, the Hive plugin shows a way of doing it that is quite interesting. This also opens the door to DMSs that are really not like a filesystem but have other strengths and metaphors.

In reply to Martín Langhoff

Re: RFC - Remote object repositories -- consolidating implementations

by Martín Langhoff -

I said

... and it is not that meaningful to our end users.

which isn't completely true. It is important to our users. The question I'm trying to figure out is what aspects of having DMS integration are important to our users, and how we can address those while keeping all the advantages of real FS access?

In reply to Martín Langhoff

Re: RFC - Remote object repositories -- consolidating implementations

by Matt Oquist -
However, the internal code needs to be able to deal with the filesystem -- the real one -- unmediated or with minimal mediation. We do a lot of stuff that does not make sense to put anywhere but on the filesystem.

I see your point, but I still have concerns about access controls.  One of the sticky issues with the existing file.php "system" is that it provides only the most rudimentary control over who can access what, and it does this by associating files with particular courses, modules, etc.  If we want to be able to provide rich access controls for these different resources, then even if internal code accesses the filesystem directly it needs to do proper checking (and management) of these access controls first.

In reply to Matt Oquist

Re: RFC - Remote object repositories -- consolidating implementations

by Martín Langhoff -

I still have concerns about access controls.

Agreed. Right now, Moodle's access control is course-centric and rather basic. There is a course-centric but more flexible access control model already spec'ed and MartinD intends to have it for 2.0 . And from what you say you have one up your sleeve too wink

I don't have an answer there. Just want to point out that the scenario is that we are still managing this from a course perspective 90% of the time, so an effective & efficient access control model should leverage that. And today the AC model does the right thing 90% of the time without asking anything from the user and without additional SQL queries.

There's that 10% of cases where people want exceptions, and shared content, but we have the option of adding support for those, and leave the "natural" cases still be "cheap and simple".

I think your code is good and that what it does is needed but we need to find a way that we can take the benefits without needing to rip FS access away from Moodle.

As MartinD indicates, this is unlikely to be in 1.6 -- we are making a modest shot at the oslor api for 1.6 but what I want to figure out is what is roughly the path forward (post 1.6, whether the oslor api is merged or not) to getting all the goodies your code has, without the complications.

Do you think we can "fake" the FS abstraction part? We can have a convincing user-facing FS abstraction, and module writers can use it at their discretion. What are the parts that bring tangible benefits (as opposed to being the hard work of implementing a FS)?

I suspect we can get moodledata to be some kind of magic overlaid space, where some files are real files on the FS, and others are "there" from the user point of view. Things like backup/restore should become aware of this so they can deal with them by storing the 'pointer/uri'.

Modules are mostly using get_directory_list() which could slip in the overlaid entries. And then we would have to

  • teach file.php that fetches for those overlaid entries are forwarded to the plugin (the plugin could say, hey, here's a locally cached file for this, just use filelib internal methods, and here's the cache lifetime for it, don't bother me for a while, for I am slow and costly to invoke)
  • check the fopen()s that refer to files in coursedata to see how they should be handled. Some of them will really need the file locally; those should get the plugin to fetch it to the real FS. Others may be serving the file when they could delegate to file.php. And others will want to edit the file; those need to check, if the file is from a repo, whether it can be edited, edit it locally, and then tell the plugin: hey, here's the updated file, go put it wherever.

If an access control revamp is part of this, we can also add AC checks in file.php and in some of the fopen() calls.
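The "magic overlaid space" idea could be sketched roughly like this (illustrative Python, not actual Moodle code; `overlay_entries` is a made-up stand-in for whatever structure the repository plugins would supply): a directory listing that merges real files on disk with virtual repository entries, so callers of get_directory_list() see one unified view.

```python
import os

def get_directory_list(path, overlay_entries):
    """Merge real files on disk with 'virtual' entries contributed by
    repository plugins, so callers see one unified listing.
    overlay_entries maps a directory path to a list of virtual names."""
    real = sorted(os.listdir(path)) if os.path.isdir(path) else []
    virtual = overlay_entries.get(path, [])
    # On a name collision the real file wins; virtual entries fill gaps.
    return real + [v for v in virtual if v not in real]
```

A fetch for one of the virtual names would then be forwarded to the owning plugin, which could hand back a locally cached file plus a cache lifetime, as described above.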

In reply to Martín Langhoff

Re: RFC - Remote object repositories -- consolidating implementations

by Matt Oquist -

Abstracting FS calls is hard (not news to you) and I suspect it will get us in trouble in the long run. We won't be able to reuse code easily (I mean -- other GPL projects code), and we'll be beset on all sides by the law of leaky abstractions ...


Actually, I tried to edit my earlier post to say that although I worked on operating systems, I've never worked in filesystems, so no, I've never really spent time thinking about abstracting FS calls. I am spending some now. smile

Thanks for the article link; I hadn't read that one. Of course we need to be aware of how leaky our abstractions are, but if it is clear that what we gain by an abstraction is greater than what we lose or risk, then it seems to me that the abstraction will benefit us and we should pursue it. In the same way, if giving up copy 'n paste code reusability (from other GPL products) benefits us sufficiently, it may be worth it. So now we need to figure out if this is the case in this instance.

Moodle uses the FS for all is worth... doing things like calling exec('/usr/bin/zip', ... parameters) to zip up course directories of perhaps a couple hundred megs into huge zipfiles. I just don't know how to virtualize at the PHP level the FS that /usr/bin/zip sees sad and it just seems too hard to do right to that level...

Right; the repository code would need to provide a zip() API that takes the appropriate arguments and then, in its turn, calls exec('/usr/bin/zip', ... parameters). I guess I don't see the problem with that part.

But let's suppose that the school has a shared folder (like /tmp) where everybody can put stuff. The Biology class creates a class folder in there, and only the teachers and students enrolled in the course have access to it. Suppose that we have a year's worth of files in there, and a student in the course has access to everything except other groups' folders from a group project. Can that student create an archive of everything under the class folder, or only the things to which she has access? If we directly call exec('/usr/bin/zip', ... parameters) on the FS then the access controls are ignored, but if we check all the access on everything under the class folder first, we take a performance hit.

Here are some thoughts I have about this problem:

  • The access_control module already understands that admins "own" everything (unless you pass a parameter that indicates that admins should not be treated as universal owners), but we could easily add a mechanism (one parameter and <10 lines of code) by which admins get short-circuit approval of all access. In fact, we could make this mechanism work for anyone if necessary, requiring that the calling context will then have performed sufficient access checking itself. (That is, if the current user is a teacher in the course and we're zipping up the course files, just pass in $short_circuit = true or whatever to the repository APIs you use, and the access_control APIs will then always short-circuit approve access, and the repository will not check for access to everything in the tree, etc.)
  • The access_control class can be made vastly more efficient, using the database to do what PHP loops and function calls are doing now.
  • The real problem here is that [in my opinion] we should provide rich access controls about which the native FS will be ignorant, and so internal Moodle code will either need to use the repository APIs which do all the appropriate checking, or do sufficient checking before accessing the FS directly.
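The short-circuit idea in the first bullet could be sketched like this (illustrative Python; the function signature and the dict-based grant/denial layout are invented for the example, not Matt's actual access_control API):

```python
def can_access(user, resource, access_type, short_circuit=False):
    """Access check with a short-circuit escape hatch. When the caller
    has already verified access at a higher level (e.g. a teacher
    zipping up their own course files), it passes short_circuit=True
    and we skip the per-resource walk entirely."""
    if short_circuit:
        return True
    if user.get("is_admin"):
        return True          # admins "own" everything by default
    uid = user["id"]
    if uid in resource.get("denials", {}).get(access_type, set()):
        return False         # explicit denials beat grants
    return uid in resource.get("grants", {}).get(access_type, set())
```

The zip-the-class-folder case above then costs one coarse check up front instead of one fine-grained check per file in the tree.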
I should also mention that although right now my repository code creates a folder under $CFG->dataroot/SITEID/moddata/, I always intended it to handle the entire structure under dataroot. Assuming that we don't limit ourselves to one instance of the "native" repository, we might then have $CFG->dataroot/repository0, $CFG->dataroot/repository1, etc., and under the "primary" one of those we would see the structure that currently exists directly under dataroot now:
$CFG->dataroot/repository0/1/moddata
$CFG->dataroot/repository0/1/moddata/assignment
$CFG->dataroot/repository0/1/moddata/assignment/1
$CFG->dataroot/repository0/1/moddata/assignment/1/7
$CFG->dataroot/repository0/1/moddata/assignment/1/7/uploadedfile.txt

Then internal code can just use the repository APIs to do everything: add, move, copy, link to (on the FS), zip, link to (URL), etc.

I didn't post more than the most basic parts of the API above, but the native repository plugin I wrote has a much, much more extensive set of interfaces. I'll probably post some more about them now since it seems relevant.
In reply to Matt Oquist

Re: RFC - Remote object repositories -- consolidating implementations

by Martín Langhoff -

Hmmm. My mention of zip wasn't meant literally. I mean, we do exec() it, but we also exec() several binaries, often to operate on files that are expected to be in moodledata.

One of the great things about Moodle is that if you are writing a module, a block, a filter, or a (non-repo) plugin, things are really easy. You have to comply with a few API calls, and from there onwards you are free to roll things your own way.

Of course, if you are targeting inclusion in core Moodle you have to mind input validation and Moodle coding style, but losing the ability to deal with real files is a bit of a setback.

And now, for a bit of an off-topic ramble ...

In terms of Moodle's internal API evolution, I think it's a lot more reasonable to try an approach where if you don't use NewMoodleFacilityFoo you don't get the CoolNewFeatures but you are still alright. So my thinking of the repository API is not a mandatory one, but an advantageous one.

Then a few module authors can take the plunge, and if the CoolNewFeatures are truly worth it, word spreads and all modules are quickly converted. A mandatory thing means we force a lot of work on people -- or at least a huge patch to the existing codebase (I am thinking of the patch to existing modules/plugins/blocks/etc, not your additional code).

If the new core feature needs some work, those early implementors are going to help (if they are early adopters, you have their goodwill and motivation). And you can try again adding benefits and removing the awkward bits until it's the all singing/dancing/shining.

So -- we want it all, spoiled little brats we are. And we want it still easy and natural and not meaning a big huge patch that has to be taken in one go (of course -- new code comes in a big patch, but a big gnarly patch against the existing codebase tends to lead to trouble).

And I guess the path will emerge if we keep talking about it wink

In reply to Martín Langhoff

Re: RFC - Remote object repositories -- consolidating implementations

by Matt Oquist -
I'm taking a break from this thread so I can focus intently on the portfolio work for the next couple of weeks.  But I'm still reading the posts when I get them in email, and I'll be back as soon as I can spare the cycles.
In reply to Martín Langhoff

Re: RFC - Remote object repositories -- consolidating implementations

by Matt Oquist -
Heh; I took your mention of zip() literally in part because I'd been planning to work on adding that to the Moodle Native plugin soon.  smile

In terms of Moodle's internal API evolution, I think it's a lot more reasonable to try an approach where if you don't use NewMoodleFacilityFoo you don't get the CoolNewFeatures but you are still alright. So my thinking of the repository API is not a mandatory one, but an advantageous one.
I think I've been unclear about how I see the road ahead.  When I talk about "replacing all the files.php stuff", I'm not thinking about something that will happen next month, next year, or even, perchance, the year after that.  I'm thinking about something that would start in Moodle core and spread in much the way you're discussing.  The reason my discussion may seem to refer to a more "forced" change is that I've primarily had Moodle core stuff (courses, assignments, Site files) in mind, and presumably the maintainers of that code can be talked into using the new API, whatever it turns out to be.  smile

I have no intention to make the lives of module maintainers difficult.
In reply to Martín Langhoff

Re: RFC - Remote object repositories -- consolidating implementations

by Matt Oquist -
Here's more info about the APIs I've implemented in the Moodle Native repository plugin, which I was intending to become a sufficient replacement for "Site files" and all the file.php stuff.

For the most part this is not carefully ordered.
  • change_access($ac='', $specify=true, $hiddens='', $recursive=true)
    • Used to specify or unspecify an access control, which may be a grant or a denial.
    • $ac: an access_control object or an assoc. array that defines one
    • $specify: whether we're specifying or unspecifying a control
    • $recursive: whether to recurse down through folders
  • set_owner($resourceid=0, $ownerid=0, $owner_type=0, $safe=false)
    • $resourceid: ID of the resource whose ownership is being set
    • $ownerid: ID of the new owner
    • $owner_type: is the owner a user, a group, or a course?
    • $safe: if true do NOT perform access checking
  • set_access_a_like_b($aid=0, $bid=0, $safe=false, $setowner=true, $setaccess=true, $yesno=-1)
    • $aid: ID of the item whose access we're setting
    • $bid: ID of the item whose access we're copying
    • $safe: same as above
    • $setowner: set the ownership of a like b
    • $setaccess: set the access of a like b
    • $yesno: if 0 or 1, set only access grants or denials, else set all
      • I added this because I wanted new files to inherit the access controls of the parent directory, but users are denied 'delete' and 'rename' access to their home folders, so all the files they created inherited these denials. Now I can specify '1' here when the users home folder is the parent directory, and only granted access controls are inherited in that case.
  • versions_index($path='', $resourceid=0, $return=false)
    • Create a table of all the previous versions of a specified resource.
    • $path: path to the resource (takes precedence over ID)
    • $resourceid: ID of the resource (precedence given to $path)
    • $return: whether to return the table instead of printing it now
  • delete_resource($path='', $fileid=0)
    • Heh, I forgot to list this earlier even though it's basic.
    • This is assumed to be recursive -- if $path or $fileid specifies a folder, that folder and everything under it will be affected.
    • The Moodle Native plugin has a "trashcan" into which all deleted resources go, so this routine actually only moves all the resources and sets their 'trashed' field in the DB.
    • $path & $fileid behave like all the other APIs
  • empty_trash()
    • Implemented but not accessible from the UI yet.
  • find($path='', $fileid=0, $ids='', $follow_links=false)
    • Recursively 'find' all the files/folders/etc. under the specified file/folder.
    • $path & $fileid -- typical
    • $ids: used by recursive calls to construct the list of IDs and protect against loops
    • $follow_links: I haven't yet implemented file linking, but this will be useful once I have. smile
  • rename_resource($path='', $fileid=0, $new_file_name='', $interactive=false)
    • $path & $fileid -- typical
    • $new_file_name: string
    • $interactive: whether to display errors, etc.
  • does_name_collide($path='', $folderid=0, $new_file_name='', $interactive=false)
    • Check for a name collision in the specified folder.
    • $path & $folderid -- typical
    • $new_file_name: string
    • $interactive -- typical
  • copy_a_to_b($spath='', $sresourceid='', $dpath='', $dresourceid='', $newname='', $recursive=true, $interactive=false)
    • Copy resource 'a' to location 'b'.
    • $spath: source resource path (takes precedence)
    • $sresourceid: source resource ID ($spath takes precedence)
    • $dpath: destination resource path (takes precedence)
    • $dresourceid: destination resource ID ($dpath takes precedence)
    • $newname: string, new resource name, if any
    • $recursive: recurse through folders, etc.
  • change_owner_form($path='', $fileids=array(), $recurse=true)
    • Present the user with a form to change the owner of one or more resources.
    • $path: path to a single resource (takes precedence)
    • $fileids: array of one or more resource IDs ($path takes precedence)
    • $recurse: recurse down through folders
  • change_access_form($path='', $fileids=array(), $access_type='')
    • Present the user with a form to change the access granted or denied to the specified resource.
    • $path: path to resource (takes precedence)
    • $fileids: array of one or more resource IDs ($path takes precedence)
    • $access_type: REPO_ACTION_SHARE, ...SHARER (recursive), ...PUBLISH, ...PUBLISHR
  • can_i_access($accesstype=0, $nativeid=0)
    • Test whether the current user has the specified access to the specified resource.
    • $accesstype: REPO_CHMOD, ...CHOWN ...COPY ...DELETE ...LINK ...MOVE ...READ ...RENAME ...WRITE ...etc.
    • $nativeid: ID of the resource. (Hmm. Why isn't this taking in a $path as well?)
  • change_owner($path='', $fileid=0, $ownerid=0, $ownertype=0, $recursive=true)
    • $path & $fileid -- typical
    • $ownerid: same as above
    • $ownertype: same as above - can be a user, a group, or a course (or whatever else the access_control module supports)
  • download_file($path='', $fileid=0)
    • This replaces file.php. It performs all the necessary access checking and then sends the file if everything's legit.
  • find_user_shares($access_type=0, $userid=0, $how='')
    • Find all the resources shared to the specified user with the specified access.
    • $access_type: only REPO_READ, REPO_WRITE are in the switch(){} ATM, because I didn't think other types would be especially useful.
    • $userid: obvious
    • $how: user, group, course. This lets us find only access granted by virtue of my userid, my group memberships, or my course enrollments, if that's what we want to find.
  • user_shares_index()
    • Present an index table of all the resources shared to a user.
  • delete_access($ac='', $ids='')
    • Delete all access_control records (but not ownership) for a specified(set of) resource(s).
    • $ac: access_control object
    • $ids: list of resource IDs
  • search($search_criteria='', $exclusive=true)
    • I should change this to support the simple search strings you're doing in OSLOR.
That's it ATM. I have dummies in there for things like move_a_to_b(), link_a_to_b() and lock(), but those are low-priority for me so I'm leaving them alone for now.
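One convention recurs throughout the listing above: nearly every call accepts both a $path and a numeric ID, with the path taking precedence when both are given. That resolution rule could be factored out roughly like this (illustrative Python; the function and the lookup callables are hypothetical, not part of Matt's plugin):

```python
def resolve_resource(path="", resourceid=0,
                     lookup_by_path=None, lookup_by_id=None):
    """Path-vs-ID precedence as used across the API listing:
    when both a path and an ID are supplied, the path wins."""
    if path:
        return lookup_by_path(path)
    if resourceid:
        return lookup_by_id(resourceid)
    return None
```

Centralizing it once would also answer the parenthetical question under can_i_access() above, which currently takes only an ID.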
In reply to Matt Oquist

Re: RFC - Remote object repositories -- consolidating implementations

by Petr Skoda -
Hi!

I have been reading this thread since the very start and learned a lot from it. Most of all I liked Eloy's "it must be simple, functional, robust and stable along Moodle evolution". My addition would be: Do not force it on users that do not want/need it.

Technically we cannot make a perfect repository design and make it mandatory for all modules. I guess we should come up with some smooth transitional stage:
  • separate file.php (Document Management System area) and modfile.php (module file serving area) - more about it here
  • make some basic repository infrastructure that modules can use - repositories would be module specific; teacher would construct course from his own activities and others downloaded from repositories; first resources, IMS, scorm, then quiz, glossary,...
  • allow easy export of activities into repository
  • move to course level - please do not ask me how wink
Document Management System is about individual files stored in coursefile area. Repository is IMO something completely different - it is about distribution of content packages, repository controls access on package level only, the rest should be done by module itself.

I think our repository server should be dedicated; I do not think it should ever be part of standard Moodle installations -- remember we have all those low requirements for PHP versions, its extensions, outdated databases, etc. This would help repository security tremendously. Does it have to be PHP based anyway?

I guess the integration of other repository clients should be plugin based - we would have one native Moodle client plugin for each activity and several others for 3rd party repository servers (IMS, Scorm and the like).

Those are just some of my ideas, they are definitely not perfect.

skodak
In reply to Martín Langhoff

Re: RFC - Remote object repositories -- consolidating implementations

by Jun Yamog -
Hi,

Please excuse my post if it does not make sense, I am relatively new to moodle.

From what I understand each of us is making/solving a different problem space but are using the same term - repository.

Matt's repository is about a central repository that moodle can use, very much like a file manager.  So it deals with access, organizing objects, etc.  While Eloy's IMS CP is about getting an IMS CP and deploying the package.  It is not a remote repository, but more of a user/consumer of a repository (remote or local).  The oslor repository is about finding and getting objects from a remote repository, which is similar to what the current hive/(resources/type/repository) is doing.

The way I understand it, there is no consolidation work that needs to be done other than hive/(resources/type/repository) and the oslor repository.  These are the 2 things that I see that deal with remote objects, rather than managing the objects in moodle.  Matt's repository is a complementary part, wherein it manages the objects once they've been fetched/linked from a remote repository.

Does this make sense?
In reply to Jun Yamog

Re: RFC - Remote object repositories -- consolidating implementations

by Jun Yamog -
Hi,

Can anyone here help me with the hive integration?  So far it is working but not consistent.  I am able to get hive objects and have "link to a file or website" and "ims cp" work.  I am using hive.moodle.com as a test server since I don't have a local hive instance.  Most of the time, at the popup for searching or browsing hive, I get a javascript error "the tree node is not set up correctly. treemode=".  Authentication is ok, since sso_user_login returns true.  I can also view the resource that is linked to hive with no problem.  I have also tried deleting cookies, using a different browser, etc.

Anyone have any ideas what is going on?  Thanks very much.
In reply to Jun Yamog

Re: RFC - Remote object repositories -- consolidating implementations

by colin alteveer -
Ever get that Hive integration working, Jun? I have managed to get the beta version of 1.6 to browse and return a URL, but no loading, no joy. Also, this URL-returning, remote-content situation brings up an issue of cross-domain scripting with SCOs. I'm not sure if I am hijacking the thread, because it would be part of a plugin's entrails, but does anyone have any thoughts on the issue of tying distant SCORM APIs to Moodle via a Hive /repository/ plugin (or however it gets implemented)? I have personally been leaning towards backend fetching/caching...
In reply to colin alteveer

Re: RFC - Remote object repositories -- consolidating implementations

by Jun Yamog -
Hi Colin,

We didn't get the integration with hive as far as we would have liked.  Currently hive doesn't have web services yet, so our integration is just the same as what we already have with moodle.  Look at the link below:

http://test.moodle.com/course/view.php?id=4
In reply to Jun Yamog

Re: RFC - Remote object repositories -- consolidating implementations

by colin alteveer -
This looks like pretty much what I have. Is there a username/password that works with SSO? And have you actually gotten it to load objects into Hive?
In reply to Jun Yamog

Re: RFC - Remote object repositories -- consolidating implementations

by colin alteveer -
Jun (and Martin(s)?),

  I have been on the horn with Harvest Road, and they provided me with their own portion of the mod/resource/type/repository/hive/ stuff, including browse and load templates, though I have yet, as I said, to get the load working. Is this at all in line with/part of Moodle 1.6? Has there been any communication here between HR and Moodle?

   Colin
In reply to Jun Yamog

Re: RFC - Remote object repositories -- consolidating implementations

by Matt Oquist -
I am relatively new to moodle.
Welcome!  I am too, actually.  smile

The way I understand it, there is no consolidation work that needs to be done other than hive/(resources/type/repository) and the oslor repository.  These are the 2 things that I see that deals with remote objects, rather than managing the objects in moodle.  Matt's repository is a complementary part, wherein it manages the objects once its been fetched/linked from a remote repository.
Does this make sense?
Yes, this makes sense, but I disagree.  The repository work I've done is pluggable, so that plugins for remote repositories can be written to use the same core APIs as the local plugin ("Moodle Native") that I've already written.  My intention was that when you click on the "File keeper" block (or go to http://yourmoodle.com/repository/) you'll be presented with a list of all the repositories your Moodle has connected to, and one of those will be "Moodle Native".  Then you choose a repository and are presented with whatever interface (searching or browsing, probably) that repository, whether local or remote, provides.  If the repository is remote and inaccessible, you'll get a message stating so.

The main idea is to provide a common way for all of Moodle to deal with file-like resources, so the course code and the assignment module and the blogs and the forums, etc., don't need to know whether they're dealing with Hive, OSLOR, something local, something remote, a URL, a file, an archive, a folder, etc.  They can just call the common APIs and pass around the data those APIs use without needing to think/code very much about it.
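The pluggable design described here could be sketched as a common interface that local and remote plugins both implement (illustrative Python; the class and method names are invented for the example, and a dict stands in for dataroot):

```python
class RepositoryPlugin:
    """Minimal common interface every repository plugin would implement;
    callers never need to know whether the backend is local or remote."""
    name = "base"

    def listing(self, path):
        raise NotImplementedError

    def fetch(self, path):
        raise NotImplementedError

class MoodleNativePlugin(RepositoryPlugin):
    """The local 'Moodle Native' backend, here backed by a plain dict
    (path -> bytes) standing in for the real dataroot tree."""
    name = "Moodle Native"

    def __init__(self, store):
        self.store = store

    def listing(self, path):
        return sorted(p for p in self.store if p.startswith(path))

    def fetch(self, path):
        return self.store[path]

def available_repositories(plugins):
    """What the 'File keeper' index page would show: one entry per
    configured repository, local or remote alike."""
    return [p.name for p in plugins]
```

A remote Hive or OSLOR plugin would subclass the same interface, so the assignment module or a forum can call listing() and fetch() without caring what sits behind them.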