We are designing a massive distributed system with multiple (several hundred) Moodle servers sharing data in real-time. This is not a standard client-server model due to limited GPRS connectivity. It is for a non-profit national rollout of free HIV-education to clinics and hospitals in South Africa. There is a potential user-base of more than 10 000 professional nurses.
Our model will involve a central Moodle system with a master user-authentication table and custom tables for reporting across all the distributed systems. The clinic-based Moodles will connect to the master system for authentication and the sending of assessment completion data.
Has anyone attempted anything like this before?
Would any of the main Moodle developers be interested in discussing the implementation directly with us?
Many Thanks,
Craig.
Correct me if I'm wrong, but the structure is going to be just like having hundreds of Moodle installations, but each of these installations will be communicating back to a central server to exchange user info, assessment data, log data, etc. Presumably things like password changes will need to be able to be pushed in both directions? And things like new course material will need to be pushed out from the central server to the others?
If the connection between the central and satellite servers is not reliable then you won't really be sharing data in "real time", but presumably doing things like attempting a batch update every 6 hours? The satellites wouldn't authenticate against the central server whenever the user logs in. Or perhaps I've misunderstood.
I did a little bit of HIV-related work myself - http://hiv.sourceforge.net/ - it may not be useful to you, since the computer requirements might not fit your scenario, but I thought I'd mention it.
The system is a little complicated and relies on asymmetrical communication between the master-system and the clinic-based systems (which are as you describe 'hundreds of Moodle installations').
There is a satellite downstream to the clinics that allows us to forward-load large amounts of content to the clinic. So we'll keep the content synchronised in that way.
In terms of user data, there is an initial authentication over GPRS that will check if the nurse is registered in the national database. After that they can login freely to the clinic-system. On completion of a content-module or assessment, data needs to be sent upstream to the master-system (which will handle the printing of certificates for the national 'professional development' programme). If this IP-over-GPRS communication fails, there will be a backup OS-based applet to handle the batch push to the master.
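To make that batch-push fallback concrete, here is a rough sketch of a store-and-forward queue. All the names here are invented for illustration; the real applet would live outside Moodle and speak whatever protocol the master ends up exposing:

```python
import json
import os

class CompletionQueue:
    """Store-and-forward queue: completion records are appended to a local
    journal file and pushed upstream in a batch whenever the GPRS link is
    available.  Records that fail to send are kept for the next attempt."""

    def __init__(self, path):
        self.path = path

    def enqueue(self, record):
        # One JSON record per line; the journal survives process restarts.
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")

    def flush(self, send_upstream):
        """Try to push every queued record; keep the ones that fail."""
        if not os.path.exists(self.path):
            return 0
        with open(self.path) as f:
            records = [json.loads(line) for line in f if line.strip()]
        remaining, sent = [], 0
        for rec in records:
            try:
                send_upstream(rec)     # e.g. an HTTP POST over the GPRS link
                sent += 1
            except OSError:
                remaining.append(rec)  # link dropped: retry on the next flush
        with open(self.path, "w") as f:
            for rec in remaining:
                f.write(json.dumps(rec) + "\n")
        return sent
```

The point of the journal file is that a connection drop mid-batch loses nothing: unsent records simply stay queued until the next flush.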
Ultimately, we want the master system to report in real time across all the clinics, showing each user's progress and whether modules have been completed or not.
The communication-layer will be prototyped over the next two months in a pilot implementation. Based on what we learn, we will redevelop and/or adjust requirements for the full-scale rollout. It's going to be an exciting few months of development. I know that Martin is looking at some sort of community architecture for Moodle 2.0 or greater and any lessons learnt will be distributed accordingly.
Craig.
P.S. Your 3D-model looks very interesting but sadly the download of JRE files kept failing. I'll try it again later and, if it works on the clinics' Debian platform, I'd love to include it in the 'toolkit' area of useful standalone content.
It's an interesting project you seem to have there. We had some discussions with Spanish programmers about a similar concept at the last MoodleMoot Spain.
One of my side activities (hobby?) is hacking on a project (called "git") developed by Linus Torvalds that actually has an underlying distributed database / distributed filesystem. So needless to say, I find the concept of a distributed Moodle intriguing.
I am not sure it could be made to work "transparently" as you propose. Applications that work with a "local" RDBMS rely pretty heavily on certain things that are just not true in distributed environments. Midgard CMS, for instance, has strong support for replication (look at the repligard infrastructure). Something like that could be used in a "star" fashion to achieve what you describe. The overhead would be significant, however. And it's quite a bit of work to implement, too.
Another model is the 'git' model, which is to make things content-addressable. This is possibly more flexible and portable than the repligard strategy, but still quite a bit of work.
Do you think there are "usage" models that would allow you to meet your goals without true full blown replication? Real star-shape replication will need major brain surgery in Moodle (as it would in any other web app that expects to have the DB right there, and under its full control).
To reap the true benefits of Moodle you want to /not/ need to change it /that/ much. There are several ways you could "fake" the replication so that it is not truly star-shaped -- so you can get away with a small Moodle customization that lets you push content out, and pull "activity" records in.
Can you tell us more of your use cases?
Craig
HIV is by far the more important issue, but I think this would also be of great use in language teaching, where there are benefits to international cooperation across Moodle installations.
In the world of language teaching, it would be helpful to have grades, scales, GUI languages, and user information stored locally, but the data of some activities (such as forums) shared (cross-viewable).
Tim
Yes, very interesting.
I'd very much like to see any work on synchronisation of data between Moodles be done in a generic way that we can include in the main distribution. I'm thinking of a single script that calls subscripts within each module.
But firstly we need to know, what does "rollout of HIV education" mean, exactly? It sounds from your post that you mean to use Resources and Quizzes only. Is that correct? If so that makes your job easier, as some of the other modules could be tricky to work out (eg forums, wiki).
I'm guessing you need to synchronise updates in resources and quizzes from master to clinics, and synchronise quiz responses from clinics to master.
My first ideas for an approach would be to deal with this on a course level, creating one course per clinic. The master server will have many courses (one for each clinic), and each clinic server will have only one course. If all the IDs are kept the same on each server then that makes things a lot easier - avoid all renumbering.
When connected, a custom script on the clinic (with unique course ID "X") could connect to the master server and:
- pull the moodledata/X directory exactly (with rsync).
- pull the course_* and resource* tables exactly
- pull the non-attempt-based quiz tables exactly
- push the attempt-based quiz tables exactly
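To illustrate, a script along those lines could simply generate the list of transfer commands for the clinic to run. Everything here is illustrative: the table names assume Moodle's 'mdl_' prefix convention, and the host name and paths are placeholders, not a real deployment:

```python
def build_sync_plan(course_id, master_host="master.example.org"):
    """Build the list of transfer commands for one clinic's sync run.
    Table names, host and paths are illustrative placeholders only."""
    pull_tables = ["mdl_course", "mdl_course_sections", "mdl_resource",
                   "mdl_quiz", "mdl_quiz_questions"]       # content: master -> clinic
    push_tables = ["mdl_quiz_attempts", "mdl_quiz_grades"]  # results: clinic -> master

    plan = []
    # 1. Mirror the course files directory exactly, as suggested, with rsync.
    plan.append("rsync -az %s:/moodledata/%s/ /moodledata/%s/"
                % (master_host, course_id, course_id))
    # 2. Pull the content tables down from the master.
    for t in pull_tables:
        plan.append("mysqldump -h %s moodle %s | mysql moodle" % (master_host, t))
    # 3. Push the attempt-based tables back up.
    for t in push_tables:
        plan.append("mysqldump moodle %s | mysql -h %s moodle" % (t, master_host))
    return plan
```

Keeping the IDs identical on both sides, as suggested above, is what makes a dumb table-copy like this viable at all.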
Live user authentication the first time and non-live thereafter can be fairly simply arranged by writing a custom auth module.
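As a sketch of that auth flow (a real Moodle auth module would be PHP; this is just the logic, in Python, with invented names): a live check against the master on first login, then local authentication against a cached hash thereafter:

```python
import hashlib

class ClinicAuth:
    """First-time login checks the national master; after that the clinic
    authenticates locally against a cached salted hash.  `check_master`
    stands in for the real call over the GPRS link."""

    def __init__(self, check_master):
        self.check_master = check_master
        self.cache = {}          # username -> stored hash

    def _hash(self, username, password):
        return hashlib.sha256((username + ":" + password).encode()).hexdigest()

    def login(self, username, password):
        if username in self.cache:                  # subsequent logins: local only
            return self.cache[username] == self._hash(username, password)
        if self.check_master(username, password):   # first login: live check
            self.cache[username] = self._hash(username, password)
            return True
        return False
```

The nice property is that an unreliable link only ever affects a nurse's very first login at a given clinic.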

To clarify various bits and pieces...
For the initial phases, the educational material will consist of Scorms and Quizzes only. Later on, I'd love to see a situation where the nurses get to work on the hugely powerful collaboration tools of forums and wikis, and that's where the replication gets really tricky. But by then we should have a clear foundation that can be built on.
The aim right now is to come up with the least invasive model which requires minimal admin, and will allow for easy Moodle upgrades.
In terms of use cases and requirements, what I've described so far in the previous posts is pretty much the complete set for this pilot implementation.
The requirements for the distributed model are:
- Authentication on first-time login
- Flagging completion data of quizzes (e.g. score, time spent). Learner responses are not required in the master system.
- Real-time reporting across the distribution.
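For the second requirement, the upstream payload could be as small as this. The field names are invented for illustration, not Moodle's actual schema:

```python
def completion_record(username, clinic_id, quiz_id, score, started, finished):
    """Build the minimal upstream payload: completion flag, score and time
    spent, but no individual learner responses (per the requirements)."""
    return {
        "user": username,
        "clinic": clinic_id,
        "quiz": quiz_id,
        "completed": True,
        "score": score,
        "seconds_spent": finished - started,
    }
```

Keeping the record this small also matters for the GPRS link: a few hundred bytes per completion is far more likely to survive a flaky connection than a full attempt dump.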
Based on how this evolves, the requirements will be remodelled and added to appropriately.
Martin L's suggestions for replication (Git and Repligard) have a lot of merit, but as he says, will require significant modification.
My feeling is that we will probably be able to work without full-blown replication. So far the ideas we've discussed around replication involve individual tables and custom schemas. The concern is that when we scale up the number of clinics to several hundred, we need a one-to-one mapping between tracking tables and clinics, which could become a DB administration problem within the master system.
I'm really liking your idea of a single, unique course-per clinic. Until now I've been looking at the whole system from the perspective of having one course per content-topic (of which there are approx 25 e.g. Antiretroviral Therapy, HIV and TB, Voluntary Counselling and Testing, etc.) That would translate to 25 discrete courses sitting on any given clinic-Moodle. However, if every content-topic is implemented as a 'Topic' within a single course structure, that allows us to use such functionality as the course backup-and-restore to do quick and easy batch loads of all a clinic's course data.
We are going to develop the prototype over the next two weeks.
Please keep the ideas coming and keep up the great work on the system - it really is appreciated.
Craig.
Yes, doing some coarse-grained push/pull would work... here's a simple model...
- There is a master server, but no "master moodle" setup
- In the master server, have one database per "slave" Moodle install. Let's call that a "slave copy on master".
- Have custom scripts on 'master' that give you an overview of the many slaves, reading the slave copies on master.
- Slave copies on master are read-only!
- Create content on a dedicated "content creation" moodle setup, export as backup, push and restore on each remote slave -- possibly via an automated script.
- "Pull" the database from the remote slave servers to their copy on the master server. Full copies won't scale very well, so add a "last updated" column to each table, and triggers to "touch" the field on insert/update. You'll need additional handling of deletes via a trigger too. The script that pulls can now fetch only updates. (BTW, what's the bandwidth you do get?)
- Database upgrades happen on the slave servers, followed by a "full database" pull.
This way, you avoid the issue of table id conflicts, and don't need to change Moodle at all...
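The trigger-based delta pull can be sketched like this. I'm using SQLite here purely so the example is self-contained; a real deployment would define equivalent triggers in MySQL. Writing every change into a changelog table (rather than stamping rows in place) also covers the delete-tracking point mentioned above:

```python
import sqlite3

def make_clinic_db():
    """Toy slave database where triggers record every changed row id in a
    changelog table, so the master's pull script can fetch deltas only."""
    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE mdl_quiz_grades (
                      id INTEGER PRIMARY KEY, userid INTEGER, grade REAL)""")
    db.execute("""CREATE TABLE changelog (
                      tick INTEGER PRIMARY KEY AUTOINCREMENT,
                      row_id INTEGER, op TEXT)""")
    # Log inserts, updates and deletes; 'tick' gives a total order to replay.
    db.executescript("""
        CREATE TRIGGER log_ins AFTER INSERT ON mdl_quiz_grades
        BEGIN INSERT INTO changelog (row_id, op) VALUES (NEW.id, 'I'); END;
        CREATE TRIGGER log_upd AFTER UPDATE ON mdl_quiz_grades
        BEGIN INSERT INTO changelog (row_id, op) VALUES (NEW.id, 'U'); END;
        CREATE TRIGGER log_del AFTER DELETE ON mdl_quiz_grades
        BEGIN INSERT INTO changelog (row_id, op) VALUES (OLD.id, 'D'); END;
    """)
    return db

def pull_changes(db, last_tick):
    """Return (new_tick, changes): every row touched since last_tick."""
    rows = db.execute("SELECT tick, row_id, op FROM changelog WHERE tick > ? "
                      "ORDER BY tick", (last_tick,)).fetchall()
    new_tick = rows[-1][0] if rows else last_tick
    return new_tick, rows
```

The master remembers the last tick it saw per slave, so a pull after a dropped connection simply resumes from that point.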
Actual GPRS bandwidth has never been clocked by the people responsible for the comms layer. There's no documentation either, which is equally unhelpful. The only info so far is qualitative, like it's 'incredibly unstable'! There is no knowledge of the maximum file size that can reliably get through. I think it's fair to say it's somewhere between a 28k and 56k modem that drops connections anything from every 5 minutes to every 4 hours. We will have to do our own testing of throughput and latency when we get access to the servers.
* Use both incremental and full updating on the db/content pulls. Do incremental (diffs since the last pull) for the most part; this will send less traffic over the wire. Have a full update available too, just in case you want to be absolutely sure you are in sync.
* Consider using a transfer mechanism that's inherently capable of doing incremental updates. rsync or svn come to mind. Perhaps look at the edukalibre project, which is doing something like this with svn. This will work better with the files in moodledata than with the db.
* Perhaps a serialized copy of the db, one that goes into multiple files based on tables, would allow updates to be sent as file diffs rather than via db triggers. You would then need to deserialize to go back into the master db.
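The per-table serialization idea might look like this in rough Python: one sorted CSV file per table, so a diff-aware transfer tool (rsync, svn) only ships the files that actually changed. Deserializing back into the master db is the reverse step and is left out here:

```python
import csv
import os
import sqlite3

def dump_tables(db, outdir):
    """Serialize each table of an SQLite db to its own sorted CSV file.
    Sorting makes the output deterministic, so identical data always
    produces identical files and an unchanged table transfers for free."""
    os.makedirs(outdir, exist_ok=True)
    tables = [r[0] for r in db.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    for table in tables:
        # Note: interpolating table names is fine here because they come
        # from the schema itself, never from user input.
        rows = sorted(db.execute("SELECT * FROM %s" % table).fetchall())
        with open(os.path.join(outdir, table + ".csv"), "w", newline="") as f:
            csv.writer(f).writerows(rows)
    return sorted(tables)
```

Against rsync's rolling checksums, a one-row change in a large table then costs roughly one block of transfer instead of the whole dump.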
Trying to use Moodle as-is, and augmenting it for the distribution, seems most sensible.
You mentioned the need to service 10,000 professional nurses, we have reports of userbases of more than 10,000 already. The "Servers and Performance" forum has a few posts on managing large scale installations.
Here's one such post: http://moodle.org/mod/forum/discuss.php?d=13658&parent=76119
Is there something about the userbase / course content that would make this situation different than the existing larger installations?
Not that a Massively Distributed Architecture wouldn't be "cool", but for a non-profit organization, is the maintenance of such a beast going to be "cost-effective"?
This discussion sounds vaguely similar to an old discussion about students being able to install a "version" of Moodle locally and work on coursework without being connected to the Internet all the time. At some point the student would connect to the Internet and submit their work.
I'll have to dig through the forums and see whatever happened to that.
Hi Craig,
I recently started a project decentralizing a Moodle master server to gain consistent accessibility at remote nodes, aiming to improve course quality at the Open University of Tanzania. This thread popped up first when I searched around, and I think we share some goals and needs. Since this thread is a bit old, I'm wondering how the project turned out, and whether you have any documents or experience you could share?
Thanks in advance!
/guoger