Massive Distributed Architecture

by Craig Meltzer
Number of replies: 15
Hello all,

We are designing a massive distributed system with multiple (several hundred) Moodle servers sharing data in real-time. This is not a standard client-server model due to limited GPRS connectivity. It is for a non-profit national rollout of free HIV-education to clinics and hospitals in South Africa. There is a potential user-base of more than 10 000 professional nurses.

Our model will involve a central Moodle system with a master user-authentication table and custom tables for reporting across all the distributed systems. The clinic-based Moodles will connect to the master system for authentication and the sending of assessment completion data.

Has anyone attempted anything like this before?

Would any of the main Moodle developers be interested in discussing the implementation directly with us?

Many Thanks,
Craig.
In reply to Craig Meltzer

Re: Massive Distributed Architecture

by Anthony Borrow
Craig - I find this kind of use of Moodle very exciting. I would hope that there might be some grants to help fund your not-for-profit effort and work. I like the idea of using open source software, especially software like Moodle that takes a social constructionist approach, to respond to real social issues of importance. Best of luck in your endeavors! Peace - Anthony
In reply to Anthony Borrow

Re: Massive Distributed Architecture

by Craig Meltzer
Thanks for the good wishes, Anthony. This is partly funded by government and partly by international donors. It is a hugely important project that will prototype a model which can be applied throughout Africa. And it can only really be done with a system such as Moodle as the core.

Regards,
Craig
In reply to Craig Meltzer

Re: Massive Distributed Architecture

by Dan Stowell
Craig - I can't offer any directly helpful information but this is a really exciting project. Can I ask a question or two?

Correct me if I'm wrong, but the structure is going to be just like having hundreds of Moodle installations, but each of these installations will be communicating back to a central server to exchange user info, assessment data, log data, etc. Presumably things like password changes will need to be able to be pushed in both directions? And things like new course material will need to be pushed out from the central server to the others?

If the connection between the central and satellite servers is not reliable then you won't really be sharing data in "real time", but presumably doing things like attempting a batch update every 6 hours? The satellites wouldn't authenticate against the central server whenever the user logs in. Or perhaps I've misunderstood.

I did a little bit of HIV-related work myself - http://hiv.sourceforge.net/ - it may not be useful to you since the computer requirements might not fit your scenario, but I thought I'd mention it :)
In reply to Dan Stowell

Re: Massive Distributed Architecture

by Craig Meltzer
Thanks Dan.

The system is a little complicated and relies on asymmetrical communication between the master-system and the clinic-based systems (which are as you describe 'hundreds of Moodle installations').

There is a satellite downstream to the clinics that allows us to forward-load large amounts of content to the clinic. So we'll keep the content synchronised in that way.

In terms of user data, there is an initial authentication over GPRS that will check if the nurse is registered in the national database. After that they can login freely to the clinic-system. On completion of a content-module or assessment, data needs to be sent upstream to the master-system (which will handle the printing of certificates for the national 'professional development' programme). If this IP-over-GPRS communication fails, there will be a backup OS-based applet to handle the batch push to the master.
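The upstream leg described here (send completion data when the link is up, fall back to a batch push when it is not) is essentially a store-and-forward queue. A minimal sketch in Python, assuming a hypothetical `Outbox` class and a caller-supplied `send` function that raises `ConnectionError` when the GPRS link drops; neither name comes from Moodle, and a real version would persist the queue to disk rather than keep it in memory:

```python
import json
from collections import deque


class Outbox:
    """Hold completion records locally until the uplink accepts them."""

    def __init__(self):
        self._pending = deque()

    def enqueue(self, user_id, module, score, seconds_spent):
        # Field names are illustrative, not a Moodle schema.
        self._pending.append({
            "user_id": user_id,
            "module": module,
            "score": score,
            "seconds_spent": seconds_spent,
        })

    def flush(self, send):
        """Send records oldest-first; stop at the first failure.

        Returns the number of records delivered. Anything left in the
        queue is what a later retry or batch push would carry.
        """
        sent = 0
        while self._pending:
            record = self._pending[0]
            try:
                send(json.dumps(record))
            except ConnectionError:
                break  # link dropped: keep this record and the rest queued
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._pending)
```

A record is only removed after `send` returns, so a dropped connection can at worst cause a duplicate delivery, never a lost one; the master side would need to tolerate duplicates.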

Ultimately, we want the master system to be able to report in real-time across all the clinics, to check every user's progress and whether modules have been completed or not.

The communication-layer will be prototyped over the next two months in a pilot implementation. Based on what we learn, we will redevelop and/or adjust requirements for the full-scale rollout. It's going to be an exciting few months of development. I know that Martin is looking at some sort of community architecture for Moodle 2.0 or greater and any lessons learnt will be distributed accordingly.

Craig.


P.S. Your 3D-model looks very interesting but sadly the download of JRE files kept failing. I'll try it again later and if it can work on the clinics' Debian platform, I'd love to include it in the 'toolkit' area of useful standalone content.
In reply to Craig Meltzer

Re: Massive Distributed Architecture

by Martín Langhoff
Hi Craig!

It's an interesting project you seem to have there. We had some discussions with Spanish programmers about a similar concept at the last MoodleMoot Spain.

One of my side activities (hobby?) is hacking on a project (called "git") developed by Linus Torvalds that actually has an underlying distributed database / distributed filesystem. So needless to say, I find the concept of a distributed Moodle intriguing.

I am not sure it could be made to work "transparently" as you propose. Applications that work with a "local" RDBMS rely pretty heavily on certain things that are just not true in distributed environments. Midgard CMS, for instance, has strong support for replication (look at the repligard infrastructure). Something like that could be used in a "star" fashion to achieve what you describe. The overhead would be significant, however. And it's quite a bit of work to implement too.

Another model is the 'git' model, which is to make things content-addressable. This is possibly more flexible and portable than the repligard strategy, but still quite a bit of work.

Do you think there are "usage" models that would allow you to meet your goals without true full blown replication? Real star-shape replication will need major brain surgery in Moodle (as it would in any other web app that expects to have the DB right there, and under its full control).

To reap the true benefits of Moodle you want to /not/ need to change it /that/ much. There are several ways you could "fake" the replication so that it is not truly star-shaped -- so you can get away with a small Moodle customization that lets you push content out, and pull "activity" records in.

Can you tell us more of your use cases?
In reply to Craig Meltzer

Re: Massive Distributed Architecture

by Timothy Takemoto

Craig
HIV is by far the more important issue, but I think this would also be of great use in language teaching, where there are real benefits to international cooperation across Moodle installations.
In the world of language teaching, it would be helpful for grades, scales, GUI languages, and user information to be stored locally, while the data of some activities (such as forums) is shared (cross-viewable).

Tim
 

In reply to Craig Meltzer

Re: Massive Distributed Architecture

by Martin Dougiamas
Hi, Craig.

Yes, very interesting.

I'd very much like to see any work on synchronisation of data between Moodles be done in a generic way that we can include in the main distribution. I'm thinking of a single script that calls subscripts within each module.

But firstly we need to know, what does "rollout of HIV education" mean, exactly? It sounds from your post that you mean to use Resources and Quizzes only. Is that correct? If so that makes your job easier, as some of the other modules could be tricky to work out (eg forums, wiki).

I'm guessing you need to synchronise updates in resources and quizzes from master to clinics, and synchronise quiz responses from clinics to master.

My first ideas for an approach would be to deal with this on a course level, creating one course per clinic. The master server will have many courses (one for each clinic), and each clinic server will have only one course. If all the IDs are kept the same on each server then that makes things a lot easier - avoid all renumbering.

When connected, a custom script on the clinic (with unique course ID "X") could connect to the master server and:
  • pull the moodledata/X directory exactly (with rsync).
  • pull the course_* and resource* tables exactly
  • pull the non-attempt-based quiz tables exactly
  • push the attempt-based quiz tables exactly
That would get you most of the way. :D
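The four steps above can be sketched as a plan that a clinic-side script builds and then executes. This is only an illustration in Python: `build_sync_plan`, the host name `master.example.org`, and the choice of `quiz_attempts` and `quiz_grades` as the "attempt-based" tables are assumptions made for the example, not part of Moodle. The rsync flags used (`-a`, `-z`, `--delete`, `--partial`) are standard ones; `--partial` keeps partially transferred files, which helps on a link that drops.

```python
from fnmatch import fnmatch

# Table-name patterns taken from the list above.
PULL_PATTERNS = ["course_*", "resource*"]
# Assumption: which quiz tables count as "attempt-based" (pushed upstream).
ATTEMPT_TABLES = {"quiz_attempts", "quiz_grades"}


def build_sync_plan(course_id, tables, master="master.example.org"):
    """Return an ordered list of (action, target) steps for one clinic."""
    data_dir = f"moodledata/{course_id}/"
    plan = [(
        "rsync",
        ["rsync", "-az", "--delete", "--partial",
         f"{master}:{data_dir}", data_dir],
    )]
    for table in sorted(tables):
        if table in ATTEMPT_TABLES:
            plan.append(("push", table))   # clinic -> master
        elif table.startswith("quiz") or any(
            fnmatch(table, pattern) for pattern in PULL_PATTERNS
        ):
            plan.append(("pull", table))   # master -> clinic
        # Any other table is left alone.
    return plan
```

Keeping the plan as data makes it easy to log what a run intended to do before the connection dropped, and to resume from the first step that did not finish.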

Live user authentication the first time and non-live thereafter can be fairly simply arranged by writing a custom auth module.
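As a rough illustration of "live the first time, non-live thereafter", here is a sketch in Python rather than an actual Moodle auth plugin. `FirstLoginAuth` and the `is_registered` callback (standing in for the lookup against the national database over GPRS) are hypothetical names; the hashing uses the standard-library `hashlib.pbkdf2_hmac`.

```python
import hashlib
import hmac
import os


class FirstLoginAuth:
    """Check the master once per user, then authenticate locally."""

    def __init__(self, is_registered):
        # is_registered(username) -> bool is the remote lookup (assumed).
        self._is_registered = is_registered
        self._local = {}  # username -> (salt, password digest)

    @staticmethod
    def _digest(password, salt):
        return hashlib.pbkdf2_hmac(
            "sha256", password.encode("utf-8"), salt, 100_000
        )

    def login(self, username, password):
        if username in self._local:
            # Known user: verify against the local copy, no network needed.
            salt, digest = self._local[username]
            return hmac.compare_digest(digest, self._digest(password, salt))
        # First login: requires the live check against the master.
        if not self._is_registered(username):
            return False
        salt = os.urandom(16)
        self._local[username] = (salt, self._digest(password, salt))
        return True
```

If the link is down at first login, the callback would presumably raise and the nurse would have to retry later; every login after that works offline.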
In reply to Martin Dougiamas

Re: Massive Distributed Architecture

by Craig Meltzer
Hey Martin D. (killer app btw!)

To clarify various bits and pieces...
For the initial phases, the educational material will consist of SCORMs and Quizzes only. Later on, I'd love to see a situation where the nurses get to work on the hugely powerful collaboration tools of forums and wikis, and that's where the replication gets real tricky. But by then we should have a clear foundation that can be built on.

The aim right now is to come up with the least invasive model which requires minimal admin, and will allow for easy Moodle upgrades.

In terms of use cases and requirements, what I've described so far in the previous posts is pretty much the complete set for this pilot implementation.
The requirements for the distributed model are:
  • Authentication on first-time login
  • Flagging completion data of quizzes (e.g. score, time spent.) Learner responses are not required in master-system.
  • Real-time reporting across the distribution.

Based on how this evolves, the requirements will be remodelled and added to appropriately.


Martin L's suggestions for replication (Git and Repligard) have a lot of merit, but as he says, will require significant modification.

My feeling is that we will probably be able to work without full-blown replication. So far the ideas we've discussed around replication involve individual tables and custom schemas. The concern is that when we scale up the number of clinics to several hundred, we have the problem of needing a one-to-one between the tracking tables and clinics. This could cause a DB admin problem from within the master system.

I'm really liking your idea of a single, unique course-per-clinic. Until now I've been looking at the whole system from the perspective of having one course per content-topic (of which there are approx 25, e.g. Antiretroviral Therapy, HIV and TB, Voluntary Counselling and Testing, etc.) That would translate to 25 discrete courses sitting on any given clinic-Moodle. However, if every content-topic is implemented as a 'Topic' within a single course structure, that allows us to use such functionality as the course backup-and-restore to do quick and easy batch loads of all a clinic's course data.

We are going to develop the prototype over the next two weeks.

Please keep the ideas coming and keep up the great work on the system - it really is appreciated.

Craig.
In reply to Martin Dougiamas

Re: Massive Distributed Architecture

by Martín Langhoff

Yes, doing some coarse-grained push/pull would work... here's a simple model...

  • There is a master server, but no "master moodle" setup
  • In the master server, have one database per "slave" Moodle install. Let's call that a "slave copy on master".
  • Have custom scripts on 'master' that give you an overview of the many slaves, reading the slave copies on master.
  • Slave copies on master are read-only!
  • Create content on a dedicated "content creation" moodle setup, export as backup, push and restore on each remote slave -- possibly via an automated script.
  • "Pull" the database from the remote slave servers to their copy on the master server. Full copies won't scale very well, so add a "last updated" column to each table, and triggers to "touch" the field on insert/update. You'll need additional handling of deletes via a trigger too. The script that pulls can now fetch only updates. (BTW, what's the bandwidth you do get?)
  • Database upgrades happen on the slave servers, followed by a "full database" pull.

This way, you avoid the issue of table id conflicts, and don't need to change Moodle at all...
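The "last updated" column, the touch triggers, and the delete handling could look roughly like the following. It is a sketch using SQLite from Python so that it runs anywhere; the table `grades`, the tombstone table `grades_deleted`, and the function `fetch_changes` are invented for the example. One deliberate substitution: instead of a wall-clock timestamp it uses a counter kept in a one-row `sync_clock` table, which avoids problems with clock skew or two changes in the same second.

```python
import sqlite3

SCHEMA = """
CREATE TABLE sync_clock (tick INTEGER NOT NULL);
INSERT INTO sync_clock VALUES (0);

CREATE TABLE grades (
    id INTEGER PRIMARY KEY,
    userid INTEGER NOT NULL,
    score REAL NOT NULL,
    last_updated INTEGER NOT NULL DEFAULT 0
);
-- Tombstones: remembers which rows were deleted, and when.
CREATE TABLE grades_deleted (
    id INTEGER PRIMARY KEY,
    deleted_at INTEGER NOT NULL
);

CREATE TRIGGER grades_touch_insert AFTER INSERT ON grades BEGIN
    UPDATE sync_clock SET tick = tick + 1;
    UPDATE grades SET last_updated = (SELECT tick FROM sync_clock)
        WHERE id = NEW.id;
END;
-- Fires only for real data changes, not for the touch itself.
CREATE TRIGGER grades_touch_update AFTER UPDATE OF userid, score ON grades BEGIN
    UPDATE sync_clock SET tick = tick + 1;
    UPDATE grades SET last_updated = (SELECT tick FROM sync_clock)
        WHERE id = NEW.id;
END;
CREATE TRIGGER grades_log_delete AFTER DELETE ON grades BEGIN
    UPDATE sync_clock SET tick = tick + 1;
    INSERT OR REPLACE INTO grades_deleted
        VALUES (OLD.id, (SELECT tick FROM sync_clock));
END;
"""


def fetch_changes(conn, since):
    """Return (changed rows, deleted ids, new high-water mark) after `since`."""
    changed = conn.execute(
        "SELECT id, userid, score FROM grades "
        "WHERE last_updated > ? ORDER BY id", (since,)
    ).fetchall()
    deleted = [row[0] for row in conn.execute(
        "SELECT id FROM grades_deleted WHERE deleted_at > ? ORDER BY id",
        (since,),
    )]
    mark = conn.execute("SELECT tick FROM sync_clock").fetchone()[0]
    return changed, deleted, mark
```

The pulling script stores the returned mark per clinic and passes it as `since` on the next run, so each pull carries only what changed in between.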

In reply to Martín Langhoff

Re: Massive Distributed Architecture

by Craig Meltzer
Cool, thanks. That's sounding very appealing. We still have a situation with one-database-per-clinic, but that is a reasonable tradeoff for all the other benefits of your model, such as elimination of Moodle modifications.

Actual GPRS bandwidth has never been clocked by the people responsible for the comms layer. No documentation either, which is equally unhelpful. The only info so far is qualitative, like it's 'incredibly unstable'! There is no knowledge of the maximum actual file size that can reliably get through. I think it's fair to say it's somewhere between a 28k and 56k modem that drops connections anything from every 5 minutes to 4 hours. We will have to do our own testing of throughput and latency when we get access to the servers.
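Until real measurements exist, those rough figures at least give an upper bound on what fits into one uninterrupted connection. A back-of-the-envelope sketch in Python, assuming "28k" and "56k" mean 28 000 and 56 000 bits per second and ignoring protocol overhead, latency, and retransmissions (so real numbers will be lower):

```python
def max_transfer_bytes(bits_per_second, seconds_until_drop):
    """Upper bound on bytes moved before the connection drops."""
    return bits_per_second * seconds_until_drop // 8


# Worst case from the estimate: slowest link, drop after 5 minutes.
worst = max_transfer_bytes(28_000, 5 * 60)        # 1_050_000 bytes, about 1 MB
# Best case: fastest link, drop after 4 hours.
best = max_transfer_bytes(56_000, 4 * 60 * 60)    # 100_800_000 bytes, about 100 MB
```

On the pessimistic end, any single transfer much over 1 MB would need to be resumable or split into chunks.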
In reply to Martín Langhoff

Re: Massive Distributed Architecture

by Dirk Herr-Hoyman
I'd also suggest

* use both incremental and full updating on the db/content pulls.
do incremental (diffs since last pull) for the most part, this will send
less traffic over the wire. have full update available too, just in case
you want to be absolutely sure you are in sync.
* consider using a transfer mechanism that's inherently capable of doing
incremental updates. rsync or svn come to mind. perhaps look at the
edukalibre project, which is doing something like this with svn.
this will work better with the files in moodledata than with the db.
* perhaps a serialized copy of the db, one that goes into multiple files based
on tables, would allow for updates to be sent by file diffs, rather than db triggers.
would need to then deserialize to go back into the master db.

Trying to use Moodle as is, and augmenting for the distribution, that seems
most sensible.
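The third suggestion (a serialized copy of the db split into one file per table) might look something like this. It is a sketch using SQLite from Python; `serialize_tables` is an invented name, and the one-JSON-row-per-line layout is just one choice that keeps line-based diffs small. Rows are ordered by the first column, which is assumed to be the primary key.

```python
import json
import sqlite3


def serialize_tables(conn):
    """Return {file name: text}, one entry per table, one row per line.

    A stable row order means that changing one row changes exactly one
    line, so rsync or svn only has to ship that difference.
    """
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )]
    dumps = {}
    for name in names:
        rows = conn.execute(f'SELECT * FROM "{name}" ORDER BY 1').fetchall()
        dumps[f"{name}.jsonl"] = "".join(
            json.dumps(list(row)) + "\n" for row in rows
        )
    return dumps
```

Each entry would be written to a file next to moodledata and transferred with the same tool as the course files; the master then reads the files back into its copy of that clinic's database.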
In reply to Craig Meltzer

Re: Massive Distributed Architecture

by Scott Elliott
Pardon my naivety, but what's the issue with using a "single moodle" installation?

You mentioned the need to service 10,000 professional nurses. We have reports of userbases of more than 10,000 already. The "Servers and Performance" forum has a few posts on managing large scale installations.

Here's one such post:  http://moodle.org/mod/forum/discuss.php?d=13658&parent=76119

Is there something about the userbase / course content that would make this situation different than the existing larger installations?

Not that a Massively Distributed Architecture wouldn't be "cool", but for a non-profit organization is the maintenance of such a beast going to be "cost-effective"?
In reply to Scott Elliott

Re: Massive Distributed Architecture

by Gavin McCullagh
"This is not a standard client-server model due to limited GPRS connectivity."

In other words the clients don't have broadband or even flat rate narrowband so it's too expensive for them to always be online.
In reply to Gavin McCullagh

Re: Massive Distributed Architecture

by Scott Elliott
I see, I guess I read right past that part. The statement just before the one you quoted said "Moodle servers sharing data in real-time", which I just assumed meant there would be a connection to allow real-time synchronization. I get the picture now!

This discussion sounds vaguely similar to an old discussion about students being able to install a "version" of Moodle locally and work on the coursework without being connected to the Internet all the time. At some point the student would be able to connect to the Internet and submit their work.

I'll have to dig through the forums and see whatever happened to that.
In reply to Craig Meltzer

Re: Massive Distributed Architecture

by Jiannan Guo

Hi Craig,

 

I recently started a project to decentralise a Moodle master server so that remote nodes get consistent access, with the aim of improving course quality at the Open University of Tanzania. This thread was the first thing that came up when I searched, and I think we share some goals and needs. Since the thread is a bit old, I'm wondering how the project turned out and whether you have any documents or experience you could share?

Thanks in advance!

 

/guoger