I'm putting together an architecture for my institute's migration to Moodle 2 and would really appreciate some feedback from the wider community on it. The intention is to virtualise as much as we can; possibly 2-3 physical servers running guest OSes for the load balancers, web and database clusters. Going the virtualisation route allows us to adapt the architecture much more easily. Anyway, I'm working on diagramming some more thoughts on it, so I'll follow up this post with more information.
That all makes sense to me. We operate a similar architecture for some of our clients except that we use Apache2 rather than nginx and our haproxy is behind apache rather than stunnel.
Glad to see that you've got APC mentioned there - don't forget to tune it. Personally I'd switch to PostgreSQL over MySQL any day - though that's purely a personal view. In my experience there's greater potential to tune Postgres.
In terms of the web frontend, have you considered the data storage options? One of the benefits of an architecture like this is the ability to scale web frontends horizontally to add capacity and resiliency, but to do so you'd need a network file system (e.g. NFS) to share data on the dataroot.
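As a rough sketch of what that shared dataroot might look like (the NFS server hostname and paths here are placeholders, not part of the original setup):

```shell
# On each web frontend: mount the shared dataroot from an (assumed) NFS server
sudo apt-get install nfs-common
# Illustrative /etc/fstab entry - nfs.example.com and both paths are placeholders
echo 'nfs.example.com:/export/moodledata /var/moodledata nfs rw,hard,intr 0 0' | sudo tee -a /etc/fstab
sudo mount /var/moodledata
```

Every frontend then sees the same dataroot, which is what lets you add or remove web nodes freely.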
Similarly, if you do go down the multiple web frontend route, I'd strongly recommend putting your moodle deployment into debian packages. This offers a greatly reduced deployment overhead (once you get your head around building packages of course). It also means that you have an easy reversion path (apt-get install COMPANY-moodle-frontend=version for example).
Thanks so much for the reply. Yes, I should have included a SAN in the attached diagram; we plan on having the moodledata/dataroot directory mounted to each of the nodes in the web cluster.
I see from another topic in this forum that Apache 2.4 has just been released. I'm not sure if we will stick with Apache or migrate to Nginx altogether. That remains to be seen.
Also, thanks for the tip on packaging up our Moodle deployment. I'm not very familiar with this. Could you elaborate a little on this?
With the way in which we operate, we have a separate branch containing our debian packaging. This is basically a debian folder at the top level containing a small selection of packaging files - e.g. changelog, copyright, rules, control, and various other bits.
There are numerous ways to build debian packages. Personally I use dpkg-buildpackage:
dpkg-buildpackage -uc -us -sa -rfakeroot
This ensures, among other things, that the package is built in a fakeroot environment.
After building a package, we'll include it in our package repository using reprepro. It's well worth reading the reprepro documentation, and there are other (simpler) repository tools out there too.
We then have the package repository in our /etc/apt/sources.list. Installing our moodle package (or any of our other customer software) is just an apt-get install FOO away.
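As a sketch of what that looks like on a new frontend (the repository URL, distribution and package name are placeholders):

```shell
# Point apt at the (hypothetical) internal package repository
echo 'deb http://apt.example.com/debian squeeze main' | sudo tee /etc/apt/sources.list.d/internal.list
sudo apt-get update
# Install the latest packaged Moodle...
sudo apt-get install company-moodle-frontend
# ...or pin to a known-good version, which is also the easy reversion path
sudo apt-get install company-moodle-frontend=2.2.1-21
```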
I'll see if I can dig out some links on getting started and whether I can post any of our configuration to help.
We package up the entire moodle source directory with all of our customisations, additional blocks, modules, etc.
We also set the package dependencies (e.g. php5, php5-pgsql, clamav, freshclam, zip, unzip, etc...). Incidentally, we don't include apache in our dependency listing. This is because we also have a cron server which is intended to reduce the burden on the frontends and the cron server doesn't require apache (just php). This is something you may wish to consider in the future - it's relatively trivial to implement and it can make a big difference to your frontends.
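For illustration, this is roughly what the cron server ends up running (the install path is an assumption; Moodle 2.x ships a CLI cron script at admin/cli/cron.php):

```shell
# Illustrative /etc/cron.d/moodle entry on the dedicated cron server -
# only php-cli is needed here, no apache
* * * * * www-data /usr/bin/php /var/www/moodle/admin/cli/cron.php >/dev/null 2>&1
```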
We typically also have a service package for each class of server. For example:
* Frontend servers: include the luns-moodle-CUSTOMER package, apache2, etc.
* Database servers: include the relevant postgres packages.
* Cron servers: include the luns-moodle-CUSTOMER package, and anything else required.
This ties in nicely with our configuration management system. If we need to quickly provision a new frontend, in theory we can create a preseeded debian installer which installs all of the base system config, updates everything, and then installs the service package.
The way that we do things is slightly convoluted, but it enables us to easily track the main moodle repositories. You can definitely do this in a much less complicated fashion than we do, but for interest's sake, this is what we do:
* Each feature (mod/block/local/theme/etc) is in its own git branch based on the latest major release (e.g. all of our features are currently in branches named CUSTOMER-feature-2.2-FRANKENSTYLE_FEATURE based on the v2.2.0 tag).
* We then have a deployment branch (e.g. CUSTOMER-deployment-2.2).
* When we develop a feature, we do so in isolation on its own branch, after which we checkout the deployment branch and git merge it in.
* We then have a packaging branch (luns-moodle-CUSTOMER-2.2) which is identical to the deployment branch but additionally has the debian packaging files.
* For testing we also have a staging branch which we push directly to a testing server - e.g. CUSTOMER-staging-2.2
An example workflow would be:
# Check out a new branch based on the latest major stable tag v2.2.0
git checkout -b FOO-feature-2.2-mod_example v2.2.0
# write some code here and commit it
git add mod/example
git commit -m 'mod_example Some description for the initial commit (Fixes #123)'
git push lunsprivate FOO-feature-2.2-mod_example
# Merge it into the deployment branch
git checkout FOO-deployment-2.2
git merge lunsprivate/FOO-feature-2.2-mod_example
# Do some testing
# Push the feature to our repository
git push lunsprivate FOO-deployment-2.2
# Check out the packaging branch
git checkout luns-moodle-foo-2.2
# Merge in the new feature
git merge lunsprivate/FOO-deployment-2.2
# Update the debian changelog
git add debian
git commit -m '[debian] Updated to include mod_example'
# Tag the commit
git tag luns-moodle-foo-2.2.1-21
# Push everything to our private repository
git push lunsprivate luns-moodle-foo-2.2
git push --tags --repo=lunsprivate
# Build the package
# Send it to the package server
# On the package server we now include the package
# (reprepro include takes the distribution codename first)
sudo su - aptfarm
reprepro include CODENAME ~nicols/packages/luns-moodle-foo_2.2.1-21_amd64.changes
# On the servers deploy
sudo aptitude update && sudo aptitude upgrade
For every feature and every bug that we include in the stable deployment, we also create a tracking issue in our issue tracker. This means that when the next major version of moodle comes out, we can go through each issue and make sure that we don't forget any, or introduce any regressions.
It does sound very long and arduous but once you've got the basics together, it actually saves us a lot of time.
If you want any of the above clarifying I can try. As I say, this is a fairly convoluted approach and I'm aware that we don't do it the easy way. It does have a number of advantages:
* we base features on the latest major stable tag to make life easier when the next one comes out. At this point we take each feature branch in turn and git checkout -b FOO-feature-2.3-mod_example; git rebase v2.3.0; git push lunsprivate FOO-feature-2.3-mod_example. Once each feature branch is complete, we check out the new deployment branch and merge them all.
* each feature can be worked on in isolation
* different developers can work on different features for the same customer very easily (git also makes this easy on the same feature though)
* we can re-use features across multiple installations for the same customer
* we can easily apply upstream bug fixes/new features
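The major-upgrade step in the first point can be sketched end-to-end with a throwaway local repository standing in for moodle.git (every name below is illustrative):

```shell
set -e
cd "$(mktemp -d)"
git init -q .
git config user.email demo@example.com
git config user.name demo
# Simulate an upstream major release
echo '2.2.0' > version.php
git add version.php
git commit -qm 'Moodle 2.2.0'
git tag v2.2.0
# Feature branch based on the 2.2.0 tag
git checkout -qb FOO-feature-2.2-mod_example v2.2.0
mkdir -p mod/example
echo 'code' > mod/example/lib.php
git add mod/example
git commit -qm 'mod_example: initial commit'
# Upstream releases the next major version
git checkout -q -
echo '2.3.0' > version.php
git commit -qam 'Moodle 2.3.0'
git tag v2.3.0
# Rebase the feature onto the new tag under a new branch name
git checkout -qb FOO-feature-2.3-mod_example FOO-feature-2.2-mod_example
git rebase -q v2.3.0
git merge-base --is-ancestor v2.3.0 HEAD && echo 'feature rebased onto v2.3.0'
```

After each feature branch has been rebased like this, the new deployment branch can be created and the features merged in as before.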
Any questions, feel free to ask!
Hi Andrew and Cathal,
It's a very interesting topic. We use a similar architecture: a dedicated caching/proxy/load balancer server, a dedicated PHP & Moodle server, and a dedicated database server. But we use different software and OS. I would be grateful if you could explain your point of view on using PostgreSQL instead of MySQL. Please pay attention to speed, security and safety as the main factors of advantage for me. The point is that currently we're mostly using MySQL, which is accelerated and cached using appropriate software - so it's fast enough. Also there are no problems with replication. And there's a new product - MySQL Cluster - which is supposed to be a brilliant solution, but we haven't tested it yet. Another question is which database is native for Moodle - MySQL or PostgreSQL? It's clear that code optimised for one environment needs further modification to work at the same optimised level in another.
Thank you in advance and glad hearing from you soon!
delighted to see that you have Git in your workflow too. I use it myself and reckon it's a fantastic piece of technology. Branching and merging is much easier than on other SCMs (SVN can be a nightmare!).
For deployment, I've generally used Capistrano (https://github.com/capistrano/capistrano/wiki/). It has grown out of the Ruby on Rails community but can be applied to pretty much anything you want to automate the deployment of. There's also a Gem for handling multi-stage deployments (https://github.com/capistrano/capistrano/wiki/2.x-Multistage-Extension); for example, if you wish to separate your deploy into servers for 'production', 'qa', 'staging' etc. Capistrano easily handles this.
Just as a follow up to the above, I've put together the following that shows the physical structure of our Moodle 2.x architecture.
- 2 physical servers (based on Dell PowerEdge R815) running VMWare vSphere.
- a primary and slave load balancer running Nginx to distribute web traffic to the web cluster.
- a cluster of web server VMs on both servers running Moodle on top of Nginx and PHP-FPM (I haven't decided whether it's necessary to keep web traffic localised within a single physical server, or whether the primary load balancer can distribute web traffic across the web clusters running on both physical servers).
- MySQL master and slave databases replicating through DRBD.
- finally, heartbeat as the mechanism for failover between the master and slave pairs in the load balancer and database clusters.
I'm not too hot on the lower level details for the configuration of our virtualised servers at the moment. Things like how best to configure the network for the clusters of VMs (web, load balancer and database) and what kind of disk configuration (RAID ??) we should adopt are a bit beyond me at this time.
Any comments on the above would be greatly appreciated.
Thank you for all this discussion, it's been very educational. I'm not heavily experienced in the replication/failover arena, so I'm looking at what I think is a smaller implementation of what you have.
I'm trying to just start with replicating a single server to a duplicate backup server. Where I'm at now is rsyncing moodledata and html between the servers. I currently do daily database backups and keep the past 7 backups. I just started attempting to replicate the database from the main server to the backup server. Setting up MySQL replication went fine, but I'm getting errors with duplicate primary keys in the sessions and log tables.
I understand my setup requires a physical intervention to initiate the failover, and I do plan on moving to heartbeat and DRBD, as I've been reading on those, but until I get a better grasp of those options, is simply replicating a database an option here?
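For what it's worth, one common way to keep a volatile table like the session table out of the replication stream is a replication filter on the slave. This is only a sketch; it assumes the default mdl_ table prefix and a database named moodle:

```shell
# Illustrative my.cnf fragment on the slave: skip replicating the session table
# (database and table names are assumptions about the install)
cat <<'EOF' | sudo tee -a /etc/mysql/conf.d/replication.cnf
[mysqld]
replicate-ignore-table = moodle.mdl_sessions
EOF
sudo service mysql restart
```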
So the database issue was resolved with a little thinking through - I should have been using master-master replication. With that setup, the database replication is working just fine.
There is an issue with images not showing up, though; I'll have to look into that a little more. When images are posted to the main site and the moodledata and html directories are rsynced, the backup site shows the image placeholders but does not display the images.
The biggest risk is getting into a split brain situation where one node in the replication set loses communication with the other, or the replication becomes laggy. If you have some frontends interacting with one node, and some with the other, then there's a chance that you may get into a situation where two inserts are made at the same time (or deletions) but these aren't replicated immediately and you have to manually resolve the conflict. On a busy site, this is a real risk.
Don't use multi-master unless you *really* **really** have to.
Thanks for the word of warning!
I think I'm okay in my situation; the 2nd system isn't used by any users, it's merely for emergency failover. While I know it's not the ideal failover situation, it's the best option given the constraints of the organisation I have to work with.
The only time the 2nd server will be used by end users is if the main server goes down, I'll have to manually make the switch and then deal with the differences in the database when the main server gets repaired. I'm also making daily backups of the database, so I think I'm pretty well covered. We just needed an option that would keep Moodle up in the event of a hardware failure.
Looks like the warnings were right - master-master creates database errors left and right.
I started with master-slave and then went to master-master because the first setup caused duplicate key entries. The second database is on a duplicate server running Moodle, so when I went to access that server to verify the replication, it would create errors. I guess I just need to set up master-slave with a non-live database and check it on occasion to make sure it's replicating.
I'm still in test phase, so it's not been a critical issue. I appreciate the input from you guys!
I'd really like to get a true failover setup, but I'm limited on what I can do in terms of getting IPs setup and I only have the two boxes to work with.
This may seem like a relatively scary prospect, but have you considered moving to Postgres? Postgres doesn't support multi-master (yet), but it does support streaming replication from Postgres 9.0 onwards. We, like many of the other large Moodle installations, prefer postgres over MySQL for a plethora of reasons. It's highly scalable and pretty easy to tune (see the Postgres Tuning Wiki for initial pointers). It also cares about your data far more than an out-of-the-box MySQL installation does, and a well tuned postgres installation is a very good match for a well tuned MySQL installation (ignore any performance guidelines which say to switch to MyISAM - it's not ACID compliant).
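As a rough sketch of what streaming replication involves on Postgres 9.x (hostnames, versions and paths here are assumptions, and this isn't a complete recipe - a base backup and a pg_hba.conf replication entry are also needed):

```shell
# On the master (postgresql.conf): allow WAL streaming to standbys
cat <<'EOF' | sudo tee -a /etc/postgresql/9.1/main/postgresql.conf
wal_level = hot_standby
max_wal_senders = 3
EOF
# On the standby, after taking a base backup: enable read-only queries
# (hot_standby = on in its postgresql.conf) and point recovery.conf at the master
cat <<'EOF' | sudo tee /var/lib/postgresql/9.1/main/recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=db-master.example.com user=replicator'
EOF
```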
If you do go down the postgres route, I'd also recommend looking at pgpool-II as a connection pooler - this reduces the connection overhead to your database.
With streaming replication in Postgres, you can perform read-only queries on the slave, and read-write queries on the master. Later versions of pgpool-II can send non-transactional read-only requests to your slave (if it's hot enough) and write or transactional requests to your master to help spread the load.
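A minimal pgpool-II sketch along those lines (hostnames are illustrative; see the pgpool-II documentation for the full parameter set):

```shell
# Illustrative pgpool.conf fragment: pool connections, and load-balance
# read-only queries across master and standby in streaming replication mode
cat <<'EOF' | sudo tee -a /etc/pgpool2/pgpool.conf
backend_hostname0 = 'db-master.example.com'
backend_port0 = 5432
backend_weight0 = 1
backend_hostname1 = 'db-slave.example.com'
backend_port1 = 5432
backend_weight1 = 1
num_init_children = 32
max_pool = 4
load_balance_mode = on
master_slave_mode = on
master_slave_sub_mode = 'stream'
EOF
```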
Others on these forums have been experimenting with MySQL Cluster server, but I'm not sure of its capabilities or how they're getting on -- perhaps they may like to post with their experiences.
I know that this is a little late for responses, but have you considered using keepalived rather than heartbeat for IP failover on your load balancers?
keepalived is a little less STONITH and more about VRRP priority, but one of the main benefits for your setup is that you can have a pair of VIPs, with each having priority on a different load balancer. If either load balancer fails its health checks, the VIP will swing over to the other load balancer automatically and users shouldn't notice any difference.
You could easily have each load balancer configured to give the virtual server(s) on the same host a higher weighting such that they're used in preference to the second server but all servers can still be used.
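A sketch of one side of such a keepalived pair (interface, router IDs and addresses are all placeholders); the second balancer would carry the mirror-image priorities so each VIP normally lives on a different host:

```shell
# Illustrative keepalived.conf on load balancer A: MASTER for VIP1, BACKUP for VIP2
cat <<'EOF' | sudo tee /etc/keepalived/keepalived.conf
vrrp_instance VIP1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 150
    virtual_ipaddress {
        192.0.2.10
    }
}
vrrp_instance VIP2 {
    state BACKUP
    interface eth0
    virtual_router_id 52
    priority 100
    virtual_ipaddress {
        192.0.2.11
    }
}
EOF
```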
You may also want to look at haproxy rather than nginx for the load balancing. Although it doesn't terminate your SSL connections, it does load balancing very well. If you do discover that session stickiness is important, you can direct sessions based upon client IP hashing, which should (I believe) work consistently between two load balancers -- this isn't strictly required for Moodle.