New setup questions

by Chris Cormier -
Number of replies: 16

Hi All

I am new to Moodle and have been trying to google a lot of info lately, and I feel like I am falling further and further down a rabbit hole where every new link I open leads me to open three more links trying to get info on what I just read. Seems never-ending lol. Because of this I was hoping some very kind individuals could help point me in the right direction.

Some background - we have been using Rackspace as the infrastructure on our projects for a while, and we will set up our Moodle instance there. In fact we already have a low-priority single-server setup done, but we are looking for something much more robust. The current plan is as follows:

1 Load Balancer

2 Web servers

1 High availability cloud database with auto failover

1 File server with an additional server acting as a mirror for a manual failover.


Some questions I have come up with as I was looking into things:


1 - I have seen terms like memcache and clustering thrown around. Do I need to do anything special given I have 2 web servers behind a load balancer? I understand memcache is not an absolute requirement but can speed things up. If I did want to use it, would it be installed on its own server? Does load balancing the servers require me to do anything special to handle sessions, or is that handled by the load balancer?

2 - I believe I am doing it right by having one file server that connects to both web servers so they share the data. Is there a way to configure it to use two file servers? Right now this is the weak point in the architecture. Is this where clustering comes in? Is there some option in a config to specify "use file server X but fail over to Y if X is not available"?

If anyone has any type of getting-started documentation for a setup like this, please pass it along. I don't believe my setup is too complicated, but it is beyond a single-server deployment. I'm just hoping not to run into any surprises, and the more reading I do the more questions I have.

Average of ratings: -
In reply to Chris Cormier

Re: New setup questions

by Paul Verrall -

Hi Chris,

First, if you have not already, go read this -> https://docs.moodle.org/29/en/Server_cluster

Now as you have gathered this is a complex subject so I'll try and keep things brief and hopefully get you going in the right direction.

In my mind 'clustering' is a vague term just meaning 'a group of computers working together to perform a task'. With this in mind, your combination of servers is already a Moodle cluster. The definition can also be applied to any one of the component services used to build our Moodle cluster (database, file serving, caching service). Commonly these services are 'clustered' in order to give better performance and/or more resilience.

Now your questions,

Do I need to do anything special given I have 2 web servers behind a load balancer?

YES! For Moodle to work on multiple web servers they must share some information, like sessions and cache data. They must also share the same file store, which must use a filesystem that is shared and aware of file locking. The doc linked above covers these points well.

If I did want to use [memcached], would be installed on its own server?

You can use memcached to store sessions and (being sure to use a separate instance) the Moodle application cache (MUC). It MUST be available to all your web servers, so it makes sense to run it on a separate instance. To add complexity, memcached can itself be clustered, i.e. you can use more than one memcached server, with the clients distributing the cached data across them.
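For reference, pointing Moodle's sessions at memcached is a few lines in config.php. A minimal sketch - the setting names are the ones documented in config-dist.php, but the host and port values here are assumptions for illustration:

```php
<?php
// In config.php: store PHP sessions in memcached instead of the database.
// Adjust the host/port to wherever your memcached instance actually runs.
$CFG->session_handler_class = '\core\session\memcached';
$CFG->session_memcached_save_path = '192.168.0.10:11211'; // sessions-only instance
$CFG->session_memcached_prefix = 'memc.sess.key.';

// The MUC application cache should use a SEPARATE memcached instance
// (e.g. the same host on a different port such as 11212), added as a
// cache store under Site administration > Plugins > Caching > Configuration.
```

The same config.php must be readable by every web server behind the load balancer so they all use the same session store.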

Without a shared memory caching service like memcached your webservers must share their cached data through the shared file system and this is inevitably MUCH slower.

Is there a way to configure it to use two file servers?

Ish... The bottom line is your file servers MUST be consistent and aware of file locking on each other. This is beyond the Moodle config and needs to be a clustered service of its own; this is where things start to get really complicated!

The main options in this area are GlusterFS and Ceph, but there are other options too, like OCFS2 if your data is local to the web servers, or you could serve it over NFS. It could be that Rackspace already offers this as a managed service, like the resilient DB?

Average of ratings: Useful (5)
In reply to Paul Verrall

Re: New setup questions

by Chris Cormier -

Hi Paul


Thanks so much for taking the time to answer my questions. You've definitely started to make things clearer for me.

If I already have a separate server for files is it a terrible practice to use that server for memcache as well?

At what point (size) do you start to see issues with just using a file system and not something like memcache?


In terms of disaster recovery - what kind of impact would losing the memcache have (assuming content and DB are OK)?

In reply to Chris Cormier

Re: New setup questions

by Paul Verrall -

Chris,

Using an existing server will work just fine for memcached. Remember you will need two instances started on different ports if you want to use it for sessions and MUC. 

The issue with using the filesystem for cache is that it is inherently slower than RAM, especially when it is shared over a network. Assuming your filesystem can handle the IOPS, it should scale OK without using memcached, but page loads might be just that little bit slower. As ever, there is no substitute for testing: you can easily enable and disable memcached and judge the difference it makes on your setup.

Nothing of consequence is stored only in the caches, so there is no risk of data loss in a disaster recovery situation. You do need to be aware that memcached data is not persistent, so if you use it for sessions a restart will log out all your users. This is also why MUC and sessions should not share the same memcached instance.

Average of ratings: Useful (2)
In reply to Paul Verrall

Re: New setup questions

by Chris Cormier -

Ok, noted. I guess what we will do is start without memcache, see how things perform, and add it if/when we need it.

As far as the file server goes, that is the only server for which I don't have automatic redundancy. I know you mentioned in your original post that this is where things get complicated and there are some options outside of Moodle. With the idea of trying to keep things a bit simpler, at least at the start, are there any options for a beginner that are a bit less complex? Like having a hot spare that is synced at 5-minute intervals that we can swap in if there is a problem?

In reply to Chris Cormier

Re: New setup questions

by Mathew Gancarz -

Hi Chris, if you can tolerate a little bit of downtime at times, I would say you may want to consider a non-clustered, single web server and file server. If you don't have automatic failover for the file server, then I don't see much point in having automatic failover for the web server, as without its file store, Moodle isn't usable.

I would say your simpler solution is to have a hot spare of the web and file server that is rsynced on an interval as you mention.

For your concurrent user counts, is it closer to 50-100 users a day or closer to 50-100 users an hour or a minute? You can get away with fairly light hardware for the first two cases.

Average of ratings: Useful (2)
In reply to Mathew Gancarz

Re: New setup questions

by Chris Cormier -

Thanks for the suggestions. I discussed it with my team and I think we will look at this setup a bit more - a web server with a hot swap. The main negative of this approach is that should we ever publish a high-demand course that temporarily requires more resources, we can't just spin up a few more servers to handle the load. So, in a last-ditch effort to discuss my original setup, do the following options on my load balancer fix anything:

- Ability to have session persistence so a user will remain on one server.

- Ability to put content caching on the load balancer (files are stored on LB for 10 minutes)

Would either of those options make it simpler to have multiple web servers (thinking of my original plan, except with a hot swap of just the file server)?

A few further questions:

With your proposed plan do you even use a separate file server or would everything just go directly on the web server?

I assume you would/could still use a memcache server if you wanted to speed up performance?

For file system at rackspace I have a choice of SATA or SSD. I'm assuming SSD is the best choice but is there a way to estimate how much of an impact that would have?

It was explained to me that in a disaster recovery scenario we would be OK without an up-to-date live copy of the cache or sessions. If that is the case, what is the issue with needing them to be shared across two servers?

When you discuss fairly light hardware, can you define a bit more what that is from a RAM standpoint? 2GB? 4? 8?


We were planning on using a CDN for the data as we will have worldwide connections. Should I assume this would be easily worked into the plan and nothing we discussed would change?

In reply to Chris Cormier

Re: New setup questions

by Andrew Lyons -

Hi Chris,

In answer to your questions:

  1. Typically, and unless you've configured them otherwise, sessions are stored in the database. This means that there is no issue with session persistence when using a load balancer;
  2. I'm not sure how you'd achieve this, and it would depend on your load-balancing solution, but it is inadvisable: content being accessed would then bypass security checks. As an alternative you can use X-Sendfile to serve the files - search for xsendfile in config-dist.php. In the past, I've used nginx as an SSL terminator, and then haproxy as a software load balancer distributing to Apache backends. In this situation, we enabled xsendfile support in Moodle and had files served straight from the shared file store by the nginx SSL terminator. There is still a massive benefit to doing this after load balancing: it takes the load of serving files away from PHP and gives it to the web server (which is designed for this purpose). We also implemented an nginx in-memory cache for this content - this was only used for small files, and the request still ran through Moodle first.
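The xsendfile settings live in config.php; a minimal sketch for an nginx front end (the setting names come from config-dist.php, while the alias path is an assumption for illustration):

```php
<?php
// In config.php: let the web server, not PHP, stream file content.
// nginx uses the X-Accel-Redirect header instead of X-Sendfile.
$CFG->xsendfile = 'X-Accel-Redirect';
$CFG->xsendfilealiases = array(
    // Maps an nginx "internal" location onto the shared Moodle file store,
    // so Moodle still performs the security checks before handing off the file.
    '/dataroot/' => $CFG->dataroot,
);
```

The matching nginx `location /dataroot/ { internal; alias ...; }` block is configured on the web server side.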

Running with multiple web servers and only a single file server is a perfectly acceptable solution. This will still help you spread the load of clients, and enable you to patch your user-facing services without downtime. It's definitely something that is worth considering now, and personally something that I would do. It also means that you can scale more cheaply with virtualised infrastructure - adding many small nodes to the system as demand rises rather than a single mammoth system. On a previous university system I worked on, I think we ran with 4GB RAM and 2 vCPU on VMware.

The other thing to consider is that you can have a fault-tolerant layer at both web and database, but only a high-availability layer in the file system. This reduces downtime for most types of failure, and gives you the flexibility to work on the file system without a complete or extended outage. For example, with services such as DRBD and other similar clustered filesystem services, your data is replicated across multiple servers in a fault-tolerant manner, but only one server is typically live. Switchover time from one to another is minimal (normally less than 10 minutes). In Gluster you can have multiple servers serving your content. In both situations you're able to take nodes down and patch/upgrade/etc. them relatively easily and with minimal, or no, downtime.

For the file system it's hard to dictate what kind of disks to go for. This all depends on budget, anticipated space requirements, anticipated filesystem requirements, number of disks, speed of disks, type of controller, etc. SSDs will be much faster, but are more expensive. Without running benchmarks, it's impossible to say what the difference will be. Maybe worth asking the vendor for trial hardware?

Using a memcache server is optional. I'd recommend getting everything else going, and then having a look at the optimisations that you can make with memcached and friends. Memcache is not the only store: there are those listed in the plugins repository, and a few others around (like this Redis one by Sam Hemelryk).

This was explained correctly - the cache will be rebuilt automatically, but it must be shared. If it is not shared then some servers will have a different view of cached content and will serve different content to the same and different users. Some of the things which are cached change, and the cache is capable of invalidating itself when required. Bits of the cache are also used to generate temporary content which is accessed between requests - one example of this is backup and restore.

In the past, for Lancaster University, our setup was something along the lines of:

  1. Load balancers x 2(nginx SSL termination + haproxy load balancing): 2GB RAM, 2vCPU each;
  2. Web servers x 5 (Apache): 4GB RAM, 2vCPU each;
  3. Database servers x 2 (Postgres in master/slave HA setup): 10GB RAM, 4vCPU each.

Our file system was provided by the central university SAN and was a fault tolerant, highly available system with automated failover. It was accessed over NFS and we had no control over it.

What kind of data are you storing on the CDN?

Hope this helps,

Andrew

Average of ratings: Useful (2)
In reply to Mathew Gancarz

Re: New setup questions

by Paul Verrall -

I think rsync is a very poor solution to this problem and will become more of an issue as the file store grows over time, maybe even to the point that rsync would not finish one cycle within an acceptable time frame. I could not reasonably advise anyone to go down this path.

IMHO a better (best?) solution for a hot spare replicated filesystem would be DRBD.

Average of ratings: Useful (2)
In reply to Paul Verrall

Re: New setup questions

by Andrew Lyons -

Yes - I second what Paul says. DRBD is an excellent tool.

I would specifically avoid rsync for this kind of task. It is slow, can add huge IO bottlenecks to a system, and is not fault-tolerant. If you have a bad disk cluster and a read brings back corrupted data, that corrupted data is replicated!

I have seen at least one really, really terrible disaster because of rsync.

Andrew

Average of ratings: Useful (1)
In reply to Andrew Lyons

Re: New setup questions

by Chris Cormier -

First, a big thank you to both of you for taking the time to work me through this. It is definitely very appreciated.

So it seems that what is stopping me from doing what I really want (multiple web servers with a separate DB server) is trying to find a way to create either a high-availability file system or a cluster that will have true redundancy. Like I mentioned in a previous post, if I was working with physical hardware I would probably make use of a SAN with RAID and be off and running. With cloud hardware I have fewer options.


So let's say I were to go back to the plan of having a separate web server. When we talk about files, I assume that we're talking about only the cache and the content that is uploaded by us, correct? Or is there something else that I need to be concerned with? I assume most data changes are done in the DB. If that is true, obviously we can control the content part and make sure we have backups that can be restored quickly. Based on the previous post, it is better to have the cache persist, but in a worst-case scenario the cache can be rebuilt (I assume without too much of an impact). While this situation might not provide the true redundancy I am looking for in terms of an automatic failover, it at least gives us a way to manually get things back on track with small downtime. Thoughts? Or am I missing something? Is there other data I need to be concerned about?


As for the DRBD suggestion - I am definitely looking at it. My worry is that I already have so much to learn for Moodle that trying to implement DRBD at the same time will increase the startup time by a decent factor. I would look to do something for this as a second phase, if you guys agree that my plan above could work if we can live with the shortcomings.

On the subject of DRBD-type solutions - in your experience is that the best/easiest one to set up for someone in my situation? I came across the links below and I think they would help me out:

https://developer.rackspace.com/blog/clustered-storage-on-rackspace-opencloud/ - slightly outdated but specific to Rackspace

http://justinsilver.com/technology/linux/dual-primary-drbd-centos-6-gfs2-pacemaker/ - specific to Rackspace but uses CentOS when I planned on using Ubuntu

In the end maybe it doesn't seem so bad to setup. However please let me know about my data questions above (and at least temporarily using an image as a manual failover assuming the data outside of the cache will only be created by me) because I would prefer to attack this as a second phase.

As for the CDN - The data will be the actual static content (the flash streaming content). Rackspace allows us to setup a pull CDN so as things are requested they will be pulled down the edge node and then served from there going forward until the content is updated on the source and the TTL expires.

Average of ratings: Useful (1)
In reply to Chris Cormier

Re: New setup questions

by Andrew Lyons -

Hi Chris,

Glad to be of assistance so far :)

I still think that you should be looking at a system with multiple web servers as you originally asked.

When we talk about files, we talk about several things:

  • The Moodle codebase - this is generally best stored directly on the web servers for improved caching and reduced latency. You do need to consider how you intend to keep this in sync across all web servers;
  • The Moodle dataroot - this is where the site's data is kept. This includes the Moodle filedir (uploaded content), the temp dir, the cache dir, and others. This does need to be shared across all web servers; and
  • The Moodle localcache directory - this can (and should) be stored locally on each server.

So the only thing you need to really consider is the Moodle dataroot. As you suggest, data changes are persisted to the database, not to file.
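As a sketch, the shared-vs-local split above maps onto two config.php settings; the paths here are assumptions for illustration:

```php
<?php
// In config.php on each web server: shared vs. node-local storage.
$CFG->dataroot = '/mnt/shared/moodledata';  // on the shared filesystem, identical on every node
$CFG->localcachedir = '/var/cache/moodle';  // on local disk, private to each node
```

If `localcachedir` is not set, Moodle defaults to a localcache directory inside the dataroot, which puts avoidable traffic on the shared filesystem.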

The cache is safe to throw away at any time - there is no benefit to backing it up, but it must be shared between all nodes. It will automatically regenerate itself, so there's no manual task to restore the cache. Basically it fills up as things are requested - so the first time somewhere in the code calls get_string(), it fills the string cache.

With regards to DRBD, from a very quick glance the article you posted from Rackspace looks like exactly what you're after. The second link is relevant too. I've always been a Debian man myself, but I don't imagine that much has changed between Ubuntu 12.04 and Ubuntu 14.04 with regard to most of that article. It looks like about half of it is specific to Rackspace (e.g. the nova commands), whilst the other half is relevant to Ubuntu. It should be pretty easy to find an updated version of most of that information.

Paul and I are both fans of DRBD - he still uses it in a number of high-concurrency environments and the maintenance of it has been relatively low-key.

For your setup, you can of course start with a non-replicated file server. This isn't the end of the world, but the usual caveats apply:

  • make sure that you have working and tested backups; and
  • downtime of your file server node will take down your entire service.

Getting something working now without a replicated filesystem seems like a sensible approach to me, but just be aware that it is possible to have some data loss if a server has a hardware failure, and your last backup was several hours ago.

I'm not sure what you mean when you say "and at least temporarily using an image as a manual failover assuming the data outside of the cache will only be created by me". Content in the filedir is created by all users. For example, students can upload files to forums, text editors, their profile image, their private files, assignments, etc.

Hope that this helps,

Andrew

Average of ratings: Useful (3)
In reply to Andrew Lyons

Re: New setup questions

by Chris Cormier -

I agree that I would like to set up in a way that allows for multiple web servers. If I think I can get the file server to be redundant to some extent, that is what I will look for.

For the purposes of the remainder of this discussion let's assume the following:

  • Let's not concern ourselves with the codebase. I know it needs to be in sync and we will work on that, but that isn't the big problem.
  • The localcache will be stored on each server.
  • The moodledata root will be shared across all servers. The question is how to make this redundant. Below I will outline my plan in two phases. Please let me know if you think I am missing something.

First Phase

So one thing we have to get out of the way is how we will set up Moodle. I must apologize for not being more clear about this at the beginning, but it is because I wasn't aware of some things. I am just the infrastructure guy, while someone else is in charge of actually doing all the Moodle config. I spoke to him and confirmed that the system will not have forums, user uploads, etc. I was told the content we need to worry about is the files that go into the web directory (the actual flash content that will run the courses) and the files uploaded through Moodle by the system admins, like the manifests and resource files that will be part of the course (I hope I am using the proper terminology to make things clear).

Given the above, I know the system might not be perfect, but do you think it is conceivable, given that we control all uploads, to use a server image as a hot-swap backup? So in the event that the file server has a problem, we would spin up a new server from the image and put it in its place. If any files had been uploaded between the backup and now, we would manually upload them. There would be downtime while the new server is created, but the downtime wouldn't be excessive.

The reason for this first phase is to make sure we at least have something up and running now without having to wait to make sure we can setup a clustered file server before moving forward. My concern is I still might be missing files of importance that can play a role. Some specific questions I have:

  • Are there other files of critical importance that a potential day-old backup would impact?
  • How would the system behave if a file uploaded via the Moodle CP (which I assume would create a DB entry) was later missing from the actual disk? Obviously it would throw an error if you tried to access it, but would it be enough to effectively just redo whatever work we did between the last backup and now?

Phase 2

Once we get something up and running in an acceptable way (to make sure we can move forward with our planned launch), I would look to move forward with the clustered file server. That would give me the time to learn and implement the solution. The idea would be to have two or more file servers in a master/master-type cluster, similar to the Rackspace link I posted before.

In reply to Chris Cormier

Re: New setup questions

by Visvanath Ratnaweera -
Hi

What is the kind of load you expect in terms of number of users, concurrent users and their activities? These things are explained in more detail in the documentation linked from the header of this forum.
In reply to Visvanath Ratnaweera

Re: New setup questions

by Chris Cormier -

I guess it is difficult to say anything with good precision. We actually currently have an LMS that is handled by a 3rd party, and we are moving to a setup in house. We will be slowly transitioning users over, so at its peak we may have 10K users, of which maybe 1K are active. Probably about 50-100 concurrent users.

In terms of activities, they will be split between running flash-based presentations (although we will be slowly moving to html5 presentations) and doing assessments.


Would this be considered a large deployment?


Please let me know if I didn't give you the info you are looking for.

In reply to Chris Cormier

Re: New setup questions

by Visvanath Ratnaweera -
Hi

That is what I suspected. For some unknown reason, many planners of Moodle sites make excessive estimations of future usage and have an even more excessive imagination of the hardware needed. Taking the worst-case scenario of all 100 of your concurrent users going through a synchronous on-line test (MC using the Quiz activity), a five-year-old server would handle that, provided you take a decent operating system and know how to do a neat installation.

For more general information read the documentation linked in the header of this forum. (Well, there are more in the header, hidden. Try to add a new discussion, then you'll see more, the advanced search, for example.)

Here are two specific examples:
- "Login 1000 student at a time on one quiz"
https://moodle.org/mod/forum/discuss.php?d=316736
(600 simultaneous on-line exam candidates on a single server)

- "10 Mini-PCs With Pre-Installed Linux"
https://moodle.org/mod/forum/discuss.php?d=312419
(A decent Moodle installation on a Raspberry Pi 2)
In reply to Visvanath Ratnaweera

Re: New setup questions

by Chris Cormier -

I appreciate the follow-up, but I think you're off on my assumptions.

First I should state that 100 concurrent users would be quiz + flash content (not sure if that makes a difference because one is worse than the other).

Using the first link you provided, I can see that Frankle Lee had 600 concurrent users with 2 CPUs, 32GB of RAM and RAID 5 drives. In his environment he has dedicated server(s), so he has the ability to shape his hardware exactly to his needs. In my example I am trying to use cloud hardware while also being redundant.

I suppose I could spin up a 32GB server and do the same, but I figured things would be better and much more redundant by running 2-3 servers with lower RAM (4-8GB), so if one drops it's not the only server. Running in a RAID 5 setup would also be faster than the HD option I was planning in the cloud. So again, if I spread people across more servers then different HDs are sharing the load.


A setup is very different when you have specific planned events vs needing to be open for 24/7 connections.