Hi Chris,
In answer to your questions:
- Typically, and unless you've configured them otherwise, sessions are stored in the database. This means that there is no issue with session persistence when using a load balancer;
- I'm not sure how you'd achieve this, and it would depend on your load balancing solution, but it is inadvisable because the content being accessed would then bypass Moodle's security checks. As an alternative you can use X-Sendfile to serve the files - search for xsendfile in config-dist.php. In the past, I've used nginx as an SSL terminator in front of haproxy, which acted as a software load balancer distributing to Apache backends. In that setup we enabled xsendfile support in Moodle and had files served straight from the shared file store by the nginx SSL terminator. There is still a massive benefit to doing this even though it happens after load balancing: it takes the work of serving files away from PHP and gives it to the webserver, which is designed for this purpose. We also implemented an nginx in-memory cache for this content - this was only used for small files, and the request still ran through Moodle first.
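As a rough sketch of the xsendfile settings mentioned above, something like the following could go in config.php. It assumes nginx is the server that finally sends the file (as in the setup I described), so it uses the nginx header name. Treat the alias as an example and check config-dist.php for your Moodle version.

```php
// Excerpt for Moodle's config.php, not a complete file.
// Assumes $CFG and $CFG->dataroot are already set earlier in that file.

// Header that tells the front-end server to send the file itself.
// 'X-Accel-Redirect' is the nginx variant; Apache with mod_xsendfile
// would use 'X-Sendfile' instead.
$CFG->xsendfile = 'X-Accel-Redirect';

// nginx works with URL aliases rather than filesystem paths, so map an
// alias onto the shared dataroot. '/dataroot/' is the alias name used in
// the config-dist.php example; yours may differ.
$CFG->xsendfilealiases = array(
    '/dataroot/' => $CFG->dataroot,
);
```

On the nginx side you would also need a matching internal location for that alias pointing at the shared file store. Moodle still handles the request and the access checks first, and only hands the actual file transfer to the webserver.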
Running with multiple web servers and only a single file server is a perfectly acceptable solution. It will still help you spread the load from clients, and it lets you patch your user-facing services without downtime. It's definitely worth considering now, and personally it's what I would do. It also means that you can scale more cheaply with virtualised infrastructure - adding many small nodes to the system as demand rises rather than running a single mammoth system. In a previous university system I worked on, I think we ran with 4GB RAM and 2vCPU per web node on VMware.
The other thing to consider is that you can have a fault-tolerant layer at both the web and database tiers, but only a high-availability layer for the file system. This reduces downtime for most types of failure, and gives you the flexibility to work on the file system without a complete or extended outage. For example, with DRBD and other similar services your data is replicated across multiple servers in a fault-tolerant manner, but only one server is typically live. Switchover time from one to another is minimal (it should normally be less than 10 minutes). With Gluster you can have multiple servers serving your content at once. In both situations you're able to take nodes down to patch or upgrade them relatively easily, with minimal or no downtime.
For the file system it's hard to dictate what kind of disks to go for. It all depends on budget, anticipated space requirements, anticipated filesystem requirements, number of disks, speed of disks, type of controller, etc. SSDs will be much faster, but they are more expensive. Without running benchmarks, it's impossible to say what the difference will be. It may be worth asking the vendor for trial hardware.
Using a memcached server is optional. I'd recommend getting everything else going first, and then having a look at the optimisations that you can make with memcached and friends. Memcached is not the only cache store: there are others listed in the plugins repository, and a few more around (like this Redis one by Sam Hemelryk).
This was explained correctly - the cache will be rebuilt automatically, but it must be shared. If it is not shared, then each server will have its own view of the cached content and may serve inconsistent content, even to the same user. Some of the things which are cached change over time, and the cache is capable of invalidating itself when required. Parts of the cache are also used to generate temporary content which is accessed between requests - one example of this is backup and restore.
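To illustrate the shared cache requirement, a config.php excerpt might look something like this. The paths are made up for the example, and I'm assuming a shared mount that every web node can see. Whether localcachedir is available depends on your Moodle version, so check config-dist.php.

```php
// Excerpt for Moodle's config.php, not a complete file.
// All paths below are hypothetical examples.

// These must be on storage shared by every web node (for example the
// NFS mount), so that all nodes see the same cache and temporary files.
$CFG->cachedir = '/mnt/moodledata/cache';
$CFG->tempdir  = '/mnt/moodledata/temp';

// Optional: a cache area that does not need to be shared, so it can sit
// on each node's local disk.
$CFG->localcachedir = '/var/local/moodle/localcache';
```

If you don't set these at all, they default to directories under your dataroot, which is fine as long as the dataroot itself is shared between the web servers.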
In the past, for Lancaster University, our setup was something along the lines of:
- Load balancers x 2 (nginx SSL termination + haproxy load balancing): 2GB RAM, 2vCPU each;
- Web servers x 5 (Apache): 4GB RAM, 2vCPU each;
- Database servers x 2 (Postgres in master/slave HA setup): 10GB RAM, 4vCPU each.
Our file system was provided by the central university SAN and was a fault tolerant, highly available system with automated failover. It was accessed over NFS and we had no control over it.
What kind of data are you storing on the CDN?
Hope this helps,
Andrew