A question I have for the community: I am currently using a reverse proxy with sticky sessions and load-balancing settings in Apache to create a cluster for my Moodle instance.
I had to place the session store files in a local mount directory (instead of the shared mount where the content directory lives) for performance reasons.
Storing the session files on each machine's local mount means there is no single point of failure, I can scale out with more hardware, and I avoid putting pressure on the content directory. It is even good security-wise: if the content directory (which is on a shared mount between the machines) has an issue, the sessions will not be affected.
With this setup in place, I wanted to ask: is this the right way for me to be doing Moodle clustering? I would like to hear the community's opinion on how I have my infrastructure arranged.
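For anyone wanting a concrete picture, a setup like the one described above might look roughly like this in Apache (mod_proxy_balancer); the hostnames, route names, and cookie name here are made-up examples, not taken from any actual config in this thread:

```apache
# Sketch only: reverse proxy + load balancing with sticky sessions.
# Requires mod_proxy, mod_proxy_http, mod_proxy_balancer, mod_headers.

# Tag each response with a cookie naming the backend that served it,
# so subsequent requests stick to the same application server.
Header add Set-Cookie "ROUTEID=.%{BALANCER_WORKER_ROUTE}e; path=/" env=BALANCER_ROUTE_CHANGED

<Proxy "balancer://moodlecluster">
    BalancerMember "http://app1.example.com:80" route=app1
    BalancerMember "http://app2.example.com:80" route=app2
    ProxySet stickysession=ROUTEID
</Proxy>

ProxyPass        "/" "balancer://moodlecluster/"
ProxyPassReverse "/" "balancer://moodlecluster/"
```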
It is how we do it here at the OU, though we would prefer to have the sessions on our shared mount, and non-sticky sessions. It was not performance that stopped us doing this, but weird synchronisation issues. If one server handled one request, and then another server handled a different request almost immediately afterwards, then sometimes the second request did not see changes to the session made by the first request (but it was intermittent). So, sticky sessions are easier. The only downside is that if you have a lot of long-lasting sessions, the load can gradually get a bit unbalanced.
The reason I am not storing my sessions on the shared mount is that we didn't even need to run a load test to see the issue. We have a custom page in our Moodle instance that makes several AJAX calls. With only one user, when the user hit that page, 1 or 2 of the AJAX calls succeeded but the other 3 or 4 failed.
We verified that the problem was caused by session handling.
The issue we saw from debugging was a session locking issue.
So, to solve this, we stored the sessions in the local directory structure instead of the shared mount.
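For reference, with file-based sessions the location is controlled by PHP's session.save_path, so something like the following in php.ini on each application server points sessions at local disk (the path is just an example). Note that, as mentioned later in this thread, Moodle's lib/setup.php may override user-defined session settings, so a small core tweak can still be needed:

```ini
; php.ini on each application server (example path):
; keep session files on local disk instead of the shared mount
session.save_handler = files
session.save_path = "/var/local/moodle-sessions"
```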
Something you should look into...
We've recently done a lot of load testing on our Moodle Test system and discovered that, using an Oracle 10g database and 2 x dual-quad-core Intel apache servers, with sessions stored in the database, the database started having load issues at around 500 concurrent users (this was with a very specific set of test profiles of course). Sessions-in-database doesn't work so well in Oracle because one of the columns in the sessions table is a CLOB, and at 500 users, most of the database resources were taken up reading from and writing to this column. We switched sessions to the NFS-shared filesystem and were able to get up to around 850 users before we experienced performance problems, and those problems seemed to be related to the session files (high kernel wait times).
We rolled the two physical apache servers into production and for some reason, with sessions in the filesystem, performance is a lot worse than it was on our test system. We think it's because of all the 'things' monitoring whether apache is up on each server, things like load balancers, reverse proxy servers etc. (we didn't have nearly as much monitoring going on when the physical servers were part of our Moodle Test system). The kernel wait times have been high once again.
We are going to change the monitoring of the apache servers by having the load balancers/reverse proxy servers check that apache is up and can talk to the database, but avoid hitting Moodle itself so that a session is not created on each monitoring request. This should hopefully improve things.
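One way to do that kind of probe, assuming a recent Apache (2.4.5+) with mod_proxy_hcheck in front: point the health check at a lightweight standalone script that pings the database but never bootstraps Moodle, so no Moodle session is created per probe. All names below (hosts, script name, intervals) are illustrative:

```apache
# Sketch only: active health checks via mod_proxy_hcheck, probing a
# hypothetical standalone healthcheck.php instead of a real Moodle page.
<Proxy "balancer://moodlecluster">
    BalancerMember "http://app1.example.com:80" hcmethod=GET hcuri=/healthcheck.php hcinterval=10
    BalancerMember "http://app2.example.com:80" hcmethod=GET hcuri=/healthcheck.php hcinterval=10
</Proxy>
```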
I thought of saving sessions locally to each server, but given our Moodle session timeout is 1 hour, it would take a lot of flexibility away from us in that we couldn't so easily take apache servers out of the load-balanced pool any more (we'd have to disable the server in question, leaving existing sessions going to it but not allowing any new ones, and leave it up for an hour before we could take it down). Also, if a server failed, everyone logged into that server would of course have to log in again.
Ali, how are you storing the session files on the local filesystem? By symlinking the 'sessions' directory in the Moodle userdata NFS share to a local directory, or another method? Do you attempt to synchronise the sessions between application servers?
Martin Langhoff mentioned in another thread that it is possible to hash the directories that session files are stored in, but I can't find any other reference to that anywhere... anyone know anything about that?
We are supposed to be moving back to sticky sessions and local disk, which should solve this problem again and improve performance (yes it will introduce a problem if removing a server from the pool because we lost power to a computer room or whatever, but that doesn't happen all that often).
On our system, because of the auth plugin we use integrated with university SSO, users don't actually have to log in again even if Moodle trashes their session (it will just make another one); the only problem is if you are e.g. in the middle of writing a forum post, when the session key will now be wrong. We get sporadic reports of this (forum post failing for session key) at present due to session corruption; there are a few other less obvious symptoms.
Idea that would involve coding: I think the only information that's really needed to recover a session in most cases is the session key and user id (there are other things that you lose if you lose a session, but most of that's not frequently used or the consequences of it disappearing are not very serious). A possible way to improve session performance while not losing all data in the event of server failure might be to use local sessions, sticky, but at the start of each session, store the session cookie and its relationship to session key [and, when authenticated, user id] in the database. That way, if a server goes down and another server picks up the user, that other server can initialise their session from this information. Would need serious security consideration though...
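A rough sketch of what that recovery table might look like; the table and column names are invented for illustration, not from any Moodle schema:

```sql
-- Hypothetical recovery table: maps the session cookie value to the
-- minimum state needed to rebuild a session on another server.
CREATE TABLE mdl_session_recovery (
    sid          VARCHAR(128) NOT NULL PRIMARY KEY, -- PHP session cookie value
    sesskey      VARCHAR(10)  NOT NULL,             -- Moodle session key
    userid       BIGINT       NULL,                 -- set once authenticated
    timemodified BIGINT       NOT NULL              -- for expiring stale rows
);
```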
The way I am storing sessions on the local filesystem is with a little tweak in the codebase to store the files on the local machine (rather than in the content directory, which is inside the shared mount), combined with sticky sessions. I have a web layer that acts as a load balancer and reverse proxy to my application layer, which hosts my Moodle site. The web layer has 2 machines, each running its own Apache web server, both configured as a load balancer and reverse proxy with sticky sessions. The application layer has 4 machines, each hosting its own Moodle. All 4 Moodle instances point at one shared mount for the content directory, but each instance stores its own sessions locally. Again, this is good for scaling out, it avoids putting pressure on the content directory (which is where the file sessions get stored by default), and there is no single point of failure. It is also faster, for the reasons I explained in my previous posts.
Let me know how your setup goes
With sessions in the userdata NFS-shared filesystem, if I run 'iostat -xn 2' on our apache servers, there are intermittently very high numbers (2000-4000) in the 'ops/s' column, but virtually nothing in the 'rops/s' and 'wops/s' columns. Concurrent with this is a general performance hit for our users. I can't find much out there about Linux 'iostat -n', so I can only conjecture...
I'm suspecting the high 'ops/s' is because PHP session garbage collection is happening on all three of our apache servers (we have one 'admin' server that isn't in the load-balanced pool), so all three apache servers stat the sessions directory in the NFS share, which is very slow. Could be totally wrong about that though... if I'm right about that, the reason we didn't see it in our testing in the Moodle Test system was because I cleared out all the session files after every test, and we only ran the tests for half an hour, where the session timeout was (at the time) 2 hours.
I've hacked /lib/setup.php a bit so that it's able to take heed of user-defined PHP session configuration (e.g. session.gc_probability). I can't test it at the moment but I will tomorrow. My idea is that I'll only have the admin server doing session garbage collection and leave the two 'live' apache servers to just create and use sessions. Perhaps even having one of the three servers that mount the NFS share doing a directory stat will be enough to slow down the entire system, I don't yet know.
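Assuming the tweak lets those settings through, the garbage-collection split could look something like this (the values are examples; gc_maxlifetime here matches a 1-hour session timeout):

```ini
; php.ini on the two live apache servers: never run session GC here
session.gc_probability = 0

; php.ini on the admin server: run GC on roughly 1% of requests
session.gc_probability = 1
session.gc_divisor     = 100
session.gc_maxlifetime = 3600
```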
If my fix outlined above doesn't make any improvement, I'll investigate hashing the PHP session save path -- I see it's natively possible within PHP (with 'session.save_path = N;/path'). Might do that anyway.
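One wrinkle with the 'N;/path' form worth knowing up front: PHP does not create the hashed subdirectories itself, so they must exist before any session is saved. A minimal sketch of pre-creating them, assuming depth 2 and the default session.hash_bits_per_character = 4 (i.e. 16 possible characters, 0-9 and a-f, per level); the paths are examples:

```shell
# php.ini would point PHP at the hashed tree, e.g.:
#   session.save_path = "2;/var/local/moodle-sessions"
# Pre-create the 16 x 16 = 256 leaf directories PHP expects.
base=$(mktemp -d)          # stand-in for /var/local/moodle-sessions
chars="0 1 2 3 4 5 6 7 8 9 a b c d e f"
for a in $chars; do
    for b in $chars; do
        mkdir -p "$base/$a/$b"
    done
done
echo "created $(find "$base" -mindepth 2 -type d | wc -l) leaf directories"
```

At 5 hash bits per character (32 characters per level) and depth 3 this becomes 32^3 = 32,768 directories, which matches the figure mentioned later in the thread.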
find /<moodle-sessions-dir> -type f -mmin +60 | xargs rm -f
I'm using session hashing 3 levels deep, which for PHP5 results in 32,768 unique session directories. It's a lot quicker than when all session files were saved in one flat directory, but there is still the odd slowdown, so I'm going to save session files on each apache server's local hard disk and rsync them between apache servers, running the deletion cronjob on each apache server every 5 minutes. Also, I'll code the rsync into the apache service script so that it happens on apache shutdown, so that sessions will exist on other apache servers when apache is shut down on one server.
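As a sketch, the replication plus cleanup described above might look like this in cron; hostnames and paths are made up, and note that rsync's --delete would be dangerous here (it would remove sessions created on the peer), so this only adds and updates files:

```
# crontab fragment on each apache server (hypothetical paths/hosts)
# every 5 minutes: copy new/changed local session files to the peer,
# skipping any file that is newer on the receiving side...
*/5 * * * * rsync -a --update /var/local/moodle-sessions/ app2:/var/local/moodle-sessions/
# ...and delete sessions older than the 1-hour timeout
*/5 * * * * find /var/local/moodle-sessions -type f -mmin +60 -delete
```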
Isn't hashing in general slow in PHP?
By 'hashing in general' do you mean like the sha1 function? I don't see a reason why that would be slow, and haven't noticed any particular performance problems connected to it. Do you have a performance test that demonstrates any problem?
sha1 is really fast in java (I tested it at one point for another project); in PHP I assume the function is implemented in C so should be approximately the same speed, or possibly slightly faster depending on platform and C compiler used. Of course this assumption might not be correct.
I tested sha1 and md5 performance in PHP one time, and it was really fast. I can't remember the numbers now, but it was the kind of thing where that was never going to be the performance bottleneck in Moodle code.