I've considered renting some virtual servers, but I don't know whether we necessarily have to buy physical hardware with a special configuration.
What do you recommend?
Thanks in advance
In any case, I wouldn't trust a virtual server for that, at least not for the database part. Hardware is cheap these days, people's time is much more expensive.
Do we have to install the same application on multiple servers?
- everyone has their own definition of "concurrent users"
- the size of the courses matters a lot
- the type of "activities" people use is important
- different Moodle versions have different needs
- the software stack below it, operating system, webserver, database server, PHP engine, PHP accelerator, etc. are important
- and how they are tuned
- the hardware: RAM, CPU, the i/o performance
- the network link
- if still necessary, using techniques like clustering, load balancing, reverse proxy, etc.
There are very many detailed discussions in this forum. Go through them first.
The definition normally used here is 'users requesting something from the server at that particular point in time'. If all those users want to listen to a streamed audio clip at the same time, then you have a big challenge. However, if they are just going to look at some written resources, which they will download and read for a while, then this is very different. You will first need to profile what exactly the users will be doing and work out how often they will make page requests and then go from there.
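To make that profiling step concrete, here is a rough back-of-the-envelope sketch in Python. The function names and the numbers (300 readers, one click a minute, 0.2 s per page) are made up for illustration; plug in what you actually measure on your own site.

```python
def requests_per_second(active_users: int, seconds_between_clicks: float) -> float:
    """Average page requests per second generated by a group of users."""
    if seconds_between_clicks <= 0:
        raise ValueError("seconds_between_clicks must be positive")
    return active_users / seconds_between_clicks


def requests_in_flight(rate_per_second: float, seconds_per_page: float) -> float:
    """Average number of requests being processed at once (Little's law)."""
    return rate_per_second * seconds_per_page


# Hypothetical profile: 300 people reading resources, one click every 60 s,
# and a page that takes 0.2 s to generate.
rate = requests_per_second(300, 60.0)
print(rate)                           # 5.0 requests per second
print(requests_in_flight(rate, 0.2))  # 1.0 request in flight on average
```

These are averages only. Streaming, or everyone clicking at the same moment, breaks the assumption that requests are short and evenly spread.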
Ok, here's a quick guide. It does make me sad that I keep writing these notes and no one around here seems to know how to use Google.
So - step one: Use Google
I'm going to mention several important things that you must use Google to find more info about
Don't use virtualisation
Get real hardware. Virtualisation is for when you have too much HW and very little to do, so being inefficient is not a problem.
Clearly not the case here
Spend your $ and time on the DB server
The DB server is where performance happens, or not. You will want several cores, lots of RAM, fast, good networking gear.
My recommendation is to buy only stuff that has high quality open source drivers - some kit has only closed source drivers that work on specific versions of RHEL or SuSE. Say no to that. Newer (faster!) kernels will come out, and you'll be left running an old "certified" and slow kernel.
Use PostgreSQL and Linux in 64 bits, and get hold of a good DBA that understands how to tune PostgreSQL and Linux's disk IO for the database. I've posted quite a bit on this forum about Pg tuning, and there are some excellent guides out there on the net. Pg 8.3 is a speed monster.
Ask the DBA how many spindles to get for the DB system. A good DBA will want at least 2. 3 is better.
If you don't tune it, you'll probably only get 1/10th to 1/4th of the performance that the HW can deliver.
And! Only use a SAN for the DB if you can get guaranteed, assigned spindles.
Moodledata + sessions
I've done moodledata - with sessions in it - over NFS. Not so hot for high scalability.
But at one large install we saw the Oracle clustering FS for this role. Fantastic! I've posted a few times about it, and about GFS (GlobalFS, open sourced by RH).
If you have a SAN handy, GFS or OracleFS are both GPL'd and on all modern linuxes.
Avoid sessions on the DB, use hashed dirs
Your DB will be busy enough as it is. Sessions have high contention, and an ACID DB will spend a lot of effort in the dance around sessions.
Put the sessions on that GFS/OraFS SAN, on NFS or on memcached. If you put them in the FS, make sure you use hashed dirs (I've posted instructions on this in the past, also config-dist.php has some hints).
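For anyone wondering what "hashed dirs" buys you: instead of tens of thousands of session files in one directory, each file lands in a nested subdirectory picked from the first characters of its session id. The Python sketch below only illustrates that idea. The function name, the base path and the depth of 2 are my own assumptions, and the real layout is whatever your PHP session settings produce (as far as I know, the `N;/path` form of `session.save_path`).

```python
import posixpath


def hashed_session_path(base_dir: str, session_id: str, depth: int = 2) -> str:
    """Return the path of a session file stored under nested subdirectories
    named after the first `depth` characters of the session id."""
    if len(session_id) < depth:
        raise ValueError("session id is shorter than the directory depth")
    levels = list(session_id[:depth])  # one directory level per character
    return posixpath.join(base_dir, *levels, "sess_" + session_id)


# Hypothetical base directory and session id.
print(hashed_session_path("/var/moodledata/sessions", "a3f9c2"))
# /var/moodledata/sessions/a/3/sess_a3f9c2
```

With hex-like ids and a depth of 2, the files spread over a few hundred directories, so directory lookups stay cheap on the shared filesystem.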
I am personally not super happy with memcached because
- I'm old school, and memcached is a cache, not a storage tool. So it will lose data if restarted. There are some scripts that let it save the data.
- Last time I used it, it showed high TCP latency
- and the C version would crash if the daemon restarted...
- and the pure PHP version was a tad slow
but those are all minor issues.
Go cheap on the webservers
Work hard ensuring you have very low-latency on your network between the webservers and the DB server and moodledata/sessions storage.
And go cheap on the webservers - you can just buy more of them.
The most important thing on the webservers is memory. I get lots of RAM but go cheap on CPU and disk.
And understand how memory is used, and tune apache for the memory usage. I've posted a ton about maxclients, maxrequestsperchild and related tunables here. Search for my notes on http keepalive timeouts.
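The arithmetic behind a sane maxclients value is simple enough to sketch in Python. The function name and the figures here (4 GB box, 512 MB kept back for the OS and everything else, 40 MB of unshared memory per apache/php process) are assumptions for illustration; measure your own processes before trusting any number.

```python
def max_clients(total_ram_mb: int, reserved_mb: int, per_process_mb: float) -> int:
    """Largest number of apache/php processes that fit in RAM without swapping.

    per_process_mb should be the unshared (private) memory of one process,
    not its full resident size, since shared pages are only counted once.
    """
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    available = total_ram_mb - reserved_mb
    if available <= 0:
        return 0
    return int(available // per_process_mb)


# Hypothetical webserver: 4096 MB RAM, 512 MB reserved, 40 MB per process.
print(max_clients(4096, 512, 40))  # 89
```

Setting maxclients above what fits in RAM means the box starts swapping under load, which is far worse than making a few requests wait.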
One thing we didn't use to have was good info on how much memory each process was actually using -- the COW-related shared memory info was pretty opaque. But recent linux kernels are exposing what are called 'smaps' in /proc, and there are some nice utilities that aggregate that info. It's a bit of a moving target right now - but it gives excellent data on how much memory each apache/php process has that is shared and unshared.
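If you don't have one of those utilities handy, the aggregation itself is not hard. This is a minimal Python sketch that adds up the Shared_* and Private_* lines of a smaps file. The function names are my own, the sample text is invented, and the exact set of fields varies between kernel versions, so treat it as a starting point rather than a finished tool.

```python
import re

# Lines of interest look like "Private_Dirty:       300 kB".
_FIELD = re.compile(
    r"^(Shared_Clean|Shared_Dirty|Private_Clean|Private_Dirty):\s+(\d+)\s+kB$"
)


def summarize_smaps(text: str) -> dict:
    """Sum shared and private memory (in kB) over all mappings in smaps text."""
    totals = {"shared_kb": 0, "private_kb": 0}
    for line in text.splitlines():
        match = _FIELD.match(line.strip())
        if match is None:
            continue  # header lines and other fields are ignored
        key = "shared_kb" if match.group(1).startswith("Shared") else "private_kb"
        totals[key] += int(match.group(2))
    return totals


def summarize_pid(pid: int) -> dict:
    """Read /proc/<pid>/smaps (Linux only, needs permission to read it)."""
    with open(f"/proc/{pid}/smaps") as handle:
        return summarize_smaps(handle.read())


# Invented sample with a single mapping, just to show the output shape.
sample = """Size:                500 kB
Shared_Clean:        120 kB
Shared_Dirty:          8 kB
Private_Clean:         4 kB
Private_Dirty:       300 kB
"""
print(summarize_smaps(sample))  # {'shared_kb': 128, 'private_kb': 304}
```

The private total is the number that matters when you work out how many apache/php processes fit in RAM.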
Also - until recently the 64-bit 'webserver' toolchain was not stable - some precompilers acted up, some PHP extensions had trouble. There are advantages to staying on 32 bits for the webservers - including that your mem usage for PHP is smaller. These advantages are starting to disappear, so perhaps this is old advice.
Do use a good precompiler
It is a good idea to not get too many webservers. Get 2 or 3, and add more when you need them. The more you delay the buy, the more power your dollar buys!
Good tcp-level load balancer
LVS has been great for me. You'll also want to run a reverse proxy - varnish seems to be all the rage lately. Be careful with bad interactions between the LB, rev proxy and httpd.
If you set the LB to be very sensitive, things go to hell on a fast train when you start getting real traffic.
Monitor at the lowest level possible
Make sure you have iostat & vmstat installed, and are running sysstat. Learn to read all that gibberish and become a true Zen master.
Not kidding. Running sar is always the best way to see what's wrong.
But will this serve N concurrent users?
That's a silly metric
I tend to measure distinct users in 5 minute windows. The time window is the most important factor: a few times I've seen people claim "over 1000 users!" based on 1000 session files in the sessions dir. The sessions dir has a 4-hour session expiry, so that count says little about how many users are active right now.
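Counting distinct users per window is easy to do from any log that gives you a timestamp and a user id per hit. A minimal Python sketch, assuming you have already extracted (unix_timestamp, user_id) pairs from your logs; the function name and the sample data are invented, and it uses fixed windows rather than a sliding one.

```python
from collections import defaultdict


def distinct_users_per_window(hits, window_seconds=300):
    """Count distinct users in each fixed time window.

    hits: iterable of (unix_timestamp, user_id) pairs.
    Returns {window_start_timestamp: number_of_distinct_users}.
    """
    users = defaultdict(set)
    for timestamp, user_id in hits:
        window_start = int(timestamp) // window_seconds * window_seconds
        users[window_start].add(user_id)
    return {start: len(ids) for start, ids in sorted(users.items())}


# Invented sample: user "a" clicks several times but counts once per window.
hits = [(0, "a"), (10, "b"), (299, "a"), (300, "c"), (650, "a"), (651, "a")]
print(distinct_users_per_window(hits))  # {0: 2, 300: 1, 600: 1}
```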
With a good setup (might take a bit of $ on the HW ) and good tuning you can do 500 to 1000 unique users in a 5 minute window.
However, it depends what they are doing! Some modules are very light, some are very hard on the servers. Or it may depend on specific options - a simple quiz is lightweight, but you can cram a ton of hard-to-compute stuff on a mod/quiz page.
So you monitor carefully - even the same # of users will change the load on the servers over time:
- they'll use it more as they learn more about it
- they'll learn about new features and start using them - the new features could be heavier - or lighter!
- new versions of moodle will have performance improvements...
- and new modules and features that haven't been performance-tuned
And one trick for your users
If you think about how http is stateless, and how fast computers are, you quickly realise that you can have a few hundred users requesting something off your server in the same minute but they will rarely be concurrent. Think of it for a second -- as long as all the users are distributed over that minute, the webpages are created and served quickly so from the server's point of view, it's dealing with one requester at a time. Two or three, tops.
Dealing with "one or two simultaneous requests" at any time, it can still do hundreds of thousands of pages as long as they are distributed over time. That's the magic of the webservers - we serve billions of pages over a stretch of time with an infrastructure that is often modest.
Now, imagine a demo situation, or a big lecture hall with 300 students. The lecturer says "and now, we all click 'Search' ". And at that point, in the space of 200ms, 300 replies come. And pile up. Everything goes a bit to hell -- even if the server does not, it's likely that you'll hit some bottleneck in the middle.
So one trick I teach to tutors and lecturers is a bit of indirection. Instead of "and now we all click bla", they say "read the intro text, and when you are done, click bla". The requests get spread in a period of a couple of minutes, and everyone thinks "wow, this moodle thing is faaaast!"
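You can see the effect of spreading the clicks with a toy calculation. The Python sketch below counts how many requests overlap at the worst moment, assuming every page takes a fixed 250 ms no matter how busy the server is (which is optimistic for the burst case). The function name and all the numbers are made up for illustration.

```python
def peak_concurrency(arrival_times_ms, service_time_ms):
    """Largest number of requests being served at the same moment."""
    events = []
    for start in arrival_times_ms:
        events.append((start, 1))                     # request begins
        events.append((start + service_time_ms, -1))  # request ends
    # Tuples sort by time; at equal times the -1 (end) sorts before the +1
    # (begin), so a request ending exactly when another starts does not overlap.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# 300 students all clicking within 200 ms of each other.
burst = [(i * 200) // 300 for i in range(300)]
# The same 300 clicks spread evenly over two minutes (one every 400 ms).
spread = [i * 400 for i in range(300)]

print(peak_concurrency(burst, 250))   # 300
print(peak_concurrency(spread, 250))  # 1
```

Real students won't spread themselves perfectly evenly, but even a rough spread keeps the pile-up to a handful of requests instead of hundreds.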
This is a filtered list, leaving out "googling", flattening peaks by spreading out synchronous user activity and the discussion on concurrency:
- Don't use virtualisation
- Spend your $ and time on the DB server
- Use PostgreSQL and Linux in 64 bits, and get hold of a good DBA that understands how to tune PostgreSQL and Linux's disk IO for the database.
- Moodledata + sessions
- Go cheap on the webservers
- Do use a good precompiler
- Good tcp-level load balancer
- Make sure you have iostat & vmstat installed, and are running sysstat
I think you have to be careful about taking this advice out of context. We had just been through a very bad time (with 1.8) due to severely under-optimised SQL queries. I sense, on the whole, that things have got a lot better and that the database is *less* likely to be the bottleneck.
However, you don't get something for nothing and we now have caching with its own bunch of potential issues.
Essentially, if you are running a large, critical system you (still) have to know what you are doing - or find someone that does. This isn't a criticism of Moodle; it's simply the reality of running a big web application.
Our school has done an exhaustive test of 300+ concurrent users doing the same 50 question quiz with 30 audio files streaming on each client. The 300 users were literally doing the same quiz, at the same time, in the same rooms (four adjacent computer labs). I have a feeling you will not have 1000 people doing the same thing in a module as heavy as ours. Yet even if you do, you may take heart, because the results showed that we could have handled 1000 easily if they were all on the in-house LAN, or 300-400 if they had been distributed across the internet.
The hardware needed: one single US$1800 server with 4 GB RAM and a 2.8 GHz dual core processor, running well-configured LAMP software. Full results published here.
Look where I find you! Well, so as not to bother the other forum participants, contact me by email and we'll talk about your project. I don't know whether it is for ESIME or whether you are helping the CENLEX staff with their exam for 2000 participants.