Large site, VMWare, Advice needed

by heli g -
Number of replies: 26
Our university started using Moodle in January this year. Our installation currently has:
  • 6,413 courses (created via external database enrolment plugin) but only 105 of these courses are currently active (available to students)
  • 60,499 Users (LDAP sync with Novell eDirectory) but only 7,798 have logged in. 192 of these users are Teachers
  • The most logins we have had are: 4138, unique logins: 2107
Our current setup is:

VMware ESX 3.5 virtual machine
CPUs : 4, RAM : 8 GB, Swap partition : 2 GB
LSI Logic SCSI Controller
Hard Disk 1 = 50 GB virtual disk
Hard Disk 2 = 100 GB mapped raw LUN on iSCSI SAN (database)
Hard Disk 3 = 500 GB mapped raw LUN on iSCSI SAN (moodledata)

OS SLES 10 Service Pack 2
php 5.2.5
mysql 5.0.26
APC PHP accelerator
Moodle: 1.9.7 +


I am concerned by 'Insert into log failed' errors we have been receiving in the last few days. (mdl_log contains 825,206 records, the db is not corrupt and the disk is not full)

We currently have both Statistics and Course backups (3 x a week) turned on.
We also run the auth_ldap_sync_users.php and enrol_database_sync.php scripts in the early hours of each morning.

I would really appreciate input and advice - especially related to the configuration, stability and scalability of our system as the number of active courses is increasing daily.

Regards

Heli
In reply to heli g

Re: Large site, VMWare, Advice needed

by Robert Brenstein -
Why would you run a production site under VMware instead of native?

On such a site, I would not bother to run site-wide course backups. You keep producing more and more zip files which chew lots of resources when being produced and which do not give you real site backup unless you run also a script that keeps deleting old files and copies new zips to a separate backup volume or they are picked by a remote backup server. In other words, if you have a proper multi-tier server backup strategy, you can skip those IMHO.
In reply to Robert Brenstein

Re: Large site, VMWare, Advice needed

by Neil S -
How are your backups, ldap sync, and stat runs timed? Are they overlapping? Also - are your stats processing correctly? What about all of your course archives? We get those messages on rare occasion and usually at least one course archive 'errors' out or the stats run 'errors'... You might want to check for slow queries on/around the time you get the notice and go through the mysql config variables with a fine tooth comb...
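A quick way to answer the "are they overlapping?" question is to write down each job's start time and expected duration and test the windows pairwise. The following is a minimal sketch only: the start times and durations are placeholders, not values reported in this thread, and jobs that run past midnight are not handled.

```python
from datetime import datetime, timedelta
from itertools import combinations

# Hypothetical nightly schedule: (job name, start "HH:MM", expected minutes).
# Replace these placeholder values with your own cron times and run lengths.
JOBS = [
    ("auth_ldap_sync_users.php", "01:00", 45),
    ("enrol_database_sync.php", "01:30", 30),
    ("course backups", "02:30", 120),
    ("statistics", "05:00", 60),
]

def window(start_hhmm, minutes):
    """Return (start, end) datetimes for a job on an arbitrary reference day."""
    start = datetime.strptime(start_hhmm, "%H:%M")
    return start, start + timedelta(minutes=minutes)

def overlapping(jobs):
    """Return pairs of job names whose time windows intersect."""
    clashes = []
    for (name_a, start_a, dur_a), (name_b, start_b, dur_b) in combinations(jobs, 2):
        a0, a1 = window(start_a, dur_a)
        b0, b1 = window(start_b, dur_b)
        # Two half-open intervals overlap when each starts before the other ends.
        if a0 < b1 and b0 < a1:
            clashes.append((name_a, name_b))
    return clashes

for a, b in overlapping(JOBS):
    print(f"overlap: {a} <-> {b}")
```

With the placeholder schedule above, the two sync scripts are reported as overlapping because the first is still running when the second starts.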

Sorry I don't have a lot of answers for you - I would also like to add that we will likely be moving from physical to virtual for our moodle app server(s) on an esxi / vSphere platform. Our database will be staying on physical hardware.

I'm currently considering running three vm's across a 10 node esxi cluster using layer 4 load balancing... two vm's will handle users and the third will handle cron and backups... with database sessions and a clustered gfs filesystem for moodledata and moodleroot I think it should be pretty easy to implement.

I'm also interested in those with a similar setup or those that have experience running in an esxi environment...

Take care,
Neil S
Northwestern Michigan College




In reply to Neil S

Re: Large site, VMWare, Advice needed

by Neil S -
A couple of additional thoughts...

Separate your DB to another esx server - or a physical server if possible....

Use innodb if you're not already...

Enable the MySQL query cache and tune other mysql/innodb parameters...

ns
In reply to Robert Brenstein

Re: Large site, VMWare, Advice needed

by Visvanath Ratnaweera -
Hi heli guy (and other floating objects)

To underline what Robert said:

> Why would you run a production site under VMware instead of native?

You should search this forum for vmware. Here is a typical discussion http://moodle.org/mod/forum/discuss.php?d=125702

> On such a site, I would not bother to run site-wide course backups.

See http://moodle.org/mod/forum/discuss.php?d=114464
In reply to Robert Brenstein

Re: Large site, VMWare, Advice needed

by Greg Lund-Chaix -
> Why would you run a production site under VMware instead of native?

We use a lot of virtualization in our infrastructure (although we use Ganeti and KVM/Xen instead of VMWare).

Redundancy and failover. By separating the servers from the hardware, you can more gracefully handle hardware failures. When the hardware fails, you can live-migrate the virtual machine onto a different physical node and keep the services up. You can also take a physical node offline for maintenance without any service interruptions.

Virtualization also makes it easier to add capacity on the fly. You can quickly add virtual machines and physical nodes to the cluster to scale the system up without taking the whole system down.

The minimal cost of the additional overhead for virtualization is well worth the increased flexibility and high availability. Especially now that most modern server processors have virtualization support directly built into the chip.

-Greg
In reply to heli g

Re: Large site, VMWare, Advice needed

by Greg Lund-Chaix -
Some initial thoughts:

* Are you using memcache? If not, you may want to look at enabling it.
* Are your database tables MyISAM or InnoDB? You may see an increase in scalability converting some of your tables to InnoDB - especially on large/high traffic tables.
* Is the MySQL slow query log showing anything interesting?
* What does your MySQL query cache look like? Is it full? Is the hit rate low?
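
For the query cache questions, one commonly used estimate of the hit rate is Qcache_hits / (Qcache_hits + Com_select), taken from the output of SHOW GLOBAL STATUS. The sketch below only does the arithmetic; it assumes you have already copied the status values into a dictionary, and the sample numbers are made up for illustration.

```python
def query_cache_report(status):
    """Summarise query cache health from a dict of SHOW GLOBAL STATUS values."""
    hits = int(status["Qcache_hits"])
    selects = int(status["Com_select"])  # SELECTs that were not served from the cache
    total = hits + selects
    hit_rate = hits / total if total else 0.0
    return {
        "hit_rate": hit_rate,
        # Little free memory together with many low-memory prunes suggests the cache is full.
        "free_memory_bytes": int(status["Qcache_free_memory"]),
        "lowmem_prunes": int(status["Qcache_lowmem_prunes"]),
    }

# Made-up sample values, not measurements from any site in this thread.
sample = {
    "Qcache_hits": "300",
    "Com_select": "100",
    "Qcache_free_memory": "1048576",
    "Qcache_lowmem_prunes": "0",
}
report = query_cache_report(sample)
print(f"hit rate: {report['hit_rate']:.0%}")
```

What counts as a "low" hit rate depends on the workload, so treat the number as a starting point for tuning rather than a verdict.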
In reply to Greg Lund-Chaix

Re: Large site, VMWare, Advice needed

by heli g -
Thank you for all your suggestions and sorry about the delay in getting back to you. I'm in the process of enabling memcache and slow query logging.

Our database tables are currently MyISAM - so you would recommend converting them to InnoDB?
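
If a conversion is attempted, it comes down to one ALTER TABLE ... ENGINE=InnoDB statement per table. The sketch below only builds those statements as text so they can be reviewed before anything is run. Only mdl_log is named in this thread; the second table name is an example. Two assumptions worth stating: a full database backup exists first, and InnoDB is actually enabled on the server (the skip-innodb line quoted further down in this post switches it off).

```python
def innodb_conversion_sql(tables):
    """Build one ALTER TABLE statement per table name (nothing is executed)."""
    return [f"ALTER TABLE `{name}` ENGINE=InnoDB;" for name in tables]

# mdl_log comes from this thread; mdl_sessions2 is only an illustrative second table.
for statement in innodb_conversion_sql(["mdl_log", "mdl_sessions2"]):
    print(statement)
```

Starting with the large, write-heavy tables and converting the rest later keeps each step small and easy to roll back.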

I've tried to schedule syncing, stats and backups so they do not overlap. Course backups always have errors. How would you check for Statistics errors?

Since my last post the "insert into mdl_log failed" errors have stopped. I ran the MySQLTuner (http://blog.mysqltuner.com/download/) and made the following adjustments to my.cnf

skip-innodb
tmp_table_size = 64M (was 32M)
max_heap_table_size = 64M (was 16M)

I've just run the MySQLTuner again and see that it wants these values raised further.
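
A note on why both values were raised together: MySQL caps in-memory temporary tables at the smaller of tmp_table_size and max_heap_table_size, so raising only one of them has no effect beyond the other. The short sketch below works through the before and after values quoted above; the helper names are made up for illustration.

```python
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}

def parse_size(text):
    """Convert a my.cnf size such as '64M' to bytes."""
    text = text.strip().upper()
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)

def effective_tmp_limit(tmp_table_size, max_heap_table_size):
    """In-memory temporary tables are capped by the smaller of the two settings."""
    return min(parse_size(tmp_table_size), parse_size(max_heap_table_size))

before = effective_tmp_limit("32M", "16M")  # old values from this post
after = effective_tmp_limit("64M", "64M")   # new values from this post
print(before // 1024**2, "MB ->", after // 1024**2, "MB")
```

So the effective limit went from 16 MB to 64 MB, not from 32 MB to 64 MB.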

"Why would you run a production site under VMware instead of native?"
Good question. I've been following Rosario's VMWare nightmare at http://moodle.org/mod/forum/discuss.php?d=125702 and almost added this support request to that thread.

I've tried to motivate for moving the database to a physical server - to no avail. For the reasons mentioned by Greg (Redundancy and failover, capacity on the fly) our university has made a policy decision to use VMware.

In an attempt to improve the database performance while sticking to VMWare we moved it to a RAW LUN partition (in line with Sun Microsystems' VMWare Best Practices and Performance Guide).

"Separate your DB to another esx server..."
We initially had a clustered installation - with 1 database server (InnoDB tables), 1 file server, 2 webservers (using Round Robin for rudimentary load balancing).

In December - just before going live - Moodle started randomly losing connection with its database - so in a panic I set up this single server installation as a short-term solution. But it appears now that the problem may have been related to VMWare rather than the clustering...

Thanks everyone for all your input. This side of administration is new to me (I was initially employed to provide training to lecturers, e.g. "how to turn editing on"), so I am undergoing a rather steep learning curve.

Your advice is greatly appreciated.

Regards

Heli
In reply to heli g

Re: Large site, VMWare, Advice needed

by heli g -
Update:

Following all your input we have acquired a physical server for our database (our webserver, moodle core and moodledata are to remain on a single VMWare server).

The new database server is a Dell PowerEdge R710 with 16 GB RAM, dual quad-core Xeon 2.66 GHz processors and 6 x 450 GB SAS drives.
OS is SLES 11
I have just installed MySQL and moved its data directory to a separate partition.

I still have to optimize MySQL and move the Moodle database.

Any recommendations on how to proceed will be greatly appreciated.
In reply to heli g

Re: Large site, VMWare, Advice needed

by Visvanath Ratnaweera -
Hi

> we have acquired a physical server for our database

Definitely a good investment.

> I have still to optimize MySQL

Tuning MySQL is a topic in itself! There has been a lot of advice in this forum in the past. Run a search.

> our webserver, moodle core and moodledata are to remain on a single VMWare server

I would still be worried about the network latency between the two. Is it a (virtual and real) Gbit network? Can you measure speed and latency?
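
One rough way to put a number on the latency between the web VM and the database server is to time TCP connection setup to the MySQL port from the web server. This is a sketch under assumptions: the host name is whatever your database server is called, port 3306 is the MySQL default, and it measures connection setup only, not throughput or query time.

```python
import socket
import statistics
import time

def connect_times_ms(host, port=3306, samples=20, timeout=2.0):
    """Time TCP connection setup to host:port, in milliseconds."""
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection is closed again immediately
        results.append((time.perf_counter() - start) * 1000.0)
    return results

def summarize(times_ms):
    """Return min, median and max of a list of timings."""
    return {
        "min": min(times_ms),
        "median": statistics.median(times_ms),
        "max": max(times_ms),
    }

# Example call (hypothetical host name), to be run from the web server:
# print(summarize(connect_times_ms("db.example.edu")))
```

Running it at a quiet time and again at peak load shows whether the latency changes once the virtual host is busy.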

In reply to Visvanath Ratnaweera

Re: Large site, VMWare, Advice needed

by heli g -
Thanks for the response Visvanath.

I have forwarded your question about measuring network latency to Technical Services and eagerly await their reply.

Currently 6,748 courses (external database enrolment + course creation) and over 60,000 users exist in the db (LDAP sync) - but so far only 430 of these courses are active and only 9,628 users have logged into the system.

We are busy promoting Moodle and expect the usage to increase significantly in the course of this year.

What would your advice be given the scope of our project?
In reply to heli g

Re: Large site, VMWare, Advice needed

by Visvanath Ratnaweera -
> about measuring network latency

However, don't overestimate isolated performance measurements. They can identify whether something is wrong but won't tell you anything about the combined maximum load the server can take. Maybe this one is an exception: http://moodle.org/mod/forum/discuss.php?d=57028

Go through the excellent performance docs http://docs.moodle.org/en/Performance Note the cautious note on virtualization http://moodle.org/mod/forum/discuss.php?d=102978#p461624

> over 60,000 users exist in the db (LDAP sync)
How's the reaction time of the auth-server once many people try to login at the same time?

> What would your advice be given the scope of our project?

You won't really feel the sheer size of the installation until some teachers start to conduct synchronous online exams. See for example
http://moodle.org/mod/forum/discuss.php?d=149959 or http://moodle.org/mod/forum/discuss.php?d=68579 (There are many more, run a search for "quiz performance" on this forum.)
In reply to heli g

Re: Large site, VMWare, Advice needed

by Rosario Carcò -
I am sorry for the delay, but my investigations showed up one very simple calculation, after months of testing and researching my VMware ESX issue:

a) we have a cluster of 5 Servers

b) we have more than 100 virtual machines running on them

c) we have NO reservations

So our VMware Engineer lets ESX do all the necessary work to balance the load between the 100 Virtual Machines.

My nightmare experience showed that without Reservations you do not get enough CPU, and maybe not even enough memory, despite having set up the VM with 4 CPUs and 8 GB of RAM.

And now it comes, if you reserve ONE Server for your own machine because you need it all the time, the other 99 VMs must run with the power of the 4 remaining servers.
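
The arithmetic behind that point can be written out as a small sketch. It takes "more than 100" as exactly 100 VMs, and it assumes the load is spread evenly across hosts, which is a simplification.

```python
def vms_per_host(total_vms, total_hosts, dedicated_hosts=0):
    """Average VMs per shared host when some hosts each serve a single VM."""
    shared_hosts = total_hosts - dedicated_hosts
    shared_vms = total_vms - dedicated_hosts
    return shared_vms / shared_hosts

print(vms_per_host(100, 5))     # no reservation: 20.0 VMs per host
print(vms_per_host(100, 5, 1))  # one host reserved for Moodle: 24.75 VMs per host
```

In other words, dedicating one of the five hosts to a single VM pushes the load on the remaining four up by roughly a quarter.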

So for the time being, I asked for a new server in January and installed everything on an 8-core 1.8 GHz standalone server with 8 GB RAM, a built-in system disk and a 400 GB data disk on our SAN.

I also tested separating the Apache and MySQL servers: I left Apache on the VM and used the physical MySQL server on which my production system runs. To no avail, I got the same performance issues. This led me to think that it is not a matter of MySQL running on a VM, nor of Apache running on it; it must be the VM itself.

There is ONE LAST test for me to do: run the VM with a Reservation of CPU and RAM, to see whether I will get the needed power.

You will easily find the VMware docs on reservations by searching for them on the web.

I will be migrating my Moodle server again in the next two months. So my nightmare took almost one year. Be warned, if you try to use VMware ESX systems.

Rosario
In reply to Rosario Carcò

Re: Large site, VMWare, Advice needed

by Howard Miller -
I feel like I've been warning people about this for ages. There seems to be a current obsession with VM. Unfortunately, it isn't magic and can't conjure processor cycles and memory from thin air. If you have a large, memory and processor hungry application (like a big Moodle install) you are really going to have to be careful running in a VM.

I still say that for anything bigger than a known small install or pilot you should seriously consider a standalone server. It keeps it simple.
In reply to Howard Miller

Re: Large site, VMWare, Advice needed

by Rosario Carcò -
Howard,

I saw your warnings but as our University had made this same decision, I had to go through it. My Installation worked fine in August 2009 and turned into a nightmare as soon as the semester started and 30 and more people started to work concurrently on Moodle.

You know the price I had to pay... and the time it took to get a new physical server...

Rosario
In reply to Rosario Carcò

Re: Large site, VMWare, Advice needed

by sam marshall -
Our major system (the one that handles the big load) runs on real hardware - but OpenLearn runs on a few VMs and as far as I'm aware, performance is okay. (*Clicks around it* - hey, that's pretty snappy!)

We intend to use VMs for other small Moodle installations (in pairs of front-end servers on different VM hosts; not sure if the back-end database server is VM or real hardware).

I am not an expert (I only personally use virtualisation so I can run Internet Explorer 8 on my work pc which has 7 installed...) but as I understand it, you shouldn't have any problem running Moodle in a VM - given that the VM host is powerful enough and doesn't try to cram too many other things, which will all be busy at the same time, onto the same box...

In other words this sounds to me like a VM management issue, and somebody who runs the VMs not realising how much power Moodle needs (at peak), rather than an inherent problem with the concept of virtualisation.

--sam

PS Did the other 99 apps complain when moodle was killing their performance? :-)
In reply to sam marshall

Re: Large site, VMWare, Advice needed

by Greg Lund-Chaix -
We use a lot of virtualization (Ganeti/KVM/Linux in our case) and have had no problems with load issues using virtual machines.

That said, we use real, physically connected disks (not virtualized). I/O bottlenecks are extremely common on many virtualization systems - especially commercial providers. We watch our provisioning very closely to make sure we don't put more VMs on a node than the hardware can support. Also, we run hundreds of small Moodle sites, not a monolithic large site - so there may be different bottlenecks for us than you may see.

Virtualization is a useful tool when properly configured and provisioned. You have to watch your node resource utilization very carefully and make sure you don't over-subscribe the hardware. As Howard said, you can't just "conjure processor cycles and memory from thin air." (Well-said, Howard, I'm going to have to remember that line!) :-)

-Greg
In reply to sam marshall

Re: Large site, VMWare, Advice needed

by Rosario Carcò -
THIS LAST TEST with the RESERVATION has still not been run. I will report back the results.

But as I said, the 100 VMs DO LACK POWER all the time. I have installed also a SUSE Linux 11 Server for a couple of Assistants who use it only to make complicated calculations. They realised a few days ago, that their desktop computer made the same calculation in a few seconds whilst it takes Minutes on the VM-Server... So they will be switching to a simple physical server too.

And this confirms again that my issue was not related to SUSE Linux-Apache-MySql-PHP but it is a matter of resources and management on the ESX-cluster.

Rosario
In reply to heli g

Re: Large site, VMWare, Advice needed

by Rosario Carcò -
>>our university has made a policy decision to use VMware.

OUR UNIVERSITY MADE THE SAME MISTAKE IN 2007!!

And our VMs have been lacking performance since then. But nobody ever noticed, because a lot of them are Windows 2003 Servers, Domain Controllers, Print Servers and the like, which are a real waste of power and resources: a whole OS for those simple tasks.

But the few web and linux VMs which do real real-time work, do lack power.

See my last post in http://moodle.org/mod/forum/discuss.php?d=125702

So even mysql on a physical server and the whole rest as you say on the VM DID NOT work for me. I definitely had to install EVERYTHING on a new physical server.

And this is apart from all the performance tuning questions about MySQL, Apache, or Postgres (which is better than MySQL with MyISAM tables because it locks only records and not whole tables), etc. etc.

My issue was not an issue to tune a few milliseconds it was an issue of getting a Moodle page in 0.2 Secs on the physical server and 2.4 secs on the VM. And even worse if I use my own siteNavigation and myCourses Blocks which do render a lot of DHTML Code to simplify navigation.
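
For anyone who wants to repeat this kind of comparison, a simple timing sketch is shown below. It is an assumption about how one might measure, not the method used for the figures above: it fetches a single URL without logging in and without loading page assets, so the absolute numbers will differ from what a browser shows. The ratio of the two figures quoted above is computed at the end.

```python
import time
import urllib.request

def fetch_seconds(url, timeout=30):
    """Return wall-clock seconds to download one page (no login, no page assets)."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.perf_counter() - start

def slowdown(slow_seconds, fast_seconds):
    """How many times slower the slow measurement is than the fast one."""
    return slow_seconds / fast_seconds

# Figures from this post: 2.4 s on the VM versus 0.2 s on the physical server.
print(round(slowdown(2.4, 0.2), 1))  # 12.0
```

A factor of about twelve is far outside anything that parameter tuning alone would normally explain, which is the point being made here.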

Rosario
In reply to Rosario Carcò

Re: Large site, VMWare, Advice needed

by Jon Witts -
Your University put their DCs on a VM??? Why would you do that?
In reply to Jon Witts

Re: Large site, VMWare, Advice needed

by Rosario Carcò -
Print servers and DCs, for all the redundancy and expansion possibilities mentioned in this thread. But I can only guess why they did it. Personally I would have preferred Solaris with Samba or simply a Linux cluster (which also offers the possibility of running virtual machines, just in case; but I'm not a Linux or virtualization expert either).

BUT we have a set of DCs and Exchange servers on real physical servers: the DCs just in case the whole ESX cluster goes mad, and the Exchange servers because they advise against running them on VMs. (We did have 1 week of downtime, though not because of the ESX cluster. The bottleneck failed, namely the two redundant host adapters to our SAN, which both had the same firmware and hence the same bug, whatever it was. Our SAN was down for a week and so were the servers, physical and virtual.)

Rosario
In reply to Jon Witts

Re: Large site, VMWare, Advice needed

by sakai user -
Practice is you place all DCs in VM land except 1 physical :-)

Read the DC forums both at M$ and VMware

kevin
In reply to heli g

Re: Large site, VMWare, Advice needed

by Rosario Carcò -

Dear all, last week I saw that SUSE SLES has a new separate distribution called SLES 11 SP1 for VMware:

http://www.vmware.com/products/sles-for-vmware/overview.html

I think this is proof enough that with my normal SLES 11 SP 1 distribution Moodle would never have worked as it does on my physical server.

I will give it a try, since my virtual machine is still there on our ESX cluster, and report back.

Rosario

In reply to Rosario Carcò

Re: Large site, VMWare, Advice needed

by Rosario Carcò -

And last week I even upgraded to SP2, as this is the latest SP for both the physical and VMware version.

I still have to repeat the load tests I made during 2009. I will report back as soon as possible.

Rosario

In reply to Rosario Carcò

Re: Large site, VMWare, Advice needed

by Rosario Carcò -

But pay attention, there was a bug in PHP 5.3.8 I was not aware of and which broke file upload. Matteo Scaramuccia helped me to find the needed patches to solve this. The patches were released around April 6th, roughly one month after release of SP 2. If you have a registered copy you should get the latest upgrades and patches through yast online update. Otherwise download them manually through Novell's patch finder.

Rosario

In reply to Rosario Carcò

Re: Large site, VMWare, Advice needed

by Rosario Carcò -

Unfortunately I had to abandon this VMware version. I actually installed the latest openSUSE 12 because I needed its PHP and MySQL versions to run Moodle 2.3 and upwards. But I am using it only as a test and Moodle PHP development server. I have no figures about performance yet. My advice is still to use physical servers for production sites. We even noticed a performance loss when using two physical servers configured as a failover cluster.

In reply to Rosario Carcò

Re: Large site, VMWare, Advice needed

by J S -

There always seems to be a debate between physical servers and VMWare, with VMWare typically getting the shaft in these forums.  It really depends on your setup (networking, storage, servers) and how they are configured.  In addition, it's just as important how you've configured your software to work with Moodle.  Give me the fastest physical server you have and I can make it run as slow as a 5yr old laptop by using some bad tuning parameters.  Today it's just as much about software as it is about hardware.

I do agree that you will most likely get the absolute fastest setup by using physical servers (disk, cpu, ram, network must be sufficient) if tuned appropriately.  I'm not a huge fan of VMWare, but you can lose out on a lot of benefits by ignoring some of the enhancements VMWare offers.  What was true about VMWare and physical servers back when this thread was started is most likely not relevant today.

My advice is to use the setup that makes the most sense for your environment by weighing the pros and cons of the entire architecture and your budget.