Preliminary observations for 100 concurrency test

by rupesh patil -
The following were the observations after running a 100-concurrency test.

Test Conditions:

100 concurrent users
3 sec/user ramp-up
10 sec delay
30 min duration (run)
Think time: 80-120% of recorded

Result:

Just after ramp-up, the following database error appears (screenshot attached).
The response times are very high, in the range of 30 s and above.

CPU utilization of the app server is always 95% during the test run.

From a browser, the application becomes completely inaccessible during the test run.


So what should we do to improve performance?


Server configuration:


Application server
Intel Server Compaq ProLiant DL360, 2.2 GB RAM, 72 GB HDD, 2 x 1.33 GHz CPU


Database server
Intel Server Compaq ProLiant DL360, 2 GB RAM, 36 GB HDD, 2 x 1.33 GHz CPU

Attachment screenshot_20error.jpg
In reply to rupesh patil

Re: Preliminary observations for 100 concurrency test

by Justin Haaga -
Have you done any fine tuning of Apache and MySQL? That is the first step. Also, what are your hardware specs?

Memory is going to be a key component on the web server for processing those concurrent connections. It sounds like you have separate DB and web servers, so that should help.
In reply to Justin Haaga

Re: Preliminary observations for 100 concurrency test

by rupesh patil -
We are using a PostgreSQL database.
We have fine-tuned Apache and PostgreSQL, but it is not helping.

In reply to rupesh patil

Re: Preliminary observations for 100 concurrency test

by Don Hinkelman -
Hi Rupesh,

You might check out our report from two years ago with 300 simultaneous users. It is with Moodle 1.7, but we have continued the same setup with Moodle 1.8 and 1.9 with nearly the same results. See: http://moodle.org/mod/forum/discuss.php?d=68579

Don
In reply to Don Hinkelman

Re: Preliminary observations for 100 concurrency test

by rupesh patil -
What changes do I have to make to increase performance?
I have attached the PostgreSQL configuration file and also the Apache configuration file.


The configuration of the server running Apache is:


Application server
Intel Server Compaq ProLiant DL360, 2.2 GB RAM, 72 GB HDD, 2 x 1.33 GHz CPU


and the database server running PostgreSQL is:


Database server
Intel Server Compaq ProLiant DL360, 2 GB RAM, 36 GB HDD, 2 x 1.33 GHz CPU

As shown above, we have 2 GB of RAM on the application server as well as on the database server.
So please tell me the required changes to the configuration files as soon as possible, to increase performance and reduce the load on the system.


In reply to rupesh patil

Re: Preliminary observations for 100 concurrency test

by Justin Haaga -
Turn off "use persistent connections" in config.php. That should help with load; otherwise raise the max connections setting in your PostgreSQL config file. You simply have too many connections open to the DB and it is running out, at least judging from that screenshot.

Also, for your app server you may want to try lighttpd. It will handle an influx of requests a lot better than Apache. Check out my post in the performance forum.
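
A rough way to sanity-check the "too many connections" diagnosis is to compare how many connections the web tier can open against how many the database accepts. The Python sketch below is only an illustration, assuming each web server worker holds at most one database connection. The function name and all numbers are hypothetical and are not taken from the attached configuration files; on the PostgreSQL side, `max_connections` and `superuser_reserved_connections` are the settings that would supply the real values.

```python
def connection_headroom(web_workers, db_max_connections, db_reserved=0):
    """Return the DB connections left over if every web worker holds one.

    A negative result means the web tier can ask for more connections
    than the database will grant.
    """
    usable = db_max_connections - db_reserved
    return usable - web_workers


# Illustrative values only (assumptions, not the poster's real settings).
headroom = connection_headroom(web_workers=150, db_max_connections=100, db_reserved=3)
print(headroom)
```

If the result is negative, either the web server's worker limit has to come down or the database limit has to go up, which is the trade-off described in the post above.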
In reply to Justin Haaga

Re: Preliminary observations for 100 concurrency test

by rupesh patil -
I have already applied this setting, but the problem remains.

My CPU utilization is also very high. I ran the ab command:

ab -n 1000 -c 100 http://172.17.66.33/moodle/mod/quiz/attempt.php?q=262

and the result shown by the "top" command is:
[rupesh@app admin]$ top
top - 15:59:02 up 8 days, 19:14, 2 users, load average: 24.98, 29.90, 27.69
Tasks: 131 total, 38 running, 92 sleeping, 0 stopped, 1 zombie
Cpu(s): 96.1%us, 3.5%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 1924096k total, 1017008k used, 907088k free, 79396k buffers
Swap: 4390904k total, 0k used, 4390904k free, 396916k cached
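
One quick reading of the top header above is the load per CPU. The following is a minimal Python sketch, assuming the two CPUs listed in the server specification; `load_per_cpu` is a hypothetical helper written for this illustration.

```python
import re


def load_per_cpu(top_header, cpu_count):
    """Extract the 1-minute load average from a top header line and divide by CPU count."""
    match = re.search(r"load average:\s*([\d.]+)", top_header)
    if match is None:
        raise ValueError("no load average found in header line")
    return float(match.group(1)) / cpu_count


line = "top - 15:59:02 up 8 days, 19:14, 2 users, load average: 24.98, 29.90, 27.69"
print(round(load_per_cpu(line, cpu_count=2), 2))
```

A value far above 1.0 per CPU means many more processes are waiting for the processors than they can serve at once, which is consistent with the 0.0%id figure in the output.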
In reply to rupesh patil

Re: Preliminary observations for 100 concurrency test

by Justin Haaga -
Based on that screenshot, it's a DB bottleneck. Even though your Apache server has high utilization, if Apache were saturated you would just get a blank page, or the page wouldn't load at all (it would just hang).

Also, based on your stats, I don't think that is abnormal. 100 concurrent connections on one dual-processor machine seems about right. You can look into lighttpd, which will help reduce the CPU and memory load of your web server, which in turn will give you a bit of a performance boost.

Go back and look at your DB settings. I can help with tweaking MySQL configs, but I think you said you were running PostgreSQL.
In reply to Justin Haaga

Re: Preliminary observations for 100 concurrency test

by rupesh patil -
Yes, we are using PostgreSQL, and the PostgreSQL configuration is attached in the post above.

After running this ab command, if we try to connect to the site from a browser, it gives the error "database connection failed".

Please tell me the appropriate settings for the PostgreSQL conf file if you can.
In reply to rupesh patil

Re: Preliminary observations for 100 concurrency test

by Justin Haaga -
I don't use PostgreSQL, so I can't help. I use MySQL.
In reply to Justin Haaga

Re: Preliminary observations for 100 concurrency test

by Rosario Carcò -
My production server started to show the same symptoms two days ago, and I use MySQL.

When I run queries on the mysql command line, everything works fine (no Apache).

As soon as I start accessing via Moodle, i.e. running PHP scripts through Apache, mysqld jumps to incredible CPU utilization, in the range of 100 to 300%.

Response times are bad even with only me working as admin.

I have already optimized and repaired the tables more than once, even though phpMyAdmin reported them to be OK.

If I run queries through phpMyAdmin, it also works fine. Only when Moodle's PHP scripts are fired up does everything stall.

Any ideas are very much appreciated. Rosario


In reply to Rosario Carcò

Re: Preliminary observations for 100 concurrency test

by Greg Lund-Chaix -
What is the output of a vmstat and iostat when the server is under load? That should give some indication of where the problem may lie.

Also, are you running mysql on the same machine as the Apache server? Are you running an opcode cache like APC?

Apache can be a huge memory hog, especially when it gets hit with a lot of concurrent sessions. It could very well be that you're out of RAM and swapping like mad.

-Greg
In reply to Greg Lund-Chaix

Re: Preliminary observations for 100 concurrency test

by Rosario Carcò -
Unfortunately none of that: no cache, no PHP accelerators, only bare SUSE Linux 11 with its Apache, MySQL and PHP versions, no other software or servers running, 6 GB memory, 4 CPUs, etc.

The server has been running well for 2.5 years.

There is only ONE change: I ported everything from a physical machine to a VMware ESX virtual machine and installed the newest SUSE 11, whereas on the physical machine I was using SUSE 10 SP1 with its versions of Apache, PHP and MySQL.

I did not notice the high CPU utilization of mysqld during my tests. It arose roughly one week after I put this server into production.

So maybe I have to research possible problems or settings for best performance of Apache, PHP and MySQL on VMware virtual machines.

Again, any hints are very much appreciated, as my Moodle is nearly down.

==
vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 904 1030984 187708 3767672 0 0 112 128 74 19 24 26 49 1 0
===
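
The single vmstat row above can be read programmatically by pairing each value with its column name. This is a minimal Python sketch, assuming the header and row are copied verbatim from the output above; `parse_vmstat` is a hypothetical helper. Note that the first row vmstat prints reports averages since boot, so it says little about behaviour under load.

```python
def parse_vmstat(header, row):
    """Map vmstat column names to the integer values of one output row."""
    return dict(zip(header.split(), (int(value) for value in row.split())))


header = "r b swpd free buff cache si so bi bo in cs us sy id wa st"
row = "1 0 904 1030984 187708 3767672 0 0 112 128 74 19 24 26 49 1 0"

stats = parse_vmstat(header, row)
# si/so are swap-in and swap-out rates; non-zero values indicate active swapping.
swapping = stats["si"] > 0 or stats["so"] > 0
print(swapping, stats["id"])
```

Running `vmstat` with an interval (for example `vmstat 5`) while the server is busy and feeding the later rows through the same function would give a more useful picture.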

Are there Linux commands that show better what mysqld is really doing?

Rosario
In reply to Rosario Carcò

Re: Preliminary observations for 100 concurrency test

by Greg Lund-Chaix -
Well, it's good to see you're likely not swapping. However, two things jump right out at me:

1) No cache/PHP accelerator - install APC. Now. It has exactly zero side effects beyond using a bit of RAM (of which you seem to have plenty) and will dramatically reduce CPU utilization. Right now your server has to compile every single line of PHP on every single connection before it can deliver content to the users. That's expensive and unnecessary. APC will keep cached pre-compiled copies of your PHP in RAM. Load the APC PHP extension, allocate a couple hundred megs of RAM to it and watch your CPU utilization drop.

2) "ONE change" - switching from a physical machine to a virtual machine? YIKES! That's a *HUGE* change and is rife with opportunities for problems. It adds a whole other level of complexity to the issue and brings up all sorts of possible resource contention and conflict.

Some things to look at:

* Is all of the server's 6GB of RAM available to the VM?
* Is the VM set to be able to use multiple processors?
* Are you running mysql and apache both on the same VM? We've seen very poor performance running mysql in virtualized environments (on Xen and KVM) and generally try to avoid it. Generally, on all but the smallest of sites, our experience has been that it's best to have the database running on its own separate (physical) hardware. VMs are quite good for Apache front end nodes, but I'd stay away from running your database in one.
* Check the slow queries log (usually located in /var/lib/mysql if it's enabled in the mysql config). There may be some indications of possible problems there.
* Since you seem to have quite a bit of RAM on the system, you may want to look at firing up memcached.
In reply to Rosario Carcò

Re: Preliminary observations for 100 concurrency test

by François Marier -
Virtualisation does add quite a bit of overhead so you should expect reduced performance after that switch.

One of the main problems I've seen with some of our clients' use of virtualisation is atrocious disk performance. There are options to improve this in vmware (using direct disks or whatever it's called) but they are not always enabled.

You can quickly compare the performance of your virtualised disks with the ones from say a desktop computer by running:

hdparm -tT /dev/sda

(replacing /dev/sda with whatever the physical device is)

Of course, make sure that all of your partitions are mounted "noatime" to remove unnecessary writes on every read.
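
Checking the "noatime" advice by hand is easy to get wrong on a machine with several partitions, so here is a small Python sketch that lists block-device mounts lacking the option. It assumes input in the format of /proc/mounts (device, mount point, filesystem type, options, ...); the sample devices and paths are hypothetical, and `mounts_without_noatime` is a name made up for this illustration.

```python
def mounts_without_noatime(mounts_text):
    """Return mount points of /dev/* filesystems not mounted with noatime."""
    missing = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and pseudo-filesystems such as proc or sysfs.
        if len(fields) < 4 or not fields[0].startswith("/dev/"):
            continue
        options = fields[3].split(",")
        if "noatime" not in options:
            missing.append(fields[1])
    return missing


# Hypothetical sample in /proc/mounts format.
sample = """\
/dev/sda1 / ext3 rw,relatime 0 0
/dev/sdb1 /srv/moodledata ext3 rw,noatime 0 0
proc /proc proc rw 0 0
"""
print(mounts_without_noatime(sample))
```

On a real server the text could come from `open("/proc/mounts").read()` instead of the sample string.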

Francois
In reply to Rosario Carcò

Re: Preliminary observations for 100 concurrency test

by Rosario Carcò -
Thanks to both of you; here are my answers:

>>Is all of the server's 6GB of RAM available to the VM?

YES, we just raised it to 8 GB, with no success.

>>Is the VM set to be able to use multiple processors?

YES, we went from 2 to 4 two days ago, with no success. Today we set them to HIGH priority and allocated them statically (I do not know the correct term), so the 4 processors should now always be available to the machine, not only when the machine requests more CPU.

>>Are you running mysql and apache both on the same VM?

YES, and my last tests showed what you are saying: as long as I am alone, either in phpMyAdmin or at the mysql command-line prompt, mysqld works correctly at 1% CPU, even if I fire off a complicated query on the mdl_course table in 10 or more SSH shells.

If I put Moodle in maintenance mode and only a few admins log in, the CPU consumption of mysqld rises to 100 to 350% and response times are already bad.

And if I let it go and 30 or more students access it, they most often receive the "Database connection failed" error.

>>We've seen very poor performance running mysql in virtualized environments (on Xen and KVM) and generally try to avoid it.

I did serious research on Novell SUSE and VMware to find out whether there were known problems speaking against such a move. But there are none! SUSE 11 is even certified as the best SUSE OS to run on top of a VMware ESX machine. Incredible!

I am at the point of migrating all data and databases back onto the physical server, as our semester has just begun and people want to work.

There is only one thing I fear: if for any reason phpMyAdmin did not tell the truth about the health of the databases, and they are somehow corrupt even though they seem to be OK (and what about the indexes?), then I would have two damaged systems.

So do you know whether a dump, even of a corrupt database, will do any harm to my old system when I import it? Or do the MySQL statements along with the data guarantee that the imported data will be OK?

>>but I'd stay away from running your database in one.

Many thanks for your wise advice.

>>Check the slow queries log

I checked it with phpMyAdmin's runtime info and the numbers did not seem too bad to me: 29 with only admins working, and quite a few more (about 300) with 40 people this morning. But Moodle is the same version as before and 40 people is no load at all, so the slow queries should stay the same as before, unless I misunderstand this type of query...

>>Since you seem to have quite a bit of RAM on the system, you may want to look at firing up memcached.

As there is enough memory for Apache AND PHP AND MySQL, I would simply like it to work as it did before on my physical server. Caching and optimizing should come after the system works correctly, not as a way to correct some bug or misconfiguration of the PHP, Apache or MySQL parameters.

I wonder whether there is an Apache configuration parameter that could help solve this CPU problem between Apache, PHP and MySQL?

>>disks

We have SAN disks attached and allocated to this virtual machine. Maybe we could tweak them a little, but again, we have to resolve a major problem first. Every other one of our 70 servers uses the same disks, and even if they are slow for mysqld, I do not think this could be the reason for our high CPU utilization accompanied by users being shut out.

One disk for the OS, one disk for swap and one for Moodle's PHP and data.

>>hdparm -tT /dev/sda - noatime

Thanks, I will check both.

Thanks for all your help, Rosario (facing a long night of work...)
In reply to Rosario Carcò

Re: Preliminary observations for 100 concurrency test

by Rosario Carcò -
Dear all, I will continue to document my nightmare here, in case others end up in a similar scenario. I will also report back on whether our second attempt to migrate to a VM succeeds. For the moment, here is what has happened in the meantime:

On Friday, Sept 18th, as the system was unusable on the VM, I had to make a choice:

a) putting the old physical server back into production as it was on August 29th, and re-importing the courses modified and created on the VM in the meantime (over 300!)

b) making a dump and porting the data and the dump back to the physical server

Before taking the decision I made a dump and imported it on the PC where I had tested SLES 11 and Moodle before porting it to the VM.

On the PC the same symptoms arose, so it was not clear whether this proved that there was some kind of bug in Apache-PHP-MySQL on SLES 11 or whether the DBs were corrupt, even though CHECK, OPTIMIZE and REPAIR TABLE reported everything OK.

It was not even clear whether I had migrated a potentially corrupt database to the VM.

The only clear thing was that on the physical server everything had worked fine until I migrated to the VM.

As we were already under stress because teachers and students had been unable to work for two weeks, we decided to migrate everything back.

I backed up everything I had on the physical server onto an external HD and migrated back.

The SURPRISE unfortunately came some days later: the physical server also began to show high CPU consumption by mysqld and, again, the users were confronted with the Database Connection Error.

You can imagine my desperation. The only consolation was that the physical server recovered more quickly from those peaks and worked for a longer period, 2 or 3 days, until mysqld stalled so badly that it had to be restarted.

The night before I left for vacation (Friday, Sept 25th) I realized that the mdl_log table contained 1.4 million entries, as I had set the option to keep the logs for 1 year. So I emptied this table with phpMyAdmin and lowered the setting to 6 months.

Immediately mysqld seemed to behave more healthily.

During my vacation the service continued to stall, and a colleague of mine wrote a batch script to monitor and restart mysqld (he did not know that I already had such scripts and that I had removed them from the crontab to avoid further interference).

After my vacation, on Thursday (Oct 15th), I emptied the mdl_log table again; it had roughly 150'000 entries. Someone reported an error when accessing the statistics of a Moodle user. So I disabled course backups and statistics, which are run as Moodle-internal cron jobs, and also emptied the mdl_backup_log table.

So far we have had 4 days of healthy behaviour.

The question now is whether all the tables are in a healthy state again. During the past 4 days I clicked on roughly 300 to 400 courses while observing the mysqld CPU usage, to see whether certain courses cause a problem. But of course I was not able to dive deep into each course to check forums, chats, lessons, and so on as well.

And whether I can migrate to the VM again on this fairly stable basis, after having enabled and tested backups and statistics.

Rosario
In reply to Rosario Carcò

Re: Preliminary observations for 100 concurrency test

by Rosario Carcò -
Our physical server is still running with a stable mysqld. But we observed that with every cycle of Moodle's internal course backup the memory load rises; it has gone from 1.8% to currently 2.7%.

If I am not mistaken, this means that mysqld is not releasing memory allocated during course backups.

I still have to search the forums on that issue.

Rosario
In reply to Rosario Carcò

Re: Preliminary observations for 100 concurrency test

by Rosario Carcò -
Since the end of October the physical server has been stable again. To reduce the total CPU time of mysqld I switched off statistics and reduced course backups to Mon, Wed and Fri.

The load tests on the VM still yield astonishing results. With Server -> Debugging -> Performance info (perfdebug) on and the setting

define('MDL_PERFDB', true); //to show number of queries per page in the performance footer

in config.php, I get access times of

0.421631 secs
RAM: 13.9MB
RAM peak: 14MB
Included 80 files
DB queries 48
Log writes 1
Load average: 0.08
Record cache hit/miss ratio : 0/1

if NOT under load. If I put the server under load with a simple bash script that loops over 1800 courses, calling Moodle pages with wget, I get a horrifying

2.447763 secs
RAM: 13.7MB
RAM peak: 13.8MB
Included 79 files
DB queries 47
Log writes 1
Load average: 6.59
Record cache hit/miss ratio : 0/1
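
To put the two performance footers side by side, here is a minimal Python sketch that computes the slowdown factor and a crude time-per-query figure. The numbers are copied from the footers above; the dictionary keys and function names are hypothetical, and dividing page time by query count overstates the database's share because page time also includes PHP execution.

```python
# Values copied from the two performance footers (idle vs. under load).
idle = {"seconds": 0.421631, "db_queries": 48, "load_average": 0.08}
loaded = {"seconds": 2.447763, "db_queries": 47, "load_average": 6.59}


def slowdown(before, after):
    """How many times longer the page took in the second measurement."""
    return after["seconds"] / before["seconds"]


def ms_per_query(sample):
    """Page time divided by query count, in milliseconds (a crude upper bound)."""
    return 1000 * sample["seconds"] / sample["db_queries"]


print(round(slowdown(idle, loaded), 1))
print(round(ms_per_query(idle), 1), round(ms_per_query(loaded), 1))
```

Since the page runs practically the same number of queries and uses the same amount of RAM in both cases, the extra time is spent waiting rather than doing more work.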

So I still do not understand where the SUSE SLES 11-Apache-PHP-MySQL combination stalls.

Any hints are still appreciated. I now have to decide whether we will migrate to a new physical server or test with e.g. Red Hat or the latest openSUSE.

Rosario
In reply to Rosario Carcò

Re: Preliminary observations for 100 concurrency test

by Rosario Carcò -
I am sorry for the delay, but after months of testing and researching my VMware ESX issue, my investigations came down to one very simple calculation:

a) we have a cluster of 5 Servers

b) we have more than 100 virtual machines running on them

c) we have NO reservations

So our VMware engineer lets ESX do all the work needed to balance the load between the 100 virtual machines.

My nightmare experience showed that without reservations you do not get enough CPU, and maybe not even enough memory, despite having set up the VM with 4 CPUs and 8 GB of RAM.

And here is the catch: if you reserve ONE server for your own machine because you need it all the time, the other 99 VMs must run on the power of the 4 remaining servers.

So for the time being, I asked for a new server in January and installed everything on an 8-core 1.8 GHz standalone server with 8 GB RAM, a built-in system disk and a 400 GB data disk on our SAN.

I also tested separating the Apache and MySQL servers: I left Apache on the VM and used the physical MySQL server on which my production system runs. To no avail; I got the same performance issues. This led me to think that it is not a matter of MySQL running on a VM, nor of Apache running on it; it must be the VM itself.

As we also have several Red Hat servers, I simply modified my bash script to call some pages of those web servers, AND YES, under load they do not respond either.

So the problem is really ESX-related. A third-party VMware expert came to analyse our performance problems and simply told us that our 3-year-old hardware was already out of date, mainly because of the switching between the 32- and 64-bit OSes we run concurrently in our VMs. So new hardware would yield better performance, he says. But my simple calculation is still valid: even with 10 servers of the newest generation, each of the 100 VMs would on average get only 10/100 of one server's power, and maybe a little more if 90 VMs are idle.
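
The "simple calculation" can be written down as a few lines of Python. This is only the arithmetic from the post, assuming capacity is split evenly among the VMs that have no reservation; it ignores idle VMs, hypervisor overhead and how ESX actually schedules resources, and `average_share` is a hypothetical name.

```python
def average_share(servers, vms, reserved_servers=0, reserved_vms=0):
    """Average fraction of one server available to each VM without a reservation."""
    return (servers - reserved_servers) / (vms - reserved_vms)


# Current cluster: 5 servers, 100 VMs, no reservations.
print(average_share(5, 100))
# One whole server reserved for one VM: 99 VMs share the remaining 4 servers.
print(round(average_share(5, 100, reserved_servers=1, reserved_vms=1), 4))
# Hypothetical upgrade to 10 servers, still 100 VMs.
print(average_share(10, 100))
```

The second case shows the cost of a reservation: the share of every other VM drops from 5/100 to 4/99 of a server.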

There is ONE LAST test for me to do: run the VM with a reservation of CPU and RAM, to see whether I get the needed power.

And if we get the needed power through a reservation, the next question is how much power the other VMs are going to lose. In the end you have to decide between power and all the redundancy ESX can offer.

You will find the VMware docs on reservations easily by searching for them on the web.

I will be migrating my Moodle server again in the next two months. So my nightmare took almost one year. Be warned if you try to use VMware ESX systems. Also have a look at this thread with similar issues: http://moodle.org/mod/forum/discuss.php?d=146521

As the first thing Moodle tries to do is write to the log table, when something goes wrong because of this CPU hiccup behaviour (MySQL not getting the power when it asks for it), we get those "insert into Moodle log" errors.

Rosario
In reply to Rosario Carcò

Re: Preliminary observations for 100 concurrency test

by sakai user -
Rosario - Are you still wrestling with Moodle, Apache and MySQL in a VM environment? Or are you happy with your setup?

Just curious ..

Our Moodle is entirely VM-based (DB servers / app servers / balancers) on ESX 3.5 (moving to ESX 4 soon).


kevin
In reply to sakai user

Re: Preliminary observations for 100 concurrency test

by Rosario Carcò -

As announced, I migrated from the old physical server to a new physical server on August 4th 2010:

CPU 1 Quad-Core Intel Xeon, 1866 MHz 
CPU 2 Quad-Core Intel Xeon, 1866 MHz 
Memory 8192 MB

And since August 2010 we have had the following growth:

users: 9'400 -> 12'774

courses: 2'663 -> 4'030

moodle-data: 143 GB -> 316 GB

I even had to change the SAN disk in January because of that enormous growth. Just for comparison: we started in 2007 with some 500 courses that I had migrated from a previous Moodle 1.5 server.

The current physical server has all the needed power and free resources for further growth.

Rosario

In reply to sakai user

Re: Preliminary observations for 100 concurrency test

by Rosario Carcò -

Starting this year we moved to ESX 4.5 too. And I migrated my previous Moodle VM onto it. I have not had the time to repeat my load tests with the RESERVATION of resources I mentioned, but I have already noticed that the nightly backups finish in time and do not overlap each other. I guess that 4.5 already deals more smartly with resources. But for the time being, I am content with the physical server.

Nevertheless I have been asked to continue the VM experiment, so I will report back as soon as possible.

Rosario

In reply to Rosario Carcò

Re: Preliminary observations for 100 concurrency test

by Rosario Carcò -

Dear all, last week I saw that SUSE SLES has a new separate distribution called SLES 11 SP1 for VMware:

http://www.vmware.com/products/sles-for-vmware/overview.html

I think this proves well enough that with my normal SLES 11 SP1 distribution Moodle would never have worked as it does on my physical server.

I will give it a try, since my virtual machine is still there on our ESX cluster, and report back.

Rosario

In reply to Rosario Carcò

Re: Preliminary observations for 100 concurrency test

by Rosario Carcò -

And last week I even upgraded to SP2, as this is the latest SP for both the physical and the VMware version.

I still have to repeat the load tests I made during 2009. I will report back as soon as possible.

Please see all future reports in this thread:

http://moodle.org/mod/forum/discuss.php?d=146521

Rosario