Why are we running so slowly?

by d morte -
Number of replies: 20

For the past several months we have attempted to tweak our system in order to alleviate an ongoing slowness problem. We are running Moodle 1.9.9, CentOS 5.3, PHP 5.3.2, and Apache 2.2.3. This is on a dual 2.6GHz Opteron with 6GB of RAM - utilizing VMware ... Our installation has around 5000 users, with a max concurrent online around 200. We experience slowness even when only one or two people are online, so the issue doesn't seem to be driven by concurrent user load.

Here is a snapshot of our system. Any information about why we are crawling would be greatly appreciated.

 


# prefork MPM
# StartServers: number of server processes to start
# MinSpareServers: minimum number of server processes which are kept spare
# MaxSpareServers: maximum number of server processes which are kept spare
# ServerLimit: maximum value for MaxClients for the lifetime of the server
# MaxClients: maximum number of server processes allowed to start
# MaxRequestsPerChild: maximum number of requests a server process serves
<IfModule prefork.c>
StartServers       3
MinSpareServers    3
MaxSpareServers    3
ServerLimit       50
MaxClients        50
MaxRequestsPerChild  1000
</IfModule>

# worker MPM
# StartServers: initial number of server processes to start
# MaxClients: maximum number of simultaneous client connections
# MinSpareThreads: minimum number of worker threads which are kept spare
# MaxSpareThreads: maximum number of worker threads which are kept spare
# ThreadsPerChild: constant number of worker threads in each server process
# MaxRequestsPerChild: maximum number of requests a server process serves
<IfModule worker.c>
StartServers         2
MaxClients         150
MinSpareThreads     25
MaxSpareThreads     75
ThreadsPerChild     25
MaxRequestsPerChild  0
</IfModule>


top - 12:38:30 up 13 days,  4:31,  1 user,  load average: 0.04, 0.08, 0.07
Tasks: 106 total,   1 running, 105 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.2%us,  1.6%sy,  0.0%ni, 93.6%id,  1.6%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   6106868k total,  5968476k used,   138392k free,   105400k buffers
Swap:  1048568k total,       84k used,  1048484k free,  5525600k cached


[root@Moodle6 /]# free
total       used       free     shared    buffers     cached
Mem:       6106868    5968268     138600          0     105420    5525600
-/+ buffers/cache:     337248    5769620
Swap:      1048568         84    1048484
[root@Moodle6 /]#



[root@Moodle6 /]# ps -aux
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.7/FAQ
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  10348   704 ?        Ss   Sep22   0:01 init [5]
root         2  0.0  0.0      0     0 ?        S<   Sep22   0:00 [migration/0]
root         3  0.0  0.0      0     0 ?        SN   Sep22   0:00 [ksoftirqd/0]
root         4  0.0  0.0      0     0 ?        S<   Sep22   0:00 [migration/1]
root         5  0.0  0.0      0     0 ?        SN   Sep22   0:00 [ksoftirqd/1]
root         6  0.0  0.0      0     0 ?        S<   Sep22   0:09 [events/0]
root         7  0.0  0.0      0     0 ?        S<   Sep22   0:09 [events/1]
root         8  0.0  0.0      0     0 ?        S<   Sep22   0:00 [khelper]
root        65  0.0  0.0      0     0 ?        S<   Sep22   0:00 [kthread]
root        70  0.0  0.0      0     0 ?        S<   Sep22   0:04 [kblockd/0]
root        71  0.0  0.0      0     0 ?        S<   Sep22   0:01 [kblockd/1]
root        72  0.0  0.0      0     0 ?        S<   Sep22   0:00 [kacpid]
root       231  0.0  0.0      0     0 ?        S<   Sep22   0:00 [cqueue/0]
root       232  0.0  0.0      0     0 ?        S<   Sep22   0:00 [cqueue/1]
root       235  0.0  0.0      0     0 ?        S<   Sep22   0:00 [khubd]
root       237  0.0  0.0      0     0 ?        S<   Sep22   0:00 [kseriod]
root       309  0.0  0.0      0     0 ?        S    Sep22   0:00 [khungtaskd]
root       312  0.0  0.0      0     0 ?        S<   Sep22   0:37 [kswapd0]
root       313  0.0  0.0      0     0 ?        S<   Sep22   0:00 [aio/0]
root       314  0.0  0.0      0     0 ?        S<   Sep22   0:00 [aio/1]
root       520  0.0  0.0      0     0 ?        S<   Sep22   0:00 [kpsmoused]
root       567  0.0  0.0      0     0 ?        S<   Sep22   0:00 [mpt_poll_0]
root       568  0.0  0.0      0     0 ?        S<   Sep22   0:00 [mpt/0]
root       569  0.0  0.0      0     0 ?        S<   Sep22   0:00 [scsi_eh_0]
root       573  0.0  0.0      0     0 ?        S<   Sep22   0:00 [ata/0]
root       574  0.0  0.0      0     0 ?        S<   Sep22   0:00 [ata/1]
root       575  0.0  0.0      0     0 ?        S<   Sep22   0:00 [ata_aux]
root       582  0.0  0.0      0     0 ?        S<   Sep22   0:00 [kstriped]
root       595  0.0  0.0      0     0 ?        S<   Sep22   0:00 [ksnapd]
root       619  0.0  0.0      0     0 ?        S<   Sep22   2:32 [kjournald]
root       645  0.0  0.0      0     0 ?        S<   Sep22   0:00 [kauditd]
root       678  0.0  0.0  13292  1180 ?        S<s  Sep22   0:00 /sbin/udevd -d
root      2215  0.0  0.0      0     0 ?        S<   Sep22   0:00 [kmpathd/0]
root      2216  0.0  0.0      0     0 ?        S<   Sep22   0:00 [kmpathd/1]
root      2217  0.0  0.0      0     0 ?        S<   Sep22   0:00 [kmpath_handlerd]
root      2242  0.0  0.0      0     0 ?        S<   Sep22   0:00 [kjournald]
root      2719  0.0  0.0      0     0 ?        S<   Sep22   0:07 [vmmemctl]
root      2826  0.0  0.0  56868  2880 ?        Sl   Sep22   0:22 /usr/sbin/vmtoolsd
root      3205  0.0  0.0  92884   948 ?        S<sl Sep22   0:05 auditd
root      3207  0.0  0.0  81808  1000 ?        S<sl Sep22   0:03 /sbin/audispd
root      3230  0.0  0.0   5908   668 ?        Ss   Sep22   0:03 syslogd -m 0
root      3233  0.0  0.0   3804   428 ?        Ss   Sep22   0:00 klogd -x
root      3243  0.0  0.0  10760   376 ?        Ss   Sep22   0:04 irqbalance
rpc       3254  0.0  0.0   8052   580 ?        Ss   Sep22   0:00 portmap
root      3281  0.0  0.0      0     0 ?        S<   Sep22   0:00 [rpciod/0]
root      3282  0.0  0.0      0     0 ?        S<   Sep22   0:00 [rpciod/1]
rpcuser   3289  0.0  0.0  10160   800 ?        Ss   Sep22   0:00 rpc.statd
root      3313  0.0  0.0  55180   772 ?        Ss   Sep22   0:00 rpc.idmapd
dbus      3328  0.0  0.0  21500  1272 ?        Ss   Sep22   0:02 dbus-daemon --system
root      3337  0.0  0.0  10432   788 ?        Ss   Sep22   0:00 /usr/sbin/hcid
root      3341  0.0  0.0   5936   552 ?        Ss   Sep22   0:00 /usr/sbin/sdpd
root      3365  0.0  0.0      0     0 ?        S<   Sep22   0:00 [krfcommd]
root      3403  0.0  0.0  31284  1340 ?        Ssl  Sep22   0:02 pcscd
root      3413  0.0  0.0   3800   580 ?        Ss   Sep22   0:00 /usr/sbin/acpid
68        3422  0.0  0.0  31380  4312 ?        Ss   Sep22   0:03 hald
root      3423  0.0  0.0  21692  1056 ?        S    Sep22   0:00 hald-runner
68        3432  0.0  0.0  12324   840 ?        S    Sep22   0:00 hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
68        3437  0.0  0.0  12324   836 ?        S    Sep22   0:00 hald-addon-keyboard: listening on /dev/input/event0
root      3446  0.0  0.0  10228   676 ?        S    Sep22   0:18 hald-addon-storage: polling /dev/hdc
root      3463  0.0  0.0   8516   488 ?        Ss   Sep22   0:00 /usr/bin/hidd --server
root      3506  0.0  0.0  26324   536 ?        Ss   Sep22   0:00 ./hpiod
root      3511  0.0  0.1 155164  6716 ?        S    Sep22   0:02 python ./hpssd.py
root      3547  0.0  0.0  21644   884 ?        Ss   Sep22   0:00 xinetd -stayalive -pidfile /var/run/xinetd.pid
root      3619  0.0  0.0  71104  2352 ?        Ss   Sep22   0:00 sendmail: accepting connections
smmsp     3627  0.0  0.0  57692  1776 ?        Ss   Sep22   0:00 sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue
root      3637  0.0  0.0   6452   376 ?        Ss   Sep22   0:01 gpm -m /dev/input/mice -t exps2
root      3656  0.0  0.0  74844  1156 ?        Ss   Sep22   0:03 crond
xfs       3685  0.0  0.0  20964  1764 ?        Ss   Sep22   0:00 xfs -droppriv -daemon
root      3825  0.0  0.0  19624   468 ?        Ss   Sep22   0:00 /usr/sbin/atd
avahi     3852  0.0  0.0  23280  1280 ?        Ss   Sep22   0:00 avahi-daemon: running [Moodle6.local]
avahi     3853  0.0  0.0  23148   340 ?        Ss   Sep22   0:00 avahi-daemon: chroot helper
root      3915  0.0  0.0  18416   476 ?        S    Sep22   0:00 /usr/sbin/smartd -q never
root      3920  0.0  0.0   3792   480 tty1     Ss+  Sep22   0:00 /sbin/mingetty tty1
root      3921  0.0  0.0   3792   484 tty2     Ss+  Sep22   0:00 /sbin/mingetty tty2
root      3922  0.0  0.0   3792   480 tty3     Ss+  Sep22   0:00 /sbin/mingetty tty3
root      3923  0.0  0.0   3792   484 tty4     Ss+  Sep22   0:00 /sbin/mingetty tty4
root      3926  0.0  0.0   3792   484 tty5     Ss+  Sep22   0:00 /sbin/mingetty tty5
root      3933  0.0  0.0   3792   480 tty6     Ss+  Sep22   0:00 /sbin/mingetty tty6
root      3934  0.0  0.0 167640  2584 ?        Ss   Sep22   0:00 /usr/sbin/gdm-binary -nodaemon
root      4019  0.0  0.0 194752  2344 ?        S    Sep22   0:00 /usr/sbin/gdm-binary -nodaemon
root      4021  0.0  0.0 189876  4100 ?        Sl   Sep22   0:01 /usr/libexec/gdm-rh-security-token-helper
root      4024  0.0  0.0  83252  5548 tty7     Ss+  Sep22   0:04 /usr/bin/Xorg :0 -br -audit 0 -auth /var/gdm/:0.Xauth -nolisten tcp vt7
gdm       4045  0.0  0.2 221684 17040 ?        Ss   Sep22   0:03 /usr/libexec/gdmgreeter
root      4048  0.0  0.2 257596 16332 ?        SN   Sep22   0:07 /usr/bin/python -tt /usr/sbin/yum-updatesd
root      4050  0.0  0.0  12916  1164 ?        SN   Sep22   0:03 /usr/libexec/gam_server
root      8722  0.0  0.0      0     0 ?        S    Oct04   0:00 [pdflush]
root      8723  0.0  0.0      0     0 ?        S    Oct04   0:00 [pdflush]
root     10621  0.0  0.0 133440  2748 ?        Ss   Sep30   0:00 cupsd
root     10821  0.0  0.0  54396  1512 ?        Ssl  Sep30   0:07 automount
root     10854  0.0  0.0  62624  1216 ?        Ss   Sep30   0:00 /usr/sbin/sshd
root     19712  0.0  0.0  90124  3388 ?        Ss   12:29   0:00 sshd: root@pts/1
root     19715  0.0  0.0  66208  1648 pts/1    Ss   12:29   0:00 -bash
apache   19869  0.9  0.8 378472 49472 ?        S    12:34   0:02 /usr/sbin/httpd
apache   19881  0.9  0.4 353916 29692 ?        S    12:35   0:02 /usr/sbin/httpd
apache   19925  1.6  0.4 349772 26720 ?        S    12:37   0:01 /usr/sbin/httpd
apache   19976  0.5  0.4 347752 24616 ?        S    12:38   0:00 /usr/sbin/httpd
root     19991  0.0  0.0  65624   980 pts/1    R+   12:39   0:00 ps -aux
root     26346  0.0  0.0 135604  2712 ?        Ss   Oct04   0:00 smbd -D
root     26349  0.0  0.0 107764  1488 ?        Ss   Oct04   0:04 nmbd -D
root     26351  0.0  0.0 135604  1404 ?        S    Oct04   0:00 smbd -D
root     26371  0.0  0.2 273132 13048 ?        Ss   Oct04   0:02 /usr/sbin/httpd
postgres 26472  0.0  0.0 122072  4236 ?        S    Oct04   0:14 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
postgres 26478  0.0  0.0 109900   760 ?        S    Oct04   0:00 postgres: logger process
postgres 26480  0.0  0.1 122200  9928 ?        S    Oct04   0:00 postgres: writer process
postgres 26481  0.0  0.0 110900  1716 ?        S    Oct04   0:01 postgres: stats buffer process
postgres 26482  0.0  0.0 110108   896 ?        S    Oct04   0:01 postgres: stats collector process
[root@Moodle6 /]#
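
As a rough cross-check of the prefork settings against the `ps` listing above, the sketch below multiplies MaxClients by the resident size (RSS) of the httpd workers. The RSS values, MaxClients and total RAM are copied from this post; the function name is hypothetical. RSS counts shared pages once per worker, so the result should be read as an upper bound, not a measurement.

```python
# Rough upper bound on Apache prefork memory: MaxClients x RSS per worker.
# RSS values (KiB) are the four "apache" httpd rows from the ps listing above.
# RSS counts shared pages in every process, so this overestimates real usage.

HTTPD_RSS_KIB = [49472, 29692, 26720, 24616]
MAX_CLIENTS = 50          # from the prefork block above
TOTAL_RAM_KIB = 6106868   # "Mem: total" from free


def prefork_memory_bound_kib(rss_samples_kib, max_clients):
    """Return (average-based, worst-case) memory bounds in KiB."""
    average = sum(rss_samples_kib) / len(rss_samples_kib)
    worst = max(rss_samples_kib)
    return average * max_clients, worst * max_clients


avg_bound, worst_bound = prefork_memory_bound_kib(HTTPD_RSS_KIB, MAX_CLIENTS)
print("average-based bound: %.0f MiB" % (avg_bound / 1024))
print("worst-case bound:    %.0f MiB" % (worst_bound / 1024))
print("worst case as share of RAM: %.1f%%" % (100 * worst_bound / TOTAL_RAM_KIB))
```

With these figures even the worst case stays below half of the 6GB assigned to the machine, which suggests that MaxClients 50 is probably not exhausting memory by itself.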

In reply to d morte

Re: Why are we running so slowly?

by Visvanath Ratnaweera -
> We are running Moodle 1.9.9, CentOS 5.3, PHP 5.3.2, and Apache 2.2.3

This is the virtual machine, right?

> This is on a dual 2.6GHz Opteron with 6GB of RAM - utilizing VMware ...

That must be the host. How much CPU and RAM did you assign for the VM? What is the host OS? Any other VMs and/or services running there?

> Our installation has around 5000 users, with a max concurrent online around 200. We experience slowness when only one or two people are online,

That is very low. Was it like that from the beginning or did it creep in later?
In reply to d morte

Odp: Why are we running so slowly?

by Bartosz Cisek -

Do you have a monitoring system like Cacti or Munin? A graph is worth 1000 words (:

In reply to Bartosz Cisek

Re: Odp: Why are we running so slowly?

by Michael Lowery -

> To start with, have you followed all the suggestions on Performance?

Yes, we have. We are currently running APC and have followed many of the other performance recommendations outlined in that section.

> This is the virtual machine, right?
>
> > This is on a dual 2.6GHz Opteron with 6GB of RAM - utilizing VMware ...
>
> That must be the host. How much CPU and RAM did you assign for the VM? What is the host OS? Any other VMs and/or services running there?
>
> > Our installation has around 5000 users, with a max concurrent online around 200. We experience slowness when only one or two people are online,
>
> That is very low. Was it like that from the beginning or did it creep in later?

Yes, this is a virtual machine. We have dedicated 2 cores and 6GB of RAM to it. The host server has 48GB in total.

No VM services are running on the CentOS install. There are anywhere from 4 to 8 other VMs running on this host as well. We have a cluster of 4 like machines running vSphere 4.1.

It has been slow from the beginning, but we have been noticing it more as the site has been used more.

> Odp: Why are we running so slowly? by Bartosz Cisek - Wednesday, October 6, 2010, 05:47 AM
>
> Do you have some monitoring system like Cacti or Munin? Graph is worth 1000 words (:

We are not currently running any monitoring software. What do you recommend, and which stats would you like to see?

In reply to Michael Lowery

Re: Odp: Why are we running so slowly?

by d morte -

We have now installed Munin and the only alarm that is sounding to me is the memory, but I'm really not sure what is normal to know what to look for... I have attached the memory by day graph, but I can post any graph that would be helpful.
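
For anyone unsure what "normal" looks like here, a minimal sketch of how the `free` figures from the first post can be read. The numbers are copied from that output; the function name and dictionary keys are made up for illustration. On Linux most "used" memory is typically page cache that the kernel releases when applications need it.

```python
# Interpreting `free` on Linux: most "used" memory here is page cache,
# which the kernel gives back to applications on demand.
# Figures (KiB) are copied from the `free` output in the first post.

def memory_breakdown(total, used, free, buffers, cached):
    """Split `free` figures into application use and reclaimable cache."""
    app_used = used - buffers - cached       # the "-/+ buffers/cache" used value
    available = free + buffers + cached      # the "-/+ buffers/cache" free value
    return {
        "app_used_kib": app_used,
        "available_kib": available,
        "app_used_pct": 100.0 * app_used / total,
    }


stats = memory_breakdown(total=6106868, used=5968268, free=138600,
                         buffers=105420, cached=5525600)
print(stats)
```

Both derived values match the `-/+ buffers/cache` row in the original output, so applications are using only about 5.5% of RAM. The rest is disk cache, which is usually a healthy sign and not an alarm in itself.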

In reply to d morte

Odp: Re: Odp: Why are we running so slowly?

by Bartosz Cisek -

Hi,

Does this graph come from the virtual machine?

Please also post the CPU (not load), IO stat, Apache request rate and PostgreSQL graphs. At first glance, as you mentioned, memory usage is too low. Maybe your database buffers are too small and it is reading everything from disk. That would show up on the IO stat graph.
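
One way to test the "buffers too small" idea is the buffer hit ratio computed from the `blks_hit` and `blks_read` counters in PostgreSQL's `pg_stat_database` view. The sketch below is only an illustration: the function name is hypothetical, the sample counters are invented, and on older PostgreSQL releases block-level statistics may have to be enabled before these counters are populated. A block counted in `blks_read` may still have come from the OS cache, so a low ratio is a hint, not proof of disk reads.

```python
# Buffer cache hit ratio from PostgreSQL's pg_stat_database counters.
# The counters would come from a query such as:
#   SELECT datname, blks_hit, blks_read FROM pg_stat_database;
# The sample numbers below are made up purely for illustration.

def cache_hit_ratio(blks_hit, blks_read):
    """Fraction of block requests served from shared buffers (0.0 to 1.0)."""
    total = blks_hit + blks_read
    if total == 0:
        return None  # no activity recorded yet
    return blks_hit / total


ratio = cache_hit_ratio(blks_hit=900000, blks_read=100000)
print("hit ratio: %.1f%%" % (100 * ratio))
```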

In reply to Bartosz Cisek

Re: Odp: Re: Odp: Why are we running so slowly?

by Michael Lowery -

A copy of our stats can be found at http://moodle.iu5.org/Stats/web/monitoring/localhost/localhost.html

If there are any stats other than these that you need to see, please let us know.

In reply to Michael Lowery

Re: Odp: Re: Odp: Why are we running so slowly?

by Michael Lowery -

Also, here is the PostgreSQL conf file in case there is something incorrect in it.

In reply to Michael Lowery

Re: Odp: Re: Odp: Why are we running so slowly?

by d morte -

Thank you, Michael, for posting that information. I'll add that over the weekend we experienced a new problem that may or may not be connected. The image below shows the error we received, twice. The first time was on Friday night, but it somehow resolved itself overnight before we could restart in the morning. The second time was on Sunday night, and it had also resolved itself before this morning. Any thoughts?

In reply to d morte

Re: Odp: Re: Odp: Why are we running so slowly?

by Bartosz Cisek -

@Drew

I'll ask again about the graphs. A single error message says nothing about the root of the problem. Did you look into the Apache logs?

It looks like your database was overloaded and wasn't accepting new connections.

In reply to Michael Lowery

Re: Odp: Re: Odp: Why are we running so slowly?

by Bartosz Cisek -

@Michael

The graphs show that you had a lot of network traffic, which caused a lot of load and saturated disk I/O (iostat and iowait, the pink graph on CPU usage).

It's hard to say what caused that situation: whether it was Apache serving heavy files or the database reading a lot from disk.
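
For reference, the iowait share mentioned above can also be computed by hand from two samples of the first line of /proc/stat. This is a minimal sketch assuming the Linux 2.6 field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names are hypothetical and the two sample lines are invented, so on a real system they would be replaced by two readings taken a few seconds apart.

```python
# Share of CPU time spent waiting on I/O between two samples of the
# aggregate "cpu" line in /proc/stat (Linux 2.6 field order:
# user nice system idle iowait irq softirq steal).
# The two sample lines are invented for illustration.


def parse_cpu_line(line):
    """Return the list of integer tick counters from a 'cpu ...' line."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(v) for v in fields[1:]]


def iowait_percent(before, after):
    """Percentage of elapsed CPU ticks spent in iowait between two samples."""
    # Only the first eight counters are summed; newer kernels append guest
    # counters that are already included in the user time.
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait


sample_before = "cpu  1000 0 500 8000 400 0 100 0"
sample_after = "cpu  1100 0 550 8700 540 0 110 0"

pct = iowait_percent(parse_cpu_line(sample_before), parse_cpu_line(sample_after))
print("iowait: %.1f%%" % pct)
```

A value that stays high while the CPU is otherwise idle would point at the disk, which is the same thing the pink area on the Munin CPU graph shows.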

For now we know that there is an I/O bottleneck. Is the whole installation on this server, or is it only running PostgreSQL?

Try to set up graphs for Apache and PostgreSQL, like http://muninpgplugins.projects.postgresql.org/#plugins or http://exchange.munin-monitoring.org/plugins/search?keyword=postgresql

It seems that postgresql.conf needs some tuning, but that depends on whether you use this machine only for the database or for the whole Moodle installation.

In reply to Bartosz Cisek

Re: Odp: Re: Odp: Why are we running so slowly?

by Michael Lowery -

Bartosz,

Everything is installed on this server, so network traffic should be low. It runs both the PostgreSQL server and Apache for Moodle. I will get the plugins installed so we can see some graphs on those pieces.

In reply to Michael Lowery

Re: Odp: Re: Odp: Why are we running so slowly?

by Bartosz Cisek -

How are things going? I can't see the graphs you wrote about.

In reply to Bartosz Cisek

Re: Odp: Re: Odp: Why are we running so slowly?

by d morte -

Bartosz,

Things are not going well. We are having almost daily meltdowns. We are looking at starting over with fresh servers, and I'll be posting shortly to that end. In our latest meltdown we were able to see five open Postgres processes that stayed open for 40+ minutes, each using 25%-45% of our processors... Read: maxing out two dual-core processors. It caused everything on that virtual server to slow to a crawl. Eventually those open connections to Postgres crashed the DB and it restarted itself. No similar problems since then.

In reply to d morte

Re: Odp: Re: Odp: Why are we running so slowly?

by Justin Reeve -

We have one separate virtual server that just runs MySQL, for all our web sites. Moodle itself is on another VM. Does Postgres have anything like memcached that might work for this?

In reply to Justin Reeve

Re: Odp: Re: Odp: Why are we running so slowly?

by d morte -

It happened again just about two hours ago: We had a single Apache process that was open and consuming 100% of our CPU - and was apparently open for 13 hours. It crippled our virtual server and so we had to kill the process manually. We have no idea where these random processes are coming from that are staying open for extended periods of time and using massive amounts of CPU.
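
A small watchdog could at least catch processes like this before they run for hours. The sketch below parses the output of `ps -eo pid,etime,pcpu,comm` and flags anything that has been alive for a long time while averaging high CPU. The function names, the thresholds and the sample listing are all invented for illustration and are not taken from our server.

```python
# Flag long-running, CPU-heavy processes from the output of
#   ps -eo pid,etime,pcpu,comm
# etime is printed as [[dd-]hh:]mm:ss. Thresholds and the sample text
# below are illustrative assumptions, not values from the thread.


def etime_to_seconds(etime):
    """Convert ps elapsed time '[[dd-]hh:]mm:ss' to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad the missing hours field
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def find_runaways(ps_output, min_seconds=1800, min_cpu=20.0):
    """Return (pid, seconds, cpu, command) tuples for suspicious processes."""
    suspects = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        pid, etime, pcpu, comm = line.split(None, 3)
        seconds = etime_to_seconds(etime)
        if seconds >= min_seconds and float(pcpu) >= min_cpu:
            suspects.append((int(pid), seconds, float(pcpu), comm))
    return suspects


SAMPLE = """
  PID     ELAPSED %CPU COMMAND
19869       05:12  0.9 httpd
20311    13:02:44 99.0 httpd
26472  1-04:31:10  0.0 postmaster
27001       41:07 38.5 postmaster
"""

for suspect in find_runaways(SAMPLE):
    print(suspect)
```

Note that `pcpu` in `ps` is CPU time divided by elapsed time, so it is a lifetime average. That suits this case, since the goal is to find processes that have been busy for a long stretch and not short spikes.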

 

@Justin - I don't know about the Postgres memcached question. I'll look into that though.

In reply to d morte

Re: Odp: Re: Odp: Why are we running so slowly?

by Bartosz Cisek -

@drew

Hi,

Throwing more hardware at the problem is not always the best solution. The main problem is that you don't know where your problem really is.

Graphs, graphs and again graphs. You can't fix things knowing so little about what your system is really doing. Do yourself a favour and install Munin with the plugins I mentioned earlier.

In reply to Bartosz Cisek

Re: Odp: Re: Odp: Why are we running so slowly?

by d morte -

Bartosz,

I've started a dozen responses and have deleted all for various reasons. You are absolutely correct that throwing more hardware is not always the best solution. You are also correct that we don't know exactly what is causing our problem. Right now we are attempting to get to a point with our hardware and software that fits a standard model to allow ourselves to get assistance from a wide range of sources.

As I type this we are installing our new server that has more resources than our entire clustered virtual farm combined. We will be moving Moodle to this server for one main reason: We can keep adding resources (RAM and Processors) for as much as is needed. We were quite limited in our previous environment.

Secondly, we have decided to move to Red Hat Enterprise. Part of the reason is that training is available for Red Hat and there is potential for support contracts. We are generally a Windows environment, so the transition to Linux will take some time. Also, we have spoken with a few institutions that run large installations of Moodle on Red Hat servers. We were torn between Red Hat and Ubuntu, but because we had already started down the road with CentOS, Red Hat should provide a stable transition.

Thirdly, we have decided to move to MySQL. While we have heard from many people that Postgres is fine and works well, the fact that there are a very limited number of Postgres users out there really makes life difficult when it comes to tweaking. Furthermore, where we lack in experience with Postgres/MySQL, we can gain training and support for MySQL a bit easier.

Lastly, we have decided to move the database to its own server and thus attempt to balance our load a bit. We are looking into using a load balancer for multiple webservers and are planning that into various scenarios.

We will absolutely be installing Munin to analyze those graphs, but I think that we will be on a bit of better footing once we can come back to the community and say that we have a system that is fairly common and has plenty of resources to allocate.

In reply to d morte

Re: Odp: Re: Odp: Why are we running so slowly?

by Ellen Walker -

What happened when you got your new server up?

We are having similar problems, only with over 50,000 users.

In reply to Ellen Walker

Re: Odp: Re: Odp: Why are we running so slowly?

by Bartosz Cisek -

Hi Ellen,

Please provide some more details about your setup.