Response time issues - strace analysis help needed

by Chris Martel -
Number of replies: 7

I posted this in the General Problems forum and am re-posting here.

Our environment has been having sporadic issues where pages take anywhere from ten seconds to several minutes to load, at random intervals throughout the day. If usage is particularly heavy, requests pile up, the database's maximum connection limit is exceeded, and users begin receiving error messages.

I have run strace on httpd processes during these outages and I am attaching the output of five processes that took between 30 seconds and three minutes to complete a page request.  My analysis of these indicates that the "log jam" is occurring when Moodle is trying to access files in the moodledata folder, which is stored on a separate server and mounted via NFS.  In each of these 5 cases, the file/folder being accessed is /mnt/moodledata/temp/typo3temp/cs, which is being requested by /lib/typo3/class.t3lib_div.php.

A few questions. First of all, is that hypothesis correct? I am basing it on the timestamps in the traces before and after the request for /mnt/moodledata/temp/typo3temp/cs is made. Second, could this particular file/library (typo3) be the cause of the problem, or is it more likely to be a server/network/NFS issue? If it is not a Moodle issue, what Linux tools are available that could help pinpoint the problem further, and what other steps might help mitigate or better manage it? Restarting Apache does resolve it temporarily, but that is not a viable long-term solution.
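
For anyone who wants to repeat the timestamp comparison described above, here is a minimal Python sketch of the idea. It assumes the traces were captured with `strace -f -tt`, so that each line starts with the PID followed by a wall-clock timestamp with microseconds. The function name `find_stalls` and the one-second threshold are illustrative choices, not part of any existing tool.

```python
import re
from datetime import datetime

# Assumed line shape for `strace -f -tt` output:
#   "<pid> <HH:MM:SS.microseconds> <syscall text>"
LINE_RE = re.compile(r"^(\d+)\s+(\d{2}:\d{2}:\d{2}\.\d{6})\s+(.*)$")


def find_stalls(lines, threshold_s=1.0):
    """Return (pid, gap_seconds, call_text) tuples, longest gap first.

    A tuple is reported for every logged line that was followed by at
    least `threshold_s` seconds of silence from the same PID.
    """
    last = {}    # pid -> (timestamp, text of the most recent line)
    stalls = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        pid, ts_text, call = match.groups()
        ts = datetime.strptime(ts_text, "%H:%M:%S.%f")
        if pid in last:
            prev_ts, prev_call = last[pid]
            gap = (ts - prev_ts).total_seconds()
            if gap < 0:
                gap += 86400  # the trace crossed midnight
            if gap >= threshold_s:
                stalls.append((pid, gap, prev_call))
        last[pid] = (ts, call)
    return sorted(stalls, key=lambda item: item[1], reverse=True)
```

Passing it the lines of one trace file returns the longest gaps first, together with the call that preceded each gap. Note that a gap covers the preceding call plus any time the process spent in user space before its next system call, so a long gap after an `open` on the NFS mount is suggestive rather than conclusive.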

Thanks,

Chris

In reply to Chris Martel

Re: Response time issues - strace analysis help needed

by Taylor Judd -

Have you looked at sessions? When we started out with Moodle many years ago, we had an Xserve RAID handling NFS, and we were also storing sessions on that system. As we grew, NFS on that array was not able to handle the session locking from multiple application servers. It looks like you might have a similar issue, though I admit to not knowing how to read traces that well.

"14147 open("/mnt/moodledata/sessions/sess_6rg2c821k7u6a77orcfc7n33h2", O_RDWR|O_CREAT, 0600 <unfinished ...>
14147 <... open resumed> ) = 13"

To solve our problem we moved sessions to the database. This increased the load on our database dramatically, by nearly 3x, but we had a lot of RAM on the database server, which works well for sessions. I bring this up as an option to consider, but if you don't have the database horsepower to handle it, I do not recommend it. You might also consider memcached or another tool for session management. In any case, taking sessions off NFS may decrease the load to the point that NFS no longer gets the log jam.

Average of ratings: Useful (1)
In reply to Taylor Judd

Re: Response time issues - strace analysis help needed

by Chris Martel -

Thanks, that is definitely something worth considering.  It is worth noting though, that the issues we are having do not seem to be associated with periods of heavy traffic, so I don't think any server is being overloaded.

We are running in a VMware environment though, so it's possible that other virtual machines on the cluster are experiencing heavy usage at these times.

In reply to Chris Martel

Re: Response time issues - strace analysis help needed

by Visvanath Ratnaweera -
There are two places I would dig. The first one, session files in a NFS-volume, is already brought up by Taylor.

The second one is VMware. There were many discussions in this forum where the database behaved poorly in VMware environments, http://moodle.org/mod/forum/discuss.php?d=146521 for example. Why would you run a production Moodle-server in a virtual environment?
In reply to Visvanath Ratnaweera

Re: Response time issues - strace analysis help needed

by Visvanath Ratnaweera -
Hi Chris

Any progress?

Talking of NFS and lock files, you must have seen this note in apache2.conf (at least in the Debian package):

# NOTE! If you intend to place this on an NFS (or otherwise network)
# mounted filesystem then please read the LockFile documentation (available
# at );
# you will save yourself a lot of trouble.
In reply to Chris Martel

Re: Response time issues - strace analysis help needed

by Daniel Tran -

If you are using an NFS mount to store your moodledata, the system will take an I/O hit whenever someone performs a backup, restore, or import (the site will be slow while this is running). Moodle writes to moodledata/temp during these operations. Create a symlink so that the temp directory points to a local drive instead of the NFS mount.

If the session directory (/moodledata/session) is on the NFS mount, you can do the same thing. I think you can also define $CFG->sespath ='some local path'; in config.php.
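
Before moving anything, it may be worth checking how much slower small-file operations really are on the NFS mount than on a local disk. The following is a rough Python sketch, not a Moodle tool: the function name `probe_latency`, the 4 KB payload, and the number of rounds are all arbitrary assumptions. It times a create/write/fsync/stat/delete cycle in a given directory and reports the median.

```python
import os
import statistics
import time
import uuid


def probe_latency(directory, rounds=20):
    """Median seconds for one create/write/fsync/stat/delete cycle."""
    samples = []
    for _ in range(rounds):
        # Unique name so that concurrent probes do not collide.
        path = os.path.join(directory, "probe_" + uuid.uuid4().hex)
        start = time.perf_counter()
        with open(path, "wb") as handle:
            handle.write(b"x" * 4096)
            handle.flush()
            os.fsync(handle.fileno())  # force the write out to the server/disk
        os.stat(path)
        os.remove(path)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

Running it once against the NFS-mounted temp directory and once against a local directory, ideally during one of the slow periods, gives two numbers to compare. A large difference would point towards NFS; similar numbers would suggest looking elsewhere.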

In reply to Daniel Tran

Re: Response time issues - strace analysis help needed

by Chris Martel -

We are running two web servers in a load-balanced configuration. Even though we are using "sticky sessions", I believe we could run into problems storing temp or session data on the local file system. Would you agree?

In reply to Chris Martel

Re: Response time issues - strace analysis help needed

by Daniel Tran -

I don't know. I don't know how load balancers work or how they handle sessions, so I can't comment :)