available sockets on a Moodle web server I'm running. I've Googled & RTFM'ed but am
still stumped. Sure would appreciate any ideas.
I've recently migrated a Moodle site running against PostgreSQL from a
single server, running FreeBSD, to a webserver and a database server consisting of:
- two virtual machines, both running CentOS 5.4, Linux version
2.6.18-14.10.1.el5 both with 3 Gb RAM and two dual-core
Intel processors allocated.
- the web server is running Apache 2.2.14 & PHP 5.31.
- the database server is running PostgreSQL 8.4.1, with pg_hba.conf set up
to trust the webserver on port 5432.
- both Apache & PostgreSQL are set to accept 225 max connections, otherwise
the conf's are pretty much default.
- web server is running OpenSSL for secure login, but serving general html
pages without https.
- tcp_keepalive_time in both is default 7200 seconds (which, as I read in
various posts, etc., shouldn't really matter anyway, but...)
- we are running against an external LDAP server, but the TIME_WAIT processes are only recorded against the Moodle database server, not the LDAP server...also using OpenSSL for logins only.
A cron job restarting Apache every hour is keeping the webserver alive, but
I'd sure like a better solution...
Any ideas would be greatly appreciated...
In short, TIME_WAIT is harmless, except in really extreme conditions. (CLOSE_WAIT states would be a bad sign.)
Your problem must be related to something else. Maybe it would help if you post the output of "netstat -tn" here (taken when the problem occurs), perhaps there's a hint to the actual problem.
Look to your communication between the two database servers.
This sounds like once a request is given to the database, the signal gets lost either coming or going.
I had three sites which were connected via one provider (in a ring), and each had access to the internet via local providers. Our tech sections had our internet accesses set up to fail over through another site if our local provider failed. The problem was, the fail over occurred too fast, and requests would then go through our private ring from A to B to C to A to B to A to C to.... and none of the sites would actually send it out through their local provider.
So, since the problem started when you created two load sharing virtual servers for the database, are they talking properly? Test it by turning one of them off, then if the problem goes away, you know that it is the set up between the 2 servers and how they interact. (OOps first test justserver a, then just server b, to see if it is just one of them introducing the error.)
Hope this helps solve it.
In the end, it had nothing to do with PostgreSql...