Here is a valuable posting by Martin L. about MySQL vs. Postgres. I hope it goes into the docs somewhere. Thanks!
There are several reasons to go with Postgres,
I'll try to make a brief outline. We run a variety of RDBMSs at
Catalyst, and have a lot of in-house experience with them: Oracle, Postgres, MySQL and Progress, plus a few others. We also have experience with replicated databases, clustering and other tricks -- which we use for the backend of the .nz root domain servers as well as a few other mission-critical systems.
On the performance side, Postgres requires a bit more up-front configuration than MySQL. A well tuned Postgres is pretty close on SELECT performance to MySQL with small databases. With large tables MySQL has some bad performance problems, and Postgres performs much better.
Write performance is also an issue with MySQL -- with a lot of traffic, it has serious problems with concurrent writes. Under heavy load, Postgres performs much better.
But to tell you the truth, the real reason for choosing Postgres is reliability. We maintain a lot of databases, and Postgres
is rock-solid reliable and has a focus on ACID-correctness: when it
returns from a commit, the data is safely on disk and won't be lost --
barring actual disk problems, which we offset using RAID-1.
No matter how hard we try, MySQL databases with a lot of usage have recurring index corruption issues. If you look at the startup scripts for MySQL
on most Linux distributions, they check for data corruption on every
startup -- this is to mask the fact that it is a frequent occurrence.
And while this is passable with small installations where the data
isn't mission critical, you have to consider how much you can trust
such approch. And with large datasets, runing isamchk/myisamchk can
take hours -- we cannot afford that.
The clustering solution for MySQL
is being touted a lot, and I think it is a red herring. My main concern
about is that it writes "asynchronously" -- that is, there is no
guarantee that your data is on-disk safely. It'll get to the disk
sometime. It'll get to the slaves... sometime. Hmmm.
Given that the MySQL
cluster uses async writes, splitting read/writes between the master and
the slaves breaks down in cases where we write some data, and read it
back in immediately (or soon after). And this does happen in quite a
few places.
And you also have to consider the performance boost of using async writes: if you tell a standalone Postgres or MySQL
to use async writes, it'll run scale much better (should be able to
handle up to 3-4 times more simultaneous writes). Once you do that, the
performance advantage of the MySQL cluster mostly vanishes. It still has semi-hot takeover in case the master goes down, but Postgres can do that using Slony, and with better guarantess of consistency of the data in the slave.
In a nutshell, MySQL
isn't normally very solid when it comes to ensure my data is safely
stored on-the-disk, even if it theoretically guarantees that it's been
saved. And MySQL Cluster says up-front that there isn't a guarantee any more. Riiiiiight
Michael is talking about having UPSs. We have a car-sized UPS and a
container-sized on-site generator that auto-starts. And yet, I wouldn't
depend on that for my DB consistency on a large installation. So many
things other than power can (and do) go amiss. If a process has a
problem storing the data, the right thing is to tell that back to the
user. With async writes, you end up with a queue of data that hasn't
been stored yet, but you already told the user it was.
That's not what a database is supposed to do.
I am currently exploring some techniques similar to those being used in
livejournal and slashdot. We should be able to increase Moodle
scalability by cutting down on DB load by about 50%. This is happening
slowly in the gaps between more urgent projects. Feel free to ping
Richard or me if you're interested in that track.
In reply to Don Hinkelman
Re: Moodle documentation: Server Database Software Guide
by Przemyslaw Stencel -
Thank you, Don.
Within a few days I'll put it in the admin section.
Przemek
Within a few days I'll put it in the admin section.
Przemek
In reply to Don Hinkelman
Re: Moodle documentation: Server Database Software Guide
by Ghassan Geara -
Searching for a detailed discussion / document on clustering, scalability and security for a Moodle environment. Any tips around ?
Thx!
Gass