The worst scenario for a DB crash caused by innodb_flush_log_at_trx_commit=2

The worst scenario for a DB crash caused by innodb_flush_log_at_trx_commit=2

by Toni Soto -
Number of replies: 8

Hi,

I've been struggling to improve the poor performance in our Moodle2.4 since we migrated from 1.9. I tuned Apache, I installed APC and I tested for long time with MySQL to increase performance.

My LAMP server is:

Pentium(R) Dual-Core  CPU      E6500  @ 2.93GHz

8 MB RAM

2 x 1TB SATA Disks (RAID1 Software)

I know that RAID1 Software with SATA disks is not the best option for storage but it's the one we can/could afford it in our School.

Today, after reading some post about MySQL performance I decided to test for the forst time changing innodb_flush_log_at_trx_commit=1 to innodb_flush_log_at_trx_commit=2.

As a result of this I experience a huge improvement in performance. Probably this was (one of) the bottleneck in my installation.

But according with the MySQL documentation (http://dev.mysql.com/doc/refman/5.5/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit)

When the value is 2, the log buffer is written out to the file at each commit, but the flush to disk operation is not performed on it. However, the flushing on the log file takes place once per second also when the value is 2. Note that the once-per-second flushing is not 100% guaranteed to happen every second, due to process scheduling issues.

....

With a value of 2, only an operating system crash or a power outage can erase the last second of transactions. InnoDB's crash recovery works regardless of the value.

We can afford to lose for example some submissions from a quiz or an assignment done in the last hour. If this happens, we can sort it out at School.

And my questions are:

What would be the worst scenario for a Moodle admin if a crash caused by this issue occurs?

Would it be possible to recover (with usual MySQL tools: analyse, repair,.. ) the integrity of the database after such a crash?

Would Only the last DB records (transactions) be affected (lost) or is there any risk of losing even everything?

I'd really appreciate your feedback and suggestions about this issue.

TIA,

Toni Soto

 

 

Average of ratings: Useful (1)
In reply to Toni Soto

Re: The worst scenario for a DB crash caused by innodb_flush_log_at_trx_commit=2

by Howard Miller -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers

My experience is that (in general) when InnoDB breaks it usually does a very good job of it. Operating system crashes and power outages are hardly unusual events. You've reduced the potential integrity of your database to gain some performance which is a trade-off you have to weigh up. Remember that if a problem occurs, it's not going to be a high level event like losing some submissions. It's likely to mean mysterious database errors in unconnected processes or possibly a complete loss. 

Check your backups are working wink

In reply to Howard Miller

Re: The worst scenario for a DB crash caused by innodb_flush_log_at_trx_commit=2

by Toni Soto -

Thanks for commenting Howard.

So, if I stay with innodb_flush_log_at_trx_commit=2 would be as having the same level of 'risk' that we already had with MyIsam tables in old Moodle versions, right?

Yes, daily DB Backups are the only way to preserve a full integrity of the Moodle DB if something goes wrong. I also though (I'm not an expert! only a Science teacher) in doing some kind of MySQL replication (Master-Slave) but experts say that Master server must keep innodb_flush_log_at_trx_commit=1 to preserve integrity/consistency. So, I'm afraid Master-Slave replication is not a choice to sort it out. sad Am I right?

As I said before I'm not an expert in database replication but I wonder if it's possible to keep a Master-Slave replication model but delaying the synchronisation to take place only at night. The idea would consist on keeping an automated backup system (Master-Slave replication) but with a sufficient synchronisation margin/delay in time to be not too affected by a crash in master. Any expert could give me advice about how to deal with keeping a consistent backup other than the usual mysqldump in a cron?

TIA

Toni Soto

 

 

In reply to Toni Soto

Re: The worst scenario for a DB crash caused by innodb_flush_log_at_trx_commit=2

by Howard Miller -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers

With the huge caveat that I am not an expert either but I do have a fair bit of practical experience. Some thoughts...

- If InnoDB breaks it breaks properly and you are reloading a backup. If MyISAM breaks the REPAIR command seems to save you every time. InnoDB seems less problem prone but when it does break it's usually bad. 

- I suspect that replication would turn out to be another point of complexity. What do you hope to achieve that would not be done by a decent backup regime? Where will your replication server be located? If they are simply next to each other in the rack then they may well suffer from the same problems as each other. 

- If you really care about database integrity what are you doing using MySQL? Postgres is significantly less likely to chew your data in the first place and, in a loaded environment, likely to give superior performance.

Average of ratings: Useful (1)
In reply to Howard Miller

Re: The worst scenario for a DB crash caused by innodb_flush_log_at_trx_commit=2

by Toni Soto -

Hi Horward,

What do you hope to achieve that would not be done by a decent backup regime? Where will your replication server be located? If they are simply next to each other in the rack then they may well suffer from the same problems as each other.

I have another old server next to the current one at the School. I was thinking about how fast I'd be restoring the system in case of failure in MySQL server. As soon as It was detected I could login (ssh) and simply change config.php to point to the slave MySQL server. I'd return back to innodb_flush_log_at_trx_commit=1 and would go on running Moodle.

If I have to restore a dumped backup file it will take (theoretically) more time. But, of course you're right; I could also write an script to backup DB every n hours and even to restore in another MySQL server.

 If you really care about database integrity what are you doing using MySQL? Postgres is significantly less likely to chew your data in the first place and, in a loaded environment, likely to give superior performance.

I go on using MySQL because is the one I know. I've never tried PostgresSQL. I made some tuning to my.cnf and now I found the huge effect in my installation when changing innodb_flush_log_at_trx_commit=2. All my previous changes in my.cnf showed a small increase in performance but this one is amazing!

I'm going to post in my blog some metrics related to performance as soon as I finished a migration to a new server. I'm also testing with SSD disks for first time to store the database.

I'm going to give you a comparative data to show the performance gained with innodb_flush_log_at_trx_commit=2 in my installation:

My access time (as Moodle admin) to 'manage my private files' has improved from around 55 s to only 2.3 s (586/453 reads/writes involved). Yes, it's not a typo.. it was around 55s!

All test were done after a Apache/MySQL/APC restart and repeated many times to normalised the metrics. I haven't moved to the new server yet.

Toni

 

In reply to Toni Soto

Re: The worst scenario for a DB crash caused by innodb_flush_log_at_trx_commit=2

by jason everling -

The worst scenario, 1-2 seconds of data loss.

You can use innodb_flush_log_at_trx_commit=2 on a master/slave setup without issue, you would only lose an avg 1 second of data on the slave if the master crashes (if your school can deal with losing that 1 second of data use it, if not switch back to 1 for ACID compliance). Depending on why it fails, your slave should be unaffected thus switching slave to master and reconfigure config.php. You would also have better consistent backups by using the slave for the full/weekly/hourly incremental backups and then you could use the master if you wanted to for nightly backups giving you redundant backups.

Thanks,

JASON

In reply to jason everling

Re: The worst scenario for a DB crash caused by innodb_flush_log_at_trx_commit=2

by Toni Soto -

Thanks a lot for commenting Jason.

My first concern was not about losing the last X minutes in activities because, as I already said, we can sort it out. I was worried about not being able to re-use tables or even the whole database due to some kind of index corruption. The few I've read about ACID databases scared my about using innodb_flush_log_at_trx_commit=2 without references/advice.

This 'small' change in MySQL configuration gives us the possibility of working again (we used to do it in Moodle 1.9) with adaptative quizzes. With Moodle 2 versions (innoDB) students had to wait too long for a response at every submission so we decided to avoid them when doing quizzes with class groups (usually 25 students) at School.

Another big problem was when teachers needed to edit/change marks manually. They spent long time when going from one quiz attempt to another.

Now I know that you're one of the MySQL gurus at Moodle Forums so I'm going to monitor your posts to learn from your experience as much as possible wink

Thank you gain and Happy New Year from Spain!

Toni

In reply to Toni Soto

Re: The worst scenario for a DB crash caused by innodb_flush_log_at_trx_commit=2

by Howard Miller -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers

"should"

That's all I want to say. Check your backups are good wink

In reply to Toni Soto

Re: The worst scenario for a DB crash caused by innodb_flush_log_at_trx_commit=2

by jason everling -

As Howard says, create a process to verify backups at the very least, bi-weekly. We have ours set to 2 and we are running a Galera cluster, we have not had any issues in the past year. We also used 2 even before we moved to a Galera Cluster and did not have any issues since I have been working here, that is since 2007.

I would say use it if you are satisfied with the performance. In the event that MySQL fails and you have a Slave running you would lose those few seconds of data while if you didn't have the slave you would lose as much data after your most recent backup.

 

JASON