mysqldump breaks character encoding?

mysqldump breaks character encoding?

ved Visvanath Ratnaweera -
Antal besvarelser: 11
Billede af Particularly helpful Moodlers Billede af Translators
I need to understand how mysqldump (the cli tool) handles the character encoding.

I had a case where taking a db dump and playing it back broke the environment check
(Site administration -> Server -> Environment). As a result the next upgrade also stops
at the initial unicode check. The db dump was taken with the mysqldump cli tool, played
back with mysql cli tool.

I localized the problem in a test setup.  These steps are reproducible - all on Debian
Squeeze, 64 bit:

1. Create a fresh 1.9.16 instance on http://my.site/test/. Worked error free, including
the environment check after install.

2. Check the character set:
mysql> use db_test;
mysql> show local variables like 'character_set_database';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| character_set_database | utf8  |
+------------------------+-------+

3. Take a sql dump,
$ mysqldump -u db_user -pSECRET1 db_test > db_test.sql

4.  drop and recreate the database, play back the sql dump.
$ mysqladmin -pSECRET1 drop -f db_test
$ mysqladmin -pSECRET1 create db_test --default-character-set=utf8
$ mysql --user=db_user --password=SECRET2 db_test < db_test.sql

Now the enviroment check fails!

5. Check the character set:
mysql> use db_test;
mysql> show local variables like 'character_set_database';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| character_set_database | latin1  |
+------------------------+-------+

Character set has changed from utf8 to latin1 !

What am I doing wrong?

P.S. Sorry for the broken formatting. For whatever the reason: '"Plain text format" is not
monospace'  http://moodle.org/mod/forum/discuss.php?d=200351.
Gennemsnitsbedømmelse: -
I svar til Visvanath Ratnaweera

Re: mysqldump breaks character encoding?

ved Iñaki Arenaza -
Billede af Core developers Billede af Documentation writers Billede af Peer reviewers Billede af Plugin developers
Hi Visvanath,

when using mysqldump, I always use the '--opt' option (and '--routines' and '--triggers' if I suspect the database has stored procedures and/or triggers, which I think Moodle doesn't use so far with MySQL). Maybe --opt is the default now, And I always use the mysql root user to dump the database. I also use the mysql root user when restoring it.

That helps restoring the database exactly as it was when the backup was taken (including character set options, even at the column level).

(please, keep me updated on this via your authentication forum post, as I'm not suscribed to this forum)

Saludos.
Iñaki.
I svar til Iñaki Arenaza

Re: mysqldump breaks character encoding?

ved Visvanath Ratnaweera -
Billede af Particularly helpful Moodlers Billede af Translators
@all

The problem still remains. It must be my understanding of MySQL character encoding and the way the tools mysqldump and mysql corporate.

The problem was, if I do
===
$ mysqldump -u db_user -pSECRET1 db_test > db_test.sql
$ mysqladmin -pSECRET1 drop -f db_test
$ mysqladmin -pSECRET1 create db_test --default-character-set=utf8
$ mysql --user=db_user --password=SECRET2 db_test < db_test.sql
===
it break Site Adminstration -> Server -> Environment -> unicode.

If I don't drop and create the database, i.e. not execute the second and third commands above, the site passes the unicode test.

Note: The db dump contains
===
DROP TABLE IF EXISTS 'tablename';
CREATE TABLE `tablename` (
....
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='some comment';
/*!40101 SET character_set_client = @saved_cs_client */;
===
kind of statements for each table.

Apparently I need a different "CREATE db" command from the one above. Any hints?
I svar til Visvanath Ratnaweera

Re: mysqldump breaks character encoding?

ved Iñaki Arenaza -
Billede af Core developers Billede af Documentation writers Billede af Peer reviewers Billede af Plugin developers
Hi Visvanath,

this is what I just did with my local test install using 1.9.17+ (Build: 20120419)

mysqldump --opt -u root -pSECRET moodle_19 > moodle_19.dump
mysql -u root -pSECRET -e "drop database moodle_19"
mysql -u root -pSECRET -e "create database moodle_19 default character set utf8 collate utf8_unicode_ci"
mysql -u root -pSECRET moodle_19 < moodle_19.dump

and I don't get any problems in "Site Adminstration -> Server -> Environment -> unicode".

Saludos.
Iñaki.
I svar til Iñaki Arenaza

Re: mysqldump breaks character encoding?

ved Visvanath Ratnaweera -
Billede af Particularly helpful Moodlers Billede af Translators
Hi Iñaki

I read somewhere that --opt is the default and wasn't convinced whether this would make a difference. But you are right.

After changine the two lines:
$ mysqladmin -u root -pSECRET drop -f db_test
$ mysqladmin -u root -pSECRET create db_test --default-character-set=utf8
to
$ mysql -u root -pSECRET -e "drop database db_test"
$ mysql -u root -pSECRET -e "create database db_test default character set utf8 collate utf8_unicode_ci"
"Site Adminstration -> Server -> Environment -> unicode" is OK.

Apart from a different tool (mysql instead of mysqladmin) now there is the additional option "default collate utf8_unicode_ci". I think collate was the crucial thing.

Thank you very much for going out of way to help me solving this problem!
I svar til Visvanath Ratnaweera

Re: mysqldump breaks character encoding?

ved G. M. -

Hi Visvanath,

The problem is caused because the default-character-set option is ignored by mysqladmin (whether that is by design or it is a bug, I don't know).

The mysqladmin program reads and uses the value from character-set-server parameter defined in the my.ini file.

I svar til Visvanath Ratnaweera

Re: mysqldump breaks character encoding?

ved Iñaki Arenaza -
Billede af Core developers Billede af Documentation writers Billede af Peer reviewers Billede af Plugin developers
Hi Visvanath,

when using mysqldump, I always use the '--opt' option (and '--routines' and '--triggers' if I suspect the database has stored procedures and/or triggers, which I think Moodle doesn't use so far with MySQL). Maybe --opt is the default now, And I always use the mysql root user to dump the database. I also use the mysql root user when restoring it.

That helps restoring the database exactly as it was when the backup was taken (including character set options, even at the column level).

(please, keep me updated on this via your authentication forum post, as I'm not suscribed to this forum)

Saludos.
Iñaki.
I svar til Iñaki Arenaza

Re: mysqldump breaks character encoding?

ved Visvanath Ratnaweera -
Billede af Particularly helpful Moodlers Billede af Translators
Hi Iñaki

Thanks for the hints. They lead to the tons of information here http://docs.moodle.org/en/Converting_your_MySQL_database_to_UTF8. Sure, I'll keep the other thread "auth/ldap/auth_ldap_sync_users.php behaves differently in 1.9.17+ than in 1.9.16+" http://moodle.org/mod/forum/discuss.php?d=200291 updated.

@all
If you have further hints to narrow down my search, I appreciate.
I svar til Visvanath Ratnaweera

Re: mysqldump breaks character encoding?

ved Howard Miller -
Billede af Core developers Billede af Documentation writers Billede af Particularly helpful Moodlers Billede af Peer reviewers Billede af Plugin developers
I'm not sure if this helps (have not read this thread in detail) but we have had lots of fun with the collation settings in MySQL.

mysqldump does not comprehensively dump the collation settings for the fields/tables (regardless of --opt as far as I can tell). To make matters worse the way in which the default collation for fields is defined when the table is created is really weird. I assumed that it would come from the parent table/database but this turns out not to be the case. It is derived from (I *think* anyway) whataver locale the MySQL server happened to be built with!

So, it makes it remarkably easy to migrate a database from one machine to another and screw up all the collations.

Our solution was to run a script over the dump file so that the collations are specified explicitly in all cases.

Increasingly annoyances like this are pushing me more and more into the PostgreSQL camp which is altogether a more civilised database.
I svar til Howard Miller

Re: mysqldump breaks character encoding?

ved Visvanath Ratnaweera -
Billede af Particularly helpful Moodlers Billede af Translators
Hi Howard

I got is solved in the other subthread: http://moodle.org/mod/forum/discuss.php?d=200610#p878358.

Side note: Pity one can not tag discussions as [SOLVED] in moodle.org.

Well, the "solution" didn't explain what was wrong with the old method. Still it is a consolation to know that I'm not alone in this.
;-(
I svar til Visvanath Ratnaweera

Re: marking discussions as solved

ved Helen Foster -
Billede af Core developers Billede af Documentation writers Billede af Moodle HQ Billede af Particularly helpful Moodlers Billede af Plugin developers Billede af Testers Billede af Translators

Although it's not possible to tag discussions, [Solved] can be added to the subject line of a post, as suggested in the Moodle.org forums Code of Conduct.