And what was the encoding of your original DB? If nothing was specified when you created it (before Moodle 1.6), it should be 'latin1' by default, and the PHP DB connection is also set to 'latin1' by default (if you didn't modify php.ini or my.cnf on your original server).
This means that everything on your old server was sent over a 'latin1' channel (between PHP and MySQL) and stored in 'latin1' tables in MySQL, although it seems that you used non-'latin1' characters on your old server.
All this implies that the data currently stored in your DB is really wrong (from a purist technical point of view), because it is neither proper 'latin1' data (you have stored some non-latin1 data) nor proper 'utf8' data (you used a 'latin1' communication channel to send the data).
Luckily, this is only from a "purist technical" point of view: MySQL allows you to store non-latin1 data in latin1 tables and, when the data is retrieved from the DB back to PHP, the inverse conversion is performed and everything seems to work pretty well.
But this imposes serious limitations. For example, you cannot change the communication channel encoding (as you are doing in your edited my.cnf example above), nor can you force any encoding in your web server (because it's possible that you have a mix of different encodings in your DB).
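To make that limitation concrete, here is a minimal sketch in Python. It is only an illustration: the sample text, the 'cp1251' encoding for the user's browser, and the use of Python's 'latin-1' codec as a stand-in for MySQL's 'latin1' are all assumptions, not something taken from your server.

```python
# What the user typed, and the legacy encoding their browser used (assumed: cp1251).
original = "Привет"
sent_bytes = original.encode("cp1251")

# latin1 channel + latin1 table: no conversion happens, the bytes are stored untouched.
stored_bytes = sent_bytes

# Reading back over the same latin1 channel returns the same bytes,
# so a browser still using cp1251 shows the right text.
round_trip = stored_bytes.decode("cp1251")

# Switching the channel to utf8 makes the server treat the stored bytes as
# latin1 and transcode them to utf8, which yields the wrong characters.
after_switch = stored_bytes.decode("latin-1").encode("utf-8").decode("utf-8")

print(round_trip)    # the original text
print(after_switch)  # garbled text
```

As long as nothing in the chain changes, the bytes survive the round trip. As soon as one side starts believing the 'latin1' label, the text is garbled.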
This was one of the main reasons to move to the new UTF-8-ized Moodle ASAP: DB internals were mixing contents in really different encodings, everything was based on the user's lang, and it was becoming more difficult to handle every day, with Moodle trying to communicate with other systems.
Obviously, as that nightmare of different encodings co-exists on a lot of servers, the conversion process isn't as simple as converting all the fields from 'latin1' to 'utf8', setting the communication channel encoding to 'utf8' and carrying on working, because your data isn't proper 'latin1' data, as I explained above.
This implies that, if your site was being used by users with different encodings, every piece of content (every field, every record!) has to be transformed to 'utf8' from its ORIGINAL encoding (user based) and, well, that cannot be done by hand.
So, who executes/handles this really expensive task? Can you imagine it? Yes, it's Moodle 1.6. Once installed, it will detect that your DB isn't running in the proper 'utf8' mode and will offer you the option to process all the info in order to convert all content to 'utf8', inspecting each field, analysing who sent that content to the DB and performing the required conversion.
(great job, Yu!)
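Just to give an idea of the kind of per-record conversion described above, here is a rough sketch in Python. It is not the code of the Moodle migration utility: the `LEGACY_ENCODING` table, the `to_utf8` helper and the sample records are all hypothetical and only illustrate the principle.

```python
# Hypothetical mapping from a user's language to the legacy encoding
# their content was most likely sent in (illustrative values only).
LEGACY_ENCODING = {
    "ru": "cp1251",
    "ja": "shift_jis",
    "el": "iso8859_7",
    "en": "latin-1",
}

def to_utf8(raw: bytes, user_lang: str) -> bytes:
    """Re-encode one stored field from its author's legacy encoding to UTF-8."""
    encoding = LEGACY_ENCODING.get(user_lang, "latin-1")
    return raw.decode(encoding).encode("utf-8")

# Each record: (bytes as stored in the old DB, language of the user who wrote it).
records = [
    ("Привет".encode("cp1251"), "ru"),
    ("こんにちは".encode("shift_jis"), "ja"),
    ("Hello".encode("latin-1"), "en"),
]

converted = [to_utf8(raw, lang) for raw, lang in records]
for value in converted:
    print(value.decode("utf-8"))
```

The important point is that the source encoding is chosen per record. Converting the whole table from one single encoding would damage every record written in a different one.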
At the end of the looong process (its length depends on the size of your DB), once everything has been converted, Moodle itself will switch the communication channel to 'utf8' and, from that moment on, everything (database, communication channel and http) will be running under 'utf8'.
Sounds simple, eh?
So, for sites having 'mixed' contents (contents stored in the DB under different encodings) like yours, the upgrade path should be something like:
- Backup everything before upgrade!
- Use MySQL 4.1.12 (or later), and avoid setting anything in the configuration files (mysql, php, apache) that forces any encoding at all. It must work exactly the same as the old DB.
- Do a standard Moodle upgrade to the newer version.
- With this, you'll have one non-utf8 Moodle site running and you should be able to see everything exactly as it was before.
- Backup everything again!
- Execute the utf8 migration utility. If something stops the migration process it can be safely continued by launching it again (it remembers where it ended).
- At the end of the 'utf8' migration, a message will appear telling you which languages you have to install (write them down).
- Login to your new site and install the required languages.
- Voilà, everything should be utf8 and be working like a charm!
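About the "it remembers where it ended" part of the migration step: the idea can be sketched in Python as below. This is only an illustration of a resumable conversion, not how the Moodle utility is implemented. The `migrate` function, the `state` dictionary and the simulated interruption are all hypothetical.

```python
def migrate(records, convert, state):
    """Convert records in place, saving the next index to process in `state`."""
    for i in range(state.get("next_index", 0), len(records)):
        records[i] = convert(records[i])
        state["next_index"] = i + 1  # checkpoint only after a successful conversion

def cp1251_to_utf8(raw):
    return raw.decode("cp1251").encode("utf-8")

records = ["один".encode("cp1251"), "два".encode("cp1251"), "три".encode("cp1251")]
state = {}

# The first run is interrupted on the second record (simulated with a failing converter).
calls = {"n": 0}
def flaky(raw):
    calls["n"] += 1
    if calls["n"] == 2:
        raise RuntimeError("simulated interruption")
    return cp1251_to_utf8(raw)

try:
    migrate(records, flaky, state)
except RuntimeError:
    pass

# The second run resumes at the saved index, so nothing is converted twice.
migrate(records, cp1251_to_utf8, state)
print([r.decode("utf-8") for r in records])
```

Saving the position matters because converting an already converted record a second time would garble it again.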
Sites that are 100% sure they are using ONLY ONE encoding can specify it in the process above, which will save a lot of CPU cycles in the looong migration process. But do it ONLY if you are sure of the encoding of all your DB contents. Otherwise, data loss could occur!
And that's all. I hope this has explained a bit more about the utf8 thing, its do's and don'ts...