Databases: Issue with Database collations

Databases: Issue with Database collations

by Tim Hunt -
Number of replies: 6
Piksa bilong Core developers Piksa bilong Documentation writers Piksa bilong Particularly helpful Moodlers Piksa bilong Peer reviewers Piksa bilong Plugin developers

There is a long-standing issue caused by using case-insenstitive database collations, which I have just written up here: http://docs.moodle.org/dev/Database_collation_issue

There is also a propose solution, which is pretty good, apart from one scary issue, which is also described there.

We now need feedback on this proposal. Does it make sense to you? Should we do this? Can you suggest ways to deal with any of the difficulties?

Average of ratings:Useful (1)
In reply to Tim Hunt

Re: Databases: Issue with Database collations

by Ray Morris -

The proposal may be the best solution to a thorny problem.  However, let me throw out a thought.  II'm in strong agreement with earlier comments like "the fact that MySQL thinks that 'x' == 'X' for the purposes of unique indices is a horrific bug in MySQL".   The root of the problem is that MySQL, not Moodle, is somewhat braindead, so in an ideal world it would be fixed in MySQL.

 

In working out a solution, it sounds like the the available collations in MySQL  may hve been an issue.  If MySQL doesn't have the collation you'd like, it's simple to add a new collation, either by defining out in index.xml or by defining only how it differs from the base collation.  I'm not sure if the abiliy to add an appropriate collation could lead to a better solution, but I thought I'd mention it as it might spark some ideas.

http://dev.mysql.com/doc/refman/5.6/en/adding-collation-unicode-uca.html

In reply to Ray Morris

Re: Databases: Issue with Database collations

by Tim Hunt -
Piksa bilong Core developers Piksa bilong Documentation writers Piksa bilong Particularly helpful Moodlers Piksa bilong Peer reviewers Piksa bilong Plugin developers

Thanks for your input.

Well, MySQL is the most common platform for Moodle, so we have to work-around its inadequacies, just like we have to make Moodle work in Internet Explorer.

Also, most people running Moodle on cheap web hosting probably do not have the luxury of creating a new collation for their database, but thank you for pointing that out, I did not realise you could do that.

In reply to Tim Hunt

Re: Databases: Issue with Database collations

by Dan Poltawski -

Hi Tim,

I've just been re-reminded of this spec. I think it covers the problem well (and even if we never make any progress on this, its good to have the problems outlined like this).

I'm definitely not an expert in these issues at all (especially not speaking a different language to english), but one thought that occcured to me. I don't know if a langpack would be enough to specify the collation (for example, some lesser translated languages might use a different language for the moodle interface to their actual language, due to lack of translations). I don't know if thats a common problem, or if you get that sort of diffeent in collation between parent languages and child langpacks (e.g. pt vs pt_br). Actually, i'm sure there are people using en as langpack, but need a different collation to sort the data they input into Moodle

With regard to Backwards compatibility of identifiers, which seems to be the main barrier for this - we really need to try and engage site admins/bespoke integrators to response to this. I'm sure we could try and get the Moodle partners attention at HQ for this, but apart from that, if people are not responding to this spec then it makes me think the only way to go forward is to do it and document the changes heavily.

Average of ratings:Useful (1)
In reply to Dan Poltawski

Re: Databases: Issue with Database collations

by Tim Hunt -
Piksa bilong Core developers Piksa bilong Documentation writers Piksa bilong Particularly helpful Moodlers Piksa bilong Peer reviewers Piksa bilong Plugin developers

Some of the other things that are now normally specified in the lang pack (e.g. how to combine firstname and lastname to give fullname) allow to to override them using $CFG. (But actually, that might just be a historic thing.)

In fact, we don't even need to provide a $CFG override. If you are stuck having to use the 'wrong' languages for your Moodle site; and you don't want to start work on a lang pack in your own language; but you do really want to fix the sort order in your UI; then you can just use the standard Language Customisation UI to change that langconfig string. Doing that is probably easier than going into your DB and changing the collation (on every table and column).

(Also, as it happens, I know you picked a bad example. My experience is that pt_br is one of the better named lang packs. Certainly they have translated some of my add-ons in the past.)

However, fundamentally, like you, I am a bad person to be considering this. Anything you can do to get input from Moodle site admins, or others, would be really great.

In reply to Dan Poltawski

Re: Databases: Issue with Database collations

by Russell Smith -

Dan, you are missing so much fun.  Even english has different sort orders based on country code.  And PHP sorts in C rather than english, so even with db fixes, you then need to ensure you don't sort arry's in code that prpduce output.

But back to the topic a little more.  I don't completely understand the identifier issues.  Moodle has required lower case usernames since 2.0 and we had a data cleanup before upgrading to 2.3 as webservices in 2.0 allowed uppercase to be accepted even though it's impossible to log in.  So what other identifiers are at risk?  As this appears to be the major blocker to a smooth implementation.

 

In reply to Russell Smith

Re: Databases: Issue with Database collations

by Tim Hunt -
Piksa bilong Core developers Piksa bilong Documentation writers Piksa bilong Particularly helpful Moodlers Piksa bilong Peer reviewers Piksa bilong Plugin developers

It is not just usernames. There are the idnumber columns for courses, users and activities, variable names in calculated questions, ... others that I can't remember now.

In Moodle we have a library lib/textlib.php, which includes local-aware sort routines. So, we are not stuck with PHP's brokenness. (This is a key point that probably not everyone knows. If you are sorting an array of user-visible strings, use collatorlib::asort(), not asort().