PostgreSQL db driver: Add support for unaccent() / accent insensitive search

PostgreSQL db driver: Add support for unaccent() / accent insensitive search

by Jan Philipp Timme -
Number of replies: 2

Hi all, I've got interest in getting support for the postgresl module unaccent added into the Moodle PostgreSQL database driver, in order to add support for accent insensitive search.


Intro / How did I get here?
Our local Moodle contains ancient modifications in our German language pack, which adds the umlauts "Ä", "Ö", and "Ü".
They were added, because the class initials_bar fetches them using get_string('alphabet', 'langconfig'), so filtering the list of participants in a course can consider those, too.

Just as sidenote: Adding buttons was so popular in our region, we made it to the Moodle issue tracker, too.
See https://tracker.moodle.org/browse/MDL-68274 for the (already closed) discussions leading down that rabbit hole.

But why would you want to add those in the first place? Shouldn't names starting with "Ä" be shown when filtering for "A"?
  • My personal opinion: Yes, that makes sense from a user perspective.
  • With MySQL/MariaDB: Yes, this seems to be default behaviour.
  • With PostgreSQL: No, because default strict string comparison says "Ä" != "A".

Also, adding more letters to the "alphabet" has downsides:

  1. You don't need them if MySQL/MariaDB is present and behaves "correctly".
  2. What about other diacritics like "Å", or even letters like "Ç" as in "Çinkaya"? That list would grow to an unusable size fast.
  3. Modifying language packs is not fun, because we have to drag our modification through each update. We shouldn't need to modify our language packs for this.
So we chose to investigate how to get PostgreSQL to support our wanted behaviour.



Accent insensitive search with PostgreSQL
In lib/dml/pgsql_native_moodle_database.php, there is the function sql_like.

According to the code and comments, there is already a placeholder for accent sensitive/insensitive search present, which is currently unused:

@param bool $accensensitive use accent sensitive search (not all databases support accent insensitive)
Now let me introduce unaccent - "A contrib module for removing accents from words"
See https://pgpedia.info/u/unaccent.html for an intro and more links to even more docs.

The database administrator has to enable the (usually already present) unaccent module via SQL queries.
If you're curious, you can take a peek into its dictionary file unaccent.rules to inspect the mapping it brings.

Once set up, an accent insensitive search query could look something like this:

[...] WHERE unaccent(fieldname) LIKE unaccent($paramvalue) [...]
Once that feature is in place, components like the initials_bar could make use of it to provide the wanted behaviour for PostgreSQL, too.

Thanks for reading!
What do you think? Is this approach going in the right direction?

Average of ratings: Useful (1)
In reply to Jan Philipp Timme

Re: PostgreSQL db driver: Add support for unaccent() / accent insensitive search

by Michael Milette -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Plugin developers Picture of Testers Picture of Translators
I work a lot with French which uses accents too. It would be great if this could be resolved.
In reply to Jan Philipp Timme

Re: PostgreSQL db driver: Add support for unaccent() / accent insensitive search

by Tim Hunt -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers
MySQL behaviour is certainly not correct, or even "correct" in many contexts. It thinks 'a' = 'A' which, in the presence of unique indexes, is just wrong.

Anyway, this is another aspect of a general issue here, which I summarised years ago at https://docs.moodle.org/dev/Database_collation_issue.

I think we should be reluctant to rely on a postgres contrib module - at least requiring it adds a clear cost: it makes Moodle harder to install.