Hi all, I've got interest in getting support for the postgresl module unaccent added into the Moodle PostgreSQL database driver, in order to add support for accent insensitive search.
Intro / How did I get here?
Our local Moodle contains ancient modifications in our German language pack, which adds the umlauts "Ä", "Ö", and "Ü".
They were added, because the class initials_bar fetches them using get_string('alphabet', 'langconfig'), so filtering the list of participants in a course can consider those, too.
Just as sidenote: Adding buttons was so popular in our region, we made it to the Moodle issue tracker, too.
See https://tracker.moodle.org/browse/MDL-68274 for the (already closed) discussions leading down that rabbit hole.
But why would you want to add those in the first place? Shouldn't names starting with "Ä" be shown when filtering for "A"?
- My personal opinion: Yes, that makes sense from a user perspective.
- With MySQL/MariaDB: Yes, this seems to be default behaviour.
- With PostgreSQL: No, because default strict string comparison says "Ä" != "A".
Also, adding more letters to the "alphabet" has downsides:
- You don't need them if MySQL/MariaDB is present and behaves "correctly".
- What about other diacritics like "Å", or even letters like "Ç" as in "Çinkaya"? That list would grow to an unusable size fast.
- Modifying language packs is not fun, because we have to drag our modification through each update. We shouldn't need to modify our language packs for this.
Accent insensitive search with PostgreSQL
In lib/dml/pgsql_native_moodle_database.php, there is the function sql_like.
@param bool $accensensitive use accent sensitive search (not all databases support accent insensitive)Now let me introduce unaccent - "A contrib module for removing accents from words"
See https://pgpedia.info/u/unaccent.html for an intro and more links to even more docs.
The database administrator has to enable the (usually already present) unaccent module via SQL queries.
If you're curious, you can take a peek into its dictionary file unaccent.rules to inspect the mapping it brings.
Once set up, an accent insensitive search query could look something like this:
[...] WHERE unaccent(fieldname) LIKE unaccent($paramvalue) [...]
Once that feature is in place, components like the initials_bar could make use of it to provide the wanted behaviour for PostgreSQL, too.
Thanks for reading!
What do you think? Is this approach going in the right direction?