Some ideas about the glossary index for non-Latin languages

by Zhigang Sun

Hi all,

I'm a newbie to Moodle, charsets, MySQL and PHP, and especially to English. I hope what I say won't drive you mad. :)

The glossary index works well only for Latin languages. I've done some research on making it support other languages (Chinese at least, as it's my native language). The index keys would still be A-Z. Many languages have their own way of mapping native characters to Latin letters (Chinese does, at least. Can anyone tell me whether other non-Latin languages do the same?).

The following is the code in mod/glossary/sql.php that handles the index (I hope I found the right place).

case 'letter':
    if ($hook != 'ALL' and $hook != 'SPECIAL') {
        switch ($CFG->dbtype) {
            case 'postgres7':
                $where = 'AND substr(upper(concept),1,' . strlen($hook) . ') = \'' . strtoupper($hook) . '\'';
                break;
            case 'mysql':
                $where = 'AND left(ucase(concept),' . strlen($hook) . ") = '" . strtoupper($hook) . "'";
                break;
        }
    }
    if ($hook == 'SPECIAL') {
        // Create appropriate IN contents
        $alphabet = explode(",", get_string("alphabet"));
        $sqlalphabet = '';
        for ($i = 0; $i < count($alphabet); $i++) {
            if ($i != 0) {
                $sqlalphabet .= ',';
            }
            $sqlalphabet .= '\''.$alphabet[$i].'\'';
        }
        switch ($CFG->dbtype) {
            case 'postgres7':
                $where = 'AND substr(upper(concept),1,1) NOT IN (' . strtoupper($sqlalphabet) . ')';
                break;
            case 'mysql':
                $where = 'AND left(ucase(concept),1) NOT IN (' . strtoupper($sqlalphabet) . ')';
                break;
        }
    }
    break;

It's obvious that Moodle builds the index in SQL. Since the database doesn't support the mapping, we must do it ourselves. Four rules:

1. Don't affect any code outside the case-break scope;

2. Support all charsets in a standard, extensible way;

3. Don't rely on database-specific features (such as MySQL's charset support);

4. Perform well.

I designed two functions:

function LANG_glossary_index($hook);

function LANG_glossary_special_char();

Each builds a string that selects the relevant entries and returns it to $where. LANG can be replaced by any charset name, and eval() is used to call the right pair according to the server charset (NOT the interface charset).
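The charset dispatch could perhaps be done without eval(). Below is a minimal sketch, assuming the charset name is available as a string; glossary_index_for_charset() and the stub bodies are hypothetical names made up for illustration, not existing Moodle code.

```php
<?php
// Hypothetical stubs; real versions would return an SQL fragment for $where.
function utf8_glossary_index($hook) {
    return "/* utf8 index clause for $hook */";
}

function gb2312_glossary_index($hook) {
    return "/* gb2312 index clause for $hook */";
}

/**
 * Pick the index function that matches the server charset and call it.
 * Returns null when no function exists for that charset, so the caller
 * can fall back to the existing Latin-only code.
 */
function glossary_index_for_charset($charset, $hook) {
    // 'UTF-8' becomes 'utf8', 'GB2312' becomes 'gb2312'.
    $prefix = strtolower(preg_replace('/[^A-Za-z0-9]/', '', $charset));
    $function = $prefix . '_glossary_index';
    if (!function_exists($function)) {
        return null;
    }
    return call_user_func($function, $hook);
}
```

Building the function name and checking it with function_exists() keeps the design extensible (rule 2): a new charset only needs a new pair of functions.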

I paid most attention to the UTF-8 charset and made the following assumption.

Since UTF-8 has no rule that provides the mapping, we must create a mapping table with two columns: letter and character. The letter is one of A-Z, and the character is a single native character that maps to that letter. For example:

letter     character
A          啊
Z          中
Z          志
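To make the table concrete, here is a small sketch of the same lookup done in PHP instead of SQL. The array $l_map_c and the function glossary_letter_for() are hypothetical, and the treatment of plain Latin letters is my assumption.

```php
<?php
// Hypothetical in-memory copy of the letter/character table above.
$l_map_c = array(
    '啊' => 'A',
    '中' => 'Z',
    '志' => 'Z',
);

/**
 * Return the index letter for a concept, or null if it belongs under SPECIAL.
 */
function glossary_letter_for($concept, $map) {
    // With the /u modifier, "." matches one whole UTF-8 character, not one byte.
    if (!preg_match('/^./us', $concept, $m)) {
        return null; // empty string or invalid UTF-8
    }
    $first = $m[0];
    if (isset($map[$first])) {
        return $map[$first];
    }
    // Assumption: plain Latin letters index under themselves.
    if (preg_match('/^[A-Za-z]$/', $first)) {
        return strtoupper($first);
    }
    return null;
}
```

For example, glossary_letter_for('中文', $l_map_c) looks up 中 and gives Z, while a concept starting with a digit gives null.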

utf8_glossary_index() returns the following (l_map_c is the mapping table):

AND (
    left(concept, 1) IN (SELECT character FROM l_map_c WHERE letter = '$hook' AND char_length(character) = 1)
    OR
    left(concept, 2) IN (SELECT character FROM l_map_c WHERE letter = '$hook' AND char_length(character) = 2)
    OR
    left(concept, 3) IN (SELECT character FROM l_map_c WHERE letter = '$hook' AND char_length(character) = 3)
    OR
    left(concept, 4) IN (SELECT character FROM l_map_c WHERE letter = '$hook' AND char_length(character) = 4)
)

The four branches cover the 1 to 4 bytes that one UTF-8 character can occupy when the database counts bytes rather than characters.

Maybe it works. utf8_glossary_special_char() does the reverse: it selects the entries whose first character maps to no letter. The alphabet string can be dropped because it only supports Latin charsets.
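As a rough sketch only, the two UTF-8 functions might be written like this. It assumes the mapping table l_map_c(letter, character) exists, that it also lists the Latin letters themselves (otherwise Latin concepts would all land under SPECIAL), and that the database supports subqueries. The column name character may be a reserved word in some databases and may need quoting or renaming.

```php
<?php
/**
 * Build the WHERE fragment for one index letter.
 * Returns '' when $hook is not a single Latin letter.
 */
function utf8_glossary_index($hook) {
    // Accept only one letter so that nothing unescaped reaches the SQL.
    if (!preg_match('/^[A-Za-z]$/', $hook)) {
        return '';
    }
    $letter = strtoupper($hook);
    $branches = array();
    // One branch per possible UTF-8 character length (1 to 4 bytes).
    for ($n = 1; $n <= 4; $n++) {
        $branches[] = "left(concept, $n) IN (SELECT character FROM l_map_c"
                    . " WHERE letter = '$letter' AND char_length(character) = $n)";
    }
    return 'AND (' . implode(' OR ', $branches) . ')';
}

/**
 * Build the WHERE fragment for SPECIAL: the first character matches
 * no entry of the mapping table at any length.
 */
function utf8_glossary_special_char() {
    $branches = array();
    for ($n = 1; $n <= 4; $n++) {
        $branches[] = "left(concept, $n) NOT IN (SELECT character FROM l_map_c"
                    . " WHERE char_length(character) = $n)";
    }
    return 'AND ' . implode(' AND ', $branches);
}
```

Either result would be assigned to $where inside the existing case 'letter' block, which keeps the change within the case-break scope (rule 1).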

As I said, I'm a newbie to Moodle, charsets, MySQL, PHP and English. But I'm eager to do something to improve Moodle, so I am posting these thoughts. Everything above is only an idea in my mind, not code yet. I'M EAGER FOR YOUR ADVICE.

Thank you for reading!
