New Chinese translations of editor.php, admin.php, assignment.php, chat.php, choice.php, countries.php, dialogue.php and so on.
I notice the character set has changed in moodle.php, from:
$string['thischarset'] = 'GB18030';
to:
$string['thischarset'] = 'gb2312';
(He Enji made the same change, but didn't explain it to me.)
Is this intentional? Can you explain why you changed it? Are they fully compatible? Will it affect existing texts in Moodle?
In China, GB2312 is our official character encoding standard; GB18030 is only one company's product. See it from here:
GB18030 can replace both the GB2312 and the Big5 character encoding standards. GB18030 is much stronger than GB2312, but I need more time to learn it.
GB18030-2000 is a new Chinese character encoding standard. The standard contains many characters and has some tough new conformance requirements.
It is illegal to sell products in China that do not conform to the standard.
All character set standards that originate in the PRC have designations that begin with "GB". GB is an abbreviation for Guojia Biaozhun, meaning "national standard". The GB 2312-1980 character set standard was established in 1981 to represent simplified Chinese characters. GB 2312-1980 is a coded character set that contains 7,445 characters, including 6,763 Hanzi and 682 non-Hanzi characters. With the release of ISO 10646-1/Unicode 2.1 in 1993, the PRC expressed its fundamental consent to support the combined efforts of the ISO/IEC and the Unicode Consortium through publishing a Chinese National Standard that was code- and character-compatible with ISO 10646-1/Unicode 2.1. This standard was named GB 13000.1. Whenever the ISO and the Unicode Consortium changed or revised their common standard, GB 13000.1 subsequently adopted these changes.
To accommodate all additional Hanzi characters specified in GB 13000.1 that are not included in GB 2312-1980, a new specification known as GBK was then introduced. GBK is an abbreviation for "Guojia biaozhun kuozhan", which is the Chinese for "Rules/Specifications defining the extensions of internal codes for Chinese ideograms". GBK is an extension of GB 2312-1980 and the key significant property of GBK is that it leaves the characters and codes as defined in GB 2312-1980 untouched and positions all additional characters around it. The additional characters are mainly those of the Unified Han portion of Unicode 2.1 that go beyond the character repertoire of GB 2312-1980. Thus, code and character compatibility between GBK and GB 2312-1980 is ensured while, at the same time, the complete Unicode Unified Han character set is made available. At the time when GBK was defined, other characters were added that were not available in Unicode.
GBK defines 23,940 code points containing 21,886 characters. At the same time, GBK provides mappings to the code points of Unicode 2.1. However, due to the packed code space used to define GBK, it became obvious that there was no space left for a major addition. The 1,894 code points of GBK's three user-defined areas were not even close to providing sufficient space for the CJK Unified Ideographs Extension A, which defines 6,582 new characters in plane 0 of Unicode, version 3.0, the Basic Multilingual Plane (BMP).
Therefore, GB 18030-2000 was created as an update of GBK for Unicode 3.0 with an extension that covers all of Unicode. It is fully backward-compatible with GB 2312-1980 and GBK. The mapping table from GB 18030-2000 to Unicode is backward-compatible with the mapping table from GB 2312-1980 to Unicode; the GBK to Unicode table, however, has a few differences, because GBK contains characters which were not defined in Unicode 2.1 but were added in later versions of Unicode.
GB 18030-2000 specifies a mapping table that covers all Unicode code points and maintains compatibility of GB-encoded text with GBK and GB 2312-1980.
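The backward-compatibility claim can be checked with a quick sketch. This assumes Python's built-in "gb2312" and "gb18030" codecs are faithful stand-ins for the two charsets; the sample strings and function names are my own and not from the language pack.

```python
# Sketch only: relies on Python's standard "gb2312" and "gb18030" codecs.

def same_bytes_in_both(text: str) -> bool:
    """True if text encodes to identical bytes under GB2312 and GB18030."""
    return text.encode("gb2312") == text.encode("gb18030")

def fits_in_gb2312(text: str) -> bool:
    """True if every character of text can be encoded in GB2312."""
    try:
        text.encode("gb2312")
        return True
    except UnicodeEncodeError:
        return False

sample = "简体中文"    # characters that are all in GB 2312-1980
ext_a_char = "\u3400"  # first character of CJK Unified Ideographs Extension A

# Existing GB2312 text is byte-for-byte valid GB18030 and decodes unchanged.
print(same_bytes_in_both(sample))
print(sample.encode("gb2312").decode("gb18030") == sample)

# An Extension A character does not fit in GB2312 but does in GB18030.
print(fits_in_gb2312(ext_a_char))
print(len(ext_a_char.encode("gb18030")))
```

If this holds for the strings in a language pack, relabelling GB2312 files as GB18030 should not change how existing text is displayed.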
Properties of GB 18030-2000
GB 18030-2000 has the following significant properties:
It incorporates Unicode's CJK Unified Ideographs Extension A completely.
It provides code space for all used and unused code points of Unicode's plane 0 (BMP) and its 16 additional planes.
While being a code- and character-compatible "superset" of GBK, GB 18030-2000, at the same time, intends to provide space for all remaining code points of Unicode. Thus, it effectively creates a one-to-one relationship between parts of GB 18030-2000 and Unicode's complete encoding space.
In order to accomplish the Unihan incorporation and code space allocation for Unicode 3.0, GB 18030-2000 defines and applies a four-byte encoding mechanism.
GB 18030-2000 encodes characters in sequences of one, two, or four bytes. The following are valid byte sequences (byte values are hexadecimal):
Single-byte: 0x00-0x7f
Two-byte: 0x81-0xfe + 0x40-0x7e, 0x80-0xfe
Four-byte: 0x81-0xfe + 0x30-0x39 + 0x81-0xfe + 0x30-0x39
The single-byte portion applies the coding structure and principles of the standard GB 11383 (identical to ISO 4873:1986) by using the code points 0x00 through 0x7f.
The two-byte portion uses two eight-bit binary sequences to express a character. The code points of the first (leading) byte range from 0x81 through 0xfe. The code points of the second (trailing) byte range from 0x40 through 0x7e and from 0x80 through 0xfe.
The four-byte portion uses the code points 0x30 through 0x39, which are vacant in GB 11383, as an additional means to extend the two-byte encodings, thus effectively increasing the number of four-byte codes to now include code points ranging from 0x81308130 through 0xfe39fe39.
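To make the byte ranges above concrete, here is a minimal sketch of a function that decides whether a sequence is one, two, or four bytes long. The function names are hypothetical; it only checks the structural ranges quoted above, not whether a sequence is actually assigned to a character.

```python
def gb18030_sequence_length(data: bytes, pos: int = 0) -> int:
    """Return 1, 2, or 4 for the sequence starting at pos; raise ValueError if malformed."""
    b1 = data[pos]
    if b1 <= 0x7F:
        return 1  # single-byte portion
    if not 0x81 <= b1 <= 0xFE:
        raise ValueError(f"invalid lead byte 0x{b1:02x}")
    if pos + 1 >= len(data):
        raise ValueError("truncated sequence")
    b2 = data[pos + 1]
    if 0x40 <= b2 <= 0x7E or 0x80 <= b2 <= 0xFE:
        return 2  # two-byte portion
    if 0x30 <= b2 <= 0x39:
        # A digit in second position signals the four-byte portion.
        if pos + 3 >= len(data):
            raise ValueError("truncated four-byte sequence")
        b3, b4 = data[pos + 2], data[pos + 3]
        if 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39:
            return 4
    raise ValueError("invalid trailing bytes")

def sequence_lengths(data: bytes) -> list[int]:
    """Split data into GB18030 sequences and return the length of each."""
    lengths, pos = [], 0
    while pos < len(data):
        n = gb18030_sequence_length(data, pos)
        lengths.append(n)
        pos += n
    return lengths

# ASCII letter, a GB2312 Hanzi, and an Extension A character (U+3400).
encoded = "A中\u3400".encode("gb18030")
print(sequence_lengths(encoded))
```

The second byte alone is enough to tell two-byte from four-byte sequences, because 0x30-0x39 never appears as the trailing byte of a two-byte code.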
GB 18030-2000 has 1.6 million valid byte sequences, but there are only 1.1 million code points in Unicode, so there are about 500,000 byte sequences in GB 18030-2000 that are currently unassigned.
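Those figures can be reproduced from the byte ranges quoted above. The sketch below just multiplies out the ranges; the variable names are my own, and the Unicode total assumes 17 planes of 65,536 code points each.

```python
# Count the valid byte sequences from the ranges in the quoted text.
single_byte = 0x7F - 0x00 + 1                            # 0x00-0x7f
lead = 0xFE - 0x81 + 1                                   # 0x81-0xfe
trail_two = (0x7E - 0x40 + 1) + (0xFE - 0x80 + 1)        # 0x40-0x7e, 0x80-0xfe
digit = 0x39 - 0x30 + 1                                  # 0x30-0x39

two_byte = lead * trail_two
four_byte = lead * digit * lead * digit
total_sequences = single_byte + two_byte + four_byte

# Assumption: Unicode's code space is 17 planes of 0x10000 code points.
unicode_code_points = 17 * 0x10000
unassigned = total_sequences - unicode_code_points

print(f"two-byte:   {two_byte:,}")
print(f"four-byte:  {four_byte:,}")
print(f"total:      {total_sequences:,}")
print(f"Unicode:    {unicode_code_points:,}")
print(f"left over:  {unassigned:,}")
```

The two-byte count comes out to 23,940, the same number of code points the quoted text gives for GBK.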
What I understand from this is that GB18030 is backward-compatible with GB2312 text anyway, which is great. It also appears to be the official standard for new products.
So, it seems better to me to keep the encoding at GB18030 so that the additional characters are at least possible (even if we don't necessarily use those extra characters).
Is that all OK with you? I'm using all your new language files on this site now, with this change and it looks very good to me!