Databases: Handling special Windows characters in Oracle

Databases: Handling special Windows characters in Oracle

by Jonathan Newman -
Number of replies: 4
Hi,

At Catalyst we've supporting many enterprise/corporate clients that are running Moodle on Oracle.

One problem we've recently solved that may help other Moodlers who are running Moodle installs on Oracle: special Windows characters (magic quotes, bullet points, etc) which display improperly (ie, display as ??? or similiar) when users enter them into forums, user profile descriptions, or anywhere text and html is accepted from users in Moodle.

The root of the problem is that these characters need to be converted to Oracle's database encoding before they are entered into the database.

The solution is to tell apache which encoding the data is expected to be sent from. The steps involved:

1) Set the encoding in the environment variable as NLS_LANG as WE8MSWIN1252. Assuming your language and territory are "american" and "america", respectively, the command on a *nix system should be:

export NLS_LANG=american_america.WE8MSWIN1252

2) Stop and start your webserver, assuming your using apache:

apache2ctl stop
apache2ctl start

(note: a restart of apache is not sufficient).

Making this change, and your Oracle database should properly store the Windows characters--converted to it own encoding (eg, AL32UTF8).

References:

"Oracle nls_lang tips" on dba-oracle.com
http://www.dba-oracle.com/t_nls_lang.htm

Oracle Database Globalization Support Guide 10g Release 2
http://download-west.oracle.com/docs/cd/B19306_01/server.102/b14225.pdf


Cheers,
Jonathan
Average of ratings: -
In reply to Jonathan Newman

Re: Databases: Handling special Windows characters in Oracle

by Tim Hunt -
Picture of Core developers Picture of Documentation writers Picture of Particularly helpful Moodlers Picture of Peer reviewers Picture of Plugin developers
This sounds wrong to me. All data in Moodle is meant to be UTF-8 encoded Unicode since Moodle 1.6. What you suggest will appear to work in the short-term, but are you sure you are not storing up trouble for the future?
In reply to Tim Hunt

Re: Databases: Handling special Windows characters in Oracle

by Jonathan Newman -
Hi Tim,

Thank you for your comment and question. I've done a bit more testing and confirmed that your hunch was right.

Below are the results of testing on an Oracle 10gR2 database with NLS_CHARACTERSET=UTF8 (and using Moodle 1.8, apache2, php5.1.2).

NLS_LANG character encoding
entered ->
-> stored ->
-> displayed
not defined
[ ’ “ ” •] [��� ��� ��� ���] [ ??? ??? ??? ???]
WE8MSWIN1252 [ ’ “ ” •] [’ “ †•] [’ “ ” •]
UTF8 [ ’ “ ” •] [’ “ ” •] [’ “ ” •]

So to answer your question based on the above: Using NLS_LANG=WE8MSWIN1252 is storing up for trouble in the future. For example: later changing NLS_LANG to UTF8, the example text converted when stored as WE8MSWIN1252 will display as [ ’ “ †•]!

So the best practice is then setting NLS_LANG to the same character encoding as the database (UTF8, AL32UTF8, etc), as long as the client operating system supports it.

This is basically what the Oracle's Database Globalization Support Guide says, and I was thrown off by the case when it NLS_LANG is not defined. I would have thought Oracle would have defaulted to the same encoding as the database, and it appears as if it doesn't.

Cheers,
Jonathan
In reply to Jonathan Newman

Re: Databases: Handling special Windows characters in Oracle

by Martín Langhoff -
Watch out with NLS_CHARACTERSET set to UTF8 on the DB side -- I just realised that in lib/setuplib.php:setup_is_unicodedb() Moodle will only accept AL32UTF8 as a valid unicode mode.

So with NLS_CHARACTERSET='UTF8' it will refuse to set a new moodle up, and refuse to upgrade too. And it will probably corrupt non-ASCII data trying to perform wonky charset conversions when writing/reading from it.

Perhaps this check is too tight -- What are valid UTF8 values for NLS_CHARACTERSET on Oracle?
In reply to Martín Langhoff

Re: Databases: Handling special Windows characters in Oracle

by Scott Karren -

Martin

I don't know if this is exactly what you are looking for but Oracle has some information on recommended character sets.  Click the link below to view it.

http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/applocaledata.htm#g694776

Now here is a question for you or anyone else with Oracle experience.  I have a database with a default character set of UTF-16 (it is a very long story as to why it is this way and I have no way to change it).  I believe that by setting an NLS_LANG environment variable on the web server to NLS_LANG=American_America.AL32UTF8 that Oracle will perform the conversions necessary to get the data into the database.  Am I correct in this belief, and what are the implications in doing this?

Scott