Strange characters in forum posts...

by Seth Dickens -
Number of replies: 13
...and by this I don't mean my students :-)

When my students or I post to our forums (Moodle v1.7), inexplicable characters and letters are often added to our writing.

Here are a few examples:

How's going with your new house? :-) I have written some ideas about the article, now I write them to you. I would start with

 

Â

 

The purpose of our article is to show that student’s reaction to the decision of the town council is strictly unwilling. Moreover, We’ll suggest you how to resolve the parking-problem without destroying the one green area of our city.


Does anyone have any ideas what this might be, or how I can fix it? It looks really strange!

Thanks,

Seth.
In reply to Seth Dickens

Re: Strange characters in forum posts...

by Stephen Waldie -
Did you ever find a solution to this?

I'm having the same problem with an installation I'm involved with.

I think they're running 1.8.1, IIS and SQL Server 2003.

Any time you type whitespace in the editor, you get Â

Thanks

Steve
In reply to Seth Dickens

Re: Strange characters in forum posts...

by Carlos Miguel Imbach -
I have the same problem. Apparently it's the "tildes" in Spanish; it's something to do with special characters and forum notifications sent by email.

This might have something to do with it.
http://tracker.moodle.org/browse/MDL-6905
Unfortunately, there is no fix for the time being :-(

Migue


In reply to Carlos Miguel Imbach

Re: Strange characters in forum posts...tildes missing

by Timothy Takemoto -

Dear Migue, Stephen and others (and hopefully Eloy),

Do you still have this problem?

Moodle uses the TYPO3 conversion libraries when it is set to convert text to a non-UTF-8 encoding, for example in forum notification emails and Excel exports.

There seems to be a bug, or bugs, either in the TYPO3 libraries or in the PHP function iconv that the TYPO3 library uses, so that some characters are not converted correctly.

For example, UTF-8 tildes (~) are not converted to Shift_JIS tildes; they just disappear. Since there is a tilde in my site URL, this is a big problem: links in the notification emails do not work.

I have been working on this problem (in my novice way) and may have found a solution, but I am not sure it works in all situations even for me, let alone for others using other languages.

It involves forcing the TYPO3 conversion library class.t3lib_cs.php to use mb_convert_encoding instead of iconv.
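A minimal sketch of that idea, assuming PHP with the mbstring extension available. convert_charset() is a hypothetical helper, not the actual code inside class.t3lib_cs.php: it prefers mb_convert_encoding() and only falls back to iconv() with //TRANSLIT when mbstring is missing.

```php
<?php
// Hypothetical helper (not part of Moodle or TYPO3) illustrating the proposed
// change: use mbstring for charset conversion, keep iconv only as a fallback.
function convert_charset(string $text, string $from, string $to): string
{
    if (function_exists('mb_convert_encoding')) {
        // mbstring argument order: (string, to_encoding, from_encoding).
        $result = mb_convert_encoding($text, $to, $from);
    } else {
        // iconv argument order: (from_encoding, to_encoding, string).
        // "//TRANSLIT" asks iconv to approximate characters missing from $to.
        $result = iconv($from, $to . '//TRANSLIT', $text);
    }
    if ($result === false) {
        throw new RuntimeException("Could not convert text from $from to $to");
    }
    return $result;
}
```

Note that the two functions take their encoding arguments in opposite orders, an easy thing to get wrong when swapping one for the other. Whether a tilde survives conversion to Shift_JIS still depends on the mapping table each library uses, so both paths are worth testing on your own server.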

Moodle tracker:
http://tracker.moodle.org/browse/MDL-6905
TYPO3 tracker:
http://bugs.typo3.org/view.php?id=8417

I am wary of performing this sort of hack on a production server until others have tested it or PHP programmers have commented.

The last change to Moodle's copy of the TYPO3 file class.t3lib_cs.php, made by Eloy Lafuente, relates precisely to the bug above. Eloy writes: "Changing //TRANSLIT to //IGNORE because some weird bug in the OS iconv libraries..." With "//TRANSLIT", everything after the tildes disappeared. But, alas, changing "//TRANSLIT" to "//IGNORE" means that only the tildes disappear. Eloy, Eloy, lama sabachthani?

Tim

In reply to Timothy Takemoto

Re: Strange characters in forum posts...tildes missing

by Timothy Takemoto -

This really does seem to be an iconv problem, perhaps related to the multiple ways in which tildes are represented; cf.
http://www.miraclelinux.com/english/technet/samba30/iconv_issues.html
I have posted a PHP bug report (though I am not sure it is a PHP bug).


So what I could do is replace all tildes with a placeholder such as "1bytetilde", convert the string, and then replace the placeholder with the Shift_JIS tilde.

The following program seems to work: it replaces all tildes with a dummy string, performs the conversion, and then replaces the dummy string with a Shift_JIS tilde. (Please upload the attached file to a server to test it.)

Is this the way to go?

<?php
// Workaround for tildes lost in UTF-8 -> Shift_JIS conversion:
// protect each "~" with an ASCII placeholder, convert, then restore it.

// Starting string in UTF-8; any string containing the problematic characters will do.
$string = 'where are the (~) (~) tildes?';
echo 'This is what we start with = ' . $string . '<br />';

// Direct conversion, just to show that this is not working (the tildes are lost).
$conv_str = iconv('UTF-8', 'SHIFT-JIS//TRANSLIT', $string);
echo 'This is not working = ' . $conv_str . '<br />';

// Replace each tilde with a placeholder that must not occur elsewhere in the text.
$placeholder = '1bytetilde';
$rstring = str_replace('~', $placeholder, $string);
echo 'This is the modified string = ' . $rstring . '<br />';

// Convert; the ASCII placeholder passes through unchanged.
$conv_str2 = iconv('UTF-8', 'SHIFT-JIS//TRANSLIT', $rstring);

// Restore a one-byte tilde, chr(126) (0x7E), in the Shift_JIS output.
$rerstring = str_replace($placeholder, chr(126), $conv_str2);
echo 'This is the correct result = ' . $rerstring . '<br />';
?>

In reply to Timothy Takemoto

Re: Strange characters in forum posts...tildes missing

by Timothy Takemoto -
Someone at PHP says that the tilde does not exist in Shift_JIS!
http://bugs.php.net/bug.php?id=45017&thanks=2
and cites this character set as proof:
http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943_P130-1999&ShowSet&s=ALL#ShowSet

But then how do Japanese computers display tildes? The encoding of the following page is set to Shift_JIS, and the browser can be set to Shift_JIS too, yet the tilde is visible:
http://md2.cc.yamaguchi-u.ac.jp/~eigo/temp/tilde.php

Assuming there really is no tilde in Shift_JIS, I can see why //IGNORE does not produce a tilde, but //TRANSLIT should output something (an overbar or a tilde) rather than stopping at the tilde.
In reply to Timothy Takemoto

Re: Strange characters in forum posts...tildes missing

by Timothy Takemoto -

In fact, I see that Eloy has already fought a gallant fight and reported a similar bug (with Japanese; thanks, Eloy):
http://bugs.php.net/bug.php?id=38425&edit=1
but the PHP people marked it as "bogus" because it is a libiconv (the character-conversion library) problem.

So I have written to the libiconv people asking them to provide transliterations (//TRANSLIT mappings) for the problematic characters. Perhaps it is my fault, but the bug seems to be getting passed along the chain: Moodle > TYPO3 > PHP > libiconv.

And even if the libiconv people were to say "yeah, sure":
1) Many other websites might complain, since they are used to the present libiconv.
2) This is especially true since chr(126) is meant to map to the overbar (I am not sure why it does not).
3) My server administrator would take a long time to upgrade libiconv anyway.

So, if I am to be able to link to my Moodle (the URL contains a tilde, ~)
http://md2.cc.yamaguchi-u.ac.jp/~eigo/temp/tilde.php/
in Shift_JIS, I guess I need to hack the TYPO3 library in Moodle.

Tim

In reply to Timothy Takemoto

Re: Strange characters in forum posts...tildes missing

by Timothy Takemoto -

A kind libiconv person called Bruno responded with the solution.

The encoding that has the tilde is, strictly speaking, not Shift_JIS but "CP932 (also called WINDOWS-932). Windows uses CP932, not Shift_JIS. Shift_JIS is what has been standardized by Japanese standards organizations; but it is not what is used today normally."

Looking around the net, many people seem to be under the misapprehension that Shift_JIS and CP932 are the same thing. They would be, but for the all-important tilde.

CP932 is also supported by libiconv.

Would it therefore be possible for the "Shift JIS" setting in the site variables to map to CP932, which I believe is the de facto standard, or for there to be another "CP932" option? I will post a bug report.

I should be able to do this myself. Yes...

In lib/moodlelib.php, around line 5500, the changed line is the 'SHIFT-JIS' entry:

function get_list_of_charsets() {
    $charsets = array(
        'EUC-JP'     => 'EUC-JP',
        'ISO-2022-JP'=> 'ISO-2022-JP',
        'ISO-8859-1' => 'ISO-8859-1',
        'SHIFT-JIS'  => 'CP932',   // changed: the Shift JIS option now maps to CP932
        'GB2312'     => 'GB2312',
        'GB18030'    => 'GB18030',
        'UTF-8'      => 'UTF-8');
    asort($charsets);
    return $charsets;
}
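A quick way to check Bruno's point on a given server is a round-trip test, sketched here with PHP's iconv extension. survives_round_trip() is a hypothetical helper: it converts UTF-8 text to the target charset with no //TRANSLIT or //IGNORE suffix, converts it back, and reports whether the text came through unchanged. It assumes the server's iconv library accepts the name "CP932", which, per the reply above, libiconv does.

```php
<?php
// Hypothetical helper: does $utf8 survive UTF-8 -> $charset -> UTF-8 unchanged?
// No //TRANSLIT or //IGNORE suffix, so any unmappable character makes iconv fail.
function survives_round_trip(string $utf8, string $charset): bool
{
    // "@" silences the notice iconv raises on an unconvertible character
    // or an unknown charset name; iconv then returns false.
    $encoded = @iconv('UTF-8', $charset, $utf8);
    if ($encoded === false) {
        return false;
    }
    $decoded = @iconv($charset, 'UTF-8', $encoded);
    return $decoded === $utf8;
}

$url = 'http://md2.cc.yamaguchi-u.ac.jp/~eigo/temp/tilde.php';
var_dump(survives_round_trip($url, 'CP932'));
```

Running the same check with 'SHIFT_JIS' in place of 'CP932' on an affected server would show whether the tilde is lost there; the result depends on the iconv library installed.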

On the university computers, my students use in-house anti-virus email software called "MaiYu", which is not UTF-8 compatible. It claims to use Shift JIS, but my feeling is that it almost definitely uses CP932, because tildes are important.

I think that at last my students will be able to click email links to my Moodle.

Tim

In reply to Seth Dickens

Re: Strange characters in forum posts...

by Seth Dickens -

Hi Stephen, Carlos and everyone,

First off, Stephen, I'm sorry I didn't reply to your post; I was away on holiday at the time. I didn't manage to solve the problem, though. Did you?

I'm wondering whether anyone now has an idea what might be causing this fault. After the long summer break, our Moodle activity is hotting up again and we're seeing the same fault once more.

To recap the problem:

  • Strange characters appear when I edit web pages in my Moodle v1.7 and also when I write forum posts.
  • The odd thing is that the characters are not there while I'm writing, neither in the HTML view of the text box, as in this example:

<p><strong>Task 1.1: </strong><strong>Who are you? <br /></strong><strong>Aim: </strong>To introduce yourself to your virtual colleagues and find out a little more about each other by completing your Profile. </p>

  • Nor in the "Rich text" view while writing the page, as below:

Task 1.1: Who are you?
Aim:
To introduce yourself to your virtual colleagues and find out a little more about each other by completing your Profile.

  • But when I "save" the changes, this is the odd response I get:

Task 1.1: Who are you?
Aim: To introduce yourself to your virtual colleagues and find out a little more about each other by completing your Profile.

Does anyone have any idea what might be causing this? As a language school, it reflects badly on us if it looks like there are mistakes in our instructions. These anomalies also appear in student posts.

Your help is very much appreciated folks!

Seth

P.S. If this post is in the wrong forum, please do tell me and I'll post it elsewhere. Thanks!

In reply to Seth Dickens

Re: Strange characters in forum posts...

by Manish Verma -
This looks like a UTF-8-related issue.

I have run into those strange characters and have made quite a few posts about them. One that may be useful to you is here. You may find others by checking my profile for the posts I have made about strange characters or UTF-8.
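The artefacts in this thread fit a well-known pattern, sketched below under the assumption that somewhere in the chain UTF-8 bytes are being read as Windows-1252 (Western European) and then saved again as UTF-8. The hypothetical helper misread_as_cp1252() reproduces both symptoms: a non-breaking space (UTF-8 bytes C2 A0) turns into "Â" followed by that space, which would match Stephen's report if the editor stores typed spaces as non-breaking spaces (an assumption), and a typographic apostrophe (E2 80 99) turns into "’", as in Seth's first post. It needs the mbstring extension.

```php
<?php
// Reproduces the "Â" and "’" artefacts, ASSUMING the cause is UTF-8 text
// being decoded as Windows-1252 somewhere along the way. Needs mbstring.

// Hypothetical helper: treat each UTF-8 byte as a Windows-1252 character
// and re-encode those characters as UTF-8 (the wrong-charset round trip).
function misread_as_cp1252(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252');
}

$nbsp = "\u{00A0}";  // non-breaking space, UTF-8 bytes C2 A0
$apos = "\u{2019}";  // right single quotation mark, UTF-8 bytes E2 80 99

echo misread_as_cp1252("Task{$nbsp}1.1"), "\n";   // prints "TaskÂ 1.1" (the space is U+00A0)
echo misread_as_cp1252("student{$apos}s"), "\n";  // prints "student’s"
```

Each extra character is one byte of the original UTF-8 sequence shown as a Western European letter, which is why the damage only appears around non-ASCII characters.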
In reply to Manish Verma

Re: Strange characters in forum posts...

by Seth Dickens -
Thank you, Manish,

Hmm... that seems really complicated. I wouldn't know where to start. I guess if I put my mind to it I could figure it out, but I suppose I'd like an easy solution.

I'm thinking of upgrading to Moodle v1.8.x in the next few days or weeks, though. Do you think upgrading might just make the problem magically go away :-O, or is it more serious than that? X-(

Thanks again for your help, Manish; much appreciated,

Seth.


In reply to Seth Dickens

Re: Strange characters in forum posts...

by Manish Verma -
In my experience, UTF-8 issues can be time-consuming to fix. If you are planning to upgrade to 1.8, note that 1.8 requires your data to be migrated to UTF-8 before the upgrade, so the UTF-8 migration should be done at the 1.7 stage or earlier. Given this future requirement, time spent migrating the database now is an investment in the future.

As for removing the strange characters themselves, there is a simple solution that may work for you: run the replace.php script, whose URL should be:

http://www.yourdomain.com/moodledirectory/admin/replace.php

I have used it several times, but you may want to try it first on a non-production clone of your installation.
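As far as I understand, admin/replace.php performs a plain search-and-replace of one string with another across the database tables. The sketch below shows the same idea on a single string, assuming the stored text contains the Windows-1252 misreadings seen in this thread. fix_mojibake() and its mapping table are hypothetical and should be checked against your own data, on a copy first.

```php
<?php
// Search-and-replace fix for double-encoded text, applied to one string.
// ASSUMPTION: the damage is UTF-8 misread as Windows-1252, as in this thread.
// The table below is illustrative only; check it against your own data.
const MOJIBAKE_FIXES = [
    "\u{00C2}\u{00A0}"         => "\u{00A0}", // "Â" + non-breaking space -> non-breaking space
    "\u{00E2}\u{20AC}\u{2122}" => "\u{2019}", // "’" -> right single quotation mark
    "\u{00E2}\u{20AC}\u{0153}" => "\u{201C}", // "“" -> left double quotation mark
];

// Hypothetical helper, not part of Moodle.
function fix_mojibake(string $text): string
{
    // With an array argument, strtr() tries the longest keys first and never
    // re-scans text it has already replaced.
    return strtr($text, MOJIBAKE_FIXES);
}

echo fix_mojibake("student\u{00E2}\u{20AC}\u{2122}s reaction"), "\n"; // prints "student’s reaction"
```

Replacing a bare "Â" on its own would be risky, since it is a legitimate letter in some languages; matching the whole damaged sequence avoids that.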
In reply to Manish Verma

Re: Strange characters in forum posts...

by Seth Dickens -
Once again, Manish, many thanks for your help!

I had a look at my database for the Moodle I'm running, and either:

  • You've helped me find the solution,
  • or, I've got an even stranger problem!
It seems that my database is already running UTF-8 (at least as far as I understand; see the screengrab below).

Does Moodle v1.7 support UTF-8? If it doesn't, maybe that's my problem. If it does... well, I guess there's nothing for me to migrate it to. Hmm...

As for your idea of using replace.php, I don't (rather stupidly) have a non-production copy of my site. I do have a "muck around" Moodle, but that's on a different hosting server, running v1.8 (and it has no problems at all :-( ). I guess I could set up a copy of my production server.

Do you think running the replace.php script is a risky thing to try? At the end of the day, I have backups, so I could always roll back if something went wrong, don't you think?

Once again, many thanks, Manish; your help is much appreciated.

Seth


In reply to Seth Dickens

Re: Strange characters in forum posts...

by Seth Dickens -
P.S. Manish, I've just noticed the setting in the picture attached below.

Do you think I should change this? I live in Italy, so I don't think we have any "special" characters.

Thanks again,

Seth
Attachment 3rd_Screengrab.jpg