Weird character encodings in profile pages

by Zbigniew Fiedorowicz -
Number of replies: 10

I've just noticed that the "Full Profile" page for Moodle users applies some weird decimal/hex encoding to the characters of the user's email address. At first I thought something was screwed up on my Moodle site, but the same thing happens on this Using Moodle site. For example, when I click on one of the new users to this site, Jose Ramon Rodelgo, and choose "Show Source" in my browser, I find that his email address jr2@telefonica.net is encoded in the following weird way:

<a href="&#109&#97&#105&#108&#116o:%6a%722@t%65%6c%65%66%6f%6eica%2e%6ee%74"
title=&#106&#114&#50@&#116&#101&#108&#101&#102&#111&#110&#105&#99&#97&#46ne&#116>
&#106&#114&#50@&#116&#101&#108e&#102o&#110&#105&#99&#97&#46&#110&#101&#116
</a>

Most, but not all, characters are either decimal-encoded as &#nnn or hex-encoded as %xx. For example, in the href part the "ica" in telefonica is not encoded. I don't see any reason to encode the email address at all: according to the Internet mail RFCs it is supposed to consist of standard 7-bit ASCII characters.
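To see the two layers at work, here is a minimal PHP sketch (assuming PHP 7 or later; the helper decode_numeric_refs() is hypothetical and not part of Moodle) that undoes them the way a browser would: first the &#nnn references, which the HTML parser resolves inside the attribute, then the %xx escapes, which are percent-decoded when the mailto: URL is used.

```php
<?php
// Hypothetical helper, not part of Moodle: resolve decimal character
// references the way a lenient HTML parser does. All consecutive digits
// after "&#" are consumed, and the terminating ";" is optional.
function decode_numeric_refs(string $html): string
{
    return preg_replace_callback(
        '/&#(\d+);?/',
        function (array $m): string {
            // Re-emit the reference with an explicit ";" and let PHP decode it.
            return html_entity_decode('&#' . $m[1] . ';', ENT_QUOTES | ENT_HTML5, 'UTF-8');
        },
        $html
    );
}

// The href value from the profile page source quoted above.
$href = '&#109&#97&#105&#108&#116o:%6a%722@t%65%6c%65%66%6f%6eica%2e%6ee%74';

// Layer 1: HTML character references; layer 2: URL percent-encoding.
$url = rawurldecode(decode_numeric_refs($href));

echo $url, "\n"; // mailto:jr2@telefonica.net
```

A harvester that performs the same two steps recovers the address, so the encoding only defeats scrapers that search the raw HTML for plain-text addresses.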

I noticed this because the mixture of encoded and unencoded characters sometimes makes the browser render the wrong character. The entities have no terminating semicolon, so the browser reads every digit that follows &# as part of the number. E.g. if the 5 in 54 is encoded as &#53 while the 4 is left as is, the result is &#534, which is an unrelated Unicode character (U+0216).
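The collision is easy to reproduce. This PHP sketch uses the same lenient decoding idea (decode_numeric_refs() is a hypothetical helper written only for this illustration) and shows that &#53 followed by a literal 4 is read as the single reference &#534, while a semicolon-terminated &#53; keeps the two characters apart.

```php
<?php
// Hypothetical helper: decode decimal references like a lenient HTML
// parser, consuming every digit after "&#" (the ";" is optional).
function decode_numeric_refs(string $html): string
{
    return preg_replace_callback(
        '/&#(\d+);?/',
        function (array $m): string {
            return html_entity_decode('&#' . $m[1] . ';', ENT_QUOTES | ENT_HTML5, 'UTF-8');
        },
        $html
    );
}

// "54" with only the 5 encoded and no terminating semicolon.
$broken = decode_numeric_refs('&#53' . '4');
// The same, but with the reference properly terminated.
$fixed = decode_numeric_refs('&#53;' . '4');

// The unterminated form decodes to the single character U+0216, not "54".
var_dump($broken === html_entity_decode('&#534;', ENT_QUOTES | ENT_HTML5, 'UTF-8')); // bool(true)
var_dump($fixed); // string(2) "54"
```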

By the way, there are also some weird characters showing up on the "Moodle sites" page on moodle.org. Perhaps that is another manifestation of this problem.

In reply to Zbigniew Fiedorowicz

Re: Weird character encodings in profile pages

by Zbigniew Fiedorowicz -

OK, I get it: it's an anti-spam measure. However, it sometimes misfires. Here's an example:

<a href="&#109&#97i&#108&#116&#111:%70%61t%65%6c.%36%38%34@%6f%73u.%65%64%75"
title="p&#97&#116e&#108&#46&#548&#52@os&#117&#46&#101du">
p&#97t&#101&#108.&#548&#52&#64o&#115&#117&#46&#101&#100&#117</a>

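Note the &#548: it should be &#54 (the digit 6) followed by a literal 8, but the browser reads it as a single reference to character 548, so the "684" in the address is mangled.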

 

In reply to Zbigniew Fiedorowicz

Re: Weird character encodings in profile pages

by Martin Dougiamas -
This is actually a feature to protect against spammers stealing email addresses. This is the first problem I've heard of with it (it's been running on this site for months)... What browser are you using?

We may have to make it an option.
In reply to Martin Dougiamas

Re: Weird character encodings in profile pages

by Zbigniew Fiedorowicz -

Martin, take a look at my profile page at this site:

http://moodle.org/user/view.php?id=2466&course=5

I believe this got broken in either version 1.0.9 or 1.1. I'm pretty sure I would have noticed it during my spring class, which ran under 1.0.8.1.

Zig

 

In reply to Zbigniew Fiedorowicz

Re: Weird character encodings in profile pages

by Martin Dougiamas -
You had put in a rather strange address (fiedorow+123456789012345678901234567890123456789012345678901234567890@math.ohio-state.edu), but I do see the behaviour: it seems to affect only addresses with digits in them. I wasn't seeing the corruption on Jose's http://moodle.org/user/view.php?id=6970&course=5 page until I reloaded a few times, since the characters to encode are picked at random each time the page is generated.

Definitely needs fixing to ignore numbers.
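Note that leaving digits unencoded would not be enough by itself: a literal digit can still follow an entity for a non-digit, as in the ".684" of the second example, where &#46 (the dot) followed by a literal 684 would read as the single reference &#46684. A sturdier variant, sketched here as an assumption rather than as Moodle's code, terminates every reference with ';' so that nothing after it can be absorbed. The names obfuscate_text_terminated() and decode_numeric_refs() are hypothetical.

```php
<?php
// Hypothetical variant (not Moodle's code): encode about two thirds of
// the characters, but close every reference with ';' so that whatever
// follows it can never be read as part of the number.
function obfuscate_text_terminated(string $plaintext): string
{
    $out = '';
    $length = strlen($plaintext);
    for ($i = 0; $i < $length; $i++) {
        $ch = $plaintext[$i];
        // rand(0, 2) is non-zero two times out of three.
        $out .= rand(0, 2) ? '&#' . ord($ch) . ';' : $ch;
    }
    return $out;
}

// Hypothetical helper: decode decimal references the way a lenient
// HTML parser does (all digits after "&#" are consumed, ';' optional).
function decode_numeric_refs(string $html): string
{
    return preg_replace_callback(
        '/&#(\d+);?/',
        function (array $m): string {
            return html_entity_decode('&#' . $m[1] . ';', ENT_QUOTES | ENT_HTML5, 'UTF-8');
        },
        $html
    );
}

// Leaving the digits unencoded is not enough: the encoded dot swallows them.
var_dump(decode_numeric_refs('&#46' . '684') === '.684');  // bool(false)

// A terminated reference keeps the digits separate.
var_dump(decode_numeric_refs('&#46;' . '684') === '.684'); // bool(true)

// Round trip on the address from the second example.
$address = 'patel.684@osu.edu';
var_dump(decode_numeric_refs(obfuscate_text_terminated($address)) === $address); // bool(true)
```

The semicolon is the standard way to end an HTML character reference, so this variant avoids the special case for digits altogether.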
In reply to Martin Dougiamas

Re: Weird character encodings in profile pages

by Zbigniew Fiedorowicz -

Re my strange address: in recent versions of sendmail one can append a suffix of the form "+some_string" to the user name; it is ignored when the mail is delivered, which can be useful for filtering mail.

 

In reply to Martin Dougiamas

Re: Weird character encodings in profile pages

by Zbigniew Fiedorowicz -

Here's my fix for this in weblib.php:

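function obfuscate_text($plaintext) {
/// This function takes some text and replaces about two thirds of the
/// characters (rand(0,2) is non-zero two times out of three) with HTML
/// entity equivalents. The return string is obviously longer.
/// The entities have no terminating ';', so a digit directly after one
/// would be read as part of it (&#53 followed by 4 gives &#534). To avoid
/// this, a digit that follows an encoded character is always encoded too.
    $i = 0;
    $length = strlen($plaintext);
    $obfuscated = "";
    $prev_obfuscated = false;      // was the previous character encoded?
    while ($i < $length) {
        $c = ord($plaintext[$i]);
        $numerical = ($c >= ord('0')) && ($c <= ord('9'));
        if ($prev_obfuscated && $numerical) {
            // Digit right after an entity: it must be encoded as well
            // ($prev_obfuscated simply stays true).
            $obfuscated .= '&#'.$c;
        } elseif (rand(0, 2)) {
            $obfuscated .= '&#'.$c;
            $prev_obfuscated = true;
        } else {
            $obfuscated .= $plaintext[$i];
            $prev_obfuscated = false;
        }
        $i++;
    }
    return $obfuscated;
}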
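Because the output is random, one way to gain confidence in the fix is a round-trip check: obfuscate an address containing digits many times and decode each result the way a lenient browser would. This is a minimal sketch assuming PHP 7 or later; decode_numeric_refs() is a hypothetical test helper, and obfuscate_text() is copied from the post above.

```php
<?php
// obfuscate_text() as posted above: a digit that follows an encoded
// character is always encoded too.
function obfuscate_text($plaintext) {
    $obfuscated = "";
    $prev_obfuscated = false;
    $length = strlen($plaintext);
    for ($i = 0; $i < $length; $i++) {
        $c = ord($plaintext[$i]);
        $numerical = ($c >= ord('0')) && ($c <= ord('9'));
        if ($prev_obfuscated && $numerical) {
            $obfuscated .= '&#'.$c;
        } elseif (rand(0, 2)) {
            $obfuscated .= '&#'.$c;
            $prev_obfuscated = true;
        } else {
            $obfuscated .= $plaintext[$i];
            $prev_obfuscated = false;
        }
    }
    return $obfuscated;
}

// Hypothetical test helper: decode decimal references like a lenient
// HTML parser (all digits after "&#" are consumed, ';' optional).
function decode_numeric_refs(string $html): string
{
    return preg_replace_callback(
        '/&#(\d+);?/',
        function (array $m): string {
            return html_entity_decode('&#' . $m[1] . ';', ENT_QUOTES | ENT_HTML5, 'UTF-8');
        },
        $html
    );
}

// Obfuscation is random, so check many runs on an address with digits.
$address = 'patel.684@osu.edu';
$ok = true;
for ($run = 0; $run < 1000; $run++) {
    if (decode_numeric_refs(obfuscate_text($address)) !== $address) {
        $ok = false;
        break;
    }
}
var_dump($ok); // bool(true)
```

The check only covers addresses without ';' or '&', which is what ordinary email addresses look like.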

 

In reply to Zbigniew Fiedorowicz

Re: Weird character encodings in profile pages

by Martin Dougiamas -
Many thanks, Zig, nice work. All safely tucked into bed.
In reply to Martin Dougiamas

Re: Weird character encodings in profile pages

by Peter Ruthven-Stuart -
I am a new user and have set up an experimental Moodle on my Mac.

Looking at the source code, I realised that the encoding of email addresses is meant to fool spammers. However, the addresses are rendered incorrectly, and furthermore the 'mailto:' part at the head of an email link is often messed up, e.g. it becomes 'mil:' or 'mal:'.

Any advice?
In reply to Peter Ruthven-Stuart

Re: Weird character encodings in profile pages

by Martin Dougiamas -
Can you identify some exact cases where this occurs? As far as I know there are no problems with this feature in 1.1.1 and later.
In reply to Martin Dougiamas

Re: Weird character encodings in profile pages

by Peter Ruthven-Stuart -
I am using Moodle 1.1.1 running on a Macintosh.

The problem mentioned above (garbled 'mailto:' at the head of email addresses) seems to be an IE Mac problem.

When I click on a name in the list of users I am taken to the profile of that user - fine so far.

With Mac IE 5.2.2 (5010.1), if I move the mouse pointer over the email address I can see the link in the status bar. There I can see that the 'mailto:' part has been altered: it always starts with 'm' and ends with the colon, but there are almost always letters missing, e.g. 'milt:', 'malt:', 'mil:', 'malto:' or 'mailo:'.

When I reload the page this changes. Sometimes the correct 'mailto:' appears.

I have checked this on a Windows machine using both IE and Netscape, and the problem does not occur there. Nor does it occur in Mac Netscape or Safari.

Certainly not a big problem, but can this be fixed?