File upload_manager replaces the Indic UTF8 with '_'? is it a Bug?

by Sarveswaran Kengatharaiyer -
Number of replies: 12
When I try to upload a file that has a Tamil name, Moodle replaces the filename with '_'. What could be the problem?
In reply to Sarveswaran Kengatharaiyer

Re: File upload_manager replaces the Indic UTF8 with '_'? is it a Bug?

by Anthony Borrow -
Sarveswaran - I suspect that this is a feature to ensure a valid OS filename. I've not looked at the restrictions but I know that I have seen similar behavior with spaces being removed. I suspect that Moodle wants to ensure that the name of the file only contains characters that the operating system will recognize as valid for a filename. Since not all OSs are created equally Moodle may have (I have not looked at the code) chosen a least common denominator. Does this explanation make sense? Peace - Anthony
In reply to Anthony Borrow

Re: File upload_manager replaces the Indic UTF8 with '_'? is it a Bug?

by Sarveswaran Kengatharaiyer -
Yes Anthony, it does. I have a similar problem with the Thunderbird mail client: it gives an error message when we try to upload a file with an Indic-language filename. But Moodle silently replaces the filename with '_'.
Anyhow, we need to get this working somehow. I will go through the code and try to find where this happens.
If we solve this at the code level, would Moodle accept the change?

Sarves

In reply to Sarveswaran Kengatharaiyer

Re: File upload_manager replaces the Indic UTF8 with '_'? is it a Bug?

by Anthony Borrow -
Sarves - The decision as to whether Moodle would want to incorporate any changes is above my pay grade. It would depend on a number of factors, the largest being how it would impact other users. The major issue is portability, and the current way of handling it probably tries to provide a solution that will work on virtually all known and used file systems. So any solution you provide would need to ensure, at the very least, that it does not break that. You may want to create an issue in the tracker as an improvement (check first that one does not already exist). Regardless of whether Moodle decides to incorporate the change, if you create a working patch for yourself it could either be a custom patch that you maintain on your server or, if you want to share the code with others, we could consider adding it to CONTRIB as a patch. Peace - Anthony
In reply to Anthony Borrow

Re: File upload_manager replaces the Indic UTF8 with '_'? is it a Bug?

by Sarveswaran Kengatharaiyer -
Yes, you are right Anthony. I will file an issue.
But I would like to make one comment. If Moodle does not support Unicode in the future... I don't know what to say.
As far as I know, Windows (since XP) and almost all recent Linux distributions support Unicode, and in particular they support the Tamil language. So I don't think there will be a portability issue.

~Sarves
In reply to Sarveswaran Kengatharaiyer

Re: File upload_manager replaces the Indic UTF8 with '_'? is it a Bug?

by Martín Langhoff -
We do support unicode - no problem. But we are kind of paranoid on what's written to the filesystem. And for good reasons.

There is a plan to de-couple where/how the files are saved from what the user sees. At that point, we can probably show the user fancier names. But at download time, for example, we would have to clean up the filename so that the user downloading the file does not get a potentially dangerous filename (e.g. with backticks or with a tricky '.gif.pif' extension).

Such is the sad state of OS security these days. Moodle has to work hard to avoid being a vehicle for attacks from users to the server, and from users to other users.
In reply to Anthony Borrow

Re: File upload_manager replaces the Indic UTF8 with '_'? is it a Bug?

by Martín Langhoff -

Moodle wants to ensure that the name of the file only contains characters that the operating system will recognize as valid

Exactly - and also only chars that pose no risk. Lots of characters have special meanings -- and filenames can in some cases be used to attempt an attack on a server. Google for "backticks exploit".

The code that does that cleanup was my first patch to Moodle (Eloy patched it at the same time -- we had a bit of a race condition there!). I was doing a security audit on v1.3 and found that I could get it to save files with backticks -- but not execute anything. The audit covered several other LMSs and in one case -- that was not Moodle but shall remain nameless -- I managed to get code executed on the server. This was using a backticks-based trick.

Unfortunately, it's not safe to just replace the dangerous characters, as that list is ever-growing.
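
To illustrate the point about an ever-growing list of dangerous characters, here is a minimal Python sketch of the allowlist approach: keep only characters known to be safe and replace everything else with '_'. The allowed set and the function name `clean_filename` are assumptions made for illustration; this is not Moodle's actual cleanup code.

```python
import re

# Hypothetical allowlist: ASCII letters, digits, dot, hyphen and underscore.
# Anything NOT in this set is treated as unsafe, so new "dangerous"
# characters never need to be added to a blocklist.
_UNSAFE = re.compile(r"[^A-Za-z0-9._-]")


def clean_filename(name: str) -> str:
    """Replace every character outside the allowlist with '_'."""
    return _UNSAFE.sub("_", name)


print(clean_filename("report`rm -rf`.txt"))  # backticks and spaces become '_'
print(clean_filename("தமிழ்.txt"))            # every Tamil code point becomes '_'
```

With an ASCII-only allowlist like this one, a Tamil filename is reduced to underscores plus its extension, which matches the behaviour reported at the top of this thread.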

In reply to Martín Langhoff

Re: File upload_manager replaces the Indic UTF8 with '_'? is it a Bug?

by Anthony Borrow -
Martín - Thanks for explaining the security implications here and for your work at keeping things secure. Peace - Anthony
In reply to Martín Langhoff

Re: File upload_manager replaces the Indic UTF8 with '_'? is it a Bug?

by Petr Skoda -
Hi,

Security is not the main problem here, because in theory we could have a "less secure" mode. The real problem is Unicode characters in zip files. It sort of works if you have the same Windows version, Windows configuration and unzipping binary; once you change any of these it breaks really badly.

We depend on zipping in many areas and I do not know a way around this problem. Zipping fully supports only ASCII characters, and there is no alternative to zip in PHP afaik.

skodak
In reply to Petr Skoda

Re: File upload_manager replaces the Indic UTF8 with '_'? is it a Bug?

by Iñaki Arenaza -

This may be a bit convoluted, but we can encode filenames in ASCII using MIME header encoding[*] when putting the files in the ZIP archive, and get UTF-8 back from that ASCII when extracting them.

Of course this assumes the original filename encoding was UTF-8 (I'm not sure whether Moodle currently enforces this).

Saludos. Iñaki.

[*] See mb_encode_mimeheader()/mb_decode_mimeheader(), from the mbstring extension we currently 'recommend'.
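
The round trip can be sketched as follows. This is a rough illustration in Python, using the standard library's `email.header` module as a stand-in for the PHP functions named above; the helper names `encode_name` and `decode_name` are made up for this example.

```python
from email.header import Header, decode_header, make_header


def encode_name(name: str) -> str:
    """Encode a UTF-8 filename as an ASCII-only MIME encoded-word."""
    # maxlinelen=0 asks Header.encode() not to fold the result across lines,
    # since a folded header would not make sense as a filename.
    return Header(name, charset="utf-8").encode(maxlinelen=0)


def decode_name(encoded: str) -> str:
    """Recover the original filename from its MIME-encoded form."""
    return str(make_header(decode_header(encoded)))


original = "தமிழ்.txt"
encoded = encode_name(original)
print(encoded)                # ASCII-only encoded-word
print(decode_name(encoded))   # the original Tamil filename again
```

Note that the encoded form contains '=' and '?', and a Base64 payload may also contain '/' or '+', so the result would still need further treatment before it is safe as a name inside an archive or on disk.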

In reply to Iñaki Arenaza

Re: File upload_manager replaces the Indic UTF8 with '_'? is it a Bug?

by Petr Skoda -
The encoded filenames would not be compatible with WinZip, Windows' built-in unpacking and similar tools. It would also not be trivial to implement if the Info-ZIP binary is used on the server (files would have to be copied to a temp area, renamed and zipped there).

A partial solution could be:
1/ Support Unicode in student-submitted files only - we store these files in moddata and do not need the real filenames in the filesystem, so we could store them in the database instead
2/ Move resource files into moddata and do the same as 1/
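
The idea in 1/ and 2/ can be sketched like this: the file on disk gets a generated ASCII name, and the name the user sees lives only in a database row. This is a minimal Python illustration under assumptions of my own; the `FileStore` class, the table layout and the use of a SHA-1 content hash as the on-disk name are all hypothetical and not taken from Moodle.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path
from typing import Tuple


class FileStore:
    """Keeps display names in a database and safe names on disk."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.db = sqlite3.connect(":memory:")  # stand-in for the real database
        self.db.execute(
            "CREATE TABLE files ("
            "id INTEGER PRIMARY KEY, "
            "display_name TEXT NOT NULL, "
            "stored_name TEXT NOT NULL)"
        )

    def save(self, display_name: str, content: bytes) -> int:
        # The on-disk name is a hex digest, so it is always plain ASCII.
        stored_name = hashlib.sha1(content).hexdigest()
        (self.root / stored_name).write_bytes(content)
        cur = self.db.execute(
            "INSERT INTO files (display_name, stored_name) VALUES (?, ?)",
            (display_name, stored_name),
        )
        self.db.commit()
        return cur.lastrowid

    def load(self, file_id: int) -> Tuple[str, bytes]:
        row = self.db.execute(
            "SELECT display_name, stored_name FROM files WHERE id = ?",
            (file_id,),
        ).fetchone()
        if row is None:
            raise KeyError(file_id)
        display_name, stored_name = row
        return display_name, (self.root / stored_name).read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    store = FileStore(Path(tmp))
    file_id = store.save("தமிழ்.txt", b"assignment text")
    name, data = store.load(file_id)
    print(name, len(data))
```

The Unicode name never touches the filesystem here, but it would still have to be cleaned or encoded whenever files are zipped or sent back to a browser.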

I hope file handling will be reworked in 2.0; we could also retire both current zipping methods and use the built-in zipping from PHP 5.2.
In reply to Martín Langhoff

Re: File upload_manager replaces the Indic UTF8 with '_'? is it a Bug?

by Sarveswaran Kengatharaiyer -
Yes Martín, you are correct, and thanks for the information.
I still have one doubt...
Unicode is based on code points, and there are code point ranges for special characters. So by comparing the code points in a filename against these ranges, can't we tell whether the filename contains special characters or not?
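
As a rough illustration of what such a check might look like, here is a Python sketch that uses Unicode general categories rather than hand-written code point ranges. The set of accepted categories (letters, combining marks, digits) and the small punctuation allowlist are assumptions for this example, not an agreed policy.

```python
import unicodedata

# Hypothetical: the only punctuation accepted in a filename.
ALLOWED_PUNCTUATION = {".", "-", "_"}


def has_only_safe_characters(name: str) -> bool:
    """Accept letters, combining marks and digits from any script,
    plus a few punctuation characters; reject everything else."""
    for ch in name:
        if ch in ALLOWED_PUNCTUATION:
            continue
        # The first letter of the general category is the major class:
        # L = letter, M = mark (e.g. Tamil vowel signs), N = number.
        if unicodedata.category(ch)[0] in ("L", "M", "N"):
            continue
        return False
    return True


print(has_only_safe_characters("தமிழ்.txt"))   # Tamil letters and marks
print(has_only_safe_characters("a`b.txt"))     # backtick is a symbol
```

A per-character check like this does not catch everything mentioned earlier in the thread: a name such as 'photo.gif.pif' passes, and so does a leading '..', so it could only be one part of the cleanup.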

In which PHP script is this cleanup done?

~Sarves