File upload_manager replaces the Indic UTF8 with '_'? is it a Bug?
Number of replies: 12
Re: File upload_manager replaces the Indic UTF8 with '_'? is it a Bug?
Anyhow, we need to get this done somehow. I will try to go through the code and find out what is happening...
If we solve this at the code level, would Moodle accept the change?
Sarves
Re: File upload_manager replaces the Indic UTF8 with '_'? is it a Bug?
But I would like to make one comment. If Moodle doesn't support Unicode in the future... I don't know what to say...
As far as I know, Windows XP and later, and almost all recent Linux distributions, support Unicode, and in particular they support the Tamil language. So I don't think there will be a portability issue.
~Sarves
Re: File upload_manager replaces the Indic UTF8 with '_'? is it a Bug?
There is a plan to decouple where and how the files are saved from what the user sees. At that point, we can probably show the user fancier names. But at download time, for example, we would have to clean up the filename so that the user downloading the file does not get a potentially dangerous filename (i.e. one with backticks or with a tricky '.gif.pif' extension).
Such is the sad state of OS security these days. Moodle has to work hard to avoid being a vehicle for attacks from users on the server, and from users on other users.
Re: File upload_manager replaces the Indic UTF8 with '_'? is it a Bug?
Moodle wants to ensure that the name of the file only contains characters that the operating system will recognize as valid
Exactly - and also only chars that pose no risk. Lots of characters have special meanings -- and filenames can in some cases be used to attempt an attack on a server. Google for "backticks exploit".
The code that does that cleanup was my first patch to Moodle (Eloy patched it at the same time -- we had a bit of a race condition there!). I was doing a security audit on v1.3 and found that I could get it to save files with backticks -- but not execute anything. The audit covered several other LMSs and in one case -- that was not Moodle but shall remain nameless -- I managed to get code executed on the server. This was using a backticks-based trick.
Unfortunately, it's not safe to just replace the dangerous characters as that list is ever-growing
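The allow-list approach described above can be sketched as follows. This is a minimal Python illustration, not Moodle's actual PHP cleanup code; the function name `clean_allowlist` and the exact set of permitted characters are assumptions made for the example. It shows why Indic characters end up as '_': anything outside a small ASCII allow-list is replaced, rather than trying to enumerate every dangerous character.

```python
import re

# Hypothetical allow-list: ASCII letters, digits, dot, underscore and hyphen.
# Everything else (backticks, spaces, non-ASCII letters) is replaced.
_NOT_ALLOWED = re.compile(r"[^A-Za-z0-9._-]")


def clean_allowlist(filename: str) -> str:
    """Replace every character outside the allow-list with '_'."""
    return _NOT_ALLOWED.sub("_", filename)


if __name__ == "__main__":
    print(clean_allowlist("report`rm -rf`.gif"))
    print(clean_allowlist("தமிழ்.txt"))
```

The design choice is that an allow-list stays safe when new dangerous characters are discovered, at the cost of rejecting harmless ones such as Tamil letters.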
Re: File upload_manager replaces the Indic UTF8 with '_'? is it a Bug?
Security is not the main problem here, because in theory we could have a "less secure" mode. The real problem is Unicode characters in zip files. It sort of works if you have the same Windows version, Windows configuration and unzipping binary; once you change any of these, it breaks really badly.
We depend on zipping in many areas and I do not know a way around this problem. Zipping fully supports only ASCII characters, and there is no alternative to zip in PHP afaik.
skodak
Re: File upload_manager replaces the Indic UTF8 with '_'? is it a Bug?
http://www.phpit.net/article/creating-zip-tar-archives-dynamically-php/
Peace - Anthony
Re: File upload_manager replaces the Indic UTF8 with '_'? is it a Bug?
This may be a bit convoluted, but we can encode filenames in ASCII using MIME header encoding[*] when putting the files in the ZIP archive, and get UTF-8 back from that ASCII when extracting them.
Of course this assumes the original filename encoding was UTF-8 (I'm not really sure whether Moodle currently enforces that).
Saludos. Iñaki.
[*] See mb_encode_mimeheader()/mb_decode_mimeheader(), from the mbstring extension we currently 'recommend'.
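The round trip suggested in this post can be sketched roughly as below. This is a Python illustration of the idea, not the PHP mbstring functions named above; `encode_filename` and `decode_filename` are hypothetical helpers, and the encoding is built by hand in the style of a MIME "B" encoded-word, assuming the original name is valid UTF-8.

```python
import base64
from email.header import decode_header


def encode_filename(name: str) -> str:
    """Return an ASCII-only, MIME-style encoded-word for a UTF-8 name."""
    payload = base64.b64encode(name.encode("utf-8")).decode("ascii")
    return "=?UTF-8?B?" + payload + "?="


def decode_filename(encoded: str) -> str:
    """Recover the original name; plain ASCII names pass through unchanged."""
    parts = []
    for data, charset in decode_header(encoded):
        if isinstance(data, bytes):
            parts.append(data.decode(charset or "ascii"))
        else:
            parts.append(data)
    return "".join(parts)


if __name__ == "__main__":
    original = "தமிழ்.txt"
    stored = encode_filename(original)
    print(stored)
    print(decode_filename(stored) == original)
```

One caveat on this sketch: the encoded form contains '?' and '=', and base64 output may contain '/' or '+', so a real implementation would probably need a filename-safe variant of the encoding.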
Re: File upload_manager replaces the Indic UTF8 with '_'? is it a Bug?
Partial solution could be:
1/ Support Unicode in student-submitted files only. We store these files in moddata and do not need the real filenames in the filesystem; we could store them in the database instead.
2/ Move resource files into moddata and do the same as in 1/.
I hope the file handling will be reworked in 2.0; we could also retire both current zipping methods and use the built-in zipping from PHP 5.2.
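The idea in points 1/ and 2/, keeping the real filename in the database while the filesystem only sees a safe name, might look something like this. It is a Python sketch under stated assumptions: the `FileStore` class is hypothetical, and two in-memory dictionaries stand in for the moddata directory and a database table.

```python
import uuid


class FileStore:
    """Toy store: ASCII-only names on 'disk', display names in a 'table'."""

    def __init__(self) -> None:
        self.disk = {}           # stand-in for moddata: stored name -> content
        self.display_names = {}  # stand-in for a DB table: stored name -> real name

    def save(self, display_name: str, content: bytes) -> str:
        # The stored name is a random hex string, so it is always plain ASCII
        # regardless of what the user called the file.
        stored_name = uuid.uuid4().hex
        self.disk[stored_name] = content
        self.display_names[stored_name] = display_name
        return stored_name

    def display_name(self, stored_name: str) -> str:
        return self.display_names[stored_name]


if __name__ == "__main__":
    store = FileStore()
    key = store.save("தமிழ்.txt", b"assignment text")
    print(key, "->", store.display_name(key))
```

With this split, the Unicode name never reaches the filesystem, although it would still have to be cleaned or encoded at download or zip time.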
Re: File upload_manager replaces the Indic UTF8 with '_'? is it a Bug?
Still I have one doubt...
Unicode is based on code points, and there are code point ranges for special characters. So by comparing the code points in a filename against these ranges, can't we tell whether the filename contains special characters or not?
In which PHP script is this cleanup done?
~Sarves
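The code point idea raised in this post could be tried along these lines. This is a Python sketch rather than Moodle code; `clean_unicode_filename` is a hypothetical name, and the rule used (keep letters, combining marks, digits and a few ASCII punctuation characters, based on Unicode general categories) is an assumption chosen for illustration.

```python
import unicodedata

# Hypothetical set of punctuation that is kept as-is.
SAFE_PUNCTUATION = set("._-")


def clean_unicode_filename(name: str) -> str:
    """Keep letters (L*), marks (M*) and numbers (N*); replace the rest with '_'."""
    cleaned = []
    for ch in name:
        major_class = unicodedata.category(ch)[0]
        if major_class in ("L", "M", "N") or ch in SAFE_PUNCTUATION:
            cleaned.append(ch)
        else:
            cleaned.append("_")
    return "".join(cleaned)


if __name__ == "__main__":
    print(clean_unicode_filename("தமிழ்.txt"))
    print(clean_unicode_filename("a`b c.txt"))
```

Combining marks have to be allowed because Tamil vowel signs fall into that class. This only addresses special characters; it does not help with tricky double extensions or with the zip portability problem discussed earlier in the thread.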