(Moodle 2.1.2) Problem with Unziping zip files that contain special characters

by Pedro Crispim -
Number of replies: 15

Hello everyone!

I have a problem with unzipping zip files that contain files with special characters in their names: Moodle ignores those files and does not unzip them.

I wonder whether anyone else has noticed this and opened an issue in the Moodle Tracker. I searched, but only found an issue for Moodle < 1.9, so I created MDL-30436.

Moodle is running on LAMP (CentOS 6). I used both 7-Zip and the built-in Windows zip tool to create the zip files (on Windows 7).

In reply to Pedro Crispim

Re: (Moodle 2.1.2) Problem with Unziping zip files that contain special characters

by Dan Marsden -

Sounds like this PHP bug:
https://bugs.php.net/bug.php?id=51929
There is a bit more info in MDL-24928.

In reply to Dan Marsden

Re: (Moodle 2.1.2) Problem with Unziping zip files that contain special characters

by Pedro Crispim -

Dan:

Indeed, it was MDL-24928. Thank you!

In Moodle 1.9, we could configure Path to ZIP, so that Moodle could use zip instead of PHP/ZIP. Is that still possible?

In reply to Pedro Crispim

Re: (Moodle 2.1.2) Problem with Unziping zip files that contain special characters

by Gary Sutcliff -

I have been using zipped files for a few years now and many times students have illegal characters in the names of the files.  I never "unzip" a file as such, I double click on the zipped file and drag the enclosed file to a folder where the system unpacks them for me.  So far, I have not seen any problems.

What characters give you problems?  I can check my system to see what happens here in 1.9 moodle.

In reply to Gary Sutcliff

Re: (Moodle 2.1.2) Problem with Unziping zip files that contain special characters

by Dan Marsden -

The way 1.9 handles zips and the way 2.0 handles them are quite different. Also, you are referring to using your local OS to handle the zip/unzip. I presume the original reporter is using the "built-in" PHP/Moodle unzip rather than the unzip on their local machine, and there is a known issue with UTF-8 file names when using the built-in PHP zip handling in Moodle 2.x, as referenced above.

In reply to Gary Sutcliff

Re: (Moodle 2.1.2) Problem with Unziping zip files that contain special characters

by Visvanath Ratnaweera -
Why don't people use portable file names, like posix?
http://en.wikipedia.org/wiki/Filename
In reply to Visvanath Ratnaweera

Re: (Moodle 2.1.2) Problem with Unziping zip files that contain special characters

by Tim Hunt -

Why didn't you write your forum post in Chinese?

Presumably because you don't speak/write Chinese, and therefore it would be difficult for you. Why should we force the whole world to use the Roman alphabet?

In reply to Tim Hunt

Re: (Moodle 2.1.2) Problem with Unziping zip files that contain special characters

by Dan Marsden -

heh - I was just going to post a "don't feed the troll" image but Moodle didn't have a smiley that worked easily..... evil <- that will have to do.....

In reply to Tim Hunt

Re: (Moodle 2.1.2) Problem with Unziping zip files that contain special characters

by Visvanath Ratnaweera -
Sorry sir, we are not talking about the content within, but rather about meta information, which interacts directly with the OS. How about translating the whole Unix command language into Chinese?

I work in a multilingual country. The people here know that they can't have their umlauts, accents, "Scharfes s", etc. in keywords - not to mention my mother tongue http://en.wikipedia.org/wiki/Sinhala_language.

(Tim, you know it better, this is for the benefit of the casual visitor. I'm not biting the "troll bait" in the follow up, BTW.)

In reply to Visvanath Ratnaweera

Re: (Moodle 2.1.2) Problem with Unziping zip files that contain special characters

by Tim Hunt -

File names are not just meta-information. In some scenarios they are an important part of the file.

That was one of the reasons why we did the file changes in Moodle 2.0, so the user-visible file name would be stored in the database, in UTF-8, while the files stored in moodledata have filenames that are just [a-f0-9]*.
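A minimal sketch of the idea in the paragraph above, not Moodle's actual code: the file on disk is named by a hex digest of its content, and the user-visible name lives in a separate record. The choice of SHA-1, the `store_file` helper and the `records` dict standing in for the database are all assumptions made for illustration.

```python
import hashlib
import re
import tempfile
from pathlib import Path


def store_file(pool: Path, records: dict, visible_name: str, content: bytes) -> str:
    """Store content under a hex-only name; keep the real name in `records`."""
    # Assumption: a SHA-1 hex digest is used as the on-disk name ([a-f0-9]*).
    digest = hashlib.sha1(content).hexdigest()
    (pool / digest).write_bytes(content)
    # `records` is a hypothetical stand-in for a database table.
    records[visible_name] = digest
    return digest


pool = Path(tempfile.mkdtemp())
records = {}
digest = store_file(pool, records, "Café.ppt", b"slides")

# The server's file system only ever sees ASCII hex characters.
on_disk = [p.name for p in pool.iterdir()]
hex_only = re.fullmatch(r"[a-f0-9]+", digest) is not None
```

With this layout the accented name never reaches the server's file system, so its encoding quirks cannot affect the stored file.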

It sounds like the zip standard includes UTF-8 file-names, but the zip libraries do not always handle that. That is a bug in the zip libraries, that needs to be fixed.
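For the casual reader, here is a small sketch of how a zip entry declares a UTF-8 file name. The ZIP format reserves general purpose flag bit 11 (0x800) for this. The example uses Python's standard `zipfile` module purely as an illustration; it says nothing about how PHP's zip extension or Moodle behaves, and archives made by other tools may leave the flag unset and use a local code page instead. The `utf8_flags` helper is a made-up name.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: file name is stored as UTF-8


def utf8_flags(names):
    """Write empty members with the given names, then report the UTF-8 flag."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    # Re-open the same in-memory archive and inspect each entry's flags.
    with zipfile.ZipFile(buf) as zf:
        return {info.filename: bool(info.flag_bits & UTF8_FLAG) for info in zf.infolist()}


flags = utf8_flags(["plain.txt", "Café.ppt"])
```

A reader that ignores this flag, or a writer that never sets it, has to guess the encoding of the name, which is where the accented characters go missing or get mangled.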

In reply to Tim Hunt

Re: (Moodle 2.1.2) Problem with Unziping zip files that contain special characters

by Visvanath Ratnaweera -
Hi Tim

You wrote:
> File names are not just meta-information. In some scenarios they are an important part of the file.

Could you give some examples? http://en.wikipedia.org/wiki/Filename says, "The filename is metadata about a file; ...".

@all
There is a related discussion http://moodle.org/mod/forum/discuss.php?d=205474 in the "Moodle documentation" forum waiting for an answer.
In reply to Visvanath Ratnaweera

Re: (Moodle 2.1.2) Problem with Unziping zip files that contain special characters

by Tim Hunt -

I am absolutely not going to get into a semantics argument on wikipedia about whether the filename is data or metadata.

Suffice it to say that, in Moodle, when teaching, some teachers care a lot that they can call their file Café.ppt, so Moodle has to be able to handle that.

In reply to Tim Hunt

Re: (Moodle 2.1.2) Problem with Unziping zip files that contain special characters

by Visvanath Ratnaweera -
I'm sorry, sir. The question in this sub-thread is not whether Wikipedia is right. You made the statement: "File names are not just meta-information. In some scenarios they are an important part of the file." Where is the evidence?

The broader topic, what characters should be allowed in file names continues in the sub-thread http://moodle.org/mod/forum/discuss.php?d=190958#p831268. You are welcome to bring (technical) arguments there. "Suffice to say" is not an argument! Not to mention that this thread was started by somebody who had problems due to special characters in file names.
In reply to Visvanath Ratnaweera

Re: (Moodle 2.1.2) Problem with Unziping zip files that contain special characters

by Pedro Crispim -

Visvanath:

The purpose of UTF-8 was to allow everyone to use their language characters in filenames (among other things, of course). Moodle already supports UTF-8, which is a great thing.

But I think it's not OK to ask every teacher and every student in my school: "please, don't use special characters in your filenames or directories".

In Portuguese, there are words that have different meanings if they lack their accents, and we do use a lot of accents in our words. We also have the "ç", which is different from the "c", and swapping one for the other can produce a different word.

So, the effort taken to give computers the capacity of "knowing" how to deal with almost all characters, in different types, is indeed a good thing.

Don't get me wrong, I do understand your point of view. But it should not be necessary for us to adapt to computers: they have the computational power to be adapted to our needs - and that's how it should be!

Kind regards.

In reply to Pedro Crispim

Re: (Moodle 2.1.2) Problem with Unziping zip files that contain special characters

by Visvanath Ratnaweera -
Hi Pedro

> The purpose of UTF-8 was to allow everyone to use their language characters in filenames (among other things, of course). Moodle already supports UTF-8, which is a great thing.

If your people use UTF-8 characters in file names and you all don't have problems, why not?

Let me describe my environment. Switzerland is highly multilingual. It has four official languages; in my area German is the first language and French the second. So we get ä, ö, ü and ß from German and a whole bunch of characters like à, é, è, ô from French. And people's names can contain anything from ISO-8859-1 or -15.

I think that because they are multilingual, they are used to working with different alphabets and have far fewer problems understanding that the computer has its own alphabet. André Müller knows that his e-mail is andre.mueller@his.org or, in the worst case, mulleand@his.org! And I don't think they try to save a digital recording of a song in http://en.wikipedia.org/wiki/Romansh_language under its Rumantsch name!

On the other hand, files with special characters in their names might "work" on one computer system but not on another. Just an hour after my previous post, I tried to back up a huge directory from my ext4 file system to an external USB drive on NTFS and got endless "wide character" errors. The reason: colleagues using other OSes had used ä, ö, ü in file names. Somehow I could transfer them to my Linux system, I guess through either Samba or HTTP. Now, before moving them to the external drive, I have to rename them. The greatest adventure I had was with a file name starting with "spaces" - yes, there were two of them!

> So, the effort taken to give computers the capacity of "knowing" how to deal with almost all characters, in different types, is indeed a good thing.

That is exactly what I feared. It is not UTF-8 but a subset of it. Is it clear to everybody what that subset is?

Look at this: http://moodle.org/mod/forum/discuss.php?d=171081. Guillermo M. reverse engineered the 7-bit ASCII set to find out the subset of special characters allowed in passwords. Imagine repeating the exercise for UTF-8!

> But it should not be necessary for us to adapt to computers: they have the computational power to be adapted according to our needs - and that's how it should be!

Considerate computers? I'll be glad to meet them.

P.S. Since version 2.0 Moodle has a new way of saving files. Unlike in 1.x, the file system of the server does not see the "real" name of the file; the real name is just a database entry. So Moodle only makes you believe there is a fancy file name (this may not be accurate, I still don't have 2.x in a production system). So the main point of this discussion is the general one: "should one use the whole UTF-8 character set in file names?"
In reply to Visvanath Ratnaweera

Re: (Moodle 2.1.2) Problem with Unziping zip files that contain special characters

by Guillermo Madero -

Hi all,

I do agree with the idea of avoiding special characters in filenames. As a Moodle admin, the policy I gave the course developers was not to use them at all.

My own environment is Windows (though my servers are Linux), and even though the OS doesn't have any problem with accented characters or others like "ñ", utilities do tend to have them. For example, I use a Windows version of "tar" and it doesn't handle them (e.g. "á" gets converted into "ß").
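One plausible explanation for that particular conversion is a code page mismatch. This is an assumption, since nothing above says which encodings the tool uses: the byte that Windows code page 1252 uses for "á" is the same byte that the older DOS code page 850 uses for "ß". A short Python sketch of the effect:

```python
original = "á"

# Encode the name as a Windows (cp1252) program would write it...
raw = original.encode("cp1252")

# ...then read the same byte back as a DOS (cp850) console tool would.
misread = raw.decode("cp850")
```

Here `raw` is the single byte 0xE1 and `misread` comes out as "ß", matching the symptom described, although the real tool may be doing something different.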

Even with M 2.x, the server does get to see the real filename when one uses the File system repository.

Average of ratings: Useful (1)