Questions about code causing new lines to be replaced...

by Marc Lavoie -
Allo!

I have noticed a rather interesting effect. I have many pages created with the free HTML editor Nvu. Note that Nvu can reformat a page so that the source is easy for humans to read while keeping the HTML valid. This means the editor may move newline characters around in the page source, as appropriate, to make the code more readable.

The various web pages had images and/or tables. I pasted each page from Nvu, where it was created, and saved it. When you look at it, everything seems fine. When I go to edit or tweak the page later on, the page is garbled, even if I do nothing but go into edit mode. If I save it at this point, the garbling is permanent. The result is that table and image tags are either missing or garbled.

In fact, I could see what was going on when I looked at the source code of the page. All of the Unix newlines had been converted to &#10; numeric entities. This, in combination with further processing to remove any invalid code, seemed to garble the page to the point that the browser could no longer make out image tags, and table tags went missing altogether.

So, in short, things are fine when you put them in the first time. However, any subsequent editing using the editor can ruin a page.

I ran a large number of experiments. If I turn off the HTML editor (via Appearance--->HTML Editor, the "Use Editor" setting), the issue no longer happens. This means that the editor itself is doing the tweaking.

After taking some time figuring out what is involved with the text editor, I discovered the folder: moodle/lib/editor/

I started to analyse the files in that folder and discovered a function, toHtml, in the file htmleditor.php:


function toHtml(){
    if ($this->_canUseHtmlEditor && !$this->_flagFrozen){
        ob_start();
        use_html_editor($this->getName(), '', $this->getAttribute('id'));
        $script=ob_get_clean();
    } else {
        $script='';
    }
    if ($this->_flagFrozen) {
        return $this->getFrozenHtml();
    } else {
        return $this->_getTabs() .
            print_textarea($this->_canUseHtmlEditor,
                $this->_options['rows'],
                $this->_options['cols'],
                $this->_options['width'],
                $this->_options['height'],
                $this->getName(),
                preg_replace("/(\r\n|\n|\r)/", '&#10;',$this->getValue()),
                $this->_options['course'],
                true,
                $this->getAttribute('id')).$script;
    }
} //end func toHtml


Notice the line preg_replace("/(\r\n|\n|\r)/", '&#10;',$this->getValue()),

This line basically replaces every newline, whether DOS, Unix or Mac, with the numeric entity &#10;

This, in and of itself, is not necessarily bad if the newline is in a convenient place, such as outside of tags. However, if the newline is inside a tag, just after an attribute or tag name (which is perfectly legal HTML), the replacement causes the editor to have a fit, not to mention browsers themselves failing to display the resulting HTML properly, if at all.
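To illustrate the point, here is a minimal sketch that applies the same preg_replace call to a small made-up fragment. The sample string and the variable names $html and $converted are assumptions for the demonstration only; they are not taken from Moodle.

```php
<?php
// Hypothetical sample: a valid <img> tag whose attributes are separated by
// newlines (Unix "\n" and DOS "\r\n"), followed by a Mac "\r" outside the tag.
$html = "<img\nsrc=\"a.png\"\r\nalt=\"A\">\rDone";

// Same pattern and replacement as the line in toHtml().
// "\r\n" is listed first in the alternation, so a DOS line ending becomes
// one entity rather than two.
$converted = preg_replace("/(\r\n|\n|\r)/", '&#10;', $html);

echo $converted, "\n";
// prints: <img&#10;src="a.png"&#10;alt="A">&#10;Done
```

The tag name is now followed directly by the characters &#10; instead of whitespace, which is the kind of markup that later clean-up steps may no longer recognise as an image tag.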

Overall, I am not sure what the reasoning behind the replacement was. Perhaps it was meant to fix some other bug. Perhaps they were trying to make sure that the newlines were all consistent. In any case, I have commented out the line:

preg_replace("/(\r\n|\n|\r)/", '&#10;',$this->getValue()),

and replaced it with the following line:

$this->getValue(),

The function is now as follows:


function toHtml(){
    if ($this->_canUseHtmlEditor && !$this->_flagFrozen){
        ob_start();
        use_html_editor($this->getName(), '', $this->getAttribute('id'));
        $script=ob_get_clean();
    } else {
        $script='';
    }
    if ($this->_flagFrozen) {
        return $this->getFrozenHtml();
    } else {
        return $this->_getTabs() .
            print_textarea($this->_canUseHtmlEditor,
                $this->_options['rows'],
                $this->_options['cols'],
                $this->_options['width'],
                $this->_options['height'],
                $this->getName(),
                // Commented out by Marc Lavoie
                // preg_replace("/(\r\n|\n|\r)/", '&#10;',$this->getValue()),
                // Replacement line by Marc Lavoie
                $this->getValue(),
                $this->_options['course'],
                true,
                $this->getAttribute('id')).$script;
    }
} //end func toHtml


My questions are as follows:

Why was the code written to do this replacement? Is there a technical reason? I see no reason to do this, so I am confused about why it was in the code in the first place. As I discovered, such a replacement can render a page invalid. Was it a bug fix or an attempt to introduce consistency in newlines? More importantly, is the fix that I performed going to introduce or reintroduce another bug that I have yet to see?
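If consistency of newlines was the goal, it could in principle be achieved without entities. The following is only a sketch under that assumption: normalise_newlines is a hypothetical helper, not a Moodle function, and I do not know whether it would cover whatever the original line was meant to handle.

```php
<?php
// Hypothetical helper (not part of Moodle): convert DOS "\r\n" and Mac "\r"
// line endings to Unix "\n" while leaving the newlines in place as whitespace.
function normalise_newlines($text) {
    // "\r\n" must come before "\r" so a DOS ending is not turned into "\n\n".
    return preg_replace("/\r\n|\r/", "\n", $text);
}

// Made-up sample with mixed line endings inside a tag.
$html = "<img\r\nsrc=\"a.png\"\ralt=\"A\">";
$normalised = normalise_newlines($html);

// The tag still contains only ordinary whitespace between its attributes.
var_dump(strpos($normalised, "\r") === false);
```

With this approach the newlines inside tags stay as whitespace, so the HTML remains valid whichever line-ending style the page started with.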

I am running Moodle 1.8, which was updated a few days ago. I am using Firefox 1.5 or Firefox 2.0, depending on the computer I am on.

Marc Lavoie


