How should the URL auto-linking filter work?

by Tim Hunt
Number of replies: 7

Consider the following three example URLs in context:

  1. See this wikipedia page: http://en.wikipedia.org/wiki/Slash_(punctuation).
  2. Here is a picture of a smile: http://example.com/render/emoticon.php?s=smiler.
  3. See your favourite news source (e.g. www.bbc.co.uk).

I typed those as plain text, so you can see how Moodle's filter_urltolink handles them. Currently it gets 1. and 2. right, but 3. wrong.

I filed MDL-22390 ages ago because I am always doing things like 3., and I had never thought of examples 1. and 2.

I am pleased to say that this filter has unit tests, including those tricky cases, so I am now aware of them.

Clearly we cannot correctly handle all three examples (at least not without serious artificial intelligence). So, I have come here to ask what we think would be the better behaviour. Comments please.

In reply to Tim Hunt

Re: How should the URL auto-linking filter work?

by Hubert Chathi

As of right now, all three links are incorrect, and #3's problem has nothing to do with the bracket.  Here's what I see:

My guess is that #2 is a less-important case to worry about. Maybe it should ignore closing brackets, unless the URL already has an opening bracket? Then it would handle #1 and #3 correctly. If you really wanted to, I guess you could try to match the number of opening and closing brackets to handle "(e.g. http://en.wikipedia.org/wiki/Slash_(punctuation))." correctly, but of course, then you can't use a regexp (at least, not without some extra processing).
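To make the bracket-matching idea concrete, here is a minimal sketch in Python (not Moodle's PHP, and not the actual filter_urltolink code). The pattern `CANDIDATE`, the character set `TRAILING`, and the function names are all assumptions made up for illustration. A regexp grabs a generous candidate, and a small amount of extra processing trims trailing punctuation, dropping a closing bracket only when it has no matching opening bracket inside the URL.

```python
import re

# Hypothetical pattern: any run of non-whitespace starting with http(s):// or www.
CANDIDATE = re.compile(r"(?:https?://|www\.)\S+")

# Assumed set of trailing characters that are more likely sentence punctuation.
TRAILING = ".,;:!?"


def trim_url(candidate: str) -> str:
    """Drop trailing punctuation; keep ')' only if it closes a '(' in the URL."""
    url = candidate
    while url:
        last = url[-1]
        if last in TRAILING:
            url = url[:-1]
        elif last == ")" and url.count(")") > url.count("("):
            # Unmatched closing bracket: assume it belongs to the sentence.
            url = url[:-1]
        else:
            break
    return url


def find_urls(text: str) -> list[str]:
    """Return the trimmed URL candidates found in text, in order."""
    return [trim_url(m.group(0)) for m in CANDIDATE.finditer(text)]
```

With these assumptions, `find_urls` keeps the bracket in example #1 and drops it in example #3. The trade-off is the one already mentioned: a URL that genuinely ends in an unmatched closing bracket would lose it.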

For all the problems that auto-linking causes, I'm surprised that there isn't a "best practices" somewhere.  (Or at least, there isn't one that I'm aware of.)

In reply to Tim Hunt

Re: How should the URL auto-linking filter work?

by Gareth J Barnard

Dear Tim,

In continuation from the dev chat I consider that:

http://www.w3.org/Addressing/URL/url-spec.txt

Needs to be supported in terms of allowable characters. Given that, and taking the ')' character as the example, there is a Catch-22 for automated URL detection based on characters alone. Therefore there does need to be an element of additional intelligent logic.

Therefore, on the presumption that the URL the user enters is valid and reachable, the longest possible URL should be determined first, then tested for existence. If it exists, then the filter's result is valid. If it does not exist, then reduce the number of filtered characters by one and repeat the check. If that is still not valid, then flag it up for the non-artificial intelligent human to state the intended destination.

Cheers,

Gareth

In reply to Gareth J Barnard

Re: How should the URL auto-linking filter work?

by Hubert Chathi

If you check that a URL is reachable, you have to be able to handle pages that the user is able to reach, but the server is not able to, for whatever reason.  e.g. the page is password-protected, or the server is behind a restrictive firewall.

You'd probably also want to limit the characters that you would try chopping off -- probably just to punctuation characters.  e.g. in the URL http://example.com/foo/bar)., the comma, period, and closing parenthesis are questionable whether they belong to the URL or not, but the rest of the URL is not questionable.  This way, you don't hammer some other server with 100 requests when someone types in an incorrect URL.
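As a rough sketch of what limiting the chopping to punctuation could look like (Python for illustration only; `QUESTIONABLE`, `candidates`, `resolve`, and the `is_reachable` callback are all hypothetical names, and the reachability check itself is deliberately left to the caller):

```python
from typing import Callable, Iterator, Optional

# Assumed set of trailing characters whose membership in the URL is doubtful.
QUESTIONABLE = ").,;:!?"


def candidates(raw: str) -> Iterator[str]:
    """Yield raw, then shorter versions, removing only trailing punctuation."""
    url = raw
    yield url
    while url and url[-1] in QUESTIONABLE:
        url = url[:-1]
        yield url


def resolve(raw: str, is_reachable: Callable[[str], bool]) -> Optional[str]:
    """Return the longest candidate the caller-supplied check accepts."""
    for url in candidates(raw):
        if is_reachable(url):
            return url
    return None  # nothing verified: leave it for a human to decide
```

For the URL in the paragraph above, `candidates` yields only four strings (the full text, then with the comma, the period, and the closing parenthesis removed in turn), so the remote server would see at most four requests however long the rest of the URL is. A `None` result only means the check failed from wherever it ran, which is exactly the password-protected or firewalled case described above.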

In reply to Hubert Chathi

Re: How should the URL auto-linking filter work?

by Gareth J Barnard

So, therefore, implement the 'request for comments' updated version and when the filter muffs it up allow the user to correct it.

A Google search gives:

http://eureka.ykyuen.info/2012/01/19/php-convert-url-into-clickable-link-with-urllinker/#more-8656 -> https://bitbucket.org/kwi/urllinker/src

https://code.google.com/p/php-rfc-3986/

I have no idea if they are better than what is currently in place, but they might shed light on current issues.

In reply to Gareth J Barnard

Re: How should the URL auto-linking filter work?

by Hubert Chathi

From a quick glance at the code, urllinker has a set of characters that it doesn't consider to be part of the URL if they are at the end of the URL. So with Tim's example #1, it would skip the closing parenthesis (incorrectly) and the period (correctly).
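To illustrate that general approach (this is not urllinker's code, and `TRAILING_CHARS` is an assumed character set rather than the one urllinker actually uses), stripping a fixed set of trailing characters can be sketched in a couple of lines of Python:

```python
# Assumed set of characters never treated as part of the URL when trailing.
TRAILING_CHARS = ".,;:!?)"


def strip_trailing(candidate: str) -> str:
    """Remove every trailing character that appears in TRAILING_CHARS."""
    return candidate.rstrip(TRAILING_CHARS)
```

This is cheap, but applied to example #1 it removes both the period and the closing parenthesis, leaving a URL that ends in `Slash_(punctuation`.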

As far as I can tell, php-rfc-3986 only checks whether a string is a URL, and doesn't try to guess what the user's intent was.

In reply to Hubert Chathi

Re: How should the URL auto-linking filter work?

by Tim Hunt

Please bear in mind that this is a Moodle text filter. It runs tens of times on every single page. It must be very, very fast.
