Word Censorship (with overrides)

by John White -
Hi All,

At least two recent discussions in the General Problems forum have raised the issue of how 'Word Censorship' has no mechanism
for recognising when an 'acceptable' word is compounded from an 'unacceptable' one, e.g. Peacock.
This issue definitely worries some Moodlers more than others, and hitherto we could only really take out
some less offensive words from the badwords list to let such compounds through.

Attached is a new Censorship filter that gets round this problem by allowing a goodword list as well as the bad.
The default badword list remains as before, and can be superseded by a custom list in the filter Settings panel as before.
Any compound word you want to pass through can be placed in the custom goodword list, without the need to edit the badword list.
However, there is no default goodword list - you have to add those as they arise.
Adding a goodword to your custom list can even be used to relax the badword list, without the need to replicate the rest
of the default list in the Settings panel, because when a goodword is identical to a bad the goodword has it!

The filter also fixes a minor bug in the original Word Censorship filter: you could stall the filter (so that it ignored part or all of your list)
by accidentally leaving consecutive commas or a trailing comma in the badword list.
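
To illustrate the comma problem, here is a minimal sketch of list parsing that tolerates stray commas. The function name parse_word_list() is hypothetical and is not taken from the attached filter, which may handle this differently.

```php
<?php
// Hypothetical helper, not taken from the attached filter: split a
// comma-separated word list and drop the empty entries that consecutive
// or trailing commas would otherwise leave behind.
function parse_word_list($list) {
    $words = array();
    foreach (explode(',', $list) as $word) {
        $word = trim($word);      // tolerate spaces after commas
        if ($word !== '') {       // skip the gaps left by ',,' or a final ','
            $words[] = $word;
        }
    }
    return $words;
}
```

With this sketch, parse_word_list('foo,,bar, baz,') yields the three entries foo, bar and baz, with no empty strings to trip up later matching.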

Also, because the new filter does not make calls to the filter_phrases() function it is not attacked by having half its 'stars' removed in the process!
Fundamentally, calls to filter_phrases() could not have allowed for the goodword overrides.

To install the filter, the zip files need unpacking into filter/censorship (not censor),
but then the file censorship.php must be dragged into lang/en_utf8 to provide the English language strings.

Finally, there are 3 string definitions to add to the file: lang/en_utf8/admin.php in the alphabetic list (probably just above googlemapkey),
these are:

$string['goodwordsconfig'] = 'Enter your list of good words that you do not want censored, separated by commas.';
$string['goodwordsdefault'] = 'No default good word list is expected.';
$string['goodwordslist'] = 'Custom good words list';

I have tested it well on two 1.9.1 sites - and learnt a lot of bad words doing so!
Your feedback would be appreciated.
That's it. Regards,

John
In reply to John White

Re: Word Censorship (with overrides)

by Ray Lawrence -
Hi John,

This looks like an interesting potential start to an enhancement to core. Have you considered posting this in the tracker so that it can be considered as an Improvement?

I have doubts about the efficacy of this approach to moderation but this has come up repeatedly in the schools and colleges we've been working with over the last few weeks.

The addition of a white list option would be most welcome in some quarters (and as has been mentioned in these forums before, especially in a renowned North Lincolnshire town).
In reply to Ray Lawrence

Re: Word Censorship (with overrides)

by John White -
Thanks Ray.

I had not thought it was up to me to put this in the tracker, but I will take up the suggestion.

However, your caveat ('doubts about the efficacy of this approach') stings a little.

I can only respond in a technical fashion.

I have considered carefully the methodology needed to do this efficiently, with as few passes, as few tests and as few calculations as possible, and then only released the version that both carries the least code and works the fastest in very long passages of text.

On a local installation, this version takes 0.00028s to parse a topic that contains just a forum link, but then takes 0.00067s to find one 'goodword' and one 'badword' in 200 words of plain text in a topic summary. In other words, its presence is not detectable by humans! For obvious reasons it need take very little more from a remote server. This also makes it faster than censor.php that takes 0.00083s and 0.00124s respectively on the same tests (except that of course it messes up the 'goodword'). Note: none of these times include the one-time setup of the badwords and goodwords arrays, or the resulting filter objects; all these tests were on an iMac OS X in Firefox 3.
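
For anyone wanting to repeat this kind of measurement, a rough sketch of a timing harness follows. It is an assumption about how such figures could be gathered, not the method used for the numbers above; censor_stub() and time_filter() are hypothetical names, and the stub merely stands in for a real filter function.

```php
<?php
// Hypothetical stand-in for a real filter function (not the attached filter).
function censor_stub($text) {
    return str_ireplace('badword', '*******', $text);
}

// Average wall-clock seconds per call over $runs calls, excluding any
// one-time setup done before this function is entered.
function time_filter($callback, $text, $runs = 100) {
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        call_user_func($callback, $text);
    }
    return (microtime(true) - $start) / $runs;
}

// Roughly 300 words of plain text containing the stub's target word.
$sample = str_repeat('plain text with one badword in it ', 40);
$seconds = time_filter('censor_stub', $sample);
printf("%.5fs per call\n", $seconds);
```

Averaging over many runs matters here, because a single call at this scale is close to the resolution of the timer.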

Within the code, I have used two lots of preg_match_all(), per loop, to find the instances of a given badword (a 'blot') and any already-censored badwords (a 'span'). The algorithm then uses an efficient 'double shuffle' process to advance both lists, censoring any blot not already in a span. Except, of course, that it calls the function goodword_check() to see if that blot is part of a compound 'goodword' before censoring it. In all, this makes for very few parsings per badword.
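
The blot-and-span idea can be sketched roughly as below. This is a simplification under stated assumptions, not the code in the attachment: it checks each blot against every span rather than using the 'double shuffle', it leaves out the goodword check, and the function name censor_outside_spans() is hypothetical. The mild word 'darn' stands in for a badword.

```php
<?php
// Simplified sketch, not the author's code: censor each occurrence (a 'blot')
// of $badword unless it already lies inside a censored <span> (a 'span'),
// for example inside the title attribute written by an earlier pass.
// Assumes $badword is a non-empty string.
function censor_outside_spans($text, $badword) {
    $len   = strlen($badword);
    $stars = str_repeat('*', $len);

    // First preg_match_all(): offsets of every blot, case-insensitively.
    preg_match_all('/' . preg_quote($badword, '/') . '/i', $text, $blots, PREG_OFFSET_CAPTURE);
    // Second preg_match_all(): offsets of every already-censored span.
    preg_match_all('/<span class="censoredtext"[^>]*>.*?<\/span>/is', $text, $spans, PREG_OFFSET_CAPTURE);

    // Work from the last blot to the first so earlier offsets stay valid.
    foreach (array_reverse($blots[0]) as $blot) {
        $pos = $blot[1];
        $inside = false;
        foreach ($spans[0] as $span) {
            $start = $span[1];
            $end   = $start + strlen($span[0]);
            if ($pos >= $start && $pos + $len <= $end) {
                $inside = true;
                break;
            }
        }
        if (!$inside) {
            $text = substr_replace($text, '<span class="censoredtext">' . $stars . '</span>', $pos, $len);
        }
    }
    return $text;
}
```

Running the sketch twice over the same text gives the same result as running it once, which is the point of tracking the spans.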

It is true that, as there is only one place in the code from which goodword_check() is called, the function could be dispensed with and absorbed into the censorship_filter() function, but where it is it aids readability. This is also true of the 'override' class, and of variables $lmask, $rmask, and $rspan that aid compactness.

A key to the whole process is that, within the filter object, goodwords are only associated with the badwords they are compounded from. This is considerably more efficient than finding all the goodwords with every badword pass.
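
A rough sketch of that association, done once at setup, might look like this. The function name group_goodwords() and the array layout are assumptions for illustration; the attached filter stores its goodwords inside its own filter objects.

```php
<?php
// Hypothetical sketch: attach each goodword, once, to the badwords it is
// compounded from, so that a later pass for one badword only has to look
// at its own short list of goodwords.
function group_goodwords($badwords, $goodwords) {
    $map = array();
    foreach ($badwords as $bad) {
        $map[$bad] = array();
        foreach ($goodwords as $good) {
            // Case-insensitive substring test.
            if (stripos($good, $bad) !== false) {
                $map[$bad][] = $good;
            }
        }
    }
    return $map;
}
```

The cost of the nested loop is paid once when the lists are loaded, not on every piece of text that is filtered.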

Just about all variables have been considered and positioned for the least number of sets and resets within the exercise of the algorithm. For example, the strlen() of a badword is only accessed if one has already been found in the target text (bearing in mind that this is a very rare event), rather than when entering the while-loop with that badword to test for.

Finally the use of strcasecmp(substr(...)...) could be questioned. Why not a preg_match() instead? The answer is that the target and its expected position relative to the badword are known absolutely, so there is no need for any fuzzy pattern-matching or positioning, or the overheads that go with that.
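
To make the reasoning concrete, here is a minimal sketch of such a fixed-position test. It is not the goodword_check() from the attachment, whose details may differ; the function name is_inside_goodword() and its parameters are hypothetical.

```php
<?php
// Hypothetical sketch of a goodword test. $pos is the offset at which
// $badword was found in $text. The offset of the badword inside a goodword
// fixes exactly where that goodword would have to start in $text, so one
// substr() and one strcasecmp() settle the question without any pattern
// matching. Assumes $badword is a non-empty string.
function is_inside_goodword($text, $pos, $badword, $goodwords) {
    foreach ($goodwords as $good) {
        $offset = stripos($good, $badword);
        while ($offset !== false) {
            $start = $pos - $offset;
            if ($start >= 0 && strcasecmp(substr($text, $start, strlen($good)), $good) == 0) {
                return true;
            }
            // A goodword may contain the badword more than once.
            $offset = stripos($good, $badword, $offset + 1);
        }
    }
    return false;
}
```

For the text 'A Peacock' with the badword found at offset 5 and 'peacock' in the goodword list, the sketch compares the seven characters starting at offset 2 and reports a match.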

Of course, I have annotated the code well, so that it would not be difficult for others to see whether there is a more efficient approach to the problem, and consequently I have given this the version number 0.1.1. More particularly, we need to know whether this approach will work in all language environments!

So I would certainly commend it to North Lincolnshire.
And hope that someone will test it in Hebrew & Arabic!

Regards,

John


In reply to John White

Re: Word Censorship (with overrides)

by Ray Lawrence -
Hi John,

I think you may have misunderstood my remark. I don't have a view on the technical implementation, but rather on the behaviour of those who will be entering the text.

If you were to think of an inappropriate word and then I asked you to disguise it using alternative characters, unusual spacing etc. I'm sure you could come up with an alternative to beat most filters of this type pretty quickly.

I think that the best long term solution is skilful facilitation, establishing an understanding of expected behaviour, creating a climate of respect etc.
In reply to Ray Lawrence

Re: Word Censorship (with overrides)

by Marcus Green -
A potential blessing for the good people of Scun thorpe
In reply to Marcus Green

Re: Word Censorship (with overrides)

by John White -
...the very same North Lincolnshire town.

John
In reply to John White

Re: Word Censorship (with overrides)

by Ray Lawrence -
... and the way in which Marcus posted the name of the town makes the point more eloquently than I did. :)
In reply to Ray Lawrence

Re: Word Censorship (with overrides)

by John White -
Ray,

My misreading of your remark.
Still, I have now put the processes on record.

...

I entirely agree with you, especially in creating a climate of respect; my schools & colleges have always taken that view, within or without Moodle, and it does work.

Some of the Moodlers looking for the censorship solution (as a starting point) have the added problem of email self-registrations to contend with.
Of course, there does need to be a clear site policy focussing on the benefits of respecting persons and the site.

Regards,

John
In reply to John White

Re: Word Censorship (with overrides)

by Ray Lawrence -
Indeed. All the same a worthy contender for consideration for core. I hope you'll create an issue in the tracker for this patch. Please post the reference here if you do.
In reply to Ray Lawrence

Re: Word Censorship (with overrides)

by John White -
Ray,

I have done as suggested, the tracker reference is...

http://tracker.moodle.org/browse/MDL-15705

John
In reply to John White

Re: Word Censorship (with overrides)

by Helen Foster -
Hi John,

Thanks for sharing your enhanced word censorship filter and for creating an issue in the tracker for it.

Have you come across our guidelines for contributed code? Perhaps you could add your word censorship filter to the modules and plugins database to enable people to find it easily.
In reply to John White

Re: Word Censorship (with overrides)

by Web Developer -

John,

I was very excited when I found this improved filter. We are really having a bad time with Emily Dickinson studies. I followed your directions but it’s not working. We are getting stars in the name. Everything looked right when I installed it. I entered a good words list and submitted it, but with no result. What isn’t right? And if I want to add/append a new set of good words, how do I do this? Any help would be greatly appreciated.

In reply to Web Developer

Re: Word Censorship (with overrides)

by Web Developer -

The issue was simple. The old filter has to be disabled and the new filter enabled:

Site Administration > Modules > Filters > Manage Filters

In reply to John White

Re: Word Censorship (with overrides)

by Steve Bilton -
The tracker states this hack affects Moodle versions 1.8.6 through to 1.9+
However I have tested on a 1.8.8 moodle and this hack does not work.

I was quite thorough, no words are being filtered whether they be good, bad or ugly.

When the custom words list in the default filter has words in it AND is DISABLED this prevents the NEW censorship filter from working and uses the words specified only in this list.

Removing the custom bad words list allows the new censorship filter to take over, however it does not filter any terms.

I have disabled the default word censorship filter.
I have removed any custom words lists
I have enabled the new censorship filter
I have performed multiple tests within the modules: forum and lesson

Steve
www.sheilds-elearning.co.uk
In reply to Steve Bilton

Re: Word Censorship (with overrides)

by Don Mace -
Does this new filter work for 1.9.3? We want to use it in place of the current filter but are having issues. Our Moodle site is third-party hosted, so I will need to know if this filter has been updated for 1.9.3 in order for them to install it, along with specific instructions for that purpose. When we followed the instructions above, the filter did not show up on the admin settings page. Any help will be appreciated, since we are a public school system and some of our teachers want to do a lesson on peacocks.
In reply to John White

Re: Word Censorship (with overrides)

by tleubek zakirov -

Hello,

Thank you for creating such a useful thing.

I have installed the files where and as instructed, but it won't work. I entered the word "cockatoo" in the good words list, but only "atoo" is there.

Could you please let me know what might be wrong?

thanks,

Tleubek Zakirov

In reply to John White

Re: Word Censorship (with overrides)

by Frankie Kam -

Hi John.

Thanks for your improved censorship filter. I'm trying it out now. In creating a good word list, I've discovered this website that can help me to construct them:
http://www.morewords.com/contains/<type the word>

So, if you were to type the URL:
http://www.morewords.com/contains/ass
the output would be a very long list of words containing the substring "ass" as in "hassle".

I hope you find this website useful.
Frankie Kam
Melaka, Malaysia
http://moodurian.blogspot.com 

In reply to Frankie Kam

Re: Word Censorship (with overrides)

by Frankie Kam -

It works! John, on my Moodle 1.9.7 site, your censorship filter filters the bad words, but doesn't touch the compound word that contains the bad word. That's how it is supposed to work.

Here's the proof:

In reply to John White

Re: Word Censorship (with overrides)

by Frankie Kam -

Good word lists containing "bad" words:

List1
cockatoo,cockatoos,cockcrow,cocked,cockerel,cockerels,cockeye,cockeyed,cockfight,cockier,cockiest,cockily,cockiness,cockle,cockleshell,cockleshells,cockney,cockpit,cockpits,cockroach,cockroaches,cocksure,cocksurely,cocksureness,cocksurenesses,cocktail,cocktailed,cocktailing,cocktails,cockup,cockups,cocky,peacock,peacocked,peacockier,peacockiest,peacocking,peacockish,peacocks,peacocky,poppycock,poppycocks,recock,recocked,shuttlecock,shuttlecocks,stopcock,stopcocks,uncock,uncocked,uncocking,uncocks,weathercock,weathercocks,woodcock,woodcocks

List2
benedick, benedicks, dickens, dickenson, dickinson

List3
baseballs, basketballs, cannonballs, curveballs, eyeballs, footballs, meatballs, mothballs, racquetballs, screwballs, sleazeballs, slimeballs, snowballs, softballs, trackballs, volleyballs 

List4
accumulate,accumulator,acumen,capsicum,circumcenter,circumcise,circumcision,circumference,circumflex,circumfluent,circumfluous,circumnavigate,circumscribe,circumspect,cucumber,cucumbers,cumber,cumbered,cumbersome,cumquats,cumulative,cumulous,document,documentable,documental,documentary,documentation,documented,documenting,documents,encumber,incumbency,incumbent,locum,locums,scum,scumbag,succumb 

List5
swank,swanked,swanker,swankest,swankier,swankiest,swankily,swankiness,swankinesses,swanking,swanks,swanky 

For what they're worth, hope this helps.
Frankie Kam
:D

In reply to Frankie Kam

Re: Word Censorship (with overrides)

by Frankie Kam -

List 6
cutwater, cutwaters, meltwater, meltwaters, outwatch, saltwater, wristwatch, wristwatches

List7
arsenal,arsenic,catharses,coarse,farseeing,hearse,hearses,hoarse,hoarsely,katharses,parse,parsed,rehearse,rehearsed,rehearser,rehearsers,rehearses,sparse,sparsely,sparseness,sparsenesses,unrehearsed

I have a question: If I put all these words, from List1 until List7, in a good word list and activate the censorship filter, will I be slowing down my Moodle site?

In reply to John White

Re: Word Censorship (with overrides)

by Frankie Kam -

John, 

to solve the problem mentioned here (about mouseovers on the **** revealing the censored words), I had to modify some of the code inside the filter.php file of the censorship filter. The modified lines are shown below, each beneath the original line, which is commented out.

(Somewhere around line 67 of filter.php)

// Mask used to determine if already censored
//$lmask = '<span class="censoredtext" title="';
$lmask = '<span class="censoredtext" ';

//$rmask = '">';
$rmask = '>';

...

(and then somewhere about line 95 of filter.php)

//$bwords[$key] = new filterobject_with_override($badword, $lmask.$badword.$rmask, $rspan, str_pad('',strlen($badword),'*'));
$bwords[$key] = new filterobject_with_override($badword, $lmask.$rmask, $rspan, str_pad('',strlen($badword),'*'));

After this, in Moodle 1.9.x, go back to "Site Administration | Modules | Filters | Manage Filters" and disable the Censorship (with override) filter (close the eye icon). This clears the cache. Then enable it again. It should now work.

Frankie