ContentBanning

Content banning is one of our AntiSpamRecommendations.

Use a blacklist to ban spammy content from your wiki. Good wiki software includes a blacklist file of regular expressions that edit content is matched against. If, after a user edits a page, the page contents match any of the regular expressions, the edit is blocked with a message. This method has proven more effective than IP banning at reducing spam.
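
The check described above can be sketched roughly as follows. This is a minimal illustration, not the code of any particular wiki engine: the function names, the sample patterns and the choice of case-insensitive matching are all assumptions.

```python
import re

def compile_blacklist(patterns):
    """Compile each blacklist entry; case-insensitive matching is an assumption."""
    return [re.compile(p, re.IGNORECASE) for p in patterns]

def find_banned_match(page_text, blacklist):
    """Return the first pattern that matches the page text, or None if it is clean."""
    for regex in blacklist:
        if regex.search(page_text):
            return regex.pattern
    return None

# Hypothetical entries, for illustration only.
blacklist = compile_blacklist([r"cheap-pills\.example", r"casino-[a-z]+\.example"])

def save_edit(page_text):
    """Block the edit with a message if any expression matches, otherwise accept it."""
    hit = find_banned_match(page_text, blacklist)
    if hit is not None:
        return "Edit blocked: content matches banned pattern " + hit
    return "Edit saved"
```

A real engine would store the page where this sketch returns "Edit saved"; the important part is that the whole page text is tested against every expression before saving.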

Normally the same feature can be used to ban rude words, although in some software the filter is applied only to external link URLs.
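
A link-only filter of the kind mentioned above might look something like this. The URL-finding expression is a deliberately simple assumption (real wiki engines use their own link parsing), and `banned_links` is a hypothetical name.

```python
import re

# Simple URL finder; an assumption standing in for the wiki's own link parser.
URL_RE = re.compile(r'https?://[^\s<>"\]]+', re.IGNORECASE)

def banned_links(page_text, patterns):
    """Return the external URLs in page_text that match any blacklist pattern."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    urls = URL_RE.findall(page_text)
    return [u for u in urls if any(r.search(u) for r in compiled)]
```

With this approach, plain text that merely mentions a banned domain is left alone, and only an actual external link to it is caught.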

Automatic blacklist updates

Content banning becomes even more effective if you get automatic updates. By getting updates from shared blacklists, you can block many of the latest spammers before they even visit your wiki. We recommend regularly updating your blacklist file with the list here on chongqed.org. The list is at: http://blacklist.chongqed.org/.

The best thing is to automate this process so that your wiki is always protected against the latest known spammers. Such automation may be available as a built-in feature of your wiki software; otherwise you could try using a cron job.
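
If your wiki software has no built-in updater, a small script run from cron could do the job. The following is only a sketch under several assumptions: that the list is served as plain text with one expression per line, that your wiki reads its blacklist from a single file, and that the entries compile with Python's `re` module (which is close to, but not identical with, Perl regular expressions). The function name and file paths are hypothetical.

```python
import os
import re
import tempfile
import urllib.request

BLACKLIST_URL = "http://blacklist.chongqed.org/"  # the address given above

def update_blacklist(dest_path, url=BLACKLIST_URL, timeout=30):
    """Download the shared list and replace dest_path only if the result looks sane."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        text = response.read().decode("utf-8", errors="replace")

    # Keep only non-empty lines that compile as regular expressions.
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            re.compile(line)
        except re.error:
            continue  # skip entries this regex engine cannot handle
        entries.append(line)

    if not entries:
        raise ValueError("downloaded blacklist is empty; keeping the old file")

    # Write to a temporary file first, then swap it in atomically, so the
    # wiki never reads a half-written blacklist.
    directory = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        tmp.write("\n".join(entries) + "\n")
    os.replace(tmp_path, dest_path)
    return len(entries)
```

A cron entry that runs a script calling `update_blacklist("/path/to/blacklist.txt")` once a day would then keep the file fresh. Refusing to overwrite the old file when the download is empty means a failed fetch does not leave the wiki unprotected.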

Effectiveness

Content banning is an effective way of keeping out repeat offenders. Sadly it is not a silver bullet that ends all spam problems, because new spammers appear every day (or SearchEngineOptimization companies take on new customers). Some spammers have a huge range of domain names; sometimes they lack imagination and just go for numerical domain names!

However, we do recommend that administrators use the content banning features of their wiki software and set up automatic updates. We also recommend that all wiki developers provide this as a built-in feature of their software.

Disadvantages

There are no major disadvantages to using content banning, and it is simple to implement (it is already implemented in all good wiki software), which is why it is one of our AntiSpamRecommendations.

False positives could occur if a user wanted to talk about spammers and, in doing so, innocently tried to link to a spammer's website. In this case there is no harm in preventing the edit, since we don't want to link to these sites, even within such discussions. It would, however, be a mistake to ban such a user from the wiki in these circumstances, so we don't recommend using automatic IP banning with the blacklist.

Other false positives could occur if a mistake is made in defining a regular expression. Take care not to create an expression that matches legitimate edits.
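
One way to guard against such mistakes is to try a new expression on some known-good text before adding it. The sketch below assumes you keep a few samples of legitimate content to test against; the function name, the sample sentences and both patterns are made up for illustration.

```python
import re

def false_positives(pattern, legitimate_samples):
    """Return the legitimate samples that the pattern would wrongly block."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [s for s in legitimate_samples if regex.search(s)]

samples = [
    "The specialist analysed the results.",
    "See http://www.example.org/docs for details.",
]

# A bare word also matches inside innocent words ("specialist").
too_broad = false_positives(r"cialis", samples)

# Restricting the match to the host part of a URL avoids that.
narrower = false_positives(r"https?://[^\s/]*cialis[^\s/]*\.example", samples)
```

Here `too_broad` contains the first sample, because the word "specialist" happens to contain the banned string, while `narrower` is empty.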

Standardised blacklist format

There is something of a standard emerging across several wiki engines (and blogging software) for the format of a content blacklist. It simply involves listing Perl regular expressions, one per line, within the file. A match on any of the expressions means the content is bad. It is a good idea for wiki software to follow this standard (or at least import / export it) to allow sharing of blacklists between wikis.
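
Because the format is so simple, importing, merging and exporting lists takes very little code. The following sketch uses hypothetical function names and assumes that blank lines can be ignored and that the entries are compatible with Python's `re` module, which is similar to but not the same as Perl's regular expressions.

```python
import re

def parse_blacklist(text):
    """Parse the one-expression-per-line format into a list of pattern strings."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def merge_blacklists(*lists):
    """Combine several lists, dropping duplicates while keeping first-seen order."""
    seen = set()
    merged = []
    for entries in lists:
        for entry in entries:
            if entry not in seen:
                seen.add(entry)
                merged.append(entry)
    return merged

def export_blacklist(entries):
    """Serialise back to the shared format: one expression per line."""
    return "\n".join(entries) + "\n"

def is_bad(content, entries):
    """A match on any expression means the content is bad."""
    return any(re.search(e, content) for e in entries)
```

A wiki could, for example, parse its own local list and a downloaded shared list, merge the two, and write the result back out with `export_blacklist` for other wikis to pick up.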

Comments / Discussion

Wouldn't it be useful to define a more advanced standard format for a blacklist file? I'm thinking of extra information such as when the domain name was added to a blacklist, which site originally made the blacklist entry, and when the regexp last caught a spammer. That way, even when people pass these file entries between each other, there is still a way to keep track of it, and allow old entries to time out. Maybe this already exists. I've not looked into it really. – Halz - 2005-04-05 10:31 UTC

All of this would be useful. But I guess it cannot be done with our current approach, which simply lets you grab a file filled with regular expressions. The file would become much larger and much more difficult to parse. Of course, another approach is something like a DNS-based blacklist. Although it would be ideal, this is way beyond my technical abilities and the limits of my hosting account. – Manni

This French chap seems to have a similar idea. He has laid out a proposed format (described in French) for storing tracking information about a blacklist entry. Ultimately I suppose this information would allow people to build a kind of trust network, where changes are propagated a lot like DNS, as you say. – Halz - 2005-04-25 13:21 UTC

That would increase the size of the blacklist a lot. Our list is already large. The simple format we use now is very easy to use and already implemented in several different wikis. More information would be good, but who is really going to use it? Some people may want to expire URLs that the spammers no longer spam for, but that requires us to keep updating records for URLs that are already in the database. We only see a small fraction of all spam at any one time, so doing that would be very hard. If we integrated WikiMinion's data in some way to keep the last-seen info fresh, it might be better. But eventually spammers will learn to hide from WikiMinion-protected sites, so it wouldn't be effective for all spammers for long. – Joe - 2005-04-25 19:26 UTC


Halz. We should distinguish (well, actually we have to) on this page between link banning and content banning. For example, the original OddMuse content banning code will prevent any kind of content that looks like spam regardless of whether it's a link or not. In other words, OddMuse will prevent discussions about spam, even if you pre-tag it. Personally, I like link checking much more than (all) content banning. – Manni - 2005-04-11 07:23

Yeah, the mechanism is almost exactly the same, so I don't think we need a separate page detailing link banning, but then again… calling it 'content banning' in this case is a little misleading. – Halz - 2005-04-25 13:21 UTC