TarPit

The wiki tarpit idea

MattisManzel informed us about the wiki tarpit idea by Jim Weirich.

The tarpit is a second wiki behind the real thing where only spammers will end up. The real wiki will thus not be spammed and since search engines won't get to see the tarpit either, all the spammers do is waste time. With this system, however, they waste their own time and not ours.

It's obviously important that the spammer does not realize he is not actually spamming the wiki, because spammers will try to get around most spam prevention methods, as we have seen on this wiki. With the spam tarpit, the only indication that the wiki is a tarpit is the large amount of spam on it, which is not unusual on many sites. In addition, the tarpit wiki is frequently refreshed from the main wiki to make the illusion more effective.
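To make the two-wiki arrangement concrete, here is a minimal in-memory sketch. It is not Jim's implementation: the class and method names are hypothetical, and the refresh policy (real pages simply overwrite tarpit pages of the same name) is an assumption, since the details of the actual refresh are not published here.

```python
class TarpitWiki:
    """Hypothetical sketch: one real page store and one tarpit page store."""

    def __init__(self):
        self.real = {}    # pages seen by normal visitors and search engines
        self.tarpit = {}  # pages seen only by suspected spammers

    def save(self, page, text, suspected_spammer):
        # Suspected spam lands in the tarpit copy; the real wiki is untouched.
        store = self.tarpit if suspected_spammer else self.real
        store[page] = text

    def load(self, page, suspected_spammer):
        # A suspected spammer reads the tarpit, so his own edits appear to stick.
        store = self.tarpit if suspected_spammer else self.real
        return store.get(page, "")

    def refresh_tarpit(self):
        # Assumption: real pages overwrite tarpit pages with the same name,
        # so the tarpit keeps looking like a live wiki.
        self.tarpit.update(self.real)
```

A real wiki would do this at the storage layer of its own software (separate page directories or databases), and would probably want a gentler refresh so that fresh spam does not vanish in front of the spammer.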

The big question, of course, is: How do you recognize the spammers?

The solution for this particular tarpit is to look for users who don't come with a remote hostname (only an IP address) and who haven't set a username in the Preferences. It sounds very simple and also seems to work very well. Jim is very happy with the results. So far there have been only a few false positives. By watching for false positives, the wiki admins can add anyone accidentally caught to the whitelist without them even knowing they were in the tarpit.
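The heuristic described above could look roughly like this in a CGI-style setting. This is a sketch under assumptions, not the actual code: the function names are made up, the whitelist is assumed to be keyed by IP address, and the hostname is assumed to arrive in the standard CGI variables REMOTE_HOST and REMOTE_ADDR.

```python
import ipaddress


def looks_like_ip(host):
    """Return True if the 'hostname' is really just a numeric IP address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def is_suspected_spammer(environ, username, whitelist=frozenset()):
    """Flag visitors with no resolved hostname and no username set.

    environ   -- CGI-style environment mapping (REMOTE_ADDR, REMOTE_HOST)
    username  -- name from the wiki Preferences, or None/"" if unset
    whitelist -- hypothetical set of IP addresses cleared by the admins
    """
    addr = environ.get("REMOTE_ADDR", "")
    if addr in whitelist:
        return False
    # Without a reverse lookup result, fall back to the bare address.
    host = environ.get("REMOTE_HOST", "") or addr
    no_hostname = host == "" or looks_like_ip(host)
    no_username = not (username or "").strip()
    return no_hostname and no_username
```

For example, `is_suspected_spammer({"REMOTE_ADDR": "1.2.3.4"}, "")` flags the visitor, while the same address with a username set, or on the whitelist, passes through to the real wiki.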

The next question you are probably asking is: Where can I download it?

As far as I know, you can't. It takes some manual setup and file modification to get a tarpit set up. Each wiki software would require its own version of the tarpit modifications. It may be best left this way. It's not a feature every wiki should have, because it does require monitoring of the tarpit for false positives. And if every implementation were the same, the spammers would soon figure out how to get around them.

What confused me a bit was this description: "Spammers almost never use an IP address that has reverse lookup enabled. This effectively means that it appears (to the wiki software) that your host name looks like a numeric IP address."

Under what circumstances will a CGI script not see a value for $ENV{REMOTE_HOST}? Is this really some reverse-IP-lookup magic? Or is the hostname sent with the IP packets?

I will ask; I suspect reverse lookup. I have emailed the author a couple of times. I wanted to make sure he was OK with publicizing his idea here (where spammers are likely to read about it) before I put up detailed info on it. Since this site is likely read by spammers, I said I wouldn't directly mention the name of his wiki where it is implemented. I really like this idea, but it's not much good if the spammer knows about it. I will put up some more info in a bit. – Joe - 2004-12-20 16:43 UTC

OK. From the Apache docs, it seems that the server must have DNS lookups enabled to get the REMOTE_HOST environment variable set. My guess is that there is a rather low time-out value and that those DNS lookups simply time out for Chinese spammers (we know how long it takes to connect to their sites). If that is the case, we don't have to worry about the spammers knowing or not. But, as I said, this is just a guess. – Manni - 2004-12-21 14:01

Hi. Jim Weirich here. I will confess that my understanding of DNS issues is fairly weak, but this is how I understand it. In order to perform a reverse DNS lookup, a system performs a normal DNS lookup on a specially formed name. E.g. to look up 1.2.3.4, you actually do a normal lookup on 4.3.2.1.in-addr.arpa. In order for these in-addr.arpa addresses to work, the hosting service must provide special PTR records to DNS. Without these records, reverse DNS lookups will not work. More information can be found here. And evidently the spammers (as a rule) don't wish to enable reverse DNS. – Jim Weirich - 2004-12-22 02:07
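The in-addr.arpa naming described above can be checked with a few lines of Python. This is only an illustration using the standard library; the function names are made up for this page, and the second function needs a working resolver, so what it returns depends on whether a PTR record exists for the address.

```python
import ipaddress
import socket


def reverse_lookup_name(ip):
    """Build the special name that a reverse DNS lookup actually queries."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_hostname(ip):
    """Return the hostname from reverse DNS, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        # No PTR record, resolver failure, or time-out: the caller is left
        # with only the numeric address, as described above.
        return None


print(reverse_lookup_name("1.2.3.4"))  # 4.3.2.1.in-addr.arpa
```

When `reverse_hostname` returns None, the wiki software has nothing but the IP address to show, which is exactly the situation the tarpit looks for.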

Another idea: Make the tarpit take a long time to return data – not so long that a spam attempt will time out or give up, but maybe 2 or 3 extra seconds per request. (3 to 5-second response times are pretty common for some wikis, depending on the server hardware and software.) This will cause both manual spammers and simplistic scripts to waste more time waiting in the tarpit, and leave them less time to spam real sites. Some servers use this tactic for e-mail spam. – dsl-142-066.sea.blarg.net
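A rough sketch of the delay suggestion, assuming the wiki already knows whether the request comes from a suspected spammer. The function name and the 2 to 3 second default range are illustrative only; the delay is randomized so that it looks like an ordinarily slow server rather than a fixed penalty.

```python
import random
import time


def tarpit_delay(suspected_spammer, low=2.0, high=3.0, sleep=time.sleep):
    """Stall tarpit requests by a few seconds; normal requests are not delayed.

    Returns the number of seconds waited. The sleep function is a parameter
    so the delay can be swapped out (for example in tests).
    """
    if not suspected_spammer:
        return 0.0
    delay = random.uniform(low, high)
    sleep(delay)
    return delay
```

Calling `tarpit_delay(True)` just before sending the response would hold a tarpit request for a couple of extra seconds, while `tarpit_delay(False)` returns immediately.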

That was discussed when this was originally being implemented. Jim didn't want to give the spammers any clue that the site was a tarpit. Unlike email spammers, most wiki spammers are not automated, so if they are paying attention they may notice. Most spammers from the major spamming countries (like China) probably already have pretty slow connections. They may not even notice a wiki that is slow to respond. But unless it's insanely slow, it's not much use. With email, the spammers are sending out thousands of spams at one time. Since wiki spammers are usually only attacking a few pages per wiki, it's not going to bother them. – Joe - 2004-12-22 00:01 UTC

The TarPit alone doesn't really "hurt" the spammers. Combining "TarPit" with "chonging" does though. Here you have a "front" that sucks in the spammers, and their search words get re-directed. Behind you have the real Wiki. As long as you do not link to any portion of the "real" Wiki, you're cool. If the address of even one page in the real Wiki is compromised you're fucked. That's the weakness.

Mean R. Hey - seems to me, but what do I know?

r.

Well, that's not strictly true. If lookup isn't enabled or not even implemented, a single page without links to more of the 'real wiki' won't lead to the rest of the 'realWiki'. r. This whole thing leads me to / reminds me of, Games.

Maybe we need a Private Games topic. 'Cause that's what this is. Virtual War.

R.

Yes, I've had several glasses of wine 2nite.