We have seen a couple of attacks on the chongqed.org wiki recently. Those were obviously targeted at blogs, blindly POSTed to the wiki script that doesn't look for a 'url' CGI variable and simply ignores it. That's what makes those attacks look pretty stupid. But we've seen this before with our old friend the BackToTheFutureII spammer.
I updated the spam catching module to look for a URL parameter coming in through the CGI interface. Maybe this is something that all blog and wiki (not to mention guestbook) software should do: Take a look at the CGI parameters and gracefully fail if there is an unexpected parameter. – Manni - 2005-10-28 11:06
That sounds like a pretty good idea to me. Under normal use there should not be any extra parameters. This is kind of like the BadBehavior method of fighting spammers: look for what is different from a normal user. – Joe - 2005-10-28 10:25 UTC
Right. I just implemented this more carefully. I formed a whitelist of allowed CGI variables. If there is anything POSTed that is not on the list, Dan will interfere and do what he does best. I hope I didn't forget any variables that need to be on the whitelist. If I did, you may end up tagged as a spammer. Clear your chongqed.org cookies in that case and contact me so I can fix the bug. – Manni - 2005-10-28 14:51
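For anyone who wants to try the same idea elsewhere, here is a minimal sketch of such a whitelist check in Python. It is not the code running on chongqed.org; the parameter names in `ALLOWED_PARAMS` and both function names are made up for illustration.

```python
# Hypothetical whitelist: the real list used on chongqed.org is not given in this thread.
ALLOWED_PARAMS = {"title", "text", "summary", "username"}


def unexpected_params(posted):
    """Return the sorted names of POSTed parameters that are not on the whitelist."""
    return sorted(name for name in posted if name not in ALLOWED_PARAMS)


def looks_like_spam(posted):
    """Treat any request that carries an unknown parameter as suspicious."""
    return bool(unexpected_params(posted))
```

A blog-style POST that carries a 'url' field would be flagged, while a normal edit with only whitelisted fields passes. The weak spot is the one mentioned above: a forgotten parameter turns a legitimate user into a suspected spammer.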
I was careless and somehow failed to notice that yesterday was WikiMinion's 1-year anniversary. One year ago yesterday WikiMinion went online and has been tirelessly cleaning spam from wikis ever since, 24 hours a day, every day (although I will confess to one or two outages). Since going online WikiMinion has reverted more than 200,000 spam edits. – RichardP - 2005-10-27 15:06 UTC
Congratulations. I am sure all the spammers you have reverted are celebrating too. ;-) – Joe - 2005-10-27 16:07 UTC
I was tempted to say 'Long live WikiMinion'. But I hope that WikiMinion will someday be useless and out of work. Nice job, Richard. – Manni - 2005-10-28 11:06
Read FightSplog.com on how not to fight splogs. – Joe - 2005-10-25 19:21 UTC
And now I have my own post on the issue. – Joe - 2005-10-26 09:48 UTC
I am in agreement with you Joe, click fraud isn't the answer. I wish I had a really good fix to the spam problem to suggest. I've been closely watching the ongoing discussion between the folks developing EFF's Tor (an anonymous Internet communication system) and the developers of Wikipedia. Currently Wikipedia bans the Tor server IP addresses, preventing Tor users from editing Wikipedia, due to ongoing abuse. However, the Wikipedia folks definitely would like to allow Tor users to edit Wikipedia, since they feel people should be able to edit Wikipedia anonymously. The discussion has been very interesting. I believe the problem of "how do you permit users to edit Wikipedia anonymously without inviting extensive abuse" has close parallels to the more general problem of comment spam and hope that a solution developed for the Tor/Wikipedia problem might have a more general application. – RichardP - 2005-10-27 15:36 UTC
Wow, we just had one of the largest spam attacks on the chongqed wiki that I can remember. Richard and I both caught it about the same time so while he set WikiMinion going I locked the wiki. By the time I got it locked though I think all I was blocking was WikiMinion. Looks like it didn't finish cleaning because of that. The wiki is unlocked now and the spammer's domain has been added to the DB so he won't be attacking with that domain anymore. – Joe - 2005-10-22 20:14 UTC
I noticed that you locked the wiki. Now that you've unlocked it I had WikiMinion finish up the rest. It was our friend the CSSHiddenSpammer that we discussed here under the 051004 entry. – RichardP - 2005-10-22 20:20 UTC
I wondered if that was the same guy. It is certainly the same style. I guess he finally got the bugs worked out of his program. Before his spam was always incomplete and missing the link. – Joe - 2005-10-22 20:26 UTC
Oops, I guess I was talking about a different guy (see PrivateSmartSpamTactics, the incomplete spam idiot was using a different variation of the CSSHiddenSpam trick). It does seem to be the guy in 051004. I was going to add some more analysis here, but the wiki had an error so I decided to turn it into a blog post. – Joe - 2005-10-23 03:24 UTC
Holy cow! We didn't lose those pages that are now marked as 'deleted', did we? Guess we never had them before this moron came along? – Manni - 2005-10-23 15:35
Manni, those pages were created by the spammer. WikiMinion marked them for deletion. Pages so marked will be automatically deleted by OddMuse after the KeptPages expiry time, but chongqed has such a large value for KeptPages that they will pretty much never get deleted automatically. Someone with admin privileges should probably delete them manually. An easy way to get a list of the pages is via this link. – RichardP - 2005-10-23 14:12 UTC
Thanks, Richard. As you can see, I just deleted those pages.
I wonder which page this guy spidered to come up with all those dead wiki links. – Manni - 2005-10-23 17:01
He had to have spidered a lot of them. I searched for a bunch of them and few appeared on the same pages. But since he was using proxies for the spamming I assume he was also using them for spidering, so it would be very hard to tell. Was there anything in his User Agent you could identify him by, or was that random too? – Joe - 2005-10-23 19:06 UTC
Apparently, his hits weren't random at all. From what I can see in the logs, it seems that his user agent string is always "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)", but that's not very unique. His referrer string is much more interesting. I see lots and lots of hits that all have "wiki.chongqed.org". Note that there is no http:// in front of that. – Manni - 2005-10-24 09:42
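A referrer without a scheme is easy to test for. The sketch below is only an illustration of that check, not something taken from the chongqed.org scripts, and the function name `referrer_is_malformed` is invented here.

```python
from urllib.parse import urlsplit


def referrer_is_malformed(referrer):
    """True for a non-empty Referer value that lacks an http/https scheme or a host."""
    if not referrer:
        # An empty referrer is common for real users, so it is not flagged here.
        return False
    parts = urlsplit(referrer)
    return parts.scheme not in ("http", "https") or not parts.netloc
```

A real browser sends a full URL, so a bare "wiki.chongqed.org" stands out, at least until the spammer fixes his bot.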
Did he spider the site first or did he just find the new pages to create as he went? If he did spider first, when and how long? – Joe - 2005-10-24 07:59 UTC
There are a couple of GET requests before the avalanche of POSTs. But they are from a variety of hosts and not easy to track. It seems that those GET/spider requests weren't done within some narrow time frame. – Manni - 2005-10-26 10:18
Not sure where the mail problem lies, but sending to Manni from Earthlink I got this:
manni @chongqed.org
SMTP error from remote mailer after RCPT TO:<manni @chongqed.org>:
host mail.chongqed.org [212.88.144.20]: 554 Service unavailable;
Client host [207.69.195.67] blocked using multihop.dsbl.org;
http://dsbl.org/listing?207.69.195.67 - see http://multihop.dsbl.org or
http://www.dbdserver.de/help/rbl/207.69.195.67
I guess that means Earthlink is on a DBL. I am getting pretty tired of their now horrible POP server anyway.
– Joe - 2005-10-21 21:21 UTC
Yes, pop-tawny.atl.sa.earthlink.net [207.69.195.67] currently appears on the multihop.dsbl.org list, but not dsbl.org's primary list (i.e. it's not on list.dsbl.org). DSBL describes multihop as follows:
Note that the multihop and unconfirmed lists are very aggressive and have the potential for a high level of false positives. The decision to block mail based on multihop, therefore, is philosophical as much as practical: doing so effectively punishes ISPs who do not proactively secure their networks by refusing significant legitimate mail from their customers. – RichardP - 2005-10-22 11:41 UTC
Is this irony? How could they "proactively secure their networks by refusing significant legitimate mail from their customers"? – Manni - 2005-10-23 15:35
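For readers wondering how a mail server decides that a host is "blocked using multihop.dsbl.org": DNS-based blocklists are normally queried by reversing the octets of the client's IPv4 address and looking that name up under the list's zone. The sketch below only builds the query name and performs no lookup; the function name `dnsbl_query_name` is made up for this example.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNS name that is looked up to check an IPv4 address against a list."""
    # IPv4Address() raises ValueError for anything that is not a valid IPv4 address.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone
```

For the Earthlink host above, the name is 67.195.69.207.multihop.dsbl.org. If that name resolves, the server treats the address as listed and can reject the mail with an error like the 554 quoted above.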
Blogger has a big post on what they are doing to solve the splog problem. It really doesn't say anything, but it is good marketing. Since direct links to it show up in its Backlinks list and this forum isn't really on topic for it I am not posting a direct link. You can find it from my blog post. – Joe - 2005-10-18 18:31 UTC
Well, splog news is making the rounds of major bloggers again, including Jay Allen and Chris Pirillo. The problem got so bad that IceRocket stopped indexing Blogspot blogs for now. My blog post on the subject. I guess there is a reason Blogspot is on our LoserWebmasters page. – Joe - 2005-10-17 17:09 UTC
I was pretty surprised when I looked at my blog's referrers today and found a bunch from Netscape.com. I am pretty happy. My post about the FBI raid on the spammer made it to the Editor's Picks: What People Are Saying for their article on the raid. Maybe now is a good time to run spell check on my post. ;-) See the Netscape.com article. – Joe - 2005-10-17 15:19 UTC
And the next day Lockergnome picked up on my IceRocket vs. Google Splogs post. – Joe - 2005-10-18 12:45 UTC
I found an article about the FBI raiding a spammer under CAN-SPAM. My comments on my blog.
I also posted about Spyware being ruled illegal trespassing by a Federal court.
On that post I had a spammer comment about his post explaining how spam is good for blogs. Ann has a post on that. The spammer has already taken the blog post down, but the rest of the junk at his domain is still there. Ann has a copy if you still want to read how nutty spammers are when explaining why comment spam is a good thing.
With all these laws, how much longer till comment spam laws or splog laws? Splog laws may be a bit harder, but stopping them from being built by stealing other sites' content should be possible. Existing copyright laws should be able to handle automated RSS stealing splogs anyway.
– Joe - 2005-10-16 22:50 UTC
A comment left at Splog Fighter's blog mentions how he used to run a fairly popular academic blog at blogspot and now it is a faux-academic splog pointing to porn sites. He says, "It's like he's stolen my identity." This is just a reminder if you want to remove your blog on a free host, just delete all the content and keep the account so no spammer can move in and take advantage of any traffic you might have built up. – Joe - 2005-10-13 20:34 UTC
Fred posted an idea found on Dvorak’s blog to block most spam. Don't allow a comment if the referrer isn't from a valid page on your site.
As mentioned in the comments this isn't a great solution since we know how easy it is to fake a referrer, but it will work on most spammers till they adapt.
Some other comments mention that some internet security software strips the referrers on HTTP requests. Seems like bad design to me since some sites do basic security by checking referrers (not very reliable, but that doesn't mean it isn't used). And there are those who set their browsers not to send referrers. Some people are just paranoid.
– Joe - 2005-10-11 19:38 UTC
Yeah, it is a nice idea, but as you mention the fact that Symantec's Norton Internet Security strips the contents of the referrer header on outgoing HTTP requests is a serious complication for any anti-spam scheme which requires referrer headers. – RichardP - 2005-10-11 21:15 UTC
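To make the referrer idea concrete, here is a minimal sketch of the check in Python. It is an illustration under assumptions, not the code from Dvorak's blog; `referrer_is_local` and the host names in the examples are invented.

```python
from urllib.parse import urlsplit


def referrer_is_local(referrer, site_host):
    """True when the Referer header points at a page on site_host."""
    if not referrer:
        # Stripped or missing referrers fail the check; that is the complication noted above.
        return False
    parts = urlsplit(referrer)
    return parts.scheme in ("http", "https") and parts.hostname == site_host.lower()
```

A site using this would probably want to hold comments that fail the check for moderation rather than reject them outright, so that visitors behind referrer-stripping software are not locked out.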
The most recent chongqed entry added by Manni, joia.com, is a rather interesting case. Joia.com appears to be a legitimate web site for a band. However, in the joia.com whois the administrator has an email address at icon-stl.net, a site which has a distinctly spammy taste (site appearance, dubious search, pop-ups, obfuscated whois, and DNS from hitfarm.com). Further note that a quick search reveals that
http://www.joia.com/images/images/s/
is the directory for all of the spam. To me this sort of hints that joia.com has either (a) an unscrupulous site administrator, or (b) joia.com is a fabrication or a clone of a real (possibly now defunct) site, or (c) joia.com is legitimate but is hosted by a spammer who offered joia.com free or discount hosting services in order to disguise his spam. I suppose another possibility is that the joia.com server has been compromised by a spammer, but that doesn't explain the reference to icon-stl.net. – RichardP - 2005-10-11 09:23 UTC
It appears that the site is or at least at one time was for a legitimate band. Using some of the text on the page I searched Google and didn't come up with anything that looked like it had been stolen. And I found a radio station that lists them as performing and links to joia.com.
The date on all the spam files is Sep 29, 2005; the other files in the images directories are all much older. The most recent I saw was Aug 29, 2005, but it was only one. Most files were August 2003 with some in 2004. Maybe the band isn't keeping the site up so the maintainer decided to get some use out of it. But that doesn't really explain the odd whois stuff you found.
– Joe - 2005-10-11 16:38 UTC
I have emailed the webmaster asking for an explanation, and suggesting they remove those files. Seems there's also been referrer spam linking to these files judging by this thread - Halz - 2005-10-11 16:58 UTC
Halz, did you ever hear back from the webmaster of joia.com? I note that the directory I mentioned above is still on the joia.com server. – RichardP - 2005-10-19 08:28 UTC
No, nothing. – Halz - 2005-10-19 08:48 UTC
MT-Blacklist looks like it is in the process of shutting down. Due to either lots of badly designed software downloading the master blacklist far too often (or a large DDoS by spammers) Jay is having to close down. Like us he has never seen a centralized blacklist as the solution to spam. He says that the new version of Movable Type has enough antispam features that it shouldn't need the blacklist anymore. It is just the other software using the list that is going to get hurt. But there are ways to fight spammers without blacklists. – Joe - 2005-10-10 18:03 UTC
I was just reading that. It's kinda sad. As a stop-gap solution, it's pretty important. I mean there's a lot of people making use of it. I would've thought some of the blogging heavy hitters could spare some bandwidth for such an important thing. But at the end of the day it's a bad technological solution. It doesn't take a genius to realise that it's vulnerable to denial-of-service attacks. I reckon that's probably what brought it down (rather than over-intensive downloading by legit users of blogging tools). Where to go from there though? PeerToPeerBlacklists have some problems too. – Halz - 2005-10-12 09:26 UTC
All blacklists have problems and a peer to peer one may be worse. There wouldn't be the quality control that a privately run list would have, though at the level of spam MT-Blacklist was dealing with I am sure it wouldn't be hard to spoof. – Joe - 2005-10-12 14:35 UTC
CAPTCHAs can be accessible. Marco of Pivot-Blacklist has had a logic puzzle/question CAPTCHA for a while for the Pivot blog software. Now he has ported that function to WordPress. Marco's WP plugin is WP Spam Quiz. There is another similar plugin by Eric Meyer called WP-GateKeeper.
Since the questions can be customized by the site admin, spammers won't be able to just collect default answers and try them with automated attacks. This is better than graphical word distortion CAPTCHAs because a well designed program can break most of those frequently enough that they are not an ideal solution. Currently spammers don't seem to be doing it much, but that day will come soon. Logic puzzle questions are much harder to break because good ones will not be understandable by a computer.
For example, on Marco's blog the question is "How many letters does the word 'asshole' have?" Other examples could be "What color is an apple?", "What is 2 plus three?", "What is my first name?", "What year is it?" The choices are endless and easily understood as long as the visitor understands the language. The questions might not always be quite so clear running them through a translator, but some versions of the graphical CAPTCHA require understanding English to pick out words. I think as long as the questions are simple an automated translator shouldn't mess them up too much. Anyone want to translate those questions into other languages and see how they turn out? Of course the other problem is some of them will be expecting an answer back in the original language. I guess that could be a problem too. But they are still better than the graphical solution.
– Joe - 2005-10-10 18:03 UTC
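A question check like this takes very little code. The sketch below is not taken from WP Spam Quiz or WP-GateKeeper; the `QUESTIONS` table and the function name are hypothetical, and a site admin would put in their own questions and accepted answers.

```python
# Hypothetical question set: each question maps to the answers the admin accepts.
QUESTIONS = {
    "What is 2 plus three?": {"5", "five"},
    "What color is the sky on a clear day?": {"blue"},
}


def answer_is_correct(question, answer):
    """Compare a visitor's answer, ignoring case and surrounding spaces."""
    accepted = QUESTIONS.get(question, set())
    return answer.strip().lower() in accepted
```

Accepting several spellings of the same answer ("5" and "five") helps a little with the translation problem mentioned above, since the admin can also list answers in other languages.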
Did you know that Google is a spamdexer? We knew they supported spammers through AdSense, but to actually supply the spam pages is going too far. More on my blog. – Joe - 2005-10-08 17:07 UTC
Yup that's pretty unbelievable. Tried to leave a comment on your blog Joe. Not sure if that worked.
We need some more recent news on here, to make chongqed.org look a bit more fresh. Maybe we should duplicate some of Joe's spam news stories – Halz - 2005-10-10 16:37 UTC
Well, my blog has slowed down a bit again from its most active period last month with all the splog posts. Many of my recent posts have not been actually on the spam topic. I will post some of the good ones here over the next few days. – Joe - 2005-10-10 17:50 UTC
The CSSHiddenSpammer is spreading his net wider. I think it's just one spammer. Kayak wiki (UseMod) has been hit today (RecentChanges). They're spamming existing pages and creating non-existent linked pages. Sometimes they're not adding any spam, which is strange (and a bit annoying, since WikiMinion doesn't recognise it as spam, and so doesn't label it for deletion) – Halz - 2005-10-04 09:13 UTC
Yeah, that spammer has been making a real nuisance of himself (in fact, his spambot has corrupted the database of several UseMod wikis). The pages that appear to be the spammer not adding spam are actually the spammer replacing his spam links with a few carriage returns so that he can turn around and insert his spam again without triggering UseMod's identical page detection. In an effort to detect these edits, as well as the creation of pages, I've gone through and confirmed the several hundred open proxies being used by this spammer that WikiMinion wanted me to add to its database. After doing so, I had WikiMinion clean up Kayak wiki. – RichardP - 2005-10-05 03:20
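One way a wiki could notice this blank-then-respam pattern is to flag whitespace-only edits and to compare a new edit against several recent revisions instead of only the latest one. The sketch below illustrates that idea only; it is not how UseMod or WikiMinion work, and all three function names are invented.

```python
def normalize(text):
    """Collapse runs of whitespace so padding with blank lines changes nothing."""
    return " ".join(text.split())


def is_blank_edit(new_text):
    """True when an edit leaves nothing but whitespace, such as a few carriage returns."""
    return not new_text.strip()


def repeats_recent_revision(new_text, recent_revisions):
    """True if new_text matches any recent revision once whitespace is collapsed."""
    target = normalize(new_text)
    return any(normalize(old) == target for old in recent_revisions)
```

Legitimate reverts also repeat an earlier revision, so a match is better treated as a hint to combine with other signals (such as the open proxy list) than as proof of spam.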
Any news from the wikimedia folk? Seems that you cannot see that spam in the diff output. You'd have to guess. – Manni - 2005-10-05 19:01
Sorry. I guess the above was a false alarm. Hadn't thought that the linked wiki was that badly spammed. At least it seems that some spammers are now targeting wikimedia wikis. – Manni - 2005-10-05 19:13