
SubmitSpammerForm

The 'submit spammer' form

On the 'submit spammer' form, there are two separate text areas: one for 'spamvertised websites' and one for 'keywords'. But these SEO spammers often seem to have many different websites, each with just one keyword tailored to the content. As such, you presumably have to look at the old spam revision URL to figure out which keyword goes with which website, so the contents of the two text areas are then useless, right? Maybe I'm misunderstanding what happens at your end, but wouldn't it be more useful to have just one text area where we paste in the spam wiki text (in case the old revision URL is not available)? Certainly that would make it easier for the guy submitting. – Halz - 2004-11-08 14:51 UTC

If you have a complex one like that, just don't fill it all in. We always look at the old spammed revision to confirm it was spammed and to see if we can gain any more information. Then we Google the spam to see what else has been hit and how bad the spammer is. The more info we get, the easier it is for us. Submitters deal with a few spammers at a time, whereas we sometimes get several submissions a day, not counting the ones that look completely like URL spam themselves. I agree that for actual heavy-duty chongqing the form isn't well designed, but for the small spammers it helps us out if the reporter does a little of the work for us. – Joe - 2004-11-08 17:42 UTC

Yeah spam like these cigarette links is very difficult to chongq. I can't submit the spammer properly, because they're linking into different pages with different keywords… then when they are in the database (after you guys add them), I have to hunt around to find them all again in the spammer list, in order to chongq them.


The ultimate Chongqing Workflow

Idea for the manual and automated chongqing steps on a future chongqed.org:

  1. I take the wiki text of the spam and dump it in a text box
  2. chongqed.org figures out which of these websites is already in the database
  3. chongqed.org displays the wikitext of chongqing links for those which are in the database (if any)
  4. I paste this wiki text into my wiki somewhere in order to chongq them.
  5. chongqed.org asks me if I want to request that the others (if any) be added to the database
  6. chongqed.org prompts me for extra information to help you.
  7. chongqed.org emails/alerts you to ask you if the submitted spammers are OK.
  8. …several hours/days later you check them, and say yes
  9. chongqed.org adds new spammer websites and keywords into the database
  10. chongqed.org emails me, to notify me of the new additions, and provides wikitext chongqing links for them.
  11. spammers realise all their keywords are getting chongqed to buggery, and they give up :-)

…Of course it's not me that has to program it :-) – Halz - 2004-12-16 16:32 UTC
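
A minimal sketch of steps 2, 3 and 5 from the list above, assuming the domains have already been pulled out of the pasted spam. Everything here is hypothetical: the shape of `database` (a dict keyed by domain), the `page` and `keywords` fields, and the link template are illustration only and not how chongqed.org actually stores or formats anything.

```python
def split_submission(domains, database):
    """Steps 2 and 5: separate domains already in the database from new ones."""
    known = [d for d in domains if d in database]
    new = [d for d in domains if d not in database]
    return known, new


def chongq_wikitext(known, database, link_template="[{url} {keyword}]"):
    """Step 3: build one wiki link per keyword for every known domain.

    The default template is a generic external wiki link; the real
    chongqing link format is not known here, so it is a parameter.
    """
    lines = []
    for domain in known:
        entry = database[domain]
        for keyword in entry["keywords"]:
            lines.append(link_template.format(url=entry["page"], keyword=keyword))
    return "\n".join(lines)


# Hypothetical data, for illustration only.
database = {
    "spam-a.example": {
        "page": "http://db.example/spam-a",
        "keywords": ["cheap pills", "pills"],
    },
}
known, new = split_submission(["spam-a.example", "spam-b.example"], database)
wikitext = chongq_wikitext(known, database)
```

The `new` list is what would go on to steps 5 to 9 for manual approval; nothing in this sketch adds entries to the database on its own.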

That's not a bad idea. In fact, Joe and I already have something like that for adding spam to the database. Of course, I cannot simply let anyone add anything, but you weren't suggesting this anyway.

The problem with what we currently have is that spammers' junk needs to be preprocessed manually. There are wiki style links and html style links. There are spammy links including directories and page names, while we want only domains. I could easily give you something that would handle preprocessed stuff in a way similar to what you describe. And then, I could go ahead and make it smarter in little steps. – Manni - 2004-12-16 20:08

Sounds good to me. I am glad I am not the one programming it. ;-) – Joe - 2004-12-16 21:58 UTC

We just got a submission of a bunch of stuff for sites that were already listed in the DB. Could the submission form check for matches against the DB and provide the user with the chongqed links for those that already exist? – Joe - 2004-12-17 07:12 UTC

It could. Of course, that would require parsing out the URLs/domains first. – Manni - 2004-12-17 15:18

Yeah you would need a reasonably reliable way to separate out the wheat from the chaff, i.e. to get from a block of spammy wiki text into an array of domains and corresponding keywords. Presumably the subdomains need to be identified and stripped out too. If you can crack that nut, then steps 2,3, and 5 should be quite doable. – Halz - 2005-03-03 20:40 UTC
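
A rough sketch of that parsing step, assuming the spam uses either wiki-style links (`[http://... label]`) or HTML-style links (`<a href="...">label</a>`), the two forms mentioned earlier in this thread. The function name and the two regular expressions are made up for illustration; real spam is messier and would need more patterns. Subdomains are deliberately left alone here, apart from a leading `www.`.

```python
import re
from urllib.parse import urlparse

# Wiki-style external link: [http://example.com/page link text]
WIKI_LINK = re.compile(r'\[(https?://[^\s\]]+)(?:\s+([^\]]*))?\]')
# HTML-style link: <a href="http://example.com/page">link text</a>
HTML_LINK = re.compile(
    r'<a\s[^>]*href=["\']?(https?://[^"\'\s>]+)[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def extract_links(spam_text):
    """Return (host, keyword) pairs found in a block of spammy text.

    Paths and page names are dropped; only the host part is kept.
    Wiki-style links are listed first, then HTML-style links.
    """
    pairs = []
    for pattern in (WIKI_LINK, HTML_LINK):
        for url, label in pattern.findall(spam_text):
            host = urlparse(url).hostname or ""
            if host.startswith("www."):
                host = host[4:]
            pairs.append((host, label.strip()))
    return pairs


sample = (
    "[http://best.all-hotels-motels.com/index.html all-hotels-motels.com]\n"
    '<a href="http://www.example.com/x">cheap stuff</a>'
)
pairs = extract_links(sample)
```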

I have some Python code to help me manually chongq spammers. It's not nearly capable of unsupervised automated use and not optimized, but it could be used as a starting point. I improve it every once in a while when I hit spam it can't handle. At this point it will strip off page names and use them as keywords if there were none. It doesn't strip subdomains though, since occasionally it's not the entire domain that deserves chongqing. – Joe - 2005-03-03 20:52 UTC

Exactly. I still have to prepare any spam I submit to the db manually. I have some macros for my text editor to help me, but so far I haven't had that brilliant idea that would let me automate all this. Subdomains are really tricky. I always discard 'www.'. But what about others? Sometimes we have one big, spammy domain like 51.net. I can easily discard the subdomain here. But what about 'cheapxanax.tblog.com'? On the other hand: as long as the step of entering the stuff into the db is not automated, a script that is simply checking for a known spammer could always try domain.tld and subdomain.domain.tld. How good is your Perl, Halz? – Manni - 2005-03-03 21:52
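
The "try both" lookup could look something like the sketch below. It is in Python rather than Perl only because Python was already mentioned in this thread. The function names are hypothetical, and taking the last two labels as "domain.tld" is a naive assumption that gives the wrong answer for suffixes such as .co.uk.

```python
def lookup_candidates(host):
    """Return the names to try against the database, most specific first."""
    host = host.lower().rstrip(".")
    if host.startswith("www."):
        host = host[4:]          # 'www.' is always discarded
    labels = host.split(".")
    candidates = [host]          # subdomain.domain.tld (or domain.tld itself)
    if len(labels) > 2:
        # Naive guess at domain.tld: just the last two labels.
        candidates.append(".".join(labels[-2:]))
    return candidates


def is_known_spammer(host, known_domains):
    """True if the full host or its parent domain is already listed."""
    return any(name in known_domains for name in lookup_candidates(host))
```

With this, a database entry for 51.net would match any of its subdomains, while an entry for cheapxanax.tblog.com would match only that host and leave the rest of tblog.com alone.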


Do we really need a submit form?

Do we still need it? I set this up before we had the wiki. Maybe directing users to the wiki to submit spam would be easier? – Manni - 2005-03-03 15:23

The submit form allows us more control over what the user inputs. On the wiki, likely all we would get is a paste of the spam. The submit form saves us a little work even when it is only partially filled out. I think both are useful. Not everyone has used a wiki before or has any clue what one is. You could put a link there telling people it's an option that is useful for reporting a bunch of spammers at once. – Joe - 2005-03-03 15:04 UTC

Right. No reason to completely get rid of the form. It's just that people like Halz do a lot of work preparing their submissions that is essentially lost as soon as they click the 'submit' button. The nice thing about using the wiki is that Dan would catch already known spam. – Manni - 2005-03-03 16:58

But Dan would also annoy people who don't pay attention to the "put it inside pre tags" rule, and those who do follow it wouldn't get any checking against the database. Maybe you could have Dan ignore that page and also have a noindex, nofollow on the page so spammers would be less likely to use it themselves. And doing that, you wouldn't have to put spam in pre tags. – Joe - 2005-03-03 18:07 UTC

I have to think about this. One way to submit spam using the wiki would be simply linking a spammed revision (bad SEOing, though). Meanwhile I took the liberty of giving out decent submission ids. – Manni - 2005-03-03 20:55

If we do a nofollow, noindex on the page it's not useful for SEO anyway. But if we enter them and quickly remove them from the submission page it's not going to be well SEOed anyway. Currently we have used that page as kind of a discussion page too, but the current content could be moved. What do the decent submission ids mean? The number of submissions since the beginning of the system? – Joe - 2005-03-03 20:06 UTC

Yes, if we do a nofollow,noindex on that page. Of course, it's possible and it should be done, I think. Halz! Busiest of submitters! What's your take on this? – Yes, I made it start with 87 because I found 86 (real) submissions in my archive. – Manni - 2005-03-03 21:37

Yes, I think in some ways it's a step backwards, but for the current traffic through the spammer submission system (me), you might as well go for this approach. Makes it easier. How about inventing a new wiki markup tag for turning a page into a NOINDEXed page? This is better than hardcoding a particular special page as NOINDEXed, but I don't know how easy it is… or maybe a tag and a close tag which we can wrap around a block of text or a whole page, and within this the links are all rel=nofollow. Just thinking about the order the script must do things in. Maybe this is easier to implement. – Halz - 2005-03-03 22:22 UTC

The problem with a tag is it could be removed either on purpose or by accident. By using robots.txt it should be easy to do a noindex, nofollow for a specific page. – Joe - 2005-03-03 23:10 UTC

I guess noindex,nofollow meta tags would be quite simple to implement for any given page. Getting Dan out of the picture is going to be a little harder. He can be such a finicky bastard. – Manni - 2005-03-04 00:19
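
For illustration, a per-page robots meta tag could be chosen roughly as below. The set of page names and the function are hypothetical and not taken from the actual wiki script. Note that robots.txt itself can only allow or disallow crawling of paths; the noindex and nofollow values belong in the robots meta tag in each page's head.

```python
# Hypothetical list of pages that should stay out of search engines.
NOINDEX_PAGES = {"SubmitSpammerForm"}


def robots_meta(page_name, noindex_pages=NOINDEX_PAGES):
    """Return the robots meta tag to emit in the <head> of a wiki page."""
    if page_name in noindex_pages:
        return '<meta name="robots" content="noindex,nofollow">'
    return '<meta name="robots" content="index,follow">'
```

Keeping the list on the server side, rather than in a markup tag on the page, means an editor cannot remove the setting by accident or on purpose.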


How to submit a spammer whose attempt failed?

The submit spammer form requires one to enter the "URL of a spammy revision of one wiki page". If omitted, the form cannot be submitted.

What if the spammer attempted to deface my wiki, but failed?

I use a slightly modified MediaWiki installation that can detect/log/block a variety of wiki spam attempts, including chongqed blacklist matches, spammy keyword matches, and other rules. When a spam attempt is blocked, the attempt is logged, but the spammer is shown a fake confirmation page, complete with preview, so they believe that their spam attempt has succeeded. No new revision of the spammed page is created, so I have no example URL to submit, only a log message that looks something like this:

Subject: OHRRPGCE-Wiki Main Page
From: webmaster-wiki@HamsterRepublic.com
Date: Tue, 16 Aug 2005 13:22:23 -0700
 
spam attempt blocked from "24.202.53.171"
Reason: direct links are forbidden on the main page
 
Hi all!
BEST info about hotels, motels, inns, travels and more:
[http://best.all-hotels-motels.com/index.html all-hotels-motels.com]
Best regarts, Mike Tison

-James Paige
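
A generic sketch of the kind of check described above: blacklist matches first, then keyword matches, returning a reason that can be written to the log. This is not James's actual MediaWiki modification (which is not shown on this page); the function name, rule order and reason strings are all assumptions.

```python
import re

# Captures the host part of every http/https URL in the submitted text.
URL_HOST = re.compile(r'https?://([^/\s\]]+)', re.IGNORECASE)


def check_edit(text, blacklist, spam_keywords):
    """Return a reason string if the edit should be blocked, else None."""
    for host in URL_HOST.findall(text):
        host = host.lower()
        for domain in blacklist:
            # Match the listed domain itself and any of its subdomains.
            if host == domain or host.endswith("." + domain):
                return "blacklisted domain: " + domain
    lowered = text.lower()
    for keyword in spam_keywords:
        if keyword in lowered:
            return "spammy keyword: " + keyword
    return None


reason = check_edit(
    "[http://best.all-hotels-motels.com/index.html all-hotels-motels.com]",
    blacklist={"all-hotels-motels.com"},
    spam_keywords=[],
)
```

When a reason comes back, the caller would log it along with the submitted text and skip saving the revision, which is why there is no spammed revision URL to report afterwards.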

James, I don't know exactly what Joe's or Manni's policy is with regard to the "URL of a spammy revision of one wiki page" and how strongly they require it. However, in case they do require such a URL, I can back up your example with a URL as well as other supporting evidence. This URL is a revision of a page spammed by the "Mike Tison" spammer. Ann has a detailed page on the "Mike Tison" spammer. Similarly, the domain all-hotels-motels.com appears in WikiMinion's database.

RichardP - 2005-08-16 22:15 UTC

Usually, even if it is from a source we know we can trust, we like to have "proof" to look at. A log report is usually not going to be good enough on its own. But luckily (or actually unluckily for the web), spammers usually spread their garbage all over the place, so if we can't actually see your particular spam it usually won't be long before Google stumbles across it somewhere.

The required fields on the form are mostly there to prevent vandals and idiots from sending in junk requests. Just fill it in with a link to your site and note in the comment that the spam was blocked, so there is no revision.

I would like to know more about your modified MediaWiki spam blocking. Is that using available modules/plugins/extensions or something you did on your own?

Joe - 2005-08-19 00:57 UTC