
Links Experiment

The experiments detailed on this page attempt to discover how search engines react to different kinds of links. When discussing spammers we had been using text-only URLs (inside pre tags), assuming they would not be seen as links by search engines. Halz wondered whether that was really true, which led to testing it empirically. The experiment was then expanded to include many other types of links, especially ones with variations of the new nofollow attribute.

Jump to the results table below if you aren't interested in all the discussion that led up to the (at the moment) final results.


You know these <pre> tags we use all over the place… Well, is it not possible that Google's bots index links within a pre tag, even though they are not rendered as clickable links? – Halz - 2005-04-12 10:42 UTC

I don't think so. I can't give any guarantees or provide any kind of evidence; I just don't see why they should do that. It may give them another URL to spider (I doubt that), but it surely won't increase the page rank. – Manni - 2005-04-12 13:12

Thing is, once it appears in a browser, it is obvious whether an <A HREF> tag is in a pre tag or not, but for a bot to figure this out involves a bit of tricky parsing (bearing in mind that it can't assume the pre tag is closed properly). Maybe Google has simple bots which only look at <A> tags and do most of the donkey work, then the occasional super advanced bot which comes along every now and then to check whether the simple bots have been misled, and does really clever checks for things like hidden layers and CSS font size/colour tricks. Could be that pre tags actually count as a cheating tactic. …Of course this is all wild speculation. Who knows how Google really works? Would be a bit annoying if it was actually following those links though, hey? – Halz - 2005-04-12 12:39 UTC

We can of course test this empirically:

http://chongqed.org/halzspeculation.html
Let's see in two or three weeks what the keyword 'halzspeculation' gives us in different search engines. – Manni - 2005-04-12 16:15

I sure hope we are right about Googlebot not seeing them as links. I really doubt it would. Google would have built a good bot, so I don't think they would just be looking for http://; they would be looking for href=. – Joe - 2005-04-12 17:43 UTC

OK, here's the next experiment. I simply called it Joespeculation. What will happen with URLs that are neither inside a pre tag nor actually linked in an a-href thingy? http://chongqed.org/Joespeculation.html – Manni - 2005-04-14 08:00

…and how about a 'thirdspeculation' to test what happens to a URL which is in an A HREF, but also in a PRE tag? – Halz - 2005-04-14 07:29 UTC

Err… You mean something like this:

<a href="http://chongqed.org/thirdspeculation.html">thirdspeculation</a>
Manni - 2005-04-14 09:39
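
To illustrate the three cases set up so far (a bare URL inside a pre tag, a bare URL in ordinary text, and an a-href inside a pre tag), here is a minimal Python sketch that tells them apart. It is not how Googlebot or any other crawler works, which nobody on this page knows; the names LinkSorter and sort_links and the URL regex are made up for the example, and an unclosed pre tag simply stays "open" to the end of the page.

```python
import re
from html.parser import HTMLParser

# Hypothetical pattern for a URL that appears only as text.
URL_RE = re.compile(r"https?://\S+")


class LinkSorter(HTMLParser):
    """Separate real <a href> links from URLs that only appear as text."""

    def __init__(self):
        super().__init__()
        self.href_links = []  # (url, inside_pre) for each <a href=...>
        self.text_urls = []   # (url, inside_pre) for each bare URL in text
        self.pre_depth = 0    # > 0 while inside an open <pre>

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.pre_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.href_links.append((href, self.pre_depth > 0))

    def handle_endtag(self, tag):
        if tag == "pre" and self.pre_depth > 0:
            self.pre_depth -= 1

    def handle_data(self, data):
        for url in URL_RE.findall(data):
            self.text_urls.append((url, self.pre_depth > 0))


def sort_links(html):
    """Return (href_links, text_urls) found in an HTML string."""
    parser = LinkSorter()
    parser.feed(html)
    parser.close()
    return parser.href_links, parser.text_urls
```

A crawler that only collects href_links would never see halzspeculation or Joespeculation, while one that also scans text for http:// would pick up both.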

I added each to my blog to make sure Google saw them. Once I realized thirdspeculation was being treated as an actual link I removed it. It stayed there overnight before I realized it, so it's possible I tainted the results. – Joe - 2005-04-16 01:56 UTC

The first 2 speculations are still not in Google. However, the third one is, so I did mess it up. But we can just consider that one the control in our experiment and try again. It took a little while to make it into Google's index since that link was removed from my page on April 15, 16:29:38 CST. According to Google cache, at least two of my pages were crawled early that morning (01:35:08 and 03:37:59 GMT) and the thirdspeculation page was crawled Apr 15, 06:32:14 GMT.

Next, should we experiment with rel="nofollow"? I am very interested to know whether that just denies PageRank (as the Google press release implies) or whether those links are actually not followed (as the name implies). I would also like to know for sure which other search engines support this attribute.

Joe - 2005-04-16 23:33 UTC

Well, the nofollow experiment confirmed what I suspected. Google does follow a link even with the rel="nofollow" attribute (contradicting the meaning of No Follow). (Something weird was going on; it is now not followed/indexed somehow.) There is no way to confirm that the link is not considered in PageRank calculations, but I am sure it's not. It would be a huge mistake for Google to promote it as a fix for comment spam otherwise.

Other than in Google, chongqed.org is not indexed well anyway, so that could easily throw off the results. But Google is the most important in link spam because it is based on PageRank. I suspect others have similar link-based technology, but they don't seem to rely on it as heavily.

Joe - 2005-04-21 08:46 UTC

I'm curious Joe, what is the purpose behind the bombspeculation link experiment? – RichardP - 2005-04-27 15:36 UTC

Those are to see if I can associate the word chongqbomb with the bombspeculation pages in searches even though that word does not appear on the page (the usual idea behind a Google bomb). – Joe - 2005-04-27 17:04 UTC

Looks like Manni snuck some new ones in on me: relthenhref and hrefthenrel. The first one has the rel=nofollow before the href; the second seems to be messed up, as it has no rel but the name implies it does. Anyway, they are both in Google now. This confirms the nofollowspeculation discrepancy I have been going nuts over. As with nofollowspeculation, Google again indexed the rel=nofollow one when the rel was placed before the href. Google corrected that one (on and off), which was pretty confusing. I have been trying to test this again with nofollowspeculation3, but so far it has not been indexed. One similarity I can think of is that the original and Manni's new version were the only ones on the chongqed front page. Since they are internal site links, are they indexed by a different crawler? At least they must be more likely to be crawled since it's the same site. – Joe - 2005-05-04 09:04 UTC

Sorry. I messed up the link for the hrefthenrel speculation. I corrected this one and that new speculation is now called 'hrefandthenrel'. – Manni - 2005-05-04 12:18

I am beginning to wonder if Google is ignoring rel=nofollow on local links. That would explain why nofollowspeculation, hrefandthenrel, and relthenhref were indexed. Those three were linked from chongqed.org's front page. For the other nofollow tests the only place linking to them was my blog. To figure this out I just put up four more links on a different site's front page. Two nofollows to chongqed and two local nofollows. I will see how they turn out. – Joe - 2005-05-06 20:45 UTC
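
The two variables in play here (rel before or after href, and local versus offsite link) can be checked mechanically on any test page. The sketch below is only an illustration under stated assumptions: the names NofollowAudit and audit are invented, "local" is taken to mean "same host as the linking page", and the example.org/example.net addresses in the usage note are placeholders rather than the real test pages. It says nothing about what Google actually does with the result.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class NofollowAudit(HTMLParser):
    """List each link on a page with its nofollow and local/offsite status."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # A dict lookup ignores attribute order, so rel-then-href and
        # href-then-rel are treated identically.
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if not href:
            return
        target = urljoin(self.page_url, href)
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        self.links.append({
            "url": target,
            "nofollow": "nofollow" in rel_tokens,
            # Assumption: "local" means the same host as the linking page.
            "local": urlparse(target).netloc == urlparse(self.page_url).netloc,
        })


def audit(page_url, html):
    """Return a list of dicts describing every <a href> in html."""
    parser = NofollowAudit(page_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, audit("http://example.org/", page_source) would flag a link written as rel="nofollow" href="/a.html" as both nofollow and local, which is the combination suspected above of being indexed anyway.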

Google finally reindexed the main page with the links yesterday, but none are followed so far. – Joe - 2005-05-13 08:20 UTC

Google has still not indexed any of the links. This is either good and confusing or they just haven't crawled the site enough yet. I don't know which. – Joe - 2005-05-27 21:25 UTC

Still have not been indexed by Google. Yahoo indexed the rel then href one. – Joe - 2005-06-11 17:04 UTC

It's been a long time since I last checked on the results of the experiments. Google results are mostly the same, but a few test pages are no longer indexed. I am guessing that is because there are no longer links to those test pages. I also discovered that Sid the Umax spammer's revenge may have affected test results on some of the early pages. He stole the text of several of our pages, which included links to the tests. Later he realized his stupidity and changed the domain to one of his own spammy sites. Luckily Google is not a fan of his tactics and does not index any of the sites he has done this on. Sadly Yahoo apparently doesn't care that it is full of garbage.

Google has still not indexed the offsite tests I was running, so I am giving up (for now). I did get some Yahoo results though. They did not index either of the nofollow links to chongqed, but did index the local href then rel nofollow link. I am still confused about what they are doing; before, they had the other one indexed.

MSN Search finally showed enough results to draw some conclusions. They actually do what you would expect with the nofollow tag; they don't index those links. Good job Microsoft!

Joe - 2005-07-26 05:33 UTC

Google April/May 2005
Test page | Link type | Result
halzspeculation | pre text only URL | not followed
joespeculation | text only URL | not followed
thirdspeculation2 | text only link <a href=… | not followed
nofollowspeculation | rel="nofollow" before href | ^ followed
nofollowspeculation2 | rel="nofollow" after href | not followed
nofollowspeculation3 | rel="nofollow" before href | not followed
nofollowspeculation4 | rel="nofollow" after href | not followed
nofollowspeculation5 | rel=nofollow after href | not followed
relthenhref | rel="nofollow" before href | ^ indexed
hrefandthenrel | rel="nofollow" after href | ^ indexed
redirectspeculation | blogger redirect | indexed
redirectspeculation2 | blogger redirect | indexed
yredirectspeculation | yahoo redirect | indexed
gredirectspeculation | google redirect | not followed
notextspeculation | no text between <a></a> tags | not followed
thirdspeculation | normal link | indexed
bombspeculation | normal bomb | worked
bombspeculation2 | nofollow bomb | not followed
controlspeculation | normal link | indexed


Yahoo April/May 2005
Test page | Link type | Result
halzspeculation | pre text only URL | not followed
joespeculation | text only URL | not followed
thirdspeculation2 | text only link <a href=… | not followed
nofollowspeculation | rel="nofollow" before href | indexed
nofollowspeculation2 | rel="nofollow" after href | indexed
nofollowspeculation3 | rel="nofollow" before href | indexed
nofollowspeculation4 | rel="nofollow" after href | indexed
nofollowspeculation5 | rel=nofollow after href | indexed
relthenhref | rel="nofollow" before href | indexed
hrefandthenrel | rel="nofollow" after href | indexed
redirectspeculation | blogger redirect | not followed
redirectspeculation2 | blogger redirect | not followed
yredirectspeculation | yahoo redirect | not followed
gredirectspeculation | google redirect | not followed
notextspeculation | no text between <a></a> tags | indexed
thirdspeculation | normal link | indexed
bombspeculation | normal bomb | $ worked
bombspeculation2 | nofollow bomb | indexed
controlspeculation | normal link | indexed


MSN April-July 2005
Test page | Link type | Result
thirdspeculation | normal link | indexed
controlspeculation | normal link | indexed
(Short table, since MSN only indexed normal links.)

indexed - page can be found in search results
followed - page received a hit but was not indexed
not followed - page received no hit
worked - the googlebomb was successful

* results too early to tell
^ nofollowspeculation was followed and indexed, later it was removed from the index. So far the others have been indexed and not corrected.
$ this bomb worked initially on Yahoo, it is still indexed, but the bomb is no longer associated with the page.


I had trouble identifying more than three major search engines to test: Google, Yahoo, and MSN (which were all in on the agreement to implement nofollow). All the others get results from one of them and maybe mix in results from dmoz, Teoma, Overture, etc. For example, HotBot and AOL had the same results as Google.

Between April 12 (the beginning) and May 4, 2005 based on server logs:

MSN has only crawled Manni's messed up hrefthenrel page, but it did that 4 times. It did not visit the relthenhref that was there at the same time so that is a good sign.

Looksmart has crawled 9 of the pages starting on April 30 and has not indexed them yet. It follows nofollow links.

Ask Jeeves/Teoma crawled the original nofollowspeculation page on April 23 and has not indexed it. So it follows nofollow links but may not index them; we don't have enough info yet.

GigaBlast has not crawled any of the speculation pages.

Yahoo has followed 10 different pages with 62 hits. Google has followed 9 in 54 hits. (Both counts include hrefthenrel, which is not listed in the tables.)
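
Counts like these can be pulled from an access log with a short script. The following is a rough sketch, not the script used for the numbers above: it assumes the Apache "combined" log format, and both the user-agent substrings in CRAWLERS and the page-name markers in TEST_MARKERS are assumptions to adjust for the logs actually at hand.

```python
import re
from collections import defaultdict

# Assumed: Apache "combined" log format.
LOG_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Assumed user-agent substrings for each engine's crawler.
CRAWLERS = {"Google": "googlebot", "Yahoo": "slurp", "MSN": "msnbot"}

# Assumed substrings identifying the test pages by name.
TEST_MARKERS = ("speculation", "relthenhref", "hrefthenrel", "hrefandthenrel")


def crawler_hits(lines):
    """Return {engine: (distinct_test_pages, total_hits)} for known crawlers."""
    hits = defaultdict(int)
    pages = defaultdict(set)
    for line in lines:
        match = LOG_RE.match(line)
        if not match:
            continue
        path = match.group("path")
        if not any(marker in path.lower() for marker in TEST_MARKERS):
            continue
        agent = match.group("agent").lower()
        for engine, token in CRAWLERS.items():
            if token in agent:
                hits[engine] += 1
                pages[engine].add(path)
    return {engine: (len(pages[engine]), hits[engine]) for engine in hits}
```

Called as crawler_hits(open("access.log")), with access.log standing in for whatever the server's log file is named, it gives the "N pages in M hits" pair per engine. A hit in the log only shows that the page was fetched ("followed" in the legend above), not that it was indexed.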


These links are for continuing the experiment with other engines:

(The original thirdspeculation is now used as a control and is supposed to be a link.)

http://chongqed.org/halzspeculation.html

http://chongqed.org/Joespeculation.html

http://chongqed.org/thirdspeculation.html

<a href="http://chongqed.org/thirdspeculation2.html">thirdspeculation2</a>

http://www.blogger.com/r?http://chongqed.org/redirectspeculation.html

http://rds.yahoo.com/*http://chongqed.org/speculations/yredirectspeculation.html

http://www.google.com/url?&q=http://chongqed.org/speculations/gredirectspeculation.html
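
The three redirect links above wrap the real target in different ways: Blogger puts it directly after the "?", Yahoo after a "*", and Google in a q parameter. As a sketch only, the hypothetical helper below pulls the target out of each form. It is based solely on the three example URLs listed here and is not a description of how those redirect services are specified or of how a crawler treats them.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def redirect_target(url):
    """Return the URL wrapped inside a known redirect link, or None."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == "www.blogger.com" and parts.path == "/r":
        # Blogger form: the whole query string is the target.
        return unquote(parts.query) or None
    if host == "rds.yahoo.com" and "*" in url:
        # Yahoo form: the target follows the first asterisk.
        return unquote(url.split("*", 1)[1]) or None
    if host == "www.google.com" and parts.path == "/url":
        # Google form: the target is the value of the q parameter.
        values = parse_qs(parts.query).get("q")
        return values[0] if values else None
    return None
```

An engine that wanted to credit the final page rather than the redirector would need some step like this, or would have to actually request the redirect and follow it.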