
DB Improvements

This is a discussion about ways of improving the structure and output of the chongqed.org database.


Underlying DB structure

In the last couple of days I have been working on the script that reads spammers and keywords from the database, and on the structure of the database itself. All of these changes should improve performance a little. Here they are:

If you notice any bugs, please report them here.

I'm also thinking about splitting the index pages (spammers and keywords) into several smaller pages. I don't know how Google et al. would like that, or whether it is necessary. But if we keep adding pages, it will become necessary at some point.

Manni

First, I thought the spammer pages always used templates for most of the text. Does this still allow for additional custom text, like you did for usagc.us? – Joe

Up until now, the script generated the HTML, or most of it. Now I use HTML::Template, just like good old POPFile. The script lost several hundred lines, which is good in itself, and I can change the pages much more easily now. I can also provide additional details for any spammer; all I need is some content. These 'stories' are stored in separate files, and whenever the script delivers a chongqing page, it looks for a file with the same name as the spammer's domain. – Manni
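
To illustrate the story lookup described above, here is a minimal sketch in Python. The real script uses Perl's HTML::Template; Python's string.Template stands in for it here, and the page layout, the `stories` directory name, and the function names are hypothetical. It only shows the idea: render the template, and fill the story slot from a file named after the spammer's domain if one exists.

```python
import os
from html import escape
from string import Template

# Hypothetical page layout. The real site uses Perl's HTML::Template;
# Python's string.Template stands in for it here.
PAGE = Template("<h1>$domain</h1>\n<p>Keywords: $keywords</p>\n$story")


def load_story(domain, story_dir="stories"):
    """Return the extra 'story' text for a spammer, or "" if there is none.

    Assumes one file per spammer in `story_dir` (a hypothetical directory
    name), named exactly after the spammer's domain.
    """
    # Refuse names like "../x" so a crafted domain cannot escape story_dir.
    if not domain or os.path.basename(domain) != domain:
        return ""
    path = os.path.join(story_dir, domain)
    if not os.path.isfile(path):
        return ""
    with open(path, encoding="utf-8") as f:
        return f.read()


def render_spammer_page(domain, keywords, story_dir="stories"):
    """Fill the page template; the story slot stays empty if no file exists."""
    return PAGE.substitute(
        domain=escape(domain),
        keywords=", ".join(escape(k) for k in keywords),
        # Stories are trusted, hand-written HTML, so they are not escaped.
        story=load_story(domain, story_dir),
    )
```

Adding a story for a spammer then means dropping one file into the story directory; no code or database change is needed.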


Spammer / Keyword lists

The spammers list is already getting too big and takes too long to load. I tend to load this list and then search for occurrences of a domain name in the 'Spammer' column. For this use, I don't need all the separate keywords. Maybe the keywords should be listed on a separate page for each spammer. – Halz - 2004-12-01 14:50 UTC

I feel your pain, Halz. Could you offer some additional details? A list of keywords only, plus a list of spammers only, plus what we currently have? – Manni - 2004-12-01 18:03
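
The three lists mentioned above (spammers only, keywords only, and the current spammer/keyword combinations) correspond to fairly simple queries. A minimal sketch using Python's sqlite3 module, assuming a hypothetical table `spam(spammer, keyword)` with one row per pair; the actual chongqed.org schema is not described on this page.

```python
import sqlite3

# Hypothetical schema with one row per spammer/keyword pair; the actual
# chongqed.org tables are not described on this page.
SCHEMA = "CREATE TABLE spam (spammer TEXT NOT NULL, keyword TEXT NOT NULL)"


def spammers_only(conn):
    """Each spammer domain once, no matter how many keywords it has."""
    rows = conn.execute("SELECT DISTINCT spammer FROM spam ORDER BY spammer")
    return [r[0] for r in rows]


def keywords_only(conn):
    """Each keyword once, no matter how many spammers used it."""
    rows = conn.execute("SELECT DISTINCT keyword FROM spam ORDER BY keyword")
    return [r[0] for r in rows]


def all_pairs(conn):
    """The current combined list: every spammer/keyword combination."""
    rows = conn.execute(
        "SELECT spammer, keyword FROM spam ORDER BY spammer, keyword"
    )
    return rows.fetchall()
```

For example, after `conn = sqlite3.connect(":memory:")`, `conn.execute(SCHEMA)` and a few inserts, `spammers_only(conn)` lists each domain once. That is why a spammers-only page can be much smaller than the combined list: a spammer with many keywords contributes one row instead of many.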

Halz (and everybody else, of course)! How does this look for a start? http://chongqed.org/test

Manni - 2004-12-03 15:38

Interesting. Not quite so ugly or big. Still takes a while to load on dialup, but not too bad. – Joe - 2004-12-03 13:41 UTC

Hey! Looks great. Much faster to load. I guess the original list might still be useful for other types of searching. – Halz - 2004-12-07 08:26 UTC

I had a sudden insight into SQL, and even the original list should now load a bit faster. I still agree that it isn't very handy. Of course, I hope that search engines will still make good use of the old list. Regarding the new one: it has the side effect that you can now check for known spammers very efficiently. Just append the name of the spamvertized domain to http://spammers.chongqed.org/wikispammer/
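
As a small illustration of that lookup, this Python sketch turns a link found in wiki spam into the corresponding check URL. The base URL is taken from the comment above; the function name is hypothetical, and keeping the full host name (including any "www.") is an assumption, since how the database stores domains is not stated here.

```python
from urllib.parse import quote, urlsplit

# Base URL from the discussion; how the server normalizes domains
# (e.g. whether "www." is stripped) is an assumption left open here.
CHECK_BASE = "http://spammers.chongqed.org/wikispammer/"


def check_url(spam_link):
    """Build the chongqed.org lookup URL for a spamvertized link.

    Accepts a bare domain ("example.com") or a full URL
    ("http://www.example.com/page"); only the host name is kept.
    """
    # urlsplit only recognizes the host when a scheme is present.
    if "://" not in spam_link:
        spam_link = "http://" + spam_link
    host = urlsplit(spam_link).hostname  # lower-cased by urlsplit
    if not host:
        raise ValueError("no domain found in %r" % spam_link)
    return CHECK_BASE + quote(host)
```

For instance, a spam link to `http://www.example.com/page` yields `http://spammers.chongqed.org/wikispammer/www.example.com`.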

Should I now restructure the spammers subdomain? Should I be serving the new list as default?

Manni - 2004-12-07 10:30

Which one is the default probably doesn't matter. For visual appeal the new version is better, but it is less interesting and less useful. It's missing the c&p link, and it doesn't contain the keywords, which are good for Google. Possibly you could combine the styles a little: you could add letter separators to the older version. That would help readability, but does nothing for load time. The new one reduces data in two ways: it doesn't show the same URL more than once for different keywords, and by not listing the keywords you save a lot too. I definitely would keep the old one around for Google and for those who want to use it, but I guess it makes sense to make the new one the default. The other solution is to separate the alphabet into a couple of different pages, something like 0-F, G-S, T-Z. The split could be set automatically to produce groups of near-equal size, depending on how many entries are in the DB at the time. --Joe - 2004-12-07 09:18 UTC
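
The automatic split suggested above could work roughly like this. This is a minimal Python sketch of one possible approach, a greedy split by first character, and not something the site is known to use; the function names and the default of three pages are assumptions. Each page is closed once it holds its fair share of the names not yet placed, and a letter is never split across two pages.

```python
from collections import Counter


def initial(name):
    """Page bucket for a name: its first character, upper-cased."""
    return name[:1].upper()


def split_into_pages(names, pages=3):
    """Split names into at most `pages` contiguous ranges of first
    characters (e.g. "0-F", "G-S", "T-Z") of roughly equal size.

    Greedy: a page is closed once it holds its fair share of the names
    not yet placed. One letter is never split across two pages.
    """
    if pages < 1:
        raise ValueError("pages must be at least 1")
    counts = Counter(initial(n) for n in names)
    letters = sorted(counts)  # digits sort before letters
    groups, current, size = [], [], 0
    remaining = len(names)  # names not yet assigned to a closed page
    for i, letter in enumerate(letters):
        current.append(letter)
        size += counts[letter]
        pages_left = pages - len(groups)  # includes the page being filled
        letters_left = len(letters) - i - 1
        if pages_left > 1 and letters_left > 0 and size >= remaining / pages_left:
            groups.append(current)
            remaining -= size
            current, size = [], 0
    if current:
        groups.append(current)

    result = []
    for group in groups:
        label = group[0] if len(group) == 1 else group[0] + "-" + group[-1]
        members = sorted(n for n in names if initial(n) in group)
        result.append((label, members))
    return result
```

Because the split is recomputed from the current counts, page boundaries shift as the database grows, so links to a specific page (e.g. "G-S") would not be stable. The labels also show only the first and last initials actually present, so a range meant as "0-F" appears as "0-D" if nothing starts with E or F.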

I can still add the c&p link. But I don't know how useful it really is and how much it is used. Personally, I would use a c&p link that gives you chongqing links for all the keywords of one spammer.

Google can easily find the keywords by looking at the wikispammer/xyz pages. One more page to look at, but they are there.

Letter separators seem like a good idea for the old list. Should be no problem.
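
A letter separator only needs a heading whenever the first character of the spammer domain changes. A minimal Python sketch of that idea, assuming the old list is a sequence of (spammer, keyword) rows already sorted by spammer; the markup and the `letter-X` anchor ids are hypothetical, not the site's actual HTML.

```python
from html import escape
from itertools import groupby


def with_letter_separators(rows):
    """Render (spammer, keyword) rows as HTML with a heading per letter.

    `rows` must already be sorted by spammer, because groupby only
    groups consecutive items. The markup and ids are hypothetical.
    """
    out = []
    for letter, group in groupby(rows, key=lambda r: r[0][:1].upper()):
        # An id per letter would also allow a jump menu at the top.
        out.append('<h2 id="letter-%s">%s</h2>' % (escape(letter), escape(letter)))
        out.append("<ul>")
        for spammer, keyword in group:
            out.append("<li>%s: %s</li>" % (escape(spammer), escape(keyword)))
        out.append("</ul>")
    return "\n".join(out)
```

As Joe noted, this helps readability only; the page still contains every row, so load time stays the same.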

Different pages would be nice. I have thought about that. I just don't know how to implement them in a nice way.

Manni - 2004-12-07 11:35

The keywords are still there elsewhere, but Google likes pages that link to each other and are related to each other. Having the words in both places may be beneficial to us. Sam mentioned that Google supposedly likes sentences. That is likely why we saw that odd "Google my family Casino…" spam from the Casino guy. --Joe - 2004-12-07 09:43 UTC

It would be kind of hard (or at least awkward) to come up with sentences. But since Google likes interlinked pages, having both the new and the old list should make Google pretty happy. – Manni - 2004-12-07 12:35

These two are the same now, right? http://chongqed.org/test and http://spammers.chongqed.org/wikispammer/ …whether or not you work on this some more (ideas like adding c&p links and splitting the list alphabetically), you should definitely link to it from the 'spammers' sub-menu on the main page (so there would be three items on the menu). – Halz - 2005-03-03 13:40 UTC

Yes, these two links are identical (at least they produce identical results ;-)). There already are three items on that menu; apparently it suffers from a little usability problem, though. I was actually thinking about retiring the other two links. Search engines don't seem to like them much, and the resulting pages are way too large for a human being. I don't think they are still needed. – Manni - 2005-03-03 15:15

What are you thinking about retiring? I agree that it might be a good idea to make some changes, since Google doesn't seem too excited about our current methods, but I wanted to make sure what you have in mind. – Joe - 2005-03-03 15:02 UTC

I was thinking about retiring the "spammers + keywords" and the "keywords + spammers" pages. – Manni - 2005-03-03 16:55

I don't think those pages are hurting anything, and they may help associate the pages with those keywords in Google. Maybe just don't link to them from the main menu. The entire spammers domain is ranked very low (1). I still think the problem is having so many nearly identical pages. Google must be able to detect that automatically, or more SEOs/spammers would be trying the same technique. I know many do, but we have 4080 spammers and 11450 keywords, with a page for each.

So are you talking about getting rid of all the wikispammer or [1] keyword pages? It might be a good idea; Google seems to dislike the way it's set up now. You could set up another subdomain, keywords.chongqed.org. But some people have used those links. Could you redirect one to the other? – Joe - 2005-03-03 18:36 UTC

Yes, it's easy to keep old links working. And, of course, I'm talking about the latter: those nasty, long, blown-up index pages for keyword-spammer combinations that can be sorted either by spammer or by keyword. I was thinking about delivering the same content on spammers.chongqed.org and on spammers.chongqed.org/wikispammer – Manni - 2005-03-03 20:40

Rather than serve the same content, I would just redirect to the other page. That would cut our duplicate pages in half. Out of the thousands of pages we have, Google has indexed only a small portion, about 200. – Joe - 2005-03-03 19:48 UTC
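
A permanent (301) redirect tells search engines that the two addresses are one page, rather than two near-identical ones. A minimal sketch as a Python WSGI application, assuming the only duplicate is the spammers.chongqed.org root versus /wikispammer/; the mapping and the placeholder body are hypothetical, and the real site's server setup is not described here.

```python
# Hypothetical mapping from duplicate URLs to their canonical page,
# based on the spammers.chongqed.org vs. /wikispammer/ case above.
REDIRECTS = {
    "/": "/wikispammer/",
}


def app(environ, start_response):
    """Minimal WSGI app: answer duplicate URLs with a permanent (301)
    redirect so search engines index only the canonical page."""
    path = environ.get("PATH_INFO", "/") or "/"
    target = REDIRECTS.get(path)
    if target is not None:
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    # Placeholder for the real page; only here to make the sketch complete.
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"canonical page for " + path.encode("utf-8")]
```

For local testing, the app can be served with `wsgiref.simple_server.make_server("localhost", 8000, app)`. Old links keep working because visitors following them are sent on to the canonical page.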