Discussion about ways of stopping WikiSpam on usemod wikis
At the prompting of Halz, I've made the patch WikiPatches/UseBogoSpamNotify to use WikiSpam/BogoFilter for classifying content as spam or ham. Looking at the RecentChanges page makes it pretty clear which pages need to be fixed. I have kept any auto-banning out of this patch, along with the ability for normal users to train the filter with content. Once an admin has trained bogofilter, it does a reasonable job of identifying spam without extra training, so normal users shouldn't need to do this. If an admin needs help with training, he can go back through the history of pages and mark specific revisions as spam/ham. Feel free to try it out at http://sosaith.org/cgi-bin/wiki-bogo.pl?RecentChanges with admin password "test".
Let me know what you think. --TomScanlan
- The UseMod philosophy is you patch what you need. What gets accepted into the mainstream is what has the highest value for the most number of people. If it's questionable, it gets a configuration variable. So, in other words, go for it. It's worth trying out to see if it works. It may be particularly useful for PIMs, if not communities. -- SunirShah
- The patch already has an on/off config variable. --TomScanlan
One way could be to enable an editing password, publish it on the wiki itself, then lock the site.
This should keep at least some stupid bots away. Even human spammers might not read before they spam, so we should be safe from some of them, too.
Since bots are stupid and are certainly looking for "well known" links, would it help to change the "edit" link action from "edit" to something else? Can the spam-bot technologies used for wiki-spam be explained a bit?
RadoS, I'm not sure how successful that would be. Analysis of web server logs indicates that, unlike much of the "online gambling" WikiSpam, the vast majority of the WikiSpam linking to Chinese sites is being inserted by hand. This is particularly mind-boggling when you see a wiki hit by hundreds of such edits. I know of one wiki that not only changed their site to require users to create accounts to edit the wiki, but also implemented a [captcha] in the login process. Despite these precautions, they still get hit with WikiSpam. I know of another wiki that locked the entire wiki and then distributed an editor password via a separate mailing list. They too continue to get hit with WikiSpam. I think you are definitely underestimating the persistence of the spammers. -- RichardP
- Also this goes against one of the fundamental principles of wikis. It's a beautiful thing that wikis don't require registration/passwords etc. It really makes a difference for wikis on non-technical topics. I once emailed several of my non-technical friends and asked them to contribute to a wiki. They all tried it out and were really impressed by it, but I know very few of them would have bothered if the website had asked them to choose a username and password. For non-techy people it's a real turn-off. To be forced to lock down a wiki because of spammers is kinda sad; we should look for better solutions. -- Halz 18th Jan 2005
- Well, the PW would be given in a prominent place, easy to find (the start page or on "preferences"), so they wouldn't have to make one up. However, if RichardP is right, then this won't help us that much, because it applies only to "stupid bots", not to malicious "smarties". Entering a public PW is not much of a "keep non-techies out" barrier; it's just a basic sanity check to keep dumb bots out. Free access is a good idea, but only when there is nobody abusing it. --RadoS 21Jan'05
- Ever since we locked our site and released a public editing password, we have been safe from spammers, yay! Additionally I changed all "edit" URL actions, but I can't provide stats on which counter-measure caught how much spam. Anyway, it works without overchallenging people too much (after all, they can read&write, no? ;). -- RadoS 7Jun'05
I have looked at my server logs and most changes are from 'stupid' bots. I have changed the edit link. If you are using GNU/Linux, it takes two lines, for example:-
sed -e 's/Edit/Ed1t/g' /usr/lib/cgi-bin/wiki.pl > /tmp/wiki.pl
sed -e 's/edit/ed1t/g' /tmp/wiki.pl > /usr/lib/cgi-bin/wiki.pl
Changing edit links to ed1t. --bmsleight 2005-03-15
Steal page ranking for spammy keywords
A lot of WikiSpam is SEO spamming, that is, spam intended to increase the spammer's rank in Google. I wonder what it would do to their search terms if you let the spam through but rewrote the destination link to something really offensive instead. Or just pointed it to one of their competitors. It probably wouldn't work unless everyone did it, though, and not enough people will tolerate seeing the spam at all. -- ChuckAdams
- The good folks at http://www.chongqed.org have a very similar idea, except you link to a page which names and shames the spammer. -- Halz 18th Jan 2005
- Chuck, I'm not convinced that your "dirty tricks" approach is very practical. Many spammers replace an entire page with their links rather than adding their links to the original page. I would not be willing to leave these changes alone, even with rewritten links, since doing so essentially results in the destruction of pages. Similarly, some spammers are more subtle - rather than adding links or replacing entire pages with their links, they instead replace existing links with links to their own destinations. I can foresee substantial problems with automatically changing such spammer links to offensive links. For example, imagine a page describing educational toys with links to manufacturer sites for more information. It would be a disaster if a spammer changed the destination of these links to his own, and these links were in turn automatically converted to offensive links. -- RichardP
- Yeah, we should aim to clean up the spam and keep articles looking good and reading well. However, there's nothing to stop people creating a separate place on their wiki, e.g. a page called [[WikiSpam]] or [[SpamReport]], containing links to http://www.chongqed.org . Google will find these pages even though they are not the original location of the spam. -- Halz 18th Jan 2005
Prevent search engine robots indexing old revisions
As it stands in UseMod 1.0, reverting changes made by a SEO spammer to a well-indexed wiki has little effect on the PageRank? bonus acquired by the spammer since a link in a historical revision of a page accessed via the page history is nearly as effective as a link in the current revision of the page. I recommend using robots meta tags to mark certain pages as off-limits to search engines in order to reduce the incentive for SEO spammers. I've submitted a possible implementation of this feature at WikiPatches/RobotsMetaTag. -- RichardP
- I think this is a bare minimum anti-spam tactic. This should be included in the default install ASAP. The reason I say this is... The patch is unlikely to have any noticeable effect upon any one particular wiki where it is installed. Spammers are generally too stupid to know the difference between one wiki and another. However, they may start to notice if all usemod wikis (or a significant proportion) have these meta tags. If spammers start to give up, then we all benefit, but at the moment the problem is only going to get worse.
- As usemod fans will know, this wiki software is very popular particularly because it is easy to install. The result is that there are many usemod administrators all over the internet who don't have the technical skills or the inclination to go about applying patches which have no obvious benefits. Which is why I think it is important that robots meta tags should be included in the default usemod install. -- Halz 18th Jan 2005
- I just added a third implementation at WikiPatches/RobotsNoFollow. I didn't see RichardP's shot at it. If mine isn't useful, kill it and go with his, as it is better than the first WikiPatches/RobotsNoFollow patch. --TomScanlan
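For reference, the sort of tag these patches add to the HTML head of history, diff, and old-revision views looks something like the following (the exact content values vary between the patches; see WikiPatches/RobotsMetaTag for the real implementation):

```html
<meta name="robots" content="noindex,nofollow">
```

"noindex" asks search engines not to index the page at all, and "nofollow" asks them not to follow (or credit) any links on it, which is what removes the PageRank incentive for spamming old revisions.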
As an alternative to robots metatags you could use robots.txt in the document root of the webserver if you have write permission there. Put something like the following lines in robots.txt to prevent history pages from being indexed by search engine bots.
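For example, something like this could work, assuming the script lives at /cgi-bin/wiki.pl, and noting that query strings in Disallow rules are honoured by major crawlers such as Googlebot but are not part of the original robots.txt standard:

```
User-agent: *
Disallow: /cgi-bin/wiki.pl?action=
```

This blocks every action= URL, including history and old-revision views, while leaving plain page views (wiki.pl?PageName) indexable.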
- Markus, the suggestion of using robots.txt is a promising idea. That method, unfortunately, does have several drawbacks. First, it doesn't prevent robots from revisiting and indexing pages that have been deleted. Second, not everyone who can install and run wiki.pl has sufficient rights to place a robots.txt file at the top level of their server's web hierarchy (see [here]). -- RichardP
- I don't understand your first point completely. What is the problem with pages that are marked for deletion?
- Markus, pages that have been successfully deleted can become a problem if they were indexed before they were deleted and the page URL contains attractive keywords. Such pages continue to attract undue attention even after they have been deleted, since the URL is revisited by search engines. This is because requests for deleted pages don't return a 404 error, they instead succeed and offer to let you create a new page. A classic example of this problem was the MeatBall:AdultCheck page, the original motivation behind the WikiPatches/NoIndex patch. -- RichardP
After changing robots.txt as well as the robots meta tag, I thought: very well, but how do I explain this to a spammer, somewhere he/she is actually likely to read it?
I chose the line meant to inform people editing a page, somewhere around line 82, between where it says on this wiki: "This change is a minor edit." and the save button.
So the wikis I care for now have line 82 as follows:
$EditNote = "<h1>Spammer?</h1><h2>This site doesn't help you in your rankings!!</h2><h2>You are wasting your time!</h2><ul><li>All 'old pages' are removed from Google in the robots.txt, so after cleaning, Google will not see them</li><li>Certain other steps are taken... some you will see soon, others later</li><li>Doing bad things here will give you VERY bad Karma...</li></ul>"; # HTML notice above buttons on edit page
The spammer is bound to read it there, right?
In my style sheet the h2's have a yellow background and are BIG. Are there better ideas for doing this? (Grateful for English corrections...) Dan Koehl
Prevent access from open proxies
Meatball wiki has just implemented a new feature to [detect and ban the use of 'open proxies']. For now you can get the patch from there. Anyone else tried this out on their wikis? Does it work OK? If so I think we should put this in the recommendations, and bring the patch to this wiki to make it available under WikiPatches -- Halz - 28th April 2005
- OK, this has now been put on WikiPatches/OpenProxy. I guess I will describe this solution briefly on the WikiSpam page. -- Halz - 29th April 2005
Surge protection (otherwise known as edit throttling) is an obvious way of preventing spammers from quickly destroying lots of pages. It also prevents spam bots from using up all the server resources (bandwidth & CPU). So err... how does it work in UseMod? On this WikiSpam page nobody has filled in any details, and yet a comment from CliffordAdams on Meatball says:
"I have enabled rate-limiting code. There are limits to how many different pages may be edited for every 10 minutes, 1 hour, 6 hours, and 1 day. (There are also limits to the number of edits, but those are very high.) The code limits both individual users (by hostname), and all users as a group (with higher limits). If these limits are reached when you save a page, you will be returned to the edit page with a notice."
...Does this mean it is a built-in feature of the current software? -- Halz - 28th April 2005
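UseMod itself is Perl, but the scheme Clifford describes (per-host limits on how many distinct pages may be edited within several time windows) can be sketched in Python. The window lengths and limits below are invented placeholders, not Meatball's actual values:

```python
import time
from collections import defaultdict

# Hypothetical limits: (window in seconds, max distinct pages edited per host)
LIMITS = [(600, 10), (3600, 30), (21600, 60), (86400, 100)]

class SurgeProtector:
    def __init__(self):
        # host -> list of (timestamp, page) for each successful edit
        self.edits = defaultdict(list)

    def allow_edit(self, host, page, now=None):
        """Return True if `host` may save `page` now, and record the edit."""
        now = time.time() if now is None else now
        history = self.edits[host]
        for window, limit in LIMITS:
            # Distinct pages this host has edited inside the window
            recent = {p for t, p in history if now - t < window}
            # Re-editing an already-counted page is always allowed;
            # opening a NEW page past the limit is refused
            if page not in recent and len(recent) >= limit:
                return False  # caller bounces the user back to the edit page
        history.append((now, page))
        return True
```

A full implementation would also keep a second, higher set of limits shared across all hosts as a group, as the quote mentions, and a UseMod admin would write the equivalent in Perl inside wiki.pl's save path.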
Force users to subscribe/login before editing
After a month of WikiSpam, I noticed some spammer habits:
- they create pages whose names were picked up from the site's existing pages. I delete the page, but an hour later the process begins again. So it might be a robot, I think.
- they spam existing pages which are open for editing. I restore a previous revision, and one hour later the page is spammed again.
- they could not edit/modify/spam locked pages (locked as an Administrator, in UseMod parlance).
So I suspected robots were doing the spamming, and modified the code of wiki.pl (v0.92 based) so that any user must have an ID greater than 1000 to edit. After a week of testing on my site [CmicWiki], it seems to work, though a smarter spammer could probably break it, I think.
my ($id, $deepCheck) = @_;
# Anti-WikiSpam patch: only users with a registered ID (>= 1000) may edit
return 0 if ($UserID < 1000);
# end of Anti-WikiSpam patch
--MichelMarcon aka Cmic 2005-08-07 11:35:00