Hacker News was, at least for a little while. The robots.txt file at news.ycombinator.com was recently changed to disallow all search engine crawling, as The Next Web reports. However, Paul Graham at Hacker News quickly explained:
Don’t worry, it doesn’t mean anything. The software for ranking applications runs on the same server, and it is horribly inefficient (something 4 people use every 6 months doesn’t tend to get optimized much). This weekend all of us were reading applications at the same time, and the system was getting so slow that I banned crawlers for a bit to buy us some margin. (Traffic from crawlers is much more expensive for us than traffic from human users, because it interacts badly with lazy item loading.) We only finished reading applications an hour before I had to leave for SXSW, so I forgot to set robots.txt back to the normal one, but I just did now.
There’s nothing wrong with that (though you’d hope you wouldn’t forget that kind of thing!). Rather than the blanket “User-agent: * Disallow: /” rule The Next Web spotted, Hacker News’s robots.txt now only blocks all user agents from five selected paths.
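If you want to see the practical difference between those two files, here’s a minimal sketch using Python’s standard urllib.robotparser. The disallowed paths in the second file are made up for illustration; they are not the actual five paths in Hacker News’s robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The blanket rule The Next Web spotted: every crawler is shut out of everything.
blanket = RobotFileParser()
blanket.parse([
    "User-agent: *",
    "Disallow: /",
])

# A path-scoped robots.txt in the same spirit as the current one.
# These paths are hypothetical, used here only to show the mechanics.
scoped = RobotFileParser()
scoped.parse([
    "User-agent: *",
    "Disallow: /vote",    # hypothetical path
    "Disallow: /submit",  # hypothetical path
])

for url in ["https://news.ycombinator.com/",
            "https://news.ycombinator.com/vote"]:
    print(url,
          "| blanket allows:", blanket.can_fetch("Googlebot", url),
          "| scoped allows:", scoped.can_fetch("Googlebot", url))
```

With the blanket file, a well-behaved crawler fetches nothing at all; with the path-scoped file, it can still index the front page and only skips the listed paths.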
Can you ban all search engines (on purpose and for the long term)? Sure, that’s what robots.txt is for (I’m looking at you, newspaper sites that claim Google’s stealing your bacon, er, content). Some people do it just to keep search engines out; others do it to force themselves to develop other traffic streams. But if you do it, be sure to actually work on those other traffic streams and to have a good on-site search capability.
What do you think? Would you ever block all search engines, for any reason?