Saturday, September 17, 2011

In Defense of Robots.txt



Back in April of last year, I posted a rant on why robots.txt is horrible. The real problem then wasn't robots.txt itself; it was the fact that Archive.org retroactively blocked previous versions of websites whose new owners (often domain squatters) had added a robots.txt.

The real problem now is that bots are crawling the web, stealing content and creating clone caches that siphon off page views. That is ultimately what brought down Dan's 20th Century Abandonware: there was simply no way to block the bots while still allowing full visitor access and findability. I know my brother's website is somewhat hobbled in search results because he blocks so many of these garbage bots.
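To be fair, robots.txt does let you whitelist the crawlers you actually want and turn everyone else away. Something roughly like the following is the usual approach (the bot names here are just examples, not my brother's actual configuration):

    # Let the major search engines in (an empty Disallow allows everything)
    User-agent: Googlebot
    Disallow:

    User-agent: Bingbot
    Disallow:

    # Turn everyone else away
    User-agent: *
    Disallow: /

The catch, of course, is that this only works on bots polite enough to honor the file, and the scrapers and clone-cache bots are exactly the ones that ignore it.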

Stolen content is everywhere: not in the form of copyrighted torrents (at least, not as much as the Hollywood magnates would like you to believe), but in the form of actual content that was produced somewhere on the Internet and suddenly shows up in no fewer than half a dozen other places.
