Web crawlers and bots often identify themselves in the user agent string. It turns out that, up until now, the vast majority of my bandwidth usage has come from bots scraping my site thousands of times a day.
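Since these bots announce themselves, the user agent string is an easy handle to block on. Here is a minimal sketch using mod_rewrite in .htaccess; the bot names are placeholders I made up, so substitute whatever user agents are actually chewing through your logs:

```
<IfModule mod_rewrite.c>
	RewriteEngine On
	# Return 403 Forbidden to any request whose user agent matches
	# one of these patterns. [NC] makes the match case-insensitive.
	# BadBot, GreedyCrawler, and SiteVacuum are placeholder names.
	RewriteCond %{HTTP_USER_AGENT} (BadBot|GreedyCrawler|SiteVacuum) [NC]
	RewriteRule .* - [F,L]
</IfModule>
```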
A robots.txt file can advertise that you don't want bots to crawl your site, but compliance is completely voluntary: a bot may happily ignore it and scrape your site anyway. And I'm fine with web crawlers indexing my site, so that it might be more discoverable. It's the bandwidth hogs that I want to block.
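For reference, a robots.txt that welcomes indexing while asking one offender to stay away might look like this sketch; "GreedyCrawler" is again a placeholder name:

```
# Let well-behaved crawlers index everything...
User-agent: *
Disallow:

# ...but ask this particular bandwidth hog to stay out entirely.
# (Remember: honoring this is entirely voluntary.)
User-agent: GreedyCrawler
Disallow: /
```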
A quick post today showing some different ways to block visitors by their IP address. This can be useful for a variety of reasons: stopping some stupid script kiddie from harassing your site, preventing some creepy stalker loser from lurking around your forums, or silencing the endless supply of angry trolls that never seem to get a clue. So many reasons why, and so many ways to block them.
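The simplest method is a few lines of .htaccess. Here's a sketch covering both the Apache 2.4 syntax and the older 2.2 equivalent; the IP addresses are made-up examples from the documentation ranges, so swap in the actual offenders from your logs:

```
# Apache 2.4: allow everyone except the listed addresses.
# 203.0.113.4 and 198.51.100.0/24 are example/documentation IPs.
<RequireAll>
	Require all granted
	Require not ip 203.0.113.4
	Require not ip 198.51.100.0/24
</RequireAll>

# Apache 2.2 equivalent (mod_authz_host), shown commented out:
# Order Allow,Deny
# Allow from all
# Deny from 203.0.113.4
# Deny from 198.51.100.0/24
```

Note that you can block an entire range with CIDR notation, as the second `Require not ip` line does, which is handy when a scraper rotates through neighboring addresses.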
This article, Stupid .htaccess Tricks, covers just about every .htaccess “trick” in the book, and is easily the site’s most popular resource. I hope that you find it useful, and either way, thank you for visiting :)