Webcrawlers and other bots usually identify themselves in the user agent string of their requests. Well, it turns out that, up until now, the vast majority of my bandwidth usage has come from bots scraping my site thousands of times a day.
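You can see who's responsible by tallying requests per user agent straight from the server's access log. Here's a minimal sketch, assuming a combined-format log at `access.log` (the path, format, and bot names are assumptions; adjust for your setup):

```python
from collections import Counter
import re

# Assumed path to a "combined"-format access log, where the user agent
# is the last quoted field on each line.
LOG_PATH = "access.log"

ua_pattern = re.compile(r'"([^"]*)"\s*$')  # last quoted field = user agent

counts = Counter()
with open(LOG_PATH) as log:
    for line in log:
        match = ua_pattern.search(line)
        if match:
            counts[match.group(1)] += 1

# Print the ten most frequent user agents; crawlers usually name
# themselves here (e.g. "Googlebot", "bingbot", "AhrefsBot").
for user_agent, hits in counts.most_common(10):
    print(f"{hits:8d}  {user_agent}")
```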
A robots.txt file can advertise that you don't want bots to crawl your site (a minimal example is below). But compliance is entirely voluntary: a bot can happily ignore it and scrape your site anyway. And I'm fine with webcrawlers indexing my site so that it's more discoverable; it's the bandwidth hogs that I want to block.
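For reference, robots.txt is just a plain-text file of crawler directives served from the site root. An illustrative example (the bot name here is made up, not one I actually block):

```
# Ask one particular crawler to stay out entirely...
User-agent: ExampleBot
Disallow: /

# ...while leaving the whole site open to every other crawler.
User-agent: *
Disallow:
```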