A collection of awesome web crawling, scraping, and spidering projects in different languages.
A Python module that provides a framework for writing web spiders. It aims to make it easy to write bespoke crawlers that solve specific problems, such as scanning an unusual blog for content, using rules written for the use case. Plugin architecture, cross-platform.