A collection of applications that interact with websites without requiring the user to open them in a browser. It also provides well-defined APIs for talking to websites that lack one. Automates access to, and extraction of data from, websites that don't make it easy or possible. Can be used to add accessibility to sites that are unfriendly to the visually impaired. Tries to focus on quality of results. Multiple interfaces.
This code demonstrates how to scrape the Doomsday Clock to get its current value. It includes a source, a CSS selector, and a regular expression for extracting the current time.
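The select-then-regex approach described above can be sketched with only the Python standard library. This is a minimal sketch, not the linked code: the sample HTML, the class name `clock-statement`, and the helper names `ClassTextCollector` and `extract_clock_value` are all assumptions made for illustration, and the real page's markup may well differ. The class lookup is a rough stand-in for a CSS selector such as `.clock-statement`.

```python
import re
from html.parser import HTMLParser

# Hypothetical stand-in for the fetched page; the real markup may differ.
SAMPLE_HTML = """
<html><body>
  <h1 class="clock-statement">It is 90 seconds to midnight</h1>
  <p class="other">Unrelated text about 5 minutes of reading.</p>
</body></html>
"""

# Tags that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class ClassTextCollector(HTMLParser):
    """Collect the text inside elements that carry a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0    # greater than 0 while inside a matching element
        self.chunks = []  # text fragments found inside matching elements

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.wanted_class in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_clock_value(html, css_class="clock-statement"):
    """Return (number, unit) parsed from the selected text, or None."""
    parser = ClassTextCollector(css_class)
    parser.feed(html)
    # Collapse runs of whitespace so the regex sees one clean line.
    text = " ".join("".join(parser.chunks).split())
    match = re.search(
        r"(\d+(?:\.\d+)?)\s+(seconds?|minutes?)\s+to\s+midnight",
        text,
        re.IGNORECASE,
    )
    if match is None:
        return None
    return float(match.group(1)), match.group(2).lower()


if __name__ == "__main__":
    print(extract_clock_value(SAMPLE_HTML))
```

Narrowing to one element first keeps the regular expression from matching unrelated numbers elsewhere on the page, such as the "5 minutes" in the second paragraph of the sample.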
A Python module that tries to make parsing HTML as easy as Requests makes HTTP requests. Written by the same developer, in fact. Built on top of Requests, so you don't have to juggle both. Python 3.6 and later only. Full JavaScript support, CSS selectors, XPath selectors, user-agent spoofing, automatic redirects.
SelectorGadget is a bookmarklet (which works in pretty much any browser) or a Chrome extension that makes it easy to generate a CSS selector that picks out only the bits of a page you're interested in, whether for writing CSS or scraping web pages. Click on the part of the page you want, then click on a part you don't want (if there is one). Lets you pick out multiple bits and stack them into a single selector.
A Python module (Python 3, specifically; Python 2 support has been deprecated) that tries to be the Requests of HTML scraping. Designed with news sites in mind. Picks out names of authors, publication dates, text, URLs of images, and any embedded media. Keyword analysis. NLP. Picks articles out of websites. URL extraction. Picks out categories. i18n support.
Documentation here: https://newspaper.readthedocs.io/en/latest/
An excellent summary of CSS selectors!