A Python module (Python 3 only; Python 2 support has been dropped) that tries to be the Requests of HTML scraping. Designed with news sites in mind. For a single article it picks out author names, the publication date, the body text, image URLs, and any embedded media, and it can run keyword analysis and other NLP on the text. For a whole site it picks out article URLs and categories. i18n support.
Documentation here: https://newspaper.readthedocs.io/en/latest/
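
To make the feature list above concrete, here is a minimal sketch of scraping a single article. It assumes the package is `newspaper3k` (imported as `newspaper`), as the documentation URL suggests. The helper name `scrape_article` and its parameters are made up for illustration and are not part of the library.

```python
def scrape_article(url, language="en", with_nlp=False):
    """Download one article and return a dict of the extracted fields.

    Hypothetical helper: only the Article calls below come from the library.
    """
    # Imported lazily so this sketch can be loaded without newspaper3k installed.
    from newspaper import Article

    article = Article(url, language=language)
    article.download()  # fetch the raw HTML over the network
    article.parse()     # extract metadata and body text from the HTML

    result = {
        "title": article.title,
        "authors": article.authors,            # list of author names
        "publish_date": article.publish_date,  # datetime, or None if not found
        "text": article.text,
        "top_image": article.top_image,        # URL of the main image
        "movies": article.movies,              # URLs of embedded media
    }

    if with_nlp:
        # Keyword extraction and summarization; this step needs NLTK's
        # tokenizer data to be available locally.
        article.nlp()
        result["keywords"] = article.keywords
        result["summary"] = article.summary

    return result
```

Calling `scrape_article("https://example.com/some-story")` (a placeholder URL) would return the fields listed above. For site-level extraction the library offers `newspaper.build(site_url)`, whose result exposes `articles` and `category_urls()`; see the documentation for details.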