/codelucas/newspaper https://github.com/codelucas/newspaper
Mon 19 Mar 2018 11:59:31 PM PDT

A Python module (Python 3 specifically; Python 2 support is obsolete) that tries to be the Requests of HTML scraping. Designed with news sites in mind. It picks out author names, publication dates, text, image URLs, and any embedded media. It also does keyword analysis (NLP), picks articles out of websites, extracts URLs, and picks out categories. i18n support.

Documentation here: https://newspaper.readthedocs.io/en/latest/
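
A minimal sketch of how the extraction described above might look, assuming the newspaper3k package and its `Article` class with `download()`, `parse()`, and `nlp()`. The helper names `fetch_article` and `article_fields` and the stub object are hypothetical, introduced only for illustration. The network call is kept in its own function, so the rest can be tried offline.

```python
from types import SimpleNamespace


def fetch_article(url, language="en"):
    """Download and parse one article (needs network access and newspaper3k)."""
    # Imported lazily so the rest of this sketch loads without the package.
    from newspaper import Article

    article = Article(url, language=language)
    article.download()  # fetch the raw HTML
    article.parse()     # extract authors, date, text, images, videos
    article.nlp()       # keywords and summary; needs NLTK's "punkt" data
    return article


def article_fields(article):
    """Collect the fields mentioned in the note into a plain dict (hypothetical helper)."""
    return {
        "title": article.title,
        "authors": list(article.authors),
        "published": article.publish_date,
        "text": article.text,
        "top_image": article.top_image,
        "videos": list(article.movies),
        "keywords": list(article.keywords),
    }


# Offline stand-in with the same attribute names as a parsed Article,
# so the helper can be exercised without touching the network.
stub = SimpleNamespace(
    title="Example",
    authors=["A. Writer"],
    publish_date=None,
    text="Body text.",
    top_image="https://example.com/a.png",
    movies=[],
    keywords=["example"],
)
fields = article_fields(stub)
```

With the package installed, `article_fields(fetch_article(url))` would return the same dict shape for a real page; for non-English sites a different `language` code can be passed.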

articles github nlp python exocortex html text modules urls scraping i18n news categories