GitHub repo for a NoSQL, document-oriented database written entirely in and for Python, used as a module. Zero external dependencies, no server. Full test coverage, less than 1.5 kLOC. Anything that can be represented as a document can be accessed as a dict.
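To make the "documents as dicts" idea concrete, here is a minimal sketch using only the standard library. It is not this project's API: the `TinyStore` class and its `insert` and `find` methods are hypothetical names invented for illustration, and the JSON-file storage is an assumption.

```python
import json
import os
import tempfile


class TinyStore:
    """Hypothetical file-backed document store; every document is a plain dict."""

    def __init__(self, path):
        self.path = path
        self.docs = []
        # Load previously saved documents, if the file already exists.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                self.docs = json.load(fh)

    def insert(self, doc):
        # Store a copy so later changes to the caller's dict do not leak in.
        self.docs.append(dict(doc))
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump(self.docs, fh)

    def find(self, **criteria):
        # Return every document whose fields equal all of the given criteria.
        return [d for d in self.docs
                if all(d.get(k) == v for k, v in criteria.items())]


path = os.path.join(tempfile.mkdtemp(), "db.json")
store = TinyStore(path)
store.insert({"title": "Whoosh", "kind": "library"})
store.insert({"title": "Sci-Hub API", "kind": "module"})

matches = store.find(kind="library")
reopened = TinyStore(path).find(kind="module")  # data survives a reopen
```

Rewriting the whole file on every insert keeps the sketch short; a real embedded database would likely do something more careful.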
Python module for extracting text from many kinds of documents. Can also be used as a CLI utility. Works with text-based formats like CSV, JSON, and HTML, and with binary formats like MS Word, MP3, and PDF. The list of supported formats is fairly extensive.
Whoosh is a Python library that implements full-text indexing and searching of arbitrary text. Can be used to build custom search engines. Pythonic, fully extensible. Supposedly pretty fast, too. Multithreaded to take advantage of multiple processor cores.
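For a rough idea of what full-text indexing involves, the sketch below builds a toy inverted index with the standard library. It does not use or reproduce Whoosh's API; `tokenize`, `build_index`, and `search` are hypothetical helpers, and the AND-only query semantics are an assumption made to keep it short.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase the text and split it into alphanumeric terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    # Map each term to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    # Return ids of documents containing every term in the query.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    "a": "Full text searching and indexing of arbitrary text",
    "b": "Extracting text from PDF and Word documents",
    "c": "An unofficial API module",
}
index = build_index(docs)
text_hits = search(index, "text")
both_hits = search(index, "text indexing")
```

A real search library adds ranking, stemming, phrase queries, and on-disk storage on top of this basic structure.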
Python module which implements an unofficial API for Sci-Hub.
The Intercept's archive of leaked TigerSwan documents.