This is a database of Internet places. Mostly domains. Sometimes other things. Think of it as an Internet meta database. This repository contains link metadata: title, description, publish date, etc.
The entire Internet is in one file! Just unzip internet.zip!
MeiliSearch is a powerful, fast, open-source search engine that is easy to use and deploy. Search and indexing are fully customizable, and it handles features like typo tolerance, filters, and synonyms. For more details about those features, see the MeiliSearch documentation. Has its own web search interface as well as an API. Searches its indices as you type. Smart enough to figure out typos and synonyms. Customizable. Create an index, then upload documents to it.
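The "create an index, then upload documents" workflow can be sketched against the HTTP API using only the Python standard library. This is a minimal sketch, not the official client: the base URL assumes a local instance on the default port 7700, the API key and the `links` index with its sample documents are hypothetical, and the `Authorization: Bearer` header assumes a reasonably recent server version (older releases used a different header).

```python
import json
import urllib.request

BASE_URL = "http://localhost:7700"  # assumption: local instance, default port
API_KEY = "masterKey"               # hypothetical key, replace with your own


def _request(method, path, payload):
    """Build (but do not send) a JSON request for the search server."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        method=method,
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth, as used by recent versions.
            "Authorization": f"Bearer {API_KEY}",
        },
    )


def create_index_request(uid, primary_key="id"):
    """Request that creates an index named `uid`."""
    return _request("POST", "/indexes", {"uid": uid, "primaryKey": primary_key})


def add_documents_request(uid, documents):
    """Request that uploads a list of documents to the index `uid`."""
    return _request("POST", f"/indexes/{uid}/documents", documents)


def send(request):
    """Send a prepared request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical documents, shaped like the link metadata described earlier.
SAMPLE_DOCS = [
    {"id": 1, "title": "Example Domain", "description": "A placeholder site."},
    {"id": 2, "title": "Another Place", "description": "Somewhere else online."},
]
```

With a server running, `send(create_index_request("links"))` followed by `send(add_documents_request("links", SAMPLE_DOCS))` would perform the two steps. Building the requests separately from sending them keeps the sketch inspectable without a live instance.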
Easier to set up than Elasticsearch. More lightweight, too.
A fast, multi-threaded application that takes apart files, indexes them, and shoves them into Elasticsearch. Tries to be portable. Relies upon Elasticsearch, unfortunately. Indices can be transported elsewhere (say you've indexed offline storage media) and loaded into the engine.
A community-built and maintained database of science fiction, fantasy, and horror that includes bibliographic data, community reviews, ISBNs of as many editions as people can find (of use to amateur librarians such as myself), and links to anthologies.
A Python module for extracting text from different documents. Can also be used as a CLI utility. Can work with text-based formats like CSV, JSON, and HTML. Can work with binary formats like MS Word, MP3, and PDF. The list is fairly extensive.
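To give a feel for what "extracting text" means for the text-based formats, here is a rough stdlib-only sketch that picks an extractor by file extension. It is not the module's own API: `extract_text` and its helpers are hypothetical names, and binary formats such as MS Word, MP3, and PDF are out of scope because they need dedicated parsers.

```python
import csv
import io
import json
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects non-empty text nodes and ignores the tags themselves.

    Note: script and style contents are collected too; a real tool would skip them.
    """

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def _json_strings(value):
    """Yield every string value found in a decoded JSON structure."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from _json_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from _json_strings(item)


def extract_text(content, extension):
    """Return the plain text of `content`, choosing a strategy by extension."""
    extension = extension.lower().lstrip(".")
    if extension == "csv":
        rows = csv.reader(io.StringIO(content))
        return "\n".join(" ".join(row) for row in rows)
    if extension == "json":
        return "\n".join(_json_strings(json.loads(content)))
    if extension in ("html", "htm"):
        collector = _TextCollector()
        collector.feed(content)
        collector.close()
        return "\n".join(collector.chunks)
    raise ValueError(f"unsupported format: {extension}")
```

For example, `extract_text(open("page.html").read(), ".html")` would return the visible text of a hypothetical `page.html`, one text node per line.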