A Python module that wraps around the Tantivy search engine.
Tantivy is closer to Apache Lucene than to Elasticsearch or Apache Solr in the sense that it is not an off-the-shelf search engine server, but rather a crate that can be used to build such a search engine. Tantivy is, in fact, strongly inspired by Lucene's design.
Full-text search.
- Configurable tokenizer (stemming available for 17 Latin languages), with third-party support for Chinese, Japanese, and Korean.
- Fast, with tiny startup time (<10ms), perfect for command-line tools.
- BM25 scoring (the same as Lucene).
- Natural query language and phrase query search.
- Incremental indexing of data.
- Multithreaded indexing (indexing English Wikipedia takes < 3 minutes on my desktop).
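Indexing and searching from Python looks roughly like this. A minimal sketch based on the tantivy package's documented API; the schema and document are made up:

```python
import tantivy

# Define a schema, build an index, and add a document.
schema_builder = tantivy.SchemaBuilder()
schema_builder.add_text_field("title", stored=True)
schema_builder.add_text_field("body", stored=True)
schema = schema_builder.build()

index = tantivy.Index(schema)  # in-memory; pass a path to persist to disk

writer = index.writer()
writer.add_document(tantivy.Document(
    title=["The Old Man and the Sea"],
    body=["He was an old man who fished alone in a skiff in the Gulf Stream."],
))
writer.commit()

# Reload so the searcher sees the commit, then run a query.
index.reload()
searcher = index.searcher()
query = index.parse_query("fish", ["title", "body"])
for score, address in searcher.search(query, 3).hits:
    print(score, searcher.doc(address)["title"][0])
```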
There is a CLI tool (tantivy-cli) that lets you handle all the configuration and setup from the command line.
A project that reads the output of the PyPI API endpoints, builds a local database from it, and lets you run searches (something PyPI hasn't let anyone do for years).
Stores the package list using rust-fst (old, but works on Python 3.13). Note: there is no support for the ARM architecture. Package details are fetched in real time using the JSON API, so new versions appear instantly.
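Pulling package details in real time amounts to a single hit against PyPI's JSON API. A quick sketch; the endpoint is the documented one, the package name is just an example:

```python
import json
import urllib.request

def package_details(name: str) -> dict:
    # Documented endpoint: https://pypi.org/pypi/<package>/json
    url = f"https://pypi.org/pypi/{name}/json"
    with urllib.request.urlopen(url) as response:
        return json.load(response)

info = package_details("requests")["info"]
print(info["version"], "-", info["summary"])
```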
A fast, open-source vector search and clustering engine with API bindings for multiple languages.
- Tries to be simple to use and extensible; if you're using it with C++ you only need to import one header file.
- Tries to be hardware-agnostic; supports half-precision and quarter-precision via 16-bit floats and 8-bit integers, respectively.
- Can scan very large indices without loading the entire file into memory; implicitly supports serializing indices to disk.
- Heterogeneous lookups, renaming/relabeling, and on-the-fly deletions.
- Supports semantic search, and both exact and approximate search.
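The Python bindings are about as small as the pitch suggests. A minimal sketch (pip install usearch numpy); the dimensions, keys, and vectors are made up, and dtype="f16" is the half-precision storage mentioned above:

```python
import numpy as np
from usearch.index import Index

index = Index(ndim=256, dtype="f16")  # store vectors in half precision

keys = np.array([42, 43, 44], dtype=np.uint64)
vectors = np.random.rand(3, 256).astype(np.float32)
index.add(keys, vectors)               # integer keys mapped to vectors

matches = index.search(vectors[0], 2)  # approximate top-2 neighbours
print(matches.keys, matches.distances)

index.save("index.usearch")            # serialize to disk...
restored = Index.restore("index.usearch", view=True)  # ...and memory-map it back
```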
A log file viewer for the terminal. Merge, tail, search, filter, and query log files with ease. No server. No setup. Still featureful.
Just point lnav at a directory and it will take care of the rest. File formats are automatically detected and compressed files are unpacked on the fly. Online help and previews for operations make it simpler to level up your experience. Can merge the files by time into a single view. Can tail the files, follow renames, and find new files in directories in real time. Can show you only warnings and errors, search with regular expressions, highlight matches, filter, and even do basic statistics and visualizations of what it finds.
GitHub: https://github.com/tstack/lnav
This is a database of Internet places. Mostly domains. Sometimes other things. Think of it as an Internet meta-database. This repository contains link metadata: title, description, publish date, etc.
The entire Internet is in one file! Just unzip internet.zip!
The mission is simple: to preserve and share LGBTQIA+ history. Originally a small project to organize historical resources, it has grown into a free, searchable digital archive accessible to educators, researchers, and anyone interested in queer history. The initiative is dedicated to making LGBTQIA+ history available to all and aims to expand into a comprehensive portal for LGBTQIA+ education. Partnerships with like-minded organizations are welcomed to preserve history through shared projects and initiatives.
The average consumer funds politicians and PACs about 3x more through their purchasing decisions than through their direct political contributions. Corporations earn profits off of your everyday purchases. And some of those profits are then donated to politicians and causes you might not agree with. The U.S. Supreme Court has said that corporations have a constitutional right to political speech. We want everyone to hear what they're saying!
Goods Unite Us has spent thousands of hours vetting companies' political expenditures in federal elections. Use our tools to reach out to brands to let them know how you feel about their political expenditures. Goods Unite Us can help inform your purchases by exposing who you’re supporting when you shop certain brands and companies. Make sure your retirement savings and other investments aren't undermining your vote!
We created this site to help keep our community informed. If you witness activity, you can submit a report through our form; no account is needed. Reports come from the community, are reviewed, and are displayed to reflect the general areas where activity has been observed, keeping people informed in real time and contributing to a more aware and prepared community.
Easily discover books and ebooks available at your local library! As you browse books and e-books, the Library Extension can check your library's online catalog and display the availability of that item on the same page.
Access to more than one library? No more searching across multiple library catalogs. All conveniently displayed on the sites you visit already! You'll get a quick, convenient link to reserve the title from your library! See results from any of nearly 5000 supported libraries and library systems. No signup required.
When it comes to raw search speed, FlexSearch outperforms every other search library out there and also provides flexible search capabilities like multi-field search, phonetic transformations, and partial matching. Depending on the options used, it also provides the most memory-efficient index. FlexSearch introduces a new scoring algorithm called "contextual index", based on a pre-scored lexical dictionary architecture, which actually performs queries up to 1,000,000 times faster than other libraries. FlexSearch also provides a non-blocking asynchronous processing model, as well as web workers, to perform any updates or queries on the index in parallel through dedicated balanced threads.
Can be loaded as part of a website and run in the browser, so you don't need node.js. It's possible that it could be integrated with Pelican; I don't know, haven't tried yet. The latest stable builds can be downloaded right from the CDN. Impressively tiny. I'm not so sure about creating the index it uses, though.
uBlacklist lists for the 16 companies that dominate search results. These lists were inspired by the HouseFresh article How Google is killing independent sites like ours and Detailed.com's How 16 Companies are Dominating the World's Google Search Results.
Bookmark links, take simple notes, and store images and PDFs. Automatically tags your bookmarks using AI for faster retrieval. Automatically fetches the title, description, and images for links. Automatically archives what you add. Sort your bookmarks into lists for better organization. Search through all your bookmarks using full-text search. SSO support.
Available for iOS and Android. Addons for Chrome and Firefox.
All AI/LLM functionality is local. No external services are used.
Sonic is a fast, lightweight and schema-less search backend. It ingests search texts and identifier tuples that can then be queried against in a microsecond's time.
Sonic can be used as a simple alternative to super-heavy and full-featured search backends such as Elasticsearch in some use-cases. It is capable of normalizing natural language search queries, auto-completing a search query and providing the most relevant results for a query. Sonic is an identifier index, rather than a document index; when queried, it returns IDs that can then be used to refer to the matched documents in an external database.
Strong attention was paid to performance and code cleanliness when designing Sonic. It aims to be crash-free and super-fast while putting minimal strain on server resources (their measurements show that Sonic, under load, responds to search queries in the microsecond range, eats ~30 MB of RAM, and has a low CPU footprint).
Available in Arch as extra/sonic.
Configuration docs: https://github.com/valeriansaliou/sonic/blob/master/CONFIGURATION.md
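The wire protocol is simple enough to poke at by hand. A rough sketch of a search session over raw TCP, based on the protocol documentation in the repo; the collection/bucket names, object IDs, and query IDs are placeholders, and SecretPassword is the default from the sample config:

```python
import socket

def send(sock: socket.socket, line: str) -> None:
    # Sonic speaks a line-based protocol terminated by CRLF.
    sock.sendall(line.encode() + b"\r\n")

sock = socket.create_connection(("localhost", 1491))
reader = sock.makefile("r")
print(reader.readline().strip())  # CONNECTED <sonic-server ...>

send(sock, "START search SecretPassword")
print(reader.readline().strip())  # STARTED search ...

# QUERY returns PENDING <id>, then an EVENT line carrying the matching IDs.
send(sock, 'QUERY messages user:1 "turbofan" LIMIT(10)')
print(reader.readline().strip())  # PENDING Bt2m2gYa
print(reader.readline().strip())  # EVENT QUERY Bt2m2gYa msg:23 msg:42
```

Note how the results are just identifiers (msg:23, msg:42): as described above, you then look the actual documents up in your own database.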
fzf is a general-purpose command-line fuzzy finder. It's an interactive filter program for any kind of list: files, command history, processes, hostnames, bookmarks, git commits, etc. It implements a "fuzzy" matching algorithm, so you can quickly type in patterns with omitted characters and still get the results you want.
Run something through it, like the output of a command. Start typing parts of a search pattern and it'll show you what matches. Use the arrow keys to move the highlight around. Whatever you pick is printed to stdout.
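That stdin-to-stdout design makes it trivial to drive from scripts, too. Feeding a list to fzf from Python and reading back the pick (assumes the fzf binary is on PATH and you're running in a terminal):

```python
import subprocess

items = ["alpha.log", "beta.log", "gamma.log"]
picked = subprocess.run(
    ["fzf"],                 # fzf reads the list from stdin and draws its UI on the tty
    input="\n".join(items),
    stdout=subprocess.PIPE,  # the selection comes back on stdout
    text=True,
)
if picked.returncode == 0:   # 0 means something was selected
    print("you chose:", picked.stdout.strip())
```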
webidx is a client-side search engine for static websites. It works by using a simple Perl script (webidx.pl) to generate an SQLite database containing an index of static HTML files. The SQLite database is then published alongside the static content.
The search functionality is implemented in webidx.js which uses sql.js to provide an interface to the SQLite file.
Seems like this should be pretty easy to plug into a Pelican workflow. I might want to write my own database generator in Python, though.
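Something like this is what I have in mind: a stand-in for webidx.pl that walks Pelican's default output/ directory and writes a word-to-page table into SQLite. The schema here is my own guess, not necessarily what webidx.js expects:

```python
import pathlib
import re
import sqlite3

db = sqlite3.connect("idx.db")
db.execute("CREATE TABLE IF NOT EXISTS idx (word TEXT, page TEXT)")
db.execute("CREATE INDEX IF NOT EXISTS idx_word ON idx (word)")

for page in pathlib.Path("output").rglob("*.html"):
    # Crude tag stripping, then collect the distinct words on the page.
    text = re.sub(r"<[^>]+>", " ", page.read_text(errors="ignore"))
    words = {w.lower() for w in re.findall(r"[A-Za-z]{3,}", text)}
    db.executemany(
        "INSERT INTO idx (word, page) VALUES (?, ?)",
        ((w, str(page)) for w in sorted(words)),
    )

db.commit()
db.close()
```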
Maybe there's a way to enable vector searching in SQLite?
A plugin to take your published Pelican posts and put them into a SQLite database.
Once the plugin is installed, you only need to run make html to create a SQLite database called pelican.db in the root of your Pelican site. There are partial instructions for using this to implement search on a site built with Pelican.
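I haven't confirmed what schema the plugin writes, so querying it would start with something like this; the posts table and its columns are assumptions to adjust after inspecting sqlite_master:

```python
import sqlite3

db = sqlite3.connect("pelican.db")

# First see what tables the plugin actually created.
print(db.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall())

# Then query it; the table and column names here are guesses.
for title, url in db.execute(
    "SELECT title, url FROM posts WHERE content LIKE ?", ("%search%",)
):
    print(title, url)
```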
Pagefind is a fully static search library that aims to perform well on large sites, while using as little of your users’ bandwidth as possible, and without hosting any infrastructure. Pagefind runs after Hugo, Eleventy, Jekyll, Next, Astro, SvelteKit, or any other website framework. The installation process is always the same: Pagefind only requires a folder containing the built static files of your website, so in most cases no configuration is needed to get started. After indexing, Pagefind adds a static search bundle to your built files, which exposes a JavaScript search API that can be used anywhere on your site. Pagefind also provides a prebuilt UI that can be used with no configuration.
GitHub: https://github.com/cloudcannon/pagefind
I don't see why this couldn't be added to my Pelican workflow.
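Since Pagefind only wants a folder of built static files, wiring it in would presumably be one extra build step. A sketch assuming Node/npx is installed and output/ is the Pelican build directory:

```python
import subprocess

# Build the site with Pelican, then let Pagefind index the static output.
subprocess.run(["pelican", "content", "-o", "output"], check=True)
subprocess.run(["npx", "-y", "pagefind", "--site", "output"], check=True)

# Pagefind writes its static search bundle to output/pagefind/; the prebuilt
# UI is then available to any page via /pagefind/pagefind-ui.js.
```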
Perplexica is an open-source, AI-powered search engine that goes deep into the internet to find answers. Inspired by Perplexity AI, it's an open-source option that not only searches the web but also understands your questions. It uses advanced machine learning techniques like similarity search and embeddings to refine results, and provides clear answers with sources cited. Using SearxNG to stay current and fully open source, Perplexica ensures you always get the most up-to-date information without compromising your privacy.
You can use local LLMs such as Llama 3 and Mixtral via Ollama. Normal or Copilot modes. Special modes to better answer specific types of questions. Some search tools give you outdated information because they rely on data from crawling bots, converted into embeddings and stored in an index. Unlike them, Perplexica uses SearxNG, a metasearch engine, to fetch results, then reranks them to surface the most relevant sources, ensuring you always get the latest information without the overhead of daily data updates.
Has a documented installation process that doesn't require Docker.