Unobtanium is a web crawler with a search frontend, or put more simply: it's a search engine. The developers' instance over at unobtanium.rocks aims to be a search engine focused on technology and personal websites. Unobtanium makes heavy use of SQLite.
Git: https://codeberg.org/unobtanium/unobtanium
The docs specifically talk about how to plug Unobtanium into SearxNG: https://doc.unobtanium.rocks/manual/searxng/
A curated list of search engines useful during penetration testing, vulnerability assessments, red/blue team operations, bug bounties, and more.
Openverse is a tool that allows openly licensed and public domain works to be discovered and used by everyone.
Openverse searches across more than 800 million images and audio tracks from open APIs and the Common Crawl dataset. We aggregate works from multiple public repositories, and facilitate reuse through features like one-click attribution.
Currently Openverse only searches images and audio tracks, with search for video provided through External Sources. We plan to add additional media types such as open texts and 3D models, with the ultimate goal of providing access to the estimated 2.5 billion CC licensed and public domain works on the web. All of our code is open source and can be accessed at the Openverse GitHub repository. We welcome community contribution. You can see what we’re currently working on.
Openverse is the successor to CC Search, which was launched by Creative Commons in 2019; it became Openverse after migrating to WordPress in 2021. You can read more about this transition in the official announcements from Creative Commons and WordPress. We remain committed to our goal of tackling discoverability and accessibility of open access media.
Openverse does not verify licensing information for individual works, or whether the generated attribution is accurate or complete. Please independently verify the licensing status and attribution information before reusing the content.
Old'aVista is a search engine focused on personal websites that used to be hosted on services like Geocities, Angelfire, AOL, Xoom and so on. It is in no way meant to compete with any of the famous search engines, as it's focused on finding historic personal websites. The data was acquired by scraping pages from the Internet Archive: I basically used a node application I built with some starting links, saved all the links I found in a queue, and saved the text from the pages in the index. Old'aVista's design is based on the 1999 version of the defunct AltaVista search engine, and the name of the website itself is a wordplay on AltaVista. My original idea was to get old search engines and make them functional again, but I decided to make this website its own thing while maintaining the nostalgia factor.
IndexNow is an easy way for website owners to instantly inform search engines about the latest content changes on their website. In its simplest form, IndexNow is a simple ping that lets search engines know that a URL and its content have been added, updated, or deleted, allowing search engines to quickly reflect this change in their search results.
Without IndexNow, it can take days to weeks for search engines to discover that content has changed, as search engines don't crawl every URL often. With IndexNow, search engines know immediately which URLs have changed, helping them prioritize crawling those URLs and thereby limiting the organic crawling needed to discover new content.
IndexNow is offered under the terms of the Attribution-ShareAlike Creative Commons License and has support from Microsoft Bing, Naver, Seznam.cz, Yandex, and Yep.
IndexNow-enabled search engines immediately share all submitted URLs with all other IndexNow-enabled search engines, so you only need to notify one endpoint.
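A minimal sketch of what a submission looks like, based on the public spec at indexnow.org: the host, key, and URLs below are placeholders, and the key file has to be hosted on your own site so the receiving engine can verify ownership.

```python
import json
import urllib.request

# Placeholder host, key, and URLs; the key file must exist at keyLocation.
payload = {
    "host": "example.com",
    "key": "your-indexnow-key",
    "keyLocation": "https://example.com/your-indexnow-key.txt",
    "urlList": [
        "https://example.com/posts/new-article",
        "https://example.com/posts/updated-article",
    ],
}

req = urllib.request.Request(
    "https://api.indexnow.org/indexnow",   # shared endpoint; any participating engine also works
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json; charset=utf-8"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)                     # 200/202 means the submission was accepted
```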
Alexandria.org is a non-profit, ad-free search engine. Our goal is to provide the best available information without compromise. The index is built on data from Common Crawl and the engine is written in C++. The source code is available. We are still at an early stage of development and running the search engine on a shoestring budget.
Github:
In theory you can set up your own instance. In practice, I don't know how practical that would be.
Sonic is a fast, lightweight and schema-less search backend. It ingests search texts and identifier tuples that can then be queried against in a microsecond's time.
Sonic can be used as a simple alternative to super-heavy and full-featured search backends such as Elasticsearch in some use-cases. It is capable of normalizing natural language search queries, auto-completing a search query and providing the most relevant results for a query. Sonic is an identifier index, rather than a document index; when queried, it returns IDs that can then be used to refer to the matched documents in an external database.
Strong attention to performance and code cleanliness was given when designing Sonic. It aims at being crash-free and super-fast, and puts minimum strain on server resources (our measurements have shown that Sonic, when under load, responds to search queries in the μs range, eats ~30 MB RAM and has a low CPU footprint).
Available in Arch as extra/sonic.
Configuration docs: https://github.com/valeriansaliou/sonic/blob/master/CONFIGURATION.md
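Sonic speaks a plain-text channel protocol over TCP, so you can poke at it without a client library. A rough sketch of the ingest/query flow, assuming a local instance on 127.0.0.1:1491 with a placeholder password (check your sonic.cfg); error handling and the EVENT lines that carry query results are glossed over.

```python
import socket

def sonic_session(mode, commands, host="127.0.0.1", port=1491, password="SecretPassword"):
    with socket.create_connection((host, port)) as sock:
        f = sock.makefile("rw", encoding="utf-8", newline="")
        f.readline()                                  # CONNECTED <sonic-server ...>
        f.write(f"START {mode} {password}\r\n"); f.flush()
        f.readline()                                  # STARTED ...
        replies = []
        for cmd in commands:
            f.write(cmd + "\r\n"); f.flush()
            replies.append(f.readline().strip())      # OK / PENDING <marker> / ...
        f.write("QUIT\r\n"); f.flush()
        return replies

# Ingest one document into collection "pages", bucket "default", object id "post:1" ...
print(sonic_session("ingest", ['PUSH pages default post:1 "sqlite powered web crawler"']))

# ... then query it back. The QUERY reply is "PENDING <marker>"; the matching ids
# arrive on a following "EVENT QUERY <marker> post:1" line not read in this sketch.
print(sonic_session("search", ['QUERY pages default "crawler" LIMIT(5)']))
```

Note how the query returns object ids, not documents: Sonic is an identifier index, so you look the ids up in your own database afterwards.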
Clew is a web search engine trying to be different from the rest.
Git repo: https://codeberg.org/Clew/Clew
Hi, I'm Sean, A.K.A. Action Retro on YouTube. I work on a lot of 80's and 90's Macs (and other vintage machines), and I really like to try and get them online. However, the modern internet is not kind to old machines, which generally cannot handle the complicated javascript, CSS, and encryption that modern sites have. However, they can browse basic websites just fine. So I decided to see how much of the internet I could turn into basic websites, so that old machines can browse the modern internet once again!
The search functionality of FrogFind is basically a custom wrapper for DuckDuckGo search, converting the results to extremely basic HTML that old browsers can read. When clicking through to pages from search results, those pages are processed through a PHP port of Mozilla's Readability, which is what powers Firefox's reader mode. I then further strip down the results to be as basic HTML as possible.
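To illustrate the idea (not FrogFind's actual PHP code), here is a small Python sketch of the same pipeline: run a page through Readability and then strip the result down to a tag whitelist. It assumes the readability-lxml and beautifulsoup4 packages and a made-up example URL.

```python
import urllib.request

from bs4 import BeautifulSoup
from readability import Document

# Tags an old browser can cope with; everything else gets unwrapped to plain text.
ALLOWED_TAGS = {"html", "body", "p", "a", "h1", "h2", "h3", "ul", "ol", "li", "b", "i", "br"}

def simplify(url):
    html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
    article = Document(html).summary()          # main article content as HTML
    soup = BeautifulSoup(article, "html.parser")
    for tag in soup.find_all(True):
        if tag.name not in ALLOWED_TAGS:
            tag.unwrap()                        # keep the text, drop the markup
        else:
            tag.attrs = {k: v for k, v in tag.attrs.items() if k == "href"}
    return str(soup)

print(simplify("https://example.com/some-article")[:500])
```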
I designed FrogFind with classic Macs in mind, so I've been testing on my SE/30 to make sure it looks good in 1 bit color with a 512x384 resolution. Most of my testing has been on Netscape 1.1N and 2.0.2, as well as a few 68k Mac versions of iCab. FrogFind should also work great on any text-based web browser!
An OpenOrb instance is configured with a list of feeds to search - what was once called a blogroll - and indexes this list periodically. In this way, OpenOrb provides a window into the content a specific person or community cares about, with the benefit of making this content searchable and therefore more accessible. OpenOrb is designed to be the opposite of Google and other black box, monolithic search engines - it's open source, configurable, personal, and predictable.
OpenOrb uses an extremely simple search engine, mostly adapted from Alex Molas's superb 'search engine in 80 lines of Python', which uses BM25 and doesn't handle wildcards, boolean operators, or stemming/lemmatization (yet). I might add some of these features in the future, but I wrote OpenOrb in a single day so I'm keeping it simple for now!
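For reference, a toy BM25 scorer in the spirit of that 80-line search engine (not OpenOrb's actual code); k1 and b are the usual default parameters.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    tokenized = [doc.lower().split() for doc in docs]
    avgdl = sum(len(d) for d in tokenized) / len(tokenized)
    n = len(tokenized)
    scores = [0.0] * n
    for term in query.lower().split():
        df = sum(1 for d in tokenized if term in d)        # document frequency
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1)    # smoothed IDF
        for i, d in enumerate(tokenized):
            tf = Counter(d)[term]
            denom = tf + k1 * (1 - b + b * len(d) / avgdl)
            scores[i] += idf * tf * (k1 + 1) / denom
    return scores

docs = ["an open search engine for rss feeds",
        "a personal blogroll turned searchable index"]
print(bm25_scores("searchable rss", docs))
```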
The success of a search on OpenOrb suffers from the same limitations as RSS in general - people and software have different, and sometimes weird, opinions on how to present structured data. If a feed doesn't include full post content (which it should!), then OpenOrb won't be able to index it. If a feed has a cut-off limit for how many posts are in it, OpenOrb will only know about the ones that it had time to save. Embrace the limitations of the messy open web, and tell your friends to stop deleting posts from their feeds.
Somebody's directory of search engines.
Created in response to the environs of apathy concerning the use of hypertext search and discovery. In Lieu, the internet is not what is made searchable, but instead one's own neighbourhood. Put differently, Lieu is a neighbourhood search engine, a way for personal webrings to increase serendipitous connexions.
Lieu's crawl & precrawl commands output to standard output, for easy inspection of the data. You typically want to redirect their output to the files Lieu reads from, as defined in the config file. See below for a typical workflow.
This service is a search engine that looks for public archives on various lesser-known file sharing services. These services do not offer a simple way to find files hosted on their servers.
An experimental semantic search site for vintage computing files stored at the Internet Archive.
Aleph is a powerful tool for people who follow the money. It helps investigators to securely access and search large amounts of data - no matter whether they are a government database or a leaked email archive.
Requires a (free?) account?
Consider adding to Searx?
Mwmbl is a non-profit, ad-free, free-libre and free-lunch search engine with a focus on usability and speed. At the moment it is little more than an idea together with a proof of concept implementation of the web front-end and search technology on a small index. Our vision is a community working to provide top quality search particularly for hackers, funded purely by donations.
We now have a distributed crawler that runs on our volunteers' machines! If you have Firefox you can help out by installing our extension. This will crawl the web in the background, retrieving one page a second. It does not use or access any of your personal data. Instead it crawls the web at random, using the top scoring sites on Hacker News as seed pages. After extracting a summary of each page, it batches these up and sends the data to a central server to be stored and indexed.
Seems to require Postgres.
If you try installing it with Poetry you'll bounce off of newer versions of Python (it specifically looks for <= v3.11), but if you pick apart poetry.lock and do things manually you might have better luck.
Stract is an open source search engine where the user has the ability to see exactly what is going on and customize almost everything about their search results. It's a search engine made for hackers and tinkerers just like ourselves. No more searches where some of the terms in the query aren't used and the engine tries to guess what you really meant. You get what you search for.
Oh, and if we ever become evil (maybe by changing our motto) please take our code and start a competitor. The fact that you have this ability will make sure that our values will always be aligned with our users.
We will also have a paid API for developers.
Github: https://github.com/StractOrg/stract
The build process is outlined in CONTRIBUTING.md. It mentions an optional configuration option for Alice, an AI search assistant.
REST API docs: https://trystract.com/beta/api/docs/
A search engine for almost a billion US court cases and records.
REST API: https://www.judyrecords.com/api
(You have to e-mail them and request an API key.)
SauceNAO is a reverse image search engine. The name 'SauceNAO' is derived from a slang form of "Need to know the source of this Now!" which has found common usage on image boards and other similar sites.