page 1 / 2
38 results tagged searchengine
Unobtanium Search https://unobtanium.rocks/
Mon 26 May 2025 09:20:06 PM PDT archive.org

Unobtanium is a web crawler with a search frontend, or, put more simply, a search engine. The developers' instance is over at unobtanium.rocks and aims to be a search engine focused on technology and personal websites. Unobtanium makes heavy use of SQLite.

Git: https://codeberg.org/unobtanium/unobtanium

The docs specifically talk about how to plug Unobtanium into SearxNG: https://doc.unobtanium.rocks/manual/searxng/

searchengine service opensource rust crawlers webapps selfhosted
edoardottt/awesome-hacker-search-engines https://github.com/edoardottt/awesome-hacker-search-engines
Wed 16 Apr 2025 01:10:56 PM PDT archive.org

A curated list of search engines useful during penetration testing, vulnerability assessments, red/blue team operations, bug bounties, and more.

awesome searchengine hacking sysadmin sites services vulnerabilities advisories exploits internet analysis osint
Openverse: Openly licensed images, audio, and more https://openverse.org/
Mon 03 Feb 2025 04:55:11 PM PST archive.org

Openverse is a tool that allows openly licensed and public domain works to be discovered and used by everyone.

Openverse searches across more than 800 million images and audio tracks from open APIs and the Common Crawl dataset. We aggregate works from multiple public repositories, and facilitate reuse through features like one-click attribution.

Currently Openverse only searches images and audio tracks, with search for video provided through External Sources. We plan to add additional media types such as open texts and 3D models, with the ultimate goal of providing access to the estimated 2.5 billion CC licensed and public domain works on the web. All of our code is open source and can be accessed at the Openverse GitHub repository. We welcome community contribution. You can see what we’re currently working on.

Openverse is the successor to CC Search which was launched by Creative Commons in 2019, after its migration to WordPress in 2021. You can read more about this transition in the official announcements from Creative Commons and WordPress. We remain committed to our goal of tackling discoverability and accessibility of open access media.

Openverse does not verify licensing information for individual works, or whether the generated attribution is accurate or complete. Please independently verify the licensing status and attribution information before reusing the content.

Github: https://github.com/wordpress/openverse

searchengine creativecommons open media images audio video python
Old'aVista: The most powerful guide to the old Internet. https://oldavista.com/
Tue 07 Jan 2025 08:13:17 PM PST archive.org

Old'aVista is a search engine focused on personal websites that used to be hosted on services like Geocities, Angelfire, AOL, Xoom and so on. In no way should it compete with any of the famous search engines, as it's focused on finding historic personal websites. The data was acquired by scraping pages from the Internet Archive. I basically used a node application I built with some starting links, and I saved all the links I found in a queue and the text from the pages in the index. Old'aVista's design is based on the 1999 version of the defunct Altavista search engine. The name of the website itself is a wordplay on Altavista. My original idea was to get old search engines and make them functional again, but I decided to make this website its own thing while maintaining the nostalgia factor.
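
The queue-and-index approach described above can be sketched roughly as follows. This is only an illustration in Python (the original was a Node application whose code is not shown here); `crawl`, `fetch`, and the tiny in-memory `FAKE_WEB` are hypothetical names invented for the example, and no real network access is performed.

```python
from collections import deque

def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl: links go into a queue, page text goes into an index."""
    queue = deque(seeds)
    seen = set(seeds)          # avoid queueing the same URL twice
    index = {}                 # url -> page text
    while queue and len(index) < max_pages:
        url = queue.popleft()
        page = fetch(url)      # hypothetical fetcher: returns (text, links) or None
        if page is None:
            continue
        text, links = page
        index[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index

# Stand-in for archived pages, so the sketch runs offline.
FAKE_WEB = {
    "a": ("Welcome to my homepage", ["b", "c"]),
    "b": ("My cat pictures", ["a"]),
    "c": ("Links to cool sites", ["d"]),
}

index = crawl(["a"], FAKE_WEB.get)
print(sorted(index))
```

Here "d" is queued but never indexed, because the stand-in fetcher has no page for it.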

searchengine retrocomputing websites directories historical archives
Index Now https://www.indexnow.org/
Sun 24 Nov 2024 12:01:43 PM PST archive.org

IndexNow is an easy way for website owners to instantly inform search engines about the latest content changes on their website. In its simplest form, IndexNow is a simple ping so that search engines know that a URL and its content have been added, updated, or deleted, allowing search engines to quickly reflect this change in their search results.

Without IndexNow, it can take days to weeks for search engines to discover that the content has changed, as search engines don’t crawl every URL often. With IndexNow, search engines know immediately which URLs have changed, helping them prioritize crawling for these URLs and thereby limiting organic crawling to discover new content.

IndexNow is offered under the terms of the Attribution-ShareAlike Creative Commons License and has support from Microsoft Bing, Naver, Seznam.cz, Yandex, Yep.

IndexNow-enabled search engines immediately share all submitted URLs with all other IndexNow-enabled search engines, so you just need to notify one endpoint.
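
A minimal sketch of what such a ping might look like, based on my reading of the IndexNow documentation: a GET request to an endpoint with `url` and `key` query parameters. The helper name `build_indexnow_ping`, the example URL, and the example key are made up for illustration. The sketch only builds the request URL and does not send anything; as I understand it, the key also has to be published as a text file on the site being reported, so check the protocol docs before relying on this.

```python
from urllib.parse import urlencode

# Shared endpoint listed in the IndexNow docs; individual engines expose their own too.
INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"

def build_indexnow_ping(changed_url: str, key: str,
                        endpoint: str = INDEXNOW_ENDPOINT) -> str:
    """Return the GET URL that reports one changed URL (hypothetical helper)."""
    query = urlencode({"url": changed_url, "key": key})
    return f"{endpoint}?{query}"

# Example values are placeholders, not a real site or key.
ping = build_indexnow_ping("https://example.org/new-post", "0123456789abcdef")
print(ping)
```

Sending the ping would then be a plain HTTP GET on that URL, for example with `urllib.request.urlopen(ping)`.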

searchengine notifications updates standards rest api
Alexandria https://www.alexandria.org/
Tue 01 Oct 2024 09:26:58 AM PDT archive.org

Alexandria.org is a non-profit, ad-free search engine. Our goal is to provide the best available information without compromise. The index is built on data from Common Crawl and the engine is written in C++. The source code is available. We are still at an early stage of development and running the search engine on a shoestring budget.

Github:

  • Search engine: https://github.com/alexandria-org/alexandria
  • REST API: https://github.com/alexandria-org/alexandria-api
  • Front end: https://github.com/alexandria-org/alexandria-frontend

In theory you can set up your own instance. In practice, I don't know how practical that would be.

searchengine opensource rest api cpp php
valeriansaliou/sonic https://github.com/valeriansaliou/sonic
Mon 26 Aug 2024 12:22:30 PM PDT archive.org

Sonic is a fast, lightweight and schema-less search backend. It ingests search texts and identifier tuples that can then be queried against in a microsecond's time.

Sonic can be used as a simple alternative to super-heavy and full-featured search backends such as Elasticsearch in some use-cases. It is capable of normalizing natural language search queries, auto-completing a search query and providing the most relevant results for a query. Sonic is an identifier index, rather than a document index; when queried, it returns IDs that can then be used to refer to the matched documents in an external database.
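
To illustrate the "identifier index, rather than a document index" idea, here is a toy sketch in Python. It is not Sonic's protocol or API; `IdentifierIndex`, `push`, `query`, and the sample data are all invented for the example. The index stores only IDs, and the caller resolves them against its own store.

```python
from collections import defaultdict

class IdentifierIndex:
    """Toy inverted index mapping lowercase words to object IDs (not Sonic's API)."""

    def __init__(self):
        self._postings = defaultdict(set)   # word -> set of object IDs

    def push(self, object_id: str, text: str) -> None:
        """Ingest a text for an ID; the text itself is not kept."""
        for word in text.lower().split():
            self._postings[word].add(object_id)

    def query(self, terms: str) -> list[str]:
        """Return the IDs whose ingested text contained every query word."""
        words = terms.lower().split()
        if not words:
            return []
        ids = set.intersection(*(self._postings.get(w, set()) for w in words))
        return sorted(ids)

# The "external database": the index never sees it after ingestion.
documents = {
    "post:1": "Sonic is a lightweight search backend",
    "post:2": "SQLite is an embedded database",
    "post:3": "A search backend written in Rust",
}

index = IdentifierIndex()
for doc_id, body in documents.items():
    index.push(doc_id, body)

hits = index.query("search backend")
results = [documents[doc_id] for doc_id in hits]   # resolve IDs externally
print(hits)
```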

Strong attention was paid to performance and code cleanliness when designing Sonic. It aims to be crash-free and super-fast, and to put minimal strain on server resources (our measurements have shown that Sonic, when under load, responds to search queries in the μs range, eats ~30MB RAM and has a low CPU footprint).

Available in Arch as extra/sonic.

Configuration docs: https://github.com/valeriansaliou/sonic/blob/master/CONFIGURATION.md

rust search searchengine lightweight nlp exocortex archival
Clew https://clew.se/
Sat 24 Aug 2024 07:43:30 PM PDT archive.org

Clew is a web search engine trying to be different from the rest.

  • We focus on writing by independent creators. There's no point in indexing Wikipedia; it has its own perfectly serviceable search bar.
  • We are not ad-supported; currently, Clew is a labor of love by Benjamin Hollon and donation-supported.
  • Because our funding source is not dependent on the interests of for-profit businesses and corporations, our search rankings are unbiased and apply the same standards equally to all indexed domains.
  • In addition, we incorporate variables that would not be in the interests of commercial search engines, such as whether we detect invasive ads or tracking on a site and how much bandwidth pages require to download, perfect for individuals who want to see whether a website will respect their privacy and data before opening it.

Git repo: https://codeberg.org/Clew/Clew

searchengine service smolnet python postgres
FrogFind! http://frogfind.com/
Fri 12 Jul 2024 07:27:26 PM PDT archive.org

Hi, I'm Sean, A.K.A. Action Retro on YouTube. I work on a lot of 80's and 90's Macs (and other vintage machines), and I really like to try and get them online. However, the modern internet is not kind to old machines, which generally cannot handle the complicated javascript, CSS, and encryption that modern sites have. However, they can browse basic websites just fine. So I decided to see how much of the internet I could turn into basic websites, so that old machines can browse the modern internet once again!

The search functionality of FrogFind is basically a custom wrapper for DuckDuckGo search, converting the results to extremely basic HTML that old browsers can read. When clicking through to pages from search results, those pages are processed through a PHP port of Mozilla's Readability, which is what powers Firefox's reader mode. I then further strip down the results to be as basic HTML as possible.

I designed FrogFind with classic Macs in mind, so I've been testing on my SE/30 to make sure it looks good in 1 bit color with a 512x384 resolution. Most of my testing has been on Netscape 1.1N and 2.0.2, as well as a few 68k Mac versions of iCab. FrogFind should also work great on any text-based web browser!

retrotech retrocomputing searchengine resources community proxy lightweight html
OpenOrb: A curated search engine for Atom and RSS feeds. https://git.sr.ht/~lown/openorb
Thu 13 Jun 2024 03:31:33 AM PDT archive.org

An OpenOrb instance is configured with a list of feeds to search - what was once called a blogroll - and indexes this list periodically. In this way, OpenOrb provides a window into the content a specific person or community cares about, with the benefit of making this content searchable and therefore more accessible. OpenOrb is designed to be the opposite of Google and other black box, monolithic search engines - it's open source, configurable, personal, and predictable.

OpenOrb uses an extremely simple search engine, mostly adapted from Alex Molas's superb 'search engine in 80 lines of Python', which uses BM25 and doesn't handle wildcards, boolean operators, or stemming/lemmatization (yet). I might add some of these features in the future, but I wrote OpenOrb in a single day so I'm keeping it simple for now!
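
For reference, BM25 scoring can be sketched in a few lines. This is not OpenOrb's code or the 80-line engine it adapts; it is a generic illustration with a made-up function name (`bm25_scores`), naive whitespace tokenization, and commonly used default parameters (`k1=1.5`, `b=0.75`), which are assumptions here. Like the engine described above, it does no stemming and has no wildcards or boolean operators.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a basic BM25 formula."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized) / n   # average doc length
    df = Counter()                                          # document frequency per term
    for tokens in tokenized:
        df.update(set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rarer terms get a higher inverse document frequency.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturates and is normalized by document length.
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

feed_posts = [
    "rss feeds are great",
    "search engines index the web",
    "a curated search of rss feeds",
    "gardening tips for spring",
]
scores = bm25_scores("rss search", feed_posts)
print(scores)
```

The post that matches both query words scores highest, and the post that matches neither scores zero.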

The success of a search on OpenOrb suffers from the same limitations as RSS in general - people and software have different, and sometimes weird, opinions on how to present structured data. If a feed doesn't include full post content (which it should!), then OpenOrb won't be able to index it. If a feed has a cut-off limit for how many posts are in it, OpenOrb will only know about the ones that it had time to save. Embrace the limitations of the messy open web, and tell your friends to stop deleting posts from their feeds.

searchengine rss curated python exocortex
Search Engines https://neoxion.net/engines-all-internet/
Thu 16 May 2024 01:43:28 PM PDT archive.org

Somebody's directory of search engines.

searchengine directory services
cblgh/lieu https://github.com/cblgh/lieu
Sun 05 May 2024 07:36:42 PM PDT archive.org

Created in response to the environs of apathy concerning the use of hypertext search and discovery. In Lieu, the internet is not what is made searchable, but instead one's own neighbourhood. Put differently, Lieu is a neighbourhood search engine, a way for personal webrings to increase serendipitous connexions.

Lieu's crawl & precrawl commands output to standard output, for easy inspection of the data. You typically want to redirect their output to the files Lieu reads from, as defined in the config file. See below for a typical workflow.

golang searchengine exocortex webrings
Fidigger https://fidigger.com/
Wed 01 May 2024 07:37:01 PM PDT archive.org

This service is a search engine that looks for public archives at different File Sharing Services that are not so well known. These services do not offer a simple option to find files hosted on their servers.

searchengine files services
Discmaster https://discmaster.textfiles.com/
Wed 24 Jan 2024 09:51:36 AM PST archive.org

An experimental semantic search site for vintage computing files stored at the Internet Archive.

online searchengine retrocomputing vintage semantic files software archives
Ichido https://ichi.do/
Mon 15 Jan 2024 08:54:39 PM PST archive.org
searchengine
Aleph: Find public records and leaks. https://search.burojansen.nl/
Tue 09 Jan 2024 09:55:32 PM PST archive.org

Aleph is a powerful tool for people who follow the money. It helps investigators to securely access and search large amounts of data - no matter whether they are a government database or a leaked email archive.

Requires a (free?) account?

Consider adding to Searx?

searchengine people money companies organizations datasets leaks investigation littlesister ironmonger
mwmbl/mwmbl https://github.com/mwmbl/mwmbl
Sun 22 Oct 2023 07:40:14 PM PDT archive.org

Mwmbl is a non-profit, ad-free, free-libre and free-lunch search engine with a focus on usability and speed. At the moment it is little more than an idea together with a proof of concept implementation of the web front-end and search technology on a small index. Our vision is a community working to provide top quality search particularly for hackers, funded purely by donations.

We now have a distributed crawler that runs on our volunteers' machines! If you have Firefox you can help out by installing our extension. This will crawl the web in the background, retrieving one page a second. It does not use or access any of your personal data. Instead it crawls the web at random, using the top scoring sites on Hacker News as seed pages. After extracting a summary of each page, it batches these up and sends the data to a central server to be stored and indexed.

Seems to require Postgres.

If you try installing it with Poetry you'll bounce off of newer versions of Python (it specifically looks for <= v3.11), but if you pick apart poetry.lock and do things manually you might have better luck.

python searchengine selfhosted exocortex crawlers
Stract: Completely Open Search Engine https://stract.com/
Thu 07 Sep 2023 12:29:00 PM PDT archive.org

Stract is an open source search engine where the user has the ability to see exactly what is going on and customize almost everything about their search results. It's a search engine made for hackers and tinkerers just like ourselves. No more searches where some of the terms in the query aren't used, and the engine tries to guess what you really meant. You get what you search for.

Oh, and if we ever become evil (maybe by changing our motto) please take our code and start a competitor. The fact that you have this ability will make sure that our values will always be aligned with our users.

We will also have a paid API for developers.

Github: https://github.com/StractOrg/stract

The build process is outlined in CONTRIBUTING.md. It mentions an optional configuration option for Alice, an AI search assistant.

REST API docs: https://trystract.com/beta/api/docs/

searchengine opensource rust rest api selfhosted
Judy Records - Free Public Records Search https://www.judyrecords.com/
Sun 23 Jul 2023 02:51:15 PM PDT archive.org

A search engine for almost a billion US court cases and records.

REST API: https://www.judyrecords.com/api

(You have to e-mail them and request an API key.)

searchengine osint court records cases legal rest api
SauceNAO Reverse Image Search https://saucenao.com/
Tue 10 Jan 2023 06:33:18 PM PST archive.org

SauceNAO is a reverse image search engine. The name 'SauceNAO' is derived from a slang form of "Need to know the source of this Now!" which has found common usage on image boards and other similar sites.

searchengine images osint
6654 links, including 429 private