Qdrant is a vector database and vector similarity search engine. It is deployed as an API service that searches for the nearest high-dimensional vectors. With Qdrant, embeddings or neural-network encoders can be turned into full-fledged applications for matching, searching, recommending, and much more! It implements its own custom modification of the HNSW algorithm for approximate nearest neighbor search. It also supports payloads associated with vectors: it not only stores the payload but also lets you filter results based on payload values.
Unlike Elasticsearch post-filtering, Qdrant guarantees all relevant vectors are retrieved.
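A minimal Python sketch makes the post-filtering point concrete. It uses exact brute-force cosine search over a few toy vectors rather than HNSW, and the `city` payload field and the data are invented for illustration; this is not Qdrant's API. Post-filtering takes the top `k` hits first and then drops those that fail the filter, so it can return fewer than `k` results even when enough matching vectors exist. Applying the filter as part of the search avoids that.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length, non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy points: (id, vector, payload). The "city" payload field is hypothetical.
POINTS = [
    (1, [1.0, 0.0], {"city": "Berlin"}),
    (2, [0.9, 0.1], {"city": "Berlin"}),
    (3, [0.8, 0.2], {"city": "Berlin"}),
    (4, [0.1, 0.9], {"city": "London"}),
    (5, [0.0, 1.0], {"city": "London"}),
]

def post_filter_search(query, k, predicate):
    # Rank everything, keep the top k, THEN drop points that fail the filter.
    ranked = sorted(POINTS, key=lambda p: cosine(query, p[1]), reverse=True)
    return [p[0] for p in ranked[:k] if predicate(p[2])]

def filtered_search(query, k, predicate):
    # Restrict to matching points first, so all k slots go to valid results.
    candidates = [p for p in POINTS if predicate(p[2])]
    ranked = sorted(candidates, key=lambda p: cosine(query, p[1]), reverse=True)
    return [p[0] for p in ranked[:k]]

is_london = lambda payload: payload["city"] == "London"
print(post_filter_search([1.0, 0.0], 2, is_london))  # []
print(filtered_search([1.0, 0.0], 2, is_london))     # [4, 5]
```

Qdrant applies filters inside its HNSW-based search instead of scanning every point as this sketch does. The brute-force version only shows why the order of filtering and ranking matters.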
The ChatGPT Retrieval Plugin lets you easily search and find personal or work documents by asking questions in everyday language. It provides a flexible solution for semantic search and retrieval of personal or organizational documents using natural-language queries.
I'm pushing these thoughts into the public sphere for two primary reasons:
(1) To facilitate a discussion around web search in which I can learn from others. I believe what I write here has solid merits, but I also believe that we do our best work when we are challenged, encouraged, and refocused by others.
(2) To create a community of individuals who are interested in the future of web search, particularly those interested in actively participating in it.
Note that you needn't be part of the second for me to value your input on the first. I don't want to miss out on wisdom from those whose commitments and priorities lie outside this project.
Below is a list of companies that are hiring. We check with the companies directly on a regular basis to keep this live list up to date. Join our talent pool (beta) and community to get matched with opportunities.
scavenger is a multi-threaded post-exploitation scanning tool for scavenging systems, finding the most frequently used files and folders, as well as "interesting" files containing sensitive information.
NExfil is an OSINT tool written in Python for finding profiles by username. The provided usernames are checked on over 350 websites within a few seconds. The goal behind this tool was to get results quickly while keeping false positives low.
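The core loop of a username checker can be sketched roughly as follows: fill the username into per-site profile URL templates and use the HTTP status as the signal. The two templates and the "200 means found, 404 means absent" rule are assumptions for illustration, not NExfil's actual site list or detection logic. Real tools often need site-specific checks to keep false positives low. The fetch function is passed in as a parameter so the logic can be exercised offline.

```python
import urllib.error
import urllib.parse
import urllib.request

# Hypothetical site templates; NExfil's real list covers over 350 sites.
TEMPLATES = {
    "github": "https://github.com/{}",
    "gitlab": "https://gitlab.com/{}",
}

def http_status(url, timeout=5.0):
    # Return the final HTTP status of a GET request, or None on network failure.
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses are raised as HTTPError
    except OSError:
        return None  # DNS failure, refused connection, timeout, etc.

def check_username(username, fetch_status=http_status, templates=TEMPLATES):
    # Map each site to True (found), False (not found) or None (inconclusive).
    results = {}
    for site, template in templates.items():
        url = template.format(urllib.parse.quote(username, safe=""))
        status = fetch_status(url)
        if status == 200:
            results[site] = True
        elif status == 404:
            results[site] = False
        else:
            results[site] = None  # e.g. rate limiting, server errors, network failure
    return results

# Offline example: a dict lookup stands in for real network requests.
fake = {"https://github.com/alice": 200, "https://gitlab.com/alice": 404}
print(check_username("alice", fetch_status=fake.get))  # {'github': True, 'gitlab': False}
```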
HackerBoards is an established comparison website for any single-board computer (SBC), module (SoM) and Linux-supported development board. With over 450 active entries, Board-DB is the largest online database and comparison tool for single board computers (SBCs), computing modules (SoMs), and development boards.
HaveIBeenTrained uses CLIP retrieval to search the Laion-5B and Laion-400M image datasets. These are currently the largest public text-to-image datasets, and they are used to train models such as Stable Diffusion and Imagen, among many others.
When it's time to train a generative AI system, organizations like Stability use those datasets to download the images from their links and present them to the model with their captions.
With HaveIBeenTrained, artists can search these databases for links to their work and flag them for removal. We partner with Laion, who built these datasets, to remove those links. This helps ensure that future models will not be trained with work that has been opted out.
Explore thousands of Mastodon servers, spanning any topic you can think of, on our curated Mastodon Server List.
Torrents.csv is a collaborative git repository of torrents, consisting of a single, searchable torrents.csv file. It was initially populated with a January 2017 backup of The Pirate Bay, and new torrents are periodically added from various torrent sites. It comes with a self-hostable webserver, a command-line search, and a folder scanner to add torrents.
To keep the file small, Torrents.csv only stores torrents with at least one seeder; the file is periodically purged of non-seeded torrents and sorted by seeders in descending order.
It also has a REST API.
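The storage rules for Torrents.csv (keep only seeded torrents, sort by seeders descending) plus a simple name search can be sketched in a few lines of Python. The semicolon delimiter, the column names, and the sample rows are assumptions for illustration; the real file's layout may differ, and this is not the project's own search code.

```python
import csv
import io

# Hypothetical sample data; the real torrents.csv column layout may differ.
SAMPLE = """infohash;name;size_bytes;seeders
aaa;Ubuntu 22.04 ISO;3654957056;120
bbb;Old Home Video;1048576;0
ccc;Debian 12 netinst;658505728;45
"""

def load_seeded(text):
    # Keep only torrents with at least one seeder, sorted by seeders (descending).
    rows = csv.DictReader(io.StringIO(text), delimiter=";")
    seeded = [r for r in rows if int(r["seeders"]) >= 1]
    return sorted(seeded, key=lambda r: int(r["seeders"]), reverse=True)

def search(rows, term):
    # Case-insensitive substring match on the torrent name.
    term = term.lower()
    return [r["name"] for r in rows if term in r["name"].lower()]

rows = load_seeded(SAMPLE)
print([r["name"] for r in rows])  # ['Ubuntu 22.04 ISO', 'Debian 12 netinst']
print(search(rows, "debian"))     # ['Debian 12 netinst']
```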
Experimental website to browse and search vintage computer files from archive.org. Thousands of new files are added daily!
A collection of several hundred online tools for OSINT.
Zinc is a search engine that does full text indexing. It is a lightweight alternative to Elasticsearch and runs using a fraction of the resources. It uses bluge as the underlying indexing library.
It is simple and easy to operate, unlike Elasticsearch, which requires understanding and tuning a couple dozen knobs; you can get Zinc up and running in 2 minutes.
It can serve as a drop-in replacement for Elasticsearch if you only ingest data through the APIs and search it through a UI. Kibana itself is not supported with Zinc, which provides its own UI instead.
While Elasticsearch is a very good product, it is complex and requires lots of resources and is more than a decade old. I built Zinc so it becomes easier for folks to use full text search indexing without doing a lot of work.
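Full-text engines such as Zinc (through bluge) and Elasticsearch are built around an inverted index, which maps each term to the documents containing it. The Python sketch below is a deliberately tiny version with naive lowercase tokenization and AND semantics for multi-word queries. It is not Zinc's or bluge's API; real engines add analyzers, relevance scoring, and on-disk segments.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())

class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        # AND semantics: ids of documents containing every query term.
        terms = tokenize(query)
        if not terms:
            return []
        result = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(result)

index = InvertedIndex()
index.add(1, "Zinc is a lightweight search engine")
index.add(2, "Elasticsearch is a distributed search engine")
index.add(3, "Bluge is an indexing library")
print(index.search("search engine"))       # [1, 2]
print(index.search("lightweight search"))  # [1]
```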
A self-hosted bookmark database with full-text page content search. Bookmarklet support. Bookmark content is scraped and indexed locally. Page content periodically refreshed automatically. Full-text search of all stored data. No separate database required. Easily export your bookmarks to a plain text file - your data is yours. Even has .deb and .rpm packages for installation and upgrading.
Spyglass is a search platform that lives on your device, indexing what you want, exposing it to you in a super simple and fast interface. Warning: Spyglass is very much in its early stages, but it’s in a place where it's functional and can be used to replace basic searches.
Spyglass aims to address several common issues with searching the web.
A still-active, regularly updated archive of telephony information: area codes, exchanges, regional telcos, rate centers, deployed hardware types, and more.
Some of the data sets are even searchable.
Free Competitors is server software for building websites that help users find Free Software replacements for proprietary software, as well as Free Software alternatives to other Free Software.
The things you might want to replace and what you could replace them with are all in here as JSON files: https://notabug.org/jyamihud/FreeCompetitors/src/master/apps
Enter a URL to search a number of online archives simultaneously for the data stored there.
When performing passive recon on a target, there are dozens of tools we can use to gather various pieces of intel on it. This tool lets us easily parse the output of these utilities.
Acrossword is a small async wrapper around the SentenceBERT library. It has a convenient object-oriented API with two main purposes:
zero-shot text classification
It's useful if you want to avoid larger bloated libraries with capabilities you don't need, and comes with zero fuss.
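Zero-shot classification with sentence embeddings commonly works by embedding the input text and each candidate label, then choosing the label whose embedding is most similar (here, by cosine similarity). The Python sketch below shows that idea with hand-made 2-D word vectors standing in for a SentenceBERT model; the `WORD_VECTORS` table and `embed` function are hypothetical stand-ins, and this is not Acrossword's API.

```python
import math

# Hypothetical 2-D word vectors standing in for a real sentence-embedding model.
WORD_VECTORS = {
    "football": [1.0, 0.0], "goal": [0.9, 0.1], "match": [0.8, 0.2],
    "recipe": [0.0, 1.0], "oven": [0.1, 0.9], "bake": [0.2, 0.8],
    "sports": [1.0, 0.05], "cooking": [0.05, 1.0],
}

def embed(text):
    # Average the vectors of known words; unknown words are ignored.
    vecs = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vecs:
        return [0.0, 0.0]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(a, b):
    # Cosine similarity, defined as 0.0 when either vector is all zeros.
    norm = math.hypot(*a) * math.hypot(*b)
    return 0.0 if norm == 0 else sum(x * y for x, y in zip(a, b)) / norm

def zero_shot(text, labels):
    # Pick the label whose embedding is closest to the text's embedding.
    text_vec = embed(text)
    return max(labels, key=lambda label: cosine(text_vec, embed(label)))

print(zero_shot("a late goal decided the football match", ["sports", "cooking"]))  # sports
print(zero_shot("bake it in the oven", ["sports", "cooking"]))  # cooking
```

Neither example text contains its label word; the match comes only from vector similarity, which is the property a real sentence-embedding model provides at much larger scale.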