Cross-platform, open-source voice assistant and framework to build fully-featured, offline machines you can talk to. Self-hosted. Desktop and mobile clients. Repos of note:
A huge blocklist of sites that contain AI-generated content, for the purpose of cleaning up image search results (Google Search, DuckDuckGo, and Bing) with uBlock Origin or uBlacklist. There is also a Pi-hole compatible list in the repo.
list.txt can probably be processed and used to build a blocking database for search bots.
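Below is a minimal Python sketch of that processing step. It assumes each useful line in list.txt is a uBlacklist-style match pattern such as `*://*.example.com/*` or `*://example.com/*`, and that blank lines and lines starting with `#` or `!` are comments. Check the real file before relying on this. The domains in the sample are hypothetical (`.example` is a reserved TLD).

```python
"""Sketch: turn a uBlacklist-style list.txt into a set of blocked domains.

Assumption: useful lines look like "*://*.example.com/*" or "*://example.com/*";
blank lines and lines starting with "#" or "!" are treated as comments.
"""
import re
from urllib.parse import urlsplit

# Matches "<scheme>://<host>/" where scheme may be "*" and host may start with "*."
PATTERN = re.compile(r"^(?:\*|https?)://(?:\*\.)?([^/*]+)/")


def parse_blocklist(lines):
    """Return the set of hosts named in the list (lower-cased)."""
    domains = set()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        match = PATTERN.match(line)
        if match:
            domains.add(match.group(1).lower())
    return domains


def is_blocked(url, domains):
    """True if the URL's host, or any parent domain of it, is in the set.

    This also blocks subdomains of exact-host patterns, which is broader
    than uBlacklist's own matching but convenient for a bot blocklist.
    """
    host = (urlsplit(url).hostname or "").lower()
    parts = host.split(".")
    # Try "cdn.ai-art.example", then "ai-art.example", then "example".
    return any(".".join(parts[i:]) in domains for i in range(len(parts)))


# Hypothetical sample lines standing in for the real list.txt
sample = [
    "# comment line",
    "*://*.ai-art.example/*",
    "*://genimages.example/*",
    "",
]
blocked = parse_blocklist(sample)
print(sorted(blocked))
print(is_blocked("https://cdn.ai-art.example/img/1.png", blocked))
print(is_blocked("https://example.org/", blocked))
```

A plain set of domains keeps lookups cheap; the same set could be dumped into a database table or a hosts-style file for a crawler or DNS filter.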
AlgorithmWatch is a human rights organization based in Berlin and Zurich. We fight for a world where algorithms and Artificial Intelligence (AI) do not weaken justice, democracy, and sustainability, but strengthen them.
Bark is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio, including music, background noise and simple sound effects. The model can also produce nonverbal communication such as laughing, sighing and crying. To support the research community, we are providing access to pretrained model checkpoints, which are ready for inference and available for commercial use.
Embeddings databases are a union of vector indexes (sparse and dense), graph networks and relational databases. This enables vector search with SQL, topic modeling, retrieval augmented generation and more. Embeddings databases can stand on their own and/or serve as a powerful knowledge source for large language model (LLM) prompts.
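To make "vector search with SQL" concrete, here is a toy Python sketch using only sqlite3. It is not txtai's API. The schema is hypothetical and the hand-made 3-dimensional vectors stand in for real model embeddings. A relational `WHERE` clause narrows the candidates, and the survivors are ranked by cosine similarity.

```python
"""Toy sketch of vector search with SQL: relational filter + dense similarity.

Not txtai's API. The schema is hypothetical and the 3-d vectors are
hand-made stand-ins for real model embeddings.
"""
import json
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, text TEXT, lang TEXT, vec TEXT)")
rows = [
    ("Self-hosting a voice assistant", "en", [0.9, 0.1, 0.0]),
    ("Vector databases explained", "en", [0.1, 0.9, 0.2]),
    ("Vektordatenbanken erklärt", "de", [0.1, 0.8, 0.3]),
]
# Vectors are stored as JSON text for simplicity.
db.executemany(
    "INSERT INTO docs (text, lang, vec) VALUES (?, ?, ?)",
    [(t, l, json.dumps(v)) for t, l, v in rows],
)


def search(query_vec, lang, k=2):
    """Filter with SQL first, then rank the remaining rows by cosine similarity."""
    candidates = db.execute("SELECT text, vec FROM docs WHERE lang = ?", (lang,))
    scored = [(cosine(query_vec, json.loads(v)), t) for t, v in candidates]
    return [t for _, t in sorted(scored, reverse=True)[:k]]


print(search([0.0, 1.0, 0.1], lang="en"))
```

A real embeddings database would keep the vectors in a dedicated index rather than scanning JSON columns, but the split between structured filtering and similarity ranking is the same.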
Artificial Intelligence (AI) is often presented as a complex field: the state of the art is impossible to understand, the models are too large to train, and incredible work in progress could change everything, yet it remains a black box, inscrutable to anyone except a select few.
This is truly damaging to the field. It is a fascinating topic, and even though nobody can understand all of it, we can all benefit from tinkering with it, learning from it and possibly even using it.
Regardless of all those limitations, the goal here is to show that even though not everything can be done on your desktop, a lot can. Building on that and learning how it works can help you reconsider a potential feeling of helplessness. Not only can you self-host AI models, use them and adapt them, but there is a whole community and set of tools to help you do so. This movement itself is very encouraging. AI does not have to be a black box. Your digital life does not have to be owned by someone else, even for the state of the art.
Your AI second brain. A copilot to search and chat (using RAG) with your knowledge base (PDF, Markdown, Org). Use powerful online (e.g. GPT-4) or private, offline (e.g. Mistral) LLMs. Self-host locally or have it always accessible in the cloud. Access it from Obsidian, Emacs, a desktop app, the web or WhatsApp.
Khoj is an AI application to search and chat with your notes and documents. It is open-source, self-hostable and accessible on Desktop, Emacs, Obsidian, Web and WhatsApp. It works with PDF, Markdown, Org-mode and Notion files and GitHub repositories. It can paint, search the internet and understand speech.
Weaviate is an open source vector database that stores both objects and vectors, allowing you to combine vector search with structured filtering while keeping the fault tolerance and scalability of a cloud-native database, all accessible through GraphQL, REST, and various language clients.
With Weaviate, you can turn your text, images and more into a searchable vector database using state-of-the-art ML models. Weaviate typically performs a 10-nearest-neighbor (10-NN) search over millions of objects in single-digit milliseconds. You can use Weaviate to conveniently vectorize your data at import time, or alternatively you can upload your own vectors (say, if you download a model from OpenAI or Hugging Face). Weaviate powers lightning-fast vector searches, but it is capable of much more. Some of its other superpowers include recommendation, summarization, and integrations with neural search frameworks.
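For intuition about what a 10-NN query computes, here is an exact, brute-force k-nearest-neighbor search in plain Python. This is an illustration, not Weaviate's implementation: vector databases such as Weaviate use approximate indexes (HNSW by default in Weaviate) precisely to avoid this linear scan over every stored vector. The random 64-dimensional vectors stand in for real embeddings.

```python
"""Exact (brute-force) k-nearest-neighbor search by cosine similarity.

Illustrative baseline only: random 64-d vectors stand in for real embeddings.
Approximate indexes in vector databases exist to avoid this O(N * d) scan.
"""
import heapq
import math
import random


def normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def knn(query, vectors, k=10):
    """Return the k (score, index) pairs with the highest cosine similarity.

    Assumes the stored vectors are already unit length.
    """
    q = normalize(query)
    scores = ((sum(a * b for a, b in zip(q, v)), i) for i, v in enumerate(vectors))
    # nlargest keeps only k items in a heap instead of sorting all N scores.
    return heapq.nlargest(k, scores)


random.seed(0)
dim = 64
store = [normalize([random.gauss(0, 1) for _ in range(dim)]) for _ in range(5_000)]
query = store[42]  # a stored vector should come back as its own nearest neighbor
top = knn(query, store, k=10)
print(top[0][1], len(top))
```

Every query touches all N vectors here, so latency grows linearly with the collection; that is the cost the "single-digit milliseconds over millions of objects" claim is measured against.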
Milvus is an open-source vector database built to power embedding similarity search and AI applications. Milvus makes unstructured data search more accessible, and provides a consistent user experience regardless of the deployment environment.
Millisecond search on trillion vector datasets. Rich APIs designed for data science workflows. Consistent user experience across laptop, local cluster, and cloud. Embed real-time search and analytics into virtually any application. Component-level scalability makes it possible to scale up and down on demand. Milvus can autoscale at a component level according to the load type, making resource scheduling much more efficient.
Welcome to Machine Learning Systems with TinyML. This book is your gateway to the fast-paced world of AI systems through the lens of embedded systems. It is an extension of the TinyML course CS249r at Harvard University.
Our aim is to make this open-source book a collaborative effort that brings together insights from students, professionals, and the broader community of applied machine learning practitioners. We want to create a one-stop guide that dives deep into the nuts and bolts of AI systems and their many uses.
Insight into the hidden ecosystem of autonomous chatbots and data scrapers crawling across the web. Protect your website from unwanted AI agent access. You can submit newly spotted agents. There are no data feeds, so you have to sign up.
Maintaining a robots.txt file will help protect your website from unwanted AI agent access. The Dark Visitors list is continuously updated so you can control the behavior of all known AI agents.
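A minimal Python sketch of such a robots.txt, checked with the standard library's `urllib.robotparser`. The agent tokens used (GPTBot, CCBot, Google-Extended) are real, publicly documented user-agent tokens, but this short list is only illustrative and is not the Dark Visitors list. Keep in mind that robots.txt is advisory: a crawler can simply ignore it.

```python
"""Sketch: generate a robots.txt that disallows some AI crawlers, then verify it.

The three agent tokens are an illustrative sample, not a complete or
maintained list. robots.txt is advisory; crawlers may ignore it.
"""
from urllib.robotparser import RobotFileParser

AI_AGENTS = ["GPTBot", "CCBot", "Google-Extended"]


def build_robots_txt(agents):
    """One 'Disallow: /' group per AI agent, then allow everyone else."""
    groups = [f"User-agent: {a}\nDisallow: /" for a in agents]
    groups.append("User-agent: *\nAllow: /")
    return "\n\n".join(groups) + "\n"


robots = build_robots_txt(AI_AGENTS)
print(robots)

# Parse the generated text the way a well-behaved crawler would.
parser = RobotFileParser()
parser.parse(robots.splitlines())
for agent in ["GPTBot", "CCBot", "Mozilla/5.0"]:
    print(agent, parser.can_fetch(agent, "https://example.com/blog/post"))
```

Generating the file from a list makes it easy to regenerate whenever the list of known agents is updated.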
An interactive visualization (with simple explanations) of how large language models work.
A database that aims to make it easy to build search for LLM applications. Super-simple API for loading data and querying it.
You can do everything in your code or run it as a server (chroma run --path /path/to/datastore/on/disk) and use an HTTP client to interact with it.
This is an amalgam of TTPs for different offensive ML attacks, encompassing both ML supply chain attacks and adversarial ML attacks.
It is focused heavily on attacks that have code you can use to perform them right away, rather than on a database of research papers (PoC or GTFO type logic). Generally speaking, if it is here, I have tested it and it works. The intent is to help red teams and offensive practitioners quickly understand which tool in the toolbox to use to attack ML environments.
This is a living vault. It is very much not a finished list of resources. There are pages that are polished, and some that are little more than placeholders with a few bullet points that I jotted down during conferences or on the fly.
The goal is to organize the attacks in a way that is useful to red team operators rather than to, say, academics trying to understand adversarial ML.
The "Awesome GPTs (Agents) Repo" represents an initial effort to compile a comprehensive list of GPT agents focused on cybersecurity (offensive and defensive), created by the community. Please note, this repository is a community-driven project and may not list all existing GPT agents in cybersecurity. Contributions are welcome – feel free to add your own creations!
Disclaimer: Users should exercise caution and evaluate the agents before use. Additionally, please note that some of these GPTs are still in an experimental testing phase.
To escape a deluge of generated content, companies are screening your resumes and documents using AI. This website allows you to inject invisible text into your PDF that will make any AI language model think you are the perfect candidate for the job.
When you select a PDF file, the text in the textbox is inserted on the first page of the document. The text is rendered with minimum font size and opacity, so it is invisible to the human eye. However, it is still visible to AI text recognition algorithms.
This is a freely available online course on neuroscience for people with a machine learning background. The aim is to bring together these two fields that have a shared goal in understanding intelligent processes. Rather than pushing for “neuroscience-inspired” ideas in machine learning, the idea is to broaden the conceptions of both fields to incorporate elements of the other in the hope that this will lead to new, creative thinking.
The course is given in person at the Department for Electrical and Electronic Engineering, Imperial College London, and made freely available online (although without the practical classes).
Each week there is a series of videos to watch on YouTube and a set of exercises available as a Jupyter notebook that can be run locally or via Google Colab. Students at Imperial College can discuss on Teams; for everyone else there is an open Discord server.
Github: https://github.com/neuro4ml
Create an ai.txt file for your website to set permissions for text and data mining. Use the toggles to allow or block your content from being used to train AI models. By default all content is opted out. Selecting allow for any content type will let data miners know that they may use content on your website of that media type.
There is no guarantee that anybody will ever obey this, but it can't hurt to try.
Uses sdxl-emoji to turn natural language descriptions into custom emoji.
Github: https://github.com/cbh123/emoji