AlgorithmWatch is a human rights organization based in Berlin and Zurich. We fight for a world where algorithms and Artificial Intelligence (AI) do not weaken justice, democracy, and sustainability, but strengthen them.
Bark is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio - including music, background noise and simple sound effects. The model can also produce nonverbal communications like laughing, sighing and crying. To support the research community, we are providing access to pretrained model checkpoints, which are ready for inference and available for commercial use.
Embeddings databases are a union of vector indexes (sparse and dense), graph networks and relational databases. This enables vector search with SQL, topic modeling, retrieval augmented generation and more. Embeddings databases can stand on their own and/or serve as a powerful knowledge source for large language model (LLM) prompts.
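As a rough illustration of what "vector search with SQL" can mean, here is a minimal sketch using only the Python standard library. It is not the API of any particular embeddings database: the table layout, the cosine SQL function, and the toy three-dimensional vectors are all assumptions made for this example.

```python
import json
import math
import sqlite3


def cosine(a_json, b_json):
    """Cosine similarity between two JSON-encoded vectors."""
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_db(rows):
    """Create an in-memory table of (text, topic, vector) rows."""
    conn = sqlite3.connect(":memory:")
    # Expose the Python function to SQL under the (hypothetical) name "cosine".
    conn.create_function("cosine", 2, cosine)
    conn.execute("CREATE TABLE docs (text TEXT, topic TEXT, vec TEXT)")
    conn.executemany(
        "INSERT INTO docs VALUES (?, ?, ?)",
        [(text, topic, json.dumps(vec)) for text, topic, vec in rows],
    )
    return conn


def search(conn, query_vec, topic, limit=2):
    """Filter relationally by topic, then rank by vector similarity."""
    sql = (
        "SELECT text, cosine(vec, ?) AS score FROM docs "
        "WHERE topic = ? ORDER BY score DESC LIMIT ?"
    )
    return conn.execute(sql, (json.dumps(query_vec), topic, limit)).fetchall()


# Toy data: real systems would store model-generated embeddings instead.
ROWS = [
    ("cats purr", "animals", [1.0, 0.0, 0.0]),
    ("dogs bark", "animals", [0.8, 0.6, 0.0]),
    ("stocks fell", "finance", [0.0, 0.0, 1.0]),
]
```

Calling search(build_db(ROWS), [1.0, 0.1, 0.0], "animals") returns only the rows tagged "animals", ordered by similarity to the query vector. A real embeddings database would replace the full-table scan with a vector index.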
Artificial Intelligence (AI) is often presented as a complex field: the state of the art impossible to understand, models too large to train, incredible work in progress that could change anything, yet a black box inscrutable to anyone except the select few.
This is truly damaging to the field. It is a fascinating topic, and even though nobody can understand it all, we can all benefit from tinkering with it, learning from it and possibly even using it.
Regardless of all those limitations, the goal here is to showcase that even though not everything can be done on your desktop, a lot can. Building on that and learning how it works can help overcome a potential feeling of helplessness. Not only can you self-host AI models, use them and adapt them, but there is a whole community and set of tools to help you do so. This movement itself is very encouraging. AI does not have to be a black box. Your digital life does not have to be owned by someone else, even for the state of the art.
Your AI second brain. A copilot to search and chat (using RAG) with your knowledge base (pdf, markdown, org). Use powerful, online (e.g. gpt4) or private, offline (e.g. mistral) LLMs. Self-host locally or have it always accessible on the cloud. Access from Obsidian, Emacs, Desktop app, Web or WhatsApp.
Khoj is an AI application to search and chat with your notes and documents. It is open-source, self-hostable and accessible on Desktop, Emacs, Obsidian, Web and WhatsApp. It works with pdf, markdown, org-mode, Notion files and GitHub repositories. It can paint, search the internet and understand speech.
Weaviate is an open source vector database that stores both objects and vectors, allowing for combining vector search with structured filtering with the fault-tolerance and scalability of a cloud-native database, all accessible through GraphQL, REST, and various language clients.
With Weaviate, you can turn your text, images and more into a searchable vector database using state-of-the-art ML models. Weaviate typically performs a 10-nearest-neighbor (10-NN) search out of millions of objects in single-digit milliseconds. You can use Weaviate to conveniently vectorize your data at import time, or alternatively you can upload your own vectors (say, if you download a model from OpenAI or HuggingFace). Weaviate powers lightning-fast vector searches, but it is capable of much more. Some of its other superpowers include recommendation, summarization, and integrations with neural search frameworks.
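To make the idea of a k-nearest-neighbor search with structured filtering concrete, here is a brute-force sketch in plain Python. It does not use Weaviate or its clients. The object layout (a dict with a "vector" key plus arbitrary properties) and the where callback are assumptions made for this example, and a production vector database would use an approximate index rather than scoring every object.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def knn(query, objects, k=10, where=None):
    """Return the k objects most similar to the query by cosine similarity.

    objects: iterable of dicts holding a "vector" plus any other properties.
    where:   optional predicate applied before ranking (structured filter).
    """
    q = normalize(query)
    candidates = (o for o in objects if where is None or where(o))

    def score(obj):
        # Dot product of unit vectors equals cosine similarity.
        return sum(a * b for a, b in zip(q, normalize(obj["vector"])))

    # nlargest keeps only k items in memory and returns them best-first.
    return heapq.nlargest(k, candidates, key=score)


# Toy data: 20 two-dimensional vectors tagged with a language property.
OBJECTS = [
    {"id": i, "vector": [1.0, float(i)], "lang": "en" if i % 2 == 0 else "de"}
    for i in range(20)
]
```

For example, knn([1.0, 0.0], OBJECTS, k=3, where=lambda o: o["lang"] == "en") first restricts the candidates to English objects and then ranks only those by similarity.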
Milvus is an open-source vector database built to power embedding similarity search and AI applications. Milvus makes unstructured data search more accessible, and provides a consistent user experience regardless of the deployment environment.
Millisecond search on trillion-vector datasets. Rich APIs designed for data science workflows. Consistent user experience across laptop, local cluster, and cloud. Embed real-time search and analytics into virtually any application. Component-level scalability makes it possible to scale up and down on demand. Milvus can autoscale at a component level according to the load type, making resource scheduling much more efficient.
Welcome to Machine Learning Systems with TinyML. This book is your gateway to the fast-paced world of AI systems through the lens of embedded systems. It is an extension of the course, TinyML from CS249r at Harvard University.
Our aim is to make this open-source book a collaborative effort that brings together insights from students, professionals, and the broader community of applied machine learning practitioners. We want to create a one-stop guide that dives deep into the nuts and bolts of AI systems and their many uses.
An interactive visualization (with simple explanations) of how large language models work.
A database that tries to make it easy to build an LLM-like search database. Super-simple API for loading data and querying it.
You can do everything in your code or run it as a server (chroma run --path /path/to/datastore/on/disk) and use an HTTP client to interact with it.
This is an amalgam of TTPs for different offensive ML attacks, encompassing the ML supply chain and adversarial ML attacks.
It is focused heavily on attacks that have code you can use to perform the attacks right away, rather than a database of research papers. (PoC or GTFO type logic). Generally speaking if it is here I have tested it and it works. The intent is to help red teams and offensive practitioners quickly understand what tool in the toolbox to use to attack ML environments.
This is a living vault. It is very much not a finished list of resources. There are pages that are polished, and some that are little more than placeholders with a few bullet points that I jotted down during conferences or on the fly.
The goal is to organize the attacks in a way that is useful to red team operators rather than useful for say, academics trying to understand adversarial ML.
The "Awesome GPTs (Agents) Repo" represents an initial effort to compile a comprehensive list of GPT agents focused on cybersecurity (offensive and defensive), created by the community. Please note, this repository is a community-driven project and may not list all existing GPT agents in cybersecurity. Contributions are welcome – feel free to add your own creations!
Disclaimer: Users should exercise caution and evaluate the agents before use. Additionally, please note that some of these GPTs are still in an experimental test phase.
To escape a deluge of generated content, companies are screening your resumes and documents using AI. This website allows you to inject invisible text into your PDF that will make any AI language model think you are the perfect candidate for the job.
When you select a PDF file, the text in the textbox is inserted on the first page of the document. The text is rendered with minimum font size and opacity, so it is invisible to the human eye. However, it is still visible to AI text recognition algorithms.
This is a freely available online course on neuroscience for people with a machine learning background. The aim is to bring together these two fields that have a shared goal in understanding intelligent processes. Rather than pushing for “neuroscience-inspired” ideas in machine learning, the idea is to broaden the conceptions of both fields to incorporate elements of the other in the hope that this will lead to new, creative thinking.
The course is given in person at the Department of Electrical and Electronic Engineering, Imperial College London, and made freely available online (although without the practical classes).
Each week there are a series of videos to watch on YouTube, and a set of exercises available as a Jupyter notebook that can be run locally or via Google Colab. Students at Imperial College can discuss on Teams, and for everyone else there is an open Discord server.
GitHub: https://github.com/neuro4ml
Create an ai.txt file for your website to set permissions for text and data mining. Use the toggles to allow or block your content from being used to train AI models. By default all content is opted out. Selecting allow for any content type will let data miners know that they may use content on your website of that media type.
There is no guarantee that anybody will ever obey this, but it can't hurt to try.
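The toggle behaviour described above could be sketched roughly as follows. The exact ai.txt syntax is not given here, so the robots.txt-style User-Agent, Allow and Disallow lines, the media type names, and the file extensions are all assumptions for illustration, not the generator's actual output.

```python
# Hypothetical media types and extensions; the real tool's list may differ.
MEDIA_TYPES = {
    "text": ["txt", "pdf"],
    "images": ["png", "jpg"],
    "audio": ["mp3", "wav"],
}


def build_ai_txt(allowed=()):
    """Build ai.txt content; every media type is opted out unless allowed."""
    unknown = set(allowed) - set(MEDIA_TYPES)
    if unknown:
        raise ValueError(f"unknown media types: {sorted(unknown)}")

    # Assumed robots.txt-like layout: one rule per file extension.
    lines = ["User-Agent: *"]
    for media, extensions in MEDIA_TYPES.items():
        rule = "Allow" if media in allowed else "Disallow"
        lines.extend(f"{rule}: *.{ext}" for ext in extensions)
    return "\n".join(lines) + "\n"
```

Calling build_ai_txt() with no arguments opts everything out, matching the default described above, while build_ai_txt(["images"]) switches only the image extensions to Allow.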
FastText is an open-source, free, lightweight library that allows users to learn text representations and text classifiers. It works on standard, generic hardware. Models can later be reduced in size to even fit on mobile devices.
Pre-trained word vectors can be downloaded.
Transcription of calls from trunk-recorder using OpenAI Whisper.
If you're using OpenAI Whisper, you can use a local GPU to accelerate computations.
Easily configure and deploy a fully self-hosted chatbot web service based on open source Large Language Models (LLMs), such as Llama 2, without the need for knowledge of machine learning or programming.
Free and Open Source chatbot web service with UI and API. Fully self-hosted, not tied to any service, and offline capable. Forget about API keys! Models and embeddings can be pre-downloaded, and the training and inference processes can run offline if necessary. Easy to set up, no need to program: just configure the service with a YAML file and start it with 1 command. No need for a GPU, this will work even on your laptop CPU! That said, running on CPUs can be quite slow (up to 1 min to answer a document-based question on recent laptops), so we are working on making better use of the GPU when available.
LangChain models supported:
A site that tracks advances in LLM technology. Software, papers, research, directories... looks like a bit of everything.
EvaDB is a database system for developing AI apps. We aim to simplify the development and deployment of AI apps that operate on unstructured data (text documents, videos, PDFs, podcasts, etc.) and structured data (tables, vector index).
The high-level Python and SQL APIs allow beginners to use EvaDB in a few lines of code. Advanced users can define custom user-defined functions that wrap around any AI model or Python library. EvaDB is fully implemented in Python and licensed under an Apache license.
Ideal for patching into existing AI APIs.