Easily configure and deploy a fully self-hosted chatbot web service based on open source Large Language Models (LLMs), such as Llama 2, without needing any knowledge of machine learning or programming.
Free and open source chatbot web service with UI and API. Fully self-hosted, not tied to any service, and offline capable. Forget about API keys! Models and embeddings can be pre-downloaded, and the training and inference processes can run offline if necessary. Easy to set up with no programming needed: just configure the service with a YAML file and start it with one command. No GPU is required; this will work even on your laptop CPU. That said, running on CPUs can be quite slow (up to 1 minute to answer a document-based question on recent laptops), so we are working on making better use of GPUs when available.
LangChain models supported:
Run, create, and share large language models (LLMs).
OpenChatKit provides a powerful, open-source base to create both specialized and general purpose chatbots for various applications. The kit includes an instruction-tuned 20 billion parameter language model, a 6 billion parameter moderation model, and an extensible retrieval system for including up-to-date responses from custom repositories. It was trained on the OIG-43M training dataset, which was a collaboration between Together, LAION, and Ontocord.ai. Much more than a model release, this is the beginning of an open source project. We are releasing a set of tools and processes for ongoing improvement with community contributions.
Includes pre-trained network weights.
Building applications with LLMs through composability. Large language models (LLMs) are emerging as a transformative technology, enabling developers to build applications that they previously could not. But using these LLMs in isolation is often not enough to create a truly powerful app - the real power comes when you can combine them with other sources of computation or knowledge.
Create a ChatGPT-like experience over your custom docs using LangChain. This repo can help you use models hosted on HuggingFace for embedding and for text generation.
The simplest, fastest repository for training/finetuning medium-sized GPTs. It's a re-write of minGPT, which I think became too complicated, and which I am hesitant to now touch. Still under active development, currently working to reproduce GPT-2 on OpenWebText dataset. The code itself aims by design to be plain and readable: train.py is a ~300-line boilerplate training loop and model.py a ~300-line GPT model definition, which can optionally load the GPT-2 weights from OpenAI.
VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media. It is fully open-sourced under the MIT License. Incorporated into NLTK.
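To make "lexicon and rule-based" concrete, here is a toy sketch of the general idea in plain Python. It is not VADER's code, lexicon, or scoring formula: the word lists, the valence numbers, and the `score` function are all made up for illustration.

```python
# Hypothetical mini-lexicon mapping words to valence. Not VADER's actual data.
LEXICON = {"good": 2.0, "great": 3.0, "bad": -2.0, "awful": -3.0}
# Hypothetical rule tables, also invented for this sketch.
NEGATIONS = {"not", "never", "no"}
BOOSTERS = {"very": 1.5, "really": 1.5}


def score(text):
    """Sum word valences, applying simple booster and negation rules."""
    total = 0.0
    words = text.lower().split()
    for i, word in enumerate(words):
        valence = LEXICON.get(word.strip(".,!?"))
        if valence is None:
            continue  # word carries no sentiment in this lexicon
        prev = words[i - 1] if i > 0 else ""
        if prev in BOOSTERS:
            # Rule 1: a booster directly before the word scales its valence.
            valence *= BOOSTERS[prev]
            prev = words[i - 2] if i > 1 else ""
        if prev in NEGATIONS:
            # Rule 2: a negation before the (boosted) word flips its sign.
            valence = -valence
        total += valence
    return total
```

With this toy lexicon, `score("not good")` comes out negative and `score("very good")` comes out higher than `score("good")`. A real tool adds a far larger lexicon plus handling for punctuation, capitalization, emoticons, and score normalization.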
spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest research, and was designed from day one to be used in real products. spaCy comes with pre-trained statistical models and word vectors, and currently supports tokenization for 45+ languages. It features the fastest syntactic parser in the world, convolutional neural network models for tagging, parsing and named entity recognition and easy deep learning integration. It's commercial open-source software, released under the MIT license.
In this post, we’ll be looking at how we can use a deep learning model to train a chatbot on my past social media conversations in hope of getting the chatbot to respond to messages the way that I would.
A modular Python framework for implementing chatops bots. Aims to make it easy to write new plugins that implement various skills and interfaces. Supports XMPP MUCs. Can be configured from inside of chat, so you don't have to edit a config file and restart the bot. Implements command access control.
An article about writing chatbots for chatops in Python. Links to frameworks to help do this.
A corpus of over 520 million words which consists of a massive cross-section of the English language between 1990 and 2015. This corpus is used for NLP study, AI training, and linguistic analysis. There's an online service, you can download various forms of it, and you can add to it if you have access.
Tracery is a procedural generation system for generating text, graphics, and more. Think of it like a procgen framework rather than a tool limited to one particular use case. People use it to generate text and dialogue for games, bots (Twitter, et al), artwork, probably music, recipes, insults... Unusual kinds of games have been developed with it, such as rhythm games and dating sims(!). Worth looking into. There is a version for the Twine game development system and a port to Python (https://github.com/aparrish/pytracery), which would make it very useful to us...
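The core idea of grammar-based text generation can be sketched in a few lines of standard-library Python. This is only an illustration of the technique, not the Tracery or pytracery API: the `GRAMMAR` contents and the `expand` and `generate` helpers are hypothetical, and only the `#symbol#` placeholder style is borrowed from Tracery.

```python
import random
import re

# Hypothetical toy grammar; the symbol names are invented for illustration.
GRAMMAR = {
    "origin": ["#greeting#, #name#!"],
    "greeting": ["Hello", "Greetings", "Howdy"],
    "name": ["world", "traveler", "friend"],
}

_SYMBOL = re.compile(r"#(\w+)#")


def expand(grammar, text, rng, max_depth=20):
    """Recursively replace each #symbol# with a randomly chosen rule."""
    if max_depth <= 0:
        raise RecursionError("grammar expansion is too deep")

    def substitute(match):
        rule = rng.choice(grammar[match.group(1)])
        # Rules may themselves contain symbols, so expand them too.
        return expand(grammar, rule, rng, max_depth - 1)

    return _SYMBOL.sub(substitute, text)


def generate(grammar, seed=None):
    """Expand the grammar starting from the 'origin' symbol."""
    return expand(grammar, "#origin#", random.Random(seed))
```

Calling `generate(GRAMMAR)` returns one randomly assembled sentence such as a greeting followed by a name; passing a `seed` makes the choice repeatable.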
A Markov chain generator in Python that is still maintained. Aims to be very extensible. Can save and restore its models as JSON files. Key methods can be overridden. Can randomly generate sentences, splice models together. Can plug NLP software into it to do more interesting things. Tries very hard to not just regurgitate things from the model; you can tweak this a bit.
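For a sense of what a Markov chain text generator does, including saving and restoring a model as JSON, here is a minimal standard-library sketch. It is not the API of the library described above: `build_model`, `generate`, `to_json`, and `from_json` are hypothetical names, and the sketch assumes whitespace-separated words and a non-empty model.

```python
import json
import random
from collections import defaultdict


def build_model(text, state_size=1):
    """Map each tuple of `state_size` words to the words seen after it."""
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - state_size):
        state = tuple(words[i:i + state_size])
        model[state].append(words[i + state_size])
    return dict(model)


def generate(model, rng, max_words=20):
    """Walk the chain from a random state until a dead end or max_words."""
    state = rng.choice(sorted(model))
    output = list(state)
    while len(output) < max_words and state in model:
        output.append(rng.choice(model[state]))
        # The next state is the last `state_size` words of the output.
        state = tuple(output[-len(state):])
    return " ".join(output)


def to_json(model):
    """Serialize the model; JSON keys must be strings, so join each state."""
    return json.dumps({" ".join(k): v for k, v in model.items()})


def from_json(data):
    """Restore a model produced by to_json."""
    return {tuple(k.split(" ")): v for k, v in json.loads(data).items()}
```

A typical round trip would be `model = build_model(corpus_text)`, then `generate(model, random.Random())`, with `to_json` and `from_json` used to persist the model between runs. A larger `state_size` gives more coherent output but sticks closer to the source text.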
A Python module that implements an NLP chatbot. Language agnostic, can be trained to speak any spoken language. Train an instance on a corpus and it will be able to communicate in a conversational manner.
Documentation for Discord's REST API.
Documentation for the Chatterbot corpus file format.
NLP training corpora for the Chatterbot Python module. Contains all of the structured text used to teach the text classifier and semantic analysis engines for the module. All user contributed. Encourages contribution by the community. The corpus files are YAML, organized by category. The training data consists of actual conversations and fragments thereof in the file.