FastText is an open-source, free, lightweight library that allows users to learn text representations and text classifiers. It works on standard, generic hardware. Models can later be reduced in size to fit even on mobile devices.
Pre-trained word vectors can be downloaded.
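fastText builds its word representations from character n-grams (subwords), which is what lets it produce vectors for words it never saw during training. Below is a minimal sketch of that n-gram extraction. It assumes the `<`/`>` word-boundary markers and the 3-to-6 n-gram length range described in the fastText subword paper, which are also the library's defaults for unsupervised training. `char_ngrams` is a hypothetical helper, not part of the fastText API.

```python
from typing import List


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """Return the character n-grams of `word`, wrapped in '<' and '>' markers.

    Hypothetical helper illustrating fastText-style subwords; the real
    library hashes these n-grams into buckets instead of returning strings.
    """
    wrapped = f"<{word}>"  # boundary markers distinguish prefixes/suffixes
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']
```

In fastText itself, a word's vector is built from the vectors of these n-grams (plus one for the whole word), and the n-grams are hashed into a fixed number of buckets rather than stored individually.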
Free and Open Source Machine Translation API, entirely self-hosted. Unlike other APIs, it doesn't rely on proprietary providers such as Google or Azure to perform translations. Instead, its translation engine is powered by the open source Argos Translate library.
Supports per-user quotas: you can issue API keys to users so that they get higher request limits per minute (if you also set --req-limit). By default, all users are rate-limited according to --req-limit, but passing the optional api_key parameter to the REST endpoints lets a user enjoy a higher limit. To use API keys, start LibreTranslate with the --api-keys option.
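Here is a minimal sketch of a translation request that carries an API key, using only the Python standard library. It assumes a self-hosted instance at `http://localhost:5000` (adjust `base_url` for your deployment) and the JSON fields `q`, `source`, `target`, `format` and `api_key` accepted by LibreTranslate's `/translate` endpoint. `build_translate_request` is a hypothetical helper. It only builds the request and does not send it.

```python
import json
import urllib.request
from typing import Optional


def build_translate_request(text: str, source: str, target: str,
                            api_key: Optional[str] = None,
                            base_url: str = "http://localhost:5000"
                            ) -> urllib.request.Request:
    """Build (but do not send) a POST request to a LibreTranslate /translate endpoint.

    Hypothetical helper; base_url is an assumption for a local instance.
    """
    payload = {"q": text, "source": source, "target": target, "format": "text"}
    if api_key:
        # Only keyed requests are eligible for the higher per-user limit.
        payload["api_key"] = api_key
    return urllib.request.Request(
        f"{base_url}/translate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then a matter of `urllib.request.urlopen(req)` and reading the `translatedText` field from the JSON response. Without `api_key`, the request counts against the default `--req-limit`.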
There are also F/OSS mobile clients for Android and browser plugins.
A curated list of delightful Conversational AI resources.
Reddit Persona is a Python module that extracts personality insights, sentiment and interests from a user account. (Subreddit analysis is currently broken by the praw update from v3 to v5; a fix is incoming.)
Text is collected via PRAW, Reddit's Python API wrapper, and the NLP is powered by the indico.io API.
Intellexer™ is a linguistic platform developed by EffectiveSoft.
Our API and SDK incorporate powerful linguistic tools for analyzing text in natural language. We encourage both developers and integrators to use them for improving existing or creating new Document/Knowledge management systems.
Our API and SDK provide effective capabilities for the development of various semantics-based solutions. The solutions can vary in the number and algorithmic complexity of the linguistic instruments used, depending on the customer's needs.
Free API key.
High-performance NLP models as a service. Pre-trained. You can upload and run your own spaCy models as well. The back end may be GPU-accelerated; they're an NVIDIA partner.
Named entity recognition, classification, summarization, question answering (in context), sentiment analysis, part-of-speech tagging.
Free tier: All pre-trained models, 3 API requests per minute.
Starter tier: All pre-trained models, 15 requests per minute, US$39/month.
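At 3 requests per minute on the free tier, a client has to pace its own calls. One way to do that client-side is a sliding window of recent request timestamps. The following is a minimal sketch under that assumption. `SlidingWindowLimiter` is a hypothetical helper, not part of the service's SDK, and the caller passes in timestamps (e.g. from `time.monotonic()`) so the logic stays deterministic and easy to test.

```python
from collections import deque
from typing import Deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls within any `window`-second span (hypothetical helper)."""

    def __init__(self, limit: int, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        self._stamps: Deque[float] = deque()

    def _prune(self, now: float) -> None:
        # Drop timestamps that have slid out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()

    def allow(self, now: float) -> bool:
        """Record and permit a call at time `now` if the quota allows it."""
        self._prune(now)
        if len(self._stamps) < self.limit:
            self._stamps.append(now)
            return True
        return False

    def wait_time(self, now: float) -> float:
        """Seconds to wait before the next call would be allowed."""
        self._prune(now)
        if len(self._stamps) < self.limit:
            return 0.0
        return self.window - (now - self._stamps[0])
```

Before each API call, check `allow(time.monotonic())`. If it returns False, sleep for `wait_time(...)` seconds and try again. For the starter tier, construct it with `limit=15` instead.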
Lingua Franca is our multilingual Natural Language Processing library. It allows Mycroft to both understand and respond with naturally expressed entities such as numbers, dates and times. It is a stand-alone, ready-to-use Python module and currently supports Danish, Dutch, English, French, German, Hungarian, Italian, Portuguese, Spanish, and Swedish. It provides heuristic parsing routines to extract numbers, dates, times, or durations from a spoken-language transcription; natural-language formatters for numbers, dates, times and durations; and utilities for working with lists in multiple languages. It can reformat figures so a speech synthesizer pronounces them better, and it can extract information from text to help figure out what the user wants and grab what's needed to do it.
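To give a feel for what "heuristic parsing" of spoken numbers means, here is a toy, English-only sketch. It is not Lingua Franca's code or API. It handles only whole numbers up to the thousands, and `extract_number` here is a hypothetical stand-alone function. It accumulates the first contiguous run of number words in the text.

```python
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}


def extract_number(text):
    """Return the first spelled-out whole number in `text`, or None (toy heuristic)."""
    total = current = 0
    found = False
    for token in text.lower().replace("-", " ").split():
        if token in UNITS:
            current += UNITS[token]
        elif token in TENS:
            current += TENS[token]
        elif token == "hundred":
            current = max(current, 1) * 100  # "a hundred" -> 100
        elif token == "thousand":
            total += max(current, 1) * 1000
            current = 0
        elif token == "and" and found:
            continue  # "two hundred and five"
        else:
            if found:
                break  # end of the first run of number words
            continue
        found = True
    return total + current if found else None


print(extract_number("set a timer for twenty five minutes"))  # 25
```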
Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides state-of-the-art general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBERT, XLNet, CTRL...) for Natural Language Understanding (NLU) and Natural Language Generation (NLG), with 32+ pretrained models in 100+ languages and deep interoperability between TensorFlow 2.0 and PyTorch.
textacy is a Python library for performing a variety of natural language processing (NLP) tasks, built on the high-performance spaCy library. With the fundamentals (tokenization, part-of-speech tagging, dependency parsing, etc.) delegated to another library, textacy focuses primarily on the tasks that come before and after. It abstracts away the boilerplate so you can focus on the stuff you actually care about.
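The "tasks that come before" are mostly text cleanup ahead of parsing. textacy ships its own preprocessing functions. The sketch below does not use them. It only illustrates, with the standard library, the pattern of composing small cleaning steps into one callable. The function names and the `_URL_` placeholder are hypothetical.

```python
import re
import unicodedata
from functools import reduce
from typing import Callable

URL_RE = re.compile(r"https?://\S+|www\.\S+")


def normalize_unicode(text: str) -> str:
    # Compose characters, e.g. "e" + combining acute -> "é".
    return unicodedata.normalize("NFC", text)


def replace_urls(text: str, repl: str = "_URL_") -> str:
    return URL_RE.sub(repl, text)


def normalize_whitespace(text: str) -> str:
    # Collapse runs of spaces, tabs and newlines into single spaces.
    return " ".join(text.split())


def make_pipeline(*steps: Callable[[str], str]) -> Callable[[str], str]:
    """Chain text -> text steps left to right into a single function."""
    def run(text: str) -> str:
        return reduce(lambda acc, step: step(acc), steps, text)
    return run


preprocess = make_pipeline(normalize_unicode, replace_urls, normalize_whitespace)
print(preprocess("Read   this:\nhttps://example.com/x  now"))  # Read this: _URL_ now
```

The cleaned string would then go to spaCy for tokenization and parsing.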
A bot implemented as a GitHub App that analyzes a user's interactions elsewhere on GitHub and uses sentiment analysis to estimate how toxic that user is likely to be in their interactions with your project.
Uses the Probot framework.
An F/OSS machine translation platform that seems to want to give Google Translate a run for its money. Rather than training models on large corpora, it is primarily rule-based: each language pair is built from community-maintained dictionaries and transfer rules, and the language-pair data can be installed separately. Aims to be self-hosted.
Installation docs: http://wiki.apertium.org/wiki/Installation
An NLP deep learning toolkit for building training pipelines. Tries to minimize the effort for constructing the training and inference stages. Defines modular building blocks of neural network components, and a suite of NLP models. The end goal is to make building a neural network as easy as playing with Legos. Supports English and Chinese.
A deep learning NLP modeling framework based on PyTorch. Text classifiers, sequence taggers, joint intent-slot models.
VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media. It is fully open-sourced under the MIT License. Incorporated into NLTK.
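A "lexicon and rule-based" scorer looks up each word's valence, adjusts it with rules, sums, and squashes the sum into a bounded score. The toy sketch below follows that shape, loosely modelled on VADER. It assumes VADER's compound normalization `x / sqrt(x^2 + alpha)` with `alpha = 15`, plus a negation rule that scales valence by -0.74 when a negation word appears in the three preceding tokens. The tiny lexicon's values are illustrative, not copied from VADER's lexicon. For real use, prefer `SentimentIntensityAnalyzer` from the vaderSentiment package or from NLTK.

```python
import math

# Illustrative valence values (hypothetical, not VADER's actual lexicon).
LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "terrible": -2.1, "love": 3.2}
NEGATIONS = {"not", "never", "no", "isn't", "don't"}
NEGATION_SCALAR = -0.74  # flips and damps valence after a negation


def normalize(score: float, alpha: float = 15.0) -> float:
    """Map an unbounded valence sum into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


def compound(text: str) -> float:
    tokens = text.lower().split()
    total = 0.0
    for i, tok in enumerate(tokens):
        valence = LEXICON.get(tok.strip(".,!?"), 0.0)
        # Rule: a negation within the three preceding tokens flips the sign.
        if any(t in NEGATIONS for t in tokens[max(0, i - 3):i]):
            valence *= NEGATION_SCALAR
        total += valence
    return normalize(total)


print(compound("this is good") > 0, compound("this is not good") < 0)  # True True
```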
spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest research, and was designed from day one to be used in real products. spaCy comes with pre-trained statistical models and word vectors, and currently supports tokenization for 45+ languages. It advertises the fastest syntactic parser in the world, convolutional neural network models for tagging, parsing and named entity recognition, and easy deep learning integration. It's commercial open-source software, released under the MIT license.
Snips NLU (Natural Language Understanding) is a Python library that lets you parse sentences written in natural language and extract structured information. Behind every chatbot and voice assistant lies a common piece of technology: Natural Language Understanding (NLU). Any time a user interacts with an AI using natural language, their words need to be translated into a machine-readable description of what they meant. The NLU engine first detects what the user intends (the intent), then extracts the parameters of the query (called slots). The developer can then use this to determine the appropriate action or response.
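The intent-then-slots flow can be illustrated with a rule-based stand-in. This is not how Snips NLU works internally, since Snips NLU is trained on annotated example utterances. The sketch only shows the shape of the result a developer would act on: an intent name plus named slots with character offsets. The intent names, regex patterns and `parse` function are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Slot:
    name: str   # slot name, e.g. "room"
    value: str  # text that filled the slot
    start: int  # character offsets into the input
    end: int


@dataclass
class ParseResult:
    intent: Optional[str]
    slots: List[Slot] = field(default_factory=list)


# Hypothetical intents: one pattern per intent, slots as named groups.
PATTERNS = {
    "turnLightOn": re.compile(r"turn on the lights? in the (?P<room>\w+)"),
    "setTemperature": re.compile(r"set the temperature to (?P<degrees>\d+)"),
}


def parse(text: str) -> ParseResult:
    """Detect the intent first, then pull out its slots."""
    lowered = text.lower()  # same length as text, so offsets still line up
    for intent, pattern in PATTERNS.items():
        match = pattern.search(lowered)
        if match:
            slots = [Slot(name, value, match.start(name), match.end(name))
                     for name, value in match.groupdict().items()]
            return ParseResult(intent, slots)
    return ParseResult(None)  # no intent recognized


result = parse("Please turn on the lights in the kitchen")
print(result.intent, [(s.name, s.value) for s in result.slots])
```

The developer then dispatches on `result.intent` and reads the slot values to decide what to do.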
A fuzzy string matching module for Python. Seems fairly smart and designed to be practical. It can also use python-Levenshtein, if installed, to speed up matching. Looks very helpful for searching on arbitrary strings. It does the scoring for you: each comparison returns a similarity percentage from 0 to 100.
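Percentage similarity scores of this kind can be sketched with the standard library's `difflib.SequenceMatcher`. This is a stand-in, not the module's own code. `ratio`, `token_sort_ratio` and `best_match` are hypothetical helpers showing a plain score, a word-order-insensitive score, and picking the best candidate from a list.

```python
from difflib import SequenceMatcher
from typing import Iterable, Tuple


def ratio(a: str, b: str) -> int:
    """Similarity of two strings as an integer percentage (0-100)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def _sorted_tokens(s: str) -> str:
    return " ".join(sorted(s.lower().split()))


def token_sort_ratio(a: str, b: str) -> int:
    """Like ratio(), but ignores word order and case."""
    return ratio(_sorted_tokens(a), _sorted_tokens(b))


def best_match(query: str, choices: Iterable[str]) -> Tuple[str, int]:
    """Return the highest-scoring choice and its score (choices must be non-empty)."""
    return max(((c, token_sort_ratio(query, c)) for c in choices),
               key=lambda pair: pair[1])


print(best_match("new york mets", ["boston red sox", "mets new york"]))  # ('mets new york', 100)
```

Sorting tokens before comparing is a cheap way to make "New York Mets" and "Mets New York" count as identical.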