An online book about designing human-centered AI products.
spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on recent research and was designed from day one to be used in real products. spaCy comes with pre-trained statistical models and word vectors, and currently supports tokenization for 45+ languages. It features one of the fastest syntactic parsers available, convolutional neural network models for tagging, parsing, and named entity recognition, and easy deep learning integration. It's commercial open-source software, released under the MIT license.
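As a minimal sketch of getting started (assuming spaCy is installed; a blank pipeline uses spaCy's rule-based tokenizer without downloading one of the pre-trained models, which add tagging, parsing, and NER on top):

```python
import spacy

# spacy.blank("en") builds an English pipeline with language-specific
# tokenizer rules but no trained components, so nothing to download.
nlp = spacy.blank("en")
doc = nlp("spaCy's tokenizer handles punctuation, don't worry.")
tokens = [t.text for t in doc]
```

English tokenizer exceptions split contractions like "don't" into "do" and "n't", and punctuation becomes its own token.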
TabNine is an all-language autocompleter. It uses machine learning to provide responsive, reliable, and relevant suggestions by analyzing the earlier text in the file being edited, which makes it largely language-agnostic. It respects .gitignore so that it only analyzes source code. It can be plugged into VS Code, Sublime, vim, and Atom, with other text editors on the way. The full version costs US$29 for a per-user (not per-machine) license.
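TabNine's real engine is a learned model; purely as a toy illustration of the "analyze earlier text in the file" idea, here is a hypothetical frequency-based prefix completer in stdlib Python:

```python
import re
from collections import Counter

def suggest(earlier_text, prefix, k=3):
    """Toy completer: rank words already seen earlier in the file
    by frequency among those matching the typed prefix.
    (Illustrative only; not how TabNine actually works.)"""
    words = re.findall(r"[A-Za-z_]\w*", earlier_text)
    counts = Counter(w for w in words if w.startswith(prefix) and w != prefix)
    return [w for w, _ in counts.most_common(k)]
```

Because it only looks at tokens, the same trick works on any language's source text, which is the language-agnostic property described above.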
I just wanted to let everyone know that I have built a chatbot to help answer Mycroft-related questions. I will be adding a lot to it over the coming weeks, allowing it to answer questions about the project and eventually troubleshoot some issues.
OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.
Written in Java 8.
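One of OpenRefine's documented cleaning techniques is "key collision" clustering with a fingerprint key (trim, lowercase, strip punctuation, then sort and dedupe tokens). A minimal sketch of that idea in Python, not OpenRefine's own Java code:

```python
import re
from collections import defaultdict

def fingerprint(value):
    # OpenRefine-style fingerprint key: trim, lowercase, drop
    # punctuation, then sort and dedupe the remaining tokens.
    cleaned = re.sub(r"[^\w\s]", "", value.strip().lower())
    tokens = re.split(r"\s+", cleaned)
    return " ".join(sorted(set(t for t in tokens if t)))

def cluster(values):
    # Values that collide on the same fingerprint are likely
    # variant spellings of the same entry.
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]
```

For example, "New York", "new york ", and "York, New" all reduce to the key "new york" and land in one cluster, ready for merging.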
In this post, we’ll look at how to use a deep learning model to train a chatbot on my past social media conversations, in the hope of getting the chatbot to respond to messages the way that I would.
There are dozens of packages for NLP out there… but you’ll cover all the important bases once you master a handful of them. This is an opinionated guide that features the 5 Python NLP libraries we’ve found to be the most useful.
An open source semantic network (Creative Commons license) for AI and ML development. Semantic nets encode the meaning of words and concepts for information processing systems. Seems to encompass several spoken languages. Curated to avoid stereotypes. Still an active project. Data elements have a concept of external URLs, which link to other data sources with machine-parseable data related to that element.
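The data model described above can be sketched as a tiny triple store: concepts are nodes, labeled edges encode relations, and each concept can carry external URLs pointing at related machine-parseable data. All names here are illustrative, not the project's actual API:

```python
class SemanticNet:
    """Toy semantic network: (subject, relation, object) triples
    plus per-concept external URLs, as described above."""
    def __init__(self):
        self.edges = []     # list of (subject, relation, object)
        self.external = {}  # concept -> list of external URLs

    def add(self, subj, rel, obj):
        self.edges.append((subj, rel, obj))

    def link(self, concept, url):
        self.external.setdefault(concept, []).append(url)

    def related(self, concept):
        # All outgoing relations from a concept.
        return [(r, o) for s, r, o in self.edges if s == concept]
```

An information processing system can then answer "what is a dog?" by walking IsA edges, and follow the external URLs when it needs richer data from linked sources.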
A step by step process for setting up and using Mycroft for everyday tasks.
Google has open-sourced DeepDream, the neural network image analysis application that made a splash by generating its own images. Here it is, and it tells you what the dependencies are.
In response to the plethora of closed-source and API-only neural network and machine learning software out there, the GNU Project has developed Gneural Network, an F/OSS framework that helps developers build their own projects. Written in C and developed for portability, it has its own scripting language but can also be referenced as a library from your own code.
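The building block such a framework provides is the trainable artificial neuron. As a rough illustration of the concept (in Python for readability; this is not Gneural Network's actual interface), a single perceptron trained with the classic update rule:

```python
def train_perceptron(samples, epochs=20, lr=0.1):
    """Train one two-input perceptron: nudge the weights by
    lr * error whenever the thresholded output is wrong."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - out
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
```

On a linearly separable problem like the AND function, the perceptron rule is guaranteed to converge; frameworks like Gneural Network wire many such units into networks.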
Free datasets made available by Amazon. Stuff like an atlas of the Galactic Plane, NASA NEX data, the Human Microbiome Project, the Enron emails, Freebase, and the Marvel Universe's social graph. The Google Books Ngrams corpus is in here, also, alongside the Westbury Lab USENET corpus.
A massive corpus of annotated and tagged ngrams for use in machine learning and NLP. Free to download. It'll take time to grab it all because it's so finely split up. Creative Commons licensed (BY-NC-SA v3). Relationships between parts of speech and words are broken out, also.
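The files are plain tab-separated text. Assuming the published Google Books Ngrams v2 layout (ngram, year, match count, volume count per line, with part-of-speech tags suffixed to tokens like "run_VERB"), one line can be parsed as:

```python
def parse_ngram_line(line):
    """Parse one assumed Ngrams-v2 line:
    ngram TAB year TAB match_count TAB volume_count.
    POS annotations ride along inside the ngram field itself."""
    ngram, year, matches, volumes = line.rstrip("\n").split("\t")
    return ngram, int(year), int(matches), int(volumes)
```

Because the corpus is split across many such files, a downloader can stream them one line at a time rather than loading everything at once, which is what makes grabbing the whole thing slow but tractable.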
An awesome list of resources for building bots.