You'll be familiar with web bugs, the transparent images which track when someone opens an email. They work by embedding a unique URL in a page's image tag, and monitoring incoming GET requests.
Imagine doing that, but for file reads, database queries, process executions or patterns in log files. Canarytokens does all this and more, letting you implant traps in your production systems rather than setting up separate honeypots.
Canarytokens are a free, quick, painless way to help defenders discover they've been breached (by having attackers announce themselves).
Token types include web bugs, DNS hostnames, fake AWS keys, login certificates, commands, documents, API keys, and more.
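For the curious, the web bug mechanism described above (unique URL in an image tag, watch for the GET) can be sketched in a few lines of standard-library Python. This is only an illustration of the idea, not how Canarytokens itself is implemented; the `/t/<token>.gif` URL layout, the in-memory `TOKENS` registry, and all function names are made up for the example.

```python
import base64
import secrets
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer

# A widely used 1x1 transparent GIF, decoded from base64.
PIXEL = base64.b64decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")

# Hypothetical in-memory registry: token -> note about where it was planted.
TOKENS = {}


def new_token(note):
    """Create an unguessable token and remember what it was planted in."""
    token = secrets.token_urlsafe(16)
    TOKENS[token] = note
    return token


def image_tag(base_url, token):
    """Build the <img> tag to embed in the email or page being tracked."""
    return f'<img src="{base_url}/t/{token}.gif" width="1" height="1" alt="">'


def token_from_path(path):
    """Return the registered token named in a request path, or None."""
    if not (path.startswith("/t/") and path.endswith(".gif")):
        return None
    token = path[len("/t/"):-len(".gif")]
    return token if token in TOKENS else None


class BugHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        token = token_from_path(self.path)
        if token is not None:
            # The "alert": a known token was fetched, so someone opened the bait.
            when = datetime.now(timezone.utc).isoformat()
            print(f"{when} {TOKENS[token]!r} fetched from {self.client_address[0]}")
        # Always answer with the pixel so the request looks unremarkable.
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(PIXEL)))
        self.send_header("Cache-Control", "no-store")
        self.end_headers()
        self.wfile.write(PIXEL)

    def log_message(self, format, *args):
        pass  # silence the default per-request access log


def serve(port=8000):
    """Run the listener until interrupted."""
    HTTPServer(("", port), BugHandler).serve_forever()
```

To try it, call `new_token("some note")`, paste the output of `image_tag("http://localhost:8000", token)` into an HTML file, run `serve()`, and open the file in a browser. A real deployment would send an email or webhook instead of printing, and would keep the tokens somewhere more durable than a dict.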
They make and sell replacement and upgraded game pieces for board games of all kinds.
textacy is a Python library for performing a variety of natural language processing (NLP) tasks, built on the high-performance spaCy library. With the fundamentals (tokenization, part-of-speech tagging, dependency parsing, etc.) delegated to another library, textacy focuses primarily on the tasks that come before and follow after. It abstracts away the boilerplate so you can get to the stuff you actually care about.
Quickstart: https://chartbeat-labs.github.io/textacy/getting_started/quickstart.html
spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest research, and was designed from day one to be used in real products. spaCy comes with pre-trained statistical models and word vectors, and currently supports tokenization for 45+ languages. It features the fastest syntactic parser in the world, convolutional neural network models for tagging, parsing, and named entity recognition, and easy deep learning integration. It's commercial open-source software, released under the MIT license.
Somebody I know on Mastodon threw together a quick utility that picks keywords out of documents you feed it and throws them into a Neo4j graph database for indexing. Written in Rust.
prose is a natural language processing library (English only, at the moment) in pure Go. It supports tokenization, segmentation, part-of-speech tagging, and named-entity extraction. It parses English text and can natively extract e-mail addresses, hashtags, @mentions, URLs, and emoticons. It can tag segmented and analyzed text by part of speech, including punctuation marks, and can identify types of entities (people, places). It also has the option to build and train custom models.