Once we start editing DNA on a large scale, we will need to keep track of what we do: maintain revision histories, comment the new genes and add copyright notices. This is a suggested standard for entering ASCII information into the genome:
We will use 4-base codons to encode 7-bit ASCII. I know it is a bit primitive, but I think it does well enough and we might want to use the extra bit (see below). Each base codes two bits, and the complementary base codes the inverse:
A: 00 G: 01 C: 10 T: 11
Thus each character will be coded as four bases, read in the canonical 5'->3' direction.
The letters 'DNA' will thus become
01000100 01001110 01000001
G A G A G A T C G A A G
or GAGAGATCGAAG.
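The mapping above can be sketched in a few lines of Python. This is only an illustration of the table as written; the function name `encode_comment` is made up for this example.

```python
# Two bits per base, exactly as in the table above.
BASE_FOR_BITS = {"00": "A", "01": "G", "10": "C", "11": "T"}

def encode_comment(text):
    """Encode 7-bit ASCII text as DNA, four bases per character (5'->3')."""
    bases = []
    for ch in text:
        code = ord(ch)
        if code > 0x7F:
            raise ValueError(f"not 7-bit ASCII: {ch!r}")
        bits = format(code, "08b")  # the leading (8th) bit is always 0
        for i in range(0, 8, 2):
            bases.append(BASE_FOR_BITS[bits[i:i + 2]])
    return "".join(bases)

print(encode_comment("DNA"))  # GAGAGATCGAAG
```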
The problem when reading a DNA string is: which strand should we read? If we read the complementary strand, we will get the string inverted and backwards. But since we use 7-bit ASCII, we can test whether every 8th bit is a one or a zero, and deduce which strand we are on. The reading process thus tries out the eight starting frames, and chooses the one which gives an unbroken stretch of ones or zeros. If the stretch is all zeros, the bases are read and converted directly; if it is all ones, the bases are read to the end of the message, then inverted and reversed. Note that some errors become detectable this way, as interruptions in the stretch of identical bits.
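A rough sketch of such a reader follows. It assumes the input holds only the message, with at most a few stray bases at either end, and it treats the eight frames as four base offsets on each of the two strands, which is one reading of the text rather than a fixed part of the proposal. Instead of looking for a stretch of ones, it takes the reverse complement and looks for zeros there, which amounts to the same test. All function names are hypothetical.

```python
BITS_FOR_BASE = {"A": "00", "G": "01", "C": "10", "T": "11"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the other strand, read in its own 5'->3' direction."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def decode_codons(seq):
    """Convert an in-frame sequence to text, four bases per character."""
    chars = []
    for i in range(0, len(seq) - 3, 4):
        bits = "".join(BITS_FOR_BASE[b] for b in seq[i:i + 4])
        chars.append(chr(int(bits, 2)))
    return "".join(chars)

def read_comment(seq):
    """Try both strands and all four base offsets; return text or None."""
    for candidate in (seq, reverse_complement(seq)):
        for offset in range(4):
            framed = candidate[offset:]
            framed = framed[:len(framed) - len(framed) % 4]
            # A or G in first position means the 8th (high) bit is zero.
            if framed and all(framed[i] in "AG" for i in range(0, len(framed), 4)):
                return decode_codons(framed)
    return None

print(read_comment("GAGAGATCGAAG"))  # DNA
print(read_comment("CTTCGATCTCTC"))  # DNA (the complementary strand)
```

A short message can pass the test in a wrong frame by chance, so a real reader would want a longer stretch, or the delimiting markers, before trusting the result.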
To delineate the comments, we need markers. A standard could be the sequence corresponding to "COMMENT COMMENT COMMENT..." repeated a number of times (we don't want to use a long stretch of similar bases, since it would influence the bending of DNA, which might lead to unwanted effects).
A problem is that we might accidentally create active regions in the DNA with these comments; ideally we should choose a coding that minimizes the biological effects of the comment. Methylating the cytosine bases will also inactivate the comment. If it can be marked as an intron it could also be placed inside exons, making sure the comment will follow the gene it belongs to.
Thanks to John D. Gleason for the methylating and intron ideas.
Acrossword is a small async wrapper around the SentenceBERT library. It has a convenient object-oriented API with two main purposes:
semantic search
zero-shot text classification
It's useful if you want to avoid larger bloated libraries with capabilities you don't need, and comes with zero fuss.
A list of command line tools for manipulating structured text data.
Transform any image into a prime number that looks like the image if glanced upon from far away.
buku is a powerful bookmark manager written in Python3 and SQLite3. When I started writing it, I couldn't find a flexible command-line solution with a private, portable, merge-able database along with seamless GUI integration. For those who prefer the GUI, the bukuserver sub-application exposes a browsable front-end on a local web host server.
buku can auto-import bookmarks from your browser(s) or fetch the title and description of a bookmarked url from the web. You can use your favourite editor to compose and update bookmarks.
Multiple search options, including regex and a deep scan mode.
Here's how to proxy the server behind nginx: https://github.com/jarun/buku/blob/master/docker-compose/data/nginx/nginx.conf
Profanity is a console based XMPP client written in C using ncurses and libstrophe, inspired by Irssi. Cross platform, lightweight, very handy. Takes a bit of fiddling to manage multiple accounts, though.
MicroWeb is a web browser for DOS! It is a 16-bit real mode application, designed to run on minimal hardware. Targeted at the Intel 8088 or later. CGA compatible (backwards compatible with EGA and VGA). Mouse not required. No HTTPS, CSS, or JavaScript.
Somebody wrote a clone of vim entirely in Python. Already has many of the features of mainline Vim because it's easier to write them in Python than it is in C. Can integrate additional functionality (like Jedi autocompletion of Python) by installing additional Python modules. Self-hosting. PoC for the prompt_toolkit Python module.
High performance NLP models as a service. Pre-trained. You can upload and run your own spaCy models as well. Seems to be GPU accelerated on the back-end because they're an nVidia partner.
Named entity recognition, classification, summarization, question in context answering, sentiment analysis, part of speech tagging.
Free tier: All pre-trained models, 3 API requests per minute.
Starter tier: All pre-trained models, 15 requests per minute, $39 USD/month.
Recovers passwords from pixelized screenshots.
This implementation works on pixelized images that were created with a linear box filter.
In this article I cover background information on pixelization and similar research.
Requires that the user supply a De Bruijn sequence of characters that could be expected to appear in the obfuscated text.
It won't be perfect but it'll probably get you within spitting distance.
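For anyone unfamiliar with the term: a De Bruijn sequence over an alphabet contains every possible combination of n characters exactly once (reading cyclically), so it is a compact way to cover every character pairing that might appear. The sketch below uses the standard Lyndon-word construction; it is not taken from the tool itself, and the alphabet and n are placeholders to pick for your own case.

```python
def de_bruijn(alphabet, n):
    """Return a cyclic De Bruijn sequence of order n over the alphabet."""
    k = len(alphabet)
    a = [0] * (k * n)
    sequence = []

    def db(t, p):
        if t > n:
            # Emit the prefix only when its period divides n.
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)

# Every pair of the characters "abc" appears once, wrapping around the end.
print(de_bruijn("abc", 2))
```

To see every combination in a flat string rather than a cycle, append the first n-1 characters to the end.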
ART is a Python lib for converting text to ASCII art. Turn regular old text into rendered ASCII art with a single function. Also generates textmoji from names (aprint("butterfly")). Random art (randart()) is also possible. You can also specify the font used and how it's decorated (if you want). Can even be used as a CLI tool.
Bombadillo is a non-web browser, designed for a growing list of protocols operating outside of the web. This includes Gopher, Gemini, Finger, and your local file system. Other protocols are available as add-ons. Think Lynx, but for everything else.
Source code: https://tildegit.org/sloum/bombadillo
Olipy is a Python library for artistic text generation. Unlike most software packages, which have a single, unifying purpose, Olipy is more like a set of art supplies. Each module is designed to help you achieve a different aesthetic effect. Different kinds of text generators and corruptors. Generates and riffs on prerecorded dialogs. Generate different kinds of names and titles.
Decode UTF-8 into ASCII and vice versa.
A perverse way to make your HTML look like markdown, purely via CSS.
Use the markdown.css file to make regular HTML look like plain-text markdown. No JavaScript hacks are needed.
spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest research, and was designed from day one to be used in real products. spaCy comes with pre-trained statistical models and word vectors, and currently supports tokenization for 45+ languages. It features the fastest syntactic parser in the world, convolutional neural network models for tagging, parsing and named entity recognition and easy deep learning integration. It's commercial open-source software, released under the MIT license.
An extremely high resolution graphics-to-text art utility/library/converter. Aims to be much better than aalib or caca. You can even watch videos rendered into text in realtime. Tries to be cross-platform.
A text-based signal and spectrum analyzer for RTL-SDR radios. Everything is done in ASCII in realtime. Allows you to tune the radio as well as watch it.
This source code is an implementation of the TextRank algorithm (automatic summarization) in PHP7 strict mode. It can summarize a text, an article for example, into a short paragraph. Before it starts summarizing, it removes the junk words that are defined in the Stopwords namespace. It can be extended to support other languages.
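The general idea behind TextRank can be sketched in Python (not PHP, and not this library's API): drop stopwords, score how much each pair of sentences overlaps, then run a PageRank-style iteration so that sentences similar to many others float to the top. The stopword list, the function name `summarize` and its parameters are all placeholders for illustration.

```python
import math
import re

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "on"}

def summarize(text, count=1, damping=0.85, rounds=30):
    """Return the `count` highest-ranked sentences, in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [set(re.findall(r"[a-z']+", s.lower())) - STOPWORDS for s in sentences]
    n = len(sentences)

    def similarity(i, j):
        # Shared words, normalised by sentence length.
        if len(words[i]) < 2 or len(words[j]) < 2:
            return 0.0
        overlap = len(words[i] & words[j])
        return overlap / (math.log(len(words[i])) + math.log(len(words[j])))

    weights = [[similarity(i, j) if i != j else 0.0 for j in range(n)] for i in range(n)]
    totals = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(rounds):
        # Each sentence collects score from its neighbours, PageRank-style.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / totals[j] * scores[j]
                for j in range(n) if totals[j] > 0
            )
            for i in range(n)
        ]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:count]
    return [sentences[i] for i in sorted(top)]

text = ("Solar panels convert sunlight into electricity. Sunlight is free. "
        "Electricity powers homes. Panels need cleaning.")
print(summarize(text))
```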
An animated avatar that responds to text field interactions.