This is a tarpit, modeled after Nepenthes, intended to catch unwelcome web crawlers, but with a slightly different, more aggressive intended usage scenario. The core idea is to configure a reverse proxy to serve content generated by iocaine to AI crawlers, but normal content to every other visitor. This differs from Nepenthes, where the idea is to link to it, and trap crawlers that way. Not with iocaine, where the trap is laid by the reverse proxy.
iocaine does not try to slow crawlers. It does not try to waste their time that way - that is left up to the reverse proxy. iocaine is purely about generating garbage.
This is deliberately malicious software, intended to cause harm. Do not deploy if you aren't fully comfortable with what you are doing. LLM scrapers are relentless and brutal, they will place additional burden on your server, even if you only serve static content. With iocaine, there's going to be increased computing power used. It's highly recommended to implement rate limits at the reverse proxy level, such as with the caddy-ratelimit plugin, if using Caddy.
Entrapment is done by the reverse proxy. Anything that ends up being served by iocaine will be trapped there: there are no outgoing links. Be careful what you route towards it.
This is a tarpit intended to catch web crawlers. Specifically, it's targetting crawlers that scrape data for LLM's - but really, like the plants it is named after, it'll eat just about anything that finds it's way inside.
It works by generating an endless sequences of pages, each of which with dozens of links, that simply go back into a the tarpit. Pages are randomly generated, but in a deterministic way, causing them to appear to be flat files that never change. Intentional delay is added to prevent crawlers from bogging down your server, in addition to wasting their time. Lastly, optional Markov-babble can be added to the pages, to give the crawlers something to scrape up and train their LLMs on, hopefully accelerating model collapse.
WARNING: THIS IS DELIBERATELY MALICIOUS SOFTWARE INTENDED TO CAUSE HARMFUL ACTIVITY. DO NOT DEPLOY IF YOU AREN'T FULLY COMFORTABLE WITH WHAT YOU ARE DOING.