2 results tagged tarpit
algernon/iocaine - The deadliest poison known to AI. https://git.madhouse-project.org/algernon/iocaine
Tue 21 Jan 2025 12:51:34 PM PST

This is a tarpit, modeled after Nepenthes, intended to catch unwelcome web crawlers, but with a slightly different, more aggressive intended usage scenario. The core idea is to configure a reverse proxy to serve content generated by iocaine to AI crawlers, but normal content to every other visitor. This differs from Nepenthes, where the idea is to link to the tarpit and trap crawlers that way. With iocaine, the trap is laid by the reverse proxy instead.
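The routing decision described above can be sketched in a few lines. This is a minimal illustration, not iocaine's code or any real proxy's configuration: the function name `pick_upstream`, both upstream addresses, and the short list of user-agent markers are assumptions chosen for the example.

```python
# Hypothetical sketch of the reverse proxy's routing decision.
# All names and addresses below are assumptions for illustration only.

# Illustrative, incomplete list of substrings seen in AI crawler user agents.
AI_CRAWLER_MARKERS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider")

GARBAGE_UPSTREAM = "http://127.0.0.1:9000"  # assumed address of the garbage generator
REAL_UPSTREAM = "http://127.0.0.1:8080"     # assumed address of the real site


def pick_upstream(user_agent: str) -> str:
    """Return the upstream that should serve a request with this User-Agent."""
    ua = user_agent.lower()
    if any(marker.lower() in ua for marker in AI_CRAWLER_MARKERS):
        return GARBAGE_UPSTREAM
    return REAL_UPSTREAM


if __name__ == "__main__":
    print(pick_upstream("Mozilla/5.0 (compatible; GPTBot/1.0)"))
    print(pick_upstream("Mozilla/5.0 (X11; Linux x86_64) Firefox/133.0"))
```

Matching on the User-Agent header is only one possible signal, and a crawler can change it freely, so a real deployment would likely combine several checks.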

iocaine does not try to slow crawlers. It does not try to waste their time that way - that is left up to the reverse proxy. iocaine is purely about generating garbage.

This is deliberately malicious software, intended to cause harm. Do not deploy if you aren't fully comfortable with what you are doing. LLM scrapers are relentless and brutal: they will place additional burden on your server, even if you only serve static content. Running iocaine will use additional computing power on top of that. It's highly recommended to implement rate limits at the reverse proxy level, such as with the caddy-ratelimit plugin if using Caddy.
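To make the rate-limiting advice concrete, here is a minimal token-bucket sketch. It does not show how caddy-ratelimit works or is configured; the class name `TokenBucket` and its parameters are hypothetical, and a real setup would keep one bucket per client address inside the proxy.

```python
import time


class TokenBucket:
    """Hypothetical token-bucket limiter: `rate` tokens refill per second,
    up to `capacity`. Each allowed request spends one token."""

    def __init__(self, rate: float, capacity: float, now=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._now = now            # injectable clock, handy for testing
        self._tokens = capacity    # start with a full bucket
        self._last = now()

    def allow(self) -> bool:
        """Return True if the request may proceed, False if it is over the limit."""
        current = self._now()
        elapsed = current - self._last
        self._last = current
        # Refill for the time that has passed, never exceeding capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=1.0, capacity=2.0)
    print([bucket.allow() for _ in range(4)])
```

A request that gets `False` would typically be answered with HTTP 429 before it ever reaches the garbage generator, which keeps the extra computing cost bounded.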

Entrapment is done by the reverse proxy. Anything that ends up being served by iocaine will be trapped there: there are no outgoing links. Be careful what you route towards it.

rust ai bots defense tarpit
Nepenthes https://zadzmo.org/code/nepenthes/
Sat 18 Jan 2025 08:48:42 PM PST

This is a tarpit intended to catch web crawlers. Specifically, it's targeting crawlers that scrape data for LLMs - but really, like the plants it is named after, it'll eat just about anything that finds its way inside.

It works by generating an endless sequence of pages, each with dozens of links that simply lead back into the tarpit. Pages are randomly generated, but in a deterministic way, causing them to appear to be flat files that never change. Intentional delay is added to prevent crawlers from bogging down your server, in addition to wasting their time. Lastly, optional Markov-babble can be added to the pages, to give the crawlers something to scrape up and train their LLMs on, hopefully accelerating model collapse.
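One way to get pages that are random yet never change is to seed the random generator from the requested path. The sketch below illustrates that idea only; it is not Nepenthes's implementation (which is written in Lua), and the names `render_page`, `WORDS`, and the `/tarpit` prefix are assumptions.

```python
import hashlib
import random

# Hypothetical vocabulary used to build link paths.
WORDS = ("amber", "basalt", "cedar", "delta", "ember", "fjord", "garnet", "harbor")


def render_page(path: str, links: int = 12, prefix: str = "/tarpit") -> str:
    """Build an HTML page whose content depends only on `path`.

    The same path always seeds the generator identically, so the page looks
    like a static file, and every link leads back under `prefix`.
    """
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    hrefs = [
        prefix + "/" + "/".join(rng.choice(WORDS) for _ in range(3))
        for _ in range(links)
    ]
    items = "\n".join(f'<li><a href="{h}">{h}</a></li>' for h in hrefs)
    return f"<html><body><ul>\n{items}\n</ul></body></html>"


if __name__ == "__main__":
    print(render_page("/tarpit/amber/cedar/delta", links=3))
```

The intentional delay mentioned above would sit in the request handler around a call like this, and the Markov-babble would be generated from the same seeded generator so that it stays stable across visits too.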

WARNING: THIS IS DELIBERATELY MALICIOUS SOFTWARE INTENDED TO CAUSE HARMFUL ACTIVITY. DO NOT DEPLOY IF YOU AREN'T FULLY COMFORTABLE WITH WHAT YOU ARE DOING.

lua llm spiders tarpit random generators