WarcDB is an SQLite-based file format that makes web crawl data easier to share and query. It is based on the standardized Web ARChive (WARC) format used by web archivers.
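To show why putting crawl data in SQLite makes it easy to query, here is a minimal Python sketch using the standard `sqlite3` module. The table name `response` and its columns are hypothetical simplifications made up for this example. The real WarcDB schema may differ, so inspect it (for instance with `.schema` in the `sqlite3` shell) before adapting the queries.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding a hypothetical, simplified table of
    WARC response records (this is NOT the real WarcDB schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE response (target_uri TEXT, http_status INTEGER, content_type TEXT)"
    )
    conn.executemany(
        "INSERT INTO response VALUES (?, ?, ?)",
        [
            ("https://example.com/", 200, "text/html"),
            ("https://example.com/missing", 404, "text/html"),
            ("https://example.com/logo.png", 200, "image/png"),
        ],
    )
    return conn


def broken_uris(conn: sqlite3.Connection) -> list[str]:
    """Every captured URI whose HTTP status was an error (4xx or 5xx)."""
    rows = conn.execute(
        "SELECT target_uri FROM response WHERE http_status >= 400 ORDER BY target_uri"
    )
    return [uri for (uri,) in rows]


def counts_by_type(conn: sqlite3.Connection) -> dict[str, int]:
    """How many captured responses there are per Content-Type."""
    return dict(
        conn.execute("SELECT content_type, COUNT(*) FROM response GROUP BY content_type")
    )


if __name__ == "__main__":
    db = build_demo_db()
    print(broken_uris(db))  # ['https://example.com/missing']
    print(counts_by_type(db))
```

The same SQL runs unchanged in any SQLite client, which is the main appeal of shipping crawl data in this format.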
An online three-card tarot draw web page, written in plain JavaScript (no Node.js).
Online demo: https://lmorchard.github.io/tarot-thing/
Pyodide is a Python distribution for the browser and Node.js based on WebAssembly. It makes it possible to install and run Python packages in the browser with micropip. Any pure Python package with a wheel available on PyPI is supported, and many packages with C extensions have also been ported for use with Pyodide. It comes with a robust JavaScript ⟺ Python foreign function interface, so you can freely mix the two languages in your code with minimal friction. This includes full support for error handling (throw an error in one language, catch it in the other), async/await, and much more.
When used inside a browser, Python has full access to the Web APIs.
GitHub: https://github.com/pyodide/pyodide
Online REPL console: https://pyodide.org/en/stable/console.html
uBlacklist subscription list for developers.
Subscribe to this list to block useless websites, such as machine-translated Stack Overflow clones, from Google Search results.
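A rough sketch of how such a blocklist decides what to hide, assuming rules written as WebExtension-style match patterns like `*://*.example.com/*` (uBlacklist also accepts regular-expression rules, which this sketch ignores). The functions `match_pattern` and `is_blocked` are hypothetical illustrations, not uBlacklist's code, and the pattern handling is simplified.

```python
import re
from urllib.parse import urlsplit

# Hypothetical, simplified grammar for "scheme://host/path" match patterns.
PATTERN_RE = re.compile(r"(\*|https?)://(\*|(?:\*\.)?[^/*]+)(/.*)")


def match_pattern(pattern: str, url: str) -> bool:
    """Return True if url matches a simplified WebExtension-style match pattern."""
    m = PATTERN_RE.fullmatch(pattern)
    if m is None:
        raise ValueError(f"unsupported pattern: {pattern!r}")
    scheme, host, path = m.groups()
    parts = urlsplit(url)

    # "*" as the scheme means "http or https".
    allowed_schemes = ("http", "https") if scheme == "*" else (scheme,)
    if parts.scheme not in allowed_schemes:
        return False

    # "*.example.com" matches example.com itself and any subdomain of it.
    hostname = (parts.hostname or "").lower()
    if host.startswith("*."):
        base = host[2:].lower()
        if hostname != base and not hostname.endswith("." + base):
            return False
    elif host != "*" and hostname != host.lower():
        return False

    # In the path part, "*" matches any run of characters.
    path_re = ".*".join(re.escape(piece) for piece in path.split("*"))
    target = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    return re.fullmatch(path_re, target) is not None


def is_blocked(url: str, rules: list[str]) -> bool:
    """A search result is hidden if any rule matches its URL."""
    return any(match_pattern(rule, url) for rule in rules)


if __name__ == "__main__":
    rules = ["*://*.example-clone.com/*"]
    print(is_blocked("https://www.example-clone.com/questions/123", rules))  # True
    print(is_blocked("https://stackoverflow.com/questions/123", rules))  # False
```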
Transforms tkinter, Qt, Remi, and wxPython into portable, people-friendly Pythonic interfaces, which is especially handy if you primarily write CLI tools. It tries to make building GUIs for applications easy, because ordinarily the process sucks. It supports several toolkits, including Qt, wxPython, and Remi (if you want to turn something into a web app), and you can switch between them with a single line. No callback functions; that's all handled for you. Has a built-in debugger.
A company that does web scraping for you. Automatic retries, lots of proxies, geolocation, CAPTCHA bypass (eh?), and JavaScript support. Has a library of scrapers for different online services.
The free tier is limited to 1000 API calls. Several paid tiers offer more features.
CORS (Cross-Origin Resource Sharing) is hard. It's hard because it's part of how browsers fetch stuff, and that's a set of behaviours that started with the very first web browser over thirty years ago. Since then, it's been under constant development: adding features, improving defaults, and papering over past mistakes without breaking too much of the web.
Anyway, I figured I'd write down pretty much everything I know about CORS, and to make things interactive, I built an exciting new app.
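As a small illustration of the server side of that machinery, here is a simplified Python sketch of how a server might decide which CORS response headers to send, including for a preflight (an `OPTIONS` request carrying `Access-Control-Request-Method`). The allowlists and the `cors_headers` function are hypothetical, and the sketch ignores credentials and many other details of the Fetch specification.

```python
# Hypothetical server-side policy; a real deployment would load these from config.
ALLOWED_ORIGINS = {"https://app.example.com"}
ALLOWED_METHODS = {"GET", "POST", "PUT"}
ALLOWED_HEADERS = {"content-type", "authorization"}


def cors_headers(method: str, request_headers: dict[str, str]) -> dict[str, str]:
    """Return the CORS headers to add to a response (an empty dict means deny)."""
    # HTTP header names are case-insensitive, so normalize them for lookups.
    headers = {name.lower(): value for name, value in request_headers.items()}
    origin = headers.get("origin")
    if origin not in ALLOWED_ORIGINS:
        # Without Access-Control-Allow-Origin the browser refuses to expose
        # the response to the calling page's JavaScript.
        return {}

    # Echo the specific origin; Vary tells caches the response depends on it.
    result = {"Access-Control-Allow-Origin": origin, "Vary": "Origin"}

    requested_method = headers.get("access-control-request-method")
    if method.upper() == "OPTIONS" and requested_method is not None:
        # Preflight: the browser asks whether the real request would be allowed.
        if requested_method.upper() not in ALLOWED_METHODS:
            return {}
        requested_headers = [
            h.strip().lower()
            for h in headers.get("access-control-request-headers", "").split(",")
            if h.strip()
        ]
        if any(h not in ALLOWED_HEADERS for h in requested_headers):
            return {}
        result["Access-Control-Allow-Methods"] = ", ".join(sorted(ALLOWED_METHODS))
        if requested_headers:
            result["Access-Control-Allow-Headers"] = ", ".join(requested_headers)
        # Let the browser cache this preflight answer for 10 minutes.
        result["Access-Control-Max-Age"] = "600"
    return result


if __name__ == "__main__":
    print(cors_headers("OPTIONS", {
        "Origin": "https://app.example.com",
        "Access-Control-Request-Method": "PUT",
        "Access-Control-Request-Headers": "Content-Type",
    }))
```

Echoing the exact origin instead of `*` is a deliberate choice here: browsers reject `Access-Control-Allow-Origin: *` on credentialed requests, so an allowlist keeps that door open.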
Raindrop.io is the best place to keep all your favorite books, songs, articles or whatever else you come across while browsing.
We're not trying to reinvent the wheel; we're working on a tool that does everything you expect from a modern bookmark manager.
Collections of links. Folksonomy tags. Filters. Finds duplicates and broken links for you. Full text search. Automatically makes copies of every page you bookmark to prevent link rot.
Unlimited bookmarks, collections, and devices at the free level, with no time limit. Additional features (probably collaboration) at paid tiers.
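How Raindrop.io actually detects duplicates isn't described here. One plausible approach, sketched below as an assumption rather than a description of Raindrop's code, is to normalize each URL (lowercase the scheme and host, drop default ports, fragments, trailing slashes, and `utm_` tracking parameters) and group bookmarks that normalize to the same string. The `normalize` and `find_duplicates` names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize(url: str) -> str:
    """Reduce a URL to a canonical form so trivially different copies compare equal."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only if it is not the scheme's default.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"
    path = parts.path.rstrip("/") or "/"
    # Drop utm_* tracking parameters and sort the rest so order doesn't matter.
    params = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if not k.startswith("utm_")
    )
    # The fragment (#...) is discarded entirely.
    return urlunsplit((scheme, host, path, urlencode(params), ""))


def find_duplicates(urls: list[str]) -> list[list[str]]:
    """Group bookmarks whose normalized URLs collide; only groups of 2+ are returned."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return [group for group in groups.values() if len(group) > 1]
```

Normalizing aggressively catches more duplicates but risks merging pages that really differ (for example, sites where a trailing slash or query order matters), so a real tool would probably let you review matches rather than merge them silently.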
BadWolf is a minimalist and privacy-oriented WebKitGTK+ browser.
Privacy-oriented - No browser-level tracking, multiple ephemeral isolated sessions (one per new unrelated tab), JavaScript off by default.
Minimalist - Small codebase (~1 500 LoC), reuses existing components when available or makes them available.
Customizable - WebKitGTK native extensions, interface customizable through CSS.
Powerful & Usable - Stable user interface; the common shortcuts are available, and no vi-style modal editing or single-key shortcuts are used.
No annoyances - Dialogs are only used when required (save file, print, …), JavaScript popups open in a background tab.
Git repo: https://hacktivis.me/git/badwolf/
In the AUR.
A multithreaded hyperlink checker that crawls a site and looks for 404s. Unfortunately, it is no longer maintained and is written in Python 2. Still useful.
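The core idea of such a tool fits in a short Python 3 sketch: crawl pages on one host breadth-first, fetch each level concurrently with a thread pool, and record links that answer with an HTTP error. This is not the original tool's code; `find_broken_links`, `urllib_fetch`, and the pluggable `fetch` parameter are hypothetical names, the last one chosen so the crawler can be tested without a network. It reports any 4xx/5xx status (not just 404) and treats unreachable URLs as status 0.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.error import HTTPError
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen

# A fetcher takes a URL and returns (HTTP status, HTML body or None).
Fetcher = Callable[[str], tuple[int, Optional[str]]]


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href" and value)


def urllib_fetch(url: str) -> tuple[int, Optional[str]]:
    """Default fetcher using only the standard library."""
    try:
        with urlopen(url, timeout=10) as resp:
            is_html = "html" in resp.headers.get("Content-Type", "")
            body = resp.read().decode("utf-8", "replace") if is_html else None
            return resp.status, body
    except HTTPError as err:  # 4xx/5xx responses raise HTTPError
        return err.code, None
    except (OSError, ValueError):  # DNS failures, timeouts, malformed URLs
        return 0, None


def find_broken_links(start: str, fetch: Fetcher = urllib_fetch, workers: int = 8) -> dict[str, int]:
    """Crawl pages on start's host; map each failing URL to its status (0 = unreachable)."""
    site = urlparse(start).netloc
    seen = {start}
    frontier = [start]
    broken: dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            # Fetch every URL of the current breadth-first level in parallel.
            results = list(pool.map(fetch, frontier))
            next_frontier = []
            for url, (status, body) in zip(frontier, results):
                if status == 0 or status >= 400:
                    broken[url] = status
                    continue
                # External links are checked but never crawled further.
                if body is None or urlparse(url).netloc != site:
                    continue
                parser = LinkParser()
                parser.feed(body)
                for href in parser.links:
                    link, _fragment = urldefrag(urljoin(url, href))
                    if urlparse(link).scheme in ("http", "https") and link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return broken
```

Calling `find_broken_links("https://example.com/")` with the default fetcher makes real HTTP requests; passing a stub `fetch` (a function returning canned `(status, body)` pairs) lets you exercise the crawl logic offline.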
Join the most popular Internet of Things platform with free Cloud, iOS and Android mobile apps, Web dashboard, and Machine Learning. Has mobile apps for interacting with interfaced devices. Assemble custom apps with a drag-and-drop builder. If it's networked and you can mess with it, you can get it talking to Blynk.
If you want to use their service, developer accounts are free but limited to five (5) devices at a time. Paid service starts at US$415.
Open source: You can download the server's source code and run it yourself if you want. It's written in Java.
Wiby is a search engine for older-style web pages: lightweight and based on a subject of interest. It aims to build a web more reminiscent of the early internet.
Futuristic sci-fi and cyberpunk graphical user interface framework for web apps. If you ever wanted to build a theme that looks like JARVIS or something out of Blade Runner, this seems like a good place to start.
GitHub repo: https://github.com/arwes/arwes
Winamp 2 reimplemented for the browser in JavaScript. Load your local MP3s into the player (in your browser) and have a good time. Has the visualizations and even supports the old skins. Throw it on a web server and you're good to go.
Used by the Internet Archive as one of its online media players.
List of libraries, tools and APIs for web scraping and data processing.
A free guide to HTML5 <head> elements.
Buster is a browser extension which helps you to solve difficult captchas by completing reCAPTCHA audio challenges using speech recognition. Challenges are solved by clicking on the extension button at the bottom of the reCAPTCHA widget.
reCAPTCHA challenges remain a considerable burden on the web, delaying and often blocking our access to services and information depending on our physical and cognitive abilities, our social and cultural background, and the devices or networks we connect from.
The difficulty of captchas can be so out of balance that sometimes they seem friendlier to bots than they are to humans.
The goal of this project is to improve our experience with captchas, by giving us easy access to solutions already utilized by automated systems.
A collection of awesome web crawl, scraping, and spidering projects in different languages.
A remarkably streamlined, simple-to-use system for using AI and ML models to interact with data. Build interactive data analyses with very little code. The demo shows nearly everything you need to be productive. Hot reloading: change some of the Python code in your research script and the display updates. No need to mess with HTML and JavaScript; all you need is a text editor and a web browser.
A very simple static homepage for your server. No build process involved. Edit a YAML file, add titles, icons, and links to the services running on the server, load it in a browser. Unusually pretty, unusually handy. Never thought I'd like it.