The unstructured library provides open-source components for ingesting and pre-processing images and text documents, such as PDFs, HTML, Word docs, and many more. The use cases of unstructured revolve around streamlining and optimizing the data processing workflow for LLMs. unstructured's modular functions and connectors form a cohesive system that simplifies data ingestion and pre-processing, making it adaptable to different platforms and efficient in transforming unstructured data into structured outputs.
There is also an API built around this module.
A utility that lets you query CSV, JSON and Parquet files with regular SQL statements. If DuckDB is okay with it, it'll run. Has both a fire-and-forget CLI and an interactive TUI.
ZKDocs provides comprehensive, detailed, and interactive documentation on zero-knowledge proof systems and related primitives.
At Trail of Bits, we audit many implementations of non-standardized cryptographic protocols and often find the same issues. As we discovered more instances of these bugs, we wanted to find a way to prevent them in the future. Unfortunately, for these protocols, the burden is on the developers to figure out all of the low-level implementation details and security pitfalls.
We aim to be both self-contained and comprehensive in the topics related to zero-knowledge proof systems. We describe each protocol in great detail, including all necessary setup, sanity-checks, auxiliary algorithms, further references, and potential security pitfalls with their associated severity. The protocol descriptions are interactive, letting you modify variable names. This allows you to match the variable names in ZKDocs’ specification to the variable names in your code, making it easier to find bugs and missing assertions.
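To make the kind of material ZKDocs covers more concrete, here is a toy non-interactive Schnorr proof of knowledge of a discrete logarithm (via the Fiat-Shamir transform), including the input sanity-checks such documentation typically calls for. This is our own sketch, not code taken from ZKDocs; the parameters (p = 2039, q = 1019, g = 4) and all function names are illustrative assumptions, and the group is far too small to be secure.

```python
import hashlib
import secrets

# Toy parameters (illustrative assumption): p = 2q + 1 is a safe prime and
# g = 4 generates the subgroup of prime order q. Far too small to be secure.
P, Q, G = 2039, 1019, 4


def in_subgroup(y: int) -> bool:
    """Sanity-check: y must be a non-identity element of the order-q subgroup."""
    return 1 < y < P and pow(y, Q, P) == 1


def challenge(*values: int) -> int:
    """Fiat-Shamir challenge: hash every public value, not just the commitment."""
    data = "|".join(str(v) for v in values).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove(x: int) -> tuple[int, int, int]:
    """Prove knowledge of x such that h = g^x mod p. Returns (h, r, s)."""
    h = pow(G, x, P)
    k = secrets.randbelow(Q - 1) + 1  # fresh nonce in [1, q-1]; reusing k leaks x
    r = pow(G, k, P)                  # commitment
    c = challenge(P, Q, G, h, r)
    s = (k + c * x) % Q               # response
    return h, r, s


def verify(h: int, r: int, s: int) -> bool:
    # Reject malformed inputs before doing any algebra.
    if not (in_subgroup(h) and in_subgroup(r) and 0 <= s < Q):
        return False
    c = challenge(P, Q, G, h, r)
    # g^s = g^(k + c*x) = r * h^c when the proof is honest.
    return pow(G, s, P) == (r * pow(h, c, P)) % P
```

For example, `verify(*prove(123))` returns `True`, while an element outside the subgroup such as `P - 1` is rejected by the sanity-check before the verification equation is evaluated.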
marimo is an open-source reactive notebook for Python - reproducible, git-friendly, executable as a script, and shareable as an app.
Run one cell and marimo reacts by automatically running affected cells, eliminating the error-prone chore of managing notebook state. marimo's reactive UI elements, like dataframe GUIs and plots, make working with data feel refreshingly fast, futuristic, and intuitive. marimo notebooks are pure Python and stored as .py files. Version with git, run as Python scripts, import symbols from a notebook into other notebooks or Python files, and lint or format with your favorite tools. You'll always be able to reproduce your collaborators' results. Notebooks are executed in a deterministic order, with no hidden state — delete a cell and marimo deletes its variables while updating affected cells.
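Reactive execution of this kind can be modeled as a dependency graph between cells: each cell defines some variables and reads others, and running a cell marks everything downstream of it as stale. The sketch below is a toy model under that assumption; the `Cell` class and `dependents` function are hypothetical and are not marimo's API. It uses the standard library's `graphlib` to decide which cells to re-run and in what order.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Cell:
    # Hypothetical toy representation of a notebook cell.
    name: str
    defines: set[str]
    refs: set[str] = field(default_factory=set)


def dependents(cells: list[Cell], changed: str) -> list[str]:
    """Return the changed cell plus every cell that transitively reads its outputs,
    ordered so that each cell runs after the cells it depends on."""
    # Toy model: each variable is defined by exactly one cell.
    definer = {v: c.name for c in cells for v in c.defines}
    # Edges: cell -> cells whose variables it reads (its predecessors).
    graph = {c.name: {definer[v] for v in c.refs if v in definer} for c in cells}
    # Collect everything downstream of the changed cell.
    stale, frontier = {changed}, [changed]
    while frontier:
        current = frontier.pop()
        for name, deps in graph.items():
            if current in deps and name not in stale:
                stale.add(name)
                frontier.append(name)
    # Topologically order only the stale cells.
    return list(TopologicalSorter({n: graph[n] & stale for n in stale}).static_order())
```

With cells `a` (defines `x`), `b` (reads `x`, defines `y`) and `c` (reads `y`), running `a` yields the order `["a", "b", "c"]`, while an unrelated cell re-runs only itself.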
Collaborate on notebooks with git: small changes yield small diffs. Goodbye JSON, hello Python! Want to share outputs? Export to static HTML, or serve your notebook as a web app with the marimo CLI. The marimo editor comes with GitHub Copilot, autocomplete, hover tooltips, vim keybindings, code formatting, debugging panels, and extensive hotkeys. marimo also ships with a CLI, a library, and a VS Code extension. Learn more at our docs.
A tool for exploring a Docker image layer by layer: the contents of each layer, what changes between layers, and ways to shrink the size of your Docker/OCI image. Additionally, you can run it in your CI pipeline to ensure you're keeping wasted space to a minimum.
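One way to picture "wasted space" in an image: bytes that one layer adds and a later layer overwrites or deletes still ship in the image, even though they are no longer visible in the final filesystem. The function below is a simplified, hypothetical model of that idea (layers as plain `path -> size` dictionaries, with `None` meaning "deleted"); it is not dive's actual accounting.

```python
from __future__ import annotations


def wasted_bytes(layers: list[dict[str, int | None]]) -> int:
    """Toy model: each layer maps a file path to its size in bytes, or to None
    when the layer deletes the file. Bytes that a later layer overwrites or
    deletes still ship in an earlier layer but are never visible."""
    wasted = 0
    current: dict[str, int] = {}  # path -> size of the version currently visible
    for layer in layers:
        for path, size in layer.items():
            if path in current:
                # The earlier version is shadowed: its bytes stay in an older layer.
                wasted += current.pop(path)
            if size is not None:
                current[path] = size
    return wasted
```

For instance, adding a 1000-byte file and deleting it in the next layer counts all 1000 bytes as wasted, which is why cleanup is usually done in the same layer that created the files.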
Plots the vertical atmospheric structure (altitude vs. temperature and altitude vs. wind speed) using aircraft data collected from the dump1090-fa ADS-B decoder. The calculations are similar to those used in the tar1090 package. Reads the JSON data from dump1090-fa/history_xx.json files.
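The underlying physics can be sketched with the standard ISA relations: the speed of sound is TAS / Mach, and it scales with the square root of absolute temperature, which gives the outside air temperature; the wind is the difference between the ground velocity vector and the air velocity vector. This is our own sketch, not the tool's or tar1090's code. It assumes per-aircraft true airspeed, Mach, ground speed, track and heading are available (in dump1090-fa's JSON these are typically fields like `tas`, `mach`, `gs`, `track` and a heading field, but treat those names as assumptions to check against your own history files). Reported values are often rounded, so individual estimates can be noisy.

```python
import math

A0_KT = 661.47  # speed of sound at ISA sea level (288.15 K), in knots
T0_K = 288.15   # ISA sea-level temperature, in kelvin


def outside_air_temp_c(tas_kt: float, mach: float) -> float:
    """Static air temperature from true airspeed and Mach number.

    Speed of sound a = TAS / M, and a = A0 * sqrt(T / T0), so T = T0 * (a / A0)**2.
    """
    a = tas_kt / mach
    return T0_K * (a / A0_KT) ** 2 - 273.15


def wind(gs_kt: float, track_deg: float, tas_kt: float, heading_deg: float) -> tuple[float, float]:
    """Wind speed (kt) and the direction it blows FROM (degrees true).

    Wind vector = ground velocity - air velocity. Both angles must be true bearings;
    a magnetic heading must first be corrected for local magnetic declination.
    """
    # East/north components of ground and air velocity.
    ge = gs_kt * math.sin(math.radians(track_deg))
    gn = gs_kt * math.cos(math.radians(track_deg))
    ae = tas_kt * math.sin(math.radians(heading_deg))
    an = tas_kt * math.cos(math.radians(heading_deg))
    we, wn = ge - ae, gn - an
    speed = math.hypot(we, wn)
    # Meteorological convention: report where the wind comes from.
    direction = math.degrees(math.atan2(-we, -wn)) % 360
    return speed, direction
```

For example, an aircraft at Mach 0.78 with a true airspeed of 450 kt implies roughly -54 °C, and a northbound aircraft with 120 kt ground speed and 100 kt TAS has a 20 kt tailwind from 180°.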
While browsing a variety of websites, I kept finding that the same financial metric can vary greatly from source to source, as do the reported financial statements, while little information is given about how the metric was calculated.
This is why I designed the FinanceToolkit, an open-source toolkit in which all relevant financial ratios (100+), indicators and performance measurements are written down in the simplest possible way, allowing for complete transparency of the calculation method. This means you do not have to rely on metrics from other providers and, given a financial statement, can efficiently calculate them yourself. The result is one uniform calculation method that is available to and understood by everyone.
The Finance Toolkit is complemented very well by the Finance Database, a database that features 300,000+ symbols covering Equities, ETFs, Funds, Indices, Currencies, Cryptocurrencies and Money Markets. By utilising both, it is possible to do a fully-fledged competitive analysis, with the tickers found in the FinanceDatabase fed into the FinanceToolkit.
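The idea of writing each ratio down explicitly can be illustrated with a few plain functions whose formulas are visible in the code, with a comment wherever common definitions differ (the very source of the discrepancies described above). The `BalanceSheet` class and function names are hypothetical and are not the FinanceToolkit's API.

```python
from dataclasses import dataclass


@dataclass
class BalanceSheet:
    # Hypothetical minimal balance sheet; all values in the same currency unit.
    current_assets: float
    current_liabilities: float
    inventory: float
    total_liabilities: float
    total_equity: float


def current_ratio(bs: BalanceSheet) -> float:
    # Current Ratio = Current Assets / Current Liabilities
    return bs.current_assets / bs.current_liabilities


def quick_ratio(bs: BalanceSheet) -> float:
    # Quick Ratio = (Current Assets - Inventory) / Current Liabilities
    # (one common definition; some sources also exclude prepaid expenses)
    return (bs.current_assets - bs.inventory) / bs.current_liabilities


def debt_to_equity(bs: BalanceSheet) -> float:
    # Debt-to-Equity = Total Liabilities / Total Equity
    # (some variants use only interest-bearing debt in the numerator)
    return bs.total_liabilities / bs.total_equity
```

With current assets of 500, current liabilities of 250 and inventory of 100, the current ratio is 2.0 and the quick ratio is 1.6; a source using a different quick-ratio definition would report a different number from the same statement.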
Curated list of resources for traders, including tools, websites, and courses related to trading in various financial markets such as stocks. It serves as a valuable reference for traders who are looking to expand their knowledge and improve their skills.
Balcony is a modern CLI tool with some killer features:
Balcony uses only read-only operations; it does not take any action on the AWS account it is used with.
IVRE (Instrument de veille sur les réseaux extérieurs) is a network recon framework, including tools for passive and active recon. IVRE can use data from numerous passive sensors and active scanning tools. You can think of it as a self-hosted and fully-controlled alternative to Shodan / ZoomEye / Censys, GreyNoise, and more. In the AUR.
NETINT
A curated, dynamic collection of websites that offer an interesting and interactive experience for users. With (mostly) real-time data, engaging maps, and visually stunning data visualizations, this collection is a treasure for enthusiasts of the air industry, space, history, world statistics and more!
Open-source, self-hosted alternative to CARTO and Foursquare Studio for data scientists, analysts and engineers. State-of-the-art WebGL-powered map visualizations and spatial analysis based on deck.gl. Tested at 100Mb and 1M rows. Efficient query result caching on Amazon S3 or Google Cloud Storage. Side-by-side SQL editor and support for CSV and GeoJSON file uploads.
Small program that computes and plots spectrograms, either in a live window or to disk, with support for stdin input. In theory, you can run any data through it and generate a spectrogram. Read the manpage.
In the AUR (but you want specgram-git because specgram has a bug and won't compile!)
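For readers wondering what such a program computes: a spectrogram is typically the magnitude of a short-time Fourier transform, i.e. the signal is cut into overlapping, windowed frames and each frame is Fourier-transformed. This minimal pure-Python sketch is not taken from specgram; the Hann window, 64-sample window length and 32-sample hop are illustrative assumptions, and a naive DFT is used only to stay dependency-free.

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def spectrogram(signal: list[float], win: int = 64, hop: int = 32) -> list[list[float]]:
    """Magnitude spectrogram via a short-time Fourier transform.

    Each row is one window position; each column is a frequency bin from 0 up to
    the Nyquist bin (win // 2). Uses a naive O(n^2) DFT per frame.
    """
    w = hann(win)
    frames = []
    for start in range(0, len(signal) - win + 1, hop):
        chunk = [signal[start + i] * w[i] for i in range(win)]
        row = []
        for k in range(win // 2 + 1):
            acc = sum(chunk[i] * cmath.exp(-2j * math.pi * k * i / win) for i in range(win))
            row.append(abs(acc))
        frames.append(row)
    return frames
```

A sine wave completing 8 cycles per 64 samples produces a peak in bin 8 of every frame; a real tool would use an FFT and map bins to Hz via the sample rate.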
Providing a suite of API endpoints to extract alternative data. Social sentiment analysis of companies, file analysis, insider trade retrieval and analysis, analyst ratings, ESG scoring.
Accessible through RapidAPI.
Free trial, 100 API calls/month. 2 requests/second
Github: https://github.com/sankalpbhatia20/AltAPI-opensource
Requires Postgres as its back-end if you self-host.
Patch this into Searx?
University of Oregon Route Views Project
The University's Route Views project was originally conceived as a tool for Internet operators to obtain real-time BGP information about the global routing system from the perspectives of several different backbones and locations around the Internet. Although other tools handle related tasks, such as the various Looking Glass Collections (see e.g. TRACEROUTE.ORG), they typically either provide only a constrained view of the routing system (e.g., either a single provider, or the route server) or they do not provide real-time access to routing data.
While the Route Views project was originally motivated by interest on the part of operators in determining how the global routing system viewed their prefixes and/or AS space, there have been many other interesting uses of this Route Views data. For example, NLANR has used Route Views data for AS path visualization and to study IPv4 address space utilization. Others have used Route Views data to map IP addresses to origin AS for various topological studies. CAIDA has used it in conjunction with the NetGeo database in generating geographic locations for hosts, functionality that both CoralReef and the Skitter project support.
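Mapping an IP address to its origin AS from a routing table comes down to a longest-prefix match: among all announced prefixes that contain the address, the most specific one determines the origin AS (the last AS on that route's AS path). The sketch below assumes the routing table has already been reduced to a hypothetical `prefix -> origin AS` dictionary; parsing Route Views' dump files is out of scope, and the prefixes and AS numbers are documentation-range placeholders.

```python
from __future__ import annotations

import ipaddress


def build_table(routes: dict[str, int]) -> list[tuple[ipaddress.IPv4Network | ipaddress.IPv6Network, int]]:
    """Turn hypothetical 'prefix -> origin AS' pairs into a list sorted longest-prefix first."""
    table = [(ipaddress.ip_network(prefix), asn) for prefix, asn in routes.items()]
    return sorted(table, key=lambda entry: entry[0].prefixlen, reverse=True)


def origin_as(table: list[tuple[ipaddress.IPv4Network | ipaddress.IPv6Network, int]], address: str) -> int | None:
    """Longest-prefix match: the most specific covering prefix decides the origin AS."""
    ip = ipaddress.ip_address(address)
    for network, asn in table:
        if ip in network:  # False for mismatched IP versions
            return asn
    return None
```

With both `192.0.2.0/24` and `192.0.2.128/25` in the table, `192.0.2.200` maps to the /25's AS and `192.0.2.5` to the /24's. The linear scan keeps the sketch short; a full routing table would call for a radix trie.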
Automated decoding of encrypted text without knowing the key or ciphers used. Ares is the next generation of decoding tools, built by the same people that brought you Ciphey. We fully intend to replace Ciphey with Ares.
Ares is fast. Very fast. Other decoders such as Ciphey require advanced artificial intelligence to determine which path to take when decoding (whether to try Caesar next, or Base64, etc.). Ares is so fast that we don't need to worry about this currently. For every 1 decode Ciphey can do, Ares can do ~7. That's roughly a 7x speedup (a ~600% increase).
There are 2 main parts to Ares: the library and the CLI. The CLI simply uses the library, which means you can build on top of Ares.
Ares currently supports 16 decoders and the number is growing fast. Ciphey supports around 50, and we are adding more every day.
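Decoding "without knowing the key or ciphers used" can be framed as a search: apply every available decoder, check whether the result looks like plaintext, and keep expanding the most promising candidates. The sketch below is a breadth-first version of that idea with a crude word-list checker. It is not Ares's implementation; the decoder set, word list, depth limit and function names are all hypothetical.

```python
from __future__ import annotations

import base64
import codecs
from collections import deque

# Hypothetical tiny word list used by the plaintext checker.
COMMON_WORDS = {"the", "and", "is", "this", "a", "of", "to", "secret", "message", "hello"}


def looks_like_english(text: str) -> bool:
    """Crude plaintext check: at least half of the words are common English words."""
    words = text.lower().split()
    return bool(words) and sum(w in COMMON_WORDS for w in words) / len(words) >= 0.5


def try_base64(text: str) -> str | None:
    try:
        return base64.b64decode(text, validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return None


def try_hex(text: str) -> str | None:
    try:
        return bytes.fromhex(text).decode("utf-8")
    except ValueError:  # invalid hex digits or invalid UTF-8
        return None


def try_rot13(text: str) -> str | None:
    return codecs.decode(text, "rot13")


def try_reverse(text: str) -> str | None:
    return text[::-1]


DECODERS = {"base64": try_base64, "hex": try_hex, "rot13": try_rot13, "reverse": try_reverse}


def crack(ciphertext: str, max_depth: int = 4) -> tuple[str, list[str]] | None:
    """Breadth-first search over chains of decoders until the text looks like English.
    Returns the plaintext and the decoder chain, or None if nothing is found."""
    queue = deque([(ciphertext, [])])
    seen = {ciphertext}
    while queue:
        text, path = queue.popleft()
        if looks_like_english(text):
            return text, path
        if len(path) == max_depth:
            continue
        for name, decode in DECODERS.items():
            out = decode(text)
            if out is not None and out not in seen:
                seen.add(out)
                queue.append((out, path + [name]))
    return None
```

Text that was ROT13-encoded and then Base64-encoded is recovered with the chain `["base64", "rot13"]`. With many decoders the search space grows quickly, which is where speed, or smarter guidance about which decoder to try next, starts to matter.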
Most people find this website because they are disturbed by an unusual unidentified low-frequency sound that scientists now call the Worldwide Hum. The classic description is that The Hum sounds like a car or truck engine idling outside your home or down the block. Some people describe it as a low rumbling or droning sound. It is typically perceived louder at night than during the day, and louder indoors than outdoors. The sound can usually be masked by background noise, such as a fan or keeping the radio on. We estimate that 2-4% of the global population can experience this phenomenon under certain conditions.
The typical characteristics of the World Hum are that sufferers hear it wherever they go, and that other people in the same place and time cannot hear it. This may be a type of otoacoustic phenomenon generated internally in the brain and auditory organs, through mechanisms which are not yet fully understood, but for which this project tries to find answers and possible remedies.
The entire dataset can be downloaded as a CSV file. There is also a project whitepaper for people to gather more data for analysis.
A collection of several hundred online tools for OSINT.
A Hex Editor for Reverse Engineers, Programmers and people who value their retinas when working at 3 AM. Full-featured hex editor. Byte patching. Patch management. Copy and paste byte sequences. String and hex pattern highlighting. Pattern matching DSL. Huge file support. Can disassemble code for 16 different architectures, and counting.
The documentation for Virus Total's REST API.
Free usage tier: