audiowaveform is a C++ command-line application that generates waveform data from MP3, WAV, FLAC, Ogg Vorbis, or Opus audio files. Waveform data can be used to produce a visual rendering of the audio, similar in appearance to the displays in audio editing applications.
Waveform data files are saved in either binary format (.dat) or JSON (.json). Given an input waveform data file, audiowaveform can also render the audio waveform as a PNG image at a given time offset and zoom level.
The waveform data is produced from an input audio signal by first combining the input channels to produce a mono signal. The next stage is to compute the minimum and maximum sample values over groups of N input samples (where N is controlled by the --zoom command-line option), such that each N input samples produces one pair of minimum and maximum points in the output.
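The two stages described above can be sketched in a few lines of Python. This is only an illustration of the idea, not audiowaveform's actual code: the description says the channels are "combined" without saying how, so averaging them is an assumption here, as is the handling of a short final group. The function names are made up for this example.

```python
from typing import List, Sequence, Tuple


def to_mono(channels: Sequence[Sequence[int]]) -> List[int]:
    """Combine channels into one signal by averaging each frame.

    Averaging is an assumption; the description only says the
    channels are combined.
    """
    return [sum(frame) // len(frame) for frame in zip(*channels)]


def min_max_pairs(samples: Sequence[int], n: int) -> List[Tuple[int, int]]:
    """Return one (minimum, maximum) pair per group of n input samples.

    A final group shorter than n still yields a pair (an assumption).
    """
    if n <= 0:
        raise ValueError("n must be a positive number of samples")
    pairs = []
    for start in range(0, len(samples), n):
        group = samples[start:start + n]
        pairs.append((min(group), max(group)))
    return pairs


# Two short hypothetical channels of integer samples.
left = [0, 4, -8, 2, 6]
right = [2, 0, -4, 4, 2]

mono = to_mono([left, right])
waveform = min_max_pairs(mono, n=2)
print(waveform)
```

A larger n means fewer output points per second of audio, which corresponds to a more zoomed-out view of the waveform.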
In the AUR.
BookBrainz is a project to create an online database of information about every single book, magazine, journal and other publication ever written. We make all the data that we collect available to the whole world to consume and use as they see fit. Anyone can contribute to BookBrainz, whether through editing our information, helping out with development, or just spreading the word about our project.
If you have a MusicBrainz account, BookBrainz uses the same credentials.
Doclytics is a straightforward Rust-based tool that integrates with the paperless-ngx API to fetch and update document metadata. It primarily leverages a local language model, served through ollama, to extract and generate metadata for documents stored in a Paperless document library. The tool uses reqwest for making HTTP requests and serde_json for handling JSON data, ensuring seamless communication with the Paperless API and efficient data processing.
By interfacing directly with ollama, Doclytics automates the extraction of specified metadata from documents, utilizing the local LLM's capabilities to analyze document content and produce the required metadata in a JSON format. This metadata is then used to update the respective documents in the Paperless library, aiming to improve document organization and retrievability without overly complex processes or configurations.
freedb.org announced its services would shut down entirely on 2020-03-31. Many legacy software applications have FreeDB/CDDB support built in for fetching CD metadata such as artist, title, and track names. To keep these apps functioning in their full glory, this is meant as a drop-in replacement for FreeDB/CDDB.
This application does not use the original CDDB database, but fetches disc information from MusicBrainz, which has an open API and excellent up-to-date disc metadata.
You can use their public service as documented, or stand up and run your own. Written in Euphoria, a language I've never heard of.
We build and maintain an open repository of web crawl data that can be accessed and analyzed by anyone. The Common Crawl corpus contains petabytes of data collected since 2008. It contains raw web page data, extracted metadata and text extractions. The Common Crawl dataset lives on Amazon S3 as part of the Amazon Web Services’ Open Data Sponsorships program. You can download the files entirely free using HTTP(S) or S3. Our goal is to democratize the data so everyone, not just big companies, can do high quality research and analysis.
A community-driven, FLOSS-licensed wiki documenting unsolicited requests, metadata leaks, and privacy-invasive features in applications. Privacy is a complex topic, and can be very context-specific. I, NetNauseam, believe the best approach to privacy is to simply stop all applications from leaking data and metadata to and through the network.
All information in this repo should be viewed as an opinion, not a fact, and I do not claim your privacy will be improved in any way by following any of these recommendations. These are complex topics with many edge cases and any guarantees are difficult, if not impossible, to make.
Source code: https://codeberg.org/netnauseam/wiki/
binlist.net is a public web service for looking up credit and debit card metadata.
The first 6 or 8 digits of a payment card number (credit cards, debit cards, etc.) are known as the Issuer Identification Number (IIN), previously known as the Bank Identification Number (BIN). These digits identify the institution that issued the card to the cardholder.
Requests are throttled at 10 per minute with a burst allowance of 10. If you hit the rate limit, the service will return an HTTP 429 status code.
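A client can stay under that limit by keeping its own token bucket. The sketch below is one reading of "10 per minute with a burst allowance of 10": a bucket that holds at most 10 tokens and refills at 10 tokens per minute. The service does not document its exact algorithm, so this is an assumption, and the class and method names are made up for this example. It makes no network calls.

```python
import time
from typing import Callable


class TokenBucket:
    """Client-side token bucket (hypothetical helper, not part of any service API)."""

    def __init__(
        self,
        rate_per_minute: float = 10,
        burst: int = 10,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate_per_minute / 60.0  # tokens added per second
        self.capacity = float(burst)        # maximum tokens held at once
        self.tokens = float(burst)          # start full, so a burst is allowed
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False if the caller should wait."""
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket()
allowed = [bucket.try_acquire() for _ in range(11)]
print(allowed.count(True), "requests allowed immediately")
```

Call try_acquire() before each lookup and sleep for a few seconds when it returns False; a 429 response from the server should still be treated as a signal to back off.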
Someone also assembled a directory of metadata tags used by Pelican in its templates.
pText is a pure-Python library to read, write and manipulate PDF documents. It represents a PDF document as a JSON-like data structure of nested lists, dictionaries and primitives. Extract and edit metadata, extract and edit text and images, add annotations.
Seems like it would be useful for a large-scale indexing effort.
With Meta Tags you can edit and experiment with your content then preview how your webpage will look on Google, Facebook, Twitter and more!
MusicBrainz is an open music encyclopedia that collects music metadata and makes it available to the public.
A personal file system indexing and search application. Part of the GNOME desktop. Indexes file contents, metadata, and location to better help you find things. Also allows you to do your own tagging of stuff it keeps track of. Uses D-Bus for IPC and SPARQL for search. Uses multiple ontologies for different kinds of files (including multimedia content).
MAT2 is a set of tools for scrubbing the metadata (data about the origin and nature of files) from document files, images, audio recordings, and more. This data can be dangerous if anonymity is important to you.
Supports PNG, JPG, DOC, DOCX, PPT, ODT, TAR, BZ2, GZ, MP3, .torrent files, and too many others to list here.
How matrix algebra can be used on a table of names and membership checkmarks to develop a detailed social connection network.
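The core trick can be shown with a tiny made-up table. If A is a people-by-groups matrix of 0/1 membership marks, then A times its transpose counts the groups each pair of people share, and the transpose times A counts the members each pair of groups share. The names and data below are invented for illustration, and the helper functions are a plain-Python sketch rather than anything taken from the linked piece.

```python
from typing import List

Matrix = List[List[int]]


def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(row) for row in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices given as lists of rows."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


# Hypothetical membership table: rows are people, columns are groups.
people = ["Ana", "Ben", "Cy"]
groups = ["Choir", "Chess", "Hiking"]
membership = [
    [1, 1, 0],  # Ana: Choir, Chess
    [1, 0, 1],  # Ben: Choir, Hiking
    [1, 1, 1],  # Cy: all three
]

# person_links[i][j] = number of groups person i and person j share.
person_links = matmul(membership, transpose(membership))

# group_links[i][j] = number of members group i and group j share.
group_links = matmul(transpose(membership), membership)

for name, row in zip(people, person_links):
    print(name, row)
```

The diagonal of each result simply counts a person's own memberships or a group's own size; the off-diagonal entries are the connection strengths that make up the network.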
MediaInfo is a utility that parses the metadata of media files and tells you the file and/or container format and the codecs used.
A utility for turning fanfic from online archives into ebooks to load into a tablet, phone, or reader. Written in Python. Consists of a plugin for Calibre, a CLI utility, and a web service.
A howto for activists that describes how to capture and archive video footage. Includes archival of metadata, keeping files intact, raw and edited video concerns, organization, storage concerns, cataloging, sharing, and preservation. Treats it in a verifiable, library-like manner. Can be downloaded, too.
A Python module for extracting text from different kinds of documents. Can also be used as a CLI utility. Works with text-based formats like CSV, JSON, and HTML, and with binary formats like MS Word, MP3, and PDF. The list is fairly extensive.