STU is a TUI explorer application for Amazon S3 (AWS S3) written in Rust. Basically, you can use it in the same way as the AWS CLI. In other words, if the default profile settings exist or the environment variables are set, you do not need to specify any options.
Czkawka (tch•kav•ka (IPA: [ˈʧ̑kafka]), "hiccup" in Polish) is a simple, fast and free app to remove unnecessary files from your computer.
Krokiet (IPA: [ˈkrɔcɛt], "croquet" in Polish) is the same as above, but uses a Slint frontend.
Amazingly fast - due to using more or less advanced algorithms and multithreading. Cache support - second and further scans should be much faster than the first one. CLI and GUI (gtk4 or slint). Multiple tools for flexibility.
This service is a search engine that looks for public archives at different File Sharing Services that are not so well known. These services do not offer a simple option to find files hosted on their servers.
Programmatically sync and edit BookStack pages. Useful for text editor integrations (an Emacs PoC implementation is included).
Pages in the configured BookStack wiki will be downloaded and written to Markdown files in book/page.md format. Local Markdown files that don't exist in the wiki will be uploaded as new pages in a book. When a local file is deleted, the wiki page will be deleted if their last_modified dates are the same. Wiki pages that are deleted will cause their local counterparts to be deleted as well. Out-of-sync pages (i.e., the local file and wiki page have been edited independently and their edits do not line up) will not be synced without the --force option.
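The sync rules above can be sketched as a small decision function. This is only an illustration, not the tool's actual code: the PageState record, the idea of a stored "last synced" timestamp, and the action names are all assumptions made for the example, and the sketch does not say which side wins when --force is given.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageState:
    """Hypothetical per-page record; all three fields are assumptions."""
    local_mtime: Optional[int]   # None: the local Markdown file is missing
    remote_mtime: Optional[int]  # None: the wiki page is missing
    synced_mtime: Optional[int]  # None: the page has never been synced


def decide(state: PageState, force: bool = False) -> str:
    """Return a (hypothetical) action name for one page."""
    local, remote, synced = state.local_mtime, state.remote_mtime, state.synced_mtime
    if local is None and remote is None:
        return "nothing"
    if synced is None:
        # Never synced before: whichever side exists is treated as new.
        if remote is None:
            return "upload_new"
        if local is None:
            return "download"
        return "sync_forced" if force else "skip_conflict"
    if local is None:
        # Local file was deleted: delete the wiki page only if it is unchanged.
        return "delete_remote" if remote == synced else "download"
    if remote is None:
        # Wiki page was deleted: the local counterpart goes too.
        return "delete_local"
    local_changed = local != synced
    remote_changed = remote != synced
    if local_changed and remote_changed:
        # Edited independently on both sides: out of sync.
        return "sync_forced" if force else "skip_conflict"
    if local_changed:
        return "upload"
    if remote_changed:
        return "download"
    return "nothing"
```

For example, decide(PageState(150, 160, 100)) reports a conflict that is skipped, while the same call with force=True lets the sync go ahead.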
Chunking documents is a challenging task that underpins any RAG system. High quality results are critical to a successful AI application, yet most open-source libraries are limited in their ability to handle complex documents. Open Parse is designed to fill this gap by providing a flexible, easy-to-use library capable of visually discerning document layouts and chunking them effectively.
Visually driven. Parses Markdown. Can analyze data tables by extracting them into Markdown tables.
Duperemove is a simple tool for finding duplicated extents and submitting them for deduplication. When given a list of files it will hash their contents on an extent by extent basis and compare those hashes to each other, finding and categorizing extents that match each other. Optionally, a per-block hash can be applied for further duplication lookup. When given the -d option, duperemove will submit those extents for deduplication using the Linux kernel FIDEDUPRANGE ioctl, which only applies to btrfs and xfs.
Duperemove can store the hashes it computes in a 'hashfile'. If given an existing hashfile, duperemove will only compute hashes for those files which have changed since the last run. Thus you can run duperemove repeatedly on your data as it changes, without having to re-checksum unchanged data.
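The hashing and hashfile ideas can be illustrated with a toy model. This is a sketch under several assumptions, not how duperemove is implemented: it hashes fixed-size blocks of in-memory byte strings with SHA-256 instead of working on real extents, the cache is a plain dict keyed by file name and modification time, and no deduplication ioctl is issued. The names hash_blocks, scan and find_duplicates are hypothetical.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 128 * 1024  # assumed block size for this sketch


def hash_blocks(data, block_size=BLOCK_SIZE):
    """Yield (offset, digest) for each fixed-size block of data."""
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        yield offset, hashlib.sha256(block).hexdigest()


def scan(files, cache, block_size=BLOCK_SIZE):
    """Hash only files whose mtime differs from the cached one.

    files: name -> (mtime, data); cache: name -> (mtime, [(offset, digest)]).
    Returns the names that were (re)hashed on this run.
    """
    rehashed = []
    for name, (mtime, data) in files.items():
        cached = cache.get(name)
        if cached is None or cached[0] != mtime:
            cache[name] = (mtime, list(hash_blocks(data, block_size)))
            rehashed.append(name)
    return rehashed


def find_duplicates(cache):
    """Map digest -> [(name, offset), ...] for digests seen more than once."""
    seen = defaultdict(list)
    for name, (_mtime, hashes) in cache.items():
        for offset, digest in hashes:
            seen[digest].append((name, offset))
    return {d: locs for d, locs in seen.items() if len(locs) > 1}
```

A second call to scan with unchanged inputs hashes nothing, which mirrors the point of the hashfile: repeated runs only pay for data that changed.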
Requires kernel v3.13 or later.
It's in the Arch extra package repository.
XFiles is a file manager for X11. It can navigate through directories, show icons for files, select files, call a command to open files, generate thumbnails, and call a command to run on right mouse button click. Supports running scripts when the user selects a file.
This is an old-school X11-style X application. No toolkit, no desktop environment, no skinning, just a file manager.
CryFS encrypts your files, so you can safely store them anywhere. It works well together with cloud services like Dropbox, iCloud, OneDrive and others. Easy to set up and works with a lot of cloud storage providers. It runs in the background - you won't notice it when accessing your files in your daily workflow. Your data only leaves your computer in encrypted form. File contents, metadata and directory structure are all secure from someone who hacked your cloud. Released under LGPL.
Can be used locally but that's not its primary use case.
Two directories: A basedir that holds the encrypted files, and a mountdir which you interact with. The basedir is what gets stored remotely, synced, or whatever. Note: Not safe for concurrent access!
Files are split into equal size blocks, encrypted individually. Metadata and directory structures are also represented as those blocks for obfuscation. Block cipher used, random key generated, key encrypted with passphrase.
In Apt, Pacman, Homebrew, Nix repositories.
Default encryption algorithm: XChaCha20-Poly1305, scrypt for key derivation.
Github: https://github.com/cryfs/cryfs
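As a rough illustration of the "scrypt for key derivation" step noted above, the sketch below derives a 32-byte key from a passphrase using Python's standard hashlib.scrypt. The cost parameters, salt size, and the derive_key name are assumptions for the example and are not taken from CryFS. Wrapping the randomly generated key with XChaCha20-Poly1305 would need a third-party library and is left out.

```python
import hashlib
import os


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Derive a 32-byte key from a passphrase with scrypt."""
    return hashlib.scrypt(
        passphrase.encode("utf-8"),
        salt=salt,
        n=2**14,  # CPU/memory cost (assumed value, not CryFS's setting)
        r=8,      # block size parameter (assumed)
        p=1,      # parallelism parameter (assumed)
        dklen=32,
    )


# A fresh random salt is stored next to the encrypted key so that the
# same passphrase reproduces the same derived key later.
salt = os.urandom(16)
key = derive_key("correct horse battery staple", salt)
```

The derived key would then be used to encrypt the random key that protects the blocks, so changing the passphrase only requires re-encrypting that one key.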
rga is a line-oriented search tool that allows you to look for a regex in a multitude of file types. rga wraps the awesome ripgrep and enables it to search in pdf, docx, sqlite, jpg, movie subtitles (mkv, mp4), etc. rga will recursively descend into archives and match text in every file type it knows. rga works with adapters that adapt various file formats; you can add your own.
A utility that lets you query CSV, JSON and Parquet files with regular SQL statements. If DuckDB is okay with it, it'll run. Has both a fire-and-forget CLI and an interactive TUI.
An experimental semantic search site for vintage computing files stored at the Internet Archive.
A free, open-source, multi-platform application for sending files and messages, using the codec2 HF modems.
Johnny.Decimal is a system to organise your digital life. It’s designed to help you find things quickly, with more confidence, and less stress. In real life, if you stored your stuff in piles of badly-labelled boxes you’d never find anything again. If you put those boxes in boxes, in boxes, you’d never know which box to open to find the next box. It would be chaos. But I just described how you save your computer files.
Imagine your computer as a physical storage space. We can’t put everything on the floor, so we buy some shelves. If we had a limitless number of shelves, we wouldn’t know which one to look on when we wanted to find something. So we get ten shelves. We decide to dedicate each shelf to an area of our life.
freezeFS.py is a utility program that runs on a PC and converts an arbitrary folder, subfolder and file structure into a Python source file. The generated Python file can then be frozen as bytecode into a MicroPython image together with the Virtual File System driver vfsfrozen.py.
When the generated Python file is imported, the file structure is mounted with os.mount() as a read-only Virtual File System, which can be accessed on the microcontroller with regular file operations such as open in "r" or "rb" mode, read, readinto, readline, seek, tell, close, listdir, ilistdir, stat.
If the deploy option is used, the files and folders of the frozen files are copied to the standard flash file system. This enables installing configuration and data files when booting the MicroPython image the first time.
An important point is that opening a file in "r" mode requires buffering the file in RAM. However, many libraries, such as web servers and json, support reading text files in "rb" mode, in which case no overhead is incurred.
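A minimal sketch of the "rb" approach, shown here on CPython with an ordinary temporary file: json.load accepts a file opened in binary mode, so a JSON file does not have to be opened in "r" mode. The load_json_binary helper, the file name and the sample contents are hypothetical; on the microcontroller the path would point into the mounted frozen file system instead.

```python
import json
import os
import tempfile


def load_json_binary(path):
    """Open path in "rb" mode and parse its contents as JSON."""
    with open(path, "rb") as f:
        return json.load(f)


# Demonstration on a regular file; the contents are made up for the example.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "config.json")
    with open(demo_path, "wb") as f:
        f.write(b'{"ssid": "example", "port": 80}')
    config = load_json_binary(demo_path)
```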
Syncthing is a continuous file synchronization program. It synchronizes files between two or more computers in real time, safely protected from prying eyes. Your data is your data alone and you deserve to choose where it is stored, whether it is shared with some third party, and how it's transmitted over the Internet.
Github: https://github.com/syncthing/syncthing
Official APT repository: https://apt.syncthing.net/
The RapidBlock Project is a grassroots initiative to make Fediverse domain blocking more effective through collective action.
Moderation on the Fediverse is unevenly distributed. Some instance admins devotedly follow the #FediBlock hashtag, blocking abusive servers within hours of their first appearance on the network. Others wait until their own users file a report. Still others do nothing at all.
This uneven distribution of moderation allows abusive instances to do significant psychological harm. Abusive instances are a fast-moving target; setting up a new Mastodon instance takes only an hour or two, as does resetting an instance to give it a new domain name. This gives abusers a substantial time window in which there are a lot of available victims to target.
The RapidBlock Project is something different: humans are in the loop at every step of the decision-making process, and the only thing that is automated is the actual propagation of the decisions. Moderation is hard, especially good moderation. Moderation is a full-time job, and many Fediverse admins aren't taking up that mantle of responsibility. We are trying to build a central moderation team with a clear, published rationale for our blocking criteria and a clear dispute process for remediating mistaken blocks.
Utility code that can extract an AppleDouble file's contents and extract the individual resources from its resource fork segment.
This is useful if you have stored Mac files on FAT32-formatted floppy disks or in Mac OS X ZIP archives and want to (further) extract the data from them on a non-Mac operating system.
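To give an idea of what such extraction involves, here is a sketch of reading the entry table at the start of an AppleDouble file. It is not the utility's own code. The layout is assumed to be the usual one: a big-endian header with a 4-byte magic number (0x00051607), a 4-byte version, 16 filler bytes and a 2-byte entry count, followed by 12-byte entries holding an ID, offset and length, where entry ID 2 is the resource fork. The parse_entries name is hypothetical, and parsing the individual resources inside the fork is out of scope here.

```python
import struct

APPLEDOUBLE_MAGIC = 0x00051607
ENTRY_RESOURCE_FORK = 2  # entry ID of the resource fork


def parse_entries(data: bytes) -> dict:
    """Return a dict mapping entry ID -> raw entry bytes."""
    magic, _version = struct.unpack_from(">II", data, 0)
    if magic != APPLEDOUBLE_MAGIC:
        raise ValueError("not an AppleDouble file")
    # The entry count follows the 16 filler bytes, at offset 24.
    (count,) = struct.unpack_from(">H", data, 24)
    entries = {}
    for i in range(count):
        entry_id, offset, length = struct.unpack_from(">III", data, 26 + 12 * i)
        entries[entry_id] = data[offset:offset + length]
    return entries
```

With the entries in hand, parse_entries(data).get(ENTRY_RESOURCE_FORK) gives the raw resource fork bytes, or None if the file has no such entry.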
The documentation for VirusTotal's REST API.
Free usage tier:
A tool to generate binary polyglots (files that are valid with several file formats).