Transcribe almost every language. Fully offline transcription: no data ever leaves your device. User-friendly design (for a change). Transcribe audio or video. Option to transcribe audio from popular websites (YouTube, Vimeo, Facebook, Twitter and more!). Batch transcription. Supports multiple document formats. Plug it into an LLM and you can get summaries. Translate to English from any language. Optimized for GPUs. Has a CLI tool. Supports custom models.
This script continuously records 1-2 second snippets and saves them as files named 00000000.wav, 00000001.wav and so on. Every 20 seconds we concatenate the files into a larger one named, for example, 00000000_00000035.wav and analyse that file for periods of silence lasting 500ms or longer. We then snip the file back into smaller chunks which we believe contain full sentences. These chunks are saved as 00000000_00000035_x_y.wav, where x and y are the start and end positions of the audio in ms. We then pass each chunk to the offline OpenAI-Whisper speech recognition library. Finally we clear up any old .wav files that have been processed and repeat the process.
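The silence-splitting step described above can be sketched as follows. This is a minimal illustration, not the script's actual code: the function names `find_speech_segments` and `chunk_name`, the amplitude `threshold` of 500, and the representation of audio as a plain list of integer samples are all assumptions made for the example.

```python
def find_speech_segments(samples, sample_rate, threshold=500, min_silence_ms=500):
    """Return (start_ms, end_ms) pairs for stretches of audio that are
    separated by at least min_silence_ms of silence.

    A sample counts as silent when its absolute amplitude is below
    `threshold` (an assumed value, not taken from the original script).
    """
    min_silence = int(sample_rate * min_silence_ms / 1000)
    segments = []
    seg_start = None   # index where the current speech segment began
    last_loud = None   # index of the most recent non-silent sample

    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if seg_start is None:
                seg_start = i
            last_loud = i
        elif seg_start is not None and i - last_loud >= min_silence:
            # The silence is long enough: close the current segment.
            segments.append((seg_start, last_loud + 1))
            seg_start = None

    if seg_start is not None:
        # Audio ended while a segment was still open.
        segments.append((seg_start, last_loud + 1))

    def to_ms(n):
        return n * 1000 // sample_rate

    return [(to_ms(a), to_ms(b)) for a, b in segments]


def chunk_name(first_snippet, last_snippet, start_ms, end_ms):
    """Build a name following the 00000000_00000035_x_y.wav pattern."""
    return f"{first_snippet:08d}_{last_snippet:08d}_{start_ms}_{end_ms}.wav"


if __name__ == "__main__":
    # 300 ms of sound, 600 ms of silence, then sound with a short 100 ms gap.
    audio = [1000] * 300 + [0] * 600 + [1000] * 200 + [0] * 100 + [1000] * 100
    for start, end in find_speech_segments(audio, sample_rate=1000):
        print(chunk_name(0, 35, start, end))
```

With the sample data in the `__main__` block, the 600 ms gap splits the audio into two chunks, while the 100 ms gap is too short to count as a sentence boundary and stays inside the second chunk.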
Speakr is a personal, self-hosted web application designed for transcribing audio recordings (like meetings), generating concise summaries and titles, and interacting with the content through a chat interface. Keep all your meeting notes and insights securely on your own server. This includes self-hosting your own LLM models to do the heavy lifting, so you don't have to use an LLM service provider.
Upload audio files (MP3, WAV, M4A, etc.) via drag-and-drop or file selection. Transcription and summarization happen in the background without blocking the UI. Uses OpenAI-compatible Speech-to-Text (STT) APIs that you can connect to a self-hosted model (like Whisper). Generates concise titles and summaries using configurable LLMs via OpenAI-compatible APIs. Ask questions and interact with the transcription content using an AI model.
A bit of glue between components that is able to textually summarize videos and podcasts - offline. The script takes a URL as its argument, downloads and extracts the audio, transcribes the spoken words to text and finally prints a summary of the content. No external services are used by this script except for the initial audio download. Examples of URLs that work are YouTube videos and Apple podcasts; see the yt-dlp project for the full list.
This script doesn't do anything clever; it just makes use of the great work done by other projects. Its purpose is to save you from sitting through 8-12 minutes of someone explaining what should've just been a short blog post. The default model used is LLaMa-3 to support medium spec hardware. If you have a large system, Mixtral 8x7b is another great option with a much larger context window (= able to work with longer transcriptions).
The script saves transcriptions to a folder in the same directory. If the same URL is used again later, it reuses the existing transcription instead of re-downloading the audio and transcribing it again. This means you can use the conversational mode to ask questions about the content later, even if you did not do so the first time.
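The reuse of existing transcriptions could look something like the sketch below. It is an illustration under stated assumptions rather than the script's real implementation: the function name `cached_transcript`, the injected `transcribe` callable, and the choice to name cache files after a SHA-256 hash of the URL are all hypothetical, since the description above does not say how saved transcriptions are keyed.

```python
import hashlib
from pathlib import Path


def cached_transcript(url, cache_dir, transcribe):
    """Return the transcript for `url`, running `transcribe(url)` only when
    no saved copy exists yet.

    `transcribe` stands in for the download-and-transcribe step; the hashed
    file name is an assumed convention, not taken from the original script.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Derive a stable, filesystem-safe file name from the URL.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    path = cache_dir / f"{key}.txt"

    if path.exists():
        # Cache hit: skip the download and transcription entirely.
        return path.read_text(encoding="utf-8")

    text = transcribe(url)
    path.write_text(text, encoding="utf-8")
    return text
```

Because the expensive step is passed in as a callable, a second call with the same URL returns the stored text without invoking it again, which is what makes a later question-and-answer session over the same content cheap.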
Relies upon a locally hosted LLM to do the heavy lifting, so you don't have to ship the data off to another service. Entirely self-hosted.
Txtify is a free and open-source web app for converting audio and video to text using advanced AI models. It supports YouTube videos and personal media files, offering fast and accurate transcriptions. Txtify can be self-hosted, giving you full control over your transcription process.
Free, instant translations and transcriptions for video and audio files!
I learn much better from text than from videos.
Youtube-to-Webpage is a Perl script that creates a webpage from a YouTube video, pairing a transcript generated from the video's closed captions with screenshots of the video.
The project is built upon: