# Chrysopedia — Whisper Transcription Desktop transcription tool for extracting timestamped text from video files using OpenAI's Whisper model (large-v3). Designed to run on a machine with an NVIDIA GPU (e.g., RTX 4090). ## Prerequisites - **Python 3.10+** - **ffmpeg** installed and on PATH - **NVIDIA GPU** with CUDA support (recommended; CPU fallback available) ### Install ffmpeg ```bash # Debian/Ubuntu sudo apt install ffmpeg # macOS brew install ffmpeg ``` ### Install Python dependencies ```bash pip install -r requirements.txt ``` ## Usage ### Single file ```bash python transcribe.py --input "path/to/video.mp4" --output-dir ./transcripts ``` ### Batch mode (all videos in a directory) ```bash python transcribe.py --input ./videos/ --output-dir ./transcripts ``` ### Options | Flag | Default | Description | | --------------- | ----------- | ----------------------------------------------- | | `--input` | (required) | Path to a video file or directory of videos | | `--output-dir` | (required) | Directory to write transcript JSON files | | `--model` | `large-v3` | Whisper model name (`tiny`, `base`, `small`, `medium`, `large-v3`) | | `--device` | `cuda` | Compute device (`cuda` or `cpu`) | | `--creator` | (inferred) | Override creator folder name in output JSON | | `-v, --verbose` | off | Enable debug logging | ## Output Format Each video produces a JSON file matching the Chrysopedia spec: ```json { "source_file": "Skope — Sound Design Masterclass pt2.mp4", "creator_folder": "Skope", "duration_seconds": 7243, "segments": [ { "start": 0.0, "end": 4.52, "text": "Hey everyone welcome back to part two...", "words": [ { "word": "Hey", "start": 0.0, "end": 0.28 }, { "word": "everyone", "start": 0.32, "end": 0.74 } ] } ] } ``` ## Resumability The script automatically skips videos whose output JSON already exists. To re-transcribe a file, delete its output JSON first. ## Performance Whisper large-v3 on an RTX 4090 processes audio at roughly 10–20× real-time. A 2-hour video takes ~6–12 minutes. For 300 videos averaging 1.5 hours each, the initial transcription pass takes roughly 15–40 hours of GPU time. ## Directory Convention The script infers the `creator_folder` field from the parent directory of each video file. Organize videos like: ``` videos/ ├── Skope/ │ ├── Sound Design Masterclass pt1.mp4 │ └── Sound Design Masterclass pt2.mp4 ├── Mr Bill/ │ └── Glitch Techniques.mp4 ``` Override with `--creator` when processing files outside this structure.