# Chrysopedia — Whisper Transcription

Desktop transcription tool for extracting timestamped text from video files using OpenAI's Whisper model (large-v3). Designed to run on a machine with an NVIDIA GPU (e.g., RTX 4090).

## Prerequisites

- **Python 3.10+**
- **ffmpeg** installed and on PATH
- **NVIDIA GPU** with CUDA support (recommended; CPU fallback available)

### Install ffmpeg

```bash
# Debian/Ubuntu
sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Windows (via Chocolatey or manual install)
choco install ffmpeg
```

### Install Python dependencies

```bash
# For CUDA support, install torch from the CUDA wheel index first, so that
# installing the requirements does not pull in a CPU-only build:
pip install torch --index-url https://download.pytorch.org/whl/cu126

pip install -r requirements.txt
```

## Usage

### Single file

```bash
python transcribe.py --input "path/to/video.mp4" --output-dir ./transcripts
```

### Batch mode (all videos in a directory)

```bash
python transcribe.py --input ./videos/ --output-dir ./transcripts
```

### Mass batch mode (recursive, multi-creator)

For large content libraries with nested subdirectories per creator:

```bash
python batch_transcribe.py \
  --content-root "A:\Education\Artist Streams & Content" \
  --output-dir "C:\Users\jlightner\chrysopedia\transcripts" \
  --python "C:\Users\jlightner\.conda\envs\transcribe\python.exe"

# Dry run to preview without transcribing:
python batch_transcribe.py --content-root ... --output-dir ... --dry-run
```

`batch_transcribe.py` recursively walks all subdirectories, discovers video files, and calls `transcribe.py` for each directory. The `creator_folder` field in the output JSON is set to the top-level subdirectory name (the artist/creator), and the output directory structure mirrors the source hierarchy. On completion, a `batch_manifest.json` is written to the output root with timing, per-creator results, and error details.
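The discovery step described above can be sketched roughly as follows. This is not the actual code in `batch_transcribe.py`: `plan_batch` and `VIDEO_EXTENSIONS` are hypothetical names, and the extension list is an assumption, since the real script's list is not documented here. The sketch groups videos by directory, takes the creator from the first path component under the content root, and mirrors the relative path under the output root.

```python
from pathlib import Path

# Assumption: the real extension list used by batch_transcribe.py is not documented.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".mov", ".webm", ".avi"}


def plan_batch(content_root: Path, output_dir: Path) -> dict[str, dict]:
    """Hypothetical helper: group videos per directory, keyed by relative path."""
    plan: dict[str, dict] = {}
    for video in sorted(content_root.rglob("*")):
        if not video.is_file() or video.suffix.lower() not in VIDEO_EXTENSIONS:
            continue
        rel_dir = video.parent.relative_to(content_root)
        if not rel_dir.parts:
            continue  # videos directly under the root have no creator folder
        entry = plan.setdefault(rel_dir.as_posix(), {
            "creator": rel_dir.parts[0],        # top-level subdirectory = creator
            "out_dir": output_dir / rel_dir,    # output mirrors the source hierarchy
            "videos": [],
        })
        entry["videos"].append(video.name)
    return plan
```

Each key in the returned plan corresponds to one directory that would be handed to `transcribe.py`, which is roughly the kind of work plan a `--dry-run` preview would list.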
### Options (transcribe.py)

| Flag            | Default    | Description                                                        |
| --------------- | ---------- | ------------------------------------------------------------------ |
| `--input`       | (required) | Path to a video file or directory of videos                        |
| `--output-dir`  | (required) | Directory to write transcript JSON files                           |
| `--model`       | `large-v3` | Whisper model name (`tiny`, `base`, `small`, `medium`, `large-v3`) |
| `--device`      | `cuda`     | Compute device (`cuda` or `cpu`)                                   |
| `--creator`     | (inferred) | Override creator folder name in output JSON                        |
| `-v, --verbose` | off        | Enable debug logging                                               |

### Options (batch_transcribe.py)

| Flag             | Default    | Description                                     |
| ---------------- | ---------- | ----------------------------------------------- |
| `--content-root` | (required) | Root directory with creator subdirectories      |
| `--output-dir`   | (required) | Root output directory for transcript JSONs      |
| `--script`       | (auto)     | Path to transcribe.py (default: same directory) |
| `--python`       | (auto)     | Python interpreter to use                       |
| `--model`        | `large-v3` | Whisper model name                              |
| `--device`       | `cuda`     | Compute device                                  |
| `--dry-run`      | off        | Preview work plan without transcribing          |

## Output Format

Each video produces a JSON file matching the Chrysopedia pipeline spec:

```json
{
  "source_file": "Skope — Sound Design Masterclass pt2.mp4",
  "creator_folder": "Skope",
  "duration_seconds": 7243,
  "segments": [
    {
      "start": 0.0,
      "end": 4.52,
      "text": "Hey everyone welcome back to part two...",
      "words": [
        { "word": "Hey", "start": 0.0, "end": 0.28 },
        { "word": "everyone", "start": 0.32, "end": 0.74 }
      ]
    }
  ]
}
```

This format is consumed directly by the Chrysopedia pipeline stage 2 (transcript segmentation) via the `POST /api/v1/ingest` endpoint.

## Resumability

Both scripts automatically skip videos whose output JSON already exists. To re-transcribe a file, delete its output JSON first.
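A minimal sketch of the skip-if-exists check, assuming the output JSON is named after the video's stem (`video.mp4` becomes `video.json`). That naming, the extension list, and the `split_by_status` helper are assumptions for illustration, not the scripts' documented behavior.

```python
from pathlib import Path

# Assumptions: output files are named <video stem>.json, and these are the video extensions.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".mov", ".webm", ".avi"}


def split_by_status(input_dir: Path, output_dir: Path) -> tuple[list[Path], list[Path]]:
    """Hypothetical helper: return (pending, done); a video is done if its JSON exists."""
    pending: list[Path] = []
    done: list[Path] = []
    for video in sorted(input_dir.iterdir()):
        if not video.is_file() or video.suffix.lower() not in VIDEO_EXTENSIONS:
            continue
        target = output_dir / f"{video.stem}.json"
        (done if target.exists() else pending).append(video)
    return pending, done
```

Deleting a transcript's JSON moves its video back into the pending list, which is the re-transcription behavior described in this section.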
## Current Transcription Environment

### Machine: HAL0022 (10.0.0.131)

- **GPU:** NVIDIA GeForce RTX 4090 (24GB VRAM)
- **OS:** Windows 11
- **Python:** Conda env `transcribe` at `C:\Users\jlightner\.conda\envs\transcribe\python.exe`
- **CUDA:** PyTorch with cu126 wheels

### Content Source

```
A:\Education\Artist Streams & Content\
├── au5/             (334 videos)
├── Keota/           (193 videos)
├── DJ Shortee/      (83 videos)
├── KOAN Sound/      (68 videos)
├── Teddy Killerz/   (62 videos)
└── ...              (42 creators, 1197 videos total across 146 directories)
```

### Transcript Output Location

```
C:\Users\jlightner\chrysopedia\transcripts\
```

The directory structure mirrors the source hierarchy. Each video produces a `.json` transcript file.

**Transfer to ub01:** Transcripts need to be copied to `/vmPool/r/services/chrysopedia_data/transcripts/` on ub01 for pipeline ingestion. This can be done via SMB (`\\ub01\vmPool\services\chrysopedia_data\transcripts`) or via `scp`/`rsync` from a Linux machine with access to both.

### Running the Batch Job

The batch transcription runs as a Windows Scheduled Task so that it survives SSH disconnections:

```powershell
# The task is already created. To re-run:
schtasks /run /tn "ChrysopediaTranscribe"

# Check status:
schtasks /query /tn "ChrysopediaTranscribe" /v /fo list | findstr /i "status result"

# Monitor log:
Get-Content 'C:\Users\jlightner\chrysopedia\transcription.log' -Tail 30

# Or follow live:
Get-Content 'C:\Users\jlightner\chrysopedia\transcription.log' -Tail 20 -Wait
```

### Scripts on HAL0022

```
C:\Users\jlightner\chrysopedia\
├── transcribe.py              # Single-file/directory transcription
├── batch_transcribe.py        # Recursive multi-creator batch runner
├── run_transcription.bat      # Batch file invoked by scheduled task
├── launch_transcription.py    # Alternative launcher (subprocess)
├── transcription.log          # Current batch run log output
└── transcripts/               # Output directory
    ├── batch_manifest.json
    ├── au5/
    ├── Break/
    └── ...
```

## Performance

Whisper large-v3 on an RTX 4090 processes audio at roughly 10–20× real-time, so a 2-hour video takes about 6–12 minutes. For the full 1,197-video library, expect roughly 20–60 hours of GPU time, depending on average video length.

## Directory Convention

`transcribe.py` infers the `creator_folder` field from the parent directory of each video file; in mass batch mode, `batch_transcribe.py` uses the top-level creator folder under `--content-root` instead. Organize videos like:

```
content-root/
├── Skope/
│   ├── Youtube/
│   │   ├── Sound Design Masterclass pt1.mp4
│   │   └── Sound Design Masterclass pt2.mp4
│   └── Patreon/
│       └── Advanced Wavetables.mp4
└── Mr Bill/
    └── Youtube/
        └── Glitch Techniques.mp4
```

With this layout, mass batch mode records `Skope` and `Mr Bill` as creators, whereas running `transcribe.py` directly on `Skope/Youtube/` would record the parent folder name `Youtube`. Override with `--creator` when processing files outside this structure.
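The inference rules in this section, plus the `--creator` override, can be summarized in a small hypothetical helper. `infer_creator` is not a function from `transcribe.py` or `batch_transcribe.py`; it only illustrates the precedence described above, under the assumption that the override always wins.

```python
from pathlib import Path
from typing import Optional


def infer_creator(video: Path,
                  content_root: Optional[Path] = None,
                  override: Optional[str] = None) -> str:
    """Hypothetical helper illustrating how creator_folder is chosen."""
    if override:
        return override  # an explicit --creator value takes precedence
    if content_root is not None:
        rel = video.relative_to(content_root)
        if len(rel.parts) > 1:
            return rel.parts[0]  # mass batch mode: top-level creator folder
    return video.parent.name  # single-file/directory mode: immediate parent folder
```

For `content-root/Skope/Youtube/Sound Design Masterclass pt1.mp4`, passing `content_root` yields `Skope`, while omitting it yields `Youtube`, which is the case where `--creator Skope` is needed.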