- "whisper/transcribe.py" - "whisper/requirements.txt" - "whisper/README.md" GSD-Task: S01/T04 |
||
|---|---|---|
| .. | ||
| README.md | ||
| requirements.txt | ||
| transcribe.py | ||
Chrysopedia — Whisper Transcription
Desktop transcription tool for extracting timestamped text from video files using OpenAI's Whisper model (large-v3). Designed to run on a machine with an NVIDIA GPU (e.g., RTX 4090).
Prerequisites
- Python 3.10+
- ffmpeg installed and on PATH
- NVIDIA GPU with CUDA support (recommended; CPU fallback available)
Install ffmpeg
# Debian/Ubuntu
sudo apt install ffmpeg
# macOS
brew install ffmpeg
Install Python dependencies
pip install -r requirements.txt
Usage
Single file
python transcribe.py --input "path/to/video.mp4" --output-dir ./transcripts
Batch mode (all videos in a directory)
python transcribe.py --input ./videos/ --output-dir ./transcripts
Options
| Flag | Default | Description |
|---|---|---|
--input |
(required) | Path to a video file or directory of videos |
--output-dir |
(required) | Directory to write transcript JSON files |
--model |
large-v3 |
Whisper model name (tiny, base, small, medium, large-v3) |
--device |
cuda |
Compute device (cuda or cpu) |
--creator |
(inferred) | Override creator folder name in output JSON |
-v, --verbose |
off | Enable debug logging |
Output Format
Each video produces a JSON file matching the Chrysopedia spec:
{
"source_file": "Skope — Sound Design Masterclass pt2.mp4",
"creator_folder": "Skope",
"duration_seconds": 7243,
"segments": [
{
"start": 0.0,
"end": 4.52,
"text": "Hey everyone welcome back to part two...",
"words": [
{ "word": "Hey", "start": 0.0, "end": 0.28 },
{ "word": "everyone", "start": 0.32, "end": 0.74 }
]
}
]
}
Resumability
The script automatically skips videos whose output JSON already exists. To re-transcribe a file, delete its output JSON first.
Performance
Whisper large-v3 on an RTX 4090 processes audio at roughly 10–20× real-time. A 2-hour video takes ~6–12 minutes. For 300 videos averaging 1.5 hours each, the initial transcription pass takes roughly 15–40 hours of GPU time.
Directory Convention
The script infers the creator_folder field from the parent directory of each
video file. Organize videos like:
videos/
├── Skope/
│ ├── Sound Design Masterclass pt1.mp4
│ └── Sound Design Masterclass pt2.mp4
├── Mr Bill/
│ └── Glitch Techniques.mp4
Override with --creator when processing files outside this structure.