History

jlightner 56adf2f2ef test: Created desktop Whisper transcription script with single-file/bat… - "whisper/transcribe.py" - "whisper/requirements.txt" - "whisper/README.md" GSD-Task: S01/T04		2026-03-29 21:57:42 +00:00
..
README.md	test: Created desktop Whisper transcription script with single-file/bat…	2026-03-29 21:57:42 +00:00
requirements.txt	test: Created desktop Whisper transcription script with single-file/bat…	2026-03-29 21:57:42 +00:00
transcribe.py	test: Created desktop Whisper transcription script with single-file/bat…	2026-03-29 21:57:42 +00:00

README.md

Chrysopedia — Whisper Transcription

Desktop transcription tool for extracting timestamped text from video files using OpenAI's Whisper model (large-v3). Designed to run on a machine with an NVIDIA GPU (e.g., RTX 4090).

Prerequisites

Python 3.10+
ffmpeg installed and on PATH
NVIDIA GPU with CUDA support (recommended; CPU fallback available)

Install ffmpeg

# Debian/Ubuntu
sudo apt install ffmpeg

# macOS
brew install ffmpeg

Install Python dependencies

pip install -r requirements.txt

Usage

Single file

python transcribe.py --input "path/to/video.mp4" --output-dir ./transcripts

Batch mode (all videos in a directory)

python transcribe.py --input ./videos/ --output-dir ./transcripts

Options

Flag	Default	Description
`--input`	(required)	Path to a video file or directory of videos
`--output-dir`	(required)	Directory to write transcript JSON files
`--model`	`large-v3`	Whisper model name (`tiny`, `base`, `small`, `medium`, `large-v3`)
`--device`	`cuda`	Compute device (`cuda` or `cpu`)
`--creator`	(inferred)	Override creator folder name in output JSON
`-v, --verbose`	off	Enable debug logging

Output Format

Each video produces a JSON file matching the Chrysopedia spec:

{
  "source_file": "Skope — Sound Design Masterclass pt2.mp4",
  "creator_folder": "Skope",
  "duration_seconds": 7243,
  "segments": [
    {
      "start": 0.0,
      "end": 4.52,
      "text": "Hey everyone welcome back to part two...",
      "words": [
        { "word": "Hey", "start": 0.0, "end": 0.28 },
        { "word": "everyone", "start": 0.32, "end": 0.74 }
      ]
    }
  ]
}

Resumability

The script automatically skips videos whose output JSON already exists. To re-transcribe a file, delete its output JSON first.

Performance

Whisper large-v3 on an RTX 4090 processes audio at roughly 10–20× real-time. A 2-hour video takes ~6–12 minutes. For 300 videos averaging 1.5 hours each, the initial transcription pass takes roughly 15–40 hours of GPU time.

Directory Convention

The script infers the creator_folder field from the parent directory of each video file. Organize videos like:

videos/
├── Skope/
│   ├── Sound Design Masterclass pt1.mp4
│   └── Sound Design Masterclass pt2.mp4
├── Mr Bill/
│   └── Glitch Techniques.mp4

Override with --creator when processing files outside this structure.

README.md Unescape Escape