chrysopedia/whisper/README.md
jlightner 56adf2f2ef test: Created desktop Whisper transcription script with single-file/bat…
- "whisper/transcribe.py"
- "whisper/requirements.txt"
- "whisper/README.md"

GSD-Task: S01/T04
2026-03-29 21:57:42 +00:00

102 lines
2.7 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Chrysopedia — Whisper Transcription
Desktop transcription tool for extracting timestamped text from video files
using OpenAI's Whisper model (large-v3). Designed to run on a machine with
an NVIDIA GPU (e.g., RTX 4090).
## Prerequisites
- **Python 3.10+**
- **ffmpeg** installed and on PATH
- **NVIDIA GPU** with CUDA support (recommended; CPU fallback available)
### Install ffmpeg
```bash
# Debian/Ubuntu
sudo apt install ffmpeg
# macOS
brew install ffmpeg
```
### Install Python dependencies
```bash
pip install -r requirements.txt
```
## Usage
### Single file
```bash
python transcribe.py --input "path/to/video.mp4" --output-dir ./transcripts
```
### Batch mode (all videos in a directory)
```bash
python transcribe.py --input ./videos/ --output-dir ./transcripts
```
### Options
| Flag | Default | Description |
| --------------- | ----------- | ----------------------------------------------- |
| `--input` | (required) | Path to a video file or directory of videos |
| `--output-dir` | (required) | Directory to write transcript JSON files |
| `--model` | `large-v3` | Whisper model name (`tiny`, `base`, `small`, `medium`, `large-v3`) |
| `--device` | `cuda` | Compute device (`cuda` or `cpu`) |
| `--creator` | (inferred) | Override creator folder name in output JSON |
| `-v, --verbose` | off | Enable debug logging |
## Output Format
Each video produces a JSON file matching the Chrysopedia spec:
```json
{
"source_file": "Skope — Sound Design Masterclass pt2.mp4",
"creator_folder": "Skope",
"duration_seconds": 7243,
"segments": [
{
"start": 0.0,
"end": 4.52,
"text": "Hey everyone welcome back to part two...",
"words": [
{ "word": "Hey", "start": 0.0, "end": 0.28 },
{ "word": "everyone", "start": 0.32, "end": 0.74 }
]
}
]
}
```
## Resumability
The script automatically skips videos whose output JSON already exists. To
re-transcribe a file, delete its output JSON first.
## Performance
Whisper large-v3 on an RTX 4090 processes audio at roughly 1020× real-time.
A 2-hour video takes ~612 minutes. For 300 videos averaging 1.5 hours each,
the initial transcription pass takes roughly 1540 hours of GPU time.
## Directory Convention
The script infers the `creator_folder` field from the parent directory of each
video file. Organize videos like:
```
videos/
├── Skope/
│ ├── Sound Design Masterclass pt1.mp4
│ └── Sound Design Masterclass pt2.mp4
├── Mr Bill/
│ └── Glitch Techniques.mp4
```
Override with `--creator` when processing files outside this structure.