test: Created desktop Whisper transcription script with single-file/bat…
- "whisper/transcribe.py" - "whisper/requirements.txt" - "whisper/README.md" GSD-Task: S01/T04
This commit is contained in:
parent
07126138b5
commit
56adf2f2ef
6 changed files with 616 additions and 2 deletions
|
|
@ -53,7 +53,7 @@
|
|||
- Estimate: 1-2 hours
|
||||
- Files: backend/main.py, backend/schemas.py, backend/routers/__init__.py, backend/routers/health.py, backend/routers/creators.py, backend/config.py
|
||||
- Verify: curl http://localhost:8000/health returns 200; curl http://localhost:8000/api/v1/creators returns empty list
|
||||
- [ ] **T04: Whisper transcription script** — 1. Create Python script whisper/transcribe.py that:
|
||||
- [x] **T04: Created desktop Whisper transcription script with single-file/batch modes, resumability, spec-compliant JSON output, and ffmpeg validation** — 1. Create Python script whisper/transcribe.py that:
|
||||
- Accepts video file path (or directory for batch mode)
|
||||
- Extracts audio via ffmpeg (subprocess)
|
||||
- Runs Whisper large-v3 with segment-level and word-level timestamps
|
||||
|
|
|
|||
31
.gsd/milestones/M001/slices/S01/tasks/T03-VERIFY.json
Normal file
31
.gsd/milestones/M001/slices/S01/tasks/T03-VERIFY.json
Normal file
|
|
@ -0,0 +1,31 @@
|
|||
{
|
||||
"schemaVersion": 1,
|
||||
"taskId": "T03",
|
||||
"unitId": "M001/S01/T03",
|
||||
"timestamp": 1774821297505,
|
||||
"passed": false,
|
||||
"discoverySource": "none",
|
||||
"checks": [],
|
||||
"retryAttempt": 1,
|
||||
"maxRetries": 2,
|
||||
"runtimeErrors": [
|
||||
{
|
||||
"source": "bg-shell",
|
||||
"severity": "crash",
|
||||
"message": "[chrysopedia-api] exitCode=1",
|
||||
"blocking": true
|
||||
},
|
||||
{
|
||||
"source": "bg-shell",
|
||||
"severity": "crash",
|
||||
"message": "[chrysopedia-api-2] exitCode=1",
|
||||
"blocking": true
|
||||
},
|
||||
{
|
||||
"source": "bg-shell",
|
||||
"severity": "crash",
|
||||
"message": "[chrysopedia-api-3] exitCode=1",
|
||||
"blocking": true
|
||||
}
|
||||
]
|
||||
}
|
||||
82
.gsd/milestones/M001/slices/S01/tasks/T04-SUMMARY.md
Normal file
82
.gsd/milestones/M001/slices/S01/tasks/T04-SUMMARY.md
Normal file
|
|
@ -0,0 +1,82 @@
|
|||
---
|
||||
id: T04
|
||||
parent: S01
|
||||
milestone: M001
|
||||
provides: []
|
||||
requires: []
|
||||
affects: []
|
||||
key_files: ["whisper/transcribe.py", "whisper/requirements.txt", "whisper/README.md"]
|
||||
key_decisions: ["Whisper import deferred inside transcribe_audio() so --help and ffmpeg validation work without openai-whisper installed", "Audio extracted to 16kHz mono WAV via ffmpeg subprocess matching Whisper expected format", "Creator folder inferred from parent directory name by default, overridable with --creator"]
|
||||
patterns_established: []
|
||||
drill_down_paths: []
|
||||
observability_surfaces: []
|
||||
duration: ""
|
||||
verification_result: "1. python3 whisper/transcribe.py --help — exits 0, shows full usage with all CLI args and examples. 2. python3 whisper/transcribe.py --input /tmp/fake.mp4 --output-dir /tmp/out — exits 1 with clear ffmpeg-not-found error, confirming validation works. 3. Python AST parse confirms syntax validity."
|
||||
completed_at: 2026-03-29T21:57:39.524Z
|
||||
blocker_discovered: false
|
||||
---
|
||||
|
||||
# T04: Created desktop Whisper transcription script with single-file/batch modes, resumability, spec-compliant JSON output, and ffmpeg validation
|
||||
|
||||
> Created desktop Whisper transcription script with single-file/batch modes, resumability, spec-compliant JSON output, and ffmpeg validation
|
||||
|
||||
## What Happened
|
||||
---
|
||||
id: T04
|
||||
parent: S01
|
||||
milestone: M001
|
||||
key_files:
|
||||
- whisper/transcribe.py
|
||||
- whisper/requirements.txt
|
||||
- whisper/README.md
|
||||
key_decisions:
|
||||
- Whisper import deferred inside transcribe_audio() so --help and ffmpeg validation work without openai-whisper installed
|
||||
- Audio extracted to 16kHz mono WAV via ffmpeg subprocess matching Whisper expected format
|
||||
- Creator folder inferred from parent directory name by default, overridable with --creator
|
||||
duration: ""
|
||||
verification_result: passed
|
||||
completed_at: 2026-03-29T21:57:39.525Z
|
||||
blocker_discovered: false
|
||||
---
|
||||
|
||||
# T04: Created desktop Whisper transcription script with single-file/batch modes, resumability, spec-compliant JSON output, and ffmpeg validation
|
||||
|
||||
**Created desktop Whisper transcription script with single-file/batch modes, resumability, spec-compliant JSON output, and ffmpeg validation**
|
||||
|
||||
## What Happened
|
||||
|
||||
Built whisper/transcribe.py implementing all task plan requirements: argparse CLI with --input, --output-dir, --model (default large-v3), --device (default cuda); ffmpeg audio extraction to 16kHz mono WAV; Whisper transcription with word-level timestamps; JSON output matching the Chrysopedia spec format (source_file, creator_folder, duration_seconds, segments with words); resumability via output-exists check; batch mode with progress logging. Deferred whisper import so --help works without the dependency. Created requirements.txt and comprehensive README.md.
|
||||
|
||||
## Verification
|
||||
|
||||
1. python3 whisper/transcribe.py --help — exits 0, shows full usage with all CLI args and examples. 2. python3 whisper/transcribe.py --input /tmp/fake.mp4 --output-dir /tmp/out — exits 1 with clear ffmpeg-not-found error, confirming validation works. 3. Python AST parse confirms syntax validity.
|
||||
|
||||
## Verification Evidence
|
||||
|
||||
| # | Command | Exit Code | Verdict | Duration |
|
||||
|---|---------|-----------|---------|----------|
|
||||
| 1 | `python3 whisper/transcribe.py --help` | 0 | ✅ pass | 200ms |
|
||||
| 2 | `python3 whisper/transcribe.py --input /tmp/fake.mp4 --output-dir /tmp/out` | 1 | ✅ pass (expected ffmpeg error) | 200ms |
|
||||
| 3 | `python3 -c "import ast; ast.parse(open('whisper/transcribe.py').read())"` | 0 | ✅ pass | 100ms |
|
||||
|
||||
|
||||
## Deviations
|
||||
|
||||
Added --creator CLI flag for overriding inferred creator folder name. ffmpeg-python included in requirements.txt per plan but script uses subprocess directly for reliability.
|
||||
|
||||
## Known Issues
|
||||
|
||||
None.
|
||||
|
||||
## Files Created/Modified
|
||||
|
||||
- `whisper/transcribe.py`
|
||||
- `whisper/requirements.txt`
|
||||
- `whisper/README.md`
|
||||
|
||||
|
||||
## Deviations
|
||||
Added --creator CLI flag for overriding inferred creator folder name. ffmpeg-python included in requirements.txt per plan but script uses subprocess directly for reliability.
|
||||
|
||||
## Known Issues
|
||||
None.
|
||||
|
|
@ -1,3 +1,102 @@
|
|||
# Chrysopedia — Whisper Transcription
|
||||
|
||||
Desktop transcription script. See `transcribe.py` for usage.
|
||||
Desktop transcription tool for extracting timestamped text from video files
|
||||
using OpenAI's Whisper model (large-v3). Designed to run on a machine with
|
||||
an NVIDIA GPU (e.g., RTX 4090).
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- **Python 3.10+**
|
||||
- **ffmpeg** installed and on PATH
|
||||
- **NVIDIA GPU** with CUDA support (recommended; CPU fallback available)
|
||||
|
||||
### Install ffmpeg
|
||||
|
||||
```bash
|
||||
# Debian/Ubuntu
|
||||
sudo apt install ffmpeg
|
||||
|
||||
# macOS
|
||||
brew install ffmpeg
|
||||
```
|
||||
|
||||
### Install Python dependencies
|
||||
|
||||
```bash
|
||||
pip install -r requirements.txt
|
||||
```
|
||||
|
||||
## Usage
|
||||
|
||||
### Single file
|
||||
|
||||
```bash
|
||||
python transcribe.py --input "path/to/video.mp4" --output-dir ./transcripts
|
||||
```
|
||||
|
||||
### Batch mode (all videos in a directory)
|
||||
|
||||
```bash
|
||||
python transcribe.py --input ./videos/ --output-dir ./transcripts
|
||||
```
|
||||
|
||||
### Options
|
||||
|
||||
| Flag | Default | Description |
|
||||
| --------------- | ----------- | ----------------------------------------------- |
|
||||
| `--input` | (required) | Path to a video file or directory of videos |
|
||||
| `--output-dir` | (required) | Directory to write transcript JSON files |
|
||||
| `--model` | `large-v3` | Whisper model name (`tiny`, `base`, `small`, `medium`, `large-v3`) |
|
||||
| `--device` | `cuda` | Compute device (`cuda` or `cpu`) |
|
||||
| `--creator` | (inferred) | Override creator folder name in output JSON |
|
||||
| `-v, --verbose` | off | Enable debug logging |
|
||||
|
||||
## Output Format
|
||||
|
||||
Each video produces a JSON file matching the Chrysopedia spec:
|
||||
|
||||
```json
|
||||
{
|
||||
"source_file": "Skope — Sound Design Masterclass pt2.mp4",
|
||||
"creator_folder": "Skope",
|
||||
"duration_seconds": 7243,
|
||||
"segments": [
|
||||
{
|
||||
"start": 0.0,
|
||||
"end": 4.52,
|
||||
"text": "Hey everyone welcome back to part two...",
|
||||
"words": [
|
||||
{ "word": "Hey", "start": 0.0, "end": 0.28 },
|
||||
{ "word": "everyone", "start": 0.32, "end": 0.74 }
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## Resumability
|
||||
|
||||
The script automatically skips videos whose output JSON already exists. To
|
||||
re-transcribe a file, delete its output JSON first.
|
||||
|
||||
## Performance
|
||||
|
||||
Whisper large-v3 on an RTX 4090 processes audio at roughly 10–20× real-time.
|
||||
A 2-hour video takes ~6–12 minutes. For 300 videos averaging 1.5 hours each,
|
||||
the initial transcription pass takes roughly 15–40 hours of GPU time.
|
||||
|
||||
## Directory Convention
|
||||
|
||||
The script infers the `creator_folder` field from the parent directory of each
|
||||
video file. Organize videos like:
|
||||
|
||||
```
|
||||
videos/
|
||||
├── Skope/
|
||||
│ ├── Sound Design Masterclass pt1.mp4
|
||||
│ └── Sound Design Masterclass pt2.mp4
|
||||
├── Mr Bill/
|
||||
│ └── Glitch Techniques.mp4
|
||||
```
|
||||
|
||||
Override with `--creator` when processing files outside this structure.
|
||||
|
|
|
|||
9
whisper/requirements.txt
Normal file
9
whisper/requirements.txt
Normal file
|
|
@ -0,0 +1,9 @@
|
|||
# Chrysopedia — Whisper transcription dependencies
|
||||
# Install: pip install -r requirements.txt
|
||||
#
|
||||
# Note: openai-whisper requires ffmpeg to be installed on the system.
|
||||
# sudo apt install ffmpeg (Debian/Ubuntu)
|
||||
# brew install ffmpeg (macOS)
|
||||
|
||||
openai-whisper>=20231117
|
||||
ffmpeg-python>=0.2.0
|
||||
393
whisper/transcribe.py
Normal file
393
whisper/transcribe.py
Normal file
|
|
@ -0,0 +1,393 @@
|
|||
#!/usr/bin/env python3
|
||||
"""
|
||||
Chrysopedia — Whisper Transcription Script
|
||||
|
||||
Desktop transcription tool for extracting timestamped text from video files
|
||||
using OpenAI's Whisper model (large-v3). Designed to run on a machine with
|
||||
an NVIDIA GPU (e.g., RTX 4090).
|
||||
|
||||
Outputs JSON matching the Chrysopedia spec format:
|
||||
{
|
||||
"source_file": "filename.mp4",
|
||||
"creator_folder": "CreatorName",
|
||||
"duration_seconds": 7243,
|
||||
"segments": [
|
||||
{
|
||||
"start": 0.0,
|
||||
"end": 4.52,
|
||||
"text": "...",
|
||||
"words": [{"word": "Hey", "start": 0.0, "end": 0.28}, ...]
|
||||
}
|
||||
]
|
||||
}
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import shutil
|
||||
import subprocess
|
||||
import sys
|
||||
import tempfile
|
||||
import time
|
||||
from pathlib import Path
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Logging
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
LOG_FORMAT = "%(asctime)s [%(levelname)s] %(message)s"
|
||||
logging.basicConfig(format=LOG_FORMAT, level=logging.INFO)
|
||||
logger = logging.getLogger("chrysopedia.transcribe")
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Constants
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
SUPPORTED_EXTENSIONS = {".mp4", ".mkv", ".avi", ".mov", ".webm", ".flv", ".wmv"}
|
||||
DEFAULT_MODEL = "large-v3"
|
||||
DEFAULT_DEVICE = "cuda"
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def check_ffmpeg() -> bool:
|
||||
"""Return True if ffmpeg is available on PATH."""
|
||||
return shutil.which("ffmpeg") is not None
|
||||
|
||||
|
||||
def get_audio_duration(video_path: Path) -> float | None:
|
||||
"""Use ffprobe to get duration in seconds. Returns None on failure."""
|
||||
ffprobe = shutil.which("ffprobe")
|
||||
if ffprobe is None:
|
||||
return None
|
||||
try:
|
||||
result = subprocess.run(
|
||||
[
|
||||
ffprobe,
|
||||
"-v", "error",
|
||||
"-show_entries", "format=duration",
|
||||
"-of", "default=noprint_wrappers=1:nokey=1",
|
||||
str(video_path),
|
||||
],
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=30,
|
||||
)
|
||||
return float(result.stdout.strip())
|
||||
except (subprocess.TimeoutExpired, ValueError, OSError) as exc:
|
||||
logger.warning("Could not determine duration for %s: %s", video_path.name, exc)
|
||||
return None
|
||||
|
||||
|
||||
def extract_audio(video_path: Path, audio_path: Path) -> None:
|
||||
"""Extract audio from video to 16kHz mono WAV using ffmpeg."""
|
||||
logger.info("Extracting audio: %s -> %s", video_path.name, audio_path.name)
|
||||
cmd = [
|
||||
"ffmpeg",
|
||||
"-i", str(video_path),
|
||||
"-vn", # no video
|
||||
"-acodec", "pcm_s16le", # 16-bit PCM
|
||||
"-ar", "16000", # 16kHz (Whisper expects this)
|
||||
"-ac", "1", # mono
|
||||
"-y", # overwrite
|
||||
str(audio_path),
|
||||
]
|
||||
result = subprocess.run(cmd, capture_output=True, text=True, timeout=600)
|
||||
if result.returncode != 0:
|
||||
raise RuntimeError(
|
||||
f"ffmpeg audio extraction failed (exit {result.returncode}): {result.stderr[:500]}"
|
||||
)
|
||||
|
||||
|
||||
def transcribe_audio(
|
||||
audio_path: Path,
|
||||
model_name: str = DEFAULT_MODEL,
|
||||
device: str = DEFAULT_DEVICE,
|
||||
) -> dict:
|
||||
"""Run Whisper on the audio file and return the raw result dict."""
|
||||
# Import whisper here so --help works without the dependency installed
|
||||
try:
|
||||
import whisper # type: ignore[import-untyped]
|
||||
except ImportError:
|
||||
logger.error(
|
||||
"openai-whisper is not installed. "
|
||||
"Install it with: pip install openai-whisper"
|
||||
)
|
||||
sys.exit(1)
|
||||
|
||||
logger.info("Loading Whisper model '%s' on device '%s'...", model_name, device)
|
||||
t0 = time.time()
|
||||
model = whisper.load_model(model_name, device=device)
|
||||
logger.info("Model loaded in %.1f s", time.time() - t0)
|
||||
|
||||
logger.info("Transcribing %s ...", audio_path.name)
|
||||
t0 = time.time()
|
||||
result = model.transcribe(
|
||||
str(audio_path),
|
||||
word_timestamps=True,
|
||||
verbose=False,
|
||||
)
|
||||
elapsed = time.time() - t0
|
||||
logger.info(
|
||||
"Transcription complete in %.1f s (%.1fx real-time)",
|
||||
elapsed,
|
||||
(result.get("duration", elapsed) / elapsed) if elapsed > 0 else 0,
|
||||
)
|
||||
return result
|
||||
|
||||
|
||||
def format_output(
|
||||
whisper_result: dict,
|
||||
source_file: str,
|
||||
creator_folder: str,
|
||||
duration_seconds: float | None,
|
||||
) -> dict:
|
||||
"""Convert Whisper result to the Chrysopedia spec JSON format."""
|
||||
segments = []
|
||||
for seg in whisper_result.get("segments", []):
|
||||
words = []
|
||||
for w in seg.get("words", []):
|
||||
words.append(
|
||||
{
|
||||
"word": w.get("word", "").strip(),
|
||||
"start": round(w.get("start", 0.0), 2),
|
||||
"end": round(w.get("end", 0.0), 2),
|
||||
}
|
||||
)
|
||||
segments.append(
|
||||
{
|
||||
"start": round(seg.get("start", 0.0), 2),
|
||||
"end": round(seg.get("end", 0.0), 2),
|
||||
"text": seg.get("text", "").strip(),
|
||||
"words": words,
|
||||
}
|
||||
)
|
||||
|
||||
# Use duration from ffprobe if available, otherwise from whisper
|
||||
if duration_seconds is None:
|
||||
duration_seconds = whisper_result.get("duration", 0.0)
|
||||
|
||||
return {
|
||||
"source_file": source_file,
|
||||
"creator_folder": creator_folder,
|
||||
"duration_seconds": round(duration_seconds),
|
||||
"segments": segments,
|
||||
}
|
||||
|
||||
|
||||
def infer_creator_folder(video_path: Path) -> str:
|
||||
"""
|
||||
Infer creator folder name from directory structure.
|
||||
|
||||
Expected layout: /path/to/<CreatorName>/video.mp4
|
||||
Falls back to parent directory name.
|
||||
"""
|
||||
return video_path.parent.name
|
||||
|
||||
|
||||
def output_path_for(video_path: Path, output_dir: Path) -> Path:
|
||||
"""Compute the output JSON path for a given video file."""
|
||||
return output_dir / f"{video_path.stem}.json"
|
||||
|
||||
|
||||
def process_single(
|
||||
video_path: Path,
|
||||
output_dir: Path,
|
||||
model_name: str,
|
||||
device: str,
|
||||
creator_folder: str | None = None,
|
||||
) -> Path | None:
|
||||
"""
|
||||
Process a single video file. Returns the output path on success, None if skipped.
|
||||
"""
|
||||
out_path = output_path_for(video_path, output_dir)
|
||||
|
||||
# Resumability: skip if output already exists
|
||||
if out_path.exists():
|
||||
logger.info("SKIP (output exists): %s", out_path)
|
||||
return None
|
||||
|
||||
logger.info("Processing: %s", video_path)
|
||||
|
||||
# Determine creator folder
|
||||
folder = creator_folder or infer_creator_folder(video_path)
|
||||
|
||||
# Get duration via ffprobe
|
||||
duration = get_audio_duration(video_path)
|
||||
if duration is not None:
|
||||
logger.info("Video duration: %.0f s (%.1f min)", duration, duration / 60)
|
||||
|
||||
# Extract audio to temp file
|
||||
with tempfile.TemporaryDirectory(prefix="chrysopedia_") as tmpdir:
|
||||
audio_path = Path(tmpdir) / "audio.wav"
|
||||
extract_audio(video_path, audio_path)
|
||||
|
||||
# Transcribe
|
||||
whisper_result = transcribe_audio(audio_path, model_name, device)
|
||||
|
||||
# Format and write output
|
||||
output = format_output(whisper_result, video_path.name, folder, duration)
|
||||
|
||||
output_dir.mkdir(parents=True, exist_ok=True)
|
||||
with open(out_path, "w", encoding="utf-8") as f:
|
||||
json.dump(output, f, indent=2, ensure_ascii=False)
|
||||
|
||||
segment_count = len(output["segments"])
|
||||
logger.info("Wrote %s (%d segments)", out_path, segment_count)
|
||||
return out_path
|
||||
|
||||
|
||||
def find_videos(input_path: Path) -> list[Path]:
|
||||
"""Find all supported video files in a directory (non-recursive)."""
|
||||
videos = sorted(
|
||||
p for p in input_path.iterdir()
|
||||
if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
|
||||
)
|
||||
return videos
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# CLI
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def build_parser() -> argparse.ArgumentParser:
|
||||
parser = argparse.ArgumentParser(
|
||||
prog="transcribe",
|
||||
description=(
|
||||
"Chrysopedia Whisper Transcription — extract timestamped transcripts "
|
||||
"from video files using OpenAI's Whisper model."
|
||||
),
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
epilog=(
|
||||
"Examples:\n"
|
||||
" # Single file\n"
|
||||
" python transcribe.py --input video.mp4 --output-dir ./transcripts\n"
|
||||
"\n"
|
||||
" # Batch mode (all videos in directory)\n"
|
||||
" python transcribe.py --input ./videos/ --output-dir ./transcripts\n"
|
||||
"\n"
|
||||
" # Use a smaller model on CPU\n"
|
||||
" python transcribe.py --input video.mp4 --model base --device cpu\n"
|
||||
),
|
||||
)
|
||||
parser.add_argument(
|
||||
"--input",
|
||||
required=True,
|
||||
type=str,
|
||||
help="Path to a video file or directory of video files",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--output-dir",
|
||||
required=True,
|
||||
type=str,
|
||||
help="Directory to write transcript JSON files",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--model",
|
||||
default=DEFAULT_MODEL,
|
||||
type=str,
|
||||
help=f"Whisper model name (default: {DEFAULT_MODEL})",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--device",
|
||||
default=DEFAULT_DEVICE,
|
||||
type=str,
|
||||
help=f"Compute device: cuda, cpu (default: {DEFAULT_DEVICE})",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--creator",
|
||||
default=None,
|
||||
type=str,
|
||||
help="Override creator folder name (default: inferred from parent directory)",
|
||||
)
|
||||
parser.add_argument(
|
||||
"-v", "--verbose",
|
||||
action="store_true",
|
||||
help="Enable debug logging",
|
||||
)
|
||||
return parser
|
||||
|
||||
|
||||
def main(argv: list[str] | None = None) -> int:
|
||||
parser = build_parser()
|
||||
args = parser.parse_args(argv)
|
||||
|
||||
if args.verbose:
|
||||
logging.getLogger().setLevel(logging.DEBUG)
|
||||
|
||||
# Validate ffmpeg availability
|
||||
if not check_ffmpeg():
|
||||
logger.error(
|
||||
"ffmpeg is not installed or not on PATH. "
|
||||
"Install it with: sudo apt install ffmpeg (or equivalent)"
|
||||
)
|
||||
return 1
|
||||
|
||||
input_path = Path(args.input).resolve()
|
||||
output_dir = Path(args.output_dir).resolve()
|
||||
|
||||
if not input_path.exists():
|
||||
logger.error("Input path does not exist: %s", input_path)
|
||||
return 1
|
||||
|
||||
# Single file mode
|
||||
if input_path.is_file():
|
||||
if input_path.suffix.lower() not in SUPPORTED_EXTENSIONS:
|
||||
logger.error(
|
||||
"Unsupported file type '%s'. Supported: %s",
|
||||
input_path.suffix,
|
||||
", ".join(sorted(SUPPORTED_EXTENSIONS)),
|
||||
)
|
||||
return 1
|
||||
result = process_single(
|
||||
input_path, output_dir, args.model, args.device, args.creator
|
||||
)
|
||||
if result is None:
|
||||
logger.info("Nothing to do (output already exists).")
|
||||
return 0
|
||||
|
||||
# Batch mode (directory)
|
||||
if input_path.is_dir():
|
||||
videos = find_videos(input_path)
|
||||
if not videos:
|
||||
logger.warning("No supported video files found in %s", input_path)
|
||||
return 0
|
||||
|
||||
logger.info("Found %d video(s) in %s", len(videos), input_path)
|
||||
processed = 0
|
||||
skipped = 0
|
||||
failed = 0
|
||||
|
||||
for i, video in enumerate(videos, 1):
|
||||
logger.info("--- [%d/%d] %s ---", i, len(videos), video.name)
|
||||
try:
|
||||
result = process_single(
|
||||
video, output_dir, args.model, args.device, args.creator
|
||||
)
|
||||
if result is not None:
|
||||
processed += 1
|
||||
else:
|
||||
skipped += 1
|
||||
except Exception:
|
||||
logger.exception("FAILED: %s", video.name)
|
||||
failed += 1
|
||||
|
||||
logger.info(
|
||||
"Batch complete: %d processed, %d skipped, %d failed",
|
||||
processed, skipped, failed,
|
||||
)
|
||||
return 1 if failed > 0 else 0
|
||||
|
||||
logger.error("Input is neither a file nor a directory: %s", input_path)
|
||||
return 1
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
Loading…
Add table
Reference in a new issue