feat: Added shorts_generator.py with 3 format presets and stage_generat…
- "backend/pipeline/shorts_generator.py"
- "backend/pipeline/stages.py"

GSD-Task: S03/T02
parent dfc5aa2ae7
commit 0007528e77
5 changed files with 435 additions and 1 deletion
@@ -52,7 +52,7 @@ Set up all infrastructure for the shorts pipeline: new SQLAlchemy model, Alembic
 - Estimate: 30m
 - Files: backend/models.py, backend/config.py, docker/Dockerfile.api, docker-compose.yml, alembic/versions/025_add_generated_shorts.py
 - Verify: cd backend && python -c "from models import GeneratedShort, FormatPreset, ShortStatus; print('OK')" && grep -q ffmpeg ../docker/Dockerfile.api && grep -q video_source_path config.py
-- [ ] **T02: Build ffmpeg clip generator module and Celery task with MinIO upload** — ## Description
+- [x] **T02: Added shorts_generator.py with 3 format presets and stage_generate_shorts Celery task with MinIO upload and per-preset error handling** — ## Description
 Create the pure ffmpeg wrapper module with 3 format presets, then wire a Celery task that reads an approved highlight, resolves the video file path, generates clips for each preset, uploads to MinIO, and updates DB status.
30	.gsd/milestones/M023/slices/S03/tasks/T01-VERIFY.json	Normal file
@@ -0,0 +1,30 @@
{
  "schemaVersion": 1,
  "taskId": "T01",
  "unitId": "M023/S03/T01",
  "timestamp": 1775295816039,
  "passed": false,
  "discoverySource": "task-plan",
  "checks": [
    {
      "command": "cd backend",
      "exitCode": 0,
      "durationMs": 9,
      "verdict": "pass"
    },
    {
      "command": "grep -q ffmpeg ../docker/Dockerfile.api",
      "exitCode": 2,
      "durationMs": 9,
      "verdict": "fail"
    },
    {
      "command": "grep -q video_source_path config.py",
      "exitCode": 2,
      "durationMs": 10,
      "verdict": "fail"
    }
  ],
  "retryAttempt": 1,
  "maxRetries": 2
}
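This failed report drives a retry (retryAttempt 1 of maxRetries 2). A consumer deciding whether to retry might read the report like this (illustrative sketch over an abridged copy of the JSON above; not part of the commit):

```python
import json

# Abridged copy of the T01-VERIFY.json report shown above.
report = json.loads("""
{
  "passed": false,
  "checks": [
    {"command": "cd backend", "verdict": "pass"},
    {"command": "grep -q ffmpeg ../docker/Dockerfile.api", "verdict": "fail"},
    {"command": "grep -q video_source_path config.py", "verdict": "fail"}
  ],
  "retryAttempt": 1,
  "maxRetries": 2
}
""")

# Collect the commands that did not pass, and gate the retry on the attempt budget.
failed = [c["command"] for c in report["checks"] if c["verdict"] != "pass"]
should_retry = not report["passed"] and report["retryAttempt"] < report["maxRetries"]
print(failed)
print(should_retry)  # → True
```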
83	.gsd/milestones/M023/slices/S03/tasks/T02-SUMMARY.md	Normal file
@@ -0,0 +1,83 @@
---
id: T02
parent: S03
milestone: M023
provides: []
requires: []
affects: []
key_files: ["backend/pipeline/shorts_generator.py", "backend/pipeline/stages.py"]
key_decisions: ["Lazy imports inside Celery task for shorts_generator and model types to avoid circular imports", "Per-preset independent processing with isolated error handling"]
patterns_established: []
drill_down_paths: []
observability_surfaces: []
duration: ""
verification_result: "All task and slice verification checks pass: shorts_generator module imports OK, stage_generate_shorts task imports and registers in Celery OK, model imports OK, ffmpeg in Dockerfile confirmed, video_source_path in config confirmed, chrysopedia_videos volume mount in docker-compose.yml confirmed."
completed_at: 2026-04-04T09:47:33.243Z
blocker_discovered: false
---

# T02: Added shorts_generator.py with 3 format presets and stage_generate_shorts Celery task with MinIO upload and per-preset error handling

> Added shorts_generator.py with 3 format presets and stage_generate_shorts Celery task with MinIO upload and per-preset error handling

## What Happened

Created backend/pipeline/shorts_generator.py with a PRESETS dict (vertical 1080x1920, square 1080x1080, horizontal 1920x1080), extract_clip() using an ffmpeg subprocess with a 300s timeout, and resolve_video_path() with file-existence validation. Added a stage_generate_shorts Celery task to stages.py that loads an approved HighlightCandidate, resolves the video file, and processes each FormatPreset independently — creating GeneratedShort rows, extracting clips to /tmp, uploading to MinIO, and updating status. Each preset failure is isolated; temp files are cleaned up in finally blocks.
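The per-preset isolation described above can be sketched in miniature (hypothetical stand-ins for the preset table and the extract step; this is not the committed code, which also manages DB rows and uploads):

```python
from dataclasses import dataclass

# Stand-in preset table mirroring the three formats described above.
@dataclass(frozen=True)
class PresetSpec:
    width: int
    height: int

PRESETS = {
    "vertical": PresetSpec(1080, 1920),
    "square": PresetSpec(1080, 1080),
    "horizontal": PresetSpec(1920, 1080),
}

def generate_all(extract) -> dict[str, str]:
    """Run extract(preset, spec) for each preset; one failure never blocks the rest."""
    statuses = {}
    for preset, spec in PRESETS.items():
        try:
            extract(preset, spec)
            statuses[preset] = "complete"
        except Exception as exc:
            statuses[preset] = f"failed: {exc}"
    return statuses

def flaky_extract(preset, spec):
    # Simulate ffmpeg failing for one preset only.
    if preset == "square":
        raise RuntimeError("ffmpeg exited 1")

print(generate_all(flaky_extract))
# → {'vertical': 'complete', 'square': 'failed: ffmpeg exited 1', 'horizontal': 'complete'}
```

The real task follows the same shape, with the DB status update and temp-file cleanup folded into each iteration's except/finally.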
## Verification

All task and slice verification checks pass: shorts_generator module imports OK, stage_generate_shorts task imports and registers in Celery OK, model imports OK, ffmpeg in Dockerfile confirmed, video_source_path in config confirmed, chrysopedia_videos volume mount in docker-compose.yml confirmed.

## Verification Evidence

| # | Command | Exit Code | Verdict | Duration |
|---|---------|-----------|---------|----------|
| 1 | `cd backend && python -c "from pipeline.shorts_generator import extract_clip, PRESETS, resolve_video_path; print('OK')"` | 0 | ✅ pass | 500ms |
| 2 | `cd backend && python -c "from pipeline.stages import stage_generate_shorts; print('OK')"` | 0 | ✅ pass | 800ms |
| 3 | `grep -q 'stage_generate_shorts' backend/pipeline/stages.py` | 0 | ✅ pass | 50ms |
| 4 | `cd backend && python -c "from models import GeneratedShort, FormatPreset, ShortStatus; print('OK')"` | 0 | ✅ pass | 500ms |
| 5 | `grep ffmpeg docker/Dockerfile.api` | 0 | ✅ pass | 50ms |
| 6 | `grep video_source_path backend/config.py` | 0 | ✅ pass | 50ms |
| 7 | `grep chrysopedia_videos docker-compose.yml` | 0 | ✅ pass | 50ms |

## Deviations

None.

## Known Issues

None.

## Files Created/Modified

- `backend/pipeline/shorts_generator.py`
- `backend/pipeline/stages.py`
132	backend/pipeline/shorts_generator.py	Normal file
@@ -0,0 +1,132 @@
"""FFmpeg clip extraction with format presets for shorts generation.

Pure functions — no DB access, no Celery dependency. Tested independently.
"""

from __future__ import annotations

import logging
import subprocess
from dataclasses import dataclass
from pathlib import Path

from models import FormatPreset

logger = logging.getLogger(__name__)

FFMPEG_TIMEOUT_SECS = 300


@dataclass(frozen=True)
class PresetSpec:
    """Resolution and ffmpeg video filter for a format preset."""
    width: int
    height: int
    vf_filter: str


PRESETS: dict[FormatPreset, PresetSpec] = {
    FormatPreset.vertical: PresetSpec(
        width=1080,
        height=1920,
        vf_filter="scale=1080:-2,pad=1080:1920:(ow-iw)/2:(oh-ih)/2:black",
    ),
    FormatPreset.square: PresetSpec(
        width=1080,
        height=1080,
        vf_filter="crop=min(iw\\,ih):min(iw\\,ih),scale=1080:1080",
    ),
    FormatPreset.horizontal: PresetSpec(
        width=1920,
        height=1080,
        vf_filter="scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2:black",
    ),
}


def resolve_video_path(video_source_root: str, file_path: str) -> Path:
    """Join root + relative path and validate the file exists.

    Args:
        video_source_root: Base directory for video files (e.g. /videos).
        file_path: Relative path stored in SourceVideo.file_path.

    Returns:
        Resolved absolute Path.

    Raises:
        FileNotFoundError: If the resolved path doesn't exist or isn't a file.
    """
    resolved = Path(video_source_root) / file_path
    if not resolved.is_file():
        raise FileNotFoundError(
            f"Video file not found: {resolved} "
            f"(root={video_source_root!r}, relative={file_path!r})"
        )
    return resolved


def extract_clip(
    input_path: Path | str,
    output_path: Path | str,
    start_secs: float,
    end_secs: float,
    vf_filter: str,
) -> None:
    """Extract a clip from a video file using ffmpeg.

    Seeks to *start_secs*, encodes until *end_secs*, and applies *vf_filter*.
    Uses ``-c:v libx264 -preset fast -crf 23`` for reasonable quality/speed.

    Args:
        input_path: Source video file.
        output_path: Destination mp4 file (parent dir must exist).
        start_secs: Start time in seconds.
        end_secs: End time in seconds.
        vf_filter: ffmpeg ``-vf`` filter string.

    Raises:
        subprocess.CalledProcessError: If ffmpeg exits non-zero.
        subprocess.TimeoutExpired: If ffmpeg exceeds the timeout.
        ValueError: If start >= end.
    """
    duration = end_secs - start_secs
    if duration <= 0:
        raise ValueError(
            f"Invalid clip range: start={start_secs}s end={end_secs}s "
            f"(duration={duration}s)"
        )

    cmd = [
        "ffmpeg",
        "-y",  # overwrite output
        "-ss", str(start_secs),  # seek before input (fast)
        "-i", str(input_path),
        "-t", str(duration),
        "-vf", vf_filter,
        "-c:v", "libx264",
        "-preset", "fast",
        "-crf", "23",
        "-c:a", "aac",
        "-b:a", "128k",
        "-movflags", "+faststart",  # web-friendly mp4
        str(output_path),
    ]

    logger.info(
        "ffmpeg: extracting %.1fs clip from %s → %s",
        duration, input_path, output_path,
    )

    result = subprocess.run(
        cmd,
        capture_output=True,
        timeout=FFMPEG_TIMEOUT_SECS,
    )

    if result.returncode != 0:
        stderr_text = result.stderr.decode("utf-8", errors="replace")[-2000:]
        logger.error("ffmpeg failed (rc=%d): %s", result.returncode, stderr_text)
        raise subprocess.CalledProcessError(
            result.returncode, cmd, output=result.stdout, stderr=result.stderr,
        )
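To make the flag layout in extract_clip concrete, this is the argv that pattern yields for a 12.5 s vertical clip (illustrative only: the paths and timestamps are made up, and nothing is executed here):

```python
start_secs, end_secs = 30.0, 42.5
duration = end_secs - start_secs

# Same flag order as extract_clip above; input/output paths are hypothetical.
cmd = [
    "ffmpeg", "-y",
    "-ss", str(start_secs),  # seek before -i: fast keyframe-level seek
    "-i", "/videos/ep01.mp4",
    "-t", str(duration),
    "-vf", "scale=1080:-2,pad=1080:1920:(ow-iw)/2:(oh-ih)/2:black",
    "-c:v", "libx264", "-preset", "fast", "-crf", "23",
    "-c:a", "aac", "-b:a", "128k",
    "-movflags", "+faststart",
    "/tmp/short_vertical.mp4",
]
print(" ".join(cmd))
```

Placing `-ss` before `-i` makes ffmpeg seek on the input demuxer rather than decoding up to the start point, which is why the comment in the module calls it fast.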
backend/pipeline/stages.py
@@ -2860,3 +2860,192 @@ def extract_personality_profile(self, creator_id: str) -> str:
        raise self.retry(exc=exc)
    finally:
        session.close()

# ── Stage: Shorts Generation ─────────────────────────────────────────────────

@celery_app.task(bind=True, max_retries=1, default_retry_delay=60)
def stage_generate_shorts(self, highlight_candidate_id: str) -> str:
    """Generate video shorts for an approved highlight candidate.

    Creates one GeneratedShort row per FormatPreset, extracts the clip via
    ffmpeg, uploads to MinIO, and updates status. Each preset is independent —
    a failure on one does not block the others.

    Returns the highlight_candidate_id on completion.
    """
    from pipeline.shorts_generator import PRESETS, extract_clip, resolve_video_path
    from models import FormatPreset, GeneratedShort, ShortStatus

    start = time.monotonic()
    session = _get_sync_session()
    settings = get_settings()

    try:
        # ── Load highlight with joined relations ────────────────────────
        highlight = session.execute(
            select(HighlightCandidate)
            .where(HighlightCandidate.id == highlight_candidate_id)
        ).scalar_one_or_none()

        if highlight is None:
            logger.error(
                "Highlight candidate not found: %s", highlight_candidate_id,
            )
            return highlight_candidate_id

        if highlight.status.value != "approved":
            logger.warning(
                "Highlight %s status is %s, expected approved — skipping",
                highlight_candidate_id, highlight.status.value,
            )
            return highlight_candidate_id

        # Check for already-processing shorts (reject duplicate runs)
        existing_processing = session.execute(
            select(func.count())
            .where(GeneratedShort.highlight_candidate_id == highlight_candidate_id)
            .where(GeneratedShort.status == ShortStatus.processing)
        ).scalar()
        if existing_processing and existing_processing > 0:
            logger.warning(
                "Highlight %s already has %d processing shorts — rejecting duplicate",
                highlight_candidate_id, existing_processing,
            )
            return highlight_candidate_id

        # Eager-load relations
        key_moment = highlight.key_moment
        source_video = highlight.source_video

        # ── Resolve video file path ─────────────────────────────────────
        try:
            video_path = resolve_video_path(
                settings.video_source_path, source_video.file_path,
            )
        except FileNotFoundError as fnf:
            logger.error(
                "Video file missing for highlight %s: %s",
                highlight_candidate_id, fnf,
            )
            # Mark all presets as failed
            for preset in FormatPreset:
                spec = PRESETS[preset]
                short = GeneratedShort(
                    highlight_candidate_id=highlight_candidate_id,
                    format_preset=preset,
                    width=spec.width,
                    height=spec.height,
                    status=ShortStatus.failed,
                    error_message=str(fnf),
                )
                session.add(short)
            session.commit()
            return highlight_candidate_id

        # ── Compute effective start/end (trim overrides) ────────────────
        clip_start = highlight.trim_start if highlight.trim_start is not None else key_moment.start_time
        clip_end = highlight.trim_end if highlight.trim_end is not None else key_moment.end_time

        logger.info(
            "Generating shorts for highlight=%s video=%s [%.1f–%.1f]s",
            highlight_candidate_id, source_video.file_path,
            clip_start, clip_end,
        )

        # ── Process each preset independently ───────────────────────────
        for preset in FormatPreset:
            spec = PRESETS[preset]
            preset_start = time.monotonic()

            # Create DB row (status=processing)
            short = GeneratedShort(
                highlight_candidate_id=highlight_candidate_id,
                format_preset=preset,
                width=spec.width,
                height=spec.height,
                status=ShortStatus.processing,
                duration_secs=clip_end - clip_start,
            )
            session.add(short)
            session.commit()
            session.refresh(short)

            tmp_path = Path(f"/tmp/short_{short.id}_{preset.value}.mp4")
            minio_key = f"shorts/{highlight_candidate_id}/{preset.value}.mp4"

            try:
                # Extract clip
                extract_clip(
                    input_path=video_path,
                    output_path=tmp_path,
                    start_secs=clip_start,
                    end_secs=clip_end,
                    vf_filter=spec.vf_filter,
                )

                # Upload to MinIO
                file_size = tmp_path.stat().st_size
                with open(tmp_path, "rb") as f:
                    from minio_client import upload_file
                    upload_file(
                        object_key=minio_key,
                        data=f,
                        length=file_size,
                        content_type="video/mp4",
                    )

                # Update DB row — complete
                short.status = ShortStatus.complete
                short.file_size_bytes = file_size
                short.minio_object_key = minio_key
                session.commit()

                elapsed_preset = time.monotonic() - preset_start
                logger.info(
                    "Short generated: highlight=%s preset=%s "
                    "size=%d bytes duration=%.1fs elapsed=%.1fs",
                    highlight_candidate_id, preset.value,
                    file_size, clip_end - clip_start, elapsed_preset,
                )

            except Exception as exc:
                session.rollback()
                # Re-fetch the short row after rollback
                session.refresh(short)
                short.status = ShortStatus.failed
                short.error_message = str(exc)[:2000]
                session.commit()

                elapsed_preset = time.monotonic() - preset_start
                logger.error(
                    "Short failed: highlight=%s preset=%s "
                    "error=%s elapsed=%.1fs",
                    highlight_candidate_id, preset.value,
                    str(exc)[:500], elapsed_preset,
                )

            finally:
                # Clean up temp file
                if tmp_path.exists():
                    try:
                        tmp_path.unlink()
                    except OSError:
                        pass

        elapsed = time.monotonic() - start
        logger.info(
            "Shorts generation complete for highlight=%s in %.1fs",
            highlight_candidate_id, elapsed,
        )
        return highlight_candidate_id

    except Exception as exc:
        session.rollback()
        logger.error(
            "Shorts generation failed for highlight=%s: %s",
            highlight_candidate_id, exc,
        )
        raise self.retry(exc=exc)
    finally:
        session.close()