Commit graph

3 commits

Author SHA1 Message Date
jlightner
c6f69019cf feat: Content hash dedup and prior-page versioning
- Add content_hash (SHA-256 of transcript text) to source_videos (migration 005)
- 3-tier duplicate detection at ingest: exact filename, content hash,
  then normalized filename + duration (handles yt-dlp re-downloads)
- Snapshot prior technique_page_ids to Redis before pipeline dispatch
- Stage 5 matches prior pages by creator+category before slug fallback,
  enabling version snapshots on reprocessing even when LLM generates
  different slugs
- Expose content_hash in API responses and admin pipeline dashboard

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 05:55:27 -05:00
jlightner
910e945d9c feat: Wired automatic run_pipeline.delay() dispatch after ingest commit…
- "backend/routers/pipeline.py"
- "backend/routers/ingest.py"
- "backend/main.py"

GSD-Task: S03/T04
2026-03-29 22:41:02 +00:00
jlightner
5bfeb50716 feat: Created POST /api/v1/ingest endpoint that accepts Whisper transcr…
- "backend/routers/ingest.py"
- "backend/schemas.py"
- "backend/requirements.txt"
- "backend/main.py"

GSD-Task: S02/T01
2026-03-29 22:09:46 +00:00