GET /admin/pipeline/embed-status/{video_id} returns the technique pages
linked to the video, the Qdrant vector count, and the last stage 6 event,
providing the backing data for the currently non-functional Embed tab in the admin UI.
- Added avatar_url, avatar_source, avatar_fetched_at columns to Creator
model with Alembic migration 014
- New backend/services/avatar.py: TheAudioDB lookup with token-based
name similarity scoring and a genre overlap bonus (scoring sketch after this list)
- New Celery task fetch_creator_avatar for background avatar fetching
- Admin endpoints: POST /creators/{id}/fetch-avatar (single) and
POST /creators/fetch-all-avatars (batch for missing avatars)
- Wired avatar_url into CreatorRead, CreatorInfo, and CreatorBrowseItem
schemas so all API responses include avatar data
- Rewrote the stale-pages endpoint to use a single query with a row_number
window function instead of per-page queries for the latest version and creator
(query sketch after this list)
- Added optional offset/limit/status/creator_id params to videos endpoint
(backward compatible — defaults return all results)
- Added 1-hour Redis cache to _find_dynamic_related technique scoring
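
A rough sketch of the name-scoring idea in avatar.py (the overlap formula and
bonus value here are assumptions, not the tuned logic):

    def name_similarity(candidate_name: str, creator_name: str, shared_genres: int = 0) -> float:
        # Token-based overlap: what fraction of the creator's name tokens
        # appear in the TheAudioDB candidate name.
        candidate_tokens = set(candidate_name.lower().split())
        creator_tokens = set(creator_name.lower().split())
        if not creator_tokens:
            return 0.0
        overlap = len(candidate_tokens & creator_tokens) / len(creator_tokens)
        # Small bonus when the candidate shares genres with the creator.
        return overlap + (0.1 if shared_genres > 0 else 0.0)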
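
And a sketch of the single-query stale-pages shape, assuming SQLAlchemy 2.x and
illustrative model names (Page, PageVersion stand in for the real ORM models):

    from sqlalchemy import func, select

    # Rank each page's versions newest-first in one pass, then keep rank 1,
    # instead of issuing a separate latest-version query per page.
    rn = func.row_number().over(
        partition_by=PageVersion.page_id,
        order_by=PageVersion.created_at.desc(),
    ).label("rn")
    versions = select(PageVersion.id, PageVersion.page_id, rn).subquery()
    stmt = (
        select(Page)
        .join(versions, versions.c.page_id == Page.id)
        .where(versions.c.rn == 1)
    )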
Stage 5 parses LLM output into list[BodySection] (Pydantic models) but
SQLAlchemy's JSONB column needs plain dicts. Added _serialize_body_sections()
helper that calls .model_dump() on each BodySection before DB write.
Fixes 'Object of type BodySection is not JSON serializable' errors.
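
A minimal sketch of the helper, assuming BodySection is a standard Pydantic v2
model (field names are illustrative):

    from pydantic import BaseModel

    class BodySection(BaseModel):
        # Illustrative fields; the real model lives with the stage 5 schemas.
        heading: str
        content: str

    def _serialize_body_sections(sections: list[BodySection]) -> list[dict]:
        # JSONB columns need plain dicts, so dump each Pydantic model first.
        return [section.model_dump() for section in sections]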
Deletes all technique pages, versions, links, key moments, pipeline
events/runs, Qdrant vectors, and Redis cache while preserving creators,
videos, and transcript segments. Resets every video's status to not_started.
Double-confirm dialog in the UI prevents accidental use.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Drops prompt iteration cycles from 20-30 min to under 5 min by enabling
stage-isolated re-runs and offline prompt testing against exported fixtures.
Phase 1: Offline prompt test harness
- export_fixture.py: export stage 5 inputs from DB to reusable JSON fixtures
- test_harness.py: run synthesis offline with any prompt, no Docker needed
- promote subcommand: deploy winning prompts with backup and optional git commit
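
As a rough illustration of the fixture idea (field names and paths are
hypothetical; the real CLI and schema live in export_fixture.py):

    import json
    from pathlib import Path

    # A fixture captures everything stage 5 needs for one video, so synthesis
    # can be replayed offline against any candidate prompt without Docker.
    fixture = {
        "video_id": "abc123",                             # placeholder ID
        "classification": {"topic": "example"},           # stage 4 output (placeholder)
        "segments": ["first segment", "second segment"],  # transcript inputs (placeholder)
    }
    Path("fixtures").mkdir(exist_ok=True)
    Path("fixtures/abc123.json").write_text(json.dumps(fixture, indent=2))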
Phase 2: Classification data persistence
- Dual-write classification to PostgreSQL + Redis (fixes 24hr TTL data loss)
- Clean retrigger now clears Redis cache keys (fixes stale data bug)
- Alembic migration 011: classification_data JSONB column + stage_rerun enum
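
A minimal sketch of the dual-write, assuming a SQLAlchemy session, a redis-py
client, and the classification_data column from migration 011 (the cache key
format is an assumption):

    import json

    def persist_classification(session, redis_client, video, data: dict) -> None:
        # Durable copy: the JSONB column survives the 24h Redis TTL.
        video.classification_data = data
        session.commit()
        # Fast copy: keep the existing Redis read path working.
        redis_client.setex(f"classification:{video.id}", 24 * 3600, json.dumps(data))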
Phase 3: Stage-isolated re-run
- run_single_stage Celery task with prerequisite validation and prompt overrides
- _load_prompt supports per-video Redis overrides for testing custom prompts
- POST /admin/pipeline/rerun-stage/{video_id}/{stage_name} endpoint
- Frontend: Re-run Stage modal with stage selector and prompt override textarea
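
A sketch of the override lookup in _load_prompt, assuming prompts live as files
on disk and overrides sit under an illustrative Redis key:

    from pathlib import Path

    def _load_prompt(redis_client, stage_name: str, video_id: str | None = None) -> str:
        # Per-video override set by the re-run endpoint, used to test a custom
        # prompt without touching the deployed prompt file.
        if video_id is not None:
            override = redis_client.get(f"prompt_override:{video_id}:{stage_name}")
            if override:
                return override.decode() if isinstance(override, bytes) else override
        return Path(f"prompts/{stage_name}.txt").read_text()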
Phase 4: Chunking inspector
- GET /admin/pipeline/chunking/{video_id} returns topic boundaries,
classifications, and synthesis group breakdowns
- Frontend: collapsible Chunking Inspector panel per video
Phase 5: Prompt deployment & stale data cleanup
- GET /admin/pipeline/stale-pages detects pages from older prompts
- POST /admin/pipeline/bulk-resynthesize re-runs a stage on all completed videos
- Frontend: stale pages indicator badge with one-click bulk re-synth
Phase 6: Automated iteration foundation
- Quality CLI --video-id flag auto-exports fixture from DB
- POST /admin/pipeline/optimize-prompt/{stage} dispatches optimization as Celery task
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each video now completes all stages (2→6) before the worker picks up the
next queued video. Previously, dispatching celery_chain for multiple videos
caused interleaved execution — nothing finished until everything went through
all stages. Now run_pipeline calls each stage function synchronously within
the same worker task, so videos complete linearly and efficiently.
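
Schematically (the stage function names are placeholders; only run_pipeline is
from the change itself):

    from celery import shared_task

    @shared_task
    def run_pipeline(video_id: str) -> None:
        # One worker task walks stages 2-6 in order, so each video finishes
        # completely before the worker picks up the next queued video.
        run_stage_2(video_id)
        run_stage_3(video_id)
        run_stage_4(video_id)
        run_stage_5(video_id)
        run_stage_6(video_id)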
Audit findings & fixes:
- temperature was never set (API defaulted to 1.0) → now explicit 0.0 for deterministic JSON
- llm_max_tokens=65536 exceeded hard_limit=32768 → aligned to 32768
- Output ratio estimates were 5-30x too high (based on actual pipeline data):
stage2: 0.6→0.05, stage3: 2.0→0.3, stage4: 0.5→0.3, stage5: 2.5→0.8
- request_params now structured as api_params (what's sent to the LLM) vs pipeline_config
(internal estimator settings), so 'hard_limit' no longer appears ambiguously in request
params (example after this list)
- temperature=0.0 sent on both primary and fallback endpoints
- _make_llm_callback now accepts request_params dict
- All 6 LLM call sites pass max_tokens, model_override, modality, response_model, hard_limit
- request_params stored in payload JSONB on every llm_call event (always, not just debug mode)
- Frontend JSON export includes full payload + request_params at top level
- DebugPayloadViewer shows 'Request Params' section even with debug mode off
- Answers whether max_tokens is actually being sent on pipeline requests
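
For illustration, the shape of the split mentioned above (keys beyond those
named in this list are placeholders):

    request_params = {
        "api_params": {            # sent with the LLM request
            "temperature": 0.0,    # now explicit for deterministic JSON
            "max_tokens": 32768,   # aligned with the hard limit
            "model": "primary",    # placeholder for the configured model
        },
        "pipeline_config": {       # internal estimator settings, never sent
            "hard_limit": 32768,
            "output_ratio": 0.8,   # e.g. the stage 5 estimate
        },
    }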
- Search now runs semantic + keyword in parallel, merges and deduplicates
- Keyword results are always included, with match_context explaining WHY each matched
- Semantic results filtered by minimum score threshold (0.45)
- match_context shows 'Creator: X', 'Tag: Y', 'Title match', 'Content: ...'
- Qdrant points use deterministic uuid5 IDs, so reindexing no longer creates duplicates (sketch after this list)
- Embedding timeout raised from 300ms to 2s (Ollama needs it)
- _enrich_qdrant_results reads creator_name from payload before DB fallback
- Frontend displays match_context as highlighted bar on search result cards
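
The deterministic ID trick, for reference (the namespace and key format are
assumptions):

    import uuid

    def qdrant_point_id(page_id: str) -> str:
        # Same input always yields the same UUID, so re-indexing a page
        # overwrites its existing Qdrant point instead of adding a duplicate.
        return str(uuid.uuid5(uuid.NAMESPACE_URL, f"technique-page:{page_id}"))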