When the LLM splits a category group into multiple technique pages,
moments were blanket-linked to the last page in the loop, leaving all
other pages as orphans with 0 key moments (48 out of 204 pages affected).
Added moment_indices field to SynthesizedPage schema and synthesis prompt
so the LLM explicitly declares which input moments each page covers.
Stage 5 now uses these indices for targeted linking instead of the broken
blanket approach. Tags are also computed per-page from linked moments
only, fixing cross-contamination (e.g. "stereo imaging" tag appearing
on gain staging pages).
Deleted 48 orphan technique pages from the database.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three resilience improvements to the pipeline:
1. LLMResponse(str) subclass carries finish_reason metadata from the LLM.
_safe_parse_llm_response detects truncation (finish=length) and raises
LLMTruncationError instead of wastefully retrying with a JSON nudge
that makes the prompt even longer.
2. Stage 4 classification now batches moments (20 per call) instead of
sending all moments in a single LLM call. Prevents context window
overflow for videos with many moments. Batch results are merged with
reindexed moment_index values.
3. run_pipeline auto-resumes from the last completed stage on error/retry
instead of always restarting from stage 2. Queries pipeline_events for
the most recent run to find completed stages. clean_reprocess trigger
still forces a full restart.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Stage 4 classification was truncating (finish=length) because the 0.15x
output ratio underestimated token needs. Inflated all stage ratios,
bumped the buffer from 20% to 50%, raised the floor from 2048 to 4096,
and fixed _safe_parse_llm_response to forward max_tokens on retry
instead of falling back to the 65k default.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Stage 3 (extraction) LLM calls now show the topic group label (e.g.,
'Sound Design Basics') and Stage 5 (synthesis) calls show the category
name. Displayed as a cyan italic label in the event row between the
event type badge and model name. Helps admins understand why there are
multiple LLM calls per stage.
Data model:
- New pipeline_runs table (id, video_id, run_number, trigger, status,
started_at, finished_at, error_stage, total_tokens)
- pipeline_events gains run_id FK (nullable for backward compat)
- Alembic migration 010_add_pipeline_runs
Backend:
- run_pipeline() creates a PipelineRun, threads run_id through all stages
- _emit_event() and _make_llm_callback() accept and store run_id
- Stage 6 (final) calls _finish_run() to mark complete with token totals
- mark_pipeline_error marks run as error
- Revoke marks running runs as cancelled
- Trigger endpoints pass trigger type (manual, clean_reprocess)
- New GET /admin/pipeline/runs/{video_id} — lists runs with event counts
- GET /admin/pipeline/events supports ?run_id= filter
Frontend:
- Expanded video detail now shows RunList instead of flat EventLog
- Each run is a collapsible card showing: run number, trigger type,
status badge, timestamps, token count, event count
- Latest run auto-expands, older runs collapsed
- Legacy events (pre-run-tracking) shown as separate collapsible section
- Run cards color-coded: cyan border for running, red for error,
gray for cancelled
- EventLog accepts optional runId prop to scope events to a single run
- Backend: POST /admin/pipeline/clean-retrigger/{video_id} endpoint that
deletes pipeline_events, key_moments, transcript_segments, and Qdrant
vectors before retriggering the pipeline
- Backend: QdrantManager.delete_by_video_id() for vector cleanup
- Frontend: Creator filter dropdown on pipeline admin page
- Frontend: Checkbox selection column with select-all
- Frontend: Bulk toolbar with Retrigger Selected and Clean Reprocess
actions, sequential dispatch with progress bar, cancel support
- Bulk dispatch uses 500ms delay between requests to avoid slamming API
773 key moments sat at 'pending' with 0 approved/edited/rejected.
review_status was never checked by any public-facing query — all content
was always visible regardless of review state.
Removed:
- backend/routers/review.py (10 endpoints)
- backend/tests/test_review.py
- frontend ReviewQueue, MomentDetail pages
- frontend client.ts (review-only API client)
- frontend ModeToggle, StatusBadge components
- Review link from AdminDropdown, Moments link from pipeline rows
- ReviewStatus, PageReviewStatus enums from models
- review_mode config flag
- review_status columns (migration 007)
- ~80 lines of mode-toggle CSS
Pipeline now always sets processing_status to 'published'.
Migration 007 drops columns, enums, and migrates 'reviewed' → 'published'.
Auto-mode commit 7aa33cd accidentally deleted 78 files (14,814 lines) during M005
execution. Subsequent commits rebuilt some frontend files but backend/, alembic/,
tests/, whisper/, docker configs, and prompts were never restored in this repo.
This commit restores the full project tree by syncing from ub01's working directory,
which has all M001-M007 features running in production containers.
Restored: backend/ (config, models, routers, database, redis, search_service, worker),
alembic/ (6 migrations), docker/ (Dockerfiles, nginx, compose), prompts/ (4 stages),
tests/, whisper/, README.md, .env.example, chrysopedia-spec.md