When the LLM split a category group into multiple technique pages,
all moments were blanket-linked to the last page in the loop, leaving the
other pages as orphans with 0 key moments (48 of 204 pages affected).
Added moment_indices field to SynthesizedPage schema and synthesis prompt
so the LLM explicitly declares which input moments each page covers.
Stage 5 now uses these indices for targeted linking instead of the broken
blanket approach. Tags are also computed per-page from linked moments
only, fixing cross-contamination (e.g. "stereo imaging" tag appearing
on gain staging pages).
Deleted 48 orphan technique pages from the database.
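
The targeted linking described above can be sketched roughly as follows.
This is a minimal illustration, not the actual Stage 5 code: only the names
SynthesizedPage and moment_indices come from this change, while the Moment
class, the link_moments helper, and the sample data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Moment:
    # Hypothetical stand-in for a key moment row.
    id: int
    tags: list[str]


@dataclass
class SynthesizedPage:
    title: str
    # Indices into the list of input moments, as declared by the LLM.
    moment_indices: list[int] = field(default_factory=list)


def link_moments(pages, moments):
    """Map each page title to (linked moment ids, tags from those moments only)."""
    result = {}
    for page in pages:
        # Assumption: out-of-range indices are skipped rather than raising.
        linked = [moments[i] for i in page.moment_indices if 0 <= i < len(moments)]
        tags = sorted({tag for m in linked for tag in m.tags})
        result[page.title] = ([m.id for m in linked], tags)
    return result


moments = [
    Moment(101, ["gain staging"]),
    Moment(102, ["stereo imaging"]),
    Moment(103, ["gain staging", "metering"]),
]
pages = [
    SynthesizedPage("Gain Staging", [0, 2]),
    SynthesizedPage("Stereo Width", [1]),
]
links = link_moments(pages, moments)
```

Because tags are derived from each page's own linked moments, the
"stereo imaging" tag in this example stays off the gain staging page.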
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
In-progress stages now show:
- Live elapsed time (ticks every second) next to the active stage dot
- Run-level token count so far
Performance: wrapped StageTimeline, StatusFilter, WorkerStatus, and
RecentActivityFeed with React.memo. Memoized filteredVideos with useMemo.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three resilience improvements to the pipeline:
1. LLMResponse(str) subclass carries finish_reason metadata from the LLM.
   _safe_parse_llm_response detects truncation (finish=length) and raises
   LLMTruncationError instead of wastefully retrying with a JSON nudge
   that makes the prompt even longer.
2. Stage 4 classification now batches moments (20 per call) instead of
   sending all moments in a single LLM call. Prevents context window
   overflow for videos with many moments. Batch results are merged with
   reindexed moment_index values.
3. run_pipeline auto-resumes from the last completed stage on error/retry
   instead of always restarting from stage 2. Queries pipeline_events for
   the most recent run to find completed stages. clean_reprocess trigger
   still forces a full restart.
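
A rough sketch of items 1 and 2, for illustration only. LLMResponse and
LLMTruncationError are the names used in this change; check_truncation,
classify_in_batches, and the shape of the classify callback are assumptions
and do not reflect the real function signatures.

```python
class LLMTruncationError(Exception):
    """Raised when the model stopped because it hit the output token limit."""


class LLMResponse(str):
    """A str that also carries the provider's finish_reason."""

    def __new__(cls, text, finish_reason=None):
        obj = super().__new__(cls, text)
        obj.finish_reason = finish_reason
        return obj


def check_truncation(response):
    # Plain strings have no finish_reason, so they pass through unchanged.
    if getattr(response, "finish_reason", None) == "length":
        raise LLMTruncationError("response truncated (finish_reason=length)")
    return response


def classify_in_batches(moments, classify, batch_size=20):
    """Call classify() once per batch and merge the results.

    Assumption: classify(batch) returns dicts whose moment_index is
    relative to the batch, so each index is shifted by the batch offset.
    """
    merged = []
    for start in range(0, len(moments), batch_size):
        batch = moments[start:start + batch_size]
        for item in classify(batch):
            merged.append({**item, "moment_index": item["moment_index"] + start})
    return merged
```

Since LLMResponse subclasses str, existing callers that treat the response
as plain text keep working, and only code that cares about truncation needs
to look at finish_reason.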
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Stage 4 classification was truncating (finish=length) because the 0.15x
output ratio underestimated token needs. Raised the output ratios for all
stages, bumped the buffer from 20% to 50%, raised the floor from 2048 to
4096, and fixed _safe_parse_llm_response to forward max_tokens on retry
instead of falling back to the 65k default.
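
As a rough illustration of how a ratio, buffer, and floor could combine into
an output budget: the function name, the formula, and the 65536 ceiling are
assumptions made for this sketch, and the new per-stage ratios are not
stated here, so the ratio is left as a parameter.

```python
def estimate_max_tokens(input_tokens, output_ratio, buffer=0.5, floor=4096, ceiling=65536):
    """Estimate an output-token budget from the input size.

    Assumed formula: input_tokens * output_ratio, padded by the buffer,
    then clamped between the floor and an assumed 65k ceiling.
    """
    estimate = int(input_tokens * output_ratio * (1 + buffer))
    return max(floor, min(estimate, ceiling))


# Small inputs fall back to the floor; large ones are capped at the ceiling.
small = estimate_max_tokens(1000, 0.5)
medium = estimate_max_tokens(40000, 0.5)
large = estimate_max_tokens(100000, 1.0)
```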
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace default browser checkboxes with custom styled versions that blend
with the dark UI — transparent background, muted border, cyan accent on check.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Filters the video list by filename or creator name as you type. Works
alongside the existing status and creator dropdown filters. Includes
a clear button when text is entered.
Stage 3 (extraction) LLM calls now show the topic group label (e.g.,
'Sound Design Basics') and Stage 5 (synthesis) calls show the category
name. Displayed as a cyan italic label in the event row between the
event type badge and model name. Helps admins understand why there are
multiple LLM calls per stage.
Data model:
- New pipeline_runs table (id, video_id, run_number, trigger, status,
  started_at, finished_at, error_stage, total_tokens)
- pipeline_events gains run_id FK (nullable for backward compat)
- Alembic migration 010_add_pipeline_runs
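
A self-contained sketch of the shape of this data model, using the standard
library sqlite3 module in place of the real database and Alembic migration.
Column types, the stage column, and the sample rows are assumptions; only
the table and column names listed above come from this change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pipeline_runs (
    id INTEGER PRIMARY KEY,
    video_id TEXT NOT NULL,
    run_number INTEGER NOT NULL,
    "trigger" TEXT NOT NULL,        -- quoted because TRIGGER is an SQL keyword
    status TEXT NOT NULL,
    started_at TEXT,
    finished_at TEXT,
    error_stage TEXT,
    total_tokens INTEGER DEFAULT 0
);
CREATE TABLE pipeline_events (
    id INTEGER PRIMARY KEY,
    video_id TEXT NOT NULL,
    stage TEXT NOT NULL,            -- hypothetical column for the example
    run_id INTEGER REFERENCES pipeline_runs(id)  -- nullable: legacy events have no run
);
""")

conn.execute(
    'INSERT INTO pipeline_runs (id, video_id, run_number, "trigger", status) '
    "VALUES (?, ?, ?, ?, ?)",
    (1, "v1", 1, "manual", "running"),
)
conn.executemany(
    "INSERT INTO pipeline_events (video_id, stage, run_id) VALUES (?, ?, ?)",
    [("v1", "stage2", None), ("v1", "stage2", 1), ("v1", "stage3", 1)],
)

# Runs for one video with their event counts, newest first.
runs = conn.execute(
    """
    SELECT r.run_number, COUNT(e.id)
    FROM pipeline_runs r
    LEFT JOIN pipeline_events e ON e.run_id = r.id
    WHERE r.video_id = ?
    GROUP BY r.id
    ORDER BY r.run_number DESC
    """,
    ("v1",),
).fetchall()

# Events recorded before run tracking existed keep run_id NULL.
legacy_count = conn.execute(
    "SELECT COUNT(*) FROM pipeline_events WHERE video_id = ? AND run_id IS NULL",
    ("v1",),
).fetchone()[0]
```

Keeping run_id nullable lets older events remain queryable as a separate
legacy group without backfilling them into a synthetic run.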
Backend:
- run_pipeline() creates a PipelineRun, threads run_id through all stages
- _emit_event() and _make_llm_callback() accept and store run_id
- Stage 6 (final) calls _finish_run() to mark complete with token totals
- mark_pipeline_error marks run as error
- Revoke marks running runs as cancelled
- Trigger endpoints pass trigger type (manual, clean_reprocess)
- New GET /admin/pipeline/runs/{video_id} — lists runs with event counts
- GET /admin/pipeline/events supports ?run_id= filter
Frontend:
- Expanded video detail now shows RunList instead of flat EventLog
- Each run is a collapsible card showing: run number, trigger type,
  status badge, timestamps, token count, event count
- Latest run auto-expands, older runs collapsed
- Legacy events (pre-run-tracking) shown as separate collapsible section
- Run cards color-coded: cyan border for running, red for error,
  gray for cancelled
- EventLog accepts optional runId prop to scope events to a single run
Deleting transcript_segments left the pipeline with nothing to process,
so all stages would skip immediately. Segments come from the ingest step,
not from pipeline stages 2-6. Clean reprocess now deletes only
pipeline_events and key_moments (pipeline output).
Previously the event log only loaded once when the row was expanded,
so mid-pipeline videos only showed start events. Now the EventLog
component accepts a status prop and polls every 10s when the video is
processing or queued, silently updating without showing a loading spinner.
- Backend: Video list now includes active_stage, active_stage_status, and
  stage_started_at fields via DISTINCT ON subquery
- Backend: New GET /admin/pipeline/recent-activity endpoint returns
  latest stage completions/errors with video context
- Frontend: 15-second auto-refresh with change detection; video rows
  flash when status changes
- Frontend: Stage timeline dots on processing/complete/error videos
  showing progress through stages 2-5, active stage pulses
- Frontend: Collapsible Recent Activity feed at top showing last 8
  stage completions/errors with duration and creator
- Frontend: Bulk operation scrollable log showing per-video results
  as they complete
- Frontend: Auto-refresh checkbox toggle in header
- Backend: POST /admin/pipeline/clean-retrigger/{video_id} endpoint that
  deletes pipeline_events, key_moments, transcript_segments, and Qdrant
  vectors before retriggering the pipeline
- Backend: QdrantManager.delete_by_video_id() for vector cleanup
- Frontend: Creator filter dropdown on pipeline admin page
- Frontend: Checkbox selection column with select-all
- Frontend: Bulk toolbar with Retrigger Selected and Clean Reprocess
  actions, sequential dispatch with progress bar, cancel support
- Bulk dispatch uses 500ms delay between requests to avoid overloading
  the API
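
The sequential bulk dispatch can be sketched as below. The real
implementation lives in the frontend; this version is written in Python
purely for illustration, and dispatch_sequentially, send, and is_cancelled
are hypothetical names.

```python
import time


def dispatch_sequentially(video_ids, send, delay_s=0.5,
                          is_cancelled=lambda: False, sleep=time.sleep):
    """Send one request per video in order, pausing between requests.

    Stops early when is_cancelled() returns True. Failures are recorded
    per video instead of aborting the rest of the batch (an assumption).
    """
    results = []
    for i, video_id in enumerate(video_ids):
        if is_cancelled():
            break
        try:
            send(video_id)
            results.append((video_id, "ok"))
        except Exception as exc:
            results.append((video_id, f"error: {exc}"))
        # No need to wait after the final request.
        if i < len(video_ids) - 1:
            sleep(delay_s)
    return results
```

Passing sleep as a parameter keeps the delay testable without actually
waiting 500ms per request.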