Commit graph

59 commits

Author SHA1 Message Date
jlightner
96491ac70a fix: LLM config, task time limits, base_url, fallback model, debug Redis caching
- Add BASE_URL setting to config.py, replace hardcoded localhost:8096 in notifications
- Fix LLM_FALLBACK_MODEL default from fyn-llm-agent-chat to qwen2.5:7b
- Fix docker-compose LLM_FALLBACK_MODEL to use env var with correct default
- Add BASE_URL env var to API and worker in docker-compose.yml
- Add soft_time_limit/time_limit to all pipeline stage tasks (prevents stuck workers)
- Cache Redis connection in _is_debug_mode() instead of creating per-call
- Remove duplicate test files in backend/tests/notifications/
2026-04-05 06:01:54 +00:00
jlightner
4854dad086 feat: Ran manual chat evaluation against live endpoint, documented qual…
- ".gsd/milestones/M025/slices/S09/S09-QUALITY-REPORT.md"
- "backend/pipeline/quality/results/chat_eval_baseline.json"

GSD-Task: S09/T03
2026-04-04 14:50:44 +00:00
jlightner
846db2aad5 test: Created chat-specific LLM-as-judge scorer (5 dimensions), SSE-par…
- "backend/pipeline/quality/chat_scorer.py"
- "backend/pipeline/quality/chat_eval.py"
- "backend/pipeline/quality/fixtures/chat_test_suite.yaml"
- "backend/pipeline/quality/__main__.py"

GSD-Task: S09/T01
2026-04-04 14:43:52 +00:00
jlightner
a60f4074dc chore: Add GET/PUT shorts-template admin endpoints, collapsible templat…
- "backend/routers/creators.py"
- "backend/schemas.py"
- "frontend/src/api/templates.ts"
- "frontend/src/pages/HighlightQueue.tsx"
- "frontend/src/pages/HighlightQueue.module.css"
- "backend/routers/shorts.py"
- "backend/pipeline/stages.py"
- "frontend/src/api/shorts.ts"

GSD-Task: S04/T03
2026-04-04 11:25:29 +00:00
jlightner
fa493e2640 feat: Built ffmpeg-based card renderer with concat demuxer pipeline and…
- "backend/pipeline/card_renderer.py"
- "backend/pipeline/shorts_generator.py"
- "backend/pipeline/stages.py"
- "backend/models.py"
- "alembic/versions/028_add_shorts_template.py"
- "backend/pipeline/test_card_renderer.py"

GSD-Task: S04/T02
2026-04-04 11:17:38 +00:00
jlightner
125983588d feat: Created ASS subtitle generator with karaoke word-by-word highligh…
- "backend/pipeline/caption_generator.py"
- "backend/pipeline/shorts_generator.py"
- "backend/pipeline/stages.py"
- "backend/models.py"
- "alembic/versions/027_add_captions_enabled.py"
- "backend/pipeline/test_caption_generator.py"

GSD-Task: S04/T01
2026-04-04 11:12:19 +00:00
jlightner
5f4b960dc1 feat: Added share_token column with migration 026, wired token generati…
- "backend/models.py"
- "alembic/versions/026_add_share_token.py"
- "backend/pipeline/stages.py"
- "backend/routers/shorts.py"
- "backend/routers/shorts_public.py"
- "backend/main.py"

GSD-Task: S01/T01
2026-04-04 10:33:00 +00:00
jlightner
0007528e77 feat: Added shorts_generator.py with 3 format presets and stage_generat…
- "backend/pipeline/shorts_generator.py"
- "backend/pipeline/stages.py"

GSD-Task: S03/T02
2026-04-04 09:47:40 +00:00
jlightner
2d9076ae92 feat: Added personality extraction pipeline: prompt template, 3-tier tr…
- "prompts/personality_extraction.txt"
- "backend/pipeline/stages.py"
- "backend/schemas.py"
- "backend/routers/admin.py"

GSD-Task: S06/T02
2026-04-04 08:28:18 +00:00
jlightner
fb6a4cc58a feat: Wired word-timing extraction into stage_highlight_detection — 62…
- "backend/pipeline/stages.py"
- ".gsd/KNOWLEDGE.md"

GSD-Task: S05/T02
2026-04-04 08:11:32 +00:00
jlightner
27c5f4866b test: Added 3 audio proxy scoring functions, extract_word_timings utili…
- "backend/pipeline/highlight_scorer.py"
- "backend/pipeline/highlight_schemas.py"
- "backend/pipeline/test_highlight_scorer.py"

GSD-Task: S05/T01
2026-04-04 08:05:22 +00:00
jlightner
6f12d5a240 feat: Wired stage_highlight_detection Celery task with bulk upsert, 4 a…
- "backend/pipeline/stages.py"
- "backend/routers/highlights.py"
- "backend/main.py"

GSD-Task: S04/T03
2026-04-04 05:36:10 +00:00
jlightner
2d7b812c6a test: Implemented pure-function scoring engine with 7 weighted dimensio…
- "backend/pipeline/highlight_scorer.py"
- "backend/pipeline/test_highlight_scorer.py"

GSD-Task: S04/T02
2026-04-04 05:33:04 +00:00
jlightner
289e707799 feat: Added HighlightCandidate ORM model, Alembic migration 019, and Py…
- "backend/models.py"
- "alembic/versions/019_add_highlight_candidates.py"
- "backend/pipeline/highlight_schemas.py"

GSD-Task: S04/T01
2026-04-04 05:30:36 +00:00
jlightner
17b43d9778 feat: Added LightRAG /query/data as primary search engine with file_sou…
- "backend/config.py"
- "backend/search_service.py"

GSD-Task: S01/T01
2026-04-04 04:44:24 +00:00
jlightner
906b6491fe fix: static 96k max_tokens for all pipeline stages — dynamic estimator was truncating thinking model output
The dynamic token estimator calculated max_tokens from input size × stage ratio,
which produced ~9k for stage 5 compose calls. Thinking models consume an
unpredictable share of that budget for internal reasoning, sometimes leaving
zero visible output tokens.

Changed: hard_limit 32768→96000, estimate_max_tokens now returns hard_limit directly.
2026-04-03 08:18:28 +00:00
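The change described above reduces to making the estimator a constant function. The signature below is an assumption based on the commit body (input size × stage ratio was the old formula); the values are the ones the commit states.

```python
HARD_LIMIT = 96_000  # raised from 32_768 per the commit

def estimate_max_tokens(input_tokens: int, stage_ratio: float) -> int:
    # Old behavior (sketch): int(input_tokens * stage_ratio), which could
    # yield ~9k for stage 5 compose calls and starve a thinking model's
    # visible output. New behavior: always return the hard limit.
    return HARD_LIMIT
```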
jlightner
a16559e668 fix: add /app to sys.path for Celery forked workers importing services.avatar 2026-04-03 05:58:14 +00:00
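The sys.path fix above is small enough to sketch in full. The `/app` path comes from the commit; the function name and guard are illustrative. Celery's prefork workers can fail to resolve absolute imports like `services.avatar` when the parent's import context isn't inherited, and prepending the app root fixes resolution inside the fork.

```python
import sys

APP_ROOT = "/app"  # container app directory, per the commit subject

def ensure_app_on_path() -> None:
    # Idempotent: safe to call from every forked worker without
    # accumulating duplicate sys.path entries.
    if APP_ROOT not in sys.path:
        sys.path.insert(0, APP_ROOT)
```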
jlightner
89ef2751fa feat: Added IntersectionObserver scroll-spy to ToC highlighting the act…
- "frontend/src/components/TableOfContents.tsx"
- "frontend/src/App.css"

GSD-Task: S04/T02
2026-04-03 05:54:14 +00:00
jlightner
6940f172a3 fix: Serialize BodySection Pydantic models to dicts before JSONB storage
Stage 5 parses LLM output into list[BodySection] (Pydantic models) but
SQLAlchemy's JSONB column needs plain dicts. Added _serialize_body_sections()
helper that calls .model_dump() on each BodySection before DB write.
Fixes 'Object of type BodySection is not JSON serializable' errors.
2026-04-03 03:38:32 +00:00
jlightner
57b8705e26 feat: Added per-section embedding to stage 6 for v2 technique pages wit…
- "backend/schemas.py"
- "backend/pipeline/stages.py"
- "backend/pipeline/qdrant_client.py"
- "backend/search_service.py"
- "backend/pipeline/test_section_embedding.py"

GSD-Task: S07/T01
2026-04-03 02:12:56 +00:00
jlightner
7070ef3f51 test: Added 12 unit tests covering compose prompt construction, branchi…
- "backend/pipeline/test_compose_pipeline.py"

GSD-Task: S04/T02
2026-04-03 01:33:16 +00:00
jlightner
d709c9edce feat: Added _build_compose_user_prompt(), _compose_into_existing(), and…
- "backend/pipeline/stages.py"

GSD-Task: S04/T01
2026-04-03 01:29:21 +00:00
jlightner
5cd7db8938 test: 16 unit tests covering compose prompt XML structure, citation off…
- "backend/pipeline/test_harness_compose.py"
- ".gsd/milestones/M014/slices/S02/tasks/T03-SUMMARY.md"

GSD-Task: S02/T03
2026-04-03 01:08:41 +00:00
jlightner
efe6d7197c test: Added compose subcommand with build_compose_prompt(), run_compose…
- "backend/pipeline/test_harness.py"

GSD-Task: S02/T02
2026-04-03 01:05:25 +00:00
jlightner
44197f550c test: Updated test_harness.py word-count/section-count logic for list[B…
- "backend/pipeline/test_harness.py"
- "backend/pipeline/test_harness_v2_format.py"

GSD-Task: S01/T03
2026-04-03 00:54:27 +00:00
jlightner
15dcab201a test: Added BodySection/BodySubSection schema models, changed Synthesiz…
- "backend/pipeline/schemas.py"
- "backend/pipeline/citation_utils.py"
- "backend/pipeline/test_citation_utils.py"

GSD-Task: S01/T01
2026-04-03 00:50:30 +00:00
jlightner
f0d0b8ac1a feat: add pipeline iteration tooling — offline test harness, stage re-runs, chunking inspector
Drops prompt iteration cycles from 20-30 min to under 5 min by enabling
stage-isolated re-runs and offline prompt testing against exported fixtures.

Phase 1: Offline prompt test harness
- export_fixture.py: export stage 5 inputs from DB to reusable JSON fixtures
- test_harness.py: run synthesis offline with any prompt, no Docker needed
- promote subcommand: deploy winning prompts with backup and optional git commit

Phase 2: Classification data persistence
- Dual-write classification to PostgreSQL + Redis (fixes 24hr TTL data loss)
- Clean retrigger now clears Redis cache keys (fixes stale data bug)
- Alembic migration 011: classification_data JSONB column + stage_rerun enum

Phase 3: Stage-isolated re-run
- run_single_stage Celery task with prerequisite validation and prompt overrides
- _load_prompt supports per-video Redis overrides for testing custom prompts
- POST /admin/pipeline/rerun-stage/{video_id}/{stage_name} endpoint
- Frontend: Re-run Stage modal with stage selector and prompt override textarea

Phase 4: Chunking inspector
- GET /admin/pipeline/chunking/{video_id} returns topic boundaries,
  classifications, and synthesis group breakdowns
- Frontend: collapsible Chunking Inspector panel per video

Phase 5: Prompt deployment & stale data cleanup
- GET /admin/pipeline/stale-pages detects pages from older prompts
- POST /admin/pipeline/bulk-resynthesize re-runs a stage on all completed videos
- Frontend: stale pages indicator badge with one-click bulk re-synth

Phase 6: Automated iteration foundation
- Quality CLI --video-id flag auto-exports fixture from DB
- POST /admin/pipeline/optimize-prompt/{stage} dispatches optimization as Celery task

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 15:47:46 +00:00
jlightner
29f6e74b4f pipeline: run stages inline instead of Celery chain dispatch
Each video now completes all stages (2→6) before the worker picks up the
next queued video. Previously, dispatching celery_chain for multiple videos
caused interleaved execution — nothing finished until everything went through
all stages. Now run_pipeline calls each stage function synchronously within
the same worker task, so videos complete linearly and efficiently.
2026-04-01 11:39:21 +00:00
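The inline-execution change above can be sketched as a plain loop. Stage names and the `run_pipeline` signature are assumptions; the point is that each stage is a blocking call inside one worker task, so a video finishes stages 2→6 before the worker dequeues the next video, instead of a Celery chain interleaving stages across videos.

```python
def run_pipeline(video_id: int, stages: list) -> list[str]:
    completed = []
    for stage in stages:
        stage(video_id)  # synchronous call within the same worker task
        completed.append(stage.__name__)
    return completed
```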
jlightner
d75ec80c98 optimize: Stage 5 synthesis prompt — round 0 winner (0.95→1.0 composite)
Applied first optimization result: tighter voice preservation instructions,
improved section flow guidance, trimmed redundant metadata instructions.
13382→11123 chars (-17%).
2026-04-01 10:15:24 +00:00
jlightner
18520f7936 feat: Generalized OptimizationLoop to stages 2-5 with per-stage fixture…
- "backend/pipeline/quality/optimizer.py"
- "backend/pipeline/quality/__main__.py"
- "backend/pipeline/quality/scorer.py"
- "backend/pipeline/quality/fixtures/sample_segments.json"
- "backend/pipeline/quality/fixtures/sample_topic_group.json"
- "backend/pipeline/quality/fixtures/sample_classifications.json"

GSD-Task: S04/T02
2026-04-01 09:24:42 +00:00
jlightner
e740798f7c feat: Added STAGE_CONFIGS registry (stages 2-5) with per-stage rubrics,…
- "backend/pipeline/quality/scorer.py"
- "backend/pipeline/quality/variant_generator.py"

GSD-Task: S04/T01
2026-04-01 09:20:24 +00:00
jlightner
84e85a52b3 perf: Added optimize CLI subcommand with leaderboard table, ASCII traje…
- "backend/pipeline/quality/__main__.py"
- "backend/pipeline/quality/results/.gitkeep"

GSD-Task: S03/T02
2026-04-01 09:10:42 +00:00
jlightner
c6cbb09dd3 feat: Created PromptVariantGenerator (LLM-powered prompt mutation) and…
- "backend/pipeline/quality/variant_generator.py"
- "backend/pipeline/quality/optimizer.py"

GSD-Task: S03/T01
2026-04-01 09:08:01 +00:00
jlightner
15a7afdaff feat: Added VoiceDial class with 3-band prompt modification and ScoreRu…
- "backend/pipeline/quality/voice_dial.py"
- "backend/pipeline/quality/scorer.py"
- "backend/pipeline/quality/__main__.py"

GSD-Task: S02/T02
2026-04-01 08:57:07 +00:00
jlightner
5223772756 feat: Built ScoreRunner with 5-dimension LLM-as-judge scoring rubric, C…
- "backend/pipeline/quality/scorer.py"
- "backend/pipeline/quality/__main__.py"
- "backend/pipeline/quality/fixtures/sample_moments.json"
- "backend/pipeline/quality/fixtures/__init__.py"

GSD-Task: S02/T01
2026-04-01 08:53:40 +00:00
jlightner
c27cd77ae6 test: Built pipeline.quality package with FitnessRunner (9 tests, 4 cat…
- "backend/pipeline/quality/__init__.py"
- "backend/pipeline/quality/__main__.py"
- "backend/pipeline/quality/fitness.py"

GSD-Task: S01/T01
2026-04-01 08:45:05 +00:00
jlightner
fd1fd6c6f9 fix: Pipeline LLM audit — temperature=0, realistic token ratios, structured request_params
Audit findings & fixes:
- temperature was never set (API defaulted to 1.0) → now explicit 0.0 for deterministic JSON
- llm_max_tokens=65536 exceeded hard_limit=32768 → aligned to 32768
- Output ratio estimates were 5-30x too high (based on actual pipeline data):
  stage2: 0.6→0.05, stage3: 2.0→0.3, stage4: 0.5→0.3, stage5: 2.5→0.8
- request_params now structured as api_params (what's sent to LLM) vs pipeline_config
  (internal estimator settings) — no more ambiguous 'hard_limit' in request params
- temperature=0.0 sent on both primary and fallback endpoints
2026-04-01 07:20:09 +00:00
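The api_params/pipeline_config split described above might look like this. The function name and exact keys are assumptions; the invariant from the commit is that only genuine API parameters (including the now-explicit `temperature=0.0`) go in the payload, while estimator internals like `hard_limit` stay out of it.

```python
def build_request_params(model: str, max_tokens: int, hard_limit: int) -> dict:
    return {
        "api_params": {
            # Everything here is actually sent to the LLM endpoint.
            "model": model,
            "max_tokens": max_tokens,
            "temperature": 0.0,  # explicit for deterministic JSON output
        },
        "pipeline_config": {
            # Internal estimator settings, never sent to the API.
            "hard_limit": hard_limit,
        },
    }
```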
jlightner
d58194ff96 feat: Store LLM request params (max_tokens, model, modality) in pipeline events
- _make_llm_callback now accepts request_params dict
- All 6 LLM call sites pass max_tokens, model_override, modality, response_model, hard_limit
- request_params stored in payload JSONB on every llm_call event (always, not just debug mode)
- Frontend JSON export includes full payload + request_params at top level
- DebugPayloadViewer shows 'Request Params' section even with debug mode off
- Answers whether max_tokens is actually being sent on pipeline requests
2026-04-01 07:01:57 +00:00
jlightner
a673e641b8 fix: Parallel search with match_context, deterministic Qdrant IDs, raised embedding timeout
- Search now runs semantic + keyword in parallel, merges and deduplicates
- Keyword results always included with match_context explaining WHY matched
- Semantic results filtered by minimum score threshold (0.45)
- match_context shows 'Creator: X', 'Tag: Y', 'Title match', 'Content: ...'
- Qdrant points use deterministic uuid5 IDs (no more duplicates on reindex)
- Embedding timeout raised from 300ms to 2s (Ollama needs it)
- _enrich_qdrant_results reads creator_name from payload before DB fallback
- Frontend displays match_context as highlighted bar on search result cards
2026-04-01 06:54:34 +00:00
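The deterministic-ID bullet above works because `uuid5` is a pure function of namespace and name: re-indexing the same segment produces the same point ID, so Qdrant upserts overwrite instead of duplicating. The namespace choice and key format below are assumptions for illustration.

```python
import uuid

def point_id(video_id: int, segment_index: int) -> str:
    # Same (video_id, segment_index) always yields the same UUID, so a
    # reindex upserts over the existing point rather than adding a copy.
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, f"{video_id}:{segment_index}"))
```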
jlightner
8272da430b fix: Variable ordering bug and stage 5 truncation recovery
Two fixes:

1. page_moment_indices was referenced before assignment in the page
   persist loop — moved assignment to top of loop body. This caused
   "cannot access local variable" errors on every stage 5 run.

2. Stage 5 now catches LLMTruncationError and splits the chunk in
   half for retry, instead of blindly retrying the same oversized
   prompt. This handles categories where synthesis output exceeds
   the model context window.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 01:51:28 -05:00
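Fix 2 above (split-in-half on truncation) can be sketched recursively. `call_llm` is an injected stand-in for the real synthesis call, and the return shape is illustrative; the strategy matches the commit: halve the chunk and retry each half instead of resending the same oversized prompt.

```python
class LLMTruncationError(Exception):
    """Raised when the model stops with finish_reason=length."""

def synthesize_with_split(moments: list, call_llm) -> list:
    try:
        return [call_llm(moments)]
    except LLMTruncationError:
        if len(moments) <= 1:
            raise  # a single moment that still truncates can't be split
        mid = len(moments) // 2
        return (synthesize_with_split(moments[:mid], call_llm)
                + synthesize_with_split(moments[mid:], call_llm))
```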
jlightner
fa82f1079a feat: Enriched Qdrant embedding text with creator_name/tags and added r…
- "backend/pipeline/stages.py"
- "backend/pipeline/qdrant_client.py"
- "backend/routers/pipeline.py"

GSD-Task: S01/T02
2026-04-01 06:41:52 +00:00
jlightner
c344b8c670 fix: Moment-to-page linking via moment_indices in stage 5 synthesis
When the LLM splits a category group into multiple technique pages,
moments were blanket-linked to the last page in the loop, leaving all
other pages as orphans with 0 key moments (48 out of 204 pages affected).

Added moment_indices field to SynthesizedPage schema and synthesis prompt
so the LLM explicitly declares which input moments each page covers.
Stage 5 now uses these indices for targeted linking instead of the broken
blanket approach. Tags are also computed per-page from linked moments
only, fixing cross-contamination (e.g. "stereo imaging" tag appearing
on gain staging pages).

Deleted 48 orphan technique pages from the database.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 00:34:37 -05:00
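The targeted linking described above can be sketched with plain dicts standing in for the ORM models. Each page's declared `moment_indices` (positions into the stage's input moment list) drives the link, so no page inherits moments it didn't cover, which is what eliminated the orphan pages.

```python
def link_moments(pages: list[dict], moments: list[dict]) -> dict[str, list[dict]]:
    # Each page links exactly the moments it declared, replacing the old
    # blanket link of every moment to the last page in the loop.
    return {
        page["slug"]: [moments[i] for i in page["moment_indices"]]
        for page in pages
    }
```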
jlightner
e80094dc05 feat: Truncation detection, batched classification, and pipeline auto-resume
Three resilience improvements to the pipeline:

1. LLMResponse(str) subclass carries finish_reason metadata from the LLM.
   _safe_parse_llm_response detects truncation (finish=length) and raises
   LLMTruncationError instead of wastefully retrying with a JSON nudge
   that makes the prompt even longer.

2. Stage 4 classification now batches moments (20 per call) instead of
   sending all moments in a single LLM call. Prevents context window
   overflow for videos with many moments. Batch results are merged with
   reindexed moment_index values.

3. run_pipeline auto-resumes from the last completed stage on error/retry
   instead of always restarting from stage 2. Queries pipeline_events for
   the most recent run to find completed stages. clean_reprocess trigger
   still forces a full restart.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 17:48:19 -05:00
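Improvement 2 above (batched classification with reindexed `moment_index` values) can be sketched as follows. `classify` is an injected stand-in for the per-batch LLM call and is assumed to return batch-local indices; the merge re-offsets each index back into the global numbering.

```python
def classify_in_batches(moments: list, classify, batch_size: int = 20) -> list[dict]:
    merged = []
    for start in range(0, len(moments), batch_size):
        batch = moments[start:start + batch_size]
        for result in classify(batch):
            # Results come back with batch-local indices (0..len(batch)-1);
            # shift them to global positions before merging.
            result["moment_index"] += start
            merged.append(result)
    return merged
```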
jlightner
5984129e25 fix: Inflate LLM token estimates and forward max_tokens on retry
Stage 4 classification was truncating (finish=length) because the 0.15x
output ratio underestimated token needs. Inflated all stage ratios,
bumped the buffer from 20% to 50%, raised the floor from 2048 to 4096,
and fixed _safe_parse_llm_response to forward max_tokens on retry
instead of falling back to the 65k default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 17:28:58 -05:00
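The estimator shape implied by the commit above might look like this, with the post-fix numbers (50% buffer, 4096 floor) plugged in. The exact formula and signature are assumptions; only the buffer and floor values come from the commit.

```python
def estimate_max_tokens(input_tokens: int, output_ratio: float,
                        buffer: float = 0.5, floor: int = 4096) -> int:
    # output_ratio scales expected output to input size; the buffer
    # (raised from 20% to 50%) and floor (raised from 2048 to 4096)
    # keep small or underestimated calls from truncating.
    estimate = int(input_tokens * output_ratio * (1 + buffer))
    return max(estimate, floor)
```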
jlightner
c1583820ea feat: Add context labels to multi-call pipeline stages
Stage 3 (extraction) LLM calls now show the topic group label (e.g.,
'Sound Design Basics') and Stage 5 (synthesis) calls show the category
name. Displayed as a cyan italic label in the event row between the
event type badge and model name. Helps admins understand why there are
multiple LLM calls per stage.
2026-03-31 17:27:40 +00:00
jlightner
c2db9aa011 feat: Pipeline runs — per-execution tracking with run-scoped events
Data model:
- New pipeline_runs table (id, video_id, run_number, trigger, status,
  started_at, finished_at, error_stage, total_tokens)
- pipeline_events gains run_id FK (nullable for backward compat)
- Alembic migration 010_add_pipeline_runs

Backend:
- run_pipeline() creates a PipelineRun, threads run_id through all stages
- _emit_event() and _make_llm_callback() accept and store run_id
- Stage 6 (final) calls _finish_run() to mark complete with token totals
- mark_pipeline_error marks run as error
- Revoke marks running runs as cancelled
- Trigger endpoints pass trigger type (manual, clean_reprocess)
- New GET /admin/pipeline/runs/{video_id} — lists runs with event counts
- GET /admin/pipeline/events supports ?run_id= filter

Frontend:
- Expanded video detail now shows RunList instead of flat EventLog
- Each run is a collapsible card showing: run number, trigger type,
  status badge, timestamps, token count, event count
- Latest run auto-expands, older runs collapsed
- Legacy events (pre-run-tracking) shown as separate collapsible section
- Run cards color-coded: cyan border for running, red for error,
  gray for cancelled
- EventLog accepts optional runId prop to scope events to a single run
2026-03-31 17:13:41 +00:00

jlightner
e17132bd60 feat: Add bulk pipeline reprocessing — creator filter, multi-select, clean retrigger
- Backend: POST /admin/pipeline/clean-retrigger/{video_id} endpoint that
  deletes pipeline_events, key_moments, transcript_segments, and Qdrant
  vectors before retriggering the pipeline
- Backend: QdrantManager.delete_by_video_id() for vector cleanup
- Frontend: Creator filter dropdown on pipeline admin page
- Frontend: Checkbox selection column with select-all
- Frontend: Bulk toolbar with Retrigger Selected and Clean Reprocess
  actions, sequential dispatch with progress bar, cancel support
- Bulk dispatch uses a 500ms delay between requests to avoid slamming the API
2026-03-31 15:24:59 +00:00
jlightner
af250a6f5d feat: Added technique_page_slug to search results across Qdrant payload…
- "backend/schemas.py"
- "backend/search_service.py"
- "backend/pipeline/stages.py"
- "backend/pipeline/qdrant_client.py"
- "backend/tests/test_search.py"

GSD-Task: S01/T01
2026-03-31 05:02:48 +00:00
jlightner
720c2f501f feat: meaningful pipeline status lifecycle — Not Started → Queued → In Progress → Complete/Errored
Replace stage-level statuses (pending/transcribed/extracted/published) with
user-meaningful lifecycle states (not_started/queued/processing/error/complete).

Backend:
- ProcessingStatus enum: not_started, queued, processing, error, complete
- run_pipeline sets 'processing' before dispatching Celery chain
- stage5 sets 'complete' (was 'published')
- stage3 no longer sets intermediate status (stays 'processing')
- New mark_pipeline_error task wired as link_error on chain
- _set_error_status helper marks video on permanent failure
- Ingest sets 'queued' (was 'transcribed')
- Migration 008 renames all existing values

Frontend:
- StatusFilter shows fixed-order lifecycle tabs: Not Started | Queued | In Progress | Errored | Complete
- Per-video badges show friendly labels instead of raw enum values
- Badge colors mapped to new statuses
2026-03-31 02:43:49 +00:00
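The lifecycle enum above can be sketched directly from the commit. The enum values are the ones the commit lists; the friendly-label mapping mirrors the StatusFilter tab names, though the exact frontend mapping is an assumption.

```python
import enum

class ProcessingStatus(str, enum.Enum):
    # User-meaningful lifecycle states replacing the old stage-level
    # statuses (pending/transcribed/extracted/published).
    NOT_STARTED = "not_started"
    QUEUED = "queued"
    PROCESSING = "processing"
    ERROR = "error"
    COMPLETE = "complete"

# Badge labels assumed from the StatusFilter tab order in the commit.
LABELS = {
    ProcessingStatus.NOT_STARTED: "Not Started",
    ProcessingStatus.QUEUED: "Queued",
    ProcessingStatus.PROCESSING: "In Progress",
    ProcessingStatus.ERROR: "Errored",
    ProcessingStatus.COMPLETE: "Complete",
}
```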
jlightner
52e7e3bbc2 feat: remove review workflow — unused gate that blocked nothing
773 key moments sat at 'pending' with 0 approved/edited/rejected.
review_status was never checked by any public-facing query — all content
was always visible regardless of review state.

Removed:
- backend/routers/review.py (10 endpoints)
- backend/tests/test_review.py
- frontend ReviewQueue, MomentDetail pages
- frontend client.ts (review-only API client)
- frontend ModeToggle, StatusBadge components
- Review link from AdminDropdown, Moments link from pipeline rows
- ReviewStatus, PageReviewStatus enums from models
- review_mode config flag
- review_status columns (migration 007)
- ~80 lines of mode-toggle CSS

Pipeline now always sets processing_status to 'published'.
Migration 007 drops columns, enums, and migrates 'reviewed' → 'published'.
2026-03-31 02:34:12 +00:00