chrysopedia

Author	SHA1	Message	Date
jlightner	fa82f1079a	feat: Enriched Qdrant embedding text with creator_name/tags and added r… - "backend/pipeline/stages.py" - "backend/pipeline/qdrant_client.py" - "backend/routers/pipeline.py" GSD-Task: S01/T02	2026-04-01 06:41:52 +00:00
jlightner	c344b8c670	fix: Moment-to-page linking via moment_indices in stage 5 synthesis When the LLM splits a category group into multiple technique pages, moments were blanket-linked to the last page in the loop, leaving all other pages as orphans with 0 key moments (48 out of 204 pages affected). Added moment_indices field to SynthesizedPage schema and synthesis prompt so the LLM explicitly declares which input moments each page covers. Stage 5 now uses these indices for targeted linking instead of the broken blanket approach. Tags are also computed per-page from linked moments only, fixing cross-contamination (e.g. "stereo imaging" tag appearing on gain staging pages). Deleted 48 orphan technique pages from the database. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 00:34:37 -05:00
jlightner	e80094dc05	feat: Truncation detection, batched classification, and pipeline auto-resume Three resilience improvements to the pipeline: 1. LLMResponse(str) subclass carries finish_reason metadata from the LLM. _safe_parse_llm_response detects truncation (finish=length) and raises LLMTruncationError instead of wastefully retrying with a JSON nudge that makes the prompt even longer. 2. Stage 4 classification now batches moments (20 per call) instead of sending all moments in a single LLM call. Prevents context window overflow for videos with many moments. Batch results are merged with reindexed moment_index values. 3. run_pipeline auto-resumes from the last completed stage on error/retry instead of always restarting from stage 2. Queries pipeline_events for the most recent run to find completed stages. clean_reprocess trigger still forces a full restart. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-31 17:48:19 -05:00
jlightner	5984129e25	fix: Inflate LLM token estimates and forward max_tokens on retry Stage 4 classification was truncating (finish=length) because the 0.15x output ratio underestimated token needs. Inflated all stage ratios, bumped the buffer from 20% to 50%, raised the floor from 2048 to 4096, and fixed _safe_parse_llm_response to forward max_tokens on retry instead of falling back to the 65k default. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-31 17:28:58 -05:00
jlightner	c1583820ea	feat: Add context labels to multi-call pipeline stages Stage 3 (extraction) LLM calls now show the topic group label (e.g., 'Sound Design Basics') and Stage 5 (synthesis) calls show the category name. Displayed as a cyan italic label in the event row between the event type badge and model name. Helps admins understand why there are multiple LLM calls per stage.	2026-03-31 17:27:40 +00:00
jlightner	c2db9aa011	feat: Pipeline runs — per-execution tracking with run-scoped events Data model: - New pipeline_runs table (id, video_id, run_number, trigger, status, started_at, finished_at, error_stage, total_tokens) - pipeline_events gains run_id FK (nullable for backward compat) - Alembic migration 010_add_pipeline_runs Backend: - run_pipeline() creates a PipelineRun, threads run_id through all stages - _emit_event() and _make_llm_callback() accept and store run_id - Stage 6 (final) calls _finish_run() to mark complete with token totals - mark_pipeline_error marks run as error - Revoke marks running runs as cancelled - Trigger endpoints pass trigger type (manual, clean_reprocess) - New GET /admin/pipeline/runs/{video_id} — lists runs with event counts - GET /admin/pipeline/events supports ?run_id= filter Frontend: - Expanded video detail now shows RunList instead of flat EventLog - Each run is a collapsible card showing: run number, trigger type, status badge, timestamps, token count, event count - Latest run auto-expands, older runs collapsed - Legacy events (pre-run-tracking) shown as separate collapsible section - Run cards color-coded: cyan border for running, red for error, gray for cancelled - EventLog accepts optional runId prop to scope events to a single run	2026-03-31 17:13:41 +00:00
jlightner	e17132bd60	feat: Add bulk pipeline reprocessing — creator filter, multi-select, clean retrigger - Backend: POST /admin/pipeline/clean-retrigger/{video_id} endpoint that deletes pipeline_events, key_moments, transcript_segments, and Qdrant vectors before retriggering the pipeline - Backend: QdrantManager.delete_by_video_id() for vector cleanup - Frontend: Creator filter dropdown on pipeline admin page - Frontend: Checkbox selection column with select-all - Frontend: Bulk toolbar with Retrigger Selected and Clean Reprocess actions, sequential dispatch with progress bar, cancel support - Bulk dispatch uses 500ms delay between requests to avoid slamming API	2026-03-31 15:24:59 +00:00
jlightner	af250a6f5d	feat: Added technique_page_slug to search results across Qdrant payload… - "backend/schemas.py" - "backend/search_service.py" - "backend/pipeline/stages.py" - "backend/pipeline/qdrant_client.py" - "backend/tests/test_search.py" GSD-Task: S01/T01	2026-03-31 05:02:48 +00:00
jlightner	720c2f501f	feat: meaningful pipeline status lifecycle — Not Started → Queued → In Progress → Complete/Errored Replace stage-level statuses (pending/transcribed/extracted/published) with user-meaningful lifecycle states (not_started/queued/processing/error/complete). Backend: - ProcessingStatus enum: not_started, queued, processing, error, complete - run_pipeline sets 'processing' before dispatching Celery chain - stage5 sets 'complete' (was 'published') - stage3 no longer sets intermediate status (stays 'processing') - New mark_pipeline_error task wired as link_error on chain - _set_error_status helper marks video on permanent failure - Ingest sets 'queued' (was 'transcribed') - Migration 008 renames all existing values Frontend: - StatusFilter shows fixed-order lifecycle tabs: Not Started | Queued | In Progress | Errored | Complete - Per-video badges show friendly labels instead of raw enum values - Badge colors mapped to new statuses	2026-03-31 02:43:49 +00:00
jlightner	52e7e3bbc2	feat: remove review workflow — unused gate that blocked nothing 773 key moments sat at 'pending' with 0 approved/edited/rejected. review_status was never checked by any public-facing query — all content was always visible regardless of review state. Removed: - backend/routers/review.py (10 endpoints) - backend/tests/test_review.py - frontend ReviewQueue, MomentDetail pages - frontend client.ts (review-only API client) - frontend ModeToggle, StatusBadge components - Review link from AdminDropdown, Moments link from pipeline rows - ReviewStatus, PageReviewStatus enums from models - review_mode config flag - review_status columns (migration 007) - ~80 lines of mode-toggle CSS Pipeline now always sets processing_status to 'published'. Migration 007 drops columns, enums, and migrates 'reviewed' → 'published'.	2026-03-31 02:34:12 +00:00
jlightner	4b0914b12b	fix: restore complete project tree from ub01 canonical state Auto-mode commit `7aa33cd` accidentally deleted 78 files (14,814 lines) during M005 execution. Subsequent commits rebuilt some frontend files but backend/, alembic/, tests/, whisper/, docker configs, and prompts were never restored in this repo. This commit restores the full project tree by syncing from ub01's working directory, which has all M001-M007 features running in production containers. Restored: backend/ (config, models, routers, database, redis, search_service, worker), alembic/ (6 migrations), docker/ (Dockerfiles, nginx, compose), prompts/ (4 stages), tests/, whisper/, README.md, .env.example, chrysopedia-spec.md	2026-03-31 02:10:41 +00:00
jlightner	7aa33cd17f	fix: Fixed syntax errors in pipeline event instrumentation — _emit_even… - "backend/pipeline/stages.py" GSD-Task: S01/T01	2026-03-30 08:27:53 +00:00
jlightner	5c3e9b83c8	feat: Added TechniquePageVersion model, Alembic migration 002, pipeline… - "backend/models.py" - "alembic/versions/002_technique_page_versions.py" - "backend/pipeline/stages.py" GSD-Task: S04/T01	2026-03-30 07:27:40 +00:00
jlightner	0b0ca598b4	feat: Log LLM response token usage (prompt/completion/total, content_len, finish_reason)	2026-03-30 06:15:24 +00:00
jlightner	cf759f3739	fix: Add max_tokens=16384 to LLM requests (OpenWebUI defaults to 1000, truncating pipeline JSON)	2026-03-30 04:08:29 +00:00
jlightner	4aa4b08a7f	feat: Per-stage LLM model routing with thinking modality and think-tag stripping - Added 8 per-stage config fields: llm_stage{2-5}_model and llm_stage{2-5}_modality - LLMClient.complete() accepts modality ('chat'/'thinking') and model_override - Thinking modality: appends JSON instructions to system prompt, strips <think> tags - strip_think_tags() handles multiline, multiple blocks, and edge cases - Pipeline stages 2-5 read per-stage config and pass to LLM client - Updated .env.example with per-stage model/modality documentation - All 59 tests pass including new think-tag stripping test	2026-03-30 02:12:14 +00:00
jlightner	5c46d1e922	feat: Created sync EmbeddingClient, QdrantManager with idempotent colle… - "backend/pipeline/embedding_client.py" - "backend/pipeline/qdrant_client.py" - "backend/pipeline/stages.py" GSD-Task: S03/T03	2026-03-29 22:39:04 +00:00
jlightner	b5635a09db	feat: Created 4 prompt templates and implemented 5 Celery tasks (stages… - "prompts/stage2_segmentation.txt" - "prompts/stage3_extraction.txt" - "prompts/stage4_classification.txt" - "prompts/stage5_synthesis.txt" - "backend/pipeline/stages.py" - "backend/requirements.txt" GSD-Task: S03/T02	2026-03-29 22:36:06 +00:00
jlightner	12cc86aef9	chore: Extended Settings with 12 LLM/embedding/Qdrant config fields, cr… - "backend/config.py" - "backend/worker.py" - "backend/pipeline/schemas.py" - "backend/pipeline/llm_client.py" - "backend/requirements.txt" - "backend/pipeline/__init__.py" - "backend/pipeline/stages.py" GSD-Task: S03/T01	2026-03-29 22:30:31 +00:00

19 commits
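
Commit c344b8c670 replaces blanket moment linking with per-page moment_indices declared by the LLM, and computes each page's tags only from the moments linked to it. The sketch below illustrates that idea under stated assumptions: SynthesizedPage is reduced to a plain dataclass with two fields, moments are dicts with an assumed "tags" key, link_pages is a hypothetical helper, and out-of-range indices are skipped, which the commit message does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SynthesizedPage:
    # Simplified stand-in for the SynthesizedPage schema: only the fields used here.
    title: str
    moment_indices: List[int] = field(default_factory=list)


def link_pages(pages: List[SynthesizedPage], moments: List[dict]) -> Dict[str, dict]:
    """Link each page to the moments it declares and tag it from those moments only."""
    linked: Dict[str, dict] = {}
    for page in pages:
        # Assumed safeguard: ignore indices that do not point into the input list.
        page_moments = [moments[i] for i in page.moment_indices if 0 <= i < len(moments)]
        # Tags come from this page's own moments, so a tag from another page's
        # moments (e.g. "stereo imaging") cannot leak onto it.
        tags = sorted({tag for m in page_moments for tag in m["tags"]})
        linked[page.title] = {"moments": page_moments, "tags": tags}
    return linked
```

Because every page receives its own index list, no page is left without moments merely because it was not the last one produced for its category group.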
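
Commit e80094dc05 also batches Stage 4 classification at 20 moments per call and reindexes moment_index when merging the batch results. A sketch of that merge step, assuming each batch's results number moments from 0 within the batch; classify_in_batches and the classify_batch callback are hypothetical stand-ins for the stage 4 code and a single LLM call.

```python
from typing import Callable, Dict, List, Sequence

BATCH_SIZE = 20  # moments per classification call, as stated in the commit message

Result = Dict[str, object]


def classify_in_batches(
    moments: Sequence[object],
    classify_batch: Callable[[Sequence[object]], List[Result]],
    batch_size: int = BATCH_SIZE,
) -> List[Result]:
    """Classify moments one batch at a time and merge results with global indices."""
    merged: List[Result] = []
    for offset in range(0, len(moments), batch_size):
        batch = moments[offset:offset + batch_size]
        # classify_batch represents one LLM call; its results are assumed to use
        # moment_index values relative to the batch (0..len(batch) - 1).
        for result in classify_batch(batch):
            # Shift the batch-local index back to the moment's position overall.
            merged.append({**result, "moment_index": result["moment_index"] + offset})
    return merged
```

Each call now sees at most batch_size moments, so prompt size no longer grows with the number of moments in a video.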
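
Commit e80094dc05 also makes run_pipeline resume after the last completed stage on error or retry, while the clean_reprocess trigger still forces a full restart. Assuming the completed stage numbers have already been read from pipeline_events for the most recent run, and that the stages run in order from 2 through 6 (stage 6 being the final stage named in c2db9aa011), the decision reduces to a small pure function. resume_from is a hypothetical name, and trigger values other than clean_reprocess are illustrative.

```python
from typing import Iterable, Optional

# Assumed stage order; stage 2 is where a full restart begins.
PIPELINE_STAGES = (2, 3, 4, 5, 6)


def resume_from(completed_stages: Iterable[int], trigger: str) -> Optional[int]:
    """Return the stage to start from, or None if every stage already completed."""
    if trigger == "clean_reprocess":
        # A clean reprocess always starts over, regardless of prior progress.
        return PIPELINE_STAGES[0]
    done = set(completed_stages)
    for stage in PIPELINE_STAGES:
        # The first stage without a completion record is where work resumes.
        if stage not in done:
            return stage
    return None
```

Scanning in stage order, rather than taking the highest completed number plus one, means a gap in the recorded events causes that missing stage to be re-run instead of skipped.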
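
Commit 5984129e25 sizes max_tokens from a per-stage output ratio with a 50% buffer and a 4096-token floor, and forwards the same max_tokens on retry instead of falling back to the client default. The new ratios are not listed in the commit, so the ratio is a parameter here, and applying it to the prompt's token count is an assumption. estimate_max_tokens and call_with_retry are hypothetical helpers, and treating any ValueError (which includes json.JSONDecodeError) as retryable is also an assumption.

```python
from typing import Callable, Optional


def estimate_max_tokens(prompt_tokens: int, output_ratio: float,
                        buffer: float = 0.5, floor: int = 4096) -> int:
    """Estimate max_tokens for one call: ratio-based guess plus buffer, never below floor."""
    estimate = int(prompt_tokens * output_ratio * (1 + buffer))
    return max(floor, estimate)


def call_with_retry(call: Callable[..., str], prompt: str, max_tokens: int,
                    attempts: int = 2) -> str:
    """Call the LLM, retrying on parse-style errors with the SAME max_tokens."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            # Forward max_tokens on every attempt so a retry cannot silently
            # switch to a different default limit.
            return call(prompt, max_tokens=max_tokens)
        except ValueError as exc:
            last_error = exc
    raise last_error
```

With the old settings (20% buffer, 2048 floor) and a 0.15 ratio, a 10,000-token prompt would get 2048 tokens; with the new settings it gets the 4096 floor, since 10,000 × 0.15 × 1.5 = 2250 is still below it.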
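
Commit 4aa4b08a7f adds a strip_think_tags() helper that handles multiline input, multiple blocks, and edge cases before the JSON is parsed. A minimal regex-based sketch follows. The handling of unbalanced tags (a stray closing tag, or an unclosed opening tag in truncated output) is a guess at what "edge cases" covers, not the repository's code.

```python
import re

# Complete <think>...</think> blocks. DOTALL lets "." span newlines, and the
# lazy ".*?" removes several blocks one at a time instead of everything
# between the first opener and the last closer.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think_tags(text: str) -> str:
    text = _THINK_BLOCK.sub("", text)
    # Assumed edge case: the opening tag is missing, leaving a stray closer.
    # Keep only what follows the last </think>.
    _, closer, tail = text.rpartition("</think>")
    if closer:
        text = tail
    # Assumed edge case: an unclosed <think> (e.g. truncated output).
    # Drop it and everything after it.
    head, opener, _ = text.partition("<think>")
    if opener:
        text = head
    return text.strip()
```

Stripping runs before JSON parsing, so reasoning text from a thinking-modality model never reaches json.loads.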