chrysopedia

Author	SHA1	Message	Date
jlightner	e740798f7c	feat: Added STAGE_CONFIGS registry (stages 2-5) with per-stage rubrics,… - "backend/pipeline/quality/scorer.py" - "backend/pipeline/quality/variant_generator.py" GSD-Task: S04/T01	2026-04-01 09:20:24 +00:00
jlightner	84e85a52b3	perf: Added optimize CLI subcommand with leaderboard table, ASCII traje… - "backend/pipeline/quality/__main__.py" - "backend/pipeline/quality/results/.gitkeep" GSD-Task: S03/T02	2026-04-01 09:10:42 +00:00
jlightner	c6cbb09dd3	feat: Created PromptVariantGenerator (LLM-powered prompt mutation) and… - "backend/pipeline/quality/variant_generator.py" - "backend/pipeline/quality/optimizer.py" GSD-Task: S03/T01	2026-04-01 09:08:01 +00:00
jlightner	15a7afdaff	feat: Added VoiceDial class with 3-band prompt modification and ScoreRu… - "backend/pipeline/quality/voice_dial.py" - "backend/pipeline/quality/scorer.py" - "backend/pipeline/quality/__main__.py" GSD-Task: S02/T02	2026-04-01 08:57:07 +00:00
jlightner	5223772756	feat: Built ScoreRunner with 5-dimension LLM-as-judge scoring rubric, C… - "backend/pipeline/quality/scorer.py" - "backend/pipeline/quality/__main__.py" - "backend/pipeline/quality/fixtures/sample_moments.json" - "backend/pipeline/quality/fixtures/__init__.py" GSD-Task: S02/T01	2026-04-01 08:53:40 +00:00
jlightner	c27cd77ae6	test: Built pipeline.quality package with FitnessRunner (9 tests, 4 cat… - "backend/pipeline/quality/__init__.py" - "backend/pipeline/quality/__main__.py" - "backend/pipeline/quality/fitness.py" GSD-Task: S01/T01	2026-04-01 08:45:05 +00:00
jlightner	fd1fd6c6f9	fix: Pipeline LLM audit — temperature=0, realistic token ratios, structured request_params Audit findings & fixes: - temperature was never set (API defaulted to 1.0) → now explicit 0.0 for deterministic JSON - llm_max_tokens=65536 exceeded hard_limit=32768 → aligned to 32768 - Output ratio estimates were 5-30x too high (based on actual pipeline data): stage2: 0.6→0.05, stage3: 2.0→0.3, stage4: 0.5→0.3, stage5: 2.5→0.8 - request_params now structured as api_params (what's sent to LLM) vs pipeline_config (internal estimator settings) — no more ambiguous 'hard_limit' in request params - temperature=0.0 sent on both primary and fallback endpoints	2026-04-01 07:20:09 +00:00
jlightner	d58194ff96	feat: Store LLM request params (max_tokens, model, modality) in pipeline events - _make_llm_callback now accepts request_params dict - All 6 LLM call sites pass max_tokens, model_override, modality, response_model, hard_limit - request_params stored in payload JSONB on every llm_call event (always, not just debug mode) - Frontend JSON export includes full payload + request_params at top level - DebugPayloadViewer shows 'Request Params' section even with debug mode off - Answers whether max_tokens is actually being sent on pipeline requests	2026-04-01 07:01:57 +00:00
jlightner	a673e641b8	fix: Parallel search with match_context, deterministic Qdrant IDs, raised embedding timeout - Search now runs semantic + keyword in parallel, merges and deduplicates - Keyword results always included with match_context explaining WHY matched - Semantic results filtered by minimum score threshold (0.45) - match_context shows 'Creator: X', 'Tag: Y', 'Title match', 'Content: ...' - Qdrant points use deterministic uuid5 IDs (no more duplicates on reindex) - Embedding timeout raised from 300ms to 2s (Ollama needs it) - _enrich_qdrant_results reads creator_name from payload before DB fallback - Frontend displays match_context as highlighted bar on search result cards	2026-04-01 06:54:34 +00:00
jlightner	8272da430b	fix: Variable ordering bug and stage 5 truncation recovery Two fixes: 1. page_moment_indices was referenced before assignment in the page persist loop — moved assignment to top of loop body. This caused "cannot access local variable" errors on every stage 5 run. 2. Stage 5 now catches LLMTruncationError and splits the chunk in half for retry, instead of blindly retrying the same oversized prompt. This handles categories where synthesis output exceeds the model context window. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 01:51:28 -05:00
jlightner	e0c73db8ff	feat: Added sort query parameter (relevance/newest/oldest/alpha/creator… - "backend/routers/search.py" - "backend/routers/topics.py" - "backend/routers/techniques.py" - "backend/search_service.py" GSD-Task: S02/T01	2026-04-01 06:41:52 +00:00
jlightner	fa82f1079a	feat: Enriched Qdrant embedding text with creator_name/tags and added r… - "backend/pipeline/stages.py" - "backend/pipeline/qdrant_client.py" - "backend/routers/pipeline.py" GSD-Task: S01/T02	2026-04-01 06:41:52 +00:00
jlightner	84e7a9906c	feat: Refactored keyword_search to multi-token AND with cross-field mat… - "backend/search_service.py" - "backend/schemas.py" - "backend/routers/search.py" - "backend/tests/test_search.py" GSD-Task: S01/T01	2026-04-01 06:41:52 +00:00
jlightner	c344b8c670	fix: Moment-to-page linking via moment_indices in stage 5 synthesis When the LLM splits a category group into multiple technique pages, moments were blanket-linked to the last page in the loop, leaving all other pages as orphans with 0 key moments (48 out of 204 pages affected). Added moment_indices field to SynthesizedPage schema and synthesis prompt so the LLM explicitly declares which input moments each page covers. Stage 5 now uses these indices for targeted linking instead of the broken blanket approach. Tags are also computed per-page from linked moments only, fixing cross-contamination (e.g. "stereo imaging" tag appearing on gain staging pages). Deleted 48 orphan technique pages from the database. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 00:34:37 -05:00
jlightner	e80094dc05	feat: Truncation detection, batched classification, and pipeline auto-resume Three resilience improvements to the pipeline: 1. LLMResponse(str) subclass carries finish_reason metadata from the LLM. _safe_parse_llm_response detects truncation (finish=length) and raises LLMTruncationError instead of wastefully retrying with a JSON nudge that makes the prompt even longer. 2. Stage 4 classification now batches moments (20 per call) instead of sending all moments in a single LLM call. Prevents context window overflow for videos with many moments. Batch results are merged with reindexed moment_index values. 3. run_pipeline auto-resumes from the last completed stage on error/retry instead of always restarting from stage 2. Queries pipeline_events for the most recent run to find completed stages. clean_reprocess trigger still forces a full restart. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-31 17:48:19 -05:00
jlightner	5984129e25	fix: Inflate LLM token estimates and forward max_tokens on retry Stage 4 classification was truncating (finish=length) because the 0.15x output ratio underestimated token needs. Inflated all stage ratios, bumped the buffer from 20% to 50%, raised the floor from 2048 to 4096, and fixed _safe_parse_llm_response to forward max_tokens on retry instead of falling back to the 65k default. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-31 17:28:58 -05:00
jlightner	c1583820ea	feat: Add context labels to multi-call pipeline stages Stage 3 (extraction) LLM calls now show the topic group label (e.g., 'Sound Design Basics') and Stage 5 (synthesis) calls show the category name. Displayed as a cyan italic label in the event row between the event type badge and model name. Helps admins understand why there are multiple LLM calls per stage.	2026-03-31 17:27:40 +00:00
jlightner	c2db9aa011	feat: Pipeline runs — per-execution tracking with run-scoped events Data model: - New pipeline_runs table (id, video_id, run_number, trigger, status, started_at, finished_at, error_stage, total_tokens) - pipeline_events gains run_id FK (nullable for backward compat) - Alembic migration 010_add_pipeline_runs Backend: - run_pipeline() creates a PipelineRun, threads run_id through all stages - _emit_event() and _make_llm_callback() accept and store run_id - Stage 6 (final) calls _finish_run() to mark complete with token totals - mark_pipeline_error marks run as error - Revoke marks running runs as cancelled - Trigger endpoints pass trigger type (manual, clean_reprocess) - New GET /admin/pipeline/runs/{video_id} — lists runs with event counts - GET /admin/pipeline/events supports ?run_id= filter Frontend: - Expanded video detail now shows RunList instead of flat EventLog - Each run is a collapsible card showing: run number, trigger type, status badge, timestamps, token count, event count - Latest run auto-expands, older runs collapsed - Legacy events (pre-run-tracking) shown as separate collapsible section - Run cards color-coded: cyan border for running, red for error, gray for cancelled - EventLog accepts optional runId prop to scope events to a single run	2026-03-31 17:13:41 +00:00
jlightner	cd3b57a156	fix: Clean retrigger preserves transcript_segments (pipeline input data) Deleting transcript_segments left the pipeline with nothing to process — all stages would skip immediately. Segments come from the ingest step, not from pipeline stages 2-6. Only pipeline_events and key_moments (pipeline output) are deleted during clean reprocess.	2026-03-31 16:32:25 +00:00
jlightner	b0ad4c2dfc	feat: Add real-time pipeline visibility — auto-refresh, stage timeline, activity feed, bulk log - Backend: Video list now includes active_stage, active_stage_status, and stage_started_at fields via DISTINCT ON subquery - Backend: New GET /admin/pipeline/recent-activity endpoint returns latest stage completions/errors with video context - Frontend: 15-second auto-refresh with change detection — video rows flash when status changes - Frontend: Stage timeline dots on processing/complete/error videos showing progress through stages 2-5, active stage pulses - Frontend: Collapsible Recent Activity feed at top showing last 8 stage completions/errors with duration and creator - Frontend: Bulk operation scrollable log showing per-video results as they complete - Frontend: Auto-refresh checkbox toggle in header	2026-03-31 16:12:57 +00:00
jlightner	e17132bd60	feat: Add bulk pipeline reprocessing — creator filter, multi-select, clean retrigger - Backend: POST /admin/pipeline/clean-retrigger/{video_id} endpoint that deletes pipeline_events, key_moments, transcript_segments, and Qdrant vectors before retriggering the pipeline - Backend: QdrantManager.delete_by_video_id() for vector cleanup - Frontend: Creator filter dropdown on pipeline admin page - Frontend: Checkbox selection column with select-all - Frontend: Bulk toolbar with Retrigger Selected and Clean Reprocess actions, sequential dispatch with progress bar, cancel support - Bulk dispatch uses 500ms delay between requests to avoid slamming API	2026-03-31 15:24:59 +00:00
jlightner	717f6c0785	feat: Added GET /api/v1/techniques/random endpoint returning {slug}, fe… - "backend/routers/techniques.py" - "frontend/src/api/public-client.ts" - "frontend/src/pages/Home.tsx" - "frontend/src/App.css" GSD-Task: S01/T02	2026-03-31 08:24:38 +00:00
jlightner	1254e173d4	test: Added GET /api/v1/search/suggestions endpoint returning popular t… - "backend/schemas.py" - "backend/routers/search.py" - "backend/tests/test_search.py" GSD-Task: S04/T01	2026-03-31 06:35:37 +00:00
jlightner	5d0fd05b98	feat: Added scored dynamic related-techniques query returning up to 4 r… - "backend/schemas.py" - "backend/routers/techniques.py" - "backend/tests/test_public_api.py" GSD-Task: S02/T01	2026-03-31 06:13:59 +00:00
jlightner	8661549ab1	test: Added GET /topics/{category_slug}/{subtopic_slug} endpoint filter… - "backend/routers/topics.py" - "backend/tests/test_public_api.py" GSD-Task: S01/T01	2026-03-31 05:59:36 +00:00
jlightner	0b27e5752e	feat: Added sort=random|recent query param to list_techniques endpoint… - "backend/routers/techniques.py" - "frontend/src/api/public-client.ts" GSD-Task: S03/T01	2026-03-31 05:46:31 +00:00
jlightner	95b11ae5bc	feat: Added key_moment_count correlated subquery to technique list API… - "backend/schemas.py" - "backend/routers/techniques.py" - "frontend/src/api/public-client.ts" - "frontend/src/pages/Home.tsx" - "frontend/src/App.css" GSD-Task: S03/T01	2026-03-31 05:23:37 +00:00
jlightner	127919565a	feat: Added hidden boolean column to Creator model, migration marking T… - "backend/models.py" - "backend/routers/creators.py" - "alembic/versions/009_add_creator_hidden_flag.py" GSD-Task: S02/T01	2026-03-31 05:13:17 +00:00
jlightner	af250a6f5d	feat: Added technique_page_slug to search results across Qdrant payload… - "backend/schemas.py" - "backend/search_service.py" - "backend/pipeline/stages.py" - "backend/pipeline/qdrant_client.py" - "backend/tests/test_search.py" GSD-Task: S01/T01	2026-03-31 05:02:48 +00:00
jlightner	720c2f501f	feat: meaningful pipeline status lifecycle — Not Started → Queued → In Progress → Complete/Errored Replace stage-level statuses (pending/transcribed/extracted/published) with user-meaningful lifecycle states (not_started/queued/processing/error/complete). Backend: - ProcessingStatus enum: not_started, queued, processing, error, complete - run_pipeline sets 'processing' before dispatching Celery chain - stage5 sets 'complete' (was 'published') - stage3 no longer sets intermediate status (stays 'processing') - New mark_pipeline_error task wired as link_error on chain - _set_error_status helper marks video on permanent failure - Ingest sets 'queued' (was 'transcribed') - Migration 008 renames all existing values Frontend: - StatusFilter shows fixed-order lifecycle tabs: Not Started | Queued | In Progress | Errored | Complete - Per-video badges show friendly labels instead of raw enum values - Badge colors mapped to new statuses	2026-03-31 02:43:49 +00:00
jlightner	52e7e3bbc2	feat: remove review workflow — unused gate that blocked nothing 773 key moments sat at 'pending' with 0 approved/edited/rejected. review_status was never checked by any public-facing query — all content was always visible regardless of review state. Removed: - backend/routers/review.py (10 endpoints) - backend/tests/test_review.py - frontend ReviewQueue, MomentDetail pages - frontend client.ts (review-only API client) - frontend ModeToggle, StatusBadge components - Review link from AdminDropdown, Moments link from pipeline rows - ReviewStatus, PageReviewStatus enums from models - review_mode config flag - review_status columns (migration 007) - ~80 lines of mode-toggle CSS Pipeline now always sets processing_status to 'published'. Migration 007 drops columns, enums, and migrates 'reviewed' → 'published'.	2026-03-31 02:34:12 +00:00
jlightner	4b0914b12b	fix: restore complete project tree from ub01 canonical state Auto-mode commit `7aa33cd` accidentally deleted 78 files (14,814 lines) during M005 execution. Subsequent commits rebuilt some frontend files but backend/, alembic/, tests/, whisper/, docker configs, and prompts were never restored in this repo. This commit restores the full project tree by syncing from ub01's working directory, which has all M001-M007 features running in production containers. Restored: backend/ (config, models, routers, database, redis, search_service, worker), alembic/ (6 migrations), docker/ (Dockerfiles, nginx, compose), prompts/ (4 stages), tests/, whisper/, README.md, .env.example, chrysopedia-spec.md	2026-03-31 02:10:41 +00:00
jlightner	5e408dff5a	feat: Built backend/watcher.py with PollingObserver-based folder watchi… - "backend/watcher.py" - "backend/requirements.txt" GSD-Task: S03/T01	2026-03-30 19:17:47 +00:00
jlightner	7aa33cd17f	fix: Fixed syntax errors in pipeline event instrumentation — _emit_even… - "backend/pipeline/stages.py" GSD-Task: S01/T01	2026-03-30 08:27:53 +00:00
jlightner	44fbbf030f	test: Added version list/detail API endpoints, Pydantic schemas, versio… - "backend/schemas.py" - "backend/routers/techniques.py" - "backend/tests/test_public_api.py" GSD-Task: S04/T02	2026-03-30 07:27:40 +00:00
jlightner	5c3e9b83c8	feat: Added TechniquePageVersion model, Alembic migration 002, pipeline… - "backend/models.py" - "alembic/versions/002_technique_page_versions.py" - "backend/pipeline/stages.py" GSD-Task: S04/T01	2026-03-30 07:27:40 +00:00
jlightner	0c4162a777	feat: Added video_filename field to KeyMomentSummary schema and populat… - "backend/schemas.py" - "backend/routers/techniques.py" GSD-Task: S03/T01	2026-03-30 06:50:01 +00:00
jlightner	76138887d2	fix: Creators endpoint returns paginated response, review queue limit raised to 1000, added GET /review/moments/{id} endpoint - Creators: response_model changed from list to {items, total, offset, limit} matching frontend CreatorBrowseResponse - Review queue: limit raised from 100 to 1000 - New GET /review/moments/{moment_id} endpoint for direct moment fetch - MomentDetail uses fetchMoment instead of fetching full queue - Merge candidates fetch uses limit=100	2026-03-30 01:26:12 -05:00
jlightner	0b0ca598b4	feat: Log LLM response token usage (prompt/completion/total, content_len, finish_reason)	2026-03-30 06:15:24 +00:00
jlightner	17347da87e	feat: Switch to FYN-LLM-Agent models — chat for stages 2/4, think for stages 3/5	2026-03-30 05:42:27 +00:00
jlightner	f67e676264	fix: Bump max_tokens to 65536 (model supports 94K context, extraction needs headroom)	2026-03-30 04:57:44 +00:00
jlightner	6fb497d03a	chore: Bump LLM max_tokens to 32768, commit M002/M003 GSD artifacts - max_tokens bumped from 16384 to 32768 (extraction responses still hitting limits) - All GSD planning/completion artifacts for M002 (deployment) and M003 (DNS + LLM routing) - KNOWLEDGE.md updated with XPLTD domain setup flow and container healthcheck patterns - DECISIONS.md updated with D015 (subnet) and D016 (Ollama for embeddings)	2026-03-30 04:22:45 +00:00
jlightner	cf759f3739	fix: Add max_tokens=16384 to LLM requests (OpenWebUI defaults to 1000, truncating pipeline JSON)	2026-03-30 04:08:29 +00:00
jlightner	4aa4b08a7f	feat: Per-stage LLM model routing with thinking modality and think-tag stripping - Added 8 per-stage config fields: llm_stage{2-5}_model and llm_stage{2-5}_modality - LLMClient.complete() accepts modality ('chat'/'thinking') and model_override - Thinking modality: appends JSON instructions to system prompt, strips <think> tags - strip_think_tags() handles multiline, multiple blocks, and edge cases - Pipeline stages 2-5 read per-stage config and pass to LLM client - Updated .env.example with per-stage model/modality documentation - All 59 tests pass including new think-tag stripping test	2026-03-30 02:12:14 +00:00
jlightner	5b8be50994	test: Added 18 integration tests for search and public API endpoints (t… - "backend/tests/test_search.py" - "backend/tests/test_public_api.py" GSD-Task: S05/T02	2026-03-30 00:01:32 +00:00
jlightner	c0df369018	feat: Created async search service with embedding+Qdrant+keyword fallba… - "backend/search_service.py" - "backend/schemas.py" - "backend/routers/search.py" - "backend/routers/techniques.py" - "backend/routers/topics.py" - "backend/routers/creators.py" - "backend/main.py" GSD-Task: S05/T01	2026-03-29 23:55:52 +00:00
jlightner	c2edba952c	test: Built 9 review queue API endpoints (queue, stats, approve, reject… - "backend/routers/review.py" - "backend/schemas.py" - "backend/redis_client.py" - "backend/main.py" - "backend/tests/test_review.py" GSD-Task: S04/T01	2026-03-29 23:13:43 +00:00
jlightner	2cb10b5db8	test: Added 10 integration tests covering pipeline stages 2-6, trigger… - "backend/tests/test_pipeline.py" - "backend/tests/fixtures/mock_llm_responses.py" - "backend/tests/conftest.py" GSD-Task: S03/T05	2026-03-29 22:51:26 +00:00
jlightner	910e945d9c	feat: Wired automatic run_pipeline.delay() dispatch after ingest commit… - "backend/routers/pipeline.py" - "backend/routers/ingest.py" - "backend/main.py" GSD-Task: S03/T04	2026-03-29 22:41:02 +00:00
jlightner	5c46d1e922	feat: Created sync EmbeddingClient, QdrantManager with idempotent colle… - "backend/pipeline/embedding_client.py" - "backend/pipeline/qdrant_client.py" - "backend/pipeline/stages.py" GSD-Task: S03/T03	2026-03-29 22:39:04 +00:00

57 commits