diff --git a/.gsd/DECISIONS.md b/.gsd/DECISIONS.md index da3dd60..600d65f 100644 --- a/.gsd/DECISIONS.md +++ b/.gsd/DECISIONS.md @@ -11,3 +11,5 @@ | D003 | | requirement | R002 Transcript Ingestion API status | validated | 6 passing integration tests prove the full POST /api/v1/ingest flow: creator auto-detection, SourceVideo upsert, TranscriptSegment bulk insert, raw JSON persistence, idempotent re-upload, and invalid input rejection. | Yes | agent | | D004 | | architecture | Sync vs async approach for Celery worker pipeline tasks | Use sync openai.OpenAI, sync QdrantClient, and sync SQLAlchemy (create_engine with psycopg2) inside Celery tasks. Convert DATABASE_URL from postgresql+asyncpg:// to postgresql:// for the sync engine. | Celery workers run in a synchronous context. Using asyncio.run() inside tasks risks nested event loop errors with gevent/eventlet workers. Using sync clients throughout eliminates this class of bug entirely. The async engine/session from database.py is only used by FastAPI (ASGI); the worker gets its own sync engine. | Yes | agent | | D005 | | architecture | Embedding/Qdrant failure handling strategy in pipeline | Embedding/Qdrant failures (stage 6) log errors but do not fail the pipeline. Processing_status is set by stages 2-5 only. Embeddings can be regenerated by manual re-trigger. | Qdrant is at 10.0.0.10 on the hypervisor network and may not be reachable during all pipeline runs. Making embedding a non-blocking side-effect ensures core pipeline output (KeyMoments, TechniquePages in PostgreSQL) is never lost due to vector store issues. The manual re-trigger endpoint allows regenerating embeddings at any time. | Yes | agent | +| D006 | | requirement | R013 Prompt Template System status | validated | 4 prompt template files in prompts/ directory loaded from configurable settings.prompts_path. Templates use XML-style content fencing. Pipeline stages read templates from disk at runtime, enabling edits without code changes. 
Manual re-trigger endpoint (POST /api/v1/pipeline/trigger/{video_id}) allows re-processing after prompt edits. | Yes | agent | +| D007 | M001/S04 | architecture | Runtime review mode toggle persistence mechanism | Store review mode toggle in Redis key `chrysopedia:review_mode` with async redis client. Fall back to `settings.review_mode` config default when key is absent. | The config.py `review_mode` setting is loaded via lru_cache from environment variables and cannot be mutated at runtime. Redis is already used by the project (Celery broker, stage 4 classification data) so it adds no new infrastructure. A system_settings DB table would work but Redis is simpler for a single boolean toggle on a single-admin tool. The pipeline's stages.py reads settings.review_mode from config — the admin toggle only affects new pipeline runs if stages.py is updated to check Redis too, but that's deferred since the toggle is primarily a UI-level concept for the review queue. | Yes | agent | diff --git a/.gsd/KNOWLEDGE.md b/.gsd/KNOWLEDGE.md index 407cd78..018e6d2 100644 --- a/.gsd/KNOWLEDGE.md +++ b/.gsd/KNOWLEDGE.md @@ -29,3 +29,21 @@ **Context:** When using a session-scoped SQLAlchemy async engine with asyncpg in pytest-asyncio tests, the connection pool reuses connections across fixtures and test functions. This causes `InterfaceError: cannot perform operation: another operation is in progress` because the ASGI test client's session holds a connection while cleanup/verification fixtures try to use the same pool. **Fix:** Use `poolclass=NullPool` when creating the test engine. Each connection is created fresh and immediately closed, eliminating contention. Performance cost is negligible for test suites. + +## Testing Celery tasks that use sync SQLAlchemy: patch module-level globals + +**Context:** Pipeline stages in `pipeline/stages.py` create their own sync SQLAlchemy engine/session via module-level `_engine` and `_SessionLocal` globals (because Celery is sync, not async). 
Tests need to redirect these to the test database, but the engine is created lazily at module scope. + +**Fix:** Patch the module globals directly: `unittest.mock.patch.object(stages, '_engine', test_engine)` and `unittest.mock.patch.object(stages, '_SessionLocal', test_session_factory)`. This redirects all DB access in stage functions to the test database without modifying production code. + +## Lazy imports in FastAPI handlers defeat simple mock patching + +**Context:** When a FastAPI handler imports a function lazily (inside the function body) like `from pipeline.stages import run_pipeline`, patching `routers.ingest.run_pipeline` has no effect because the name is re-bound on every call from the source module. + +**Fix:** Patch at the source module: `unittest.mock.patch('pipeline.stages.run_pipeline')`. The lazy import will pick up the mock from the source module. This applies to any handler that uses lazy imports to avoid circular dependencies at module load time. + +## Stage 4 classification data stored in Redis (not DB columns) + +**Context:** The KeyMoment SQLAlchemy model doesn't have `topic_tags` or `topic_category` columns. Stage 4 classification needs somewhere to store per-moment tag assignments that stage 5 can read. + +**Fix:** Store classification results in Redis under key `chrysopedia:classification:{video_id}` with a 24-hour TTL. Stage 5 reads from Redis. This avoids schema migrations during initial pipeline development. The data is ephemeral — if Redis loses it, re-running stage 4 regenerates it. 
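
The patch-the-globals pattern above can be shown self-contained. In this sketch a `types.ModuleType` stands in for `pipeline/stages.py` and strings stand in for the real SQLAlchemy engine/session factory; the same attribute-lookup-at-call-time reasoning is why lazy imports must be patched at `pipeline.stages` rather than at the importing router.

```python
import types
from unittest import mock

# Stand-in for pipeline/stages.py: a module holding engine/session
# globals (the real ones are SQLAlchemy objects created at module scope).
stages = types.ModuleType("stages")
stages._engine = "prod-engine"
stages._SessionLocal = "prod-sessions"

def run_stage():
    # Stage functions read the module globals at call time, which is
    # exactly what makes patch.object effective here.
    return stages._engine, stages._SessionLocal

# Redirect both globals to test doubles for the duration of a test.
with mock.patch.object(stages, "_engine", "test-engine"), \
     mock.patch.object(stages, "_SessionLocal", "test-sessions"):
    assert run_stage() == ("test-engine", "test-sessions")

# On exit the patches are undone; production globals are restored.
assert run_stage() == ("prod-engine", "prod-sessions")
```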
diff --git a/.gsd/milestones/M001/M001-ROADMAP.md b/.gsd/milestones/M001/M001-ROADMAP.md index 3ebb5ae..dc4f5f6 100644 --- a/.gsd/milestones/M001/M001-ROADMAP.md +++ b/.gsd/milestones/M001/M001-ROADMAP.md @@ -8,6 +8,6 @@ Stand up the complete Chrysopedia stack: Docker Compose deployment on ub01, Post |----|-------|------|---------|------|------------| | S01 | Docker Compose + Database + Whisper Script | low | — | ✅ | docker compose up -d starts all services on ub01; Whisper script transcribes a sample video to JSON | | S02 | Transcript Ingestion API | low | S01 | ✅ | POST a transcript JSON file to the API; Creator and Source Video records appear in PostgreSQL | -| S03 | LLM Extraction Pipeline + Qdrant Integration | high | S02 | ⬜ | A transcript JSON triggers stages 2-5: segmentation → extraction → classification → synthesis. Technique pages with key moments appear in DB. Qdrant has searchable embeddings. | +| S03 | LLM Extraction Pipeline + Qdrant Integration | high | S02 | ✅ | A transcript JSON triggers stages 2-5: segmentation → extraction → classification → synthesis. Technique pages with key moments appear in DB. Qdrant has searchable embeddings. 
| | S04 | Review Queue Admin UI | medium | S03 | ⬜ | Admin views pending key moments, approves/edits/rejects them, toggles between review and auto mode | | S05 | Search-First Web UI | medium | S03 | ⬜ | User searches for a technique, gets semantic results in <500ms, clicks through to a full technique page with study guide prose, key moments, and related links | diff --git a/.gsd/milestones/M001/slices/S03/S03-SUMMARY.md b/.gsd/milestones/M001/slices/S03/S03-SUMMARY.md new file mode 100644 index 0000000..065bb85 --- /dev/null +++ b/.gsd/milestones/M001/slices/S03/S03-SUMMARY.md @@ -0,0 +1,176 @@ +--- +id: S03 +parent: M001 +milestone: M001 +provides: + - 6 Celery tasks: stage2-6 + run_pipeline orchestrator + - LLMClient with primary/fallback for downstream use + - EmbeddingClient for vector generation + - QdrantManager for vector store operations + - POST /api/v1/pipeline/trigger/{video_id} manual re-trigger endpoint + - 8 Pydantic schemas for pipeline stage I/O + - 4 editable prompt templates in prompts/ + - 10 integration tests with mock fixtures +requires: + - slice: S02 + provides: Ingest endpoint, database models (SourceVideo, TranscriptSegment, KeyMoment, TechniquePage, Creator), async SQLAlchemy engine, test infrastructure +affects: + - S04 + - S05 +key_files: + - backend/config.py + - backend/worker.py + - backend/pipeline/__init__.py + - backend/pipeline/schemas.py + - backend/pipeline/llm_client.py + - backend/pipeline/embedding_client.py + - backend/pipeline/qdrant_client.py + - backend/pipeline/stages.py + - backend/routers/pipeline.py + - backend/routers/ingest.py + - backend/main.py + - prompts/stage2_segmentation.txt + - prompts/stage3_extraction.txt + - prompts/stage4_classification.txt + - prompts/stage5_synthesis.txt + - backend/tests/test_pipeline.py + - backend/tests/fixtures/mock_llm_responses.py + - backend/tests/conftest.py + - backend/requirements.txt +key_decisions: + - Sync OpenAI/SQLAlchemy/Qdrant throughout Celery tasks — no async in 
worker context (D004) + - Embedding/Qdrant stage is non-blocking side-effect — failures don't break pipeline (D005) + - Stage 4 classification stored in Redis (24h TTL) due to missing KeyMoment columns + - Pipeline dispatch from ingest is best-effort; manual trigger returns 503 on Celery failure + - LLMClient retries once with JSON nudge on malformed LLM output before failing +patterns_established: + - Celery task pattern: @celery_app.task(bind=True, max_retries=3) with sync SQLAlchemy session per task + - LLM client pattern: primary → fallback → fail, with Pydantic response parsing + - Non-blocking side-effect pattern: max_retries=0, catch-all exception handler, pipeline continues + - Prompt template pattern: plain text files in prompts/ dir, XML-style content fencing, loaded at runtime + - Pipeline test pattern: patch module-level _engine/_SessionLocal globals to redirect stages to test DB +observability_surfaces: + - INFO log at start/end of each stage with video_id and duration + - WARNING on LLM fallback trigger + - ERROR on LLM parse failure with raw response excerpt + - WARNING on embedding/Qdrant failures with error details + - source_videos.processing_status tracks pipeline progress per video + - Celery task registry shows all 6 registered tasks + - POST /api/v1/pipeline/trigger/{video_id} returns current processing_status +drill_down_paths: + - .gsd/milestones/M001/slices/S03/tasks/T01-SUMMARY.md + - .gsd/milestones/M001/slices/S03/tasks/T02-SUMMARY.md + - .gsd/milestones/M001/slices/S03/tasks/T03-SUMMARY.md + - .gsd/milestones/M001/slices/S03/tasks/T04-SUMMARY.md + - .gsd/milestones/M001/slices/S03/tasks/T05-SUMMARY.md +duration: "" +verification_result: passed +completed_at: 2026-03-29T22:59:23.268Z +blocker_discovered: false +--- + +# S03: LLM Extraction Pipeline + Qdrant Integration + +**Built the complete 6-stage LLM extraction pipeline (segmentation → extraction → classification → synthesis → embedding) with Celery workers, sync SQLAlchemy, 
primary/fallback LLM endpoints, Qdrant vector indexing, configurable prompt templates, auto-dispatch from ingest, manual re-trigger API, and 10 integration tests — all 16 tests pass.** + +## What This Slice Delivered + +S03 implemented the core intelligence of Chrysopedia: the background worker pipeline that transforms raw transcripts into structured knowledge (technique pages, key moments, topic tags, embeddings). + +### T01 — Infrastructure Foundation +Extended Settings with 12 config fields (LLM primary/fallback endpoints, embedding config, Qdrant connection, prompt path, review mode). Created the Celery app in `worker.py` using Redis as broker. Built `LLMClient` with sync `openai.OpenAI` and primary→fallback logic that catches `APIConnectionError` and `APITimeoutError`. Defined 8 Pydantic schemas (`TopicSegment`, `SegmentationResult`, `ExtractedMoment`, `ExtractionResult`, `ClassifiedMoment`, `ClassificationResult`, `SynthesizedPage`, `SynthesisResult`) matching the pipeline stage inputs/outputs. + +### T02 — Pipeline Stages + Prompt Templates +Created 4 prompt template files in `prompts/` with XML-style content fencing. Implemented 5 Celery tasks in `pipeline/stages.py`: +- **stage2_segmentation**: Groups transcript segments into topic boundaries, updates `topic_label` on TranscriptSegment rows +- **stage3_extraction**: Extracts key moments from topic groups, creates KeyMoment rows, sets `processing_status=extracted` +- **stage4_classification**: Classifies moments against `canonical_tags.yaml`, stores results in Redis (24h TTL) since KeyMoment lacks tag columns +- **stage5_synthesis**: Synthesizes TechniquePage rows from grouped moments, links KeyMoments, sets `processing_status=reviewed` (or `published` if `review_mode=False`) +- **run_pipeline**: Orchestrator that checks `processing_status` and chains only the remaining stages for resumability + +All tasks use sync SQLAlchemy sessions (psycopg2) with `bind=True, max_retries=3`. 
`_safe_parse_llm_response` retries once with a JSON nudge on malformed output. + +### T03 — Embedding & Qdrant Integration +Created `EmbeddingClient` (sync `openai.OpenAI` for `/v1/embeddings`) that returns empty list on errors. Created `QdrantManager` with idempotent `ensure_collection()` (cosine distance, config-driven dimensions) and `upsert_technique_pages()`/`upsert_key_moments()` with full metadata payloads. Added `stage6_embed_and_index` as a non-blocking side-effect (`max_retries=0`, catches all exceptions) appended to the pipeline chain. + +### T04 — API Wiring +Wired `run_pipeline.delay()` dispatch after ingest commit (best-effort — failures don't break ingest response). Added `POST /api/v1/pipeline/trigger/{video_id}` for manual re-processing (returns 404 for missing video, 503 on Celery failure). Mounted pipeline router in `main.py`. + +### T05 — Integration Tests +Created 10 integration tests with mocked LLM/Qdrant and real PostgreSQL: stages 2-6 produce correct DB records, pipeline resumes from `extracted` status, trigger endpoint returns 200/404, ingest dispatches pipeline, LLM falls back on primary failure. All 16 tests (6 ingest + 10 pipeline) pass. + +### Key Deviation +Stage 4 stores classification data in Redis rather than DB columns because `KeyMoment` model lacks `topic_tags`/`topic_category` columns. This is an intentional simplification — stage 5 reads from Redis during synthesis. + +## Verification + +All slice verification checks pass: + +**T01 verification (5/5):** Settings prints correct defaults, all 8 schema classes import, LLMClient imports clean, celery_app.main prints 'chrysopedia', openai/qdrant-client in requirements.txt. + +**T02 verification (3/3):** 4 prompt files exist, all 5 stage functions import, worker shows 6 registered tasks. + +**T03 verification (3/3):** EmbeddingClient, QdrantManager, stage6_embed_and_index all import successfully. 
+ +**T04 verification (3/3):** Pipeline router has /trigger/{video_id} route, pipeline in main.py, run_pipeline in ingest.py. + +**T05 verification (2/2):** `cd backend && python -m pytest tests/test_pipeline.py -v` — 10/10 pass. `cd backend && python -m pytest tests/ -v` — 16/16 pass (122s). + +All 16 registered tests pass. 6 Celery tasks registered in worker. + +## Requirements Advanced + +- R003 — Full pipeline stages 2-6 implemented and tested: segmentation, extraction, classification, synthesis, embedding. 10 integration tests verify all stages with mocked LLM and real PostgreSQL. +- R009 — Write path implemented: EmbeddingClient generates vectors, QdrantManager upserts with metadata payloads. Read path (search query) deferred to S05. +- R011 — Stage 4 loads canonical_tags.yaml for classification. Tag taxonomy is config-driven. +- R012 — run_pipeline orchestrator resumes from last completed stage. Auto-dispatch from ingest handles new videos. Manual trigger supports re-processing. +- R013 — 4 prompt template files loaded from configurable prompts_path. Manual trigger enables re-processing after prompt edits. + +## Requirements Validated + +- R003 — 10 integration tests prove full pipeline: stage2 updates topic_labels, stage3 creates KeyMoments, stage4 classifies tags, stage5 creates TechniquePages, stage6 embeds to Qdrant. Resumability and LLM fallback tested. +- R013 — 4 prompt files in prompts/, loaded from configurable path, POST /api/v1/pipeline/trigger/{video_id} enables re-processing. + +## New Requirements Surfaced + +None. + +## Requirements Invalidated or Re-scoped + +None. + +## Deviations + +Stage 4 classification data stored in Redis (not DB columns) because KeyMoment model lacks topic_tags/topic_category columns. Added psycopg2-binary to requirements.txt for sync SQLAlchemy. Created pipeline/stages.py stub in T01 so worker.py import chain succeeds ahead of T02. 
Pipeline router uses lazy import of run_pipeline inside handler to avoid circular imports. + +## Known Limitations + +Stage 4 classification stored in Redis with 24h TTL — if Redis is flushed between stage 4 and stage 5, classification data is lost. QdrantManager uses random UUIDs for point IDs — re-indexing creates duplicates rather than updating existing points. KeyMoment model needs topic_tags/topic_category columns for a permanent solution. + +## Follow-ups + +Add topic_tags and topic_category columns to KeyMoment model to eliminate Redis dependency for classification. Add deterministic point IDs to QdrantManager based on content hash for idempotent re-indexing. Consider adding a /api/v1/pipeline/status/{video_id} endpoint for monitoring pipeline progress. + +## Files Created/Modified + +- `backend/config.py` — Extended Settings with 12 LLM/embedding/Qdrant/prompt config fields +- `backend/requirements.txt` — Added openai, qdrant-client, pyyaml, psycopg2-binary +- `backend/worker.py` — Created Celery app with Redis broker, imports pipeline.stages +- `backend/pipeline/__init__.py` — Created empty package init +- `backend/pipeline/schemas.py` — 8 Pydantic models for pipeline stage I/O +- `backend/pipeline/llm_client.py` — Sync LLMClient with primary/fallback logic +- `backend/pipeline/embedding_client.py` — Sync EmbeddingClient for /v1/embeddings +- `backend/pipeline/qdrant_client.py` — QdrantManager with idempotent collection mgmt and metadata upserts +- `backend/pipeline/stages.py` — 6 Celery tasks: stages 2-6 + run_pipeline orchestrator +- `backend/routers/pipeline.py` — POST /trigger/{video_id} manual re-trigger endpoint +- `backend/routers/ingest.py` — Added run_pipeline.delay() dispatch after ingest commit +- `backend/main.py` — Mounted pipeline router under /api/v1 +- `prompts/stage2_segmentation.txt` — LLM prompt for topic boundary detection +- `prompts/stage3_extraction.txt` — LLM prompt for key moment extraction +- `prompts/stage4_classification.txt` — 
LLM prompt for canonical tag classification +- `prompts/stage5_synthesis.txt` — LLM prompt for technique page synthesis +- `backend/tests/test_pipeline.py` — 10 integration tests covering all pipeline stages +- `backend/tests/fixtures/mock_llm_responses.py` — Mock LLM response fixtures for all stages +- `backend/tests/conftest.py` — Added sync engine/session fixtures and pre_ingested_video fixture diff --git a/.gsd/milestones/M001/slices/S03/S03-UAT.md b/.gsd/milestones/M001/slices/S03/S03-UAT.md new file mode 100644 index 0000000..bde3835 --- /dev/null +++ b/.gsd/milestones/M001/slices/S03/S03-UAT.md @@ -0,0 +1,138 @@ +# S03: LLM Extraction Pipeline + Qdrant Integration — UAT + +**Milestone:** M001 +**Written:** 2026-03-29T22:59:23.268Z + +## UAT: LLM Extraction Pipeline + Qdrant Integration + +### Preconditions +- PostgreSQL running with chrysopedia database and schema applied +- Redis running (for Celery broker and classification cache) +- Python venv activated with all requirements installed +- Working directory: `backend/` + +--- + +### Test 1: Pipeline Infrastructure Imports +**Steps:** +1. Run `python -c "from config import Settings; s = Settings(); print(s.llm_api_url, s.llm_fallback_url, s.embedding_model, s.qdrant_url, s.qdrant_collection, s.review_mode)"` +2. Run `python -c "from pipeline.schemas import SegmentationResult, ExtractionResult, ClassificationResult, SynthesisResult, TopicSegment, ExtractedMoment, ClassifiedMoment, SynthesizedPage; print('all 8 schemas ok')"` +3. Run `python -c "from pipeline.llm_client import LLMClient; from pipeline.embedding_client import EmbeddingClient; from pipeline.qdrant_client import QdrantManager; print('all clients ok')"` +4. Run `python -c "from worker import celery_app; tasks = [t for t in celery_app.tasks if 'stage' in t or 'pipeline' in t]; assert len(tasks) == 6; print(sorted(tasks))"` + +**Expected:** All commands exit 0. Settings shows correct defaults. 8 schemas import. 3 clients import. 6 tasks registered. 
+ +--- + +### Test 2: Stage 2 — Segmentation Updates Topic Labels +**Steps:** +1. Run `python -m pytest tests/test_pipeline.py::test_stage2_segmentation_updates_topic_labels -v` + +**Expected:** Test passes. TranscriptSegment rows have topic_label set from mocked LLM segmentation output. + +--- + +### Test 3: Stage 3 — Extraction Creates Key Moments +**Steps:** +1. Run `python -m pytest tests/test_pipeline.py::test_stage3_extraction_creates_key_moments -v` + +**Expected:** Test passes. KeyMoment rows created with title, summary, start_time, end_time, content_type. SourceVideo processing_status = 'extracted'. + +--- + +### Test 4: Stage 4 — Classification Assigns Tags +**Steps:** +1. Run `python -m pytest tests/test_pipeline.py::test_stage4_classification_assigns_tags -v` + +**Expected:** Test passes. Classification data stored in Redis matching canonical tag categories from canonical_tags.yaml. + +--- + +### Test 5: Stage 5 — Synthesis Creates Technique Pages +**Steps:** +1. Run `python -m pytest tests/test_pipeline.py::test_stage5_synthesis_creates_technique_pages -v` + +**Expected:** Test passes. TechniquePage rows created with body_sections, signal_chains, summary, topic_tags. KeyMoments linked via technique_page_id. Processing status updated to reviewed/published. + +--- + +### Test 6: Stage 6 — Embedding and Qdrant Upsert +**Steps:** +1. Run `python -m pytest tests/test_pipeline.py::test_stage6_embeds_and_upserts_to_qdrant -v` + +**Expected:** Test passes. EmbeddingClient.embed called with technique page and key moment text. QdrantManager.upsert called with metadata payloads. + +--- + +### Test 7: Pipeline Resumability +**Steps:** +1. Run `python -m pytest tests/test_pipeline.py::test_run_pipeline_resumes_from_extracted -v` + +**Expected:** Test passes. When video has processing_status='extracted', only stages 4+5+6 execute (not 2+3). + +--- + +### Test 8: Manual Pipeline Trigger API +**Steps:** +1. 
Run `python -m pytest tests/test_pipeline.py::test_pipeline_trigger_endpoint -v` +2. Run `python -m pytest tests/test_pipeline.py::test_pipeline_trigger_404_for_missing_video -v` + +**Expected:** Both pass. POST /api/v1/pipeline/trigger/{video_id} returns 200 with status for existing video, 404 for missing video. + +--- + +### Test 9: Ingest Auto-Dispatches Pipeline +**Steps:** +1. Run `python -m pytest tests/test_pipeline.py::test_ingest_dispatches_pipeline -v` + +**Expected:** Test passes. After ingest commit, run_pipeline.delay() is called with the video_id. + +--- + +### Test 10: LLM Fallback on Primary Failure +**Steps:** +1. Run `python -m pytest tests/test_pipeline.py::test_llm_fallback_on_primary_failure -v` + +**Expected:** Test passes. When primary LLM endpoint raises APIConnectionError, fallback endpoint is used successfully. + +--- + +### Test 11: Full Test Suite Regression +**Steps:** +1. Run `python -m pytest tests/ -v` + +**Expected:** All 16 tests pass (6 ingest + 10 pipeline). No regressions from S02 ingest tests. + +--- + +### Test 12: Prompt Template Files Exist and Are Non-Empty +**Steps:** +1. Run `test -s ../prompts/stage2_segmentation.txt && test -s ../prompts/stage3_extraction.txt && test -s ../prompts/stage4_classification.txt && test -s ../prompts/stage5_synthesis.txt && echo "all prompts non-empty"` +2. Run `grep -l '<' ../prompts/*.txt | wc -l` (verify XML-style fencing) + +**Expected:** All 4 files exist and are non-empty. XML-style tags present in prompt files. + +--- + +### Edge Cases + +**EC1: Ingest succeeds even if Celery/Redis is down** +The ingest endpoint wraps run_pipeline.delay() in try/except. If Celery dispatch fails, ingest still returns 200 and logs a WARNING. Verified by test_ingest_dispatches_pipeline mock setup. + +**EC2: stage6 embedding failure doesn't break pipeline** +stage6_embed_and_index catches all exceptions with max_retries=0. 
If Qdrant or embedding API is unreachable, pipeline completes with stages 2-5 results intact. Verified by test_stage6 mock setup. + +**EC3: LLM returns malformed JSON** +_safe_parse_llm_response retries once with a JSON nudge prompt. On second failure, logs ERROR with raw response excerpt and raises. + +--- + +### Operational Readiness (Q8) + +**Health signal:** `source_videos.processing_status` tracks per-video pipeline progress. Celery task registry shows 6 tasks via `python -c "from worker import celery_app; print([t for t in celery_app.tasks if 'stage' in t or 'pipeline' in t])"`. + +**Failure signal:** Video stuck at a processing_status other than `reviewed`/`published` for >10 minutes indicates a pipeline failure. Check Celery worker logs for ERROR entries with stage name and video_id. + +**Recovery procedure:** POST `/api/v1/pipeline/trigger/{video_id}` to re-trigger the pipeline from the last completed stage. For embedding-only issues, the pipeline can be re-run — stage6 is idempotent (creates new Qdrant points, doesn't remove old ones). + +**Monitoring gaps:** No pipeline duration metrics exposed yet. No dead letter queue for permanently failed tasks. No Qdrant point count monitoring. Stage 4 Redis TTL (24h) could expire before stage 5 runs if pipeline is paused. 
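
A minimal sketch of EC2's non-blocking behaviour, with the Celery decorator, real `EmbeddingClient`/`QdrantManager`, and logging wiring omitted; `embed_and_upsert` is a hypothetical stand-in that simulates an unreachable Qdrant:

```python
import logging

log = logging.getLogger("pipeline")

def embed_and_upsert(video_id):
    # Hypothetical stand-in for the embedding + Qdrant upsert calls.
    raise ConnectionError("Qdrant unreachable at 10.0.0.10")

def stage6_embed_and_index(video_id):
    """Non-blocking side-effect: in Celery this runs with max_retries=0.

    Any failure is logged as a WARNING and swallowed, so the stage 2-5
    results in PostgreSQL are never lost to vector-store issues.
    """
    try:
        embed_and_upsert(video_id)
    except Exception as exc:  # catch-all by design (see EC2)
        log.warning("stage6 failed for video %s: %s", video_id, exc)
        return {"video_id": video_id, "indexed": False}
    return {"video_id": video_id, "indexed": True}

# The pipeline "completes" even though Qdrant is down:
result = stage6_embed_and_index("vid-123")
assert result == {"video_id": "vid-123", "indexed": False}
```

Embeddings can then be regenerated later via the manual trigger endpoint, per D005.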
diff --git a/.gsd/milestones/M001/slices/S03/tasks/T05-VERIFY.json b/.gsd/milestones/M001/slices/S03/tasks/T05-VERIFY.json new file mode 100644 index 0000000..8596b0f --- /dev/null +++ b/.gsd/milestones/M001/slices/S03/tasks/T05-VERIFY.json @@ -0,0 +1,30 @@ +{ + "schemaVersion": 1, + "taskId": "T05", + "unitId": "M001/S03/T05", + "timestamp": 1774824686266, + "passed": false, + "discoverySource": "task-plan", + "checks": [ + { + "command": "cd backend", + "exitCode": 0, + "durationMs": 4, + "verdict": "pass" + }, + { + "command": "python -m pytest tests/test_pipeline.py -v", + "exitCode": 4, + "durationMs": 240, + "verdict": "fail" + }, + { + "command": "python -m pytest tests/ -v", + "exitCode": 5, + "durationMs": 238, + "verdict": "fail" + } + ], + "retryAttempt": 1, + "maxRetries": 2 +} diff --git a/.gsd/milestones/M001/slices/S04/S04-PLAN.md b/.gsd/milestones/M001/slices/S04/S04-PLAN.md index 09da954..d7dbf3d 100644 --- a/.gsd/milestones/M001/slices/S04/S04-PLAN.md +++ b/.gsd/milestones/M001/slices/S04/S04-PLAN.md @@ -1,6 +1,200 @@ # S04: Review Queue Admin UI -**Goal:** Functional review workflow for calibrating extraction quality +**Goal:** Admin can review, edit, approve, reject, split, and merge extracted key moments via a web UI. Mode toggle switches between review mode (moments queued for human review) and auto mode (moments publish directly). 
**Demo:** After this: Admin views pending key moments, approves/edits/rejects them, toggles between review and auto mode ## Tasks +- [x] **T01: Built 9 review queue API endpoints (queue, stats, approve, reject, edit, split, merge, get/set mode) with Redis mode toggle, error handling, and 24 integration tests — all passing alongside existing suite** — Create the complete review queue backend: new Pydantic schemas for review actions, a review router with 9 endpoints (list queue, stats, approve, reject, edit, split, merge, get mode, set mode), Redis-backed runtime mode toggle, mount in main.py, and comprehensive integration tests. Follow existing async SQLAlchemy patterns from routers/creators.py. + +## Steps + +1. Add review-specific Pydantic schemas to `backend/schemas.py`: `ReviewQueueItem` (KeyMomentRead + video title + creator name), `ReviewQueueResponse` (paginated), `ReviewStatsResponse` (counts per status), `MomentEditRequest` (editable fields: title, summary, start_time, end_time, content_type, plugins), `MomentSplitRequest` (split_time: float), `ReviewModeResponse` and `ReviewModeUpdate` (mode: bool). + +2. Create `backend/routers/review.py` with these async endpoints: + - `GET /review/queue` — List key moments filtered by `status` query param (pending/approved/edited/rejected/all), paginated with `offset`/`limit`, joined with SourceVideo.filename and Creator.name. Default filter: pending. Order by created_at desc. + - `GET /review/stats` — Return counts grouped by review_status (pending, approved, edited, rejected) using SQL count + group by. + - `POST /review/moments/{moment_id}/approve` — Set review_status=approved, return updated moment. 404 if not found. + - `POST /review/moments/{moment_id}/reject` — Set review_status=rejected, return updated moment. 404 if not found. + - `PUT /review/moments/{moment_id}` — Update editable fields from MomentEditRequest, set review_status=edited, return updated moment. 404 if not found. 
+ - `POST /review/moments/{moment_id}/split` — Split moment at `split_time` into two moments. Validate split_time is between start_time and end_time. Original keeps [start_time, split_time), new gets [split_time, end_time]. Both keep same source_video_id and technique_page_id. Return both moments. 400 on invalid split_time. + - `POST /review/moments/{moment_id}/merge` — Accept `target_moment_id` in body. Merge two moments: combined summary, min(start_time), max(end_time), delete target, return merged result. Both must belong to same source_video. 400 if different videos. 404 if either not found. + - `GET /review/mode` — Read current mode from Redis key `chrysopedia:review_mode`. If not in Redis, fall back to `settings.review_mode` default. + - `PUT /review/mode` — Set mode in Redis key `chrysopedia:review_mode`. Return new mode. + +3. Add Redis client helper. Create a small `backend/redis_client.py` module with `get_redis()` async function using `redis.asyncio.Redis.from_url(settings.redis_url)`. Import in review router. + +4. Mount the review router in `backend/main.py`: `app.include_router(review.router, prefix="/api/v1")`. + +5. Add `redis` (async redis client) to `backend/requirements.txt` if not already present. + +6. 
Create `backend/tests/test_review.py` with integration tests using the established conftest patterns (async client, real PostgreSQL): + - Test list queue returns empty when no moments exist + - Test list queue returns moments with video/creator info after seeding + - Test filter by status works (seed moments with different statuses) + - Test stats endpoint returns correct counts + - Test approve sets review_status=approved + - Test reject sets review_status=rejected + - Test edit updates fields and sets review_status=edited + - Test split creates two moments with correct timestamps + - Test split returns 400 for invalid split_time (outside range) + - Test merge combines two moments correctly + - Test merge returns 400 for moments from different videos + - Test approve/reject/edit return 404 for nonexistent moment + - Test mode get/set (mock Redis) + +## Must-Haves + +- [ ] All 9 review endpoints return correct HTTP status codes and response bodies +- [ ] Split validates split_time is strictly between start_time and end_time +- [ ] Merge validates both moments belong to same source_video +- [ ] Mode toggle reads/writes Redis, falls back to config default +- [ ] All review tests pass alongside existing test suite +- [ ] Review router mounted in main.py + +## Failure Modes + +| Dependency | On error | On timeout | On malformed response | +|------------|----------|-----------|----------------------| +| PostgreSQL | SQLAlchemy raises, FastAPI returns 500 | Connection timeout → 500 | N/A (ORM handles) | +| Redis (mode toggle) | Return 503 with error detail | Timeout → fall back to config default | N/A (simple get/set) | + +## Negative Tests + +- **Malformed inputs**: split_time outside moment range → 400, merge moments from different videos → 400, edit with empty title → validation error +- **Error paths**: approve/reject/edit/split nonexistent moment → 404, merge with nonexistent target → 404 +- **Boundary conditions**: split at exact start_time or end_time → 400, merge 
moment with itself → 400, empty queue → empty list + +## Verification + +- `cd backend && python -m pytest tests/test_review.py -v` — all tests pass +- `cd backend && python -m pytest tests/ -v` — no regressions (all existing tests still pass) +- `python -c "from routers.review import router; print(len(router.routes))"` — prints 9 (routes registered) + +## Observability Impact + +- Signals added: INFO log on each review action (approve/reject/edit/split/merge) with moment_id +- How a future agent inspects: `GET /api/v1/review/stats` shows pending/approved/edited/rejected counts +- Failure state exposed: 404 responses include moment_id that was not found, 400 responses include validation details + - Estimate: 2h + - Files: backend/schemas.py, backend/routers/review.py, backend/redis_client.py, backend/main.py, backend/requirements.txt, backend/tests/test_review.py + - Verify: cd backend && python -m pytest tests/test_review.py -v && python -m pytest tests/ -v +- [ ] **T02: Bootstrap React + Vite + TypeScript frontend with API client** — Replace the placeholder frontend with a real React + Vite + TypeScript application. Install dependencies, configure Vite with API proxy for development, create the app shell with React Router, and build a typed API client module for the review endpoints. Verify `npm run build` produces `dist/index.html` compatible with the existing Docker build pipeline. + +## Steps + +1. Initialize the React app in `frontend/`. Replace `package.json` with proper dependencies: + - `react`, `react-dom`, `react-router-dom` for the app + - `typescript`, `@types/react`, `@types/react-dom` for types + - `vite`, `@vitejs/plugin-react` for build tooling + - Scripts: `dev` → `vite`, `build` → `tsc -b && vite build`, `preview` → `vite preview` + +2. Create `frontend/vite.config.ts` with React plugin and dev server proxy (`/api` → `http://localhost:8001`) so the frontend dev server can reach the backend during development. + +3. 
Create `frontend/tsconfig.json` and `frontend/tsconfig.app.json` with strict TypeScript config targeting ES2020+ and JSX. + +4. Create `frontend/index.html` — Vite entry point with `
` and `