chore: remove .gsd/ from tracking, add to .gitignore

fix: crypto.randomUUID fallback for HTTP contexts
Chat page and ChatWidget used crypto.randomUUID() for conversation IDs, which is only available in secure contexts (HTTPS). On HTTP, this throws 'crypto.randomUUID is not a function'. Added generateUUID() utility with Math.random-based fallback.
2026-04-13 23:50:05 -05:00 · 2026-04-05 06:05:17 +00:00 · 2026-04-05 06:01:54 +00:00 · 2026-04-04 15:14:05 +00:00 · 2026-04-04 14:50:44 +00:00 · 2026-04-04 14:45:09 +00:00
400 changed files with 83479 additions and 0 deletions
--- a/.artifacts/feature-synthesis-chunking.md
+++ b/.artifacts/feature-synthesis-chunking.md
@ -0,0 +1,111 @@
 # Feature: Stage 5 Synthesis Chunking for Large Category Groups
 ## Problem
 Stage 5 synthesis sends all key moments for a given `(video, topic_category)` group to the LLM in a single call. When a video produces a large number of moments in one category, the prompt exceeds what the model can process into a valid structured response.
 **Concrete failure:** COPYCATT's "Sound Design - Everything In 2 Hours Speedrun" (2,026 transcript segments) produced 198 moments classified as "Sound design" (175) / "Sound Design" (23 — casing inconsistency). The synthesis prompt for that category was ~42k tokens. The model (`fyn-llm-agent-think`, 128k context) accepted the prompt but returned only 5,407 completion tokens with `finish=stop` — valid JSON that was structurally incomplete, failing Pydantic `SynthesisResult` validation. The pipeline retried and failed identically each time.
 The other 37 videos in the corpus (up to 930 segments, ~60 moments per category max) all synthesized successfully.
 ## Root Causes
 Two independent issues compound into this failure:
 ### 1. No chunking in stage 5 synthesis
 `stage5_synthesis()` in `backend/pipeline/stages.py` iterates over `groups[category]` and builds one prompt containing ALL moments for that category. There's no upper bound on how many moments go into a single LLM call.
 **Location:** `stages.py` lines ~850-875 — the `for category, moment_group in groups.items()` loop builds the full `moments_text` without splitting.
 ### 2. Inconsistent category casing from stage 4
 Stage 4 classification produces `"Sound design"` and `"Sound Design"` as separate categories for the same video. Stage 5 groups by exact string match, so these stay separate — but even independently, 175 moments in one group is too many. The casing issue does inflate the problem by preventing natural splitting across categories.
 **Location:** Classification output stored in Redis at `chrysopedia:classification:{video_id}`. The `topic_category` values come directly from the LLM with no normalization.
 ## Proposed Changes
 ### Change 1: Chunked synthesis with merge pass
 Split large category groups into chunks before sending to the LLM. Each chunk produces technique pages independently, then a lightweight merge step combines pages with overlapping topics.
 **In `stage5_synthesis()` (`backend/pipeline/stages.py`):**
 1. After grouping moments by category, check each group's size against a configurable threshold (e.g., `SYNTHESIS_CHUNK_SIZE = 30` moments).
 2. Groups at or below the threshold: process as today — single LLM call.
 3. Groups above the threshold: split into chunks of `SYNTHESIS_CHUNK_SIZE` moments, ordered by `start_time` (preserving chronological context). Each chunk gets its own synthesis LLM call, producing its own `SynthesisResult` with 1+ pages.
 4. After all chunks for a category are processed, collect the resulting pages. Pages with the same or very similar slugs (e.g., Levenshtein distance < 3, or shared slug prefix before the creator suffix) should be merged. The merge is a second LLM call with a simpler prompt: "Here are N partial technique pages on the same topic from the same creator. Merge them into a single cohesive page, combining body sections, deduplicating signal chains and plugins, and writing a unified summary." This merge prompt is much smaller than the original 198-moment prompt because it takes synthesized prose as input, not raw moment data.
 5. If no pages share slugs across chunks, keep them all — they represent genuinely distinct sub-topics the LLM identified within the category.
 **New config setting in `backend/config.py`:**
 ```python
 synthesis_chunk_size: int = 30  # Max moments per synthesis LLM call
 ```
 **New prompt file:** `prompts/stage5_merge.txt` — instructions for combining partial technique pages into a unified page. Much simpler than the full synthesis prompt since it operates on already-synthesized prose rather than raw moments.
 **Token budget consideration:** 30 moments × ~200 tokens each (title + summary + metadata + transcript excerpt) = ~6k tokens of moment data + ~2k system prompt = ~8k input tokens. Well within what the model handles reliably. The merge call takes 2-4 partial pages of prose (~3-5k tokens total) — also very manageable.
 ### Change 2: Category casing normalization in stage 4
 Normalize `topic_category` values before storing classification results in Redis.
 **In `stage4_classification()` (`backend/pipeline/stages.py`):**
 After parsing the `ClassificationResult` from the LLM, apply title-case normalization to each moment's `topic_category`:
 ```python
 category = cls_result.topic_category.strip().title()
 # "Sound design" -> "Sound Design"
 # "sound design" -> "Sound Design"  
 # "SOUND DESIGN" -> "Sound Design"
 ```
 This is a one-line fix. It prevents the "Sound design" / "Sound Design" split that inflated the group sizes and would reduce the COPYCATT video from 198 → 198 moments in a single normalized "Sound Design" group — still too many without chunking, but it eliminates the class of bug where moments scatter across near-duplicate categories.
 **Also apply in stage 5 as a safety net:** When building the `groups` dict, normalize the category key:
 ```python
 category = cls_info.get("topic_category", "Uncategorized").strip().title()
 ```
 This handles data already in Redis from prior stage 4 runs without requiring reprocessing.
 ### Change 3: Estimated token pre-check before LLM call
 Before making the synthesis LLM call, estimate the total tokens (prompt + expected output) and log a warning if it exceeds a safety threshold. This doesn't block the call — chunking handles the splitting — but it provides observability for tuning `SYNTHESIS_CHUNK_SIZE`.
 **In the synthesis loop, after building `user_prompt`:**
 ```python
 estimated_input = estimate_tokens(system_prompt) + estimate_tokens(user_prompt)
 if estimated_input > 15000:
    logger.warning(
        "Stage 5: Large synthesis input for category '%s' video_id=%s: "
        "~%d input tokens, %d moments. Consider reducing SYNTHESIS_CHUNK_SIZE.",
        category, video_id, estimated_input, len(moment_group),
    )
 ```
 ## Files to Modify
 | File | Change |
 |------|--------|
 | `backend/pipeline/stages.py` | Chunk logic in `stage5_synthesis()`, casing normalization in `stage4_classification()` and `stage5_synthesis()` grouping |
 | `backend/pipeline/llm_client.py` | No changes needed — `estimate_max_tokens()` already handles per-call estimation |
 | `backend/config.py` | Add `synthesis_chunk_size: int = 30` setting |
 | `prompts/stage5_merge.txt` | New prompt for merging partial technique pages |
 | `backend/schemas.py` | No changes — `SynthesisResult` schema works for both chunk and merge calls |
 ## Testing
 1. **Unit test:** Mock the LLM and verify that a 90-moment group gets split into 3 chunks of 30, each producing a `SynthesisResult`, followed by a merge call.
 2. **Integration test:** Retrigger the COPYCATT "Sound Design - Everything In 2 Hours Speedrun" video and confirm it completes stage 5 without `LLMTruncationError`.
 3. **Regression test:** Retrigger a small video (e.g., Skope "Understanding Waveshapers", 9 moments) and confirm behavior is unchanged — no chunking triggered, same output.
 ## Rollback
 `SYNTHESIS_CHUNK_SIZE` can be set very high (e.g., 9999) to effectively disable chunking without a code change. The casing normalization is backward-compatible — it only affects new pipeline runs.
--- a/.env.example
+++ b/.env.example
@ -0,0 +1,53 @@
 # ─── Chrysopedia Environment Variables ───
 # Copy to .env and fill in secrets before docker compose up
 # PostgreSQL
 POSTGRES_USER=chrysopedia
 POSTGRES_PASSWORD=changeme
 POSTGRES_DB=chrysopedia
 # Redis (Celery broker) — container-internal, no secret needed
 REDIS_URL=redis://chrysopedia-redis:6379/0
 # LLM endpoint (OpenAI-compatible — OpenWebUI on FYN DGX)
 # Use /api (not /api/v1) so calls route through OpenWebUI's tracked proxy for analytics
 LLM_API_URL=https://chat.forgetyour.name/api
 LLM_API_KEY=sk-changeme
 LLM_MODEL=fyn-llm-agent-chat
 LLM_FALLBACK_URL=https://chat.forgetyour.name/api
 LLM_FALLBACK_MODEL=fyn-llm-agent-chat
 # Per-stage LLM model overrides (optional — defaults to LLM_MODEL)
 # Modality: "chat" = standard JSON mode, "thinking" = reasoning model (strips <think> tags)
 # Stages 2 (segmentation) and 4 (classification) are mechanical — use fast chat model
 # Stages 3 (extraction) and 5 (synthesis) need reasoning — use thinking model
 LLM_STAGE2_MODEL=fyn-llm-agent-chat
 LLM_STAGE2_MODALITY=chat
 LLM_STAGE3_MODEL=fyn-llm-agent-think
 LLM_STAGE3_MODALITY=thinking
 LLM_STAGE4_MODEL=fyn-llm-agent-chat
 LLM_STAGE4_MODALITY=chat
 LLM_STAGE5_MODEL=fyn-llm-agent-think
 LLM_STAGE5_MODALITY=thinking
 # Max tokens for LLM responses (OpenWebUI defaults to 1000 — pipeline needs much more)
 LLM_MAX_TOKENS=65536
 # Embedding endpoint (Ollama container in the compose stack)
 EMBEDDING_API_URL=http://chrysopedia-ollama:11434/v1
 EMBEDDING_MODEL=nomic-embed-text
 # Qdrant (container-internal)
 QDRANT_URL=http://chrysopedia-qdrant:6333
 QDRANT_COLLECTION=chrysopedia
 # Application
 APP_ENV=production
 APP_LOG_LEVEL=info
 # File storage paths (inside container, bind-mounted to /vmPool/r/services/chrysopedia_data)
 TRANSCRIPT_STORAGE_PATH=/data/transcripts
 VIDEO_METADATA_PATH=/data/video_meta
 # Review mode toggle (true = moments require admin review before publishing)
 REVIEW_MODE=true
--- a/.gitignore
+++ b/.gitignore
@ -1,2 +1,29 @@
 .bg-shell/
 .gsd/
 # ── GSD baseline (auto-generated) ──
 .DS_Store
 Thumbs.db
 *.swp
 *.swo
 *~
 .idea/
 .vscode/
 *.code-workspace
 .env
 .env.*
 !.env.example
 node_modules/
 .next/
 dist/
 build/
 __pycache__/
 *.pyc
 .venv/
 venv/
 target/
 vendor/
 *.log
 coverage/
 .cache/
 tmp/
--- a/.mcp.json
+++ b/.mcp.json
@ -0,0 +1,7 @@
 {
  "mcpServers": {
    "chrysopedia": {
      "url": "http://ub01:8101/mcp"
    }
  }
 }
--- a/.planning/M016-ux-brand-reading-experience.md
+++ b/.planning/M016-ux-brand-reading-experience.md
@ -0,0 +1,143 @@
 # M016: UX Polish, Brand & Reading Experience
 > **Stream:** Frontend — intended for a dedicated GSD milestone instance
 > **Conflict zone:** `frontend/src/` only — no backend Python changes
 > **Deploy cadence:** commit-build-redeploy after each slice completion
 ---
 ## Goal
 Modernize the public site's visual identity and reading experience, fix pipeline admin UI bugs, and establish a brand baseline (logo, favicon, OG tags). Every change in this milestone lives entirely in the frontend — CSS, React components, and static assets.
 ---
 ## Slice Breakdown
 ### S01: Landing Page Visual Fixes (Quick Wins)
 **Risk:** Low | **Effort:** Small | **Files:** `App.css`, `Home.tsx`
 Research found 5 concrete bugs/inconsistencies on the homepage:
 | # | Issue | Fix |
 |---|-------|-----|
 | 1 | Duplicate `.btn` rule at App.css:3185 overrides CTA sizing (renders 131x38 instead of ~195x48) | Remove or merge the duplicate `.btn` block |
 | 2 | `.home-featured` uses `border-image` which kills `border-radius` — card renders square | Replace with pseudo-element gradient border technique |
 | 3 | Three different `max-width` tracks (36rem, 42rem, none) create jagged center column | Unify to 42rem for all content sections |
 | 4 | Vertical spacing irregular — Random→Featured gap is only 8px vs 24px elsewhere | Normalize section margins to 1.5rem |
 | 5 | Two `border-radius` values (0.5rem vs 0.625rem) on home cards | Unify to 0.625rem |
 **Verification:** Visual screenshot comparison before/after on desktop (1280px) and mobile (375px).
 ---
 ### S02: Pipeline Admin UI Fixes
 **Risk:** Low | **Effort:** Small-Medium | **Files:** `AdminPipeline.tsx`, `App.css`
 Four issues identified with root causes already diagnosed:
 | # | Issue | Root Cause | Fix |
 |---|-------|-----------|-----|
 | 1 | Most-recent run won't collapse / flickers | `expandedRunId` in `load()` useCallback dependency array (line 729) causes race condition — collapsing sets null, which triggers load recreation, which re-expands | Remove `expandedRunId` from dependency array + use `useRef` for initial-load tracking |
 | 2 | Mobile job cards show vertical text ("C h e e") | `.pipeline-video__creator` missing overflow rules (App.css:4477) | Add `overflow: hidden; text-overflow: ellipsis; white-space: nowrap` (matches `.pipeline-video__filename` pattern) |
 | 3 | No stage direction chevrons | Pipeline stages listed without visual flow indicator | Add CSS chevron/arrow between stage indicators using `::after` pseudo-elements or inline SVG |
 | 4 | Filter text box should be replaced with button group | Current text input for status filter; should be "ALL \| Not Started \| In Progress \| Complete" buttons, end-aligned | Replace `<input>` with `<div className="filter-buttons">` flexbox, `justify-content: flex-end`, verify vertical alignment against adjacent elements |
 | 5 | Creators dropdown never populates (422 error) | Frontend requests `fetchCreators({ limit: 200 })` but backend validates `le=100` (line 1126) | Change to `limit: 100` |
 **Verification:** Test collapse toggle on most-recent run, resize to 375px and check creator name truncation, confirm chevrons render between stages, confirm filter buttons align right and sit level with row, confirm creator filter dropdown populates.
 ---
 ### S03: Brand Minimum (Favicon, OG Tags, Logo)
 **Risk:** Low | **Effort:** Small | **Files:** `index.html`, `App.tsx`, `App.css`, new static assets
 The site currently has:
 - No favicon (browser default icon)
 - No OG meta tags (no preview image when sharing URL via text/Discord)
 - No logo next to "Chrysopedia" in the header
 Tasks:
 1. **Design a simple logo** — something that fits the dark theme + cyan accent aesthetic. Could be a stylized book/page icon, a knowledge/crystal motif matching the "chryso-" (gold) prefix, or an abstract mark. Generate an SVG.
 2. **Add favicon** — export logo as favicon.ico + apple-touch-icon + 192/512 PNG for PWA manifest
 3. **Add OG meta tags** — `og:title`, `og:description`, `og:image`, `og:url`, `twitter:card` in index.html. Create a 1200x630 OG image using the logo + brand colors.
 4. **Place logo in header** — render the SVG inline next to "Chrysopedia" text with appropriate sizing
 **Verification:** Share URL in Discord/iMessage and confirm preview card renders. Check favicon in browser tab. Visual check logo in header.
 ---
 ### S04: ToC Modernization
 **Risk:** Medium | **Effort:** Medium | **Files:** `TableOfContents.tsx`, `App.css`, `TechniquePage.tsx`
 Research identified these dated elements:
 - CSS counter numbering ("1.", "1.2") — biggest offender
 - Boxed card container with solid border
 - Uppercase "CONTENTS" label
 - No active-section highlighting
 - Underline-only hover states
 Modernization plan:
 1. **Remove numbered counters** — switch to unordered list with clean indentation
 2. **Replace box border** with left accent bar (`border-left: 2px solid var(--color-accent)`)
 3. **Change heading** from "CONTENTS" to "On this page" in sentence case
 4. **Add hover background** (`rgba(34, 211, 238, 0.08)`) instead of underline
 5. **Add IntersectionObserver** — track which section heading is in the viewport, highlight the corresponding ToC entry with accent left-border + brighter text color
 6. **Make ToC sticky** — position it at the top of the existing sidebar (above Key Moments), `position: sticky; top: 1.5rem`
 **Verification:** Navigate to a technique page with 4+ sections. Scroll through — ToC should highlight current section. ToC stays visible in sidebar while scrolling. Hover states work. No numbering visible.
 ---
 ### S05: Sticky Reading Header
 **Risk:** Medium | **Effort:** Medium | **Files:** new `ReadingHeader.tsx`, `App.css`, `TechniquePage.tsx`
 New component that slides in when user scrolls past the article title:
 - Shows: article title (truncated) + current section name
 - `position: sticky; top: 0; z-index: 50`
 - Thin bar (~40px height), `var(--color-bg-header)` background, subtle bottom border
 - Hidden by default, slides in via `transform: translateY(-100%)` → `translateY(0)` transition
 - Uses IntersectionObserver on the technique header element as show/hide trigger
 - Shares the section-tracking observer from S04's ToC work
 - On mobile: compact single-line with optional dropdown for section jump
 - Update `scroll-margin-top` values on section anchors to account for new header height
 **Verification:** Open long technique page. Scroll past title — reading header appears. Correct section name updates as you scroll. Works on mobile (375px). Doesn't break existing header.
 ---
 ### S06: Landing Page Personality Pass
 **Risk:** Low | **Effort:** Small | **Files:** `Home.tsx`, `App.css`
 After S01 fixes the bugs, this slice adds polish:
 1. **Hero tightening** — reduce hero bottom padding and how-it-works top margin to get content above the fold faster (~40-50px reclaim)
 2. **Stats scorecard enhancement** — animated count-up on first view (simple `requestAnimationFrame` counter), subtle glow on numbers
 3. **Random button treatment** — wrap in a small card with "Feeling adventurous?" tagline, or embed as secondary action inside Trending Searches
 4. **Section heading standardization** — pick one treatment (title-case with left accent bar) and apply consistently to "Recently Added", "Trending Searches", "Popular Topics"
 5. **Header brand accent** — apply `color: var(--color-accent)` or subtle gradient to "Chrysopedia" text (pairs with S03 logo)
 **Verification:** Visual check desktop + mobile. Stats animate on page load. Section headings consistent. Content peeks above fold on standard viewport.
 ---
 ## Dependency Graph
 ```
 S01 (landing fixes) ──→ S06 (personality pass)
 S02 (pipeline fixes)     [independent]
 S03 (brand minimum)  ──→ S06 (header accent uses logo)
 S04 (ToC modern)     ──→ S05 (reading header shares IntersectionObserver pattern)
 ```
 S01, S02, S03, S04 can all start in parallel. S05 depends on S04. S06 depends on S01 + S03.
 Recommended execution order: **S01 → S02 → S03 → S04 → S05 → S06**
 ---
 ## Out of Scope (for this milestone)
 - Creator landing page redesign (depends on backend social links API — see backend session workplan)
 - Auto-avatar images (backend-gated — see backend session workplan)
 - Embed tab (needs backend investigation first — see backend session workplan)
 - Any backend Python changes
 - M015 S04/S05 leftovers (trending searches block, admin dropdown hover) — should be completed by M015's own GSD session first
--- a/.planning/backend-perf-creator-features.md
+++ b/.planning/backend-perf-creator-features.md
@ -0,0 +1,160 @@
 # Backend Performance & Creator Features — Session Workplan
 > **Stream:** Backend — run in a separate Claude Code session while GSD executes M016
 > **Conflict zone:** `backend/` only — zero frontend changes
 > **Deploy cadence:** commit-build-redeploy after each task group
 ---
 ## Goal
 Fix critical performance bottlenecks in the admin pipeline API, implement auto-avatar fetching for creators, and lay the backend groundwork for creator landing page improvements. Purely backend Python — no frontend changes (the creators 422 fix moved to M016 S02).
 ---
 ## Task Groups (execute sequentially)
 ### 1. Critical Fixes (do first, immediate impact)
 #### 1a. Fix `worker-status` async event loop blocking
 **File:** `backend/routers/pipeline.py` lines 1266-1313
 **Problem:** The endpoint is `async def` but calls three synchronous Celery inspect methods (`inspector.active()`, `.reserved()`, `.stats()`), each with a 1-second timeout. This blocks the entire uvicorn event loop for ~3 seconds, stalling ALL concurrent API requests.
 **Evidence:** Every parallel API call during page load takes ~3,024ms instead of their natural 6-38ms.
 **Fix:**
 ```python
 import asyncio
 active = await asyncio.to_thread(inspector.active) or {}
 reserved = await asyncio.to_thread(inspector.reserved) or {}
 stats = await asyncio.to_thread(inspector.stats) or {}
 ```
 Also consider: reduce inspect timeout to 0.5s, add Redis cache with 10-15s TTL to avoid repeated slow calls.
 **Impact:** Page load drops from ~3s to ~50ms.
 #### ~~1b. Fix creators endpoint 422 error~~ → Moved to M016 S02
 The frontend `fetchCreators({ limit: 200 })` fix (AdminPipeline.tsx line 1126) is now part of M016's pipeline UI fixes slice, since that slice already owns AdminPipeline.tsx.
 ---
 ### 2. Pipeline API Performance
 #### 2a. Rewrite `stale-pages` to eliminate N+1 queries
 **File:** `backend/routers/pipeline.py` lines 906-973
 **Problem:** Loads ALL technique pages, then runs a separate query per page for latest version + another for creator name. Currently 44 extra queries for 22 pages. Fast today (~30ms) but degrades linearly.
 **Fix:** Single query joining `technique_pages` with a lateral/window subquery for latest version + join to creators.
 #### 2b. Add pagination to videos endpoint
 **File:** `backend/routers/pipeline.py` lines 72-188
 **Problem:** Returns all 43 videos (23KB) with no offset/limit. Client-side filtering only.
 **Fix:** Add `offset`, `limit`, `status`, `creator_id` query params. Return paginated response with `total` count. Frontend can adopt server-side filtering later (or the M016 frontend stream can wire it up).
 #### 2c. Optimize `_find_dynamic_related` for technique pages
 **File:** `backend/routers/techniques.py` lines 33-111
 **Problem:** Loads ALL technique pages into memory to score relatedness in Python. O(n) in total page count.
 **Fix:** Move scoring to SQL (keyword overlap via `ts_rank` or simple tag intersection) or cache related links per technique page with invalidation on new page creation.
 ---
 ### 3. Auto-Avatar Integration (TheAudioDB)
 Research concluded: **TheAudioDB is the best first source** — free, no OAuth, no caching restrictions, decent coverage for established artists.
 #### 3a. Database migration
 Add to `Creator` model:
 - `avatar_url: String | None` — stored image URL or local path
 - `avatar_source: String` — enum: `"generated"`, `"theaudiodb"`, `"manual"`
 - `avatar_fetched_at: DateTime | None` — for cache invalidation
 Alembic migration (will be 014 or later depending on M015 state).
 #### 3b. TheAudioDB lookup service
 New file: `backend/services/avatar.py`
 - `async def fetch_avatar(creator_name: str, creator_genres: list[str]) -> AvatarResult | None`
 - Calls `https://www.theaudiodb.com/api/v1/json/{key}/search.php?s={name}`
 - Confidence scoring: name match via `thefuzz.fuzz.token_sort_ratio` (threshold ≥ 85%), genre overlap as tiebreaker
 - Returns `strArtistThumb` URL if confident match, None otherwise
 - Handle: no results, multiple results, missing image fields
 #### 3c. Celery worker task
 New task: `tasks.fetch_creator_avatar`
 - Called on creator creation or manually via admin endpoint
 - Runs TheAudioDB lookup → downloads image → stores locally (or stores URL)
 - Updates `Creator.avatar_url`, `avatar_source`, `avatar_fetched_at`
 - Falls back gracefully — if no match, leaves fields null (frontend already renders generated SVG as fallback)
 #### 3d. Admin endpoint for manual trigger
 `POST /admin/pipeline/creators/{id}/fetch-avatar` — triggers the worker task for a specific creator.
 `POST /admin/pipeline/creators/fetch-all-avatars` — batch trigger for all creators missing avatars.
 #### 3e. Wire avatar_url into creators API responses
 Add `avatar_url` to `CreatorBrowseItem` and `CreatorDetail` schemas. The frontend `CreatorAvatar` component already accepts an `imageUrl` prop — it will just work once the API returns it.
 ---
 ### 4. Creator Landing Page API Groundwork
 #### 4a. Social links model + migration
 Add to `Creator` model:
 - `social_links: JSON | None` — structured as `{"spotify": "url", "instagram": "url", "bandcamp": "url", "website": "url", ...}`
 - `bio: Text | None` — short creator bio/description
 - `featured: Boolean` — flag for homepage featuring
 #### 4b. Creator detail endpoint enhancement
 Expand `GET /api/v1/creators/{slug}` to return:
 - `social_links`
 - `bio`
 - `avatar_url`
 - `technique_count`
 - Full technique list with titles, slugs, created_at
 - Genre breakdown
 #### 4c. Admin endpoint for creator profile editing
 `PUT /admin/pipeline/creators/{id}` — update `bio`, `social_links`, `featured` flag, manually set `avatar_url`.
 ---
 ### 5. Embed Tab Investigation
 The "Embed" tab under pipeline jobs is non-functional. Before building, need to:
 - Read the existing frontend component to understand what it expects
 - Determine what "Embed" should show (embedding vectors? embed codes? embedded content?)
 - If it's about Qdrant vector embeddings: add an endpoint to query embedding status per technique page
 - If it's about iframe embed codes: generate shareable snippet per technique
 **This task starts as investigation — scope will be defined after reading the code.**
 ---
 ## General Load Time Optimization (apply throughout)
 As each endpoint is touched, also consider:
 - Add `Cache-Control` headers for public GET endpoints (technique pages, creators, search suggestions)
 - Add Redis caching (30s-5min TTL) for expensive or frequently-hit endpoints
 - Ensure database indexes exist on commonly filtered/sorted columns
 - Consider adding `select_in_loading` for SQLAlchemy relationships to avoid implicit lazy loads
 ---
 ## Key Files
 | File | What changes |
 |------|-------------|
 | `backend/routers/pipeline.py` | worker-status async fix, stale-pages rewrite, videos pagination, avatar admin endpoints |
 | `backend/routers/creators.py` | Creator detail expansion, social links, admin editing |
 | `backend/routers/techniques.py` | Related techniques optimization |
 | `backend/models.py` | Creator model additions (avatar, social_links, bio) |
 | `backend/schemas.py` | New response schemas |
 | `backend/services/avatar.py` | New — TheAudioDB integration |
 | `backend/tasks.py` | New avatar fetch task |
 | `alembic/versions/014_*.py` | Migration for creator columns |
 ---
 ## Merge Coordination with M016
 These two streams have **zero file overlap:**
 - **M016 touches:** `frontend/src/` only — `App.css`, `Home.tsx`, `TechniquePage.tsx`, `TableOfContents.tsx`, `AdminPipeline.tsx`, new frontend components, static assets
 - **This session touches:** `backend/` only — routers, models, schemas, services, tasks, `alembic/`
 No merge conflicts expected. The creators 422 fix (the former single overlap point) now lives in M016 S02.
 For avatar/social-links frontend wiring: this session ships the API, M016 (or a follow-up) consumes it. No conflict — just sequencing.
--- a/CLAUDE.md
+++ b/CLAUDE.md
@ -0,0 +1,48 @@
 # Chrysopedia — Development Reference
 ## ⚠️ Canonical Development Directory
 **This is NOT the canonical development directory.**
 The production codebase and all future development happens on **ub01**:
 ```
 ssh ub01
 cd /vmPool/r/repos/xpltdco/chrysopedia
 ```
 **Git:** https://git.xpltd.co/xpltdco/chrysopedia (Forgejo, xpltdco org)
 ## Why?
 The Docker Compose stack runs on ub01 with bind mounts at `/vmPool/r/services/chrysopedia_*`. Development, deployment, and testing all happen from the ub01 clone. This directory (`/home/aux/projects/content-to-kb-automator`) was the initial workspace used during M001 development and should not be used for future work.
 ## Stack Info
 - **Web UI:** http://ub01:8096
 - **API Health:** http://ub01:8096/health
 - **PostgreSQL:** ub01:5433 (user: chrysopedia)
 - **Compose project:** xpltd_chrysopedia
 - **Compose path:** /vmPool/r/compose/xpltd_chrysopedia/docker-compose.yml (symlink to repo)
 - **Services:** chrysopedia-db, chrysopedia-redis, chrysopedia-qdrant, chrysopedia-ollama, chrysopedia-api, chrysopedia-worker, chrysopedia-web-8096
 ## Quick Commands (on ub01)
 ```bash
 # Check status
 docker ps --filter name=chrysopedia
 # Rebuild and restart after code changes
 cd /vmPool/r/repos/xpltdco/chrysopedia
 git pull
 docker compose build && docker compose up -d
 # Run Alembic migrations
 docker exec chrysopedia-api alembic upgrade head
 # View worker logs
 docker logs -f chrysopedia-worker
 # View API logs
 docker logs -f chrysopedia-api
 ```
--- a/README.md
+++ b/README.md
@ -0,0 +1,320 @@
 # Chrysopedia
 > From *chrysopoeia* (alchemical transmutation of base material into gold) + *encyclopedia*.
 > Chrysopedia transmutes raw video content into refined, searchable production knowledge.
 A self-hosted knowledge extraction system for electronic music production content. Video libraries are transcribed with Whisper, analyzed through a multi-stage LLM pipeline, curated via an admin review workflow, and served through a search-first web UI designed for mid-session retrieval.
 ---
 ## Information Flow
 Content moves through six stages from raw video to searchable knowledge:
 ```
 ┌─────────────────────────────────────────────────────────────────────────┐
 │  STAGE 1 · Transcription                            [Desktop / GPU]    │
 │                                                                         │
 │  Video files → Whisper large-v3 (CUDA) → JSON transcripts              │
 │  Output: timestamped segments with speaker text                         │
 └────────────────────────────────┬────────────────────────────────────────┘
                                  │ JSON files (manual or folder watcher)
                                  ▼
 ┌─────────────────────────────────────────────────────────────────────────┐
 │  STAGE 2 · Ingestion                                [API + Watcher]    │
 │                                                                         │
 │  POST /api/v1/ingest ← watcher auto-submits from /watch folder         │
 │  • Validate JSON structure                                              │
 │  • Compute content hash (SHA-256) for deduplication                     │
 │  • Find-or-create Creator from folder name                              │
 │  • Upsert SourceVideo (exact filename → content hash → fuzzy match)     │
 │  • Bulk-insert TranscriptSegment rows                                   │
 │  • Dispatch pipeline to Celery worker                                   │
 └────────────────────────────────┬────────────────────────────────────────┘
                                  │ Celery task: run_pipeline(video_id)
                                  ▼
 ┌─────────────────────────────────────────────────────────────────────────┐
 │  STAGE 3 · LLM Extraction Pipeline                  [Celery Worker]    │
 │                                                                         │
 │  Four sequential LLM stages, each with its own prompt template:         │
 │                                                                         │
 │  3a. Segmentation — Split transcript into semantic topic boundaries     │
 │      Model: chat (fast)         Prompt: stage2_segmentation.txt         │
 │                                                                         │
 │  3b. Extraction — Identify key moments (title, summary, timestamps)     │
 │      Model: reasoning (think)   Prompt: stage3_extraction.txt           │
 │                                                                         │
 │  3c. Classification — Assign content types + extract plugin names       │
 │      Model: chat (fast)         Prompt: stage4_classification.txt       │
 │                                                                         │
 │  3d. Synthesis — Compose technique pages from approved moments          │
 │      Model: reasoning (think)   Prompt: stage5_synthesis.txt            │
 │                                                                         │
 │  Each stage emits PipelineEvent rows (tokens, duration, model, errors)  │
 └────────────────────────────────┬────────────────────────────────────────┘
                                  │ KeyMoment rows (review_status: pending)
                                  ▼
 ┌─────────────────────────────────────────────────────────────────────────┐
 │  STAGE 4 · Review & Curation                        [Admin UI]         │
 │                                                                         │
 │  Admin reviews extracted KeyMoments before they become technique pages:  │
 │  • Approve — moment proceeds to synthesis                               │
 │  • Edit — correct title, summary, content type, plugins, then approve   │
 │  • Reject — moment is excluded from knowledge base                      │
 │  (When REVIEW_MODE=false, moments auto-approve and skip this stage)     │
 └────────────────────────────────┬────────────────────────────────────────┘
                                  │ Approved moments → Stage 3d synthesis
                                  ▼
 ┌─────────────────────────────────────────────────────────────────────────┐
 │  STAGE 5 · Knowledge Base                           [Web UI]           │
 │                                                                         │
 │  TechniquePages — the primary output:                                   │
 │  • Structured body sections, signal chains, plugin lists                │
 │  • Linked to source KeyMoments with video timestamps                    │
 │  • Cross-referenced via RelatedTechniqueLinks                           │
 │  • Versioned (snapshots before each re-synthesis)                       │
 │  • Organized by topic taxonomy (6 categories from canonical_tags.yaml)  │
 └────────────────────────────────┬────────────────────────────────────────┘
                                  │
                                  ▼
 ┌─────────────────────────────────────────────────────────────────────────┐
 │  STAGE 6 · Search & Retrieval                       [Web UI]           │
 │                                                                         │
 │  • Semantic search: query → embedding → Qdrant vector similarity        │
 │  • Keyword fallback: ILIKE search on title/summary (300ms timeout)      │
 │  • Browse by topic hierarchy, creator, or content type                  │
 │  • Typeahead search from home page (debounced, top 5 results)           │
 └─────────────────────────────────────────────────────────────────────────┘
 ```
 ---
 ## Architecture
 ```
 ┌──────────────────────────────────────────────────────────────────────────┐
 │  Desktop (GPU workstation — hal0022)                                     │
 │  whisper/transcribe.py → JSON transcripts → copy to /watch folder        │
 └────────────────────────────┬─────────────────────────────────────────────┘
                             │
                             ▼
 ┌──────────────────────────────────────────────────────────────────────────┐
 │  Docker Compose: xpltd_chrysopedia (ub01)                                │
 │  Network: chrysopedia (172.32.0.0/24)                                    │
 │                                                                          │
 │  ┌────────────┐  ┌─────────────┐  ┌───────────────┐  ┌──────────────┐  │
 │  │  PostgreSQL │  │    Redis    │  │    Qdrant     │  │    Ollama    │  │
 │  │  :5433      │  │  broker +   │  │  vector DB    │  │  embeddings  │  │
 │  │  7 entities │  │  cache      │  │  semantic     │  │  nomic-embed │  │
 │  └─────┬───────┘  └──────┬──────┘  └───────┬───────┘  └──────┬───────┘  │
 │        │                 │                 │                 │           │
 │  ┌─────┴─────────────────┴─────────────────┴─────────────────┴────────┐  │
 │  │                         FastAPI (API)                              │  │
 │  │  Ingest · Pipeline control · Review · Search · CRUD · Reports     │  │
 │  └──────────────────────────────┬────────────────────────────────────┘  │
 │                                 │                                       │
 │  ┌──────────────┐  ┌────────────┴───┐  ┌──────────────────────────┐    │
 │  │   Watcher    │  │  Celery Worker │  │     Web UI (React)       │    │
 │  │  /watch →    │  │  LLM pipeline  │  │  nginx → :8096           │    │
 │  │  auto-ingest │  │  stages 2-5    │  │  search-first interface  │    │
 │  └──────────────┘  └────────────────┘  └──────────────────────────┘    │
 └──────────────────────────────────────────────────────────────────────────┘
 ```
 ### Services
 | Service | Image | Port | Purpose |
 |---------|-------|------|---------|
 | `chrysopedia-db` | `postgres:16-alpine` | `5433 → 5432` | Primary data store |
 | `chrysopedia-redis` | `redis:7-alpine` | — | Celery broker + feature flag cache |
 | `chrysopedia-qdrant` | `qdrant/qdrant:v1.13.2` | — | Vector DB for semantic search |
 | `chrysopedia-ollama` | `ollama/ollama` | — | Embedding model server (nomic-embed-text) |
 | `chrysopedia-api` | `Dockerfile.api` | `8000` | FastAPI REST API |
 | `chrysopedia-worker` | `Dockerfile.api` | — | Celery worker (LLM pipeline) |
 | `chrysopedia-watcher` | `Dockerfile.api` | — | Folder monitor → auto-ingest |
 | `chrysopedia-web` | `Dockerfile.web` | `8096 → 80` | React frontend (nginx) |
 ### Data Model
 | Entity | Purpose |
 |--------|---------|
 | **Creator** | Artists/producers whose content is indexed |
 | **SourceVideo** | Video files processed by the pipeline (with content hash dedup) |
 | **TranscriptSegment** | Timestamped text segments from Whisper |
 | **KeyMoment** | Discrete insights extracted by LLM analysis |
 | **TechniquePage** | Synthesized knowledge pages — the primary output |
 | **TechniquePageVersion** | Snapshots before re-synthesis overwrites |
 | **RelatedTechniqueLink** | Cross-references between technique pages |
 | **Tag** | Hierarchical topic taxonomy |
 | **ContentReport** | User-submitted content issues |
 | **PipelineEvent** | Structured pipeline execution logs (tokens, timing, errors) |
 ---
 ## Quick Start
 ### Prerequisites
 - Docker ≥ 24.0 and Docker Compose ≥ 2.20
 - Python 3.10+ with NVIDIA GPU + CUDA (for Whisper transcription)
 ### Setup
 ```bash
 # Clone and configure
 git clone git@github.com:xpltdco/chrysopedia.git
 cd chrysopedia
 cp .env.example .env    # edit with real values
 # Start the stack
 docker compose up -d
 # Run database migrations
 docker exec chrysopedia-api alembic upgrade head
 # Pull the embedding model (first time only)
 docker exec chrysopedia-ollama ollama pull nomic-embed-text
 # Verify
 curl http://localhost:8096/health
 ```
 ### Transcribe videos
 ```bash
 cd whisper && pip install -r requirements.txt
 # Single file
 python transcribe.py --input "path/to/video.mp4" --output-dir ./transcripts
 # Batch
 python transcribe.py --input ./videos/ --output-dir ./transcripts
 ```
 See [`whisper/README.md`](whisper/README.md) for full transcription docs.
 ---
 ## Environment Variables
 Copy `.env.example` to `.env`. Key groups:
 | Group | Variables | Notes |
 |-------|-----------|-------|
 | **Database** | `POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_DB` | Default user: `chrysopedia` |
 | **LLM** | `LLM_API_URL`, `LLM_API_KEY`, `LLM_MODEL` | OpenAI-compatible endpoint |
 | **LLM Fallback** | `LLM_FALLBACK_URL`, `LLM_FALLBACK_MODEL` | Automatic failover |
 | **Per-Stage Models** | `LLM_STAGE{2-5}_MODEL`, `LLM_STAGE{2-5}_MODALITY` | `chat` for fast stages, `thinking` for reasoning |
 | **Embedding** | `EMBEDDING_API_URL`, `EMBEDDING_MODEL` | Ollama nomic-embed-text |
 | **Vector DB** | `QDRANT_URL`, `QDRANT_COLLECTION` | Container-internal |
 | **Features** | `REVIEW_MODE`, `DEBUG_MODE` | Review gate + LLM I/O capture |
 | **Storage** | `TRANSCRIPT_STORAGE_PATH`, `VIDEO_METADATA_PATH` | Container bind mounts |
 ---
 ## API Endpoints
 ### Public
 | Method | Path | Description |
 |--------|------|-------------|
 | GET | `/health` | Health check (DB connectivity) |
 | GET | `/api/v1/search?q=&scope=&limit=` | Semantic + keyword search |
 | GET | `/api/v1/techniques` | List technique pages |
 | GET | `/api/v1/techniques/{slug}` | Technique detail + key moments |
 | GET | `/api/v1/techniques/{slug}/versions` | Version history |
 | GET | `/api/v1/creators` | List creators (sort, genre filter) |
 | GET | `/api/v1/creators/{slug}` | Creator detail |
 | GET | `/api/v1/topics` | Topic hierarchy with counts |
 | GET | `/api/v1/videos` | List source videos |
 | POST | `/api/v1/reports` | Submit content report |
 ### Admin
 | Method | Path | Description |
 |--------|------|-------------|
 | GET | `/api/v1/review/queue` | Review queue (status filter) |
 | POST | `/api/v1/review/moments/{id}/approve` | Approve key moment |
 | POST | `/api/v1/review/moments/{id}/reject` | Reject key moment |
 | PUT | `/api/v1/review/moments/{id}` | Edit key moment |
 | POST | `/api/v1/admin/pipeline/trigger/{video_id}` | Trigger/retrigger pipeline |
 | GET | `/api/v1/admin/pipeline/events/{video_id}` | Pipeline event log |
 | GET | `/api/v1/admin/pipeline/token-summary/{video_id}` | Token usage by stage |
 | GET | `/api/v1/admin/pipeline/worker-status` | Celery worker status |
 | PUT | `/api/v1/admin/pipeline/debug-mode` | Toggle debug mode |
 ### Ingest
 | Method | Path | Description |
 |--------|------|-------------|
 | POST | `/api/v1/ingest` | Upload Whisper JSON transcript |
 ---
 ## Development
 ```bash
 # Local backend (with Docker services)
 python -m venv .venv && source .venv/bin/activate
 pip install -r backend/requirements.txt
 docker compose up -d chrysopedia-db chrysopedia-redis
 alembic upgrade head
 cd backend && uvicorn main:app --reload --host 0.0.0.0 --port 8000
 # Database migrations
 alembic revision --autogenerate -m "describe_change"
 alembic upgrade head
 ```
 ### Project Structure
 ```
 chrysopedia/
 ├── backend/                 # FastAPI application
 │   ├── main.py              # Entry point, middleware, router mounting
 │   ├── config.py            # Pydantic Settings (all env vars)
 │   ├── models.py            # SQLAlchemy ORM models
 │   ├── schemas.py           # Pydantic request/response schemas
 │   ├── worker.py            # Celery app configuration
 │   ├── watcher.py           # Transcript folder watcher service
 │   ├── search_service.py    # Semantic search + keyword fallback
 │   ├── routers/             # API endpoint handlers
 │   ├── pipeline/            # LLM pipeline stages + clients
 │   │   ├── stages.py        # Stages 2-5 (Celery tasks)
 │   │   ├── llm_client.py    # OpenAI-compatible LLM client
 │   │   ├── embedding_client.py
 │   │   └── qdrant_client.py
 │   └── tests/
 ├── frontend/                # React + TypeScript + Vite
 │   └── src/
 │       ├── pages/           # Home, Search, Technique, Creator, Topic, Admin
 │       ├── components/      # Shared UI components
 │       └── api/             # Typed API clients
 ├── whisper/                 # Desktop transcription (Whisper large-v3)
 ├── docker/                  # Dockerfiles + nginx config
 ├── alembic/                 # Database migrations
 ├── config/                  # canonical_tags.yaml (topic taxonomy)
 ├── prompts/                 # LLM prompt templates (editable at runtime)
 ├── docker-compose.yml
 └── .env.example
 ```
 ---
 ## Deployment (ub01)
 ```bash
 ssh ub01
 cd /vmPool/r/repos/xpltdco/chrysopedia
 git pull && docker compose build && docker compose up -d
 ```
 | Resource | Location |
 |----------|----------|
 | Web UI | `http://ub01:8096` |
 | API | `http://ub01:8096/health` |
 | PostgreSQL | `ub01:5433` |
 | Compose config | `/vmPool/r/compose/xpltd_chrysopedia/docker-compose.yml` |
 | Persistent data | `/vmPool/r/services/chrysopedia_*` |
 XPLTD conventions: `xpltd_chrysopedia` project name, dedicated bridge network (`172.32.0.0/24`), bind mounts under `/vmPool/r/services/`, PostgreSQL on port `5433`.
--- a/alembic.ini
+++ b/alembic.ini
@ -0,0 +1,37 @@
 # Chrysopedia — Alembic configuration
 [alembic]
 script_location = alembic
 sqlalchemy.url = postgresql+asyncpg://chrysopedia:changeme@localhost:5433/chrysopedia
 [loggers]
 keys = root,sqlalchemy,alembic
 [handlers]
 keys = console
 [formatters]
 keys = generic
 [logger_root]
 level = WARN
 handlers = console
 [logger_sqlalchemy]
 level = WARN
 handlers =
 qualname = sqlalchemy.engine
 [logger_alembic]
 level = INFO
 handlers =
 qualname = alembic
 [handler_console]
 class = StreamHandler
 args = (sys.stderr,)
 level = NOTSET
 formatter = generic
 [formatter_generic]
 format = %(levelname)-5.5s [%(name)s] %(message)s
 datefmt = %H:%M:%S
--- a/alembic/env.py
+++ b/alembic/env.py
@ -0,0 +1,72 @@
 """Alembic env.py — async migration runner for Chrysopedia."""
 import asyncio
 import os
 import sys
 from logging.config import fileConfig
 from alembic import context
 from sqlalchemy import pool
 from sqlalchemy.ext.asyncio import async_engine_from_config
 # Ensure the backend package is importable
 # When running locally: alembic/ sits beside backend/, so ../backend works
 # When running in Docker: alembic/ is inside /app/ alongside the backend modules
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "backend"))
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
 from database import Base  # noqa: E402
 import models  # noqa: E402, F401  — registers all tables on Base.metadata
 config = context.config
 if config.config_file_name is not None:
    fileConfig(config.config_file_name)
 target_metadata = Base.metadata
 # Allow DATABASE_URL env var to override alembic.ini
 url_override = os.getenv("DATABASE_URL")
 if url_override:
    config.set_main_option("sqlalchemy.url", url_override)
 def run_migrations_offline() -> None:
    """Run migrations in 'offline' mode — emit SQL to stdout."""
    url = config.get_main_option("sqlalchemy.url")
    context.configure(
        url=url,
        target_metadata=target_metadata,
        literal_binds=True,
        dialect_opts={"paramstyle": "named"},
    )
    with context.begin_transaction():
        context.run_migrations()
 def do_run_migrations(connection):
    context.configure(connection=connection, target_metadata=target_metadata)
    with context.begin_transaction():
        context.run_migrations()
 async def run_async_migrations() -> None:
    """Run migrations in 'online' mode with an async engine."""
    connectable = async_engine_from_config(
        config.get_section(config.config_ini_section, {}),
        prefix="sqlalchemy.",
        poolclass=pool.NullPool,
    )
    async with connectable.connect() as connection:
        await connection.run_sync(do_run_migrations)
    await connectable.dispose()
 def run_migrations_online() -> None:
    asyncio.run(run_async_migrations())
 if context.is_offline_mode():
    run_migrations_offline()
 else:
    run_migrations_online()
--- a/alembic/script.py.mako
+++ b/alembic/script.py.mako
@ -0,0 +1,25 @@
 """${message}
 Revision ID: ${up_revision}
 Revises: ${down_revision | comma,n}
 Create Date: ${create_date}
 """
 from typing import Sequence, Union
 from alembic import op
 import sqlalchemy as sa
 ${imports if imports else ""}
 # revision identifiers, used by Alembic.
 revision: str = ${repr(up_revision)}
 down_revision: Union[str, None] = ${repr(down_revision)}
 branch_labels: Union[str, Sequence[str], None] = ${repr(branch_labels)}
 depends_on: Union[str, Sequence[str], None] = ${repr(depends_on)}
 def upgrade() -> None:
    ${upgrades if upgrades else "pass"}
 def downgrade() -> None:
    ${downgrades if downgrades else "pass"}
--- a/alembic/versions/001_initial.py
+++ b/alembic/versions/001_initial.py
@ -0,0 +1,171 @@
 """initial schema — 7 core entities
 Revision ID: 001_initial
 Revises:
 Create Date: 2026-03-29
 """
 from typing import Sequence, Union
 from alembic import op
 import sqlalchemy as sa
 from sqlalchemy.dialects.postgresql import ARRAY, JSONB, UUID
 # revision identifiers, used by Alembic.
 revision: str = "001_initial"
 down_revision: Union[str, None] = None
 branch_labels: Union[str, Sequence[str], None] = None
 depends_on: Union[str, Sequence[str], None] = None
 def upgrade() -> None:
    # ── Enum types ───────────────────────────────────────────────────────
    content_type = sa.Enum(
        "tutorial", "livestream", "breakdown", "short_form",
        name="content_type",
    )
    processing_status = sa.Enum(
        "pending", "transcribed", "extracted", "reviewed", "published",
        name="processing_status",
    )
    key_moment_content_type = sa.Enum(
        "technique", "settings", "reasoning", "workflow",
        name="key_moment_content_type",
    )
    review_status = sa.Enum(
        "pending", "approved", "edited", "rejected",
        name="review_status",
    )
    source_quality = sa.Enum(
        "structured", "mixed", "unstructured",
        name="source_quality",
    )
    page_review_status = sa.Enum(
        "draft", "reviewed", "published",
        name="page_review_status",
    )
    relationship_type = sa.Enum(
        "same_technique_other_creator", "same_creator_adjacent", "general_cross_reference",
        name="relationship_type",
    )
    # ── creators ─────────────────────────────────────────────────────────
    op.create_table(
        "creators",
        sa.Column("id", UUID(as_uuid=True), primary_key=True, server_default=sa.text("gen_random_uuid()")),
        sa.Column("name", sa.String(255), nullable=False),
        sa.Column("slug", sa.String(255), nullable=False, unique=True),
        sa.Column("genres", ARRAY(sa.String), nullable=True),
        sa.Column("folder_name", sa.String(255), nullable=False),
        sa.Column("view_count", sa.Integer, nullable=False, server_default="0"),
        sa.Column("created_at", sa.DateTime(), nullable=False, server_default=sa.func.now()),
        sa.Column("updated_at", sa.DateTime(), nullable=False, server_default=sa.func.now()),
    )
    # ── source_videos ────────────────────────────────────────────────────
    op.create_table(
        "source_videos",
        sa.Column("id", UUID(as_uuid=True), primary_key=True, server_default=sa.text("gen_random_uuid()")),
        sa.Column("creator_id", UUID(as_uuid=True), sa.ForeignKey("creators.id", ondelete="CASCADE"), nullable=False),
        sa.Column("filename", sa.String(500), nullable=False),
        sa.Column("file_path", sa.String(1000), nullable=False),
        sa.Column("duration_seconds", sa.Integer, nullable=True),
        sa.Column("content_type", content_type, nullable=False),
        sa.Column("transcript_path", sa.String(1000), nullable=True),
        sa.Column("processing_status", processing_status, nullable=False, server_default="pending"),
        sa.Column("created_at", sa.DateTime(), nullable=False, server_default=sa.func.now()),
        sa.Column("updated_at", sa.DateTime(), nullable=False, server_default=sa.func.now()),
    )
    op.create_index("ix_source_videos_creator_id", "source_videos", ["creator_id"])
    # ── transcript_segments ──────────────────────────────────────────────
    op.create_table(
        "transcript_segments",
        sa.Column("id", UUID(as_uuid=True), primary_key=True, server_default=sa.text("gen_random_uuid()")),
        sa.Column("source_video_id", UUID(as_uuid=True), sa.ForeignKey("source_videos.id", ondelete="CASCADE"), nullable=False),
        sa.Column("start_time", sa.Float, nullable=False),
        sa.Column("end_time", sa.Float, nullable=False),
        sa.Column("text", sa.Text, nullable=False),
        sa.Column("segment_index", sa.Integer, nullable=False),
        sa.Column("topic_label", sa.String(255), nullable=True),
    )
    op.create_index("ix_transcript_segments_video_id", "transcript_segments", ["source_video_id"])
    # ── technique_pages (must come before key_moments due to FK) ─────────
    op.create_table(
        "technique_pages",
        sa.Column("id", UUID(as_uuid=True), primary_key=True, server_default=sa.text("gen_random_uuid()")),
        sa.Column("creator_id", UUID(as_uuid=True), sa.ForeignKey("creators.id", ondelete="CASCADE"), nullable=False),
        sa.Column("title", sa.String(500), nullable=False),
        sa.Column("slug", sa.String(500), nullable=False, unique=True),
        sa.Column("topic_category", sa.String(255), nullable=False),
        sa.Column("topic_tags", ARRAY(sa.String), nullable=True),
        sa.Column("summary", sa.Text, nullable=True),
        sa.Column("body_sections", JSONB, nullable=True),
        sa.Column("signal_chains", JSONB, nullable=True),
        sa.Column("plugins", ARRAY(sa.String), nullable=True),
        sa.Column("source_quality", source_quality, nullable=True),
        sa.Column("view_count", sa.Integer, nullable=False, server_default="0"),
        sa.Column("review_status", page_review_status, nullable=False, server_default="draft"),
        sa.Column("created_at", sa.DateTime(), nullable=False, server_default=sa.func.now()),
        sa.Column("updated_at", sa.DateTime(), nullable=False, server_default=sa.func.now()),
    )
    op.create_index("ix_technique_pages_creator_id", "technique_pages", ["creator_id"])
    op.create_index("ix_technique_pages_topic_category", "technique_pages", ["topic_category"])
    # ── key_moments ──────────────────────────────────────────────────────
    op.create_table(
        "key_moments",
        sa.Column("id", UUID(as_uuid=True), primary_key=True, server_default=sa.text("gen_random_uuid()")),
        sa.Column("source_video_id", UUID(as_uuid=True), sa.ForeignKey("source_videos.id", ondelete="CASCADE"), nullable=False),
        sa.Column("technique_page_id", UUID(as_uuid=True), sa.ForeignKey("technique_pages.id", ondelete="SET NULL"), nullable=True),
        sa.Column("title", sa.String(500), nullable=False),
        sa.Column("summary", sa.Text, nullable=False),
        sa.Column("start_time", sa.Float, nullable=False),
        sa.Column("end_time", sa.Float, nullable=False),
        sa.Column("content_type", key_moment_content_type, nullable=False),
        sa.Column("plugins", ARRAY(sa.String), nullable=True),
        sa.Column("review_status", review_status, nullable=False, server_default="pending"),
        sa.Column("raw_transcript", sa.Text, nullable=True),
        sa.Column("created_at", sa.DateTime(), nullable=False, server_default=sa.func.now()),
        sa.Column("updated_at", sa.DateTime(), nullable=False, server_default=sa.func.now()),
    )
    op.create_index("ix_key_moments_source_video_id", "key_moments", ["source_video_id"])
    op.create_index("ix_key_moments_technique_page_id", "key_moments", ["technique_page_id"])
    # ── related_technique_links ──────────────────────────────────────────
    op.create_table(
        "related_technique_links",
        sa.Column("id", UUID(as_uuid=True), primary_key=True, server_default=sa.text("gen_random_uuid()")),
        sa.Column("source_page_id", UUID(as_uuid=True), sa.ForeignKey("technique_pages.id", ondelete="CASCADE"), nullable=False),
        sa.Column("target_page_id", UUID(as_uuid=True), sa.ForeignKey("technique_pages.id", ondelete="CASCADE"), nullable=False),
        sa.Column("relationship", relationship_type, nullable=False),
        sa.UniqueConstraint("source_page_id", "target_page_id", "relationship", name="uq_technique_link"),
    )
    # ── tags ─────────────────────────────────────────────────────────────
    op.create_table(
        "tags",
        sa.Column("id", UUID(as_uuid=True), primary_key=True, server_default=sa.text("gen_random_uuid()")),
        sa.Column("name", sa.String(255), nullable=False, unique=True),
        sa.Column("category", sa.String(255), nullable=False),
        sa.Column("aliases", ARRAY(sa.String), nullable=True),
    )
    op.create_index("ix_tags_category", "tags", ["category"])
 def downgrade() -> None:
    op.drop_table("tags")
    op.drop_table("related_technique_links")
    op.drop_table("key_moments")
    op.drop_table("technique_pages")
    op.drop_table("transcript_segments")
    op.drop_table("source_videos")
    op.drop_table("creators")
    # Drop enum types
    for name in [
        "relationship_type", "page_review_status", "source_quality",
        "review_status", "key_moment_content_type", "processing_status",
        "content_type",
    ]:
        sa.Enum(name=name).drop(op.get_bind(), checkfirst=True)
--- a/alembic/versions/002_technique_page_versions.py
+++ b/alembic/versions/002_technique_page_versions.py
@ -0,0 +1,39 @@
 """technique_page_versions table for article versioning
 Revision ID: 002_technique_page_versions
 Revises: 001_initial
 Create Date: 2026-03-30
 """
 from typing import Sequence, Union
 from alembic import op
 import sqlalchemy as sa
 from sqlalchemy.dialects.postgresql import JSONB, UUID
 # revision identifiers, used by Alembic.
 revision: str = "002_technique_page_versions"
 down_revision: Union[str, None] = "001_initial"
 branch_labels: Union[str, Sequence[str], None] = None
 depends_on: Union[str, Sequence[str], None] = None
 def upgrade() -> None:
    op.create_table(
        "technique_page_versions",
        sa.Column("id", UUID(as_uuid=True), primary_key=True, server_default=sa.text("gen_random_uuid()")),
        sa.Column("technique_page_id", UUID(as_uuid=True), sa.ForeignKey("technique_pages.id", ondelete="CASCADE"), nullable=False),
        sa.Column("version_number", sa.Integer, nullable=False),
        sa.Column("content_snapshot", JSONB, nullable=False),
        sa.Column("pipeline_metadata", JSONB, nullable=True),
        sa.Column("created_at", sa.DateTime(), nullable=False, server_default=sa.func.now()),
    )
    op.create_index(
        "ix_technique_page_versions_page_version",
        "technique_page_versions",
        ["technique_page_id", "version_number"],
        unique=True,
    )
 def downgrade() -> None:
    op.drop_table("technique_page_versions")
--- a/alembic/versions/003_content_reports.py
+++ b/alembic/versions/003_content_reports.py
@ -0,0 +1,47 @@
 """Create content_reports table.
 Revision ID: 003_content_reports
 Revises: 002_technique_page_versions
 """
 from alembic import op
 import sqlalchemy as sa
 from sqlalchemy.dialects.postgresql import UUID
 revision = "003_content_reports"
 down_revision = "002_technique_page_versions"
 branch_labels = None
 depends_on = None
 def upgrade() -> None:
    op.create_table(
        "content_reports",
        sa.Column("id", UUID(as_uuid=True), primary_key=True, server_default=sa.func.gen_random_uuid()),
        sa.Column("content_type", sa.String(50), nullable=False),
        sa.Column("content_id", UUID(as_uuid=True), nullable=True),
        sa.Column("content_title", sa.String(500), nullable=True),
        sa.Column("report_type", sa.Enum(
            "inaccurate", "missing_info", "wrong_attribution", "formatting", "other",
            name="report_type", create_constraint=True,
        ), nullable=False),
        sa.Column("description", sa.Text(), nullable=False),
        sa.Column("status", sa.Enum(
            "open", "acknowledged", "resolved", "dismissed",
            name="report_status", create_constraint=True,
        ), nullable=False, server_default="open"),
        sa.Column("admin_notes", sa.Text(), nullable=True),
        sa.Column("page_url", sa.String(1000), nullable=True),
        sa.Column("created_at", sa.DateTime(), server_default=sa.func.now(), nullable=False),
        sa.Column("resolved_at", sa.DateTime(), nullable=True),
    )
    op.create_index("ix_content_reports_status_created", "content_reports", ["status", "created_at"])
    op.create_index("ix_content_reports_content", "content_reports", ["content_type", "content_id"])
 def downgrade() -> None:
    op.drop_index("ix_content_reports_content")
    op.drop_index("ix_content_reports_status_created")
    op.drop_table("content_reports")
    sa.Enum(name="report_status").drop(op.get_bind(), checkfirst=True)
    sa.Enum(name="report_type").drop(op.get_bind(), checkfirst=True)
--- a/alembic/versions/004_pipeline_events.py
+++ b/alembic/versions/004_pipeline_events.py
@ -0,0 +1,37 @@
 """Create pipeline_events table.
 Revision ID: 004_pipeline_events
 Revises: 003_content_reports
 """
 from alembic import op
 import sqlalchemy as sa
 from sqlalchemy.dialects.postgresql import UUID, JSONB
 revision = "004_pipeline_events"
 down_revision = "003_content_reports"
 branch_labels = None
 depends_on = None
 def upgrade() -> None:
    op.create_table(
        "pipeline_events",
        sa.Column("id", UUID(as_uuid=True), primary_key=True, server_default=sa.func.gen_random_uuid()),
        sa.Column("video_id", UUID(as_uuid=True), nullable=False, index=True),
        sa.Column("stage", sa.String(50), nullable=False),
        sa.Column("event_type", sa.String(30), nullable=False),
        sa.Column("prompt_tokens", sa.Integer(), nullable=True),
        sa.Column("completion_tokens", sa.Integer(), nullable=True),
        sa.Column("total_tokens", sa.Integer(), nullable=True),
        sa.Column("model", sa.String(100), nullable=True),
        sa.Column("duration_ms", sa.Integer(), nullable=True),
        sa.Column("payload", JSONB(), nullable=True),
        sa.Column("created_at", sa.DateTime(), server_default=sa.func.now(), nullable=False),
    )
    # Composite index for event log queries (video + newest first)
    op.create_index("ix_pipeline_events_video_created", "pipeline_events", ["video_id", "created_at"])
 def downgrade() -> None:
    op.drop_index("ix_pipeline_events_video_created")
    op.drop_table("pipeline_events")
--- a/alembic/versions/005_content_hash.py
+++ b/alembic/versions/005_content_hash.py
@ -0,0 +1,29 @@
 """Add content_hash to source_videos for duplicate detection.
 Revision ID: 005_content_hash
 Revises: 004_pipeline_events
 """
 from alembic import op
 import sqlalchemy as sa
 revision = "005_content_hash"
 down_revision = "004_pipeline_events"
 branch_labels = None
 depends_on = None
 def upgrade() -> None:
    op.add_column(
        "source_videos",
        sa.Column("content_hash", sa.String(64), nullable=True),
    )
    op.create_index(
        "ix_source_videos_content_hash",
        "source_videos",
        ["content_hash"],
    )
 def downgrade() -> None:
    op.drop_index("ix_source_videos_content_hash")
    op.drop_column("source_videos", "content_hash")
--- a/alembic/versions/006_debug_columns.py
+++ b/alembic/versions/006_debug_columns.py
@ -0,0 +1,33 @@
 """Add debug LLM I/O capture columns to pipeline_events.
 Revision ID: 006_debug_columns
 Revises: 005_content_hash
 """
 from alembic import op
 import sqlalchemy as sa
 revision = "006_debug_columns"
 down_revision = "005_content_hash"
 branch_labels = None
 depends_on = None
 def upgrade() -> None:
    op.add_column(
        "pipeline_events",
        sa.Column("system_prompt_text", sa.Text(), nullable=True),
    )
    op.add_column(
        "pipeline_events",
        sa.Column("user_prompt_text", sa.Text(), nullable=True),
    )
    op.add_column(
        "pipeline_events",
        sa.Column("response_text", sa.Text(), nullable=True),
    )
 def downgrade() -> None:
    op.drop_column("pipeline_events", "response_text")
    op.drop_column("pipeline_events", "user_prompt_text")
    op.drop_column("pipeline_events", "system_prompt_text")
--- a/alembic/versions/007_drop_review_columns.py
+++ b/alembic/versions/007_drop_review_columns.py
@ -0,0 +1,30 @@
 """Drop review_status columns and enums.
 Revision ID: 007_drop_review_columns
 Revises: 006_debug_columns
 """
 from alembic import op
 revision = "007_drop_review_columns"
 down_revision = "006_debug_columns"
 branch_labels = None
 depends_on = None
 def upgrade() -> None:
    op.drop_column("key_moments", "review_status")
    op.drop_column("technique_pages", "review_status")
    op.execute("DROP TYPE IF EXISTS review_status")
    op.execute("DROP TYPE IF EXISTS page_review_status")
    # Collapse 'reviewed' into 'published' for any existing rows
    op.execute(
        "UPDATE source_videos SET processing_status = 'published' "
        "WHERE processing_status = 'reviewed'"
    )
 def downgrade() -> None:
    op.execute("CREATE TYPE review_status AS ENUM ('pending', 'approved', 'edited', 'rejected')")
    op.execute("CREATE TYPE page_review_status AS ENUM ('draft', 'reviewed', 'published')")
    op.add_column("key_moments", op.Column("review_status", op.Enum("pending", "approved", "edited", "rejected", name="review_status"), server_default="pending", nullable=False))
    op.add_column("technique_pages", op.Column("review_status", op.Enum("draft", "reviewed", "published", name="page_review_status"), server_default="draft", nullable=False))
--- a/alembic/versions/008_rename_processing_status.py
+++ b/alembic/versions/008_rename_processing_status.py
@ -0,0 +1,79 @@
 """Rename processing_status values to user-meaningful lifecycle states.
 Old: pending, transcribed, extracted, published
 New: not_started, queued, processing, error, complete
 Uses text column conversion to avoid PG enum ADD VALUE transaction restriction.
 Revision ID: 008_rename_processing_status
 Revises: 007_drop_review_columns
 """
 from alembic import op
 import sqlalchemy as sa
 revision = "008_rename_processing_status"
 down_revision = "007_drop_review_columns"
 branch_labels = None
 depends_on = None
 def upgrade() -> None:
    # 1. Drop server default (it references the old enum type)
    op.alter_column("source_videos", "processing_status", server_default=None)
    # 2. Convert column to text to break free of the old enum
    op.alter_column(
        "source_videos", "processing_status",
        type_=sa.Text(),
        existing_type=sa.Enum(name="processing_status"),
        postgresql_using="processing_status::text",
    )
    # 3. Drop old enum type
    op.execute("DROP TYPE IF EXISTS processing_status")
    # 4. Rename values in the text column
    op.execute("UPDATE source_videos SET processing_status = 'not_started' WHERE processing_status = 'pending'")
    op.execute("UPDATE source_videos SET processing_status = 'queued' WHERE processing_status = 'transcribed'")
    op.execute("UPDATE source_videos SET processing_status = 'processing' WHERE processing_status = 'extracted'")
    op.execute("UPDATE source_videos SET processing_status = 'complete' WHERE processing_status = 'published'")
    # 5. Create new enum type
    processing_status = sa.Enum(
        "not_started", "queued", "processing", "error", "complete",
        name="processing_status",
    )
    processing_status.create(op.get_bind(), checkfirst=True)
    # 6. Convert column back to enum with new default
    op.alter_column(
        "source_videos", "processing_status",
        type_=processing_status,
        existing_type=sa.Text(),
        postgresql_using="processing_status::processing_status",
        server_default="not_started",
    )
 def downgrade() -> None:
    op.alter_column("source_videos", "processing_status", server_default=None)
    op.alter_column(
        "source_videos", "processing_status",
        type_=sa.Text(),
        existing_type=sa.Enum(name="processing_status"),
        postgresql_using="processing_status::text",
    )
    op.execute("DROP TYPE IF EXISTS processing_status")
    op.execute("UPDATE source_videos SET processing_status = 'pending' WHERE processing_status = 'not_started'")
    op.execute("UPDATE source_videos SET processing_status = 'transcribed' WHERE processing_status = 'queued'")
    op.execute("UPDATE source_videos SET processing_status = 'extracted' WHERE processing_status = 'processing'")
    op.execute("UPDATE source_videos SET processing_status = 'published' WHERE processing_status = 'complete'")
    old_enum = sa.Enum("pending", "transcribed", "extracted", "published", name="processing_status")
    old_enum.create(op.get_bind(), checkfirst=True)
    op.alter_column(
        "source_videos", "processing_status",
        type_=old_enum,
        existing_type=sa.Text(),
        postgresql_using="processing_status::processing_status",
        server_default="pending",
    )
--- a/alembic/versions/009_add_creator_hidden_flag.py
+++ b/alembic/versions/009_add_creator_hidden_flag.py
@ -0,0 +1,28 @@
 """Add hidden boolean flag to creators table.
 Marks test/internal creators as hidden so they are filtered from
 public API responses.
 Revision ID: 009_add_creator_hidden_flag
 Revises: 008_rename_processing_status
 """
 from alembic import op
 import sqlalchemy as sa
 revision = "009_add_creator_hidden_flag"
 down_revision = "008_rename_processing_status"
 branch_labels = None
 depends_on = None
 def upgrade() -> None:
    op.add_column(
        "creators",
        sa.Column("hidden", sa.Boolean(), server_default="false", nullable=False),
    )
    # Mark known test creator as hidden
    op.execute("UPDATE creators SET hidden = true WHERE slug = 'testcreator'")
 def downgrade() -> None:
    op.drop_column("creators", "hidden")
--- a/alembic/versions/010_add_pipeline_runs.py
+++ b/alembic/versions/010_add_pipeline_runs.py
@ -0,0 +1,54 @@
 """Add pipeline_runs table and run_id FK on pipeline_events.
 Each pipeline trigger creates a run. Events are scoped to runs
 for clean per-execution audit trails.
 Revision ID: 010_add_pipeline_runs
 Revises: 009_add_creator_hidden_flag
 """
 from alembic import op
 import sqlalchemy as sa
 from sqlalchemy.dialects.postgresql import UUID
 revision = "010_add_pipeline_runs"
 down_revision = "009_add_creator_hidden_flag"
 branch_labels = None
 depends_on = None
 def upgrade() -> None:
    # Create enums
    pipeline_run_trigger = sa.Enum(
        "manual", "clean_reprocess", "auto_ingest", "bulk",
        name="pipeline_run_trigger",
    )
    pipeline_run_status = sa.Enum(
        "running", "complete", "error", "cancelled",
        name="pipeline_run_status",
    )
    op.create_table(
        "pipeline_runs",
        sa.Column("id", UUID(as_uuid=True), primary_key=True, server_default=sa.text("gen_random_uuid()")),
        sa.Column("video_id", UUID(as_uuid=True), sa.ForeignKey("source_videos.id", ondelete="CASCADE"), nullable=False, index=True),
        sa.Column("run_number", sa.Integer, nullable=False),
        sa.Column("trigger", pipeline_run_trigger, nullable=False),
        sa.Column("status", pipeline_run_status, nullable=False, server_default="running"),
        sa.Column("started_at", sa.DateTime, nullable=False, server_default=sa.text("now()")),
        sa.Column("finished_at", sa.DateTime, nullable=True),
        sa.Column("error_stage", sa.String(50), nullable=True),
        sa.Column("total_tokens", sa.Integer, nullable=False, server_default="0"),
    )
    # Add run_id to pipeline_events (nullable for backward compat)
    op.add_column(
        "pipeline_events",
        sa.Column("run_id", UUID(as_uuid=True), sa.ForeignKey("pipeline_runs.id", ondelete="SET NULL"), nullable=True, index=True),
    )
 def downgrade() -> None:
    op.drop_column("pipeline_events", "run_id")
    op.drop_table("pipeline_runs")
    op.execute("DROP TYPE IF EXISTS pipeline_run_trigger")
    op.execute("DROP TYPE IF EXISTS pipeline_run_status")
--- a/alembic/versions/011_classification_cache_and_stage_rerun.py
+++ b/alembic/versions/011_classification_cache_and_stage_rerun.py
@ -0,0 +1,35 @@
 """Add classification_data JSONB column to source_videos and stage_rerun trigger.
 Persists stage 4 classification data in PostgreSQL alongside Redis cache,
 eliminating the 24-hour TTL data loss risk. Also adds the 'stage_rerun'
 trigger value for single-stage re-run support.
 Revision ID: 011_cls_cache_rerun
 Revises: 010_add_pipeline_runs
 """
 from alembic import op
 import sqlalchemy as sa
 from sqlalchemy.dialects.postgresql import JSONB
 revision = "011_cls_cache_rerun"
 down_revision = "010_add_pipeline_runs"
 branch_labels = None
 depends_on = None
 def upgrade() -> None:
    # Add classification_data column to source_videos
    op.add_column(
        "source_videos",
        sa.Column("classification_data", JSONB, nullable=True),
    )
    # Add 'stage_rerun' to the pipeline_run_trigger enum
    # PostgreSQL enums require ALTER TYPE to add values
    op.execute("ALTER TYPE pipeline_run_trigger ADD VALUE IF NOT EXISTS 'stage_rerun'")
 def downgrade() -> None:
    op.drop_column("source_videos", "classification_data")
    # Note: PostgreSQL does not support removing values from enums.
    # The 'stage_rerun' value will remain but be unused after downgrade.
--- a/alembic/versions/012_multi_source_format.py
+++ b/alembic/versions/012_multi_source_format.py
@ -0,0 +1,55 @@
 """Add body_sections_format column and technique_page_videos association table.
 Supports multi-source technique pages: tracks which source videos contributed
 to a technique page, and marks the body_sections format version for future
 structured section layouts.
 Revision ID: 012_multi_source_fmt
 Revises: 011_cls_cache_rerun
 """
 from alembic import op
 import sqlalchemy as sa
 from sqlalchemy.dialects.postgresql import UUID
 revision = "012_multi_source_fmt"
 down_revision = "011_cls_cache_rerun"
 branch_labels = None
 depends_on = None
 def upgrade() -> None:
    # Add body_sections_format to technique_pages with default for existing rows
    op.add_column(
        "technique_pages",
        sa.Column(
            "body_sections_format",
            sa.String(20),
            nullable=False,
            server_default="v1",
        ),
    )
    # Create technique_page_videos association table
    op.create_table(
        "technique_page_videos",
        sa.Column("id", UUID(as_uuid=True), primary_key=True, server_default=sa.func.gen_random_uuid()),
        sa.Column(
            "technique_page_id",
            UUID(as_uuid=True),
            sa.ForeignKey("technique_pages.id", ondelete="CASCADE"),
            nullable=False,
        ),
        sa.Column(
            "source_video_id",
            UUID(as_uuid=True),
            sa.ForeignKey("source_videos.id", ondelete="CASCADE"),
            nullable=False,
        ),
        sa.Column("added_at", sa.TIMESTAMP(), server_default=sa.func.now(), nullable=False),
        sa.UniqueConstraint("technique_page_id", "source_video_id", name="uq_page_video"),
    )
 def downgrade() -> None:
    op.drop_table("technique_page_videos")
    op.drop_column("technique_pages", "body_sections_format")
--- a/alembic/versions/013_add_search_log.py
+++ b/alembic/versions/013_add_search_log.py
@ -0,0 +1,31 @@
 """Add search_log table for query analytics and popular searches.
 Revision ID: 013_add_search_log
 Revises: 012_multi_source_fmt
 """
 from alembic import op
 import sqlalchemy as sa
 revision = "013_add_search_log"
 down_revision = "012_multi_source_fmt"
 branch_labels = None
 depends_on = None
 def upgrade() -> None:
    op.create_table(
        "search_log",
        sa.Column("id", sa.Integer, primary_key=True, autoincrement=True),
        sa.Column("query", sa.String(500), nullable=False),
        sa.Column("scope", sa.String(50), nullable=False),
        sa.Column("result_count", sa.Integer, nullable=False, server_default="0"),
        sa.Column("created_at", sa.TIMESTAMP(), server_default=sa.func.now(), nullable=False),
    )
    op.create_index("ix_search_log_query", "search_log", ["query"])
    op.create_index("ix_search_log_created_at", "search_log", ["created_at"])
 def downgrade() -> None:
    op.drop_index("ix_search_log_created_at", table_name="search_log")
    op.drop_index("ix_search_log_query", table_name="search_log")
    op.drop_table("search_log")
--- a/alembic/versions/014_add_creator_avatar.py
+++ b/alembic/versions/014_add_creator_avatar.py
@ -0,0 +1,24 @@
 """Add avatar columns to creators table.
 Revision ID: 014_add_creator_avatar
 Revises: 013_add_search_log
 """
 from alembic import op
 import sqlalchemy as sa
 revision = "014_add_creator_avatar"
 down_revision = "013_add_search_log"
 branch_labels = None
 depends_on = None
 def upgrade() -> None:
    op.add_column("creators", sa.Column("avatar_url", sa.String(1000), nullable=True))
    op.add_column("creators", sa.Column("avatar_source", sa.String(50), nullable=True))
    op.add_column("creators", sa.Column("avatar_fetched_at", sa.TIMESTAMP(), nullable=True))
 def downgrade() -> None:
    op.drop_column("creators", "avatar_fetched_at")
    op.drop_column("creators", "avatar_source")
    op.drop_column("creators", "avatar_url")
--- a/alembic/versions/015_add_creator_profile.py
+++ b/alembic/versions/015_add_creator_profile.py
@ -0,0 +1,25 @@
 """Add bio, social_links, and featured columns to creators table.
 Revision ID: 015_add_creator_profile
 Revises: 014_add_creator_avatar
 """
 from alembic import op
 import sqlalchemy as sa
 from sqlalchemy.dialects.postgresql import JSONB
 revision = "015_add_creator_profile"
 down_revision = "014_add_creator_avatar"
 branch_labels = None
 depends_on = None
 def upgrade() -> None:
    op.add_column("creators", sa.Column("bio", sa.Text(), nullable=True))
    op.add_column("creators", sa.Column("social_links", JSONB(), nullable=True))
    op.add_column("creators", sa.Column("featured", sa.Boolean(), server_default="false", nullable=False))
 def downgrade() -> None:
    op.drop_column("creators", "featured")
    op.drop_column("creators", "social_links")
    op.drop_column("creators", "bio")
--- a/alembic/versions/016_add_users_and_invite_codes.py
+++ b/alembic/versions/016_add_users_and_invite_codes.py
@ -0,0 +1,52 @@
 """Add users and invite_codes tables for creator authentication.
 Revision ID: 016_add_users_and_invite_codes
 Revises: 015_add_creator_profile
 """
 from alembic import op
 revision = "016_add_users_and_invite_codes"
 down_revision = "015_add_creator_profile"
 branch_labels = None
 depends_on = None
 def upgrade() -> None:
    # Use raw SQL to avoid SQLAlchemy's Enum double-creation bug with asyncpg
    op.execute("""
        DO $$ BEGIN
            CREATE TYPE user_role AS ENUM ('creator', 'admin');
        EXCEPTION WHEN duplicate_object THEN NULL;
        END $$
    """)
    op.execute("""
        CREATE TABLE IF NOT EXISTS users (
            id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
            email VARCHAR(255) NOT NULL UNIQUE,
            hashed_password VARCHAR(255) NOT NULL,
            display_name VARCHAR(255) NOT NULL,
            role user_role NOT NULL DEFAULT 'creator',
            creator_id UUID REFERENCES creators(id) ON DELETE SET NULL,
            is_active BOOLEAN NOT NULL DEFAULT TRUE,
            created_at TIMESTAMP NOT NULL DEFAULT now(),
            updated_at TIMESTAMP NOT NULL DEFAULT now()
        )
    """)
    op.execute("""
        CREATE TABLE IF NOT EXISTS invite_codes (
            id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
            code VARCHAR(100) NOT NULL UNIQUE,
            uses_remaining INTEGER NOT NULL DEFAULT 1,
            created_by UUID REFERENCES users(id) ON DELETE SET NULL,
            expires_at TIMESTAMP,
            created_at TIMESTAMP NOT NULL DEFAULT now()
        )
    """)
 def downgrade() -> None:
    op.execute("DROP TABLE IF EXISTS invite_codes")
    op.execute("DROP TABLE IF EXISTS users")
    op.execute("DROP TYPE IF EXISTS user_role")
--- a/alembic/versions/017_add_consent_tables.py
+++ b/alembic/versions/017_add_consent_tables.py
@ -0,0 +1,51 @@
 """Add video_consents and consent_audit_log tables for per-video consent management.
 Revision ID: 017_add_consent_tables
 Revises: 016_add_users_and_invite_codes
 """
 from alembic import op
 import sqlalchemy as sa
 from sqlalchemy.dialects.postgresql import UUID
 revision = "017_add_consent_tables"
 down_revision = "016_add_users_and_invite_codes"
 branch_labels = None
 depends_on = None
 def upgrade() -> None:
    # Create video_consents table
    op.create_table(
        "video_consents",
        sa.Column("id", UUID(as_uuid=True), primary_key=True, server_default=sa.func.gen_random_uuid()),
        sa.Column("source_video_id", UUID(as_uuid=True), sa.ForeignKey("source_videos.id", ondelete="CASCADE"), nullable=False),
        sa.Column("creator_id", UUID(as_uuid=True), sa.ForeignKey("creators.id", ondelete="CASCADE"), nullable=False),
        sa.Column("kb_inclusion", sa.Boolean(), nullable=False, server_default="false"),
        sa.Column("training_usage", sa.Boolean(), nullable=False, server_default="false"),
        sa.Column("public_display", sa.Boolean(), nullable=False, server_default="true"),
        sa.Column("updated_by", UUID(as_uuid=True), sa.ForeignKey("users.id", ondelete="RESTRICT"), nullable=False),
        sa.Column("created_at", sa.DateTime(), nullable=False, server_default=sa.func.now()),
        sa.Column("updated_at", sa.DateTime(), nullable=False, server_default=sa.func.now()),
        sa.UniqueConstraint("source_video_id", name="uq_video_consent_video"),
    )
    # Create consent_audit_log table
    op.create_table(
        "consent_audit_log",
        sa.Column("id", UUID(as_uuid=True), primary_key=True, server_default=sa.func.gen_random_uuid()),
        sa.Column("video_consent_id", UUID(as_uuid=True), sa.ForeignKey("video_consents.id", ondelete="CASCADE"), nullable=False),
        sa.Column("version", sa.Integer(), nullable=False),
        sa.Column("field_name", sa.String(50), nullable=False),
        sa.Column("old_value", sa.Boolean(), nullable=True),
        sa.Column("new_value", sa.Boolean(), nullable=False),
        sa.Column("changed_by", UUID(as_uuid=True), sa.ForeignKey("users.id", ondelete="RESTRICT"), nullable=False),
        sa.Column("ip_address", sa.String(45), nullable=True),
        sa.Column("created_at", sa.DateTime(), nullable=False, server_default=sa.func.now()),
    )
    op.create_index("ix_consent_audit_log_video_consent_id", "consent_audit_log", ["video_consent_id"])
 def downgrade() -> None:
    op.drop_index("ix_consent_audit_log_video_consent_id", table_name="consent_audit_log")
    op.drop_table("consent_audit_log")
    op.drop_table("video_consents")
--- a/alembic/versions/018_add_impersonation_log.py
+++ b/alembic/versions/018_add_impersonation_log.py
@ -0,0 +1,37 @@
 """Add impersonation_log table for admin impersonation audit trail.
 Revision ID: 018_add_impersonation_log
 Revises: 017_add_consent_tables
 """
 from alembic import op
 import sqlalchemy as sa
 from sqlalchemy.dialects.postgresql import UUID
 revision = "018_add_impersonation_log"
 down_revision = "017_add_consent_tables"
 branch_labels = None
 depends_on = None
 def upgrade() -> None:
    op.create_table(
        "impersonation_log",
        sa.Column("id", UUID(as_uuid=True), primary_key=True, server_default=sa.text("gen_random_uuid()")),
        sa.Column("admin_user_id", UUID(as_uuid=True), sa.ForeignKey("users.id", ondelete="CASCADE"), nullable=False),
        sa.Column("target_user_id", UUID(as_uuid=True), sa.ForeignKey("users.id", ondelete="CASCADE"), nullable=False),
        sa.Column("action", sa.String(10), nullable=False),  # 'start' or 'stop'
        sa.Column("ip_address", sa.String(45), nullable=True),
        sa.Column("created_at", sa.DateTime, server_default=sa.func.now(), nullable=False),
    )
    op.create_index("ix_impersonation_log_admin", "impersonation_log", ["admin_user_id"])
    op.create_index("ix_impersonation_log_target", "impersonation_log", ["target_user_id"])
    op.create_index("ix_impersonation_log_created", "impersonation_log", ["created_at"])
 def downgrade() -> None:
    op.drop_index("ix_impersonation_log_created")
    op.drop_index("ix_impersonation_log_target")
    op.drop_index("ix_impersonation_log_admin")
    op.drop_table("impersonation_log")
--- a/alembic/versions/019_add_highlight_candidates.py
+++ b/alembic/versions/019_add_highlight_candidates.py
@ -0,0 +1,44 @@
 """Add highlight_candidates table for highlight detection scoring.
 Revision ID: 019_add_highlight_candidates
 Revises: 018_add_impersonation_log
 """
 from alembic import op
 import sqlalchemy as sa
 from sqlalchemy.dialects.postgresql import UUID
 revision = "019_add_highlight_candidates"
 down_revision = "018_add_impersonation_log"
 branch_labels = None
 depends_on = None
 def upgrade() -> None:
    # Pure SQL — idempotent with IF NOT EXISTS / exception guards
    op.execute("DO $$ BEGIN CREATE TYPE highlight_status AS ENUM ('candidate', 'approved', 'rejected'); EXCEPTION WHEN duplicate_object THEN NULL; END $$")
    op.execute("""
        CREATE TABLE IF NOT EXISTS highlight_candidates (
            id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
            key_moment_id UUID NOT NULL UNIQUE REFERENCES key_moments(id) ON DELETE CASCADE,
            source_video_id UUID NOT NULL REFERENCES source_videos(id) ON DELETE CASCADE,
            score FLOAT NOT NULL,
            score_breakdown JSONB,
            duration_secs FLOAT NOT NULL,
            status highlight_status NOT NULL DEFAULT 'candidate',
            created_at TIMESTAMP NOT NULL DEFAULT now(),
            updated_at TIMESTAMP NOT NULL DEFAULT now()
        )
    """)
    op.execute("CREATE INDEX IF NOT EXISTS ix_highlight_candidates_source_video_id ON highlight_candidates (source_video_id)")
    op.execute("CREATE INDEX IF NOT EXISTS ix_highlight_candidates_score_desc ON highlight_candidates (score DESC)")
    op.execute("CREATE INDEX IF NOT EXISTS ix_highlight_candidates_status ON highlight_candidates (status)")
 def downgrade() -> None:
    op.drop_index("ix_highlight_candidates_status")
    op.drop_index("ix_highlight_candidates_score_desc")
    op.drop_index("ix_highlight_candidates_source_video_id")
    op.drop_table("highlight_candidates")
    sa.Enum(name="highlight_status").drop(op.get_bind(), checkfirst=True)
--- a/alembic/versions/020_add_chapter_status_and_sort_order.py
+++ b/alembic/versions/020_add_chapter_status_and_sort_order.py
@ -0,0 +1,29 @@
 """Add chapter_status and sort_order columns to key_moments.
 Revision ID: 020_add_chapter_status_and_sort_order
 Revises: 019_add_highlight_candidates
 """
 from alembic import op
 import sqlalchemy as sa
 revision = "020_add_chapter_status_and_sort_order"
 down_revision = "019_add_highlight_candidates"
 branch_labels = None
 depends_on = None
 def upgrade() -> None:
    # Pure SQL to avoid SQLAlchemy enum creation hooks
    op.execute("DO $$ BEGIN CREATE TYPE chapter_status AS ENUM ('draft', 'approved', 'hidden'); EXCEPTION WHEN duplicate_object THEN NULL; END $$")
    op.execute("ALTER TABLE key_moments ADD COLUMN IF NOT EXISTS chapter_status chapter_status NOT NULL DEFAULT 'draft'")
    op.execute("ALTER TABLE key_moments ADD COLUMN IF NOT EXISTS sort_order INTEGER NOT NULL DEFAULT 0")
    op.execute("CREATE INDEX IF NOT EXISTS ix_key_moments_chapter_status ON key_moments (chapter_status)")
 def downgrade() -> None:
    op.drop_index("ix_key_moments_chapter_status")
    op.drop_column("key_moments", "sort_order")
    op.drop_column("key_moments", "chapter_status")
    sa.Enum(name="chapter_status").drop(op.get_bind(), checkfirst=True)
--- a/alembic/versions/021_add_highlight_trim_columns.py
+++ b/alembic/versions/021_add_highlight_trim_columns.py
@ -0,0 +1,24 @@
 """Add trim_start and trim_end columns to highlight_candidates.
 Revision ID: 021_add_highlight_trim_columns
 Revises: 020_add_chapter_status_and_sort_order
 """
 from alembic import op
 import sqlalchemy as sa
 revision = "021_add_highlight_trim_columns"
 down_revision = "020_add_chapter_status_and_sort_order"
 branch_labels = None
 depends_on = None
 def upgrade() -> None:
    op.add_column("highlight_candidates", sa.Column("trim_start", sa.Float(), nullable=True))
    op.add_column("highlight_candidates", sa.Column("trim_end", sa.Float(), nullable=True))
 def downgrade() -> None:
    op.drop_column("highlight_candidates", "trim_end")
    op.drop_column("highlight_candidates", "trim_start")
--- a/alembic/versions/022_add_creator_follows.py
+++ b/alembic/versions/022_add_creator_follows.py
@ -0,0 +1,31 @@
 """Add creator_follows table for user follow system.
 Revision ID: 022_add_creator_follows
 Revises: 021_add_highlight_trim_columns
 """
 from alembic import op
 revision = "022_add_creator_follows"
 down_revision = "021_add_highlight_trim_columns"
 branch_labels = None
 depends_on = None
 def upgrade() -> None:
    op.execute("""
        CREATE TABLE IF NOT EXISTS creator_follows (
            id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
            user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
            creator_id UUID NOT NULL REFERENCES creators(id) ON DELETE CASCADE,
            created_at TIMESTAMP NOT NULL DEFAULT now(),
            CONSTRAINT uq_creator_follow_user_creator UNIQUE (user_id, creator_id)
        )
    """)
    op.execute("CREATE INDEX IF NOT EXISTS ix_creator_follows_user_id ON creator_follows (user_id)")
    op.execute("CREATE INDEX IF NOT EXISTS ix_creator_follows_creator_id ON creator_follows (creator_id)")
 def downgrade() -> None:
    op.execute("DROP TABLE IF EXISTS creator_follows")
--- a/alembic/versions/023_add_personality_profile.py
+++ b/alembic/versions/023_add_personality_profile.py
@ -0,0 +1,21 @@
 """Add personality_profile JSONB column to creators.
 Revision ID: 023_add_personality_profile
 Revises: 022_add_creator_follows
 """
 from alembic import op
 revision = "023_add_personality_profile"
 down_revision = "022_add_creator_follows"
 branch_labels = None
 depends_on = None
 def upgrade() -> None:
    op.execute("ALTER TABLE creators ADD COLUMN IF NOT EXISTS personality_profile JSONB")
 def downgrade() -> None:
    op.execute("ALTER TABLE creators DROP COLUMN IF EXISTS personality_profile")
--- a/alembic/versions/024_add_posts_and_attachments.py
+++ b/alembic/versions/024_add_posts_and_attachments.py
@ -0,0 +1,44 @@
 """Add posts and post_attachments tables.
 Revision ID: 024_add_posts_and_attachments
 Revises: 023_add_personality_profile
 """
 import sqlalchemy as sa
 from sqlalchemy.dialects.postgresql import JSONB, UUID
 from alembic import op
 revision = "024_add_posts_and_attachments"
 down_revision = "023_add_personality_profile"
 branch_labels = None
 depends_on = None
 def upgrade() -> None:
    op.create_table(
        "posts",
        sa.Column("id", UUID(as_uuid=True), primary_key=True, server_default=sa.func.gen_random_uuid()),
        sa.Column("creator_id", UUID(as_uuid=True), sa.ForeignKey("creators.id", ondelete="CASCADE"), nullable=False, index=True),
        sa.Column("title", sa.String(500), nullable=False),
        sa.Column("body_json", JSONB, nullable=False),
        sa.Column("is_published", sa.Boolean, nullable=False, server_default="false"),
        sa.Column("created_at", sa.DateTime, nullable=False, server_default=sa.func.now()),
        sa.Column("updated_at", sa.DateTime, nullable=False, server_default=sa.func.now()),
    )
    op.create_table(
        "post_attachments",
        sa.Column("id", UUID(as_uuid=True), primary_key=True, server_default=sa.func.gen_random_uuid()),
        sa.Column("post_id", UUID(as_uuid=True), sa.ForeignKey("posts.id", ondelete="CASCADE"), nullable=False, index=True),
        sa.Column("filename", sa.String(500), nullable=False),
        sa.Column("object_key", sa.String(1000), nullable=False),
        sa.Column("content_type", sa.String(255), nullable=False),
        sa.Column("size_bytes", sa.BigInteger, nullable=False),
        sa.Column("created_at", sa.DateTime, nullable=False, server_default=sa.func.now()),
    )
 def downgrade() -> None:
    op.drop_table("post_attachments")
    op.drop_table("posts")
--- a/alembic/versions/025_add_generated_shorts.py
+++ b/alembic/versions/025_add_generated_shorts.py
@ -0,0 +1,45 @@
 """Add generated_shorts table with format_preset and short_status enums.
 Revision ID: 025_add_generated_shorts
 Revises: 024_add_posts_and_attachments
 """
 import sqlalchemy as sa
 from sqlalchemy.dialects.postgresql import UUID
 from alembic import op
 revision = "025_add_generated_shorts"
 down_revision = "024_add_posts_and_attachments"
 branch_labels = None
 depends_on = None
 format_preset_enum = sa.Enum("vertical", "square", "horizontal", name="format_preset")
 short_status_enum = sa.Enum("pending", "processing", "complete", "failed", name="short_status")
 def upgrade() -> None:
    format_preset_enum.create(op.get_bind(), checkfirst=True)
    short_status_enum.create(op.get_bind(), checkfirst=True)
    op.create_table(
        "generated_shorts",
        sa.Column("id", UUID(as_uuid=True), primary_key=True, server_default=sa.func.gen_random_uuid()),
        sa.Column("highlight_candidate_id", UUID(as_uuid=True), sa.ForeignKey("highlight_candidates.id", ondelete="CASCADE"), nullable=False, index=True),
        sa.Column("format_preset", format_preset_enum, nullable=False),
        sa.Column("minio_object_key", sa.String(1000), nullable=True),
        sa.Column("duration_secs", sa.Float, nullable=True),
        sa.Column("width", sa.Integer, nullable=False),
        sa.Column("height", sa.Integer, nullable=False),
        sa.Column("file_size_bytes", sa.BigInteger, nullable=True),
        sa.Column("status", short_status_enum, nullable=False, server_default="pending"),
        sa.Column("error_message", sa.Text, nullable=True),
        sa.Column("created_at", sa.DateTime, nullable=False, server_default=sa.func.now()),
        sa.Column("updated_at", sa.DateTime, nullable=False, server_default=sa.func.now()),
    )
 def downgrade() -> None:
    op.drop_table("generated_shorts")
    short_status_enum.drop(op.get_bind(), checkfirst=True)
    format_preset_enum.drop(op.get_bind(), checkfirst=True)
--- a/alembic/versions/026_add_share_token.py
+++ b/alembic/versions/026_add_share_token.py
@ -0,0 +1,45 @@
 """Add share_token column to generated_shorts for public sharing.
 Revision ID: 026_add_share_token
 Revises: 025_add_generated_shorts
 """
 import secrets
 import sqlalchemy as sa
 from sqlalchemy.dialects.postgresql import UUID
 from alembic import op
 revision = "026_add_share_token"
 down_revision = "025_add_generated_shorts"
 branch_labels = None
 depends_on = None
 def upgrade() -> None:
    # Add nullable column first
    op.add_column(
        "generated_shorts",
        sa.Column("share_token", sa.String(16), nullable=True),
    )
    # Backfill existing complete shorts with unique tokens
    conn = op.get_bind()
    rows = conn.execute(
        sa.text("SELECT id FROM generated_shorts WHERE status = 'complete' AND share_token IS NULL")
    ).fetchall()
    for (row_id,) in rows:
        token = secrets.token_urlsafe(8)  # ~11 chars, fits in String(16)
        conn.execute(
            sa.text("UPDATE generated_shorts SET share_token = :token WHERE id = :id"),
            {"token": token, "id": row_id},
        )
    # Create unique index
    op.create_index("ix_generated_shorts_share_token", "generated_shorts", ["share_token"], unique=True)
 def downgrade() -> None:
    op.drop_index("ix_generated_shorts_share_token", table_name="generated_shorts")
    op.drop_column("generated_shorts", "share_token")
--- a/alembic/versions/027_add_captions_enabled.py
+++ b/alembic/versions/027_add_captions_enabled.py
@ -0,0 +1,30 @@
 """Add captions_enabled boolean to generated_shorts.
 Revision ID: 027_add_captions_enabled
 Revises: 026_add_share_token
 """
 import sqlalchemy as sa
 from alembic import op
 revision = "027_add_captions_enabled"
 down_revision = "026_add_share_token"
 branch_labels = None
 depends_on = None
 def upgrade() -> None:
    op.add_column(
        "generated_shorts",
        sa.Column(
            "captions_enabled",
            sa.Boolean(),
            nullable=False,
            server_default=sa.text("false"),
        ),
    )
 def downgrade() -> None:
    op.drop_column("generated_shorts", "captions_enabled")
--- a/alembic/versions/028_add_shorts_template.py
+++ b/alembic/versions/028_add_shorts_template.py
@ -0,0 +1,26 @@
 """Add shorts_template JSONB column to creators.
 Revision ID: 028_add_shorts_template
 Revises: 027_add_captions_enabled
 """
 import sqlalchemy as sa
 from sqlalchemy.dialects.postgresql import JSONB
 from alembic import op
 revision = "028_add_shorts_template"
 down_revision = "027_add_captions_enabled"
 branch_labels = None
 depends_on = None
 def upgrade() -> None:
    op.add_column(
        "creators",
        sa.Column("shorts_template", JSONB, nullable=True),
    )
 def downgrade() -> None:
    op.drop_column("creators", "shorts_template")
--- a/alembic/versions/029_add_email_digest.py
+++ b/alembic/versions/029_add_email_digest.py
@ -0,0 +1,48 @@
 """Add notification_preferences to users and email_digest_log table.
 Revision ID: 029_add_email_digest
 Revises: 028_add_shorts_template
 """
 import sqlalchemy as sa
 from sqlalchemy.dialects.postgresql import JSONB, UUID
 from alembic import op
 revision = "029_add_email_digest"
 down_revision = "028_add_shorts_template"
 branch_labels = None
 depends_on = None
 def upgrade() -> None:
    # notification_preferences JSONB on users
    op.add_column(
        "users",
        sa.Column(
            "notification_preferences",
            JSONB,
            nullable=False,
            server_default='{"email_digests": true, "digest_frequency": "daily"}',
        ),
    )
    # email_digest_log table
    op.create_table(
        "email_digest_log",
        sa.Column("id", UUID(as_uuid=True), primary_key=True, server_default=sa.func.gen_random_uuid()),
        sa.Column("user_id", UUID(as_uuid=True), sa.ForeignKey("users.id", ondelete="CASCADE"), nullable=False),
        sa.Column("digest_sent_at", sa.DateTime, server_default=sa.func.now(), nullable=False),
        sa.Column("content_summary", JSONB, nullable=True),
    )
    op.create_index(
        "ix_email_digest_log_user_sent",
        "email_digest_log",
        ["user_id", "digest_sent_at"],
    )
 def downgrade() -> None:
    op.drop_index("ix_email_digest_log_user_sent", table_name="email_digest_log")
    op.drop_table("email_digest_log")
    op.drop_column("users", "notification_preferences")
--- a/alembic/versions/030_add_onboarding_completed.py
+++ b/alembic/versions/030_add_onboarding_completed.py
@ -0,0 +1,31 @@
 """add_onboarding_completed
 Revision ID: 030_onboarding
 Revises: 029
 Create Date: 2026-04-04
 """
 from alembic import op
 import sqlalchemy as sa
 # revision identifiers
 revision = "030_onboarding"
 down_revision = "029_add_email_digest"
 branch_labels = None
 depends_on = None
 def upgrade() -> None:
    op.add_column(
        "users",
        sa.Column(
            "onboarding_completed",
            sa.Boolean(),
            server_default="false",
            nullable=False,
        ),
    )
 def downgrade() -> None:
    op.drop_column("users", "onboarding_completed")
--- a/alembic/versions/031_add_chat_usage_log.py
+++ b/alembic/versions/031_add_chat_usage_log.py
@ -0,0 +1,40 @@
 """add_chat_usage_log
 Revision ID: 031_chat_usage_log
 Revises: 030_onboarding
 Create Date: 2026-04-04
 """
 from alembic import op
 import sqlalchemy as sa
 from sqlalchemy.dialects.postgresql import UUID
 # revision identifiers
 revision = "031_chat_usage_log"
 down_revision = "030_onboarding"
 branch_labels = None
 depends_on = None
 def upgrade() -> None:
    op.create_table(
        "chat_usage_log",
        sa.Column("id", UUID(as_uuid=True), primary_key=True, server_default=sa.func.gen_random_uuid()),
        sa.Column("user_id", UUID(as_uuid=True), sa.ForeignKey("users.id", ondelete="SET NULL"), nullable=True),
        sa.Column("client_ip", sa.String(45), nullable=True),
        sa.Column("creator_slug", sa.String(255), nullable=True),
        sa.Column("query", sa.Text(), nullable=False),
        sa.Column("prompt_tokens", sa.Integer(), nullable=False, server_default="0"),
        sa.Column("completion_tokens", sa.Integer(), nullable=False, server_default="0"),
        sa.Column("total_tokens", sa.Integer(), nullable=False, server_default="0"),
        sa.Column("cascade_tier", sa.String(50), nullable=True),
        sa.Column("model", sa.String(100), nullable=True),
        sa.Column("latency_ms", sa.Float(), nullable=True),
        sa.Column("created_at", sa.DateTime(), nullable=False, server_default=sa.func.now()),
    )
    op.create_index("ix_chat_usage_log_created_at", "chat_usage_log", ["created_at"])
 def downgrade() -> None:
    op.drop_index("ix_chat_usage_log_created_at", table_name="chat_usage_log")
    op.drop_table("chat_usage_log")
--- a/backend/auth.py
+++ b/backend/auth.py
@ -0,0 +1,193 @@
 """Authentication utilities — password hashing, JWT, FastAPI dependencies."""
 from __future__ import annotations
 import uuid
 from datetime import datetime, timedelta, timezone
 from typing import Annotated
 import bcrypt
 import jwt
 from fastapi import Depends, HTTPException, status
 from fastapi.security import OAuth2PasswordBearer
 from sqlalchemy import select
 from sqlalchemy.ext.asyncio import AsyncSession
 from config import get_settings
 from database import get_session
 from models import User, UserRole
 # ── Password hashing ─────────────────────────────────────────────────────────
 def hash_password(plain: str) -> str:
    """Hash a plaintext password with bcrypt."""
    return bcrypt.hashpw(plain.encode("utf-8"), bcrypt.gensalt()).decode("utf-8")
 def verify_password(plain: str, hashed: str) -> bool:
    """Verify a plaintext password against a bcrypt hash."""
    return bcrypt.checkpw(plain.encode("utf-8"), hashed.encode("utf-8"))
 # ── JWT ──────────────────────────────────────────────────────────────────────
 _ALGORITHM = "HS256"
 _ACCESS_TOKEN_EXPIRE_MINUTES = 60 * 24  # 24 hours
 oauth2_scheme = OAuth2PasswordBearer(tokenUrl="/api/v1/auth/login")
 def create_access_token(
    user_id: uuid.UUID | str,
    role: str,
    *,
    expires_minutes: int = _ACCESS_TOKEN_EXPIRE_MINUTES,
 ) -> str:
    """Create a signed JWT with user_id and role claims."""
    settings = get_settings()
    now = datetime.now(timezone.utc)
    payload = {
        "sub": str(user_id),
        "role": role,
        "iat": now,
        "exp": now + timedelta(minutes=expires_minutes),
    }
    return jwt.encode(payload, settings.app_secret_key, algorithm=_ALGORITHM)
 _IMPERSONATION_EXPIRE_MINUTES = 60  # 1 hour
 def create_impersonation_token(
    admin_user_id: uuid.UUID | str,
    target_user_id: uuid.UUID | str,
    target_role: str,
    *,
    write_mode: bool = False,
 ) -> str:
    """Create a scoped JWT for admin impersonation.
    The token has sub=target_user_id so get_current_user loads the target,
    plus original_user_id so the system knows it's impersonation.
    When write_mode is True, the token allows write operations.
    """
    settings = get_settings()
    now = datetime.now(timezone.utc)
    payload = {
        "sub": str(target_user_id),
        "role": target_role,
        "original_user_id": str(admin_user_id),
        "type": "impersonation",
        "iat": now,
        "exp": now + timedelta(minutes=_IMPERSONATION_EXPIRE_MINUTES),
    }
    if write_mode:
        payload["write_mode"] = True
    return jwt.encode(payload, settings.app_secret_key, algorithm=_ALGORITHM)
 def decode_access_token(token: str) -> dict:
    """Decode and validate a JWT. Raises on expiry or malformed tokens."""
    settings = get_settings()
    try:
        payload = jwt.decode(
            token,
            settings.app_secret_key,
            algorithms=[_ALGORITHM],
            options={"require": ["sub", "role", "exp"]},
        )
    except jwt.ExpiredSignatureError:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Token has expired",
        )
    except jwt.InvalidTokenError as exc:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail=f"Invalid token: {exc}",
        )
    return payload
 # ── FastAPI dependencies ─────────────────────────────────────────────────────
 async def get_current_user(
    token: Annotated[str, Depends(oauth2_scheme)],
    session: Annotated[AsyncSession, Depends(get_session)],
 ) -> User:
    """Decode JWT, load User from DB, raise 401 if missing or inactive.
    If the token contains an original_user_id claim (impersonation),
    sets _impersonating_admin_id on the returned user object.
    """
    payload = decode_access_token(token)
    user_id = payload.get("sub")
    result = await session.execute(select(User).where(User.id == user_id))
    user = result.scalar_one_or_none()
    if user is None or not user.is_active:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="User not found or inactive",
        )
    # Attach impersonation metadata (non-column runtime attribute)
    user._impersonating_admin_id = payload.get("original_user_id")  # type: ignore[attr-defined]
    user._impersonation_write_mode = payload.get("write_mode", False)  # type: ignore[attr-defined]
    return user
 _optional_oauth2 = OAuth2PasswordBearer(tokenUrl="/api/v1/auth/login", auto_error=False)
 async def get_optional_user(
    token: Annotated[str | None, Depends(_optional_oauth2)],
    session: Annotated[AsyncSession, Depends(get_session)],
 ) -> User | None:
    """Like get_current_user but returns None instead of 401 when no token."""
    if token is None:
        return None
    try:
        payload = decode_access_token(token)
    except HTTPException:
        return None
    user_id = payload.get("sub")
    result = await session.execute(select(User).where(User.id == user_id))
    user = result.scalar_one_or_none()
    if user is None or not user.is_active:
        return None
    return user
 def require_role(required_role: UserRole):
    """Return a dependency that checks the current user has the given role."""
    async def _check(
        current_user: Annotated[User, Depends(get_current_user)],
    ) -> User:
        if current_user.role != required_role:
            raise HTTPException(
                status_code=status.HTTP_403_FORBIDDEN,
                detail=f"Requires {required_role.value} role",
            )
        return current_user
    return _check
 async def reject_impersonation(
    current_user: Annotated[User, Depends(get_current_user)],
 ) -> User:
    """Dependency that blocks write operations during impersonation.
    If the impersonation token was issued with write_mode=True,
    writes are permitted.
    """
    admin_id = getattr(current_user, "_impersonating_admin_id", None)
    if admin_id is not None:
        write_mode = getattr(current_user, "_impersonation_write_mode", False)
        if not write_mode:
            raise HTTPException(
                status_code=status.HTTP_403_FORBIDDEN,
                detail="Write operations are not allowed during impersonation",
            )
    return current_user
--- a/backend/chat_service.py
+++ b/backend/chat_service.py
@ -0,0 +1,519 @@
 """Chat service: retrieve context via search, stream LLM response as SSE events.
 Assembles a numbered context block from search results, then streams
 completion tokens from an OpenAI-compatible API. Yields SSE-formatted
 events: sources, token, done, and error.
 Multi-turn memory: When a conversation_id is provided, prior messages are
 loaded from Redis, injected into the LLM messages array, and the new
 user+assistant turn is appended after streaming completes. History is
 capped at 10 turn pairs (20 messages) and expires after 1 hour of
 inactivity.
 """
 from __future__ import annotations
 import json
 import logging
 import time
 import traceback
 import uuid
 from typing import Any, AsyncIterator
 import openai
 from sqlalchemy import select
 from sqlalchemy.ext.asyncio import AsyncSession
 from config import Settings
 from models import Creator
 from search_service import SearchService
 logger = logging.getLogger("chrysopedia.chat")
 _SYSTEM_PROMPT_TEMPLATE = """\
 You are Chrysopedia, an expert assistant for music production techniques — \
 synthesis, sound design, mixing, sampling, and audio processing.
 ## Rules
 - Use ONLY the numbered sources below. Do not invent facts.
 - Cite every factual claim inline with [N] immediately after the claim \
 (e.g. "Parallel compression adds sustain [2] while preserving transients [1].").
 - When sources disagree, present both perspectives with their citations.
 - If the sources lack enough information, say so honestly.
 ## Response format
 - Aim for 2–4 short paragraphs. Expand only when the question warrants detail.
 - Use bullet lists for steps, signal chains, or parameter lists.
 - **Bold** key terms on first mention.
 - Use audio/synthesis/mixing terminology naturally — do not over-explain \
 standard concepts (e.g. LFO, sidechain, wet/dry) unless the user asks.
 Sources:
 {context_block}
 """
 _MAX_CONTEXT_SOURCES = 10
 _MAX_TURN_PAIRS = 10
 _HISTORY_TTL_SECONDS = 3600  # 1 hour
 def _redis_key(conversation_id: str) -> str:
    return f"chrysopedia:chat:{conversation_id}"
 class ChatService:
    """Retrieve context from search, stream an LLM response with citations."""
    def __init__(self, settings: Settings, redis=None) -> None:
        self.settings = settings
        self._search = SearchService(settings)
        self._openai = openai.AsyncOpenAI(
            base_url=settings.llm_api_url,
            api_key=settings.llm_api_key,
        )
        self._fallback_openai = openai.AsyncOpenAI(
            base_url=settings.llm_fallback_url,
            api_key=settings.llm_api_key,
        )
        self._redis = redis
    async def _load_history(self, conversation_id: str) -> list[dict[str, str]]:
        """Load conversation history from Redis. Returns empty list on miss."""
        if not self._redis:
            return []
        try:
            raw = await self._redis.get(_redis_key(conversation_id))
            if raw:
                return json.loads(raw)
        except Exception:
            logger.warning("chat_history_load_error cid=%s", conversation_id, exc_info=True)
        return []
    async def _save_history(
        self,
        conversation_id: str,
        history: list[dict[str, str]],
        user_msg: str,
        assistant_msg: str,
    ) -> None:
        """Append the new turn pair and persist to Redis with TTL refresh."""
        if not self._redis:
            return
        history.append({"role": "user", "content": user_msg})
        history.append({"role": "assistant", "content": assistant_msg})
        # Cap at _MAX_TURN_PAIRS (keep most recent)
        if len(history) > _MAX_TURN_PAIRS * 2:
            history = history[-_MAX_TURN_PAIRS * 2:]
        try:
            await self._redis.set(
                _redis_key(conversation_id),
                json.dumps(history),
                ex=_HISTORY_TTL_SECONDS,
            )
        except Exception:
            logger.warning("chat_history_save_error cid=%s", conversation_id, exc_info=True)
    async def _inject_personality(
        self,
        system_prompt: str,
        db: AsyncSession,
        creator_name: str,
        weight: float,
    ) -> str:
        """Query creator personality_profile and append a voice block to the system prompt.
        Falls back to the unmodified prompt on DB error, missing creator, or null profile.
        """
        try:
            result = await db.execute(
                select(Creator).where(Creator.name == creator_name)
            )
            creator_row = result.scalars().first()
        except Exception:
            logger.warning("chat_personality_db_error creator=%r", creator_name, exc_info=True)
            return system_prompt
        if creator_row is None or creator_row.personality_profile is None:
            logger.debug("chat_personality_skip creator=%r reason=%s",
                         creator_name,
                         "not_found" if creator_row is None else "null_profile")
            return system_prompt
        profile = creator_row.personality_profile
        voice_block = _build_personality_block(creator_name, profile, weight)
        return system_prompt + "\n\n" + voice_block
    async def _log_usage(
        self,
        db: AsyncSession,
        user_id: Any | None,
        client_ip: str | None,
        creator_slug: str | None,
        query: str,
        usage: dict[str, int],
        cascade_tier: str,
        model: str,
        latency_ms: float,
    ) -> None:
        """Insert a ChatUsageLog row. Non-blocking — errors logged, not raised."""
        try:
            from models import ChatUsageLog
            log_entry = ChatUsageLog(
                user_id=user_id,
                client_ip=client_ip,
                creator_slug=creator_slug,
                query=query[:2000],  # truncate very long queries
                prompt_tokens=usage.get("prompt_tokens", 0),
                completion_tokens=usage.get("completion_tokens", 0),
                total_tokens=usage.get("total_tokens", 0),
                cascade_tier=cascade_tier,
                model=model,
                latency_ms=latency_ms,
            )
            db.add(log_entry)
            await db.commit()
        except Exception:
            logger.error(
                "chat_usage_log_insert_error user=%s ip=%s",
                user_id, client_ip, exc_info=True,
            )
            try:
                await db.rollback()
            except Exception:
                pass
    async def stream_response(
        self,
        query: str,
        db: AsyncSession,
        creator: str | None = None,
        conversation_id: str | None = None,
        personality_weight: float = 0.0,
        user_id: Any | None = None,
        client_ip: str | None = None,
    ) -> AsyncIterator[str]:
        """Yield SSE-formatted events for a chat query.
        Protocol:
        1. ``event: sources\ndata: <json array of citation metadata>\n\n``
        2. ``event: token\ndata: <text chunk>\n\n`` (repeated)
        3. ``event: done\ndata: <json with cascade_tier, conversation_id>\n\n``
        On error: ``event: error\ndata: <json with message>\n\n``
        """
        start = time.monotonic()
        # Assign conversation_id if not provided (single-turn becomes trackable)
        if conversation_id is None:
            conversation_id = str(uuid.uuid4())
        # ── 0. Load conversation history ────────────────────────────────
        history = await self._load_history(conversation_id)
        # ── 1. Retrieve context via search ──────────────────────────────
        try:
            search_result = await self._search.search(
                query=query,
                scope="all",
                limit=_MAX_CONTEXT_SOURCES,
                db=db,
                creator=creator,
            )
        except Exception:
            logger.exception("chat_search_error query=%r creator=%r", query, creator)
            yield _sse("error", {"message": "Search failed"})
            return
        items: list[dict[str, Any]] = search_result.get("items", [])
        cascade_tier: str = search_result.get("cascade_tier", "")
        # ── 2. Build citation metadata and context block ────────────────
        sources = _build_sources(items)
        context_block = _build_context_block(items)
        logger.info(
            "chat_search query=%r creator=%r cascade_tier=%s source_count=%d cid=%s",
            query, creator, cascade_tier, len(sources), conversation_id,
        )
        # Emit sources event first
        yield _sse("sources", sources)
        # ── 3. Stream LLM completion ────────────────────────────────────
        system_prompt = _SYSTEM_PROMPT_TEMPLATE.format(context_block=context_block)
        # Inject creator personality voice when weight > 0
        if personality_weight > 0 and creator:
            system_prompt = await self._inject_personality(
                system_prompt, db, creator, personality_weight,
            )
        # Scale temperature with personality weight: 0.3 (encyclopedic) → 0.5 (full personality)
        temperature = 0.3 + (personality_weight * 0.2)
        messages: list[dict[str, str]] = [
            {"role": "system", "content": system_prompt},
        ]
        # Inject conversation history between system prompt and current query
        messages.extend(history)
        messages.append({"role": "user", "content": query})
        accumulated_response = ""
        usage_data: dict[str, int] | None = None
        fallback_used = False
        try:
            stream = await self._openai.chat.completions.create(
                model=self.settings.llm_model,
                messages=messages,
                stream=True,
                stream_options={"include_usage": True},
                temperature=temperature,
                max_tokens=2048,
            )
            async for chunk in stream:
                # The final chunk with stream_options carries usage in chunk.usage
                if hasattr(chunk, "usage") and chunk.usage is not None:
                    usage_data = {
                        "prompt_tokens": chunk.usage.prompt_tokens or 0,
                        "completion_tokens": chunk.usage.completion_tokens or 0,
                        "total_tokens": chunk.usage.total_tokens or 0,
                    }
                choice = chunk.choices[0] if chunk.choices else None
                if choice and choice.delta and choice.delta.content:
                    text = choice.delta.content
                    accumulated_response += text
                    yield _sse("token", text)
        except (openai.APIConnectionError, openai.APITimeoutError, openai.InternalServerError) as exc:
            logger.warning(
                "chat_llm_fallback primary failed (%s: %s), retrying with fallback at %s",
                type(exc).__name__, exc, self.settings.llm_fallback_url,
            )
            fallback_used = True
            accumulated_response = ""
            usage_data = None
            try:
                stream = await self._fallback_openai.chat.completions.create(
                    model=self.settings.llm_fallback_model,
                    messages=messages,
                    stream=True,
                    stream_options={"include_usage": True},
                    temperature=temperature,
                    max_tokens=2048,
                )
                async for chunk in stream:
                    if hasattr(chunk, "usage") and chunk.usage is not None:
                        usage_data = {
                            "prompt_tokens": chunk.usage.prompt_tokens or 0,
                            "completion_tokens": chunk.usage.completion_tokens or 0,
                            "total_tokens": chunk.usage.total_tokens or 0,
                        }
                    choice = chunk.choices[0] if chunk.choices else None
                    if choice and choice.delta and choice.delta.content:
                        text = choice.delta.content
                        accumulated_response += text
                        yield _sse("token", text)
            except Exception:
                tb = traceback.format_exc()
                logger.error("chat_llm_error fallback also failed query=%r cid=%s\n%s", query, conversation_id, tb)
                yield _sse("error", {"message": "LLM generation failed"})
                return
        except Exception:
            tb = traceback.format_exc()
            logger.error("chat_llm_error query=%r cid=%s\n%s", query, conversation_id, tb)
            yield _sse("error", {"message": "LLM generation failed"})
            return
        # ── 4. Save conversation history ────────────────────────────────
        await self._save_history(conversation_id, history, query, accumulated_response)
        # ── 5. Log token usage ──────────────────────────────────────────
        latency_ms = (time.monotonic() - start) * 1000
        # Fallback: estimate tokens from character counts if stream_options not available
        if usage_data is None:
            prompt_chars = sum(len(m.get("content", "")) for m in messages)
            est_prompt = prompt_chars // 4
            est_completion = len(accumulated_response) // 4
            usage_data = {
                "prompt_tokens": est_prompt,
                "completion_tokens": est_completion,
                "total_tokens": est_prompt + est_completion,
            }
            logger.warning("chat_usage_estimated cid=%s (stream_options usage not available)", conversation_id)
        await self._log_usage(
            db=db,
            user_id=user_id,
            client_ip=client_ip,
            creator_slug=creator,
            query=query,
            usage=usage_data,
            cascade_tier=cascade_tier,
            model=self.settings.llm_fallback_model if fallback_used else self.settings.llm_model,
            latency_ms=latency_ms,
        )
        # ── 6. Done event ───────────────────────────────────────────────
        logger.info(
            "chat_done query=%r creator=%r cascade_tier=%s source_count=%d latency_ms=%.1f cid=%s tokens=%d",
            query, creator, cascade_tier, len(sources), latency_ms, conversation_id,
            usage_data.get("total_tokens", 0),
        )
        yield _sse("done", {"cascade_tier": cascade_tier, "conversation_id": conversation_id, "fallback_used": fallback_used})
 # ── Helpers ──────────────────────────────────────────────────────────────────
 def _sse(event: str, data: Any) -> str:
    """Format a single SSE event string."""
    payload = json.dumps(data) if not isinstance(data, str) else data
    return f"event: {event}\ndata: {payload}\n\n"
 def _build_sources(items: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Build a numbered citation metadata list from search result items."""
    sources: list[dict[str, str]] = []
    for idx, item in enumerate(items, start=1):
        sources.append({
            "number": idx,
            "title": item.get("title", ""),
            "slug": item.get("technique_page_slug", "") or item.get("slug", ""),
            "creator_name": item.get("creator_name", ""),
            "topic_category": item.get("topic_category", ""),
            "summary": (item.get("summary", "") or "")[:200],
            "section_anchor": item.get("section_anchor", ""),
            "section_heading": item.get("section_heading", ""),
            "source_video_id": item.get("source_video_id", ""),
            "start_time": item.get("start_time"),
            "end_time": item.get("end_time"),
            "video_filename": item.get("video_filename", ""),
        })
    return sources
 def _build_context_block(items: list[dict[str, Any]]) -> str:
    """Build a numbered context block string for the LLM system prompt."""
    if not items:
        return "(No sources available)"
    lines: list[str] = []
    for idx, item in enumerate(items, start=1):
        title = item.get("title", "Untitled")
        creator = item.get("creator_name", "")
        summary = item.get("summary", "")
        section = item.get("section_heading", "")
        parts = [f"[{idx}] {title}"]
        if creator:
            parts.append(f"by {creator}")
        if section:
            parts.append(f"— {section}")
        header = " ".join(parts)
        lines.append(header)
        if summary:
            lines.append(f"   {summary}")
        lines.append("")
    return "\n".join(lines)
 def _build_personality_block(creator_name: str, profile: dict[str, Any], weight: float) -> str:
    """Build a personality voice injection block from a creator's personality_profile JSONB.
    The ``weight`` (0.0–1.0) controls progressive inclusion of personality
    fields via 5 tiers of continuous interpolation:
    - < 0.2:  no personality block (empty string)
    - 0.2–0.39: basic tone — teaching_style, formality, energy + subtle hint
    - 0.4–0.59: + descriptors, explanation_approach + adopt-voice instruction
    - 0.6–0.79: + signature_phrases (count scaled by weight) + creator-voice
    - 0.8–0.89: + distinctive_terms, sound_descriptions, sound_words,
                  self_references, pacing + fully-embody instruction
    - >= 0.9:  + full summary paragraph
    """
    if weight < 0.2:
        return ""
    vocab = profile.get("vocabulary", {})
    tone = profile.get("tone", {})
    style = profile.get("style_markers", {})
    teaching_style = tone.get("teaching_style", "")
    energy = tone.get("energy", "moderate")
    formality = tone.get("formality", "conversational")
    descriptors = tone.get("descriptors", [])
    phrases = vocab.get("signature_phrases", [])
    parts: list[str] = []
    # --- Tier 1 (0.2–0.39): basic tone ---
    if weight < 0.4:
        parts.append(
            f"When relevant, subtly reference {creator_name}'s communication style."
        )
    elif weight < 0.6:
        parts.append(f"Adopt {creator_name}'s tone and communication style.")
    elif weight < 0.8:
        parts.append(
            f"Respond as {creator_name} would, using their voice and manner."
        )
    else:
        parts.append(
            f"Fully embody {creator_name} — use their exact phrases, energy, and teaching approach."
        )
    if teaching_style:
        parts.append(f"Teaching style: {teaching_style}.")
    parts.append(f"Match their {formality} {energy} tone.")
    # --- Tier 2 (0.4+): descriptors, explanation_approach, uses_analogies, audience_engagement ---
    if weight >= 0.4:
        if descriptors:
            parts.append(f"Tone: {', '.join(descriptors[:5])}.")
        explanation = style.get("explanation_approach", "")
        if explanation:
            parts.append(f"Explanation approach: {explanation}.")
        if style.get("uses_analogies"):
            parts.append("Use analogies when helpful.")
        if style.get("audience_engagement"):
            parts.append(f"Audience engagement: {style['audience_engagement']}.")
    # --- Tier 3 (0.6+): signature phrases (count scaled by weight) ---
    if weight >= 0.6 and phrases:
        count = max(2, round(weight * len(phrases)))
        parts.append(f"Use their signature phrases: {', '.join(phrases[:count])}.")
    # --- Tier 4 (0.8+): distinctive_terms, sound_descriptions, sound_words, self_references, pacing ---
    if weight >= 0.8:
        distinctive = vocab.get("distinctive_terms", [])
        if distinctive:
            parts.append(f"Distinctive terms: {', '.join(distinctive)}.")
        sound_desc = vocab.get("sound_descriptions", [])
        if sound_desc:
            parts.append(f"Sound descriptions: {', '.join(sound_desc)}.")
        sound_words = style.get("sound_words", [])
        if sound_words:
            parts.append(f"Sound words: {', '.join(sound_words)}.")
        self_refs = style.get("self_references", "")
        if self_refs:
            parts.append(f"Self-references: {self_refs}.")
        pacing = style.get("pacing", "")
        if pacing:
            parts.append(f"Pacing: {pacing}.")
    # --- Tier 5 (0.9+): full summary paragraph ---
    if weight >= 0.9:
        summary = profile.get("summary", "")
        if summary:
            parts.append(summary)
    return " ".join(parts)
--- a/backend/config.py
+++ b/backend/config.py
@ -0,0 +1,115 @@
 """Application configuration loaded from environment variables."""
 from functools import lru_cache
 from pydantic_settings import BaseSettings
 class Settings(BaseSettings):
    """Chrysopedia API settings.
    Values are loaded from environment variables (or .env file via
    pydantic-settings' dotenv support).
    """
    # Database
    database_url: str = "postgresql+asyncpg://chrysopedia:changeme@localhost:5433/chrysopedia"
    # Redis
    redis_url: str = "redis://localhost:6379/0"
    # Application
    app_env: str = "development"
    app_log_level: str = "info"
    app_secret_key: str = "changeme-generate-a-real-secret"
    # CORS
    cors_origins: list[str] = ["*"]
    # LLM endpoint (OpenAI-compatible)
    llm_api_url: str = "http://localhost:11434/v1"
    llm_api_key: str = "sk-placeholder"
    llm_model: str = "fyn-llm-agent-chat"
    llm_fallback_url: str = "http://localhost:11434/v1"
    llm_fallback_model: str = "qwen2.5:7b"
    # Per-stage model overrides (optional — falls back to llm_model / "chat")
    llm_stage2_model: str | None = "fyn-llm-agent-chat"   # segmentation — mechanical, fast chat
    llm_stage2_modality: str = "chat"
    llm_stage3_model: str | None = "fyn-llm-agent-think"  # extraction — reasoning
    llm_stage3_modality: str = "thinking"
    llm_stage4_model: str | None = "fyn-llm-agent-chat"   # classification — mechanical, fast chat
    llm_stage4_modality: str = "chat"
    llm_stage5_model: str | None = "fyn-llm-agent-think"  # synthesis — reasoning
    llm_stage5_modality: str = "thinking"
    # Token limits — static across all stages
    llm_max_tokens_hard_limit: int = 96000   # Hard ceiling for dynamic estimator
    llm_max_tokens: int = 96000              # Fallback when no estimate is provided (must not exceed hard_limit)
    llm_temperature: float = 0.0             # Deterministic output for structured JSON extraction
    # Stage 5 synthesis chunking — max moments per LLM call before splitting
    synthesis_chunk_size: int = 30
    # Embedding endpoint
    embedding_api_url: str = "http://localhost:11434/v1"
    embedding_model: str = "nomic-embed-text"
    embedding_dimensions: int = 768
    # Qdrant
    qdrant_url: str = "http://localhost:6333"
    qdrant_collection: str = "chrysopedia"
    # LightRAG
    lightrag_url: str = "http://chrysopedia-lightrag:9621"
    lightrag_search_timeout: float = 2.0
    lightrag_min_query_length: int = 3
    # Prompt templates
    prompts_path: str = "./prompts"
    # Debug mode — when True, pipeline captures full LLM prompts and responses
    debug_mode: bool = False
    # MinIO (file storage for post attachments)
    minio_url: str = "chrysopedia-minio:9000"
    minio_access_key: str = "chrysopedia"
    minio_secret_key: str = "changeme-minio"
    minio_bucket: str = "chrysopedia"
    minio_secure: bool = False
    # File storage
    transcript_storage_path: str = "/data/transcripts"
    video_metadata_path: str = "/data/video_meta"
    video_source_path: str = "/videos"
    # SMTP (email digests)
    smtp_host: str = ""
    smtp_port: int = 587
    smtp_user: str = ""
    smtp_password: str = ""
    smtp_from_address: str = ""
    smtp_tls: bool = True
    # Public base URL for links in emails and external references
    base_url: str = "http://localhost:8096"
    # Rate limiting (per hour)
    rate_limit_user_per_hour: int = 30
    rate_limit_ip_per_hour: int = 10
    rate_limit_creator_per_hour: int = 60
    # Git commit SHA (set at Docker build time or via env var)
    git_commit_sha: str = "unknown"
    model_config = {
        "env_file": ".env",
        "env_file_encoding": "utf-8",
        "case_sensitive": False,
    }
@lru_cache
 def get_settings() -> Settings:
    """Return cached application settings (singleton)."""
    return Settings()
--- a/backend/database.py
+++ b/backend/database.py
@ -0,0 +1,26 @@
 """Database engine, session factory, and declarative base for Chrysopedia."""
 import os
 from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
 from sqlalchemy.orm import DeclarativeBase
 DATABASE_URL = os.getenv(
    "DATABASE_URL",
    "postgresql+asyncpg://chrysopedia:changeme@localhost:5433/chrysopedia",
 )
 engine = create_async_engine(DATABASE_URL, echo=False, pool_pre_ping=True)
 async_session = async_sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)
 class Base(DeclarativeBase):
    """Declarative base for all ORM models."""
    pass
 async def get_session() -> AsyncSession:  # type: ignore[misc]
    """FastAPI dependency that yields an async DB session."""
    async with async_session() as session:
        yield session
--- a/backend/main.py
+++ b/backend/main.py
@ -0,0 +1,117 @@
 """Chrysopedia API — Knowledge extraction and retrieval system.
 Entry point for the FastAPI application. Configures middleware,
 structured logging, and mounts versioned API routers.
 """
 import logging
 import sys
 from contextlib import asynccontextmanager
 from fastapi import FastAPI
 from fastapi.middleware.cors import CORSMiddleware
 from config import get_settings
 from routers import admin, auth, chat, consent, creator_chapters, creator_dashboard, creator_highlights, creators, files, follows, health, highlights, ingest, notifications, pipeline, posts, reports, search, shorts, shorts_public, stats, techniques, topics, videos
 def _setup_logging() -> None:
    """Configure structured logging to stdout."""
    settings = get_settings()
    level = getattr(logging, settings.app_log_level.upper(), logging.INFO)
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(
        logging.Formatter(
            fmt="%(asctime)s | %(levelname)-8s | %(name)s | %(message)s",
            datefmt="%Y-%m-%dT%H:%M:%S",
        )
    )
    root = logging.getLogger()
    root.setLevel(level)
    # Avoid duplicate handlers on reload
    root.handlers.clear()
    root.addHandler(handler)
    # Quiet noisy libraries
    logging.getLogger("uvicorn.access").setLevel(logging.WARNING)
    logging.getLogger("sqlalchemy.engine").setLevel(logging.WARNING)
@asynccontextmanager
 async def lifespan(app: FastAPI):  # noqa: ARG001
    """Application lifespan: setup on startup, teardown on shutdown."""
    _setup_logging()
    logger = logging.getLogger("chrysopedia")
    settings = get_settings()
    logger.info(
        "Chrysopedia API starting (env=%s, log_level=%s)",
        settings.app_env,
        settings.app_log_level,
    )
    # Ensure MinIO bucket exists (best-effort — API still starts if MinIO is down)
    try:
        from minio_client import ensure_bucket
        ensure_bucket()
        logger.info("MinIO bucket ready")
    except Exception as exc:
        logger.warning("MinIO bucket init failed (will retry on first upload): %s", exc)
    yield
    logger.info("Chrysopedia API shutting down")
 app = FastAPI(
    title="Chrysopedia API",
    description="Knowledge extraction and retrieval for music production content",
    version="0.1.0",
    lifespan=lifespan,
 )
 # ── Middleware ────────────────────────────────────────────────────────────────
 settings = get_settings()
 app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.cors_origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
 )
 # ── Routers ──────────────────────────────────────────────────────────────────
 # Root-level health (no prefix)
 app.include_router(health.router)
 # Versioned API
 app.include_router(admin.router, prefix="/api/v1")
 app.include_router(auth.router, prefix="/api/v1")
 app.include_router(chat.router, prefix="/api/v1")
 app.include_router(consent.router, prefix="/api/v1")
 app.include_router(creator_dashboard.router, prefix="/api/v1")
 app.include_router(creator_chapters.router, prefix="/api/v1")
 app.include_router(creator_highlights.router, prefix="/api/v1")
 app.include_router(creators.router, prefix="/api/v1")
 app.include_router(creators.admin_router, prefix="/api/v1")
 app.include_router(follows.router, prefix="/api/v1")
 app.include_router(highlights.router, prefix="/api/v1")
 app.include_router(ingest.router, prefix="/api/v1")
 app.include_router(notifications.router, prefix="/api/v1")
 app.include_router(pipeline.router, prefix="/api/v1")
 app.include_router(posts.router, prefix="/api/v1")
 app.include_router(files.router, prefix="/api/v1")
 app.include_router(reports.router, prefix="/api/v1")
 app.include_router(search.router, prefix="/api/v1")
 app.include_router(shorts.router, prefix="/api/v1")
 app.include_router(shorts_public.router, prefix="/api/v1")
 app.include_router(stats.router, prefix="/api/v1")
 app.include_router(techniques.router, prefix="/api/v1")
 app.include_router(topics.router, prefix="/api/v1")
 app.include_router(videos.router, prefix="/api/v1")
@app.get("/api/v1/health")
 async def api_health():
    """Lightweight version-prefixed health endpoint (no DB check)."""
    return {"status": "ok", "version": "0.1.0"}
--- a/backend/minio_client.py
+++ b/backend/minio_client.py
@ -0,0 +1,116 @@
 """MinIO client singleton with lazy initialization.
 Provides file upload, presigned download URL generation, and automatic
 bucket creation for the Chrysopedia post attachment storage.
 """
 from __future__ import annotations
 import io
 import logging
 from datetime import timedelta
 from minio import Minio
 from minio.error import S3Error
 from config import get_settings
 logger = logging.getLogger(__name__)
 _client: Minio | None = None
 _bucket_ensured: bool = False
 def get_minio_client() -> Minio:
    """Return the singleton MinIO client, creating it on first call."""
    global _client
    if _client is None:
        settings = get_settings()
        _client = Minio(
            settings.minio_url,
            access_key=settings.minio_access_key,
            secret_key=settings.minio_secret_key,
            secure=settings.minio_secure,
        )
        logger.info("MinIO client initialized (endpoint=%s)", settings.minio_url)
    return _client
 def ensure_bucket() -> None:
    """Create the configured bucket if it doesn't already exist."""
    global _bucket_ensured
    if _bucket_ensured:
        return
    settings = get_settings()
    client = get_minio_client()
    bucket = settings.minio_bucket
    try:
        if not client.bucket_exists(bucket):
            client.make_bucket(bucket)
            logger.info("Created MinIO bucket: %s", bucket)
        else:
            logger.debug("MinIO bucket already exists: %s", bucket)
        _bucket_ensured = True
    except S3Error as exc:
        logger.error("MinIO bucket check/create failed: %s", exc)
        raise
 def upload_file(
    object_key: str,
    data: bytes | io.BytesIO,
    length: int,
    content_type: str = "application/octet-stream",
 ) -> None:
    """Upload a file to MinIO.
    Args:
        object_key: The storage path within the bucket.
        data: File content as bytes or BytesIO stream.
        length: Size in bytes.
        content_type: MIME type for the object.
    """
    ensure_bucket()
    settings = get_settings()
    client = get_minio_client()
    stream = io.BytesIO(data) if isinstance(data, bytes) else data
    client.put_object(
        settings.minio_bucket,
        object_key,
        stream,
        length,
        content_type=content_type,
    )
    logger.info("Uploaded %s (%d bytes, %s)", object_key, length, content_type)
 def generate_download_url(object_key: str, expires: int = 3600) -> str:
    """Generate a presigned GET URL for downloading a file.
    Args:
        object_key: The storage path within the bucket.
        expires: URL validity in seconds (default 1 hour).
    Returns:
        Presigned URL string.
    """
    settings = get_settings()
    client = get_minio_client()
    url: str = client.presigned_get_object(
        settings.minio_bucket,
        object_key,
        expires=timedelta(seconds=expires),
    )
    return url
 def delete_file(object_key: str) -> None:
    """Delete a file from MinIO.
    Args:
        object_key: The storage path within the bucket.
    """
    settings = get_settings()
    client = get_minio_client()
    client.remove_object(settings.minio_bucket, object_key)
    logger.info("Deleted %s from MinIO", object_key)
--- a/backend/models.py
+++ b/backend/models.py
@ -0,0 +1,932 @@
 """SQLAlchemy ORM models for the Chrysopedia knowledge base.
 Seven entities matching chrysopedia-spec.md §6.1:
  Creator, SourceVideo, TranscriptSegment, KeyMoment,
  TechniquePage, RelatedTechniqueLink, Tag
 """
 from __future__ import annotations
 import enum
 import uuid
 from datetime import datetime, timezone
 from sqlalchemy import (
    BigInteger,
    Boolean,
    Enum,
    Float,
    ForeignKey,
    Index,
    Integer,
    String,
    Text,
    UniqueConstraint,
    func,
    text,
 )
 from sqlalchemy.dialects.postgresql import ARRAY, JSONB, UUID
 from sqlalchemy.orm import Mapped, mapped_column
 from sqlalchemy.orm import relationship as sa_relationship
 from database import Base
 # ── Enums ────────────────────────────────────────────────────────────────────
 class ContentType(str, enum.Enum):
    """Source video content type."""
    tutorial = "tutorial"
    livestream = "livestream"
    breakdown = "breakdown"
    short_form = "short_form"
 class ProcessingStatus(str, enum.Enum):
    """Pipeline processing status for a source video.
    User-facing lifecycle: not_started → queued → processing → complete
    Error branch: processing → error (retrigger resets to queued)
    """
    not_started = "not_started"
    queued = "queued"
    processing = "processing"
    error = "error"
    complete = "complete"
 class KeyMomentContentType(str, enum.Enum):
    """Content classification for a key moment."""
    technique = "technique"
    settings = "settings"
    reasoning = "reasoning"
    workflow = "workflow"
 class SourceQuality(str, enum.Enum):
    """Derived source quality for technique pages."""
    structured = "structured"
    mixed = "mixed"
    unstructured = "unstructured"
 class RelationshipType(str, enum.Enum):
    """Types of links between technique pages."""
    same_technique_other_creator = "same_technique_other_creator"
    same_creator_adjacent = "same_creator_adjacent"
    general_cross_reference = "general_cross_reference"
 class UserRole(str, enum.Enum):
    """Roles for authenticated users."""
    creator = "creator"
    admin = "admin"
 class HighlightStatus(str, enum.Enum):
    """Triage status for highlight candidates."""
    candidate = "candidate"
    approved = "approved"
    rejected = "rejected"
 class ChapterStatus(str, enum.Enum):
    """Review status for auto-detected chapters."""
    draft = "draft"
    approved = "approved"
    hidden = "hidden"
 # ── Helpers ──────────────────────────────────────────────────────────────────
 def _uuid_pk() -> Mapped[uuid.UUID]:
    return mapped_column(
        UUID(as_uuid=True),
        primary_key=True,
        default=uuid.uuid4,
        server_default=func.gen_random_uuid(),
    )
 def _now() -> datetime:
    """Return current UTC time as a naive datetime (no tzinfo).
    PostgreSQL TIMESTAMP WITHOUT TIME ZONE columns require naive datetimes.
    asyncpg rejects timezone-aware datetimes for such columns.
    """
    return datetime.now(timezone.utc).replace(tzinfo=None)
 # ── Models ───────────────────────────────────────────────────────────────────
 class Creator(Base):
    __tablename__ = "creators"
    id: Mapped[uuid.UUID] = _uuid_pk()
    name: Mapped[str] = mapped_column(String(255), nullable=False)
    slug: Mapped[str] = mapped_column(String(255), unique=True, nullable=False)
    genres: Mapped[list[str] | None] = mapped_column(ARRAY(String), nullable=True)
    folder_name: Mapped[str] = mapped_column(String(255), nullable=False)
    avatar_url: Mapped[str | None] = mapped_column(String(1000), nullable=True)
    avatar_source: Mapped[str | None] = mapped_column(String(50), nullable=True)
    avatar_fetched_at: Mapped[datetime | None] = mapped_column(nullable=True)
    bio: Mapped[str | None] = mapped_column(Text, nullable=True)
    social_links: Mapped[dict | None] = mapped_column(JSONB, nullable=True)
    personality_profile: Mapped[dict | None] = mapped_column(JSONB, nullable=True)
    shorts_template: Mapped[dict | None] = mapped_column(JSONB, nullable=True)
    featured: Mapped[bool] = mapped_column(default=False, server_default="false")
    view_count: Mapped[int] = mapped_column(Integer, default=0, server_default="0")
    hidden: Mapped[bool] = mapped_column(default=False, server_default="false")
    created_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now()
    )
    updated_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now(), onupdate=_now
    )
    # relationships
    videos: Mapped[list[SourceVideo]] = sa_relationship(back_populates="creator")
    technique_pages: Mapped[list[TechniquePage]] = sa_relationship(back_populates="creator")
    posts: Mapped[list[Post]] = sa_relationship(back_populates="creator")
 class User(Base):
    """Authenticated user account for the creator dashboard."""
    __tablename__ = "users"
    id: Mapped[uuid.UUID] = _uuid_pk()
    email: Mapped[str] = mapped_column(String(255), unique=True, nullable=False)
    hashed_password: Mapped[str] = mapped_column(String(255), nullable=False)
    display_name: Mapped[str] = mapped_column(String(255), nullable=False)
    role: Mapped[UserRole] = mapped_column(
        Enum(UserRole, name="user_role", create_constraint=True),
        default=UserRole.creator,
        server_default="creator",
    )
    creator_id: Mapped[uuid.UUID | None] = mapped_column(
        ForeignKey("creators.id", ondelete="SET NULL"), nullable=True
    )
    is_active: Mapped[bool] = mapped_column(
        Boolean, default=True, server_default="true"
    )
    onboarding_completed: Mapped[bool] = mapped_column(
        Boolean, default=False, server_default="false"
    )
    notification_preferences: Mapped[dict] = mapped_column(
        JSONB, nullable=False,
        server_default='{"email_digests": true, "digest_frequency": "daily"}',
    )
    created_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now()
    )
    updated_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now(), onupdate=_now
    )
    # relationships
    creator: Mapped[Creator | None] = sa_relationship()
 class EmailDigestLog(Base):
    """Record of a digest email sent to a user."""
    __tablename__ = "email_digest_log"
    __table_args__ = (
        Index("ix_email_digest_log_user_sent", "user_id", "digest_sent_at"),
    )
    id: Mapped[uuid.UUID] = _uuid_pk()
    user_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("users.id", ondelete="CASCADE"), nullable=False,
    )
    digest_sent_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now()
    )
    content_summary: Mapped[dict | None] = mapped_column(JSONB, nullable=True)
    # relationships
    user: Mapped[User] = sa_relationship()
 class InviteCode(Base):
    """Single-use or limited-use invite codes for registration gating."""
    __tablename__ = "invite_codes"
    id: Mapped[uuid.UUID] = _uuid_pk()
    code: Mapped[str] = mapped_column(String(100), unique=True, nullable=False)
    uses_remaining: Mapped[int] = mapped_column(Integer, default=1, server_default="1")
    created_by: Mapped[uuid.UUID | None] = mapped_column(
        ForeignKey("users.id", ondelete="SET NULL"), nullable=True
    )
    expires_at: Mapped[datetime | None] = mapped_column(nullable=True)
    created_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now()
    )
 class SourceVideo(Base):
    __tablename__ = "source_videos"
    id: Mapped[uuid.UUID] = _uuid_pk()
    creator_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("creators.id", ondelete="CASCADE"), nullable=False
    )
    filename: Mapped[str] = mapped_column(String(500), nullable=False)
    file_path: Mapped[str] = mapped_column(String(1000), nullable=False)
    duration_seconds: Mapped[int] = mapped_column(Integer, nullable=True)
    content_type: Mapped[ContentType] = mapped_column(
        Enum(ContentType, name="content_type", create_constraint=True),
        nullable=False,
    )
    transcript_path: Mapped[str | None] = mapped_column(String(1000), nullable=True)
    content_hash: Mapped[str | None] = mapped_column(String(64), nullable=True, index=True)
    processing_status: Mapped[ProcessingStatus] = mapped_column(
        Enum(ProcessingStatus, name="processing_status", create_constraint=True),
        default=ProcessingStatus.not_started,
        server_default="not_started",
    )
    classification_data: Mapped[list | None] = mapped_column(JSONB, nullable=True)
    created_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now()
    )
    updated_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now(), onupdate=_now
    )
    # relationships
    creator: Mapped[Creator] = sa_relationship(back_populates="videos")
    segments: Mapped[list[TranscriptSegment]] = sa_relationship(back_populates="source_video")
    key_moments: Mapped[list[KeyMoment]] = sa_relationship(back_populates="source_video")
 class TranscriptSegment(Base):
    __tablename__ = "transcript_segments"
    id: Mapped[uuid.UUID] = _uuid_pk()
    source_video_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("source_videos.id", ondelete="CASCADE"), nullable=False
    )
    start_time: Mapped[float] = mapped_column(Float, nullable=False)
    end_time: Mapped[float] = mapped_column(Float, nullable=False)
    text: Mapped[str] = mapped_column(Text, nullable=False)
    segment_index: Mapped[int] = mapped_column(Integer, nullable=False)
    topic_label: Mapped[str | None] = mapped_column(String(255), nullable=True)
    # relationships
    source_video: Mapped[SourceVideo] = sa_relationship(back_populates="segments")
 class KeyMoment(Base):
    __tablename__ = "key_moments"
    id: Mapped[uuid.UUID] = _uuid_pk()
    source_video_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("source_videos.id", ondelete="CASCADE"), nullable=False
    )
    technique_page_id: Mapped[uuid.UUID | None] = mapped_column(
        ForeignKey("technique_pages.id", ondelete="SET NULL"), nullable=True
    )
    title: Mapped[str] = mapped_column(String(500), nullable=False)
    summary: Mapped[str] = mapped_column(Text, nullable=False)
    start_time: Mapped[float] = mapped_column(Float, nullable=False)
    end_time: Mapped[float] = mapped_column(Float, nullable=False)
    content_type: Mapped[KeyMomentContentType] = mapped_column(
        Enum(KeyMomentContentType, name="key_moment_content_type", create_constraint=True),
        nullable=False,
    )
    plugins: Mapped[list[str] | None] = mapped_column(ARRAY(String), nullable=True)
    raw_transcript: Mapped[str | None] = mapped_column(Text, nullable=True)
    chapter_status: Mapped[ChapterStatus] = mapped_column(
        Enum(ChapterStatus, name="chapter_status", create_constraint=True),
        nullable=False,
        server_default="draft",
        default=ChapterStatus.draft,
    )
    sort_order: Mapped[int] = mapped_column(Integer, nullable=False, server_default="0", default=0)
    created_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now()
    )
    updated_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now(), onupdate=_now
    )
    # relationships
    source_video: Mapped[SourceVideo] = sa_relationship(back_populates="key_moments")
    technique_page: Mapped[TechniquePage | None] = sa_relationship(
        back_populates="key_moments", foreign_keys=[technique_page_id]
    )
 class TechniquePage(Base):
    __tablename__ = "technique_pages"
    id: Mapped[uuid.UUID] = _uuid_pk()
    creator_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("creators.id", ondelete="CASCADE"), nullable=False
    )
    title: Mapped[str] = mapped_column(String(500), nullable=False)
    slug: Mapped[str] = mapped_column(String(500), unique=True, nullable=False)
    topic_category: Mapped[str] = mapped_column(String(255), nullable=False)
    topic_tags: Mapped[list[str] | None] = mapped_column(ARRAY(String), nullable=True)
    summary: Mapped[str | None] = mapped_column(Text, nullable=True)
    body_sections: Mapped[dict | None] = mapped_column(JSONB, nullable=True)
    body_sections_format: Mapped[str] = mapped_column(
        String(20), nullable=False, default="v1", server_default="v1"
    )
    signal_chains: Mapped[list | None] = mapped_column(JSONB, nullable=True)
    plugins: Mapped[list[str] | None] = mapped_column(ARRAY(String), nullable=True)
    source_quality: Mapped[SourceQuality | None] = mapped_column(
        Enum(SourceQuality, name="source_quality", create_constraint=True),
        nullable=True,
    )
    view_count: Mapped[int] = mapped_column(Integer, default=0, server_default="0")
    created_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now()
    )
    updated_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now(), onupdate=_now
    )
    # relationships
    creator: Mapped[Creator] = sa_relationship(back_populates="technique_pages")
    key_moments: Mapped[list[KeyMoment]] = sa_relationship(
        back_populates="technique_page", foreign_keys=[KeyMoment.technique_page_id]
    )
    versions: Mapped[list[TechniquePageVersion]] = sa_relationship(
        back_populates="technique_page", order_by="TechniquePageVersion.version_number"
    )
    outgoing_links: Mapped[list[RelatedTechniqueLink]] = sa_relationship(
        foreign_keys="RelatedTechniqueLink.source_page_id", back_populates="source_page"
    )
    incoming_links: Mapped[list[RelatedTechniqueLink]] = sa_relationship(
        foreign_keys="RelatedTechniqueLink.target_page_id", back_populates="target_page"
    )
    source_video_links: Mapped[list[TechniquePageVideo]] = sa_relationship(
        back_populates="technique_page"
    )
 class RelatedTechniqueLink(Base):
    __tablename__ = "related_technique_links"
    __table_args__ = (
        UniqueConstraint("source_page_id", "target_page_id", "relationship", name="uq_technique_link"),
    )
    id: Mapped[uuid.UUID] = _uuid_pk()
    source_page_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("technique_pages.id", ondelete="CASCADE"), nullable=False
    )
    target_page_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("technique_pages.id", ondelete="CASCADE"), nullable=False
    )
    relationship: Mapped[RelationshipType] = mapped_column(
        Enum(RelationshipType, name="relationship_type", create_constraint=True),
        nullable=False,
    )
    # relationships
    source_page: Mapped[TechniquePage] = sa_relationship(
        foreign_keys=[source_page_id], back_populates="outgoing_links"
    )
    target_page: Mapped[TechniquePage] = sa_relationship(
        foreign_keys=[target_page_id], back_populates="incoming_links"
    )
 class TechniquePageVersion(Base):
    """Snapshot of a TechniquePage before a pipeline re-synthesis overwrites it."""
    __tablename__ = "technique_page_versions"
    id: Mapped[uuid.UUID] = _uuid_pk()
    technique_page_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("technique_pages.id", ondelete="CASCADE"), nullable=False
    )
    version_number: Mapped[int] = mapped_column(Integer, nullable=False)
    content_snapshot: Mapped[dict] = mapped_column(JSONB, nullable=False)
    pipeline_metadata: Mapped[dict | None] = mapped_column(JSONB, nullable=True)
    created_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now()
    )
    # relationships
    technique_page: Mapped[TechniquePage] = sa_relationship(
        back_populates="versions"
    )
 class Tag(Base):
    __tablename__ = "tags"
    id: Mapped[uuid.UUID] = _uuid_pk()
    name: Mapped[str] = mapped_column(String(255), unique=True, nullable=False)
    category: Mapped[str] = mapped_column(String(255), nullable=False)
    aliases: Mapped[list[str] | None] = mapped_column(ARRAY(String), nullable=True)
 class TechniquePageVideo(Base):
    """Association linking a technique page to its contributing source videos."""
    __tablename__ = "technique_page_videos"
    __table_args__ = (
        UniqueConstraint("technique_page_id", "source_video_id", name="uq_page_video"),
    )
    id: Mapped[uuid.UUID] = _uuid_pk()
    technique_page_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("technique_pages.id", ondelete="CASCADE"), nullable=False
    )
    source_video_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("source_videos.id", ondelete="CASCADE"), nullable=False
    )
    added_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now()
    )
    # relationships
    technique_page: Mapped[TechniquePage] = sa_relationship(
        back_populates="source_video_links"
    )
    source_video: Mapped[SourceVideo] = sa_relationship()
 # ── Content Report Enums ─────────────────────────────────────────────────────
 class ReportType(str, enum.Enum):
    """Classification of user-submitted content reports."""
    inaccurate = "inaccurate"
    missing_info = "missing_info"
    wrong_attribution = "wrong_attribution"
    formatting = "formatting"
    other = "other"
 class ReportStatus(str, enum.Enum):
    """Triage status for content reports."""
    open = "open"
    acknowledged = "acknowledged"
    resolved = "resolved"
    dismissed = "dismissed"
 # ── Content Report ───────────────────────────────────────────────────────────
 class ContentReport(Base):
    """User-submitted report about a content issue.
    Generic: content_type + content_id can reference any entity
    (technique_page, key_moment, creator, or general).
    """
    __tablename__ = "content_reports"
    id: Mapped[uuid.UUID] = _uuid_pk()
    content_type: Mapped[str] = mapped_column(
        String(50), nullable=False, doc="Entity type: technique_page, key_moment, creator, general"
    )
    content_id: Mapped[uuid.UUID | None] = mapped_column(
        UUID(as_uuid=True), nullable=True, doc="FK to the reported entity (null for general reports)"
    )
    content_title: Mapped[str | None] = mapped_column(
        String(500), nullable=True, doc="Snapshot of entity title at report time"
    )
    report_type: Mapped[ReportType] = mapped_column(
        Enum(ReportType, name="report_type", create_constraint=True),
        nullable=False,
    )
    description: Mapped[str] = mapped_column(Text, nullable=False)
    status: Mapped[ReportStatus] = mapped_column(
        Enum(ReportStatus, name="report_status", create_constraint=True),
        default=ReportStatus.open,
        server_default="open",
    )
    admin_notes: Mapped[str | None] = mapped_column(Text, nullable=True)
    page_url: Mapped[str | None] = mapped_column(
        String(1000), nullable=True, doc="URL the user was on when reporting"
    )
    created_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now()
    )
    resolved_at: Mapped[datetime | None] = mapped_column(nullable=True)
 # ── Pipeline Event ───────────────────────────────────────────────────────────
 class SearchLog(Base):
    """Logged search query for analytics and popular searches."""
    __tablename__ = "search_log"
    id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
    query: Mapped[str] = mapped_column(String(500), nullable=False, index=True)
    scope: Mapped[str] = mapped_column(String(50), nullable=False)
    result_count: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    created_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now(), index=True
    )
 class PipelineRunStatus(str, enum.Enum):
    """Status of a pipeline run."""
    running = "running"
    complete = "complete"
    error = "error"
    cancelled = "cancelled"
 class PipelineRunTrigger(str, enum.Enum):
    """What initiated a pipeline run."""
    manual = "manual"
    clean_reprocess = "clean_reprocess"
    auto_ingest = "auto_ingest"
    bulk = "bulk"
    stage_rerun = "stage_rerun"
 class PipelineRun(Base):
    """A single execution of the pipeline for a video.
    Each trigger/retrigger creates a new run. Events are scoped to a run
    via run_id, giving a clean audit trail per execution.
    """
    __tablename__ = "pipeline_runs"
    id: Mapped[uuid.UUID] = _uuid_pk()
    video_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("source_videos.id", ondelete="CASCADE"), nullable=False, index=True,
    )
    run_number: Mapped[int] = mapped_column(
        Integer, nullable=False, doc="Auto-increment per video, 1-indexed"
    )
    trigger: Mapped[PipelineRunTrigger] = mapped_column(
        Enum(PipelineRunTrigger, name="pipeline_run_trigger", create_constraint=True),
        nullable=False,
    )
    status: Mapped[PipelineRunStatus] = mapped_column(
        Enum(PipelineRunStatus, name="pipeline_run_status", create_constraint=True),
        default=PipelineRunStatus.running,
        server_default="running",
    )
    started_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now()
    )
    finished_at: Mapped[datetime | None] = mapped_column(nullable=True)
    error_stage: Mapped[str | None] = mapped_column(String(50), nullable=True)
    total_tokens: Mapped[int] = mapped_column(Integer, default=0, server_default="0")
    # relationships
    video: Mapped[SourceVideo] = sa_relationship()
    events: Mapped[list[PipelineEvent]] = sa_relationship(
        back_populates="run", foreign_keys="PipelineEvent.run_id"
    )
 # ── Pipeline Event ───────────────────────────────────────────────────────────
 class PipelineEvent(Base):
    """Structured log entry for pipeline execution.
    Captures per-stage start/complete/error/llm_call events with
    token usage and optional response payloads for debugging.
    """
    __tablename__ = "pipeline_events"
    id: Mapped[uuid.UUID] = _uuid_pk()
    video_id: Mapped[uuid.UUID] = mapped_column(
        UUID(as_uuid=True), nullable=False, index=True,
    )
    run_id: Mapped[uuid.UUID | None] = mapped_column(
        ForeignKey("pipeline_runs.id", ondelete="SET NULL"), nullable=True, index=True,
    )
    stage: Mapped[str] = mapped_column(
        String(50), nullable=False, doc="stage2_segmentation, stage3_extraction, etc."
    )
    event_type: Mapped[str] = mapped_column(
        String(30), nullable=False, doc="start, complete, error, llm_call"
    )
    prompt_tokens: Mapped[int | None] = mapped_column(Integer, nullable=True)
    completion_tokens: Mapped[int | None] = mapped_column(Integer, nullable=True)
    total_tokens: Mapped[int | None] = mapped_column(Integer, nullable=True)
    model: Mapped[str | None] = mapped_column(String(100), nullable=True)
    duration_ms: Mapped[int | None] = mapped_column(Integer, nullable=True)
    payload: Mapped[dict | None] = mapped_column(
        JSONB, nullable=True, doc="LLM response content, error details, stage metadata"
    )
    created_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now()
    )
    # Debug mode — full LLM I/O capture columns
    system_prompt_text: Mapped[str | None] = mapped_column(Text, nullable=True)
    user_prompt_text: Mapped[str | None] = mapped_column(Text, nullable=True)
    response_text: Mapped[str | None] = mapped_column(Text, nullable=True)
    # relationships
    run: Mapped[PipelineRun | None] = sa_relationship(
        back_populates="events", foreign_keys=[run_id]
    )
 # ── Consent Enums ────────────────────────────────────────────────────────────
 class ConsentField(str, enum.Enum):
    """Fields that can be individually consented to per video."""
    kb_inclusion = "kb_inclusion"
    training_usage = "training_usage"
    public_display = "public_display"
 # ── Consent Models ───────────────────────────────────────────────────────────
 class VideoConsent(Base):
    """Current consent state for a source video.
    One row per video. Mutable — updated when a creator toggles consent.
    The full change history lives in ConsentAuditLog.
    """
    __tablename__ = "video_consents"
    __table_args__ = (
        UniqueConstraint("source_video_id", name="uq_video_consent_video"),
    )
    id: Mapped[uuid.UUID] = _uuid_pk()
    source_video_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("source_videos.id", ondelete="CASCADE"), nullable=False,
    )
    creator_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("creators.id", ondelete="CASCADE"), nullable=False,
    )
    kb_inclusion: Mapped[bool] = mapped_column(
        Boolean, default=False, server_default="false",
    )
    training_usage: Mapped[bool] = mapped_column(
        Boolean, default=False, server_default="false",
    )
    public_display: Mapped[bool] = mapped_column(
        Boolean, default=True, server_default="true",
    )
    updated_by: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("users.id", ondelete="RESTRICT"), nullable=False,
    )
    created_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now()
    )
    updated_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now(), onupdate=_now
    )
    # relationships
    source_video: Mapped[SourceVideo] = sa_relationship()
    creator: Mapped[Creator] = sa_relationship()
    audit_entries: Mapped[list[ConsentAuditLog]] = sa_relationship(
        back_populates="video_consent", order_by="ConsentAuditLog.version"
    )
 class ConsentAuditLog(Base):
    """Append-only versioned record of per-field consent changes.
    Each row captures a single field change. Version is auto-assigned
    in application code (max(version) + 1 per video_consent_id).
    """
    __tablename__ = "consent_audit_log"
    id: Mapped[uuid.UUID] = _uuid_pk()
    video_consent_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("video_consents.id", ondelete="CASCADE"), nullable=False, index=True,
    )
    version: Mapped[int] = mapped_column(Integer, nullable=False)
    field_name: Mapped[str] = mapped_column(
        String(50), nullable=False, doc="ConsentField value: kb_inclusion, training_usage, public_display"
    )
    old_value: Mapped[bool | None] = mapped_column(Boolean, nullable=True)
    new_value: Mapped[bool] = mapped_column(Boolean, nullable=False)
    changed_by: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("users.id", ondelete="RESTRICT"), nullable=False,
    )
    ip_address: Mapped[str | None] = mapped_column(String(45), nullable=True)
    created_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now()
    )
    # relationships
    video_consent: Mapped[VideoConsent] = sa_relationship(
        back_populates="audit_entries"
    )
 class ImpersonationLog(Base):
    """Audit trail for admin impersonation sessions."""
    __tablename__ = "impersonation_log"
    id: Mapped[uuid.UUID] = _uuid_pk()
    admin_user_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("users.id", ondelete="CASCADE"), nullable=False, index=True,
    )
    target_user_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("users.id", ondelete="CASCADE"), nullable=False, index=True,
    )
    action: Mapped[str] = mapped_column(
        String(10), nullable=False, doc="'start' or 'stop'"
    )
    write_mode: Mapped[bool] = mapped_column(
        default=False, server_default=text("false"),
    )
    ip_address: Mapped[str | None] = mapped_column(String(45), nullable=True)
    created_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now()
    )
 # ── Highlight Detection ─────────────────────────────────────────────────────
 class HighlightCandidate(Base):
    """Scored candidate for highlight detection, one per KeyMoment."""
    __tablename__ = "highlight_candidates"
    __table_args__ = (
        UniqueConstraint("key_moment_id", name="uq_highlight_candidate_moment"),
    )
    id: Mapped[uuid.UUID] = _uuid_pk()
    key_moment_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("key_moments.id", ondelete="CASCADE"), nullable=False, unique=True,
    )
    source_video_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("source_videos.id", ondelete="CASCADE"), nullable=False, index=True,
    )
    score: Mapped[float] = mapped_column(Float, nullable=False)
    score_breakdown: Mapped[dict | None] = mapped_column(JSONB, nullable=True)
    duration_secs: Mapped[float] = mapped_column(Float, nullable=False)
    status: Mapped[HighlightStatus] = mapped_column(
        Enum(HighlightStatus, name="highlight_status", create_constraint=True),
        default=HighlightStatus.candidate,
        server_default="candidate",
    )
    trim_start: Mapped[float | None] = mapped_column(Float, nullable=True, default=None)
    trim_end: Mapped[float | None] = mapped_column(Float, nullable=True, default=None)
    created_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now()
    )
    updated_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now(), onupdate=_now
    )
    # relationships
    key_moment: Mapped[KeyMoment] = sa_relationship()
    source_video: Mapped[SourceVideo] = sa_relationship()
 # ── Follow System ────────────────────────────────────────────────────────────
 class CreatorFollow(Base):
    """A user following a creator."""
    __tablename__ = "creator_follows"
    __table_args__ = (
        UniqueConstraint("user_id", "creator_id", name="uq_creator_follow_user_creator"),
    )
    id: Mapped[uuid.UUID] = _uuid_pk()
    user_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("users.id", ondelete="CASCADE"), nullable=False, index=True,
    )
    creator_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("creators.id", ondelete="CASCADE"), nullable=False, index=True,
    )
    created_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now()
    )
    # relationships
    user: Mapped[User] = sa_relationship()
    creator: Mapped[Creator] = sa_relationship()
 # ── Posts (Creator content feed) ─────────────────────────────────────────────
 class Post(Base):
    """A rich text post by a creator, optionally with file attachments."""
    __tablename__ = "posts"
    id: Mapped[uuid.UUID] = _uuid_pk()
    creator_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("creators.id", ondelete="CASCADE"), nullable=False, index=True,
    )
    title: Mapped[str] = mapped_column(String(500), nullable=False)
    body_json: Mapped[dict] = mapped_column(JSONB, nullable=False)
    is_published: Mapped[bool] = mapped_column(
        Boolean, default=False, server_default="false",
    )
    created_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now()
    )
    updated_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now(), onupdate=_now
    )
    # relationships
    creator: Mapped[Creator] = sa_relationship(back_populates="posts")
    attachments: Mapped[list[PostAttachment]] = sa_relationship(
        back_populates="post", cascade="all, delete-orphan"
    )
 class PostAttachment(Base):
    """A file attachment on a post, stored in MinIO."""
    __tablename__ = "post_attachments"
    id: Mapped[uuid.UUID] = _uuid_pk()
    post_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("posts.id", ondelete="CASCADE"), nullable=False, index=True,
    )
    filename: Mapped[str] = mapped_column(String(500), nullable=False)
    object_key: Mapped[str] = mapped_column(String(1000), nullable=False)
    content_type: Mapped[str] = mapped_column(String(255), nullable=False)
    size_bytes: Mapped[int] = mapped_column(BigInteger, nullable=False)
    created_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now()
    )
    # relationships
    post: Mapped[Post] = sa_relationship(back_populates="attachments")
 # ── Shorts Generation ────────────────────────────────────────────────────────
 class FormatPreset(str, enum.Enum):
    """Output format presets for generated shorts."""
    vertical = "vertical"      # 9:16 (1080x1920)
    square = "square"          # 1:1  (1080x1080)
    horizontal = "horizontal"  # 16:9 (1920x1080)
 class ShortStatus(str, enum.Enum):
    """Processing status for a generated short."""
    pending = "pending"
    processing = "processing"
    complete = "complete"
    failed = "failed"
 class GeneratedShort(Base):
    """A video short generated from a highlight candidate in a specific format."""
    __tablename__ = "generated_shorts"
    id: Mapped[uuid.UUID] = _uuid_pk()
    highlight_candidate_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("highlight_candidates.id", ondelete="CASCADE"),
        nullable=False, index=True,
    )
    format_preset: Mapped[FormatPreset] = mapped_column(
        Enum(FormatPreset, name="format_preset", create_constraint=True),
        nullable=False,
    )
    minio_object_key: Mapped[str | None] = mapped_column(String(1000), nullable=True)
    duration_secs: Mapped[float | None] = mapped_column(Float, nullable=True)
    width: Mapped[int] = mapped_column(Integer, nullable=False)
    height: Mapped[int] = mapped_column(Integer, nullable=False)
    file_size_bytes: Mapped[int | None] = mapped_column(BigInteger, nullable=True)
    status: Mapped[ShortStatus] = mapped_column(
        Enum(ShortStatus, name="short_status", create_constraint=True),
        default=ShortStatus.pending,
        server_default="pending",
    )
    error_message: Mapped[str | None] = mapped_column(Text, nullable=True)
    share_token: Mapped[str | None] = mapped_column(
        String(16), nullable=True, unique=True, index=True,
    )
    created_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now()
    )
    updated_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now(), onupdate=_now
    )
    captions_enabled: Mapped[bool] = mapped_column(
        Boolean, default=False, server_default=text("'false'"),
    )
    # relationships
    highlight_candidate: Mapped[HighlightCandidate] = sa_relationship()
 # ── Chat Usage Tracking ──────────────────────────────────────────────────────
 class ChatUsageLog(Base):
    """Per-request token usage log for chat completions.
    Append-only table — one row per chat request. Used for cost tracking,
    rate limit analytics, and the admin usage dashboard.
    """
    __tablename__ = "chat_usage_log"
    id: Mapped[uuid.UUID] = _uuid_pk()
    user_id: Mapped[uuid.UUID | None] = mapped_column(
        ForeignKey("users.id", ondelete="SET NULL"), nullable=True,
    )
    client_ip: Mapped[str | None] = mapped_column(String(45), nullable=True)
    creator_slug: Mapped[str | None] = mapped_column(String(255), nullable=True)
    query: Mapped[str] = mapped_column(Text, nullable=False)
    prompt_tokens: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    completion_tokens: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    total_tokens: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
    cascade_tier: Mapped[str | None] = mapped_column(String(50), nullable=True)
    model: Mapped[str | None] = mapped_column(String(100), nullable=True)
    latency_ms: Mapped[float | None] = mapped_column(Float, nullable=True)
    created_at: Mapped[datetime] = mapped_column(
        default=_now, server_default=func.now(), index=True,
    )
--- a/backend/pipeline/init.py
+++ b/backend/pipeline/init.py
--- a/backend/pipeline/caption_generator.py
+++ b/backend/pipeline/caption_generator.py
@ -0,0 +1,155 @@
 r"""ASS (Advanced SubStation Alpha) caption generator for shorts.
 Converts word-level timings from Whisper transcripts into ASS subtitle
 files with word-by-word karaoke highlighting. Each word gets its own
 Dialogue line with {\k} tags that control highlight duration.
 Pure functions — no DB access, no Celery dependency.
 """
 from __future__ import annotations
 import logging
 from pathlib import Path
 from typing import Any
 logger = logging.getLogger(__name__)
 # ── Default style configuration ──────────────────────────────────────────────
 DEFAULT_STYLE: dict[str, Any] = {
    "font_name": "Arial",
    "font_size": 48,
    "primary_colour": "&H00FFFFFF",   # white (BGR + alpha)
    "secondary_colour": "&H0000FFFF",  # yellow highlight
    "outline_colour": "&H00000000",   # black outline
    "back_colour": "&H80000000",      # semi-transparent black shadow
    "bold": -1,                       # bold
    "outline": 3,
    "shadow": 1,
    "alignment": 2,                   # bottom-center
    "margin_v": 60,                   # 60px from bottom (~15% on 1920h)
 }
 def _format_ass_time(seconds: float) -> str:
    """Convert seconds to ASS timestamp format: H:MM:SS.cc (centiseconds).
    >>> _format_ass_time(65.5)
    '0:01:05.50'
    >>> _format_ass_time(0.0)
    '0:00:00.00'
    """
    if seconds < 0:
        seconds = 0.0
    h = int(seconds // 3600)
    m = int((seconds % 3600) // 60)
    s = seconds % 60
    return f"{h}:{m:02d}:{s:05.2f}"
 def _build_ass_header(style_config: dict[str, Any]) -> str:
    """Build ASS file header with script info and style definition."""
    cfg = {**DEFAULT_STYLE, **(style_config or {})}
    header = (
        "[Script Info]\n"
        "Title: Chrysopedia Auto-Captions\n"
        "ScriptType: v4.00+\n"
        "PlayResX: 1080\n"
        "PlayResY: 1920\n"
        "WrapStyle: 0\n"
        "ScaledBorderAndShadow: yes\n"
        "\n"
        "[V4+ Styles]\n"
        "Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, "
        "OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, "
        "ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, "
        "Alignment, MarginL, MarginR, MarginV, Encoding\n"
        f"Style: Default,{cfg['font_name']},{cfg['font_size']},"
        f"{cfg['primary_colour']},{cfg['secondary_colour']},"
        f"{cfg['outline_colour']},{cfg['back_colour']},"
        f"{cfg['bold']},0,0,0,"
        f"100,100,0,0,1,{cfg['outline']},{cfg['shadow']},"
        f"{cfg['alignment']},20,20,{cfg['margin_v']},1\n"
        "\n"
        "[Events]\n"
        "Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text\n"
    )
    return header
 def generate_ass_captions(
    word_timings: list[dict[str, Any]],
    clip_start: float,
    style_config: dict[str, Any] | None = None,
 ) -> str:
    """Generate ASS subtitle content from word-level timings.
    Each word is emitted as a separate Dialogue line with karaoke timing
    (``{\\k<centiseconds>}``) so that words highlight one-by-one.
    All word timestamps are offset by ``-clip_start`` to make them
    clip-relative (i.e. the first frame of the clip is t=0).
    Parameters
    ----------
    word_timings : list[dict]
        Word-timing dicts with ``word``, ``start``, ``end`` keys.
        ``start`` and ``end`` are absolute times in seconds.
    clip_start : float
        Absolute start time of the clip in seconds. Subtracted from
        all word timestamps.
    style_config : dict | None
        Override style parameters (merged onto DEFAULT_STYLE).
    Returns
    -------
    str — Full ASS file content. Empty dialogue section if no timings.
    """
    header = _build_ass_header(style_config)
    if not word_timings:
        logger.debug("No word timings provided — returning header-only ASS")
        return header
    lines: list[str] = [header]
    for w in word_timings:
        word_text = w.get("word", "").strip()
        if not word_text:
            continue
        abs_start = float(w.get("start", 0.0))
        abs_end = float(w.get("end", abs_start))
        # Make clip-relative
        rel_start = max(0.0, abs_start - clip_start)
        rel_end = max(rel_start, abs_end - clip_start)
        # Karaoke duration in centiseconds
        k_duration = max(1, round((rel_end - rel_start) * 100))
        start_ts = _format_ass_time(rel_start)
        end_ts = _format_ass_time(rel_end)
        # Dialogue line with karaoke tag
        line = (
            f"Dialogue: 0,{start_ts},{end_ts},Default,,0,0,0,,"
            f"{{\\k{k_duration}}}{word_text}"
        )
        lines.append(line)
    return "\n".join(lines) + "\n"
 def write_ass_file(ass_content: str, output_path: Path) -> Path:
    """Write ASS content to disk.
    Creates parent directories if needed. Returns the output path.
    """
    output_path = Path(output_path)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(ass_content, encoding="utf-8")
    logger.debug("Wrote ASS captions to %s (%d bytes)", output_path, len(ass_content))
    return output_path
--- a/backend/pipeline/card_renderer.py
+++ b/backend/pipeline/card_renderer.py
@ -0,0 +1,298 @@
 """FFmpeg-based intro/outro card video generation and segment concatenation.
 Generates solid-color card clips with centered text using ffmpeg lavfi
 (color + drawtext filters). Provides concat demuxer logic to assemble
 intro + main clip + outro into a final short.
 Pure functions — no DB access, no Celery dependency.
 """
 from __future__ import annotations
 import logging
 import subprocess
 import tempfile
 from pathlib import Path
 logger = logging.getLogger(__name__)
 FFMPEG_TIMEOUT_SECS = 120
 # Default template values
 DEFAULT_ACCENT_COLOR = "#22d3ee"
 DEFAULT_FONT_FAMILY = "Inter"
 DEFAULT_INTRO_DURATION = 2.0
 DEFAULT_OUTRO_DURATION = 2.0
 def render_card(
    text: str,
    duration_secs: float,
    width: int,
    height: int,
    accent_color: str = DEFAULT_ACCENT_COLOR,
    font_family: str = DEFAULT_FONT_FAMILY,
 ) -> list[str]:
    """Build ffmpeg command args that generate a card mp4 from lavfi input.
    Produces a solid black background with centered white text and a thin
    accent-color underline bar at the bottom third.
    Args:
        text: Display text (e.g., creator name or "Thanks for watching").
        duration_secs: Card duration in seconds.
        width: Output width in pixels.
        height: Output height in pixels.
        accent_color: Hex color for the underline glow bar.
        font_family: Font family for drawtext (must be available on system).
    Returns:
        List of ffmpeg command arguments (without the output path — caller appends).
    """
    if duration_secs <= 0:
        raise ValueError(f"duration_secs must be positive, got {duration_secs}")
    if width <= 0 or height <= 0:
        raise ValueError(f"dimensions must be positive, got {width}x{height}")
    # Font size scales with height — ~5% of output height
    font_size = max(24, int(height * 0.05))
    # Accent bar: thin horizontal line at ~65% down
    bar_y = int(height * 0.65)
    bar_height = max(2, int(height * 0.004))
    bar_margin = int(width * 0.2)
    # Escape text for ffmpeg drawtext (colons, backslashes, single quotes)
    escaped_text = (
        text.replace("\\", "\\\\")
        .replace(":", "\\:")
        .replace("'", "'\\''")
    )
    # Build complex filtergraph:
    # 1. color source for black background
    # 2. drawtext for centered title
    # 3. drawbox for accent underline bar
    filtergraph = (
        f"color=c=black:s={width}x{height}:d={duration_secs}:r=30,"
        f"drawtext=text='{escaped_text}'"
        f":fontcolor=white:fontsize={font_size}"
        f":fontfile='':font='{font_family}'"
        f":x=(w-text_w)/2:y=(h-text_h)/2-{font_size},"
        f"drawbox=x={bar_margin}:y={bar_y}"
        f":w={width - 2 * bar_margin}:h={bar_height}"
        f":color='{accent_color}'@0.8:t=fill"
    )
    cmd = [
        "ffmpeg",
        "-y",
        "-f", "lavfi",
        "-i", filtergraph,
        "-c:v", "libx264",
        "-preset", "fast",
        "-crf", "23",
        "-pix_fmt", "yuv420p",
        "-t", str(duration_secs),
        # Silent audio track so concat with audio segments works
        "-f", "lavfi",
        "-i", f"anullsrc=r=44100:cl=stereo:d={duration_secs}",
        "-c:a", "aac",
        "-b:a", "128k",
        "-shortest",
        "-movflags", "+faststart",
    ]
    return cmd
 def render_card_to_file(
    text: str,
    duration_secs: float,
    width: int,
    height: int,
    output_path: Path,
    accent_color: str = DEFAULT_ACCENT_COLOR,
    font_family: str = DEFAULT_FONT_FAMILY,
 ) -> Path:
    """Generate a card mp4 file via ffmpeg.
    Args:
        text: Display text for the card.
        duration_secs: Card duration in seconds.
        width: Output width in pixels.
        height: Output height in pixels.
        output_path: Destination mp4 file.
        accent_color: Hex color for accent elements.
        font_family: Font family for text.
    Returns:
        The output_path on success.
    Raises:
        subprocess.CalledProcessError: If ffmpeg exits non-zero.
        subprocess.TimeoutExpired: If ffmpeg exceeds timeout.
    """
    cmd = render_card(
        text=text,
        duration_secs=duration_secs,
        width=width,
        height=height,
        accent_color=accent_color,
        font_family=font_family,
    )
    cmd.append(str(output_path))
    logger.info(
        "Rendering card: text=%r duration=%.1fs size=%dx%d → %s",
        text, duration_secs, width, height, output_path,
    )
    result = subprocess.run(
        cmd,
        capture_output=True,
        timeout=FFMPEG_TIMEOUT_SECS,
    )
    if result.returncode != 0:
        stderr_text = result.stderr.decode("utf-8", errors="replace")[-2000:]
        logger.error("Card render failed (rc=%d): %s", result.returncode, stderr_text)
        raise subprocess.CalledProcessError(
            result.returncode, cmd, output=result.stdout, stderr=result.stderr,
        )
    logger.info("Card rendered: %s (%d bytes)", output_path, output_path.stat().st_size)
    return output_path
 def build_concat_list(segments: list[Path], list_path: Path) -> Path:
    """Write an ffmpeg concat demuxer list file.
    Args:
        segments: Ordered list of segment mp4 paths.
        list_path: Where to write the concat list.
    Returns:
        The list_path.
    """
    lines = [f"file '{seg.resolve()}'" for seg in segments]
    list_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return list_path
 def concat_segments(segments: list[Path], output_path: Path) -> Path:
    """Concatenate mp4 segments using ffmpeg concat demuxer.
    All segments must share the same codec settings (libx264/aac, same
    resolution). Uses ``-c copy`` for fast stream-copy concatenation.
    Args:
        segments: Ordered list of segment mp4 paths.
        output_path: Destination mp4 file.
    Returns:
        The output_path on success.
    Raises:
        ValueError: If segments list is empty.
        subprocess.CalledProcessError: If ffmpeg exits non-zero.
        subprocess.TimeoutExpired: If ffmpeg exceeds timeout.
    """
    if not segments:
        raise ValueError("segments list cannot be empty")
    # Write concat list to a temp file
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".txt", delete=False, prefix="concat_",
    ) as f:
        for seg in segments:
            f.write(f"file '{seg.resolve()}'\n")
        list_path = Path(f.name)
    try:
        cmd = [
            "ffmpeg",
            "-y",
            "-f", "concat",
            "-safe", "0",
            "-i", str(list_path),
            "-c", "copy",
            "-movflags", "+faststart",
            str(output_path),
        ]
        logger.info(
            "Concatenating %d segments → %s",
            len(segments), output_path,
        )
        result = subprocess.run(
            cmd,
            capture_output=True,
            timeout=FFMPEG_TIMEOUT_SECS,
        )
        if result.returncode != 0:
            stderr_text = result.stderr.decode("utf-8", errors="replace")[-2000:]
            logger.error(
                "Concat failed (rc=%d): %s", result.returncode, stderr_text,
            )
            raise subprocess.CalledProcessError(
                result.returncode, cmd, output=result.stdout, stderr=result.stderr,
            )
        logger.info(
            "Concatenated %d segments: %s (%d bytes)",
            len(segments), output_path, output_path.stat().st_size,
        )
        return output_path
    finally:
        # Clean up temp list file
        try:
            list_path.unlink()
        except OSError:
            pass
 def parse_template_config(
    shorts_template: dict | None,
 ) -> dict:
    """Parse a creator's shorts_template JSONB into normalized config.
    Expected schema::
        {
            "show_intro": true,
            "intro_text": "Creator Name Presents",
            "intro_duration": 2.0,
            "show_outro": true,
            "outro_text": "Thanks for watching!",
            "outro_duration": 2.0,
            "accent_color": "#22d3ee",
            "font_family": "Inter"
        }
    Missing fields get defaults. Returns a dict with all keys guaranteed.
    """
    if not shorts_template:
        return {
            "show_intro": False,
            "intro_text": "",
            "intro_duration": DEFAULT_INTRO_DURATION,
            "show_outro": False,
            "outro_text": "",
            "outro_duration": DEFAULT_OUTRO_DURATION,
            "accent_color": DEFAULT_ACCENT_COLOR,
            "font_family": DEFAULT_FONT_FAMILY,
        }
    return {
        "show_intro": bool(shorts_template.get("show_intro", False)),
        "intro_text": str(shorts_template.get("intro_text", "")),
        "intro_duration": float(shorts_template.get("intro_duration", DEFAULT_INTRO_DURATION)),
        "show_outro": bool(shorts_template.get("show_outro", False)),
        "outro_text": str(shorts_template.get("outro_text", "")),
        "outro_duration": float(shorts_template.get("outro_duration", DEFAULT_OUTRO_DURATION)),
        "accent_color": str(shorts_template.get("accent_color", DEFAULT_ACCENT_COLOR)),
        "font_family": str(shorts_template.get("font_family", DEFAULT_FONT_FAMILY)),
    }
--- a/backend/pipeline/citation_utils.py
+++ b/backend/pipeline/citation_utils.py
@ -0,0 +1,64 @@
 """Citation extraction and validation utilities for synthesized technique pages.
 Used by stage 5 synthesis and the test harness to verify that [N] citation
 markers in body sections reference valid source moments.
 """
 from __future__ import annotations
 import re
 from pipeline.schemas import BodySection
 # Matches [N] or [N,M] or [N,M,P] style citation markers where N,M,P are integers.
 _CITATION_RE = re.compile(r"\[(\d+(?:,\s*\d+)*)\]")
 def extract_citations(text: str) -> list[int]:
    """Extract all citation indices from ``[N]`` and ``[N,M,...]`` markers in *text*.
    Returns a sorted list of unique integer indices.
    """
    indices: set[int] = set()
    for match in _CITATION_RE.finditer(text):
        for part in match.group(1).split(","):
            indices.add(int(part.strip()))
    return sorted(indices)
 def validate_citations(
    sections: list[BodySection],
    moment_count: int,
 ) -> dict:
    """Validate citation markers across all *sections* against *moment_count* source moments.
    Moments are expected to be referenced as 0-based indices ``[0]`` through
    ``[moment_count - 1]``.
    Returns a dict with:
        valid (bool): True when every cited index is in range and every moment is cited.
        total_citations (int): Count of unique cited indices.
        invalid_indices (list[int]): Cited indices that are out of range.
        uncited_moments (list[int]): In-range moment indices that are never cited.
        coverage_pct (float): Percentage of moments that are cited (0.0–100.0).
    """
    all_indices: set[int] = set()
    for section in sections:
        all_indices.update(extract_citations(section.content))
        for sub in section.subsections:
            all_indices.update(extract_citations(sub.content))
    valid_range = set(range(moment_count))
    invalid_indices = sorted(all_indices - valid_range)
    cited_in_range = all_indices & valid_range
    uncited_moments = sorted(valid_range - cited_in_range)
    coverage_pct = (len(cited_in_range) / moment_count * 100.0) if moment_count > 0 else 0.0
    return {
        "valid": len(invalid_indices) == 0 and len(uncited_moments) == 0,
        "total_citations": len(cited_in_range),
        "invalid_indices": invalid_indices,
        "uncited_moments": uncited_moments,
        "coverage_pct": round(coverage_pct, 1),
    }
--- a/backend/pipeline/embedding_client.py
+++ b/backend/pipeline/embedding_client.py
@ -0,0 +1,88 @@
 """Synchronous embedding client using the OpenAI-compatible /v1/embeddings API.
 Uses ``openai.OpenAI`` (sync) since Celery tasks run synchronously.
 Handles connection failures gracefully — embedding is non-blocking for the pipeline.
 """
 from __future__ import annotations
 import logging
 import openai
 from config import Settings
 logger = logging.getLogger(__name__)
 class EmbeddingClient:
    """Sync embedding client backed by an OpenAI-compatible /v1/embeddings endpoint."""
    def __init__(self, settings: Settings) -> None:
        self.settings = settings
        self._client = openai.OpenAI(
            base_url=settings.embedding_api_url,
            api_key=settings.llm_api_key,
        )
    def embed(self, texts: list[str]) -> list[list[float]]:
        """Generate embedding vectors for a batch of texts.
        Parameters
        ----------
        texts:
            List of strings to embed.
        Returns
        -------
        list[list[float]]
            Embedding vectors. Returns empty list on connection/timeout errors
            so the pipeline can continue without embeddings.
        """
        if not texts:
            return []
        try:
            response = self._client.embeddings.create(
                model=self.settings.embedding_model,
                input=texts,
            )
        except (openai.APIConnectionError, openai.APITimeoutError) as exc:
            logger.warning(
                "Embedding API unavailable (%s: %s). Skipping %d texts.",
                type(exc).__name__,
                exc,
                len(texts),
            )
            return []
        except openai.APIError as exc:
            logger.warning(
                "Embedding API error (%s: %s). Skipping %d texts.",
                type(exc).__name__,
                exc,
                len(texts),
            )
            return []
        vectors = [item.embedding for item in response.data]
        # Validate dimensions
        expected_dim = self.settings.embedding_dimensions
        for i, vec in enumerate(vectors):
            if len(vec) != expected_dim:
                logger.warning(
                    "Embedding dimension mismatch at index %d: expected %d, got %d. "
                    "Returning empty list.",
                    i,
                    expected_dim,
                    len(vec),
                )
                return []
        logger.info(
            "Generated %d embeddings (dim=%d) using model=%s",
            len(vectors),
            expected_dim,
            self.settings.embedding_model,
        )
        return vectors
--- a/backend/pipeline/export_fixture.py
+++ b/backend/pipeline/export_fixture.py
@ -0,0 +1,306 @@
 """Export pipeline stage inputs for a video as a reusable JSON fixture.
 Connects to the live database, queries KeyMoments and classification data,
 and writes a fixture file that the test harness can consume offline.
 Usage:
    python -m pipeline.export_fixture --video-id <uuid> --output fixtures/video.json
    python -m pipeline.export_fixture --video-id <uuid>  # prints to stdout
    python -m pipeline.export_fixture --list              # list available videos
 Requires: DATABASE_URL, REDIS_URL environment variables (or .env file).
 """
 from __future__ import annotations
 import argparse
 import json
 import sys
 import time
 from collections import Counter
 from pathlib import Path
 from sqlalchemy import create_engine, select
 from sqlalchemy.orm import Session, sessionmaker
 def _log(tag: str, msg: str, level: str = "INFO") -> None:
    """Write structured log line to stderr."""
    print(f"[EXPORT] [{level}] {tag}: {msg}", file=sys.stderr)
 def _get_sync_session(database_url: str) -> Session:
    """Create a sync SQLAlchemy session from the database URL."""
    url = database_url.replace("postgresql+asyncpg://", "postgresql+psycopg2://")
    engine = create_engine(url, pool_pre_ping=True)
    factory = sessionmaker(bind=engine)
    return factory()
 def _list_videos(database_url: str) -> int:
    """List all videos with their processing status and moment counts."""
    from models import Creator, KeyMoment, SourceVideo
    session = _get_sync_session(database_url)
    try:
        videos = (
            session.execute(
                select(SourceVideo).order_by(SourceVideo.created_at.desc())
            )
            .scalars()
            .all()
        )
        if not videos:
            _log("LIST", "No videos found in database")
            return 0
        print(f"\n{'ID':<38s} {'Status':<14s} {'Moments':>7s}  {'Creator':<20s}  {'Filename'}", file=sys.stderr)
        print(f"{'─'*38} {'─'*14} {'─'*7}  {'─'*20}  {'─'*40}", file=sys.stderr)
        for video in videos:
            moment_count = (
                session.execute(
                    select(KeyMoment.id).where(KeyMoment.source_video_id == video.id)
                )
                .scalars()
                .all()
            )
            creator = session.execute(
                select(Creator).where(Creator.id == video.creator_id)
            ).scalar_one_or_none()
            creator_name = creator.name if creator else "?"
            print(
                f"{str(video.id):<38s} {video.processing_status.value:<14s} "
                f"{len(moment_count):>7d}  {creator_name:<20s}  {video.filename}",
                file=sys.stderr,
            )
        print(f"\nTotal: {len(videos)} videos\n", file=sys.stderr)
        return 0
    finally:
        session.close()
 def export_fixture(
    database_url: str,
    redis_url: str,
    video_id: str,
    output_path: str | None = None,
 ) -> int:
    """Export stage 5 inputs for a video as a JSON fixture.
    Returns exit code: 0 = success, 1 = error.
    """
    from models import Creator, KeyMoment, SourceVideo
    start = time.monotonic()
    _log("CONNECT", "Connecting to database...")
    session = _get_sync_session(database_url)
    try:
        # ── Load video ──────────────────────────────────────────────────
        video = session.execute(
            select(SourceVideo).where(SourceVideo.id == video_id)
        ).scalar_one_or_none()
        if video is None:
            _log("ERROR", f"Video not found: {video_id}", level="ERROR")
            return 1
        creator = session.execute(
            select(Creator).where(Creator.id == video.creator_id)
        ).scalar_one_or_none()
        creator_name = creator.name if creator else "Unknown"
        _log(
            "VIDEO",
            f"Found: {video.filename} by {creator_name} "
            f"({video.duration_seconds or '?'}s, {video.content_type.value}, "
            f"status={video.processing_status.value})",
        )
        # ── Load key moments ────────────────────────────────────────────
        moments = (
            session.execute(
                select(KeyMoment)
                .where(KeyMoment.source_video_id == video_id)
                .order_by(KeyMoment.start_time)
            )
            .scalars()
            .all()
        )
        if not moments:
            _log("ERROR", f"No key moments found for video_id={video_id}", level="ERROR")
            _log("HINT", "Pipeline stages 2-3 must complete before export is possible", level="ERROR")
            return 1
        time_min = min(m.start_time for m in moments)
        time_max = max(m.end_time for m in moments)
        _log("MOMENTS", f"Loaded {len(moments)} key moments (time range: {time_min:.1f}s - {time_max:.1f}s)")
        # ── Load classification data ────────────────────────────────────
        classification_data: list[dict] = []
        cls_source = "missing"
        # Try Redis first
        try:
            import redis as redis_lib
            r = redis_lib.Redis.from_url(redis_url)
            key = f"chrysopedia:classification:{video_id}"
            raw = r.get(key)
            if raw is not None:
                classification_data = json.loads(raw)
                cls_source = "redis"
                ttl = r.ttl(key)
                _log("CLASSIFY", f"Source: redis ({len(classification_data)} entries, TTL={ttl}s)")
        except Exception as exc:
            _log("CLASSIFY", f"Redis unavailable: {exc}", level="WARN")
        # Fallback: check SourceVideo.classification_data column (Phase 2 addition)
        if not classification_data:
            video_cls = getattr(video, "classification_data", None)
            if video_cls:
                classification_data = video_cls
                cls_source = "postgresql"
                _log("CLASSIFY", f"Source: postgresql ({len(classification_data)} entries)")
        if not classification_data:
            _log("CLASSIFY", "No classification data found in Redis or PostgreSQL", level="WARN")
            _log("HINT", "Pipeline stage 4 must complete before classification data is available", level="WARN")
            cls_source = "missing"
        # Build classification lookup by moment_id
        cls_by_moment_id = {c["moment_id"]: c for c in classification_data}
        # Count moments without classification
        unclassified = sum(1 for m in moments if str(m.id) not in cls_by_moment_id)
        if unclassified > 0:
            _log("CLASSIFY", f"WARNING: {unclassified}/{len(moments)} moments have no classification data", level="WARN")
        # ── Build fixture ───────────────────────────────────────────────
        fixture_moments = []
        category_counts: Counter[str] = Counter()
        for m in moments:
            cls_info = cls_by_moment_id.get(str(m.id), {})
            topic_category = cls_info.get("topic_category", "Uncategorized")
            topic_tags = cls_info.get("topic_tags", [])
            category_counts[topic_category] += 1
            fixture_moments.append({
                "moment_id": str(m.id),
                "title": m.title,
                "summary": m.summary,
                "content_type": m.content_type.value,
                "start_time": m.start_time,
                "end_time": m.end_time,
                "plugins": m.plugins or [],
                "raw_transcript": m.raw_transcript or "",
                # Classification data (stage 4 output)
                "classification": {
                    "topic_category": topic_category,
                    "topic_tags": topic_tags,
                },
                # Compatibility fields for existing quality/scorer format
                "transcript_excerpt": (m.raw_transcript or "")[:500],
                "topic_tags": topic_tags,
                "topic_category": topic_category,
            })
        fixture = {
            "video_id": str(video.id),
            "creator_name": creator_name,
            "content_type": video.content_type.value,
            "filename": video.filename,
            "duration_seconds": video.duration_seconds,
            "classification_source": cls_source,
            "export_timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
            "moments": fixture_moments,
        }
        fixture_json = json.dumps(fixture, indent=2, ensure_ascii=False)
        fixture_size_kb = len(fixture_json.encode("utf-8")) / 1024
        # ── Output ──────────────────────────────────────────────────────
        if output_path:
            Path(output_path).parent.mkdir(parents=True, exist_ok=True)
            Path(output_path).write_text(fixture_json, encoding="utf-8")
            _log(
                "OUTPUT",
                f"Wrote fixture: {output_path} ({fixture_size_kb:.1f} KB, "
                f"{len(fixture_moments)} moments, {len(category_counts)} categories)",
            )
        else:
            # Print fixture JSON to stdout
            print(fixture_json)
            _log(
                "OUTPUT",
                f"Printed fixture to stdout ({fixture_size_kb:.1f} KB, "
                f"{len(fixture_moments)} moments, {len(category_counts)} categories)",
            )
        # Category breakdown
        for cat, count in category_counts.most_common():
            _log("CATEGORY", f"  {cat}: {count} moments")
        elapsed = time.monotonic() - start
        _log("DONE", f"Export completed in {elapsed:.1f}s")
        return 0
    except Exception as exc:
        _log("ERROR", f"Export failed: {exc}", level="ERROR")
        import traceback
        traceback.print_exc(file=sys.stderr)
        return 1
    finally:
        session.close()
 def main() -> int:
    parser = argparse.ArgumentParser(
        prog="pipeline.export_fixture",
        description="Export pipeline stage inputs for a video as a reusable JSON fixture",
    )
    parser.add_argument(
        "--video-id",
        type=str,
        help="UUID of the video to export",
    )
    parser.add_argument(
        "--output", "-o",
        type=str,
        default=None,
        help="Output file path (default: print to stdout)",
    )
    parser.add_argument(
        "--list",
        action="store_true",
        default=False,
        help="List all videos with status and moment counts",
    )
    args = parser.parse_args()
    # Load settings
    from config import get_settings
    settings = get_settings()
    if args.list:
        return _list_videos(settings.database_url)
    if not args.video_id:
        parser.error("--video-id is required (or use --list to see available videos)")
    return export_fixture(
        database_url=settings.database_url,
        redis_url=settings.redis_url,
        video_id=args.video_id,
        output_path=args.output,
    )
 if __name__ == "__main__":
    sys.exit(main())
--- a/backend/pipeline/highlight_schemas.py
+++ b/backend/pipeline/highlight_schemas.py
@ -0,0 +1,63 @@
 """Pydantic schemas for highlight detection pipeline.
 Covers scoring breakdown, candidate responses, and batch result summaries.
 """
 from __future__ import annotations
 import uuid
 from datetime import datetime
 from pydantic import BaseModel, Field
 class HighlightScoreBreakdown(BaseModel):
    """Per-dimension score breakdown for a highlight candidate.
    Each field is a float in [0, 1] representing the normalized score
    for that scoring dimension.
    """
    duration_score: float = Field(description="Score based on moment duration (sweet-spot curve)")
    content_density_score: float = Field(description="Score based on transcript richness / word density")
    technique_relevance_score: float = Field(description="Score based on content_type and plugin mentions")
    position_score: float = Field(description="Score based on temporal position within the video")
    uniqueness_score: float = Field(description="Score based on title/topic distinctness among siblings")
    engagement_proxy_score: float = Field(description="Proxy engagement signal from summary quality/length")
    plugin_diversity_score: float = Field(description="Score based on breadth of plugins/tools mentioned")
    speech_rate_variance_score: float = Field(
        default=0.5,
        description="Score based on speech rate variation (emphasis shifts) from word timings",
    )
    pause_density_score: float = Field(
        default=0.5,
        description="Score based on strategic pause frequency from word timings",
    )
    speaking_pace_score: float = Field(
        default=0.5,
        description="Score based on words-per-second fitness for teaching pace",
    )
 class HighlightCandidateResponse(BaseModel):
    """API response schema for a single highlight candidate."""
    id: uuid.UUID
    key_moment_id: uuid.UUID
    source_video_id: uuid.UUID
    score: float = Field(ge=0.0, le=1.0, description="Composite highlight score")
    score_breakdown: HighlightScoreBreakdown
    duration_secs: float = Field(ge=0.0, description="Duration of the key moment in seconds")
    status: str = Field(description="One of: candidate, approved, rejected")
    created_at: datetime
    model_config = {"from_attributes": True}
 class HighlightBatchResult(BaseModel):
    """Summary of a highlight scoring batch run for one video."""
    video_id: uuid.UUID
    candidates_created: int = Field(ge=0, description="Number of new candidates inserted")
    candidates_updated: int = Field(ge=0, description="Number of existing candidates re-scored")
    top_score: float = Field(ge=0.0, le=1.0, description="Highest score in this batch")
--- a/backend/pipeline/highlight_scorer.py
+++ b/backend/pipeline/highlight_scorer.py
@ -0,0 +1,413 @@
 """Heuristic scoring engine for highlight candidate detection.
 Takes KeyMoment data + context (source quality, video content type) and
 returns a composite score in [0, 1] with a 10-dimension breakdown.
 The breakdown fields align with HighlightScoreBreakdown in highlight_schemas.py:
  duration_score, content_density_score, technique_relevance_score,
  position_score, uniqueness_score, engagement_proxy_score, plugin_diversity_score,
  speech_rate_variance_score, pause_density_score, speaking_pace_score
 """
 from __future__ import annotations
 import math
 import re
 import statistics
 from typing import Any
 # ── Weights per dimension (must sum to 1.0) ──────────────────────────────────
 _WEIGHTS: dict[str, float] = {
    "duration_score": 0.20,
    "content_density_score": 0.15,
    "technique_relevance_score": 0.15,
    "plugin_diversity_score": 0.08,
    "engagement_proxy_score": 0.08,
    "position_score": 0.08,          # mapped from source_quality
    "uniqueness_score": 0.04,        # mapped from video_type
    "speech_rate_variance_score": 0.08,
    "pause_density_score": 0.07,
    "speaking_pace_score": 0.07,
 }
 assert abs(sum(_WEIGHTS.values()) - 1.0) < 1e-9, "Weights must sum to 1.0"
 # ── Individual scoring functions ─────────────────────────────────────────────
 def _duration_fitness(duration_secs: float) -> float:
    """Bell-curve around 30-60s sweet spot.
    Peak at 30-60s (score 1.0), penalty below 15s and above 120s,
    zero above 300s.
    """
    if duration_secs <= 0:
        return 0.0
    if duration_secs >= 300:
        return 0.0
    # Sweet spot: 30-60s → 1.0
    if 30 <= duration_secs <= 60:
        return 1.0
    # Below sweet spot: linear ramp from 0 at 0s to 1.0 at 30s
    # with steeper penalty below 15s
    if duration_secs < 30:
        if duration_secs < 15:
            return duration_secs / 30.0  # 0→0.5 over 0-15s
        return 0.5 + (duration_secs - 15) / 30.0  # 0.5→1.0 over 15-30s
    # Above sweet spot: gradual decay from 1.0 at 60s to 0.0 at 300s
    return max(0.0, 1.0 - (duration_secs - 60) / 240.0)
 def _content_type_weight(content_type: str | None) -> float:
    """Score based on KeyMoment content_type.
    technique=1.0, settings=0.8, workflow=0.6, reasoning=0.4
    """
    mapping = {
        "technique": 1.0,
        "settings": 0.8,
        "workflow": 0.6,
        "reasoning": 0.4,
    }
    return mapping.get(content_type or "", 0.5)
 def _specificity_density(summary: str | None) -> float:
    """Score based on specificity signals in the summary.
    Counts specific values (numbers, plugin names, dB, Hz, ms, %, ratios)
    normalized by summary length.
    """
    if not summary:
        return 0.0
    words = summary.split()
    word_count = len(words)
    if word_count == 0:
        return 0.0
    # Patterns that indicate specificity
    specificity_patterns = [
        r"\b\d+\.?\d*\s*(?:dB|Hz|kHz|ms|sec|bpm|%)\b",  # units
        r"\b\d+\.?\d*\s*/\s*\d+\.?\d*\b",                # ratios like 3/4
        r"\b\d{2,}\b",                                     # multi-digit numbers
        r"\b\d+\.\d+\b",                                   # decimal numbers
    ]
    hits = 0
    for pattern in specificity_patterns:
        hits += len(re.findall(pattern, summary, re.IGNORECASE))
    # Normalize: ~1 specific value per 10 words is high density
    density = hits / (word_count / 10.0)
    return min(density, 1.0)
 def _plugin_richness(plugins: list[str] | None) -> float:
    """Score based on number of plugins mentioned.
    min(len(plugins) / 3, 1.0)
    """
    if not plugins:
        return 0.0
    return min(len(plugins) / 3.0, 1.0)
 def _transcript_energy(raw_transcript: str | None) -> float:
    """Score based on teaching/engagement phrases in transcript.
    Counts teaching phrases ('the trick is', 'notice how', 'because',
    'I always', 'the key is', 'what I do') normalized by transcript
    word count.
    """
    if not raw_transcript:
        return 0.0
    words = raw_transcript.split()
    word_count = len(words)
    if word_count == 0:
        return 0.0
    teaching_phrases = [
        "the trick is",
        "notice how",
        "because",
        "i always",
        "the key is",
        "what i do",
        "important thing",
        "you want to",
        "make sure",
        "here's why",
    ]
    text_lower = raw_transcript.lower()
    hits = sum(text_lower.count(phrase) for phrase in teaching_phrases)
    # Normalize: ~1 phrase per 50 words is high energy
    energy = hits / (word_count / 50.0)
    return min(energy, 1.0)
 def _source_quality_weight(source_quality: str | None) -> float:
    """Score based on TechniquePage source_quality.
    structured=1.0, mixed=0.7, unstructured=0.4, None=0.5
    """
    mapping = {
        "structured": 1.0,
        "mixed": 0.7,
        "unstructured": 0.4,
    }
    return mapping.get(source_quality or "", 0.5)
 def _video_type_weight(video_content_type: str | None) -> float:
    """Score based on SourceVideo content_type.
    tutorial=1.0, breakdown=0.9, livestream=0.5, short_form=0.3
    """
    mapping = {
        "tutorial": 1.0,
        "breakdown": 0.9,
        "livestream": 0.5,
        "short_form": 0.3,
    }
    return mapping.get(video_content_type or "", 0.5)
 # ── Audio proxy scoring functions ─────────────────────────────────────────────
 def extract_word_timings(
    transcript_data: list[dict[str, Any]],
    start_time: float,
    end_time: float,
 ) -> list[dict[str, Any]]:
    """Extract word-level timing dicts from transcript segments within a time window.
    Parameters
    ----------
    transcript_data : list[dict]
        Parsed transcript JSON — list of segments, each with a ``words`` array.
        Each word dict must have ``start`` and ``end`` float fields (seconds).
    start_time : float
        Window start in seconds (inclusive).
    end_time : float
        Window end in seconds (inclusive).
    Returns
    -------
    list[dict] — word-timing dicts whose ``start`` falls within [start_time, end_time].
    """
    if not transcript_data:
        return []
    words: list[dict[str, Any]] = []
    for segment in transcript_data:
        seg_words = segment.get("words")
        if not seg_words:
            continue
        for w in seg_words:
            w_start = w.get("start")
            if w_start is None:
                continue
            if start_time <= w_start <= end_time:
                words.append(w)
    return words
 def _speech_rate_variance(word_timings: list[dict[str, Any]] | None) -> float:
    """Compute normalized stdev of words-per-second in sliding windows.
    High variance indicates emphasis shifts (speeding up / slowing down),
    which correlates with engaging highlights.
    Uses 5-second sliding windows with 2.5-second step.
    Returns 0.5 (neutral) when word_timings is None or insufficient data.
    """
    if not word_timings or len(word_timings) < 4:
        return 0.5
    # Determine time span
    first_start = word_timings[0].get("start", 0.0)
    last_start = word_timings[-1].get("start", 0.0)
    span = last_start - first_start
    if span < 5.0:
        return 0.5
    # Compute WPS in 5s sliding windows with 2.5s step
    window_size = 5.0
    step = 2.5
    wps_values: list[float] = []
    t = first_start
    while t + window_size <= last_start + 0.01:
        count = sum(
            1 for w in word_timings
            if t <= w.get("start", 0.0) < t + window_size
        )
        wps_values.append(count / window_size)
        t += step
    if len(wps_values) < 2:
        return 0.5
    mean_wps = statistics.mean(wps_values)
    if mean_wps < 0.01:
        return 0.5
    stdev = statistics.stdev(wps_values)
    # Normalize: coefficient of variation, capped at 1.0
    # CV of ~0.3-0.5 is typical for varied speech; >0.5 is high variance
    cv = stdev / mean_wps
    return min(cv / 0.6, 1.0)
 def _pause_density(word_timings: list[dict[str, Any]] | None) -> float:
    """Count strategic pauses normalized by duration.
    Inter-word gaps >0.5s and inter-segment gaps >1.0s indicate deliberate
    pauses for emphasis, which correlate with better highlights.
    Returns 0.5 (neutral) when word_timings is None or insufficient data.
    """
    if not word_timings or len(word_timings) < 2:
        return 0.5
    first_start = word_timings[0].get("start", 0.0)
    last_end = word_timings[-1].get("end", word_timings[-1].get("start", 0.0))
    duration = last_end - first_start
    if duration < 1.0:
        return 0.5
    short_pauses = 0  # >0.5s gaps
    long_pauses = 0   # >1.0s gaps
    for i in range(1, len(word_timings)):
        prev_end = word_timings[i - 1].get("end", word_timings[i - 1].get("start", 0.0))
        curr_start = word_timings[i].get("start", 0.0)
        gap = curr_start - prev_end
        if gap > 1.0:
            long_pauses += 1
        elif gap > 0.5:
            short_pauses += 1
    # Weight long pauses more heavily
    weighted_pauses = short_pauses + long_pauses * 2.0
    # Normalize: ~2-4 weighted pauses per 30s is good density
    density = weighted_pauses / (duration / 15.0)
    return min(density, 1.0)
 def _speaking_pace_fitness(word_timings: list[dict[str, Any]] | None) -> float:
    """Bell-curve score around 3-5 words-per-second optimal teaching pace.
    3-5 WPS is the sweet spot for tutorial content — fast enough to be
    engaging, slow enough for comprehension. Returns 0.5 (neutral) when
    word_timings is None or insufficient data.
    """
    if not word_timings or len(word_timings) < 2:
        return 0.5
    first_start = word_timings[0].get("start", 0.0)
    last_end = word_timings[-1].get("end", word_timings[-1].get("start", 0.0))
    duration = last_end - first_start
    if duration < 1.0:
        return 0.5
    wps = len(word_timings) / duration
    # Sweet spot: 3-5 WPS → 1.0
    if 3.0 <= wps <= 5.0:
        return 1.0
    # Below sweet spot: linear ramp from 0 at 0 WPS to 1.0 at 3 WPS
    if wps < 3.0:
        return max(0.0, wps / 3.0)
    # Above sweet spot: decay from 1.0 at 5 WPS to 0.0 at 10 WPS
    if wps > 5.0:
        return max(0.0, 1.0 - (wps - 5.0) / 5.0)
    return 0.5  # unreachable, but defensive
 # ── Main scoring function ───────────────────────────────────────────────────
 def score_moment(
    *,
    start_time: float,
    end_time: float,
    content_type: str | None = None,
    summary: str | None = None,
    plugins: list[str] | None = None,
    raw_transcript: str | None = None,
    source_quality: str | None = None,
    video_content_type: str | None = None,
    word_timings: list[dict[str, Any]] | None = None,
 ) -> dict[str, Any]:
    """Score a KeyMoment for highlight potential.
    Parameters
    ----------
    start_time : float
        Moment start in seconds.
    end_time : float
        Moment end in seconds.
    content_type : str | None
        KeyMoment content type (technique, settings, workflow, reasoning).
    summary : str | None
        KeyMoment summary text.
    plugins : list[str] | None
        Plugins mentioned in the moment.
    raw_transcript : str | None
        Raw transcript text of the moment.
    source_quality : str | None
        TechniquePage source quality (structured, mixed, unstructured).
    video_content_type : str | None
        SourceVideo content type (tutorial, breakdown, livestream, short_form).
    word_timings : list[dict] | None
        Word-level timing dicts with ``start`` and ``end`` keys (seconds).
        When None, audio proxy dimensions score 0.5 (neutral).
    Returns
    -------
    dict with keys:
        score : float in [0.0, 1.0]
        score_breakdown : dict mapping dimension names to float scores
        duration_secs : float
    """
    duration_secs = max(0.0, end_time - start_time)
    breakdown = {
        "duration_score": _duration_fitness(duration_secs),
        "content_density_score": _specificity_density(summary),
        "technique_relevance_score": _content_type_weight(content_type),
        "plugin_diversity_score": _plugin_richness(plugins),
        "engagement_proxy_score": _transcript_energy(raw_transcript),
        "position_score": _source_quality_weight(source_quality),
        "uniqueness_score": _video_type_weight(video_content_type),
        "speech_rate_variance_score": _speech_rate_variance(word_timings),
        "pause_density_score": _pause_density(word_timings),
        "speaking_pace_score": _speaking_pace_fitness(word_timings),
    }
    # Weighted composite
    composite = sum(
        breakdown[dim] * weight for dim, weight in _WEIGHTS.items()
    )
    # Clamp to [0, 1] for safety
    composite = max(0.0, min(1.0, composite))
    return {
        "score": composite,
        "score_breakdown": breakdown,
        "duration_secs": duration_secs,
    }
--- a/backend/pipeline/llm_client.py
+++ b/backend/pipeline/llm_client.py
@ -0,0 +1,357 @@
 """Synchronous LLM client with primary/fallback endpoint logic.
 Uses the OpenAI-compatible API (works with Ollama, vLLM, OpenWebUI, etc.).
 Celery tasks run synchronously, so this uses ``openai.OpenAI`` (not Async).
 Supports two modalities:
 - **chat**: Standard JSON mode with ``response_format: {"type": "json_object"}``
 - **thinking**: For reasoning models that emit ``<think>...</think>`` blocks
  before their answer. Skips ``response_format``, appends JSON instructions to
  the system prompt, and strips think tags from the response.
 """
 from __future__ import annotations
 import logging
 import re
 from typing import TYPE_CHECKING, TypeVar
 if TYPE_CHECKING:
    from collections.abc import Callable
 import openai
 from pydantic import BaseModel
 from config import Settings
 logger = logging.getLogger(__name__)
 T = TypeVar("T", bound=BaseModel)
 # ── LLM Response wrapper ─────────────────────────────────────────────────────
 class LLMResponse(str):
    """String subclass that carries LLM response metadata.
    Backward-compatible with all code that treats the response as a plain
    string, but callers that know about it can inspect finish_reason and
    the truncated property.
    """
    finish_reason: str | None
    prompt_tokens: int | None
    completion_tokens: int | None
    def __new__(
        cls,
        text: str,
        finish_reason: str | None = None,
        prompt_tokens: int | None = None,
        completion_tokens: int | None = None,
    ):
        obj = super().__new__(cls, text)
        obj.finish_reason = finish_reason
        obj.prompt_tokens = prompt_tokens
        obj.completion_tokens = completion_tokens
        return obj
    @property
    def truncated(self) -> bool:
        """True if the model hit its token limit before finishing."""
        return self.finish_reason == "length"
 # ── Think-tag stripping ──────────────────────────────────────────────────────
 _THINK_PATTERN = re.compile(r"<think>.*?</think>", re.DOTALL)
 def strip_think_tags(text: str) -> str:
    """Remove ``<think>...</think>`` blocks from LLM output.
    Thinking/reasoning models often prefix their JSON with a reasoning trace
    wrapped in ``<think>`` tags. This strips all such blocks (including
    multiline and multiple occurrences) and returns the cleaned text.
    Handles:
    - Single ``<think>...</think>`` block
    - Multiple blocks in one response
    - Multiline content inside think tags
    - Responses with no think tags (passthrough)
    - Empty input (passthrough)
    """
    if not text:
        return text
    cleaned = _THINK_PATTERN.sub("", text)
    return cleaned.strip()
 # ── Token estimation ─────────────────────────────────────────────────────────
 # Stage-specific output multipliers: estimated output tokens as a ratio of input tokens.
 # Tuned from actual pipeline data (KCL Ep 31 audit, April 2026):
 #   stage2: actual compl/prompt = 680/39312 = 0.017 → use 0.05 with buffer
 #   stage3: actual compl/prompt ≈ 1000/7000 = 0.14 → use 0.3 with buffer
 #   stage4: actual compl/prompt = 740/3736 = 0.20 → use 0.3 with buffer
 #   stage5: actual compl/prompt ≈ 2500/7000 = 0.36 → use 0.8 with buffer
 _STAGE_OUTPUT_RATIOS: dict[str, float] = {
    "stage2_segmentation": 0.05,   # Compact topic groups — much smaller than input
    "stage3_extraction": 0.3,      # Key moments with summaries — moderate
    "stage4_classification": 0.3,  # Tags + categories per moment
    "stage5_synthesis": 0.8,       # Full prose technique pages — heaviest output
 }
 # Minimum floor so we never send a trivially small max_tokens
 _MIN_MAX_TOKENS = 4096
 def estimate_tokens(text: str) -> int:
    """Estimate token count from text using a chars-per-token heuristic.
    Uses 3.5 chars/token which is conservative for English + JSON markup.
    """
    if not text:
        return 0
    return max(1, int(len(text) / 3.5))
 def estimate_max_tokens(
    system_prompt: str,
    user_prompt: str,
    stage: str | None = None,
    hard_limit: int = 32768,
 ) -> int:
    """Return the hard_limit as max_tokens for all stages.
    Previously used dynamic estimation based on input size and stage-specific
    multipliers, but thinking models consume unpredictable token budgets for
    internal reasoning. A static ceiling avoids truncation errors.
    The hard_limit value comes from Settings.llm_max_tokens_hard_limit (96000).
    """
    input_tokens = estimate_tokens(system_prompt) + estimate_tokens(user_prompt)
    logger.info(
        "Token estimate: input≈%d, stage=%s, max_tokens=%d (static hard_limit)",
        input_tokens, stage or "default", hard_limit,
    )
    return hard_limit
 class LLMClient:
    """Sync LLM client that tries a primary endpoint and falls back on failure."""
    def __init__(self, settings: Settings) -> None:
        self.settings = settings
        self._primary = openai.OpenAI(
            base_url=settings.llm_api_url,
            api_key=settings.llm_api_key,
        )
        self._fallback = openai.OpenAI(
            base_url=settings.llm_fallback_url,
            api_key=settings.llm_api_key,
        )
    # ── Core completion ──────────────────────────────────────────────────
    def complete(
        self,
        system_prompt: str,
        user_prompt: str,
        response_model: type[BaseModel] | None = None,
        modality: str = "chat",
        model_override: str | None = None,
        on_complete: "Callable | None" = None,
        max_tokens: int | None = None,
    ) -> "LLMResponse":
        """Send a chat completion request, falling back on connection/timeout errors.
        Parameters
        ----------
        system_prompt:
            System message content.
        user_prompt:
            User message content.
        response_model:
            If provided and modality is "chat", ``response_format`` is set to
            ``{"type": "json_object"}``. For "thinking" modality, JSON
            instructions are appended to the system prompt instead.
        modality:
            Either "chat" (default) or "thinking". Thinking modality skips
            response_format and strips ``<think>`` tags from output.
        model_override:
            Model name to use instead of the default. If None, uses the
            configured default for the endpoint.
        max_tokens:
            Override for max_tokens on this call. If None, falls back to
            the configured ``llm_max_tokens`` from settings.
        Returns
        -------
        LLMResponse
            Raw completion text (str subclass) with finish_reason metadata.
        """
        kwargs: dict = {}
        effective_system = system_prompt
        if modality == "thinking":
            # Thinking models often don't support response_format: json_object.
            # Instead, append explicit JSON instructions to the system prompt.
            if response_model is not None:
                json_schema_hint = (
                    "\n\nYou MUST respond with ONLY valid JSON. "
                    "No markdown code fences, no explanation, no preamble — "
                    "just the raw JSON object."
                )
                effective_system = system_prompt + json_schema_hint
        else:
            # Chat modality — use standard JSON mode
            if response_model is not None:
                kwargs["response_format"] = {"type": "json_object"}
        messages = [
            {"role": "system", "content": effective_system},
            {"role": "user", "content": user_prompt},
        ]
        primary_model = model_override or self.settings.llm_model
        fallback_model = self.settings.llm_fallback_model
        effective_max_tokens = max_tokens if max_tokens is not None else self.settings.llm_max_tokens
        effective_temperature = self.settings.llm_temperature
        logger.info(
            "LLM request: model=%s, modality=%s, response_model=%s, max_tokens=%d, temperature=%.1f",
            primary_model,
            modality,
            response_model.__name__ if response_model else None,
            effective_max_tokens,
            effective_temperature,
        )
        # --- Try primary endpoint ---
        try:
            response = self._primary.chat.completions.create(
                model=primary_model,
                messages=messages,
                max_tokens=effective_max_tokens,
                temperature=effective_temperature,
                **kwargs,
            )
            raw = response.choices[0].message.content or ""
            usage = getattr(response, "usage", None)
            if usage:
                logger.info(
                    "LLM response: prompt_tokens=%s, completion_tokens=%s, total=%s, content_len=%d, finish=%s",
                    usage.prompt_tokens, usage.completion_tokens, usage.total_tokens,
                    len(raw), response.choices[0].finish_reason,
                )
            if modality == "thinking":
                raw = strip_think_tags(raw)
            finish = response.choices[0].finish_reason if response.choices else None
            if on_complete is not None:
                try:
                    on_complete(
                        model=primary_model,
                        prompt_tokens=usage.prompt_tokens if usage else None,
                        completion_tokens=usage.completion_tokens if usage else None,
                        total_tokens=usage.total_tokens if usage else None,
                        content=raw,
                        finish_reason=finish,
                    )
                except Exception as cb_exc:
                    logger.warning("on_complete callback failed: %s", cb_exc)
            return LLMResponse(
                raw,
                finish_reason=finish,
                prompt_tokens=usage.prompt_tokens if usage else None,
                completion_tokens=usage.completion_tokens if usage else None,
            )
        except (openai.APIConnectionError, openai.APITimeoutError) as exc:
            logger.warning(
                "Primary LLM endpoint failed (%s: %s), trying fallback at %s",
                type(exc).__name__,
                exc,
                self.settings.llm_fallback_url,
            )
        # --- Try fallback endpoint ---
        try:
            response = self._fallback.chat.completions.create(
                model=fallback_model,
                messages=messages,
                max_tokens=effective_max_tokens,
                temperature=effective_temperature,
                **kwargs,
            )
            raw = response.choices[0].message.content or ""
            usage = getattr(response, "usage", None)
            if usage:
                logger.info(
                    "LLM response (fallback): prompt_tokens=%s, completion_tokens=%s, total=%s, content_len=%d, finish=%s",
                    usage.prompt_tokens, usage.completion_tokens, usage.total_tokens,
                    len(raw), response.choices[0].finish_reason,
                )
            if modality == "thinking":
                raw = strip_think_tags(raw)
            finish = response.choices[0].finish_reason if response.choices else None
            if on_complete is not None:
                try:
                    on_complete(
                        model=fallback_model,
                        prompt_tokens=usage.prompt_tokens if usage else None,
                        completion_tokens=usage.completion_tokens if usage else None,
                        total_tokens=usage.total_tokens if usage else None,
                        content=raw,
                        finish_reason=finish,
                        is_fallback=True,
                    )
                except Exception as cb_exc:
                    logger.warning("on_complete callback failed: %s", cb_exc)
            return LLMResponse(
                raw,
                finish_reason=finish,
                prompt_tokens=usage.prompt_tokens if usage else None,
                completion_tokens=usage.completion_tokens if usage else None,
            )
        except (openai.APIConnectionError, openai.APITimeoutError, openai.APIError) as exc:
            logger.error(
                "Fallback LLM endpoint also failed (%s: %s). Giving up.",
                type(exc).__name__,
                exc,
            )
            raise
    # ── Response parsing ─────────────────────────────────────────────────
    def parse_response(self, text: str, model: type[T]) -> T:
        """Parse raw LLM output as JSON and validate against a Pydantic model.
        Parameters
        ----------
        text:
            Raw JSON string from the LLM.
        model:
            Pydantic model class to validate against.
        Returns
        -------
        T
            Validated Pydantic model instance.
        Raises
        ------
        pydantic.ValidationError
            If the JSON doesn't match the schema.
        ValueError
            If the text is not valid JSON.
        """
        try:
            return model.model_validate_json(text)
        except Exception:
            logger.error(
                "Failed to parse LLM response as %s. Response text: %.500s",
                model.__name__,
                text,
            )
            raise
--- a/backend/pipeline/qdrant_client.py
+++ b/backend/pipeline/qdrant_client.py
@ -0,0 +1,320 @@
 """Qdrant vector database manager for collection lifecycle and point upserts.
 Handles collection creation (idempotent) and batch upserts for technique pages
 and key moments. Connection failures are non-blocking — the pipeline continues
 without search indexing.
 """
 from __future__ import annotations
 import logging
 import uuid
 from qdrant_client import QdrantClient
 from qdrant_client.http import exceptions as qdrant_exceptions
 from qdrant_client.models import Distance, PointStruct, VectorParams
 from config import Settings
 logger = logging.getLogger(__name__)
 # Namespace UUID for deterministic point IDs
 _QDRANT_NAMESPACE = uuid.UUID("a1b2c3d4-e5f6-7890-abcd-ef1234567890")
 class QdrantManager:
    """Manages a Qdrant collection for Chrysopedia technique-page and key-moment vectors."""
    def __init__(self, settings: Settings) -> None:
        self.settings = settings
        self._client = QdrantClient(url=settings.qdrant_url)
        self._collection = settings.qdrant_collection
    # ── Collection management ────────────────────────────────────────────
    def ensure_collection(self) -> None:
        """Create the collection if it does not already exist.
        Uses cosine distance and the configured embedding dimensions.
        """
        try:
            if self._client.collection_exists(self._collection):
                logger.info("Qdrant collection '%s' already exists.", self._collection)
                return
            self._client.create_collection(
                collection_name=self._collection,
                vectors_config=VectorParams(
                    size=self.settings.embedding_dimensions,
                    distance=Distance.COSINE,
                ),
            )
            logger.info(
                "Created Qdrant collection '%s' (dim=%d, cosine).",
                self._collection,
                self.settings.embedding_dimensions,
            )
        except qdrant_exceptions.UnexpectedResponse as exc:
            logger.warning(
                "Qdrant error during ensure_collection (%s). Skipping.",
                exc,
            )
        except Exception as exc:
            logger.warning(
                "Qdrant connection failed during ensure_collection (%s: %s). Skipping.",
                type(exc).__name__,
                exc,
            )
    # ── Deletion ───────────────────────────────────────────────────────────
    def delete_by_video_id(self, video_id: str) -> int:
        """Delete all points (key moments + technique pages) associated with a video.
        Key moments have source_video_id in payload.
        Technique pages don't have direct video linkage, so only moments are deleted.
        Returns the count of deleted points (best-effort — Qdrant may not report exact counts).
        """
        from qdrant_client.models import Filter, FieldCondition, MatchValue
        try:
            result = self._client.delete(
                collection_name=self._collection,
                points_selector=Filter(
                    must=[
                        FieldCondition(
                            key="source_video_id",
                            match=MatchValue(value=video_id),
                        ),
                    ],
                ),
            )
            logger.info(
                "Deleted Qdrant points for video_id=%s from collection '%s'.",
                video_id,
                self._collection,
            )
            return 0  # Qdrant delete doesn't return count
        except Exception as exc:
            logger.warning(
                "Qdrant delete for video_id=%s failed (%s: %s). Skipping.",
                video_id,
                type(exc).__name__,
                exc,
            )
            return 0
    # ── Low-level upsert ─────────────────────────────────────────────────
    def upsert_points(self, points: list[PointStruct]) -> None:
        """Upsert a batch of pre-built PointStruct objects."""
        if not points:
            return
        try:
            self._client.upsert(
                collection_name=self._collection,
                points=points,
            )
            logger.info(
                "Upserted %d points to Qdrant collection '%s'.",
                len(points),
                self._collection,
            )
        except qdrant_exceptions.UnexpectedResponse as exc:
            logger.warning(
                "Qdrant upsert failed (%s). %d points skipped.",
                exc,
                len(points),
            )
        except Exception as exc:
            logger.warning(
                "Qdrant upsert connection error (%s: %s). %d points skipped.",
                type(exc).__name__,
                exc,
                len(points),
            )
    # ── High-level upserts ───────────────────────────────────────────────
    def upsert_technique_pages(
        self,
        pages: list[dict],
        vectors: list[list[float]],
    ) -> None:
        """Build and upsert PointStructs for technique pages.
        Each page dict must contain:
            page_id, creator_id, title, topic_category, topic_tags, summary
        Parameters
        ----------
        pages:
            Metadata dicts, one per technique page.
        vectors:
            Corresponding embedding vectors (same order as pages).
        """
        if len(pages) != len(vectors):
            logger.warning(
                "Technique-page count (%d) != vector count (%d). Skipping upsert.",
                len(pages),
                len(vectors),
            )
            return
        points = []
        for page, vector in zip(pages, vectors):
            # Deterministic UUID: same page always gets the same point ID
            point_id = str(uuid.uuid5(_QDRANT_NAMESPACE, f"tp:{page['page_id']}"))
            point = PointStruct(
                id=point_id,
                vector=vector,
                payload={
                    "type": "technique_page",
                    "page_id": page["page_id"],
                    "creator_id": page["creator_id"],
                    "creator_name": page.get("creator_name", ""),
                    "title": page["title"],
                    "slug": page.get("slug", ""),
                    "topic_category": page["topic_category"],
                    "topic_tags": page.get("topic_tags") or [],
                    "summary": page.get("summary") or "",
                },
            )
            points.append(point)
        self.upsert_points(points)
    def upsert_key_moments(
        self,
        moments: list[dict],
        vectors: list[list[float]],
    ) -> None:
        """Build and upsert PointStructs for key moments.
        Each moment dict must contain:
            moment_id, source_video_id, title, start_time, end_time, content_type
        Parameters
        ----------
        moments:
            Metadata dicts, one per key moment.
        vectors:
            Corresponding embedding vectors (same order as moments).
        """
        if len(moments) != len(vectors):
            logger.warning(
                "Key-moment count (%d) != vector count (%d). Skipping upsert.",
                len(moments),
                len(vectors),
            )
            return
        points = []
        for moment, vector in zip(moments, vectors):
            # Deterministic UUID: same moment always gets the same point ID
            point_id = str(uuid.uuid5(_QDRANT_NAMESPACE, f"km:{moment['moment_id']}"))
            point = PointStruct(
                id=point_id,
                vector=vector,
                payload={
                    "type": "key_moment",
                    "moment_id": moment["moment_id"],
                    "source_video_id": moment["source_video_id"],
                    "creator_id": moment.get("creator_id", ""),
                    "technique_page_id": moment.get("technique_page_id", ""),
                    "technique_page_slug": moment.get("technique_page_slug", ""),
                    "title": moment["title"],
                    "creator_name": moment.get("creator_name", ""),
                    "start_time": moment["start_time"],
                    "end_time": moment["end_time"],
                    "content_type": moment["content_type"],
                },
            )
            points.append(point)
        self.upsert_points(points)
    # ── Technique section operations ─────────────────────────────────────
    def delete_sections_by_page_id(self, page_id: str) -> None:
        """Delete all technique_section points for a given page_id.
        Called before re-upserting sections to prevent orphan points when
        headings are renamed or sections removed. Non-blocking — logs warning
        on failure.
        """
        from qdrant_client.models import FieldCondition, Filter, MatchValue
        try:
            self._client.delete(
                collection_name=self._collection,
                points_selector=Filter(
                    must=[
                        FieldCondition(
                            key="page_id",
                            match=MatchValue(value=page_id),
                        ),
                        FieldCondition(
                            key="type",
                            match=MatchValue(value="technique_section"),
                        ),
                    ],
                ),
            )
            logger.info(
                "Deleted technique_section points for page_id=%s from '%s'.",
                page_id, self._collection,
            )
        except Exception as exc:
            logger.warning(
                "Qdrant delete sections for page_id=%s failed (%s: %s). Skipping.",
                page_id, type(exc).__name__, exc,
            )
    def upsert_technique_sections(
        self,
        sections: list[dict],
        vectors: list[list[float]],
    ) -> None:
        """Build and upsert PointStructs for technique page sections.
        Each section dict must contain:
            page_id, section_anchor, section_heading, creator_id, creator_name,
            title (page title), slug (page slug), topic_category, topic_tags, summary
        Uses deterministic UUIDs: ``uuid5(namespace, 'ts:{page_id}:{section_anchor}')``.
        """
        if len(sections) != len(vectors):
            logger.warning(
                "Technique-section count (%d) != vector count (%d). Skipping upsert.",
                len(sections), len(vectors),
            )
            return
        points = []
        for sec, vector in zip(sections, vectors):
            point_id = str(uuid.uuid5(
                _QDRANT_NAMESPACE,
                f"ts:{sec['page_id']}:{sec['section_anchor']}",
            ))
            point = PointStruct(
                id=point_id,
                vector=vector,
                payload={
                    "type": "technique_section",
                    "page_id": sec["page_id"],
                    "creator_id": sec.get("creator_id", ""),
                    "creator_name": sec.get("creator_name", ""),
                    "title": sec.get("title", ""),
                    "slug": sec.get("slug", ""),
                    "section_heading": sec["section_heading"],
                    "section_anchor": sec["section_anchor"],
                    "topic_category": sec.get("topic_category", ""),
                    "topic_tags": sec.get("topic_tags") or [],
                    "summary": (sec.get("summary") or "")[:200],
                },
            )
            points.append(point)
        self.upsert_points(points)
--- a/backend/pipeline/quality/init.py
+++ b/backend/pipeline/quality/init.py
@ -0,0 +1,11 @@
 """FYN-LLM quality assurance toolkit."""
 import os
 import sys
 # Ensure backend/ is on sys.path so sibling modules (config, pipeline.llm_client)
 # resolve when running from the project root via symlink.
 _backend_dir = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..", "..")
 _backend_abs = os.path.normpath(os.path.abspath(_backend_dir))
 if _backend_abs not in sys.path:
    sys.path.insert(0, _backend_abs)
--- a/backend/pipeline/quality/main.py
+++ b/backend/pipeline/quality/main.py
@ -0,0 +1,646 @@
 """FYN-LLM quality assurance toolkit.
 Subcommands:
  fitness   — Run LLM fitness tests across four categories
  score     — Score a Stage 5 technique page across 5 quality dimensions
  optimize  — Automated prompt optimization loop with leaderboard output
 Run with: python -m pipeline.quality <command>
 """
 from __future__ import annotations
 import argparse
 import json
 import sys
 from datetime import datetime, timezone
 from pathlib import Path
 from config import get_settings
 from pipeline.llm_client import LLMClient
 from .chat_eval import ChatEvalRunner
 from .chat_scorer import ChatScoreRunner
 from .fitness import FitnessRunner
 from .optimizer import OptimizationLoop, OptimizationResult
 from .scorer import DIMENSIONS, STAGE_CONFIGS, ScoreRunner
 # ── Reporting helpers ────────────────────────────────────────────────────────
 def print_leaderboard(result: OptimizationResult, stage: int = 5) -> None:
    """Print a formatted leaderboard of top 5 variants by composite score."""
    dims = STAGE_CONFIGS[stage].dimensions if stage in STAGE_CONFIGS else DIMENSIONS
    # Filter to entries that actually scored (no errors)
    scored = [h for h in result.history if not h.get("error")]
    if not scored:
        print("\n  No successfully scored variants to rank.\n")
        return
    ranked = sorted(scored, key=lambda h: h["composite"], reverse=True)[:5]
    print(f"\n{'='*72}")
    print(f"  LEADERBOARD — Top 5 Variants by Composite Score (Stage {stage})")
    print(f"{'='*72}")
    # Header
    dim_headers = "  ".join(f"{d[:5]:>5s}" for d in dims)
    sep_segments = "  ".join("─" * 5 for _ in dims)
    print(f"  {'#':>2s}  {'Label':<16s}  {'Comp':>5s}  {dim_headers}")
    print(f"  {'─'*2}  {'─'*16}  {'─'*5}  {sep_segments}")
    for i, entry in enumerate(ranked, 1):
        label = entry.get("label", "?")[:16]
        comp = entry["composite"]
        dim_vals = "  ".join(
            f"{entry['scores'].get(d, 0.0):5.2f}" for d in dims
        )
        bar = "█" * int(comp * 20) + "░" * (20 - int(comp * 20))
        print(f"  {i:>2d}  {label:<16s}  {comp:5.3f}  {dim_vals}  {bar}")
    print(f"{'='*72}\n")
 def print_trajectory(result: OptimizationResult) -> None:
    """Print an ASCII chart of composite score across iterations."""
    scored = [h for h in result.history if not h.get("error")]
    if len(scored) < 2:
        print("  (Not enough data points for trajectory chart)\n")
        return
    # Get the best composite per iteration
    iter_best: dict[int, float] = {}
    for h in scored:
        it = h["iteration"]
        if it not in iter_best or h["composite"] > iter_best[it]:
            iter_best[it] = h["composite"]
    iterations = sorted(iter_best.keys())
    values = [iter_best[it] for it in iterations]
    # Chart dimensions
    chart_height = 15
    min_val = max(0.0, min(values) - 0.05)
    max_val = min(1.0, max(values) + 0.05)
    val_range = max_val - min_val
    if val_range < 0.01:
        val_range = 0.1
        min_val = max(0.0, values[0] - 0.05)
        max_val = min_val + val_range
    print(f"  {'─'*50}")
    print("  SCORE TRAJECTORY — Best Composite per Iteration")
    print(f"  {'─'*50}")
    print()
    # Render rows top to bottom
    for row in range(chart_height, -1, -1):
        threshold = min_val + (row / chart_height) * val_range
        # Y-axis label every 5 rows
        if row % 5 == 0:
            label = f"{threshold:.2f}"
        else:
            label = "     "
        line = f"  {label} │"
        for vi, val in enumerate(values):
            normalized = (val - min_val) / val_range
            filled_rows = int(normalized * chart_height)
            if filled_rows >= row:
                line += " ● "
            else:
                line += " · "
        print(line)
    # X-axis
    print(f"  ───── ┼{'───' * len(values)}")
    x_labels = "  " + "      "
    for it in iterations:
        x_labels += f"{it:>2d} "
    print(x_labels)
    print("        " + "  iteration →")
    print()
 def write_results_json(
    result: OptimizationResult,
    output_dir: str,
    stage: int,
    iterations: int,
    variants_per_iter: int,
    fixture_path: str,
 ) -> str:
    """Write optimization results to a timestamped JSON file. Returns the path."""
    out_path = Path(output_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    filename = f"optimize_stage{stage}_{timestamp}.json"
    filepath = out_path / filename
    dims = STAGE_CONFIGS[stage].dimensions if stage in STAGE_CONFIGS else DIMENSIONS
    payload = {
        "config": {
            "stage": stage,
            "iterations": iterations,
            "variants_per_iter": variants_per_iter,
            "fixture_path": fixture_path,
        },
        "best_prompt": result.best_prompt,
        "best_scores": {
            "composite": result.best_score.composite,
            **{d: result.best_score.scores.get(d, 0.0) for d in dims},
        },
        "elapsed_seconds": result.elapsed_seconds,
        "history": result.history,
    }
    filepath.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return str(filepath)
 # ── CLI ──────────────────────────────────────────────────────────────────────
 def main() -> int:
    parser = argparse.ArgumentParser(
        prog="pipeline.quality",
        description="FYN-LLM quality assurance toolkit",
    )
    sub = parser.add_subparsers(dest="command")
    # -- fitness subcommand --
    sub.add_parser("fitness", help="Run LLM fitness tests across four categories")
    # -- score subcommand --
    score_parser = sub.add_parser(
        "score",
        help="Score a Stage 5 technique page across 5 quality dimensions",
    )
    source_group = score_parser.add_mutually_exclusive_group(required=True)
    source_group.add_argument(
        "--file",
        type=str,
        help="Path to a moments JSON file (creator_name, moments array)",
    )
    source_group.add_argument(
        "--slug",
        type=str,
        help="Technique slug to load from the database",
    )
    score_parser.add_argument(
        "--voice-level",
        type=float,
        default=None,
        help="Voice preservation dial (0.0=clinical, 1.0=maximum voice). Triggers re-synthesis before scoring.",
    )
    # -- optimize subcommand --
    opt_parser = sub.add_parser(
        "optimize",
        help="Automated prompt optimization loop with leaderboard output",
    )
    # -- apply subcommand --
    apply_parser = sub.add_parser(
        "apply",
        help="Apply a winning prompt from optimization results to the stage's prompt file",
    )
    apply_parser.add_argument(
        "results_file",
        type=str,
        help="Path to an optimization results JSON file",
    )
    apply_parser.add_argument(
        "--dry-run",
        action="store_true",
        default=False,
        help="Show what would change without writing",
    )
    opt_parser.add_argument(
        "--stage",
        type=int,
        default=5,
        help="Pipeline stage to optimize (default: 5)",
    )
    opt_parser.add_argument(
        "--iterations",
        type=int,
        default=10,
        help="Number of optimization iterations (default: 10)",
    )
    opt_parser.add_argument(
        "--variants-per-iter",
        type=int,
        default=2,
        help="Variants generated per iteration (default: 2)",
    )
    opt_source = opt_parser.add_mutually_exclusive_group(required=True)
    opt_source.add_argument(
        "--file",
        type=str,
        help="Path to moments JSON fixture file",
    )
    opt_source.add_argument(
        "--video-id",
        type=str,
        help="Video UUID — exports fixture from DB automatically (requires DATABASE_URL, REDIS_URL)",
    )
    opt_parser.add_argument(
        "--output-dir",
        type=str,
        default="backend/pipeline/quality/results/",
        help="Directory to write result JSON (default: backend/pipeline/quality/results/)",
    )
    opt_parser.add_argument(
        "--apply",
        action="store_true",
        default=False,
        help="Write the winning prompt back to the stage's prompt file (backs up the original first)",
    )
    # -- chat_eval subcommand --
    chat_parser = sub.add_parser(
        "chat_eval",
        help="Evaluate chat quality across a test suite of queries",
    )
    chat_parser.add_argument(
        "--suite",
        type=str,
        required=True,
        help="Path to a chat test suite YAML/JSON file",
    )
    chat_parser.add_argument(
        "--base-url",
        type=str,
        default="http://localhost:8096",
        help="Chat API base URL (default: http://localhost:8096)",
    )
    chat_parser.add_argument(
        "--output",
        type=str,
        default="backend/pipeline/quality/results/",
        help="Output path for results JSON (default: backend/pipeline/quality/results/)",
    )
    chat_parser.add_argument(
        "--timeout",
        type=float,
        default=120.0,
        help="Request timeout in seconds (default: 120)",
    )
    args = parser.parse_args()
    if args.command is None:
        parser.print_help()
        return 1
    if args.command == "fitness":
        settings = get_settings()
        client = LLMClient(settings)
        runner = FitnessRunner(client)
        return runner.run_all()
    if args.command == "score":
        return _run_score(args)
    if args.command == "optimize":
        return _run_optimize(args)
    if args.command == "apply":
        return _run_apply(args)
    if args.command == "chat_eval":
        return _run_chat_eval(args)
    return 0
 def _run_score(args: argparse.Namespace) -> int:
    """Execute the score subcommand."""
    # -- Load source data --
    if args.slug:
        print("DB loading not yet implemented", file=sys.stderr)
        return 1
    try:
        with open(args.file) as f:
            data = json.load(f)
    except FileNotFoundError:
        print(f"File not found: {args.file}", file=sys.stderr)
        return 1
    except json.JSONDecodeError as exc:
        print(f"Invalid JSON in {args.file}: {exc}", file=sys.stderr)
        return 1
    moments = data.get("moments", [])
    creator_name = data.get("creator_name", "Unknown")
    if not moments:
        print("No moments found in input file", file=sys.stderr)
        return 1
    settings = get_settings()
    client = LLMClient(settings)
    runner = ScoreRunner(client)
    # -- Voice-level mode: re-synthesize then score --
    if args.voice_level is not None:
        voice_level = args.voice_level
        if not (0.0 <= voice_level <= 1.0):
            print("--voice-level must be between 0.0 and 1.0", file=sys.stderr)
            return 1
        print(f"\nRe-synthesizing + scoring for '{creator_name}' ({len(moments)} moments, voice_level={voice_level})...")
        result = runner.synthesize_and_score(moments, creator_name, voice_level)
        if result.error:
            runner.print_report(result)
            return 1
        runner.print_report(result)
        return 0
    # -- Standard mode: build page stub from moments, score directly --
    page_json = {
        "title": f"{creator_name} — Technique Page",
        "creator_name": creator_name,
        "summary": f"Technique page synthesized from {len(moments)} key moments.",
        "body_sections": [
            {
                "heading": m.get("topic_tags", ["Technique"])[0] if m.get("topic_tags") else "Technique",
                "content": m.get("summary", "") + "\n\n" + m.get("transcript_excerpt", ""),
            }
            for m in moments
        ],
    }
    print(f"\nScoring page for '{creator_name}' ({len(moments)} moments)...")
    result = runner.score_page(page_json, moments)
    if result.error:
        runner.print_report(result)
        return 1
    runner.print_report(result)
    return 0
 def _run_optimize(args: argparse.Namespace) -> int:
    """Execute the optimize subcommand."""
    # Stage validation — stages 2-5 are supported
    if args.stage not in STAGE_CONFIGS:
        print(
            f"Error: unsupported stage {args.stage}. Valid stages: {sorted(STAGE_CONFIGS)}",
            file=sys.stderr,
        )
        return 1
    # Resolve fixture: either from --file or auto-export from --video-id
    fixture_path: str
    if args.file:
        fixture_path = args.file
    else:
        # Auto-export from database
        print(f"\n[OPTIMIZE] Exporting fixture from video_id={args.video_id}...", file=sys.stderr)
        import tempfile
        from pipeline.export_fixture import export_fixture
        settings = get_settings()
        tmp = tempfile.NamedTemporaryFile(suffix=".json", prefix="optimize_fixture_", delete=False)
        tmp.close()
        exit_code = export_fixture(
            database_url=settings.database_url,
            redis_url=settings.redis_url,
            video_id=args.video_id,
            output_path=tmp.name,
        )
        if exit_code != 0:
            print(f"Error: fixture export failed (exit code {exit_code})", file=sys.stderr)
            return 1
        fixture_path = tmp.name
        print(f"[OPTIMIZE] Fixture exported to: {fixture_path}", file=sys.stderr)
    fixture = Path(fixture_path)
    if not fixture.exists():
        print(f"Error: fixture file not found: {fixture_path}", file=sys.stderr)
        return 1
    # Ensure output dir
    Path(args.output_dir).mkdir(parents=True, exist_ok=True)
    settings = get_settings()
    client = LLMClient(settings)
    loop = OptimizationLoop(
        client=client,
        stage=args.stage,
        fixture_path=fixture_path,
        iterations=args.iterations,
        variants_per_iter=args.variants_per_iter,
        output_dir=args.output_dir,
    )
    try:
        result = loop.run()
    except KeyboardInterrupt:
        print("\n  Optimization interrupted by user.", file=sys.stderr)
        return 130
    except Exception as exc:
        print(f"\nError: optimization failed: {exc}", file=sys.stderr)
        return 1
    # If the loop returned an error on baseline, report and exit
    if result.best_score.error and not result.history:
        print(f"\nError: {result.best_score.error}", file=sys.stderr)
        return 1
    # Reporting
    print_leaderboard(result, stage=args.stage)
    print_trajectory(result)
    # Write results JSON
    try:
        json_path = write_results_json(
            result=result,
            output_dir=args.output_dir,
            stage=args.stage,
            iterations=args.iterations,
            variants_per_iter=args.variants_per_iter,
            fixture_path=fixture_path,
        )
        print(f"  Results written to: {json_path}")
    except OSError as exc:
        print(f"  Warning: failed to write results JSON: {exc}", file=sys.stderr)
    # Apply winning prompt if requested
    if args.apply:
        baseline_composite = 0.0
        for h in result.history:
            if h.get("label") == "baseline" and not h.get("error"):
                baseline_composite = h["composite"]
                break
        if result.best_score.composite <= baseline_composite:
            print("\n  --apply: Best prompt did not beat baseline — skipping apply.")
        elif result.best_score.error:
            print("\n  --apply: Best result has an error — skipping apply.")
        else:
            print("\n  --apply: Winning prompt beat baseline — applying...")
            success, msg = apply_prompt(args.stage, result.best_prompt)
            print(f"  {msg}")
            if not success:
                return 1
    return 0
 def apply_prompt(stage: int, new_prompt: str, dry_run: bool = False) -> tuple[bool, str]:
    """Apply a new prompt to a stage's prompt file. Returns (success, message).
    Creates a timestamped backup of the original before overwriting.
    """
    if stage not in STAGE_CONFIGS:
        return False, f"Unsupported stage {stage}. Valid: {sorted(STAGE_CONFIGS)}"
    config = STAGE_CONFIGS[stage]
    settings = get_settings()
    prompt_path = Path(settings.prompts_path) / config.prompt_file
    if not prompt_path.exists():
        return False, f"Prompt file not found: {prompt_path}"
    original = prompt_path.read_text(encoding="utf-8")
    if original.strip() == new_prompt.strip():
        return True, "No change — winning prompt is identical to current prompt."
    # Show diff summary
    orig_lines = original.strip().splitlines()
    new_lines = new_prompt.strip().splitlines()
    print(f"\n  Prompt file: {prompt_path}")
    print(f"  Original: {len(orig_lines)} lines, {len(original)} chars")
    print(f"  New:      {len(new_lines)} lines, {len(new_prompt)} chars")
    # Simple line-level diff summary
    import difflib
    diff = list(difflib.unified_diff(orig_lines, new_lines, lineterm="", n=2))
    added = sum(1 for l in diff if l.startswith("+") and not l.startswith("+++"))
    removed = sum(1 for l in diff if l.startswith("-") and not l.startswith("---"))
    print(f"  Changes:  +{added} lines, -{removed} lines")
    if dry_run:
        print("\n  [DRY RUN] Would write to:", prompt_path)
        if len(diff) <= 40:
            print()
            for line in diff:
                print(f"  {line}")
        else:
            print(f"\n  (diff is {len(diff)} lines — showing first 30)")
            for line in diff[:30]:
                print(f"  {line}")
            print("  ...")
        return True, "Dry run — no files modified."
    # Backup original
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    backup_path = prompt_path.with_suffix(f".{timestamp}.bak")
    backup_path.write_text(original, encoding="utf-8")
    print(f"  Backup:   {backup_path}")
    # Write new prompt
    prompt_path.write_text(new_prompt, encoding="utf-8")
    print(f"  ✓ Written: {prompt_path}")
    return True, f"Prompt applied. Backup at {backup_path}"
 def _run_apply(args: argparse.Namespace) -> int:
    """Execute the apply subcommand — read a results JSON and apply the winning prompt."""
    results_path = Path(args.results_file)
    if not results_path.exists():
        print(f"Error: results file not found: {args.results_file}", file=sys.stderr)
        return 1
    try:
        data = json.loads(results_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        print(f"Error: invalid JSON in {args.results_file}: {exc}", file=sys.stderr)
        return 1
    stage = data.get("config", {}).get("stage")
    best_prompt = data.get("best_prompt", "")
    best_scores = data.get("best_scores", {})
    if not stage:
        print("Error: results JSON missing config.stage", file=sys.stderr)
        return 1
    if not best_prompt:
        print("Error: results JSON missing best_prompt or it's empty", file=sys.stderr)
        return 1
    composite = best_scores.get("composite", 0)
    print(f"\n  Applying results from: {results_path}")
    print(f"  Stage: {stage}")
    print(f"  Best composite score: {composite:.3f}")
    success, msg = apply_prompt(stage, best_prompt, dry_run=args.dry_run)
    print(f"\n  {msg}")
    return 0 if success else 1
 def _run_chat_eval(args: argparse.Namespace) -> int:
    """Execute the chat_eval subcommand — evaluate chat quality across a test suite."""
    suite_path = Path(args.suite)
    if not suite_path.exists():
        print(f"Error: suite file not found: {args.suite}", file=sys.stderr)
        return 1
    # Load test cases
    try:
        cases = ChatEvalRunner.load_suite(suite_path)
    except Exception as exc:
        print(f"Error loading test suite: {exc}", file=sys.stderr)
        return 1
    if not cases:
        print("Error: test suite contains no queries", file=sys.stderr)
        return 1
    print(f"\n  Chat Evaluation: {len(cases)} queries from {suite_path}")
    print(f"  Endpoint: {args.base_url}")
    # Build scorer and runner
    settings = get_settings()
    client = LLMClient(settings)
    scorer = ChatScoreRunner(client)
    runner = ChatEvalRunner(
        scorer=scorer,
        base_url=args.base_url,
        timeout=args.timeout,
    )
    # Execute
    results = runner.run_suite(cases)
    # Print summary
    runner.print_summary(results)
    # Write results
    try:
        json_path = runner.write_results(results, args.output)
        print(f"  Results written to: {json_path}")
    except OSError as exc:
        print(f"  Warning: failed to write results: {exc}", file=sys.stderr)
    # Exit code: 0 if at least one scored, 1 if all errored
    scored = [r for r in results if r.score and not r.score.error and not r.request_error]
    return 0 if scored else 1
 if __name__ == "__main__":
    sys.exit(main())
--- a/backend/pipeline/quality/chat_eval.py
+++ b/backend/pipeline/quality/chat_eval.py
@ -0,0 +1,352 @@
 """Chat evaluation harness — sends queries to the live chat endpoint, scores responses.
 Loads a test suite (YAML or JSON), calls the chat HTTP endpoint for each query,
 parses SSE events to collect response text and sources, then scores each using
 ChatScoreRunner. Writes results to a JSON file.
 Usage:
    python -m pipeline.quality chat_eval --suite fixtures/chat_test_suite.yaml
    python -m pipeline.quality chat_eval --suite fixtures/chat_test_suite.yaml --base-url http://ub01:8096
 """
 from __future__ import annotations
 import json
 import logging
 import time
 from dataclasses import dataclass, field
 from datetime import datetime, timezone
 from pathlib import Path
 from typing import Any
 import httpx
 from pipeline.llm_client import LLMClient
 from pipeline.quality.chat_scorer import CHAT_DIMENSIONS, ChatScoreResult, ChatScoreRunner
 logger = logging.getLogger(__name__)
 _DEFAULT_BASE_URL = "http://localhost:8096"
 _CHAT_ENDPOINT = "/api/chat"
 _REQUEST_TIMEOUT = 120.0  # seconds — LLM streaming can be slow
@dataclass
 class ChatTestCase:
    """A single test case from the test suite."""
    query: str
    creator: str | None = None
    personality_weight: float = 0.0
    category: str = "general"
    description: str = ""
@dataclass
 class ChatEvalResult:
    """Result of evaluating a single test case."""
    test_case: ChatTestCase
    response: str = ""
    sources: list[dict] = field(default_factory=list)
    cascade_tier: str = ""
    score: ChatScoreResult | None = None
    request_error: str | None = None
    latency_seconds: float = 0.0
 class ChatEvalRunner:
    """Runs a chat evaluation suite against a live endpoint."""
    def __init__(
        self,
        scorer: ChatScoreRunner,
        base_url: str = _DEFAULT_BASE_URL,
        timeout: float = _REQUEST_TIMEOUT,
    ) -> None:
        self.scorer = scorer
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout
    @staticmethod
    def load_suite(path: str | Path) -> list[ChatTestCase]:
        """Load test cases from a YAML or JSON file.
        Expected format (YAML):
            queries:
              - query: "How do I sidechain a bass?"
                creator: null
                personality_weight: 0.0
                category: technical
                description: "Basic sidechain compression question"
        """
        filepath = Path(path)
        text = filepath.read_text(encoding="utf-8")
        if filepath.suffix in (".yaml", ".yml"):
            try:
                import yaml
            except ImportError:
                raise ImportError(
                    "PyYAML is required to load YAML test suites. "
                    "Install with: pip install pyyaml"
                )
            data = yaml.safe_load(text)
        else:
            data = json.loads(text)
        queries = data.get("queries", [])
        cases: list[ChatTestCase] = []
        for q in queries:
            cases.append(ChatTestCase(
                query=q["query"],
                creator=q.get("creator"),
                personality_weight=float(q.get("personality_weight", 0.0)),
                category=q.get("category", "general"),
                description=q.get("description", ""),
            ))
        return cases
    def run_suite(self, cases: list[ChatTestCase]) -> list[ChatEvalResult]:
        """Execute all test cases sequentially, scoring each response."""
        results: list[ChatEvalResult] = []
        for i, case in enumerate(cases, 1):
            print(f"\n  [{i}/{len(cases)}] {case.category}: {case.query[:60]}...")
            result = self._run_single(case)
            results.append(result)
            if result.request_error:
                print(f"    ✗ Request error: {result.request_error}")
            elif result.score and result.score.error:
                print(f"    ✗ Scoring error: {result.score.error}")
            elif result.score:
                print(f"    ✓ Composite: {result.score.composite:.3f}  "
                      f"(latency: {result.latency_seconds:.1f}s)")
        return results
    def _run_single(self, case: ChatTestCase) -> ChatEvalResult:
        """Execute a single test case: call endpoint, parse SSE, score."""
        eval_result = ChatEvalResult(test_case=case)
        # Call the chat endpoint
        t0 = time.monotonic()
        try:
            response_text, sources, cascade_tier = self._call_chat_endpoint(case)
            eval_result.latency_seconds = round(time.monotonic() - t0, 2)
        except Exception as exc:
            eval_result.latency_seconds = round(time.monotonic() - t0, 2)
            eval_result.request_error = str(exc)
            logger.error("chat_eval_request_error query=%r error=%s", case.query, exc)
            return eval_result
        eval_result.response = response_text
        eval_result.sources = sources
        eval_result.cascade_tier = cascade_tier
        if not response_text:
            eval_result.request_error = "Empty response from chat endpoint"
            return eval_result
        # Score the response
        eval_result.score = self.scorer.score_response(
            query=case.query,
            response=response_text,
            sources=sources,
            personality_weight=case.personality_weight,
            creator_name=case.creator,
        )
        return eval_result
    def _call_chat_endpoint(
        self, case: ChatTestCase
    ) -> tuple[str, list[dict], str]:
        """Call the chat SSE endpoint and parse the event stream.
        Returns (accumulated_text, sources_list, cascade_tier).
        """
        url = f"{self.base_url}{_CHAT_ENDPOINT}"
        payload: dict[str, Any] = {"query": case.query}
        if case.creator:
            payload["creator"] = case.creator
        if case.personality_weight > 0:
            payload["personality_weight"] = case.personality_weight
        sources: list[dict] = []
        accumulated = ""
        cascade_tier = ""
        with httpx.Client(timeout=self.timeout) as client:
            with client.stream("POST", url, json=payload) as resp:
                resp.raise_for_status()
                buffer = ""
                for chunk in resp.iter_text():
                    buffer += chunk
                    # Parse SSE events from buffer
                    while "\n\n" in buffer:
                        event_block, buffer = buffer.split("\n\n", 1)
                        event_type, event_data = self._parse_sse_event(event_block)
                        if event_type == "sources":
                            sources = event_data if isinstance(event_data, list) else []
                        elif event_type == "token":
                            accumulated += event_data if isinstance(event_data, str) else str(event_data)
                        elif event_type == "done":
                            if isinstance(event_data, dict):
                                cascade_tier = event_data.get("cascade_tier", "")
                        elif event_type == "error":
                            msg = event_data.get("message", str(event_data)) if isinstance(event_data, dict) else str(event_data)
                            raise RuntimeError(f"Chat endpoint returned error: {msg}")
        return accumulated, sources, cascade_tier
    @staticmethod
    def _parse_sse_event(block: str) -> tuple[str, Any]:
        """Parse a single SSE event block into (event_type, data)."""
        event_type = ""
        data_lines: list[str] = []
        for line in block.strip().splitlines():
            if line.startswith("event: "):
                event_type = line[7:].strip()
            elif line.startswith("data: "):
                data_lines.append(line[6:])
            elif line.startswith("data:"):
                data_lines.append(line[5:])
        raw_data = "\n".join(data_lines)
        try:
            parsed = json.loads(raw_data)
        except (json.JSONDecodeError, ValueError):
            parsed = raw_data  # plain text token
        return event_type, parsed
    @staticmethod
    def write_results(
        results: list[ChatEvalResult],
        output_path: str | Path,
    ) -> str:
        """Write evaluation results to a JSON file. Returns the path."""
        out = Path(output_path)
        out.parent.mkdir(parents=True, exist_ok=True)
        timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
        if out.is_dir():
            filepath = out / f"chat_eval_{timestamp}.json"
        else:
            filepath = out
        # Build serializable payload
        entries: list[dict] = []
        for r in results:
            entry: dict[str, Any] = {
                "query": r.test_case.query,
                "creator": r.test_case.creator,
                "personality_weight": r.test_case.personality_weight,
                "category": r.test_case.category,
                "description": r.test_case.description,
                "response_length": len(r.response),
                "source_count": len(r.sources),
                "cascade_tier": r.cascade_tier,
                "latency_seconds": r.latency_seconds,
            }
            if r.request_error:
                entry["error"] = r.request_error
            elif r.score:
                entry["scores"] = r.score.scores
                entry["composite"] = r.score.composite
                entry["justifications"] = r.score.justifications
                entry["scoring_time"] = r.score.elapsed_seconds
                if r.score.error:
                    entry["scoring_error"] = r.score.error
            entries.append(entry)
        # Summary stats
        scored = [e for e in entries if "composite" in e]
        avg_composite = (
            sum(e["composite"] for e in scored) / len(scored) if scored else 0.0
        )
        dim_avgs: dict[str, float] = {}
        for dim in CHAT_DIMENSIONS:
            vals = [e["scores"][dim] for e in scored if dim in e.get("scores", {})]
            dim_avgs[dim] = round(sum(vals) / len(vals), 3) if vals else 0.0
        payload = {
            "timestamp": timestamp,
            "total_queries": len(results),
            "scored_queries": len(scored),
            "errors": len(results) - len(scored),
            "average_composite": round(avg_composite, 3),
            "dimension_averages": dim_avgs,
            "results": entries,
        }
        filepath.write_text(json.dumps(payload, indent=2), encoding="utf-8")
        return str(filepath)
    @staticmethod
    def print_summary(results: list[ChatEvalResult]) -> None:
        """Print a summary table of evaluation results."""
        print("\n" + "=" * 72)
        print("  CHAT EVALUATION SUMMARY")
        print("=" * 72)
        scored = [r for r in results if r.score and not r.score.error and not r.request_error]
        errored = [r for r in results if r.request_error or (r.score and r.score.error)]
        if not scored:
            print("\n  No successfully scored responses.\n")
            if errored:
                print(f"  Errors: {len(errored)}")
                for r in errored:
                    err = r.request_error or (r.score.error if r.score else "unknown")
                    print(f"    - {r.test_case.query[:50]}: {err}")
            print("=" * 72 + "\n")
            return
        # Header
        print(f"\n  {'Category':<12s} {'Query':<30s} {'Comp':>5s} {'Cite':>5s} {'Struct':>6s} {'Domain':>6s} {'Ground':>6s} {'Person':>6s}")
        print(f"  {'─'*12} {'─'*30} {'─'*5} {'─'*5} {'─'*6} {'─'*6} {'─'*6} {'─'*6}")
        for r in scored:
            s = r.score
            assert s is not None
            q = r.test_case.query[:30]
            cat = r.test_case.category[:12]
            print(
                f"  {cat:<12s} {q:<30s} "
                f"{s.composite:5.2f} "
                f"{s.citation_accuracy:5.2f} "
                f"{s.response_structure:6.2f} "
                f"{s.domain_expertise:6.2f} "
                f"{s.source_grounding:6.2f} "
                f"{s.personality_fidelity:6.2f}"
            )
        # Averages
        avg_comp = sum(r.score.composite for r in scored) / len(scored)
        avg_dims = {}
        for dim in CHAT_DIMENSIONS:
            vals = [r.score.scores.get(dim, 0.0) for r in scored]
            avg_dims[dim] = sum(vals) / len(vals)
        print(f"\n  {'AVERAGE':<12s} {'':30s} "
              f"{avg_comp:5.2f} "
              f"{avg_dims['citation_accuracy']:5.2f} "
              f"{avg_dims['response_structure']:6.2f} "
              f"{avg_dims['domain_expertise']:6.2f} "
              f"{avg_dims['source_grounding']:6.2f} "
              f"{avg_dims['personality_fidelity']:6.2f}")
        if errored:
            print(f"\n  Errors: {len(errored)}")
            for r in errored:
                err = r.request_error or (r.score.error if r.score else "unknown")
                print(f"    - {r.test_case.query[:50]}: {err}")
        print("=" * 72 + "\n")
--- a/backend/pipeline/quality/chat_scorer.py
+++ b/backend/pipeline/quality/chat_scorer.py
@ -0,0 +1,271 @@
 """Chat-specific quality scorer — LLM-as-judge evaluation for chat responses.
 Scores chat responses across 5 dimensions:
 - citation_accuracy: Are citations real and correctly numbered?
 - response_structure: Concise, well-organized, uses appropriate formatting?
 - domain_expertise: Music production terminology used naturally?
 - source_grounding: Claims backed by provided sources, no fabrication?
 - personality_fidelity: At weight>0, response reflects creator voice?
 Run via: python -m pipeline.quality chat_eval --suite <path>
 """
 from __future__ import annotations
 import json
 import logging
 import time
 from dataclasses import dataclass, field
 import openai
 from pipeline.llm_client import LLMClient
 logger = logging.getLogger(__name__)
 CHAT_DIMENSIONS = [
    "citation_accuracy",
    "response_structure",
    "domain_expertise",
    "source_grounding",
    "personality_fidelity",
 ]
 CHAT_RUBRIC = """\
 You are an expert evaluator of AI chat response quality for a music production knowledge base.
 You will be given:
 1. The user's query
 2. The assistant's response
 3. The numbered source citations that were provided to the assistant
 4. The personality_weight (0.0 = encyclopedic, >0 = creator voice expected)
 5. The creator_name (if any)
 Evaluate the response across these 5 dimensions, scoring each 0.0 to 1.0:
 **citation_accuracy** — Citations are real, correctly numbered, and point to relevant sources
 - 0.9-1.0: Every [N] citation references a real source number, citations are placed next to the claim they support, no phantom citations
 - 0.5-0.7: Most citations are valid but some are misplaced or reference non-existent source numbers
 - 0.0-0.3: Many phantom citations, wrong numbers, or citations placed randomly without connection to claims
 **response_structure** — Response is concise, well-organized, uses appropriate formatting
 - 0.9-1.0: Clear paragraphs, uses bullet lists for steps/lists, bold for key terms, appropriate length (not padded)
 - 0.5-0.7: Readable but could be better organized — wall of text, missing formatting where it would help
 - 0.0-0.3: Disorganized, excessively long or too terse, no formatting, hard to scan
 **domain_expertise** — Music production terminology used naturally and correctly
 - 0.9-1.0: Uses correct audio/synth/mixing terminology, explains technical terms when appropriate, sounds like a knowledgeable producer
 - 0.5-0.7: Generally correct but some terminology is vague ("adjust the sound" vs "shape the transient") or misused
 - 0.0-0.3: Generic language, avoids domain terminology, or uses terms incorrectly
 **source_grounding** — Claims are backed by provided sources, no fabrication
 - 0.9-1.0: Every factual claim traces to a provided source, no invented details (plugin names, settings, frequencies not in sources)
 - 0.5-0.7: Mostly grounded but 1-2 claims seem embellished or not directly from sources
 - 0.0-0.3: Contains hallucinated specifics — settings, plugin names, or techniques not present in any source
 **personality_fidelity** — When personality_weight > 0, response reflects the creator's voice proportional to the weight
 - If personality_weight == 0: Score based on neutral encyclopedic tone (should NOT show personality). Neutral, informative = 1.0. Forced personality = 0.5.
 - If personality_weight > 0 and personality_weight < 0.5: Subtle personality hints expected. Score higher if tone is lightly flavored but still mainly encyclopedic.
 - If personality_weight >= 0.5: Clear creator voice expected. Score higher for signature phrases, teaching style, energy matching the named creator.
 - If no creator_name is provided: Score 1.0 if response is neutral/encyclopedic, lower if it adopts an unexplained persona.
 Return ONLY a JSON object with this exact structure:
 {
  "citation_accuracy": <float 0.0-1.0>,
  "response_structure": <float 0.0-1.0>,
  "domain_expertise": <float 0.0-1.0>,
  "source_grounding": <float 0.0-1.0>,
  "personality_fidelity": <float 0.0-1.0>,
  "justifications": {
    "citation_accuracy": "<1-2 sentence justification>",
    "response_structure": "<1-2 sentence justification>",
    "domain_expertise": "<1-2 sentence justification>",
    "source_grounding": "<1-2 sentence justification>",
    "personality_fidelity": "<1-2 sentence justification>"
  }
 }
 """
@dataclass
 class ChatScoreResult:
    """Outcome of scoring a chat response across quality dimensions."""
    scores: dict[str, float] = field(default_factory=dict)
    composite: float = 0.0
    justifications: dict[str, str] = field(default_factory=dict)
    elapsed_seconds: float = 0.0
    error: str | None = None
    # Convenience properties
    @property
    def citation_accuracy(self) -> float:
        return self.scores.get("citation_accuracy", 0.0)
    @property
    def response_structure(self) -> float:
        return self.scores.get("response_structure", 0.0)
    @property
    def domain_expertise(self) -> float:
        return self.scores.get("domain_expertise", 0.0)
    @property
    def source_grounding(self) -> float:
        return self.scores.get("source_grounding", 0.0)
    @property
    def personality_fidelity(self) -> float:
        return self.scores.get("personality_fidelity", 0.0)
 class ChatScoreRunner:
    """Scores chat responses using LLM-as-judge evaluation."""
    def __init__(self, client: LLMClient) -> None:
        self.client = client
    def score_response(
        self,
        query: str,
        response: str,
        sources: list[dict],
        personality_weight: float = 0.0,
        creator_name: str | None = None,
    ) -> ChatScoreResult:
        """Score a single chat response against the 5 chat quality dimensions.
        Parameters
        ----------
        query:
            The user's original query.
        response:
            The assistant's accumulated response text.
        sources:
            List of source citation dicts (as emitted by the SSE sources event).
        personality_weight:
            0.0 = encyclopedic mode, >0 = personality mode.
        creator_name:
            Creator name, if this was a creator-scoped query.
        Returns
        -------
        ChatScoreResult with per-dimension scores.
        """
        sources_block = json.dumps(sources, indent=2) if sources else "(no sources)"
        user_prompt = (
            f"## User Query\n\n{query}\n\n"
            f"## Assistant Response\n\n{response}\n\n"
            f"## Sources Provided\n\n```json\n{sources_block}\n```\n\n"
            f"## Metadata\n\n"
            f"- personality_weight: {personality_weight}\n"
            f"- creator_name: {creator_name or '(none)'}\n\n"
            f"Score this chat response across all 5 dimensions."
        )
        t0 = time.monotonic()
        try:
            from pydantic import BaseModel as _BM
            resp = self.client.complete(
                system_prompt=CHAT_RUBRIC,
                user_prompt=user_prompt,
                response_model=_BM,
                modality="chat",
            )
            elapsed = round(time.monotonic() - t0, 2)
        except (openai.APIConnectionError, openai.APITimeoutError) as exc:
            elapsed = round(time.monotonic() - t0, 2)
            return ChatScoreResult(
                elapsed_seconds=elapsed,
                error=f"Cannot reach LLM judge. Error: {exc}",
            )
        raw_text = str(resp).strip()
        try:
            parsed = json.loads(raw_text)
        except json.JSONDecodeError:
            logger.error("Malformed chat judge response (not JSON): %.300s", raw_text)
            return ChatScoreResult(
                elapsed_seconds=elapsed,
                error=f"Malformed judge response. Raw excerpt: {raw_text[:200]}",
            )
        return self._parse_scores(parsed, elapsed)
    def _parse_scores(self, parsed: dict, elapsed: float) -> ChatScoreResult:
        """Extract and validate scores from parsed JSON judge response."""
        scores: dict[str, float] = {}
        justifications: dict[str, str] = {}
        raw_justifications = parsed.get("justifications", {})
        if not isinstance(raw_justifications, dict):
            raw_justifications = {}
        for dim in CHAT_DIMENSIONS:
            raw = parsed.get(dim)
            if raw is None:
                logger.warning("Missing dimension '%s' in chat judge response", dim)
                scores[dim] = 0.0
                justifications[dim] = "(missing from judge response)"
                continue
            try:
                val = float(raw)
                scores[dim] = max(0.0, min(1.0, val))
            except (TypeError, ValueError):
                logger.warning("Invalid value for '%s': %r", dim, raw)
                scores[dim] = 0.0
                justifications[dim] = f"(invalid value: {raw!r})"
                continue
            justifications[dim] = str(raw_justifications.get(dim, ""))
        composite = sum(scores.values()) / len(CHAT_DIMENSIONS) if CHAT_DIMENSIONS else 0.0
        return ChatScoreResult(
            scores=scores,
            composite=round(composite, 3),
            justifications=justifications,
            elapsed_seconds=elapsed,
        )
    def print_report(self, result: ChatScoreResult, query: str = "") -> None:
        """Print a formatted chat scoring report to stdout."""
        print("\n" + "=" * 60)
        print("  CHAT QUALITY SCORE REPORT")
        if query:
            print(f"  Query: {query[:60]}{'...' if len(query) > 60 else ''}")
        print("=" * 60)
        if result.error:
            print(f"\n  ✗ Error: {result.error}\n")
            print("=" * 60 + "\n")
            return
        for dim in CHAT_DIMENSIONS:
            score = result.scores.get(dim, 0.0)
            filled = int(score * 20)
            bar = "█" * filled + "░" * (20 - filled)
            justification = result.justifications.get(dim, "")
            print(f"\n  {dim.replace('_', ' ').title()}")
            print(f"    Score: {score:.2f}  {bar}")
            if justification:
                # Simple word wrap at ~56 chars
                words = justification.split()
                lines: list[str] = []
                current = ""
                for word in words:
                    if current and len(current) + len(word) + 1 > 56:
                        lines.append(current)
                        current = word
                    else:
                        current = f"{current} {word}" if current else word
                if current:
                    lines.append(current)
                for line in lines:
                    print(f"    {line}")
        print("\n" + "-" * 60)
        print(f"  Composite: {result.composite:.3f}")
        print(f"  Time: {result.elapsed_seconds}s")
        print("=" * 60 + "\n")
--- a/backend/pipeline/quality/fitness.py
+++ b/backend/pipeline/quality/fitness.py
@ -0,0 +1,489 @@
 """FYN-LLM fitness test runner.
 Tests four categories:
 1. Mandelbrot reasoning — factual knowledge / reasoning depth
 2. JSON compliance — simple and nested structured output
 3. Instruction following — bullet count, keyword inclusion, casing
 4. Diverse prompt battery — summarization, classification, extraction
 """
 from __future__ import annotations
 import json
 import logging
 import time
 from dataclasses import dataclass, field
 import openai
 from pydantic import BaseModel
 from pipeline.llm_client import LLMClient
 logger = logging.getLogger(__name__)
 # ── Result types ─────────────────────────────────────────────────────────────
@dataclass
 class TestResult:
    """Outcome of a single fitness test."""
    name: str
    passed: bool
    elapsed_seconds: float
    token_count: int | None = None
    detail: str = ""
@dataclass
 class CategoryReport:
    """Results for one test category."""
    category: str
    results: list[TestResult] = field(default_factory=list)
    @property
    def all_passed(self) -> bool:
        return all(r.passed for r in self.results)
 # ── Pydantic models for JSON compliance tests ────────────────────────────────
 class SimpleItem(BaseModel):
    name: str
    count: int
 class Address(BaseModel):
    street: str
    city: str
    zip_code: str
 class PersonWithAddress(BaseModel):
    name: str
    age: int
    address: Address
 # ── Runner ───────────────────────────────────────────────────────────────────
 class FitnessRunner:
    """Runs all fitness tests against the configured LLM endpoint."""
    def __init__(self, client: LLMClient) -> None:
        self.client = client
    # ── Public entry point ───────────────────────────────────────────────
    def run_all(self) -> int:
        """Run all fitness tests, print report, return exit code (0=pass, 1=fail)."""
        categories: list[CategoryReport] = []
        # Connectivity pre-check — fail fast with a clear message
        try:
            self._probe_connectivity()
        except (openai.APIConnectionError, openai.APITimeoutError) as exc:
            url = self.client.settings.llm_api_url
            fallback = self.client.settings.llm_fallback_url
            print(
                f"\n✗ Cannot reach LLM endpoint at {url} (fallback {fallback})\n"
                f"  Error: {exc}\n"
            )
            return 1
        categories.append(self._run_mandelbrot())
        categories.append(self._run_json_compliance())
        categories.append(self._run_instruction_following())
        categories.append(self._run_diverse_battery())
        self._print_report(categories)
        return 0 if all(c.all_passed for c in categories) else 1
    # ── Connectivity probe ───────────────────────────────────────────────
    def _probe_connectivity(self) -> None:
        """Quick completion to verify the endpoint is reachable."""
        self.client.complete(
            system_prompt="You are a test probe.",
            user_prompt="Respond with the single word: ok",
        )
    # ── Category 1: Mandelbrot reasoning ─────────────────────────────────
    def _run_mandelbrot(self) -> CategoryReport:
        cat = CategoryReport(category="Mandelbrot Reasoning")
        cat.results.append(self._test_mandelbrot())
        return cat
    def _test_mandelbrot(self) -> TestResult:
        name = "mandelbrot_area_knowledge"
        t0 = time.monotonic()
        try:
            resp = self.client.complete(
                system_prompt="You are a mathematics expert. Answer precisely and concisely.",
                user_prompt=(
                    "What is the approximate area of the Mandelbrot set? "
                    "Include the numerical value and mention whether the exact area is known."
                ),
                modality="thinking",
            )
            elapsed = time.monotonic() - t0
            text = resp.lower()
            # Check for key concepts
            has_area = any(kw in text for kw in ["1.506", "1.507", "1.50659"])
            has_uncertainty = any(
                kw in text
                for kw in ["not exactly known", "not known exactly", "approximate", "estimated", "conjecture"]
            )
            passed = has_area and has_uncertainty
            detail = "" if passed else f"Missing: area={has_area}, uncertainty={has_uncertainty}. Response: {resp[:200]}"
            return TestResult(
                name=name,
                passed=passed,
                elapsed_seconds=round(elapsed, 2),
                token_count=resp.completion_tokens,
                detail=detail,
            )
        except Exception as exc:
            return TestResult(
                name=name,
                passed=False,
                elapsed_seconds=round(time.monotonic() - t0, 2),
                detail=f"Exception: {exc}",
            )
    # ── Category 2: JSON compliance ──────────────────────────────────────
    def _run_json_compliance(self) -> CategoryReport:
        cat = CategoryReport(category="JSON Compliance")
        cat.results.append(self._test_json_simple())
        cat.results.append(self._test_json_nested())
        return cat
    def _test_json_simple(self) -> TestResult:
        name = "json_simple_object"
        t0 = time.monotonic()
        try:
            resp = self.client.complete(
                system_prompt="You are a JSON generator. Output ONLY valid JSON, nothing else.",
                user_prompt=(
                    'Generate a JSON object with exactly two keys: "name" (a string) '
                    'and "count" (an integer). Example structure: {"name": "...", "count": N}'
                ),
                response_model=SimpleItem,
                modality="chat",
            )
            elapsed = time.monotonic() - t0
            return self._validate_json(name, resp, SimpleItem, elapsed)
        except Exception as exc:
            return TestResult(
                name=name,
                passed=False,
                elapsed_seconds=round(time.monotonic() - t0, 2),
                detail=f"Exception: {exc}",
            )
    def _test_json_nested(self) -> TestResult:
        name = "json_nested_object"
        t0 = time.monotonic()
        try:
            resp = self.client.complete(
                system_prompt="You are a JSON generator. Output ONLY valid JSON, nothing else.",
                user_prompt=(
                    'Generate a JSON object with keys "name" (string), "age" (integer), '
                    'and "address" (object with "street", "city", "zip_code" string fields).'
                ),
                response_model=PersonWithAddress,
                modality="chat",
            )
            elapsed = time.monotonic() - t0
            return self._validate_json(name, resp, PersonWithAddress, elapsed)
        except Exception as exc:
            return TestResult(
                name=name,
                passed=False,
                elapsed_seconds=round(time.monotonic() - t0, 2),
                detail=f"Exception: {exc}",
            )
    def _validate_json(
        self,
        name: str,
        resp: str,
        model: type[BaseModel],
        elapsed: float,
    ) -> TestResult:
        """Parse response as JSON, validate against Pydantic model."""
        text = str(resp).strip()
        if not text:
            return TestResult(
                name=name, passed=False, elapsed_seconds=round(elapsed, 2),
                token_count=getattr(resp, "completion_tokens", None),
                detail="Empty response from LLM",
            )
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError as exc:
            return TestResult(
                name=name, passed=False, elapsed_seconds=round(elapsed, 2),
                token_count=getattr(resp, "completion_tokens", None),
                detail=f"Invalid JSON: {exc}. Raw: {text[:200]}",
            )
        try:
            model.model_validate(parsed)
        except Exception as exc:
            return TestResult(
                name=name, passed=False, elapsed_seconds=round(elapsed, 2),
                token_count=getattr(resp, "completion_tokens", None),
                detail=f"Schema validation failed: {exc}",
            )
        return TestResult(
            name=name, passed=True, elapsed_seconds=round(elapsed, 2),
            token_count=getattr(resp, "completion_tokens", None),
        )
    # ── Category 3: Instruction following ────────────────────────────────
    def _run_instruction_following(self) -> CategoryReport:
        cat = CategoryReport(category="Instruction Following")
        cat.results.append(self._test_bullet_count())
        cat.results.append(self._test_keyword_inclusion())
        cat.results.append(self._test_lowercase_only())
        return cat
    def _test_bullet_count(self) -> TestResult:
        name = "instruction_bullet_count"
        t0 = time.monotonic()
        try:
            resp = self.client.complete(
                system_prompt="Follow instructions exactly.",
                user_prompt="List exactly 3 benefits of exercise. Use bullet points starting with '- '.",
            )
            elapsed = time.monotonic() - t0
            lines = [l.strip() for l in str(resp).strip().splitlines() if l.strip().startswith("- ")]
            passed = len(lines) == 3
            detail = "" if passed else f"Expected 3 bullets, got {len(lines)}: {str(resp)[:200]}"
            return TestResult(
                name=name, passed=passed, elapsed_seconds=round(elapsed, 2),
                token_count=resp.completion_tokens,
                detail=detail,
            )
        except Exception as exc:
            return TestResult(
                name=name, passed=False,
                elapsed_seconds=round(time.monotonic() - t0, 2),
                detail=f"Exception: {exc}",
            )
    def _test_keyword_inclusion(self) -> TestResult:
        name = "instruction_keyword_inclusion"
        t0 = time.monotonic()
        try:
            resp = self.client.complete(
                system_prompt="Follow instructions exactly.",
                user_prompt=(
                    "Write one sentence about the weather. "
                    'You MUST include the word "elephant" somewhere in your sentence.'
                ),
            )
            elapsed = time.monotonic() - t0
            passed = "elephant" in str(resp).lower()
            detail = "" if passed else f"Missing keyword 'elephant'. Response: {str(resp)[:200]}"
            return TestResult(
                name=name, passed=passed, elapsed_seconds=round(elapsed, 2),
                token_count=resp.completion_tokens,
                detail=detail,
            )
        except Exception as exc:
            return TestResult(
                name=name, passed=False,
                elapsed_seconds=round(time.monotonic() - t0, 2),
                detail=f"Exception: {exc}",
            )
    def _test_lowercase_only(self) -> TestResult:
        name = "instruction_lowercase_only"
        t0 = time.monotonic()
        try:
            resp = self.client.complete(
                system_prompt="Follow instructions exactly.",
                user_prompt=(
                    "Write a short sentence about the ocean. "
                    "Use ONLY lowercase letters — no uppercase at all, not even at the start."
                ),
            )
            elapsed = time.monotonic() - t0
            text = str(resp).strip()
            # Allow non-alpha chars (punctuation, spaces, numbers) but no uppercase letters
            has_upper = any(c.isupper() for c in text)
            passed = not has_upper and len(text) > 5
            detail = "" if passed else f"Contains uppercase or too short. Response: {text[:200]}"
            return TestResult(
                name=name, passed=passed, elapsed_seconds=round(elapsed, 2),
                token_count=resp.completion_tokens,
                detail=detail,
            )
        except Exception as exc:
            return TestResult(
                name=name, passed=False,
                elapsed_seconds=round(time.monotonic() - t0, 2),
                detail=f"Exception: {exc}",
            )
    # ── Category 4: Diverse prompt battery ───────────────────────────────
    def _run_diverse_battery(self) -> CategoryReport:
        cat = CategoryReport(category="Diverse Prompt Battery")
        cat.results.append(self._test_summarization())
        cat.results.append(self._test_classification())
        cat.results.append(self._test_extraction())
        return cat
    def _test_summarization(self) -> TestResult:
        name = "battery_summarization"
        paragraph = (
            "The James Webb Space Telescope (JWST) is the largest optical telescope in space. "
            "Launched in December 2021, it is designed to conduct infrared astronomy. Its high "
            "resolution and sensitivity allow it to view objects too old and distant for the Hubble "
            "Space Telescope. Among its goals are observing the first stars and the formation of "
            "the first galaxies, and detailed atmospheric characterization of exoplanets."
        )
        t0 = time.monotonic()
        try:
            resp = self.client.complete(
                system_prompt="You are a concise summarizer.",
                user_prompt=f"Summarize the following in exactly 2 sentences:\n\n{paragraph}",
            )
            elapsed = time.monotonic() - t0
            text = str(resp).strip()
            # Rough sentence count: split on period followed by space or end
            sentences = [s.strip() for s in text.replace("! ", ". ").split(". ") if s.strip()]
            # Be generous: 1-3 sentences is acceptable
            passed = 1 <= len(sentences) <= 3 and len(text) > 20
            detail = "" if passed else f"Expected ~2 sentences, got {len(sentences)}. Response: {text[:200]}"
            return TestResult(
                name=name, passed=passed, elapsed_seconds=round(elapsed, 2),
                token_count=resp.completion_tokens,
                detail=detail,
            )
        except Exception as exc:
            return TestResult(
                name=name, passed=False,
                elapsed_seconds=round(time.monotonic() - t0, 2),
                detail=f"Exception: {exc}",
            )
    def _test_classification(self) -> TestResult:
        name = "battery_classification"
        categories = ["technology", "sports", "politics", "science", "entertainment"]
        t0 = time.monotonic()
        try:
            resp = self.client.complete(
                system_prompt=(
                    "You are a text classifier. Respond with ONLY one word from the given categories."
                ),
                user_prompt=(
                    f"Classify the following text into one of these categories: {', '.join(categories)}\n\n"
                    "Text: \"NASA's Perseverance rover has discovered organic molecules on Mars, "
                    "suggesting the planet may have once harbored microbial life.\"\n\n"
                    "Category:"
                ),
            )
            elapsed = time.monotonic() - t0
            answer = str(resp).strip().lower().rstrip(".")
            passed = answer in categories
            detail = "" if passed else f"Response '{answer}' not in {categories}"
            return TestResult(
                name=name, passed=passed, elapsed_seconds=round(elapsed, 2),
                token_count=resp.completion_tokens,
                detail=detail,
            )
        except Exception as exc:
            return TestResult(
                name=name, passed=False,
                elapsed_seconds=round(time.monotonic() - t0, 2),
                detail=f"Exception: {exc}",
            )
    def _test_extraction(self) -> TestResult:
        name = "battery_extraction"
        t0 = time.monotonic()
        try:
            resp = self.client.complete(
                system_prompt="You are a data extractor. Output ONLY valid JSON, nothing else.",
                user_prompt=(
                    "Extract the following fields as a JSON object: "
                    '"event_name", "date", "location"\n\n'
                    "Text: \"The annual Tech Summit 2026 will be held on March 15, 2026 "
                    'in San Francisco, California."\n\n'
                    "JSON:"
                ),
                response_model=BaseModel,  # triggers json mode
                modality="chat",
            )
            elapsed = time.monotonic() - t0
            text = str(resp).strip()
            if not text:
                return TestResult(
                    name=name, passed=False, elapsed_seconds=round(elapsed, 2),
                    token_count=getattr(resp, "completion_tokens", None),
                    detail="Empty response from LLM",
                )
            try:
                parsed = json.loads(text)
            except json.JSONDecodeError as exc:
                return TestResult(
                    name=name, passed=False, elapsed_seconds=round(elapsed, 2),
                    token_count=getattr(resp, "completion_tokens", None),
                    detail=f"Invalid JSON: {exc}. Raw: {text[:200]}",
                )
            required_keys = {"event_name", "date", "location"}
            present = set(parsed.keys()) & required_keys
            passed = present == required_keys
            detail = "" if passed else f"Missing keys: {required_keys - present}"
            return TestResult(
                name=name, passed=passed, elapsed_seconds=round(elapsed, 2),
                token_count=getattr(resp, "completion_tokens", None),
                detail=detail,
            )
        except Exception as exc:
            return TestResult(
                name=name, passed=False,
                elapsed_seconds=round(time.monotonic() - t0, 2),
                detail=f"Exception: {exc}",
            )
    # ── Report formatting ────────────────────────────────────────────────
    def _print_report(self, categories: list[CategoryReport]) -> None:
        """Print a formatted pass/fail report to stdout."""
        total = 0
        passed_count = 0
        print("\n" + "=" * 60)
        print("  FYN-LLM FITNESS REPORT")
        print("=" * 60)
        for cat in categories:
            status = "✓ PASS" if cat.all_passed else "✗ FAIL"
            print(f"\n  [{status}] {cat.category}")
            for r in cat.results:
                total += 1
                icon = "✓" if r.passed else "✗"
                tokens = f" ({r.token_count} tok)" if r.token_count else ""
                print(f"    {icon} {r.name}  [{r.elapsed_seconds}s{tokens}]")
                if r.detail:
                    # Indent detail lines
                    for line in r.detail.splitlines():
                        print(f"      {line}")
                if r.passed:
                    passed_count += 1
        print("\n" + "-" * 60)
        print(f"  Total: {passed_count}/{total} passed")
        if passed_count == total:
            print("  Result: ✓ ALL PASS")
        else:
            print(f"  Result: ✗ {total - passed_count} FAILED")
        print("=" * 60 + "\n")
--- a/backend/pipeline/quality/fixtures/init.py
+++ b/backend/pipeline/quality/fixtures/init.py
--- a/backend/pipeline/quality/fixtures/chat_test_suite.yaml
+++ b/backend/pipeline/quality/fixtures/chat_test_suite.yaml
@ -0,0 +1,72 @@
 # Chat quality evaluation test suite
 # 10 representative queries across 4 categories:
 #   - technical: How-to questions about specific production techniques
 #   - conceptual: Broader understanding questions about audio concepts
 #   - creator: Creator-scoped queries at different personality weights
 #   - cross_creator: Queries spanning multiple creators' approaches
 queries:
  # ── Technical how-to (2) ────────────────────────────────────────────
  - query: "How do I set up sidechain compression on a bass synth using a kick drum as the trigger?"
    creator: null
    personality_weight: 0.0
    category: technical
    description: "Common sidechain compression setup — expects specific settings (ratio, attack, release)"
  - query: "What are the best EQ settings for cleaning up a muddy vocal recording?"
    creator: null
    personality_weight: 0.0
    category: technical
    description: "Vocal EQ technique — expects frequency ranges, Q values, cut/boost guidance"
  # ── Conceptual (2) ─────────────────────────────────────────────────
  - query: "What is the difference between parallel compression and serial compression, and when should I use each?"
    creator: null
    personality_weight: 0.0
    category: conceptual
    description: "Conceptual comparison — expects clear definitions, use cases, pros/cons"
  - query: "How does sample rate affect sound quality in music production?"
    creator: null
    personality_weight: 0.0
    category: conceptual
    description: "Audio fundamentals — expects Nyquist, aliasing, practical guidance"
  # ── Creator-specific: encyclopedic (2) ──────────────────────────────
  - query: "How does this creator approach sound design for bass sounds?"
    creator: "KEOTA"
    personality_weight: 0.0
    category: creator_encyclopedic
    description: "Creator-scoped query at weight=0 — should be neutral/encyclopedic about KEOTA's techniques"
  - query: "What mixing techniques does this creator recommend for achieving width in a mix?"
    creator: "Mr. Bill"
    personality_weight: 0.0
    category: creator_encyclopedic
    description: "Creator-scoped query at weight=0 — neutral tone about Mr. Bill's approach"
  # ── Creator-specific: personality (2) ───────────────────────────────
  - query: "How does this creator approach sound design for bass sounds?"
    creator: "KEOTA"
    personality_weight: 0.7
    category: creator_personality
    description: "Same query as above but at weight=0.7 — should reflect KEOTA's voice and teaching style"
  - query: "What mixing techniques does this creator recommend for achieving width in a mix?"
    creator: "Mr. Bill"
    personality_weight: 0.7
    category: creator_personality
    description: "Same query as above but at weight=0.7 — should reflect Mr. Bill's voice"
  # ── Cross-creator (2) ──────────────────────────────────────────────
  - query: "What are the different approaches to layering synth sounds across creators?"
    creator: null
    personality_weight: 0.0
    category: cross_creator
    description: "Cross-creator comparison — should cite multiple creators' techniques"
  - query: "How do different producers approach drum processing and what plugins do they prefer?"
    creator: null
    personality_weight: 0.0
    category: cross_creator
    description: "Cross-creator comparison on drums — expects multiple perspectives with citations"
--- a/backend/pipeline/quality/fixtures/sample_classifications.json
+++ b/backend/pipeline/quality/fixtures/sample_classifications.json
@ -0,0 +1,29 @@
 {
  "extracted_moments": [
    {
      "title": "Frequency-specific sidechain with Trackspacer",
      "summary": "Using Trackspacer plugin for frequency-band sidechain compression targeting 100-300Hz, allowing bass high-end to remain present while clearing low-mid mud under the kick.",
      "content_type": "technique",
      "plugins": ["Trackspacer"],
      "start_time": 15.2,
      "end_time": 52.1
    },
    {
      "title": "Parallel drum compression chain",
      "summary": "Setting up Ableton's Drum Buss at 40% drive into a return track with Valhalla Room at 1.2s decay, mixed at -12dB for room sound without wash.",
      "content_type": "settings",
      "plugins": ["Drum Buss", "Valhalla Room"],
      "start_time": 52.1,
      "end_time": 89.3
    },
    {
      "title": "Mono compatibility checking workflow",
      "summary": "Using Ableton's Utility plugin on the sub bus to constantly check mono compatibility of layered bass patches, catching phase cancellation before mixdown.",
      "content_type": "workflow",
      "plugins": ["Utility"],
      "start_time": 89.3,
      "end_time": 110.0
    }
  ],
  "taxonomy": "Sound Design > Mixing & Processing"
 }
--- a/backend/pipeline/quality/fixtures/sample_moments.json
+++ b/backend/pipeline/quality/fixtures/sample_moments.json
@ -0,0 +1,54 @@
 {
  "creator_name": "KOAN Sound",
  "topic_category": "Sound design",
  "moments": [
    {
      "summary": "Layering snare transients by combining a high-frequency click from a Popcorn Snare with a mid-body from a pitched-down 808 rim shot, blending at -6dB relative offset.",
      "transcript_excerpt": "So what I'll do is take the Popcorn Snare — that's got this really sharp click at like 4k — and then I layer underneath it a rim shot pitched down maybe 3 semitones. You blend those together and suddenly you've got this snare that cuts through everything but still has weight.",
      "topic_tags": ["snare layering", "transient design", "sample stacking"],
      "topic_category": "Sound design",
      "start_time": 124.5,
      "end_time": 158.2
    },
    {
      "summary": "Using Serum's noise oscillator with the 'Analog_Crackle' wavetable at 12% mix to add organic texture to bass patches, followed by OTT at 30% depth for glue.",
      "transcript_excerpt": "One trick I always come back to is Serum's noise osc with Analog_Crackle. You don't want it loud — like 12 percent mix — just enough that the bass feels alive. Then slap OTT on there at maybe 30 percent depth and it glues the whole thing together without squashing it.",
      "topic_tags": ["bass design", "Serum", "OTT", "texture"],
      "topic_category": "Sound design",
      "start_time": 203.1,
      "end_time": 241.7
    },
    {
      "summary": "Resampling technique: bounce a bass patch to audio, chop the best 2 bars, then re-pitch in Simpler with warp off for tighter timing and consistent tone.",
      "transcript_excerpt": "I'll resample everything. Bounce it down, find the two bars that sound best, throw it in Simpler with warp completely off. Now you've got this tight, consistent thing where every hit is exactly the same energy. The pitch tracking is way more predictable too.",
      "topic_tags": ["resampling", "Ableton", "Simpler", "bass production"],
      "topic_category": "Sound design",
      "start_time": 312.0,
      "end_time": 349.8
    },
    {
      "summary": "Parallel compression chain for drums using Ableton's Drum Buss at 40% drive into a return track with Valhalla Room at 1.2s decay, mixed at -12dB.",
      "transcript_excerpt": "The parallel chain is dead simple — Drum Buss, crank the drive to about 40 percent, send that to a return with Valhalla Room. Keep the decay short, like 1.2 seconds. Mix it in at minus 12 and your drums just... breathe. They've got this room sound without getting washy.",
      "topic_tags": ["parallel compression", "drum processing", "Valhalla Room", "Drum Buss"],
      "topic_category": "Sound design",
      "start_time": 421.3,
      "end_time": 462.1
    },
    {
      "summary": "Frequency-specific sidechain using Trackspacer plugin instead of volume ducking, targeting only 100-300Hz so the bass ducks under the kick without losing high-end presence.",
      "transcript_excerpt": "Everyone does volume sidechain but honestly Trackspacer changed everything for me. You set it to only affect 100 to 300 Hz so when the kick hits, the bass ducks just in that low-mid range. The top end of the bass stays right there — you keep all the character and harmonics, you just clear the mud.",
      "topic_tags": ["sidechaining", "Trackspacer", "frequency ducking", "mixing"],
      "topic_category": "Sound design",
      "start_time": 498.7,
      "end_time": 534.2
    },
    {
      "summary": "Using Ableton's Utility plugin to check mono compatibility at every stage, specifically toggling mono on the sub bus to catch phase cancellation from layered bass patches.",
      "transcript_excerpt": "I'm almost paranoid about mono. I've got Utility on the sub bus and I'm flipping to mono constantly. If your layered bass sounds thin in mono you've got phase issues — doesn't matter how fat it sounds in stereo, it'll collapse on a club system.",
      "topic_tags": ["mono compatibility", "phase checking", "club mixing", "Utility"],
      "topic_category": "Sound design",
      "start_time": 567.0,
      "end_time": 598.4
    }
  ]
 }
--- a/backend/pipeline/quality/fixtures/sample_segments.json
+++ b/backend/pipeline/quality/fixtures/sample_segments.json
@ -0,0 +1,40 @@
 {
  "transcript_segments": [
    {
      "index": 0,
      "start_time": 0.0,
      "end_time": 15.2,
      "text": "Hey everyone, today we're going to talk about sidechain compression and how I use it in my productions."
    },
    {
      "index": 1,
      "start_time": 15.2,
      "end_time": 34.8,
      "text": "So the basic idea is you take the kick drum signal and use it to duck the bass. Most people use a compressor for this but I actually prefer Trackspacer because it gives you frequency-specific ducking."
    },
    {
      "index": 2,
      "start_time": 34.8,
      "end_time": 52.1,
      "text": "With Trackspacer you can set it to only affect 100 to 300 Hz so when the kick hits, the bass ducks just in that low-mid range. The top end stays right there."
    },
    {
      "index": 3,
      "start_time": 52.1,
      "end_time": 71.5,
      "text": "Now let me show you another technique — parallel compression on drums. I use Drum Buss with the drive at about 40 percent, then send that to a return track."
    },
    {
      "index": 4,
      "start_time": 71.5,
      "end_time": 89.3,
      "text": "On the return I put Valhalla Room with a short decay, like 1.2 seconds. Mix it in at minus 12 dB. Your drums just breathe — they get this room sound without getting washy."
    },
    {
      "index": 5,
      "start_time": 89.3,
      "end_time": 110.0,
      "text": "One more thing about mono compatibility. I always have Utility on the sub bus and I flip to mono constantly. If your layered bass sounds thin in mono you've got phase issues."
    }
  ]
 }
--- a/backend/pipeline/quality/fixtures/sample_topic_group.json
+++ b/backend/pipeline/quality/fixtures/sample_topic_group.json
@ -0,0 +1,18 @@
 {
  "topic_segments": [
    {
      "start_index": 0,
      "end_index": 2,
      "topic_label": "Frequency-specific sidechain compression with Trackspacer",
      "summary": "Using Trackspacer for frequency-band sidechain ducking instead of traditional volume compression",
      "transcript_text": "Hey everyone, today we're going to talk about sidechain compression and how I use it in my productions. So the basic idea is you take the kick drum signal and use it to duck the bass. Most people use a compressor for this but I actually prefer Trackspacer because it gives you frequency-specific ducking. With Trackspacer you can set it to only affect 100 to 300 Hz so when the kick hits, the bass ducks just in that low-mid range. The top end stays right there."
    },
    {
      "start_index": 3,
      "end_index": 4,
      "topic_label": "Parallel drum compression with Drum Buss and Valhalla Room",
      "summary": "Setting up a parallel compression chain using Ableton's Drum Buss and Valhalla Room reverb for drum processing",
      "transcript_text": "Now let me show you another technique — parallel compression on drums. I use Drum Buss with the drive at about 40 percent, then send that to a return track. On the return I put Valhalla Room with a short decay, like 1.2 seconds. Mix it in at minus 12 dB. Your drums just breathe — they get this room sound without getting washy."
    }
  ]
 }
--- a/backend/pipeline/quality/optimizer.py
+++ b/backend/pipeline/quality/optimizer.py
@ -0,0 +1,522 @@
 """Automated prompt optimization loop for pipeline stages 2-5.
 Orchestrates a generate→score→select cycle:
 1. Score the current best prompt against reference fixtures
 2. Generate N variants targeting weak dimensions
 3. Score each variant
 4. Keep the best scorer as the new baseline
 5. Repeat for K iterations
 Usage (via CLI):
    python -m pipeline.quality optimize --stage 5 --iterations 10
    python -m pipeline.quality optimize --stage 3 --iterations 5 --file fixtures/sample_topic_group.json
 """
 from __future__ import annotations
 import json
 import logging
 import time
 from dataclasses import dataclass, field
 from datetime import datetime, timezone
 from pathlib import Path
 from pipeline.llm_client import LLMClient
 from pipeline.quality.scorer import STAGE_CONFIGS, ScoreResult, ScoreRunner
 logger = logging.getLogger(__name__)
@dataclass
 class OptimizationResult:
    """Full result of an optimization run."""
    best_prompt: str = ""
    best_score: ScoreResult = field(default_factory=ScoreResult)
    history: list[dict] = field(default_factory=list)
    elapsed_seconds: float = 0.0
 class OptimizationLoop:
    """Runs iterative prompt optimization for a pipeline stage.
    Each iteration generates *variants_per_iter* prompt mutations,
    scores each against reference fixture data, and keeps the
    highest-composite-scoring variant as the new baseline.
    Parameters
    ----------
    client:
        LLMClient instance for LLM calls (synthesis + scoring + variant gen).
    stage:
        Pipeline stage number (2-5).
    fixture_path:
        Path to a JSON fixture file matching the stage's expected keys.
    iterations:
        Number of generate→score→select cycles.
    variants_per_iter:
        Number of variant prompts to generate per iteration.
    """
    def __init__(
        self,
        client: LLMClient,
        stage: int,
        fixture_path: str,
        iterations: int = 5,
        variants_per_iter: int = 2,
        output_dir: str | None = None,
    ) -> None:
        if stage not in STAGE_CONFIGS:
            raise ValueError(
                f"Unsupported stage {stage}. Valid stages: {sorted(STAGE_CONFIGS)}"
            )
        self.client = client
        self.stage = stage
        self.fixture_path = fixture_path
        self.iterations = iterations
        self.variants_per_iter = variants_per_iter
        self.config = STAGE_CONFIGS[stage]
        self.output_dir = output_dir
        self.scorer = ScoreRunner(client)
        self.generator = PromptVariantGenerator(client)
    def run(self) -> OptimizationResult:
        """Execute the full optimization loop.
        Returns
        -------
        OptimizationResult
            Contains the best prompt, its scores, full iteration history,
            and wall-clock elapsed time.
        """
        from pipeline.stages import _load_prompt
        t0 = time.monotonic()
        dimensions = self.config.dimensions
        # Load base prompt using the stage's configured prompt file
        prompt_file = self.config.prompt_file
        try:
            base_prompt = _load_prompt(prompt_file)
        except FileNotFoundError:
            logger.error("Prompt file not found: %s", prompt_file)
            return OptimizationResult(
                best_prompt="",
                best_score=ScoreResult(error=f"Prompt file not found: {prompt_file}"),
                elapsed_seconds=round(time.monotonic() - t0, 2),
            )
        # Load fixture data
        try:
            fixture = self._load_fixture()
        except (FileNotFoundError, json.JSONDecodeError, KeyError) as exc:
            logger.error("Failed to load fixture: %s", exc)
            return OptimizationResult(
                best_prompt=base_prompt,
                best_score=ScoreResult(error=f"Fixture load error: {exc}"),
                elapsed_seconds=round(time.monotonic() - t0, 2),
            )
        history: list[dict] = []
        # Score the baseline
        print(f"\n{'='*60}")
        print(f"  PROMPT OPTIMIZATION — Stage {self.stage}")
        print(f"  Iterations: {self.iterations}, Variants/iter: {self.variants_per_iter}")
        print(f"{'='*60}\n")
        print("  Scoring baseline prompt...")
        best_score = self._score_variant(base_prompt, fixture)
        best_prompt = base_prompt
        history.append({
            "iteration": 0,
            "variant_index": 0,
            "prompt_text": base_prompt[:200] + "..." if len(base_prompt) > 200 else base_prompt,
            "prompt_length": len(base_prompt),
            "composite": best_score.composite,
            "scores": {d: best_score.scores.get(d, 0.0) for d in dimensions},
            "error": best_score.error,
            "label": "baseline",
        })
        if best_score.error:
            print(f"  ✗ Baseline scoring failed: {best_score.error}")
            print("  Aborting optimization — fix the baseline first.\n")
            return OptimizationResult(
                best_prompt=best_prompt,
                best_score=best_score,
                history=history,
                elapsed_seconds=round(time.monotonic() - t0, 2),
            )
        baseline_composite = best_score.composite
        total_variants_scored = 0
        self._write_progress(
            phase="baseline_scored",
            iteration=0, variant=0,
            total_variants_scored=0,
            best_composite=best_score.composite,
            baseline_composite=baseline_composite,
            elapsed_seconds=round(time.monotonic() - t0, 2),
            best_label="baseline",
        )
        self._print_iteration_summary(0, best_score, is_baseline=True)
        # Iterate
        best_label = "baseline"
        for iteration in range(1, self.iterations + 1):
            print(f"\n  ── Iteration {iteration}/{self.iterations} ──")
            # Generate variants with stage-appropriate markers
            variants = self.generator.generate(
                base_prompt=best_prompt,
                scores=best_score,
                n=self.variants_per_iter,
                stage=self.stage,
            )
            if not variants:
                print("  ⚠ No valid variants generated — skipping iteration")
                continue
            # Score each variant
            iteration_best_score = best_score
            iteration_best_prompt = best_prompt
            for vi, variant_prompt in enumerate(variants):
                print(f"  Scoring variant {vi + 1}/{len(variants)}...")
                score = self._score_variant(variant_prompt, fixture)
                history.append({
                    "iteration": iteration,
                    "variant_index": vi + 1,
                    "prompt_text": variant_prompt[:200] + "..." if len(variant_prompt) > 200 else variant_prompt,
                    "prompt_length": len(variant_prompt),
                    "composite": score.composite,
                    "scores": {d: score.scores.get(d, 0.0) for d in dimensions},
                    "error": score.error,
                    "label": f"iter{iteration}_v{vi+1}",
                })
                if score.error:
                    print(f"    ✗ Variant {vi + 1} errored: {score.error}")
                    total_variants_scored += 1
                    self._write_progress(
                        phase="variant_scored",
                        iteration=iteration, variant=vi + 1,
                        total_variants_scored=total_variants_scored,
                        best_composite=best_score.composite,
                        baseline_composite=baseline_composite,
                        elapsed_seconds=round(time.monotonic() - t0, 2),
                        best_label=best_label,
                    )
                    continue
                total_variants_scored += 1
                if score.composite > iteration_best_score.composite:
                    iteration_best_score = score
                    iteration_best_prompt = variant_prompt
                    print(f"    ✓ New best: {score.composite:.3f} (was {best_score.composite:.3f})")
                else:
                    print(f"    · Score {score.composite:.3f} ≤ current best {iteration_best_score.composite:.3f}")
                self._write_progress(
                    phase="variant_scored",
                    iteration=iteration, variant=vi + 1,
                    total_variants_scored=total_variants_scored,
                    best_composite=max(best_score.composite, iteration_best_score.composite),
                    baseline_composite=baseline_composite,
                    elapsed_seconds=round(time.monotonic() - t0, 2),
                    best_label=best_label if iteration_best_score.composite <= best_score.composite
                        else f"iter{iteration}_v{vi+1}",
                )
            # Update global best if this iteration improved
            if iteration_best_score.composite > best_score.composite:
                best_score = iteration_best_score
                best_prompt = iteration_best_prompt
                best_label = f"iter{iteration}"
                print(f"  ★ Iteration {iteration} improved: {best_score.composite:.3f}")
            else:
                print(f"  · No improvement in iteration {iteration}")
            self._print_iteration_summary(iteration, best_score)
        # Final report
        elapsed = round(time.monotonic() - t0, 2)
        self._print_final_report(best_score, history, elapsed)
        self._write_progress(
            phase="complete",
            iteration=self.iterations,
            variant=self.variants_per_iter,
            total_variants_scored=total_variants_scored,
            best_composite=best_score.composite,
            baseline_composite=baseline_composite,
            elapsed_seconds=elapsed,
            best_label=best_label,
        )
        return OptimizationResult(
            best_prompt=best_prompt,
            best_score=best_score,
            history=history,
            elapsed_seconds=elapsed,
        )
    # ── Internal helpers ──────────────────────────────────────────────────
    def _write_progress(
        self,
        *,
        phase: str,
        iteration: int,
        variant: int,
        total_variants_scored: int,
        best_composite: float,
        baseline_composite: float,
        elapsed_seconds: float,
        best_label: str = "",
    ) -> None:
        """Write a progress.json file to the output directory for external monitoring.
        File is atomic-replaced so readers never see partial writes.
        """
        if not self.output_dir:
            return
        out_dir = Path(self.output_dir)
        out_dir.mkdir(parents=True, exist_ok=True)
        progress_path = out_dir / f"progress_stage{self.stage}.json"
        total_expected = self.iterations * self.variants_per_iter
        pct = (total_variants_scored / total_expected * 100) if total_expected else 0
        # ETA: average time per variant × remaining
        remaining = total_expected - total_variants_scored
        avg_per_variant = (elapsed_seconds / total_variants_scored) if total_variants_scored > 0 else 0
        eta_seconds = round(avg_per_variant * remaining, 1)
        payload = {
            "stage": self.stage,
            "phase": phase,
            "iteration": iteration,
            "total_iterations": self.iterations,
            "variant": variant,
            "variants_per_iter": self.variants_per_iter,
            "total_variants_scored": total_variants_scored,
            "total_expected": total_expected,
            "percent_complete": round(pct, 1),
            "baseline_composite": round(baseline_composite, 4),
            "best_composite": round(best_composite, 4),
            "improvement": round(best_composite - baseline_composite, 4),
            "best_label": best_label,
            "elapsed_seconds": round(elapsed_seconds, 1),
            "eta_seconds": eta_seconds,
            "updated_at": datetime.now(timezone.utc).isoformat(),
        }
        # Atomic write via temp file + rename
        tmp_path = progress_path.with_suffix(".tmp")
        tmp_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
        tmp_path.rename(progress_path)
    def _load_fixture(self) -> dict:
        """Load and validate the fixture JSON file against stage-specific keys."""
        path = Path(self.fixture_path)
        if not path.exists():
            raise FileNotFoundError(f"Fixture not found: {path}")
        data = json.loads(path.read_text(encoding="utf-8"))
        for key in self.config.fixture_keys:
            if key not in data:
                raise KeyError(
                    f"Stage {self.stage} fixture must contain '{key}' key "
                    f"(required: {self.config.fixture_keys})"
                )
        return data
    def _score_variant(
        self,
        variant_prompt: str,
        fixture: dict,
    ) -> ScoreResult:
        """Score a variant prompt by running LLM completion + scoring.
        Dispatches to stage-specific synthesis logic:
        - Stages 2-4: call LLM with the variant prompt and fixture input,
          parse with the stage's schema, then score via score_stage_output()
        - Stage 5: original flow (synthesis + page scoring)
        """
        from pipeline.stages import _get_stage_config
        import json as _json
        import openai as _openai
        model_override, modality = _get_stage_config(self.stage)
        schema_class = self.config.get_schema()
        # Build user prompt from fixture data — stage-specific formatting
        user_prompt = self._build_user_prompt(fixture)
        t0 = time.monotonic()
        try:
            raw = self.client.complete(
                system_prompt=variant_prompt,
                user_prompt=user_prompt,
                response_model=schema_class,
                modality=modality,
                model_override=model_override,
            )
            elapsed_synth = round(time.monotonic() - t0, 2)
        except (_openai.APIConnectionError, _openai.APITimeoutError) as exc:
            elapsed_synth = round(time.monotonic() - t0, 2)
            return ScoreResult(
                elapsed_seconds=elapsed_synth,
                error=f"LLM error (stage {self.stage}): {exc}",
            )
        except Exception as exc:
            elapsed_synth = round(time.monotonic() - t0, 2)
            logger.exception("Unexpected error during variant synthesis (stage %d)", self.stage)
            return ScoreResult(
                elapsed_seconds=elapsed_synth,
                error=f"Unexpected synthesis error: {exc}",
            )
        # Parse the LLM response into the stage schema
        raw_text = str(raw).strip()
        try:
            parsed = self.client.parse_response(raw_text, schema_class)
        except Exception as exc:
            return ScoreResult(
                elapsed_seconds=elapsed_synth,
                error=f"Variant parse error (stage {self.stage}): {exc}",
            )
        # Convert parsed output to JSON for the scorer
        output_json = self._schema_to_output_json(parsed)
        if output_json is None:
            return ScoreResult(
                elapsed_seconds=elapsed_synth,
                error=f"Stage {self.stage} produced empty output",
            )
        # Score using the generic stage scorer
        result = self.scorer.score_stage_output(
            stage=self.stage,
            output_json=output_json,
            input_json=self._fixture_to_input_json(fixture),
        )
        result.elapsed_seconds = round(result.elapsed_seconds + elapsed_synth, 2)
        return result
    def _build_user_prompt(self, fixture: dict) -> str:
        """Build a stage-appropriate user prompt from fixture data."""
        if self.stage == 2:
            segments_json = json.dumps(fixture["transcript_segments"], indent=2)
            return f"<transcript_segments>\n{segments_json}\n</transcript_segments>"
        elif self.stage == 3:
            segments_json = json.dumps(fixture["topic_segments"], indent=2)
            return f"<topic_segments>\n{segments_json}\n</topic_segments>"
        elif self.stage == 4:
            moments_json = json.dumps(fixture["extracted_moments"], indent=2)
            taxonomy = fixture.get("taxonomy", "")
            prompt = f"<moments>\n{moments_json}\n</moments>"
            if taxonomy:
                prompt += f"\n<taxonomy>{taxonomy}</taxonomy>"
            return prompt
        elif self.stage == 5:
            moments_json = json.dumps(fixture["moments"], indent=2)
            creator = fixture.get("creator_name", "Unknown")
            return f"<creator>{creator}</creator>\n<moments>\n{moments_json}\n</moments>"
        else:
            return json.dumps(fixture, indent=2)
    def _schema_to_output_json(self, parsed: object) -> dict | list | None:
        """Convert a parsed Pydantic schema instance to JSON-serializable dict."""
        if hasattr(parsed, "model_dump"):
            return parsed.model_dump()
        elif hasattr(parsed, "dict"):
            return parsed.dict()
        return None
    def _fixture_to_input_json(self, fixture: dict) -> dict | list:
        """Extract the primary input data from the fixture for scorer context."""
        if self.stage == 2:
            return fixture["transcript_segments"]
        elif self.stage == 3:
            return fixture["topic_segments"]
        elif self.stage == 4:
            return fixture["extracted_moments"]
        elif self.stage == 5:
            return fixture["moments"]
        return fixture
    def _print_iteration_summary(
        self,
        iteration: int,
        score: ScoreResult,
        is_baseline: bool = False,
    ) -> None:
        """Print a compact one-line summary of the current best scores."""
        label = "BASELINE" if is_baseline else f"ITER {iteration}"
        dimensions = self.config.dimensions
        dims = "  ".join(
            f"{d[:4]}={score.scores.get(d, 0.0):.2f}" for d in dimensions
        )
        print(f"  [{label}] composite={score.composite:.3f}  {dims}")
    def _print_final_report(
        self,
        best_score: ScoreResult,
        history: list[dict],
        elapsed: float,
    ) -> None:
        """Print the final optimization summary."""
        dimensions = self.config.dimensions
        print(f"\n{'='*60}")
        print("  OPTIMIZATION COMPLETE")
        print(f"{'='*60}")
        print(f"  Total time: {elapsed}s")
        print(f"  Iterations: {self.iterations}")
        print(f"  Variants scored: {len(history) - 1}")  # minus baseline
        baseline_composite = history[0]["composite"] if history else 0.0
        improvement = best_score.composite - baseline_composite
        print(f"\n  Baseline composite: {baseline_composite:.3f}")
        print(f"  Best composite:     {best_score.composite:.3f}")
        if improvement > 0:
            print(f"  Improvement:        +{improvement:.3f}")
        else:
            print(f"  Improvement:        {improvement:.3f} (no gain)")
        print(f"\n  Per-dimension best scores:")
        for d in dimensions:
            val = best_score.scores.get(d, 0.0)
            bar = "█" * int(val * 20) + "░" * (20 - int(val * 20))
            print(f"    {d.replace('_', ' ').title():25s} {val:.2f}  {bar}")
        errored = sum(1 for h in history if h.get("error"))
        if errored:
            print(f"\n  ⚠ {errored} variant(s) errored during scoring")
        print(f"{'='*60}\n")
 # Late import to avoid circular dependency (scorer imports at module level,
 # variant_generator imports scorer)
 from pipeline.quality.variant_generator import PromptVariantGenerator  # noqa: E402
--- a/backend/pipeline/quality/results/.gitkeep
+++ b/backend/pipeline/quality/results/.gitkeep
--- a/backend/pipeline/quality/results/chat_eval_baseline.json
+++ b/backend/pipeline/quality/results/chat_eval_baseline.json
@ -0,0 +1,91 @@
 {
  "timestamp": "20260404_043200",
  "evaluation_method": "manual_curl",
  "llm_status": "unavailable (upstream 502 Bad Gateway at chat.forgetyour.name)",
  "api_health": "ok",
  "total_queries": 6,
  "scored_queries": 0,
  "errors_llm": 6,
  "note": "LLM completions unavailable — only source retrieval quality assessed. Re-run with automated eval when LLM proxy is restored.",
  "source_retrieval_results": [
    {
      "query": "How do I set up sidechain compression on a bass synth using a kick drum as the trigger?",
      "creator": null,
      "personality_weight": 0.0,
      "category": "technical",
      "source_count": 10,
      "unique_creators": ["Break", "Caracal Project, The", "Chee", "KOAN Sound"],
      "creator_distribution": {"Break": 3, "Caracal Project, The": 2, "Chee": 2, "KOAN Sound": 1},
      "relevance_assessment": "highly_relevant",
      "notes": "All 10 sources directly about sidechain compression. Good creator diversity."
    },
    {
      "query": "What are the different approaches to layering synth sounds across creators?",
      "creator": null,
      "personality_weight": 0.0,
      "category": "cross_creator",
      "source_count": 10,
      "unique_creators": ["Chee", "COPYCATT", "Caracal Project, The", "Current Value", "Emperor"],
      "creator_distribution": {"Chee": 5, "COPYCATT": 2, "Caracal Project, The": 1, "Current Value": 1, "Emperor": 1},
      "relevance_assessment": "relevant_but_skewed",
      "notes": "50% of sources from Chee — cross-creator diversity could be improved."
    },
    {
      "query": "How does this creator approach sound design for bass sounds?",
      "creator": "Keota",
      "personality_weight": 0.0,
      "category": "creator_encyclopedic",
      "source_count": 10,
      "unique_creators": ["COPYCATT", "Break", "Chee", "Caracal Project, The"],
      "creator_distribution": {"COPYCATT": 2, "Break": 2, "Chee": 3, "Caracal Project, The": 3},
      "relevance_assessment": "creator_scope_failure",
      "notes": "Zero sources from Keota despite creator-scoped query. Cascade fell through to global tier."
    },
    {
      "query": "What mixing techniques does this creator recommend for achieving width in a mix?",
      "creator": "Mr. Bill",
      "personality_weight": 0.0,
      "category": "creator_encyclopedic",
      "source_count": 10,
      "unique_creators": ["Break", "Frequent", "Caracal Project, The", "COPYCATT", "Chee"],
      "creator_distribution": {"Break": 2, "Frequent": 1, "Caracal Project, The": 2, "COPYCATT": 2, "Chee": 3},
      "relevance_assessment": "creator_scope_failure",
      "notes": "Zero sources from Mr. Bill despite creator-scoped query."
    },
    {
      "query": "How does this creator approach sound design for bass sounds? (personality)",
      "creator": "Keota",
      "personality_weight": 0.7,
      "category": "creator_personality",
      "source_count": 10,
      "personality_profile_exists": false,
      "notes": "Personality weight=0.7 accepted but no profile data exists — falls back to encyclopedic mode silently."
    },
    {
      "query": "What mixing techniques does this creator recommend for width? (personality)",
      "creator": "Mr. Bill",
      "personality_weight": 0.7,
      "category": "creator_personality",
      "source_count": 10,
      "personality_profile_exists": false,
      "notes": "Personality weight=0.7 accepted but no profile data exists — falls back to encyclopedic mode silently."
    }
  ],
  "personality_profiles_status": {
    "total_creators": 25,
    "creators_with_profile": 0,
    "assessment": "No personality profiles populated. The 5-tier progressive injection system is architecturally complete (26 unit tests pass) but functionally inert on the live system."
  },
  "prompt_changes": {
    "before_lines": 4,
    "after_lines": 18,
    "changes": [
      "Added structured citation guidance with inline example",
      "Added response format section (2-4 paragraphs, bullet lists, bold terms)",
      "Added domain awareness (music production subdomain list)",
      "Added conflicting source handling instruction",
      "Added response length guidance"
    ],
    "test_impact": "Zero test modifications needed — all 26 tests pass unchanged"
  }
 }
--- a/backend/pipeline/quality/results/optimize_stage5_20260401_100005.json
+++ b/backend/pipeline/quality/results/optimize_stage5_20260401_100005.json
--- a/backend/pipeline/quality/results/progress_stage5.json
+++ b/backend/pipeline/quality/results/progress_stage5.json
@ -0,0 +1,18 @@
 {
  "stage": 5,
  "phase": "variant_scored",
  "iteration": 3,
  "total_iterations": 5,
  "variant": 2,
  "variants_per_iter": 3,
  "total_variants_scored": 4,
  "total_expected": 15,
  "percent_complete": 26.7,
  "baseline_composite": 1.0,
  "best_composite": 1.0,
  "improvement": 0.0,
  "best_label": "baseline",
  "elapsed_seconds": 1303.4,
  "eta_seconds": 3584.3,
  "updated_at": "2026-04-01T10:37:26.971865+00:00"
 }
--- a/backend/pipeline/quality/scorer.py
+++ b/backend/pipeline/quality/scorer.py
@ -0,0 +1,614 @@
 """Multi-stage quality scorer — LLM-as-judge evaluation with per-stage rubrics.
 Supports stages 2-5, each with its own scoring dimensions, rubric, format
 markers, fixture key requirements, prompt file name, and output schema.
 Run via: python -m pipeline.quality score --file <path>
 """
 from __future__ import annotations
 import json
 import logging
 import sys
 import time
 from dataclasses import dataclass, field
 from typing import Any
 import openai
 from pydantic import BaseModel
 from pipeline.llm_client import LLMClient
 from pipeline.quality.voice_dial import VoiceDial
 logger = logging.getLogger(__name__)
 # ── Per-stage configuration registry ─────────────────────────────────────────
 class StageConfig:
    """Configuration for scoring a specific pipeline stage."""
    def __init__(
        self,
        stage: int,
        dimensions: list[str],
        rubric: str,
        format_markers: list[str],
        fixture_keys: list[str],
        prompt_file: str,
        schema_class: str,
    ) -> None:
        self.stage = stage
        self.dimensions = dimensions
        self.rubric = rubric
        self.format_markers = format_markers
        self.fixture_keys = fixture_keys
        self.prompt_file = prompt_file
        self.schema_class = schema_class
    def get_schema(self) -> type[BaseModel]:
        """Import and return the Pydantic schema class for this stage."""
        from pipeline import schemas
        return getattr(schemas, self.schema_class)
 # ── Stage rubrics ────────────────────────────────────────────────────────────
 _STAGE_2_RUBRIC = """\
 You are an expert evaluator of transcript segmentation quality for educational content.
 You will be given:
 1. A segmentation result (JSON with segments, each having start_index, end_index, topic_label, summary)
 2. The source transcript segments used as input
 Evaluate the segmentation across these 4 dimensions, scoring each 0.0 to 1.0:
 **coverage_completeness** — All transcript content accounted for
 - 0.9-1.0: Every transcript segment is covered by exactly one topic segment, no gaps or overlaps
 - 0.5-0.7: Minor gaps or overlaps, but most content is covered
 - 0.0-0.3: Large gaps — significant transcript segments are not assigned to any topic
 **topic_specificity** — Topic labels are descriptive and useful
 - 0.9-1.0: Labels are specific and descriptive (e.g., "Sidechain compression on kick-bass" not "Audio processing")
 - 0.5-0.7: Labels are somewhat specific but could be more descriptive
 - 0.0-0.3: Labels are generic or meaningless ("Topic 1", "Discussion", "Audio")
 **boundary_accuracy** — Segment boundaries align with actual topic transitions
 - 0.9-1.0: Boundaries fall at natural topic transitions, segments are coherent units
 - 0.5-0.7: Most boundaries are reasonable but some segments mix distinct topics
 - 0.0-0.3: Boundaries seem arbitrary, segments contain unrelated content
 **summary_quality** — Summaries accurately describe segment content
 - 0.9-1.0: Summaries capture the key points of each segment concisely and accurately
 - 0.5-0.7: Summaries are acceptable but miss some key points or are too vague
 - 0.0-0.3: Summaries are inaccurate, too generic, or missing
 Return ONLY a JSON object with this exact structure:
 {
  "coverage_completeness": <float 0.0-1.0>,
  "topic_specificity": <float 0.0-1.0>,
  "boundary_accuracy": <float 0.0-1.0>,
  "summary_quality": <float 0.0-1.0>,
  "justifications": {
    "coverage_completeness": "<1-2 sentence justification>",
    "topic_specificity": "<1-2 sentence justification>",
    "boundary_accuracy": "<1-2 sentence justification>",
    "summary_quality": "<1-2 sentence justification>"
  }
 }
 """
 _STAGE_3_RUBRIC = """\
 You are an expert evaluator of key moment extraction quality for educational content.
 You will be given:
 1. An extraction result (JSON with moments, each having title, summary, start_time, end_time, content_type, plugins, raw_transcript)
 2. The source topic segments used as input
 Evaluate the extraction across these 5 dimensions, scoring each 0.0 to 1.0:
 **moment_richness** — Extracted moments capture substantial, distinct insights
 - 0.9-1.0: Each moment represents a meaningful, distinct technique or concept with detailed summary
 - 0.5-0.7: Moments are valid but some are thin or overlap significantly with others
 - 0.0-0.3: Moments are trivial, redundant, or miss the main techniques discussed
 **timestamp_accuracy** — Time ranges are plausible and well-bounded
 - 0.9-1.0: Start/end times form reasonable ranges, no zero-length or absurdly long spans
 - 0.5-0.7: Most timestamps are reasonable but some spans seem too wide or narrow
 - 0.0-0.3: Timestamps appear arbitrary or many are zero/identical
 **content_type_correctness** — Content types match the actual moment content
 - 0.9-1.0: Each moment's content_type (technique/settings/reasoning/workflow) accurately describes it
 - 0.5-0.7: Most are correct but 1-2 are miscategorized
 - 0.0-0.3: Content types seem randomly assigned or all the same
 **summary_actionability** — Summaries provide actionable, specific information
 - 0.9-1.0: Summaries contain concrete details (values, settings, steps) that a practitioner could follow
 - 0.5-0.7: Summaries describe the topic but lack specific actionable details
 - 0.0-0.3: Summaries are vague ("discusses compression") with no actionable information
 **plugin_normalization** — Plugin/tool names are correctly identified and normalized
 - 0.9-1.0: Plugin names match standard names, no duplicates, captures all mentioned tools
 - 0.5-0.7: Most plugins captured but some are misspelled, duplicated, or missed
 - 0.0-0.3: Plugin list is mostly empty, contains non-plugins, or has many errors
 Return ONLY a JSON object with this exact structure:
 {
  "moment_richness": <float 0.0-1.0>,
  "timestamp_accuracy": <float 0.0-1.0>,
  "content_type_correctness": <float 0.0-1.0>,
  "summary_actionability": <float 0.0-1.0>,
  "plugin_normalization": <float 0.0-1.0>,
  "justifications": {
    "moment_richness": "<1-2 sentence justification>",
    "timestamp_accuracy": "<1-2 sentence justification>",
    "content_type_correctness": "<1-2 sentence justification>",
    "summary_actionability": "<1-2 sentence justification>",
    "plugin_normalization": "<1-2 sentence justification>"
  }
 }
 """
 _STAGE_4_RUBRIC = """\
 You are an expert evaluator of content classification quality for educational content.
 You will be given:
 1. A classification result (JSON with classifications, each having moment_index, topic_category, topic_tags)
 2. The source extracted moments used as input
 Evaluate the classification across these 4 dimensions, scoring each 0.0 to 1.0:
 **category_accuracy** — Topic categories are appropriate and meaningful
 - 0.9-1.0: Categories accurately reflect the primary topic of each moment, using domain-appropriate labels
 - 0.5-0.7: Most categories are reasonable but some are too broad or slightly off
 - 0.0-0.3: Categories are generic ("Music"), incorrect, or all the same
 **tag_completeness** — All relevant tags are captured
 - 0.9-1.0: Tags capture the key concepts, tools, and techniques in each moment comprehensively
 - 0.5-0.7: Main tags are present but secondary concepts or tools are missed
 - 0.0-0.3: Tags are sparse, missing major concepts mentioned in the moments
 **tag_specificity** — Tags are specific enough to be useful for search/filtering
 - 0.9-1.0: Tags are specific ("sidechain compression", "Pro-Q 3") not generic ("audio", "mixing")
 - 0.5-0.7: Mix of specific and generic tags
 - 0.0-0.3: Tags are too generic to meaningfully distinguish moments
 **coverage** — All moments are classified
 - 0.9-1.0: Every moment_index from the input has a corresponding classification entry
 - 0.5-0.7: Most moments classified but 1-2 are missing
 - 0.0-0.3: Many moments are not classified
 Return ONLY a JSON object with this exact structure:
 {
  "category_accuracy": <float 0.0-1.0>,
  "tag_completeness": <float 0.0-1.0>,
  "tag_specificity": <float 0.0-1.0>,
  "coverage": <float 0.0-1.0>,
  "justifications": {
    "category_accuracy": "<1-2 sentence justification>",
    "tag_completeness": "<1-2 sentence justification>",
    "tag_specificity": "<1-2 sentence justification>",
    "coverage": "<1-2 sentence justification>"
  }
 }
 """
 _STAGE_5_RUBRIC = """\
 You are an expert evaluator of synthesized technique articles for music production education.
 You will be given:
 1. A synthesized technique page (JSON with title, summary, body_sections)
 2. The source key moments (transcript excerpts, summaries, tags) used to create it
 Evaluate the page across these 5 dimensions, scoring each 0.0 to 1.0:
 **structural** — Section naming and organization
 - 0.9-1.0: Well-named specific sections (not generic "Overview"/"Tips"), appropriate count (3-6), 2-5 paragraphs per section
 - 0.5-0.7: Acceptable structure but some generic section names or uneven depth
 - 0.0-0.3: Poor structure — too few/many sections, generic names, single-paragraph sections
 **content_specificity** — Concrete technical details
 - 0.9-1.0: Rich in frequencies (Hz), time values (ms), ratios, plugin names, specific settings, dB values
 - 0.5-0.7: Some specific details but padded with vague statements ("adjust to taste", "experiment with settings")
 - 0.0-0.3: Mostly vague generalities with few concrete values from the source material
 **voice_preservation** — Creator's authentic voice
 - 0.9-1.0: Direct quotes preserved, opinions attributed to creator by name, personality and strong views retained
 - 0.5-0.7: Some paraphrased references to creator's views but few direct quotes
 - 0.0-0.3: Encyclopedia style — creator's voice completely smoothed out, no attribution
 **readability** — Synthesis quality and flow
 - 0.9-1.0: Reads as a cohesive article, related info merged, logical flow, no redundancy or contradiction
 - 0.5-0.7: Generally readable but some awkward transitions or minor repetition
 - 0.0-0.3: Feels like concatenated bullet points, disjointed, redundant passages
 **factual_fidelity** — Grounded in source material
 - 0.9-1.0: Every claim traceable to source moments, no invented plugin names/settings/techniques
 - 0.5-0.7: Mostly grounded but 1-2 details seem embellished or not directly from sources
 - 0.0-0.3: Contains hallucinated specifics — plugin names, settings, or techniques not in sources
 Return ONLY a JSON object with this exact structure:
 {
  "structural": <float 0.0-1.0>,
  "content_specificity": <float 0.0-1.0>,
  "voice_preservation": <float 0.0-1.0>,
  "readability": <float 0.0-1.0>,
  "factual_fidelity": <float 0.0-1.0>,
  "justifications": {
    "structural": "<1-2 sentence justification>",
    "content_specificity": "<1-2 sentence justification>",
    "voice_preservation": "<1-2 sentence justification>",
    "readability": "<1-2 sentence justification>",
    "factual_fidelity": "<1-2 sentence justification>"
  }
 }
 """
 # Backward-compat alias used by synthesize_and_score and external references
 SCORING_RUBRIC = _STAGE_5_RUBRIC
 # Build the stage configs registry
 STAGE_CONFIGS: dict[int, StageConfig] = {
    2: StageConfig(
        stage=2,
        dimensions=["coverage_completeness", "topic_specificity", "boundary_accuracy", "summary_quality"],
        rubric=_STAGE_2_RUBRIC,
        format_markers=["segments", "start_index", "end_index", "topic_label"],
        fixture_keys=["transcript_segments"],
        prompt_file="stage2_segmentation.txt",
        schema_class="SegmentationResult",
    ),
    3: StageConfig(
        stage=3,
        dimensions=["moment_richness", "timestamp_accuracy", "content_type_correctness", "summary_actionability", "plugin_normalization"],
        rubric=_STAGE_3_RUBRIC,
        format_markers=["moments", "content_type", "raw_transcript", "plugins"],
        fixture_keys=["topic_segments"],
        prompt_file="stage3_extraction.txt",
        schema_class="ExtractionResult",
    ),
    4: StageConfig(
        stage=4,
        dimensions=["category_accuracy", "tag_completeness", "tag_specificity", "coverage"],
        rubric=_STAGE_4_RUBRIC,
        format_markers=["classifications", "moment_index", "topic_category", "topic_tags"],
        fixture_keys=["extracted_moments"],
        prompt_file="stage4_classification.txt",
        schema_class="ClassificationResult",
    ),
    5: StageConfig(
        stage=5,
        dimensions=["structural", "content_specificity", "voice_preservation", "readability", "factual_fidelity"],
        rubric=SCORING_RUBRIC,
        format_markers=["SynthesisResult", '"pages"', "body_sections", "title", "summary"],
        fixture_keys=["moments", "creator_name"],
        prompt_file="stage5_synthesis.txt",
        schema_class="SynthesisResult",
    ),
 }
 # Backward-compatible alias: stage 5 dimensions list
 DIMENSIONS = STAGE_CONFIGS[5].dimensions
 # ── Result type ──────────────────────────────────────────────────────────────
@dataclass
 class ScoreResult:
    """Outcome of scoring a stage output across quality dimensions.
    Uses a generic ``scores`` dict keyed by dimension name.  Stage 5's
    original named fields (structural, content_specificity, …) are
    preserved as properties for backward compatibility.
    """
    scores: dict[str, float] = field(default_factory=dict)
    composite: float = 0.0
    justifications: dict[str, str] = field(default_factory=dict)
    elapsed_seconds: float = 0.0
    error: str | None = None
    # ── Backward-compat properties for stage 5 named dimensions ──────
    @property
    def structural(self) -> float:
        return self.scores.get("structural", 0.0)
    @property
    def content_specificity(self) -> float:
        return self.scores.get("content_specificity", 0.0)
    @property
    def voice_preservation(self) -> float:
        return self.scores.get("voice_preservation", 0.0)
    @property
    def readability(self) -> float:
        return self.scores.get("readability", 0.0)
    @property
    def factual_fidelity(self) -> float:
        return self.scores.get("factual_fidelity", 0.0)
 # ── Runner ───────────────────────────────────────────────────────────────────
 class ScoreRunner:
    """Scores pipeline stage outputs using LLM-as-judge evaluation."""
    def __init__(self, client: LLMClient) -> None:
        self.client = client
    # ── Generic stage scorer ─────────────────────────────────────────────
    def score_stage_output(
        self,
        stage: int,
        output_json: dict | list,
        input_json: dict | list,
    ) -> ScoreResult:
        """Score an arbitrary stage's output against its input.
        Parameters
        ----------
        stage:
            Pipeline stage number (2-5).
        output_json:
            The stage output to evaluate (parsed JSON).
        input_json:
            The stage input / source material.
        Returns
        -------
        ScoreResult with per-dimension scores for the requested stage.
        """
        if stage not in STAGE_CONFIGS:
            return ScoreResult(error=f"No config for stage {stage}. Valid: {sorted(STAGE_CONFIGS)}")
        cfg = STAGE_CONFIGS[stage]
        user_prompt = (
            "## Stage Output\n\n"
            f"```json\n{json.dumps(output_json, indent=2)}\n```\n\n"
            "## Stage Input\n\n"
            f"```json\n{json.dumps(input_json, indent=2)}\n```\n\n"
            f"Score this stage {stage} output across all {len(cfg.dimensions)} dimensions."
        )
        t0 = time.monotonic()
        try:
            resp = self.client.complete(
                system_prompt=cfg.rubric,
                user_prompt=user_prompt,
                response_model=BaseModel,
                modality="chat",
            )
            elapsed = round(time.monotonic() - t0, 2)
        except (openai.APIConnectionError, openai.APITimeoutError) as exc:
            elapsed = round(time.monotonic() - t0, 2)
            url = self.client.settings.llm_api_url
            fallback = self.client.settings.llm_fallback_url
            return ScoreResult(
                elapsed_seconds=elapsed,
                error=f"Cannot reach LLM endpoint at {url} (fallback {fallback}). Error: {exc}",
            )
        raw_text = str(resp).strip()
        try:
            parsed = json.loads(raw_text)
        except json.JSONDecodeError:
            logger.error("Malformed judge response (not JSON): %.300s", raw_text)
            return ScoreResult(
                elapsed_seconds=elapsed,
                error=f"Malformed judge response (not valid JSON). Raw excerpt: {raw_text[:200]}",
            )
        return self._parse_scores(parsed, elapsed, cfg.dimensions)
    # ── Stage 5 convenience (backward compat) ────────────────────────────
    def score_page(
        self,
        page_json: dict,
        moments: list[dict],
    ) -> ScoreResult:
        """Evaluate a stage 5 technique page against source moments."""
        return self.score_stage_output(
            stage=5,
            output_json=page_json,
            input_json=moments,
        )
        return self._parse_scores(parsed, elapsed)
    def _parse_scores(self, parsed: dict, elapsed: float, dimensions: list[str] | None = None) -> ScoreResult:
        """Extract and validate scores from parsed JSON response."""
        dims = dimensions or DIMENSIONS
        scores: dict[str, float] = {}
        justifications: dict[str, str] = {}
        raw_justifications = parsed.get("justifications", {})
        if not isinstance(raw_justifications, dict):
            raw_justifications = {}
        for dim in dims:
            raw = parsed.get(dim)
            if raw is None:
                logger.warning("Missing dimension '%s' in judge response", dim)
                scores[dim] = 0.0
                justifications[dim] = "(missing from judge response)"
                continue
            try:
                val = float(raw)
                scores[dim] = max(0.0, min(1.0, val))  # clamp
            except (TypeError, ValueError):
                logger.warning("Invalid value for '%s': %r", dim, raw)
                scores[dim] = 0.0
                justifications[dim] = f"(invalid value: {raw!r})"
                continue
            justifications[dim] = str(raw_justifications.get(dim, ""))
        composite = sum(scores.values()) / len(dims) if dims else 0.0
        return ScoreResult(
            scores=scores,
            composite=round(composite, 3),
            justifications=justifications,
            elapsed_seconds=elapsed,
        )
    def synthesize_and_score(
        self,
        moments: list[dict],
        creator_name: str,
        voice_level: float,
    ) -> ScoreResult:
        """Re-synthesize from source moments with a voice-dialed prompt, then score.
        Loads the stage 5 synthesis prompt from disk, applies the VoiceDial
        modifier at the given voice_level, calls the LLM to produce a
        SynthesisResult, then scores the first page.
        Parameters
        ----------
        moments:
            Source key moments (dicts with summary, transcript_excerpt, etc.)
        creator_name:
            Creator name to inject into the synthesis prompt.
        voice_level:
            Float 0.0–1.0 controlling voice preservation intensity.
        Returns
        -------
        ScoreResult with per-dimension scores after voice-dialed re-synthesis.
        """
        from pipeline.schemas import SynthesisResult
        from pipeline.stages import _get_stage_config, _load_prompt
        # Load and modify the stage 5 system prompt
        try:
            base_prompt = _load_prompt("stage5_synthesis.txt")
        except FileNotFoundError as exc:
            return ScoreResult(error=f"Prompt file not found: {exc}")
        dial = VoiceDial(base_prompt)
        modified_prompt = dial.modify(voice_level)
        band = dial.band_name(voice_level)
        # Build user prompt in the same format as _synthesize_chunk
        moments_json = json.dumps(moments, indent=2)
        user_prompt = f"<creator>{creator_name}</creator>\n<moments>\n{moments_json}\n</moments>"
        model_override, modality = _get_stage_config(5)
        print(f"  Re-synthesizing at voice_level={voice_level} (band={band})...")
        t0 = time.monotonic()
        try:
            raw = self.client.complete(
                system_prompt=modified_prompt,
                user_prompt=user_prompt,
                response_model=SynthesisResult,
                modality=modality,
                model_override=model_override,
            )
            elapsed_synth = round(time.monotonic() - t0, 2)
        except (openai.APIConnectionError, openai.APITimeoutError) as exc:
            elapsed_synth = round(time.monotonic() - t0, 2)
            url = self.client.settings.llm_api_url
            fallback = self.client.settings.llm_fallback_url
            return ScoreResult(
                elapsed_seconds=elapsed_synth,
                error=(
                    f"Cannot reach LLM endpoint at {url} (fallback {fallback}). "
                    f"Error: {exc}"
                ),
            )
        # Parse synthesis response
        raw_text = str(raw).strip()
        try:
            synthesis = self.client.parse_response(raw_text, SynthesisResult)
        except (json.JSONDecodeError, ValueError, Exception) as exc:
            logger.error("Malformed synthesis response: %.300s", raw_text)
            return ScoreResult(
                elapsed_seconds=elapsed_synth,
                error=f"Malformed synthesis response: {exc}. Raw excerpt: {raw_text[:200]}",
            )
        if not synthesis.pages:
            return ScoreResult(
                elapsed_seconds=elapsed_synth,
                error="Synthesis returned no pages.",
            )
        # Score the first page
        page = synthesis.pages[0]
        page_json = {
            "title": page.title,
            "creator_name": creator_name,
            "summary": page.summary,
            "body_sections": [
                {"heading": heading, "content": content}
                for heading, content in page.body_sections.items()
            ],
        }
        print(f"  Synthesis complete ({elapsed_synth}s). Scoring...")
        result = self.score_page(page_json, moments)
        # Include synthesis time in total
        result.elapsed_seconds = round(result.elapsed_seconds + elapsed_synth, 2)
        return result
    def print_report(self, result: ScoreResult, stage: int = 5) -> None:
        """Print a formatted scoring report to stdout."""
        dims = STAGE_CONFIGS[stage].dimensions if stage in STAGE_CONFIGS else list(result.scores.keys())
        stage_label = f"STAGE {stage}" if stage in STAGE_CONFIGS else "QUALITY"
        print("\n" + "=" * 60)
        print(f"  {stage_label} QUALITY SCORE REPORT")
        print("=" * 60)
        if result.error:
            print(f"\n  ✗ Error: {result.error}\n")
            print("=" * 60 + "\n")
            return
        for dim in dims:
            score = result.scores.get(dim, 0.0)
            bar = self._score_bar(score)
            justification = result.justifications.get(dim, "")
            print(f"\n  {dim.replace('_', ' ').title()}")
            print(f"    Score: {score:.2f}  {bar}")
            if justification:
                # Wrap justification at ~60 chars
                for line in self._wrap(justification, 56):
                    print(f"    {line}")
        print("\n" + "-" * 60)
        print(f"  Composite: {result.composite:.3f}")
        print(f"  Time: {result.elapsed_seconds}s")
        print("=" * 60 + "\n")
    @staticmethod
    def _score_bar(score: float, width: int = 20) -> str:
        """Render a visual bar for a 0-1 score."""
        filled = int(score * width)
        return "█" * filled + "░" * (width - filled)
    @staticmethod
    def _wrap(text: str, width: int) -> list[str]:
        """Simple word wrap."""
        words = text.split()
        lines: list[str] = []
        current = ""
        for word in words:
            if current and len(current) + len(word) + 1 > width:
                lines.append(current)
                current = word
            else:
                current = f"{current} {word}" if current else word
        if current:
            lines.append(current)
        return lines
--- a/backend/pipeline/quality/variant_generator.py
+++ b/backend/pipeline/quality/variant_generator.py
@ -0,0 +1,247 @@
 """LLM-powered prompt variant generator for automated optimization.
 Uses a meta-prompt to instruct the LLM to act as a prompt engineer,
 analyzing per-dimension scores and producing targeted prompt mutations
 that improve the weakest scoring dimensions while preserving the JSON
 output format required by downstream parsing.
 Supports any pipeline stage (2-5) — callers pass the stage's dimensions
 and format markers so the meta-prompt and validation adapt automatically.
 """
 from __future__ import annotations
 import logging
 from typing import Sequence
 from pipeline.llm_client import LLMClient
 from pipeline.quality.scorer import DIMENSIONS, STAGE_CONFIGS, ScoreResult
 logger = logging.getLogger(__name__)
 # ── Meta-prompt for variant generation ────────────────────────────────────────
 VARIANT_META_PROMPT = """\
 You are an expert prompt engineer specializing in LLM-powered content processing pipelines.
 Your task: given a pipeline stage prompt and its quality evaluation scores, produce an
 improved variant of the prompt that targets the weakest-scoring dimensions while
 maintaining or improving the others.
 ## Scoring Dimensions (each 0.0–1.0)
 {dimension_descriptions}
 ## Rules
 1. Focus your changes on the weakest 1-2 dimensions. Don't dilute the prompt by trying to fix everything.
 2. Add specific, actionable instructions — not vague encouragements.
 3. **CRITICAL: You MUST preserve the JSON output format section of the prompt EXACTLY as-is.**
   The prompt contains instructions about outputting a JSON object with a specific schema.
   Do NOT modify, remove, or rephrase any part of the JSON format instructions.
   Your changes should target the processing/analysis guidelines only.
 4. Keep the overall prompt length within 2x of the original. Don't bloat it.
 5. Make substantive changes — rewording a sentence or adding one adjective is not enough.
 ## Output
 Return ONLY the full modified prompt text. No explanation, no markdown fences, no preamble.
 Just the complete prompt that could be used directly as a system prompt.
 """
 # Dimension descriptions per stage, used to fill the meta-prompt template.
 _DIMENSION_DESCRIPTIONS: dict[int, str] = {
    2: (
        "- **coverage_completeness** — All transcript content accounted for, no gaps or overlaps\n"
        "- **topic_specificity** — Topic labels are descriptive and useful, not generic\n"
        "- **boundary_accuracy** — Segment boundaries align with actual topic transitions\n"
        "- **summary_quality** — Summaries accurately describe segment content"
    ),
    3: (
        "- **moment_richness** — Extracted moments capture substantial, distinct insights\n"
        "- **timestamp_accuracy** — Time ranges are plausible and well-bounded\n"
        "- **content_type_correctness** — Content types match the actual moment content\n"
        "- **summary_actionability** — Summaries provide actionable, specific information\n"
        "- **plugin_normalization** — Plugin/tool names are correctly identified and normalized"
    ),
    4: (
        "- **category_accuracy** — Topic categories are appropriate and meaningful\n"
        "- **tag_completeness** — All relevant tags are captured\n"
        "- **tag_specificity** — Tags are specific enough to be useful for search/filtering\n"
        "- **coverage** — All moments are classified"
    ),
    5: (
        "- **structural** — Section naming, count (3-6), paragraph depth (2-5 per section)\n"
        "- **content_specificity** — Concrete details: frequencies, time values, ratios, plugin names, dB values\n"
        "- **voice_preservation** — Direct quotes preserved, opinions attributed to creator by name, personality retained\n"
        "- **readability** — Cohesive article flow, related info merged, no redundancy or contradiction\n"
        "- **factual_fidelity** — Every claim traceable to source material, no hallucinated specifics"
    ),
 }
 # Legacy default format markers for stage 5
 _FORMAT_MARKERS = ["SynthesisResult", '"pages"', "body_sections", "title", "summary"]
 class PromptVariantGenerator:
    """Generates prompt variants by asking an LLM to act as a prompt engineer.
    Given a base prompt and its evaluation scores, produces N mutated
    variants targeting the weakest dimensions.
    """
    def __init__(self, client: LLMClient) -> None:
        self.client = client
    def generate(
        self,
        base_prompt: str,
        scores: ScoreResult,
        n: int = 2,
        *,
        format_markers: Sequence[str] | None = None,
        stage: int = 5,
    ) -> list[str]:
        """Generate up to *n* valid prompt variants.
        Each variant is produced by a separate LLM call with the meta-prompt.
        Variants are validated: they must differ from the base by ≥50 characters
        and must contain the JSON format instruction markers found in the base.
        Invalid variants are logged and skipped.
        Parameters
        ----------
        base_prompt:
            The current best prompt text for the target stage.
        scores:
            ScoreResult from the most recent evaluation of *base_prompt*.
        n:
            Number of variants to attempt generating.
        format_markers:
            Override format markers for validation.  When *None*, uses the
            markers from ``STAGE_CONFIGS[stage]`` (falling back to stage 5
            defaults for backward compat).
        stage:
            Pipeline stage number (2-5), used to select dimension
            descriptions for the meta-prompt and default format markers.
        Returns
        -------
        list[str]
            Valid variant prompt strings (may be fewer than *n*).
        """
        # Resolve format markers and dimensions for the target stage
        if format_markers is not None:
            markers = list(format_markers)
        elif stage in STAGE_CONFIGS:
            markers = STAGE_CONFIGS[stage].format_markers
        else:
            markers = _FORMAT_MARKERS
        dimensions = STAGE_CONFIGS[stage].dimensions if stage in STAGE_CONFIGS else DIMENSIONS
        # Build the system prompt with stage-appropriate dimension descriptions
        dim_desc = _DIMENSION_DESCRIPTIONS.get(stage, _DIMENSION_DESCRIPTIONS[5])
        system_prompt = VARIANT_META_PROMPT.format(dimension_descriptions=dim_desc)
        user_prompt = self._build_user_prompt(base_prompt, scores, dimensions)
        # Identify which format markers are actually present in the base
        required_markers = [m for m in markers if m in base_prompt]
        variants: list[str] = []
        for i in range(n):
            logger.info("Generating variant %d/%d (stage %d)...", i + 1, n, stage)
            try:
                raw = self.client.complete(
                    system_prompt=system_prompt,
                    user_prompt=user_prompt,
                    response_model=None,  # free-form text, not JSON
                    modality="chat",
                )
                variant = str(raw).strip()
            except Exception:
                logger.exception("LLM error generating variant %d/%d", i + 1, n)
                continue
            # Validate the variant
            if not self._validate(variant, base_prompt, required_markers, i + 1):
                continue
            variants.append(variant)
            logger.info("Variant %d/%d accepted (%d chars)", i + 1, n, len(variant))
        logger.info(
            "Generated %d valid variant(s) out of %d attempts", len(variants), n
        )
        return variants
    # ── Internal helpers ──────────────────────────────────────────────────
    def _build_user_prompt(self, base_prompt: str, scores: ScoreResult, dimensions: list[str] | None = None) -> str:
        """Build the user message describing the current prompt and its scores."""
        dims = dimensions or DIMENSIONS
        # Build per-dimension score lines, sorted worst-first
        dim_lines: list[str] = []
        dim_scores = [(d, scores.scores.get(d, 0.0)) for d in dims]
        dim_scores.sort(key=lambda x: x[1])
        for dim, val in dim_scores:
            justification = scores.justifications.get(dim, "")
            label = dim.replace("_", " ").title()
            line = f"  {label}: {val:.2f}"
            if justification:
                line += f"  — {justification}"
            dim_lines.append(line)
        weakest = dim_scores[0][0].replace("_", " ").title()
        second_weakest = dim_scores[1][0].replace("_", " ").title() if len(dim_scores) > 1 else weakest
        return (
            f"## Current Prompt\n\n{base_prompt}\n\n"
            f"## Evaluation Scores (sorted weakest → strongest)\n\n"
            + "\n".join(dim_lines)
            + f"\n\n  Composite: {scores.composite:.3f}\n\n"
            f"## Priority\n\n"
            f"The weakest dimensions are **{weakest}** and **{second_weakest}**. "
            f"Focus your prompt modifications on improving these.\n\n"
            f"Return the full modified prompt now."
        )
    def _validate(
        self,
        variant: str,
        base_prompt: str,
        required_markers: list[str],
        index: int,
    ) -> bool:
        """Check a variant meets minimum quality gates."""
        if not variant:
            logger.warning("Variant %d is empty — skipping", index)
            return False
        # Must differ meaningfully from base
        diff = abs(len(variant) - len(base_prompt))
        # Also check actual content difference via set-symmetric-difference of lines
        base_lines = set(base_prompt.splitlines())
        variant_lines = set(variant.splitlines())
        changed_lines = len(base_lines.symmetric_difference(variant_lines))
        if diff < 50 and changed_lines < 3:
            logger.warning(
                "Variant %d too similar to base (len_diff=%d, changed_lines=%d) — skipping",
                index, diff, changed_lines,
            )
            return False
        # Must preserve format markers
        missing = [m for m in required_markers if m not in variant]
        if missing:
            logger.warning(
                "Variant %d missing format markers %s — skipping",
                index, missing,
            )
            return False
        return True
--- a/backend/pipeline/quality/voice_dial.py
+++ b/backend/pipeline/quality/voice_dial.py
@ -0,0 +1,91 @@
 """Voice preservation dial — modifies Stage 5 synthesis prompt by intensity band.
 Three bands control how much of the creator's original voice is preserved:
  - Low  (0.0–0.33): Clinical, encyclopedic tone — suppress direct quotes
  - Mid  (0.34–0.66): Base prompt unchanged (already ~0.6 voice preservation)
  - High (0.67–1.0): Maximum voice — prioritize exact words, strong opinions
 """
 from __future__ import annotations
 # ── Band modifier text ────────────────────────────────────────────────────────
 _LOW_BAND_MODIFIER = """
 ## Voice Suppression Override
 IMPORTANT — override the voice/tone guidelines above. For this synthesis:
 - Do NOT include any direct quotes from the creator. Rephrase all insights in neutral third-person encyclopedic style.
 - Do NOT attribute opinions or preferences to the creator by name (avoid "he recommends", "she prefers").
 - Remove all personality markers, humor, strong opinions, and conversational tone.
 - Write as a reference manual: factual, impersonal, technically precise.
 - Replace phrases like "he warns against" with neutral statements like "this approach is generally avoided because."
 - Suppress colloquialisms and informal language entirely.
 """
 _HIGH_BAND_MODIFIER = """
 ## Maximum Voice Preservation Override
 IMPORTANT — amplify the voice/tone guidelines above. For this synthesis:
 - Maximize the use of direct quotes from the transcript. Every memorable phrase, vivid metaphor, or strong opinion should be quoted verbatim with quotation marks.
 - Attribute all insights, preferences, and techniques to the creator by name — use their name frequently.
 - Preserve personality, humor, strong opinions, and conversational tone. If the creator is emphatic, the prose should feel emphatic.
 - Prioritize the creator's exact words over paraphrase. When a transcript excerpt contains a usable phrase, quote it rather than summarizing it.
 - Include warnings, caveats, and opinionated asides in the creator's own voice.
 - The resulting page should feel like the creator is speaking directly to the reader through the text.
 """
 # ── VoiceDial class ───────────────────────────────────────────────────────────
 class VoiceDial:
    """Modifies a Stage 5 synthesis prompt based on a voice_level parameter.
    Parameters
    ----------
    base_prompt:
        The original stage5_synthesis.txt system prompt content.
    """
    # Band boundaries
    LOW_UPPER = 0.33
    HIGH_LOWER = 0.67
    def __init__(self, base_prompt: str) -> None:
        self.base_prompt = base_prompt
    def modify(self, voice_level: float) -> str:
        """Return the system prompt modified for the given voice_level.
        Parameters
        ----------
        voice_level:
            Float 0.0–1.0. Values outside this range are clamped.
        Returns
        -------
        str
            Modified system prompt with band-appropriate instructions appended.
        """
        voice_level = max(0.0, min(1.0, voice_level))
        if voice_level <= self.LOW_UPPER:
            return self.base_prompt + _LOW_BAND_MODIFIER
        elif voice_level >= self.HIGH_LOWER:
            return self.base_prompt + _HIGH_BAND_MODIFIER
        else:
            # Mid band — base prompt is already moderate voice preservation
            return self.base_prompt
    @staticmethod
    def band_name(voice_level: float) -> str:
        """Return the human-readable band name for a voice_level value."""
        voice_level = max(0.0, min(1.0, voice_level))
        if voice_level <= VoiceDial.LOW_UPPER:
            return "low"
        elif voice_level >= VoiceDial.HIGH_LOWER:
            return "high"
        return "mid"
--- a/backend/pipeline/schemas.py
+++ b/backend/pipeline/schemas.py
@ -0,0 +1,125 @@
 """Pydantic schemas for pipeline stage inputs and outputs.
 Stage 2 — Segmentation: groups transcript segments by topic.
 Stage 3 — Extraction: extracts key moments from segments.
 Stage 4 — Classification: classifies moments by category/tags.
 Stage 5 — Synthesis: generates technique pages from classified moments.
 """
 from __future__ import annotations
 from pydantic import BaseModel, Field
 # ── Stage 2: Segmentation ───────────────────────────────────────────────────
 class TopicSegment(BaseModel):
    """A contiguous group of transcript segments sharing a topic."""
    start_index: int = Field(description="First transcript segment index in this group")
    end_index: int = Field(description="Last transcript segment index in this group (inclusive)")
    topic_label: str = Field(description="Short label describing the topic")
    summary: str = Field(description="Brief summary of what is discussed")
 class SegmentationResult(BaseModel):
    """Full output of stage 2 (segmentation)."""
    segments: list[TopicSegment]
 # ── Stage 3: Extraction ─────────────────────────────────────────────────────
 class ExtractedMoment(BaseModel):
    """A single key moment extracted from a topic segment group."""
    title: str = Field(description="Concise title for the moment")
    summary: str = Field(description="Detailed summary of the technique/concept")
    start_time: float = Field(description="Start time in seconds")
    end_time: float = Field(description="End time in seconds")
    content_type: str = Field(description="One of: technique, settings, reasoning, workflow")
    plugins: list[str] = Field(default_factory=list, description="Plugins/tools mentioned")
    raw_transcript: str = Field(default="", description="Raw transcript text for this moment")
 class ExtractionResult(BaseModel):
    """Full output of stage 3 (extraction)."""
    moments: list[ExtractedMoment]
 # ── Stage 4: Classification ─────────────────────────────────────────────────
 class ClassifiedMoment(BaseModel):
    """Classification metadata for a single extracted moment."""
    moment_index: int = Field(description="Index into ExtractionResult.moments")
    topic_category: str = Field(description="High-level topic category")
    topic_tags: list[str] = Field(default_factory=list, description="Specific topic tags")
    content_type_override: str | None = Field(
        default=None,
        description="Override for content_type if classification disagrees with extraction",
    )
 class ClassificationResult(BaseModel):
    """Full output of stage 4 (classification)."""
    classifications: list[ClassifiedMoment]
 # ── Stage 5: Synthesis ───────────────────────────────────────────────────────
 class BodySubSection(BaseModel):
    """An H3-level subsection within a body section."""
    heading: str = Field(description="H3 subsection heading")
    content: str = Field(description="Subsection body text (may contain [N] citation markers)")
 class BodySection(BaseModel):
    """An H2-level section of a technique page body."""
    heading: str = Field(description="H2 section heading")
    content: str = Field(description="Section body text (may contain [N] citation markers)")
    subsections: list[BodySubSection] = Field(
        default_factory=list,
        description="Optional H3-level subsections",
    )
 class SynthesizedPage(BaseModel):
    """A technique page synthesized from classified moments."""
    title: str = Field(description="Page title")
    slug: str = Field(description="URL-safe slug")
    topic_category: str = Field(description="Primary topic category")
    topic_tags: list[str] = Field(default_factory=list, description="Associated tags")
    summary: str = Field(description="Page summary / overview paragraph")
    body_sections: list[BodySection] = Field(
        default_factory=list,
        description="Structured body content as H2 sections with optional H3 subsections",
    )
    body_sections_format: str = Field(
        default="v2",
        description="Schema version for body_sections ('v2' = list[BodySection])",
    )
    signal_chains: list[dict] = Field(
        default_factory=list,
        description="Signal chain descriptions (for audio/music production contexts)",
    )
    plugins: list[str] = Field(default_factory=list, description="Plugins/tools referenced")
    source_quality: str = Field(
        default="mixed",
        description="One of: structured, mixed, unstructured",
    )
    moment_indices: list[int] = Field(
        default_factory=list,
        description="Indices of source moments (from the input list) that this page covers",
    )
 class SynthesisResult(BaseModel):
    """Full output of stage 5 (synthesis)."""
    pages: list[SynthesizedPage]
--- a/backend/pipeline/shorts_generator.py
+++ b/backend/pipeline/shorts_generator.py
@ -0,0 +1,222 @@
 """FFmpeg clip extraction with format presets for shorts generation.
 Pure functions — no DB access, no Celery dependency. Tested independently.
 """
 from __future__ import annotations
 import logging
 import subprocess
 from dataclasses import dataclass
 from pathlib import Path
 from models import FormatPreset
 logger = logging.getLogger(__name__)
 FFMPEG_TIMEOUT_SECS = 300
@dataclass(frozen=True)
 class PresetSpec:
    """Resolution and ffmpeg video filter for a format preset."""
    width: int
    height: int
    vf_filter: str
 PRESETS: dict[FormatPreset, PresetSpec] = {
    FormatPreset.vertical: PresetSpec(
        width=1080,
        height=1920,
        vf_filter="scale=1080:-2,pad=1080:1920:(ow-iw)/2:(oh-ih)/2:black",
    ),
    FormatPreset.square: PresetSpec(
        width=1080,
        height=1080,
        vf_filter="crop=min(iw\\,ih):min(iw\\,ih),scale=1080:1080",
    ),
    FormatPreset.horizontal: PresetSpec(
        width=1920,
        height=1080,
        vf_filter="scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2:black",
    ),
 }
 def resolve_video_path(video_source_root: str, file_path: str) -> Path:
    """Join root + relative path and validate the file exists.
    Args:
        video_source_root: Base directory for video files (e.g. /videos).
        file_path: Relative path stored in SourceVideo.file_path.
    Returns:
        Resolved absolute Path.
    Raises:
        FileNotFoundError: If the resolved path doesn't exist or isn't a file.
    """
    resolved = Path(video_source_root) / file_path
    if not resolved.is_file():
        raise FileNotFoundError(
            f"Video file not found: {resolved} "
            f"(root={video_source_root!r}, relative={file_path!r})"
        )
    return resolved
 def extract_clip(
    input_path: Path | str,
    output_path: Path | str,
    start_secs: float,
    end_secs: float,
    vf_filter: str,
    ass_path: Path | str | None = None,
 ) -> None:
    """Extract a clip from a video file using ffmpeg.
    Seeks to *start_secs*, encodes until *end_secs*, and applies *vf_filter*.
    Uses ``-c:v libx264 -preset fast -crf 23`` for reasonable quality/speed.
    When *ass_path* is provided, the ASS subtitle filter is appended to the
    video filter chain so that captions are burned into the output video.
    Args:
        input_path: Source video file.
        output_path: Destination mp4 file (parent dir must exist).
        start_secs: Start time in seconds.
        end_secs: End time in seconds.
        vf_filter: ffmpeg ``-vf`` filter string.
        ass_path: Optional path to an ASS subtitle file. When provided,
            ``ass=<path>`` is appended to the filter chain.
    Raises:
        subprocess.CalledProcessError: If ffmpeg exits non-zero.
        subprocess.TimeoutExpired: If ffmpeg exceeds the timeout.
        ValueError: If start >= end.
    """
    duration = end_secs - start_secs
    if duration <= 0:
        raise ValueError(
            f"Invalid clip range: start={start_secs}s end={end_secs}s "
            f"(duration={duration}s)"
        )
    # Build the video filter chain — ASS burn-in comes after scale/pad
    effective_vf = vf_filter
    if ass_path is not None:
        # Escape colons and backslashes in the path for ffmpeg filter syntax
        escaped = str(ass_path).replace("\\", "\\\\").replace(":", "\\:")
        effective_vf = f"{vf_filter},ass={escaped}"
    cmd = [
        "ffmpeg",
        "-y",                          # overwrite output
        "-ss", str(start_secs),        # seek before input (fast)
        "-i", str(input_path),
        "-t", str(duration),
        "-vf", effective_vf,
        "-c:v", "libx264",
        "-preset", "fast",
        "-crf", "23",
        "-c:a", "aac",
        "-b:a", "128k",
        "-movflags", "+faststart",     # web-friendly mp4
        str(output_path),
    ]
    logger.info(
        "ffmpeg: extracting %.1fs clip from %s → %s",
        duration, input_path, output_path,
    )
    result = subprocess.run(
        cmd,
        capture_output=True,
        timeout=FFMPEG_TIMEOUT_SECS,
    )
    if result.returncode != 0:
        stderr_text = result.stderr.decode("utf-8", errors="replace")[-2000:]
        logger.error("ffmpeg failed (rc=%d): %s", result.returncode, stderr_text)
        raise subprocess.CalledProcessError(
            result.returncode, cmd, output=result.stdout, stderr=result.stderr,
        )
 def extract_clip_with_template(
    input_path: Path | str,
    output_path: Path | str,
    start_secs: float,
    end_secs: float,
    vf_filter: str,
    ass_path: Path | str | None = None,
    intro_path: Path | str | None = None,
    outro_path: Path | str | None = None,
 ) -> None:
    """Extract a clip and optionally prepend/append intro/outro cards.
    If neither intro nor outro is provided, delegates directly to
    :func:`extract_clip`. When cards are provided, the main clip is
    extracted to a temp file, then all segments are concatenated via
    :func:`~pipeline.card_renderer.concat_segments`.
    Args:
        input_path: Source video file.
        output_path: Final destination mp4 file.
        start_secs: Start time in seconds.
        end_secs: End time in seconds.
        vf_filter: ffmpeg ``-vf`` filter string.
        ass_path: Optional ASS subtitle file path.
        intro_path: Optional intro card mp4 path.
        outro_path: Optional outro card mp4 path.
    Raises:
        subprocess.CalledProcessError: If any ffmpeg command fails.
        ValueError: If clip range is invalid.
    """
    has_cards = intro_path is not None or outro_path is not None
    if not has_cards:
        # No template cards — simple extraction
        extract_clip(
            input_path=input_path,
            output_path=output_path,
            start_secs=start_secs,
            end_secs=end_secs,
            vf_filter=vf_filter,
            ass_path=ass_path,
        )
        return
    # Extract main clip to a temp file for concatenation
    main_clip_path = Path(str(output_path) + ".main.mp4")
    try:
        extract_clip(
            input_path=input_path,
            output_path=main_clip_path,
            start_secs=start_secs,
            end_secs=end_secs,
            vf_filter=vf_filter,
            ass_path=ass_path,
        )
        # Build segment list in order: intro → main → outro
        segments: list[Path] = []
        if intro_path is not None:
            segments.append(Path(intro_path))
        segments.append(main_clip_path)
        if outro_path is not None:
            segments.append(Path(outro_path))
        from pipeline.card_renderer import concat_segments
        concat_segments(segments=segments, output_path=Path(output_path))
    finally:
        # Clean up temp main clip
        if main_clip_path.exists():
            try:
                main_clip_path.unlink()
            except OSError:
                pass
--- a/backend/pipeline/stages.py
+++ b/backend/pipeline/stages.py
--- a/backend/pipeline/test_caption_generator.py
+++ b/backend/pipeline/test_caption_generator.py
@ -0,0 +1,159 @@
 """Unit tests for caption_generator module."""
 from __future__ import annotations
 import re
 import tempfile
 from pathlib import Path
 import pytest
 from pipeline.caption_generator import (
    DEFAULT_STYLE,
    _format_ass_time,
    generate_ass_captions,
    write_ass_file,
 )
 # ── Fixtures ─────────────────────────────────────────────────────────────────
@pytest.fixture
 def sample_word_timings() -> list[dict]:
    """Realistic word timings as produced by extract_word_timings."""
    return [
        {"word": "This", "start": 10.0, "end": 10.3},
        {"word": "is", "start": 10.3, "end": 10.5},
        {"word": "a", "start": 10.5, "end": 10.6},
        {"word": "test", "start": 10.6, "end": 11.0},
        {"word": "sentence", "start": 11.1, "end": 11.6},
    ]
 # ── Time formatting ─────────────────────────────────────────────────────────
 class TestFormatAssTime:
    def test_zero(self):
        assert _format_ass_time(0.0) == "0:00:00.00"
    def test_sub_second(self):
        assert _format_ass_time(0.5) == "0:00:00.50"
    def test_minutes(self):
        assert _format_ass_time(65.5) == "0:01:05.50"
    def test_hours(self):
        assert _format_ass_time(3661.25) == "1:01:01.25"
    def test_negative_clamps_to_zero(self):
        assert _format_ass_time(-5.0) == "0:00:00.00"
 # ── ASS generation ──────────────────────────────────────────────────────────
 class TestGenerateAssCaptions:
    def test_empty_timings_returns_header_only(self):
        result = generate_ass_captions([], clip_start=0.0)
        assert "[Script Info]" in result
        assert "[Events]" in result
        # No Dialogue lines
        assert "Dialogue:" not in result
    def test_structure_has_required_sections(self, sample_word_timings):
        result = generate_ass_captions(sample_word_timings, clip_start=10.0)
        assert "[Script Info]" in result
        assert "[V4+ Styles]" in result
        assert "[Events]" in result
        assert "Dialogue:" in result
    def test_clip_offset_applied(self, sample_word_timings):
        """Word at t=10.5 with clip_start=10.0 should become t=0.5 in ASS."""
        result = generate_ass_captions(sample_word_timings, clip_start=10.0)
        lines = result.strip().split("\n")
        dialogue_lines = [l for l in lines if l.startswith("Dialogue:")]
        # First word "This" starts at 10.0, clip_start=10.0 → relative 0.0
        assert dialogue_lines[0].startswith("Dialogue: 0,0:00:00.00,")
        # Third word "a" starts at 10.5, clip_start=10.0 → relative 0.5
        assert "0:00:00.50" in dialogue_lines[2]
    def test_karaoke_tags_present(self, sample_word_timings):
        result = generate_ass_captions(sample_word_timings, clip_start=10.0)
        lines = result.strip().split("\n")
        dialogue_lines = [l for l in lines if l.startswith("Dialogue:")]
        for line in dialogue_lines:
            # Each line should have a \kN tag
            assert re.search(r"\{\\k\d+\}", line), f"Missing karaoke tag in: {line}"
    def test_karaoke_duration_math(self, sample_word_timings):
        """Word "This" at [10.0, 10.3] → 0.3s → k30 (30 centiseconds)."""
        result = generate_ass_captions(sample_word_timings, clip_start=10.0)
        lines = result.strip().split("\n")
        dialogue_lines = [l for l in lines if l.startswith("Dialogue:")]
        # "This" duration: 10.3 - 10.0 = 0.3s = 30cs
        assert "{\\k30}This" in dialogue_lines[0]
        # "test" duration: 11.0 - 10.6 = 0.4s = 40cs
        assert "{\\k40}test" in dialogue_lines[3]
    def test_word_count_matches(self, sample_word_timings):
        result = generate_ass_captions(sample_word_timings, clip_start=10.0)
        lines = result.strip().split("\n")
        dialogue_lines = [l for l in lines if l.startswith("Dialogue:")]
        assert len(dialogue_lines) == 5
    def test_empty_word_text_skipped(self):
        timings = [
            {"word": "hello", "start": 0.0, "end": 0.5},
            {"word": "  ", "start": 0.5, "end": 0.7},  # whitespace-only
            {"word": "", "start": 0.7, "end": 0.8},     # empty
            {"word": "world", "start": 0.8, "end": 1.2},
        ]
        result = generate_ass_captions(timings, clip_start=0.0)
        lines = result.strip().split("\n")
        dialogue_lines = [l for l in lines if l.startswith("Dialogue:")]
        assert len(dialogue_lines) == 2  # only "hello" and "world"
    def test_custom_style_overrides(self, sample_word_timings):
        result = generate_ass_captions(
            sample_word_timings,
            clip_start=10.0,
            style_config={"font_size": 72, "font_name": "Roboto"},
        )
        assert "Roboto" in result
        assert ",72," in result
    def test_negative_relative_time_clamped(self):
        """Words before clip_start should clamp to 0."""
        timings = [{"word": "early", "start": 5.0, "end": 5.5}]
        result = generate_ass_captions(timings, clip_start=10.0)
        lines = [l for l in result.strip().split("\n") if l.startswith("Dialogue:")]
        # Both start and end clamped to 0
        assert lines[0].startswith("Dialogue: 0,0:00:00.00,0:00:00.00,")
 # ── File writing ─────────────────────────────────────────────────────────────
 class TestWriteAssFile:
    def test_writes_content(self):
        content = "[Script Info]\ntest content\n"
        with tempfile.TemporaryDirectory() as td:
            out = write_ass_file(content, Path(td) / "sub.ass")
            assert out.exists()
            assert out.read_text() == content
    def test_creates_parent_dirs(self):
        content = "test"
        with tempfile.TemporaryDirectory() as td:
            out = write_ass_file(content, Path(td) / "nested" / "deep" / "sub.ass")
            assert out.exists()
    def test_returns_path(self):
        content = "test"
        with tempfile.TemporaryDirectory() as td:
            target = Path(td) / "sub.ass"
            result = write_ass_file(content, target)
            assert result == target
--- a/backend/pipeline/test_card_renderer.py
+++ b/backend/pipeline/test_card_renderer.py
@ -0,0 +1,365 @@
 """Tests for card_renderer: ffmpeg card generation and concat pipeline.
 Tests verify command construction, concat list file format, and template
 config parsing — no actual ffmpeg execution required.
 """
 from __future__ import annotations
 import subprocess
 import textwrap
 from pathlib import Path
 from unittest.mock import MagicMock, patch
 import pytest
 from pipeline.card_renderer import (
    DEFAULT_ACCENT_COLOR,
    DEFAULT_FONT_FAMILY,
    DEFAULT_INTRO_DURATION,
    DEFAULT_OUTRO_DURATION,
    build_concat_list,
    concat_segments,
    parse_template_config,
    render_card,
    render_card_to_file,
 )
 # ── render_card tests ────────────────────────────────────────────────────────
 class TestRenderCard:
    """Tests for render_card() ffmpeg command generation."""
    def test_returns_list_of_strings(self):
        cmd = render_card("Hello", 2.0, 1080, 1920)
        assert isinstance(cmd, list)
        assert all(isinstance(s, str) for s in cmd)
    def test_contains_ffmpeg_and_lavfi(self):
        cmd = render_card("Test", 3.0, 1080, 1920)
        assert cmd[0] == "ffmpeg"
        assert "-f" in cmd
        lavfi_idx = cmd.index("-f")
        assert cmd[lavfi_idx + 1] == "lavfi"
    def test_contains_correct_dimensions_in_filtergraph(self):
        cmd = render_card("Test", 2.0, 1920, 1080)
        # The filtergraph is the arg after -i for lavfi
        filtergraph = None
        for i, arg in enumerate(cmd):
            if arg == "-i" and i > 0 and cmd[i - 1] == "lavfi":
                filtergraph = cmd[i + 1]
                break
        assert filtergraph is not None
        assert "s=1920x1080" in filtergraph
    def test_contains_duration_in_filtergraph(self):
        cmd = render_card("Test", 5.5, 1080, 1920)
        filtergraph = None
        for i, arg in enumerate(cmd):
            if arg == "-i" and i > 0 and cmd[i - 1] == "lavfi":
                filtergraph = cmd[i + 1]
                break
        assert "d=5.5" in filtergraph
    def test_contains_drawtext_with_text(self):
        cmd = render_card("My Creator", 2.0, 1080, 1920)
        filtergraph = None
        for i, arg in enumerate(cmd):
            if arg == "-i" and i > 0 and cmd[i - 1] == "lavfi":
                filtergraph = cmd[i + 1]
                break
        assert "drawtext=" in filtergraph
        assert "My Creator" in filtergraph
    def test_codec_settings(self):
        cmd = render_card("Test", 2.0, 1080, 1920)
        assert "-c:v" in cmd
        assert "libx264" in cmd
        assert "-c:a" in cmd
        assert "aac" in cmd
    def test_silent_audio_track(self):
        """Card includes anullsrc so concat with audio segments works."""
        cmd = render_card("Test", 2.0, 1080, 1920)
        # Should have a second -f lavfi -i anullsrc input
        cmd_str = " ".join(cmd)
        assert "anullsrc" in cmd_str
    def test_rejects_zero_duration(self):
        with pytest.raises(ValueError, match="positive"):
            render_card("Test", 0, 1080, 1920)
    def test_rejects_negative_duration(self):
        with pytest.raises(ValueError, match="positive"):
            render_card("Test", -1.0, 1080, 1920)
    def test_rejects_zero_dimensions(self):
        with pytest.raises(ValueError, match="positive"):
            render_card("Test", 2.0, 0, 1920)
    def test_custom_accent_color(self):
        cmd = render_card("Test", 2.0, 1080, 1920, accent_color="#ff0000")
        filtergraph = None
        for i, arg in enumerate(cmd):
            if arg == "-i" and i > 0 and cmd[i - 1] == "lavfi":
                filtergraph = cmd[i + 1]
                break
        assert "#ff0000" in filtergraph
    def test_escapes_colons_in_text(self):
        cmd = render_card("Hello: World", 2.0, 1080, 1920)
        filtergraph = None
        for i, arg in enumerate(cmd):
            if arg == "-i" and i > 0 and cmd[i - 1] == "lavfi":
                filtergraph = cmd[i + 1]
                break
        # Colons should be escaped for ffmpeg
        assert "Hello\\: World" in filtergraph
 # ── render_card_to_file tests ────────────────────────────────────────────────
 class TestRenderCardToFile:
    """Tests for render_card_to_file() — mocked ffmpeg execution."""
    @patch("pipeline.card_renderer.subprocess.run")
    def test_calls_ffmpeg_and_returns_path(self, mock_run, tmp_path):
        mock_run.return_value = MagicMock(returncode=0)
        out = tmp_path / "card.mp4"
        out.write_bytes(b"fake")  # stat().st_size needs the file
        result = render_card_to_file("Hi", 2.0, 1080, 1920, out)
        assert result == out
        mock_run.assert_called_once()
        # Output path should be the last arg
        call_args = mock_run.call_args[0][0]
        assert call_args[-1] == str(out)
    @patch("pipeline.card_renderer.subprocess.run")
    def test_raises_on_ffmpeg_failure(self, mock_run, tmp_path):
        mock_run.return_value = MagicMock(
            returncode=1,
            stderr=b"error: something failed",
        )
        out = tmp_path / "card.mp4"
        with pytest.raises(subprocess.CalledProcessError):
            render_card_to_file("Hi", 2.0, 1080, 1920, out)
 # ── build_concat_list tests ─────────────────────────────────────────────────
 class TestBuildConcatList:
    """Tests for build_concat_list() file content."""
    def test_writes_correct_format(self, tmp_path):
        seg1 = tmp_path / "intro.mp4"
        seg2 = tmp_path / "main.mp4"
        seg1.touch()
        seg2.touch()
        list_file = tmp_path / "concat.txt"
        result = build_concat_list([seg1, seg2], list_file)
        assert result == list_file
        content = list_file.read_text()
        lines = content.strip().split("\n")
        assert len(lines) == 2
        assert lines[0] == f"file '{seg1.resolve()}'"
        assert lines[1] == f"file '{seg2.resolve()}'"
    def test_three_segments(self, tmp_path):
        segs = [tmp_path / f"seg{i}.mp4" for i in range(3)]
        for s in segs:
            s.touch()
        list_file = tmp_path / "list.txt"
        build_concat_list(segs, list_file)
        content = list_file.read_text()
        lines = content.strip().split("\n")
        assert len(lines) == 3
 # ── concat_segments tests ────────────────────────────────────────────────────
 class TestConcatSegments:
    """Tests for concat_segments() — mocked ffmpeg execution."""
    @patch("pipeline.card_renderer.subprocess.run")
    def test_calls_ffmpeg_concat_demuxer(self, mock_run, tmp_path):
        mock_run.return_value = MagicMock(returncode=0)
        seg1 = tmp_path / "seg1.mp4"
        seg2 = tmp_path / "seg2.mp4"
        seg1.touch()
        seg2.touch()
        out = tmp_path / "output.mp4"
        out.write_bytes(b"fakemp4")
        result = concat_segments([seg1, seg2], out)
        assert result == out
        mock_run.assert_called_once()
        call_args = mock_run.call_args[0][0]
        assert "concat" in call_args
        assert "-safe" in call_args
        assert "0" in call_args
        assert "-c" in call_args
        assert "copy" in call_args
    def test_rejects_empty_segments(self):
        with pytest.raises(ValueError, match="empty"):
            concat_segments([], Path("/tmp/out.mp4"))
    @patch("pipeline.card_renderer.subprocess.run")
    def test_raises_on_ffmpeg_failure(self, mock_run, tmp_path):
        mock_run.return_value = MagicMock(
            returncode=1, stderr=b"concat error",
        )
        seg1 = tmp_path / "s.mp4"
        seg1.touch()
        out = tmp_path / "out.mp4"
        with pytest.raises(subprocess.CalledProcessError):
            concat_segments([seg1], out)
 # ── parse_template_config tests ──────────────────────────────────────────────
 class TestParseTemplateConfig:
    """Tests for parse_template_config() defaults and overrides."""
    def test_none_returns_all_defaults_disabled(self):
        cfg = parse_template_config(None)
        assert cfg["show_intro"] is False
        assert cfg["show_outro"] is False
        assert cfg["accent_color"] == DEFAULT_ACCENT_COLOR
        assert cfg["font_family"] == DEFAULT_FONT_FAMILY
        assert cfg["intro_duration"] == DEFAULT_INTRO_DURATION
        assert cfg["outro_duration"] == DEFAULT_OUTRO_DURATION
    def test_empty_dict_returns_defaults_disabled(self):
        cfg = parse_template_config({})
        assert cfg["show_intro"] is False
        assert cfg["show_outro"] is False
    def test_full_config_preserves_values(self):
        raw = {
            "show_intro": True,
            "intro_text": "Welcome!",
            "intro_duration": 3.0,
            "show_outro": True,
            "outro_text": "Bye!",
            "outro_duration": 1.5,
            "accent_color": "#ff0000",
            "font_family": "Roboto",
        }
        cfg = parse_template_config(raw)
        assert cfg["show_intro"] is True
        assert cfg["intro_text"] == "Welcome!"
        assert cfg["intro_duration"] == 3.0
        assert cfg["show_outro"] is True
        assert cfg["outro_text"] == "Bye!"
        assert cfg["outro_duration"] == 1.5
        assert cfg["accent_color"] == "#ff0000"
        assert cfg["font_family"] == "Roboto"
    def test_partial_config_fills_defaults(self):
        raw = {"show_intro": True, "intro_text": "Hi"}
        cfg = parse_template_config(raw)
        assert cfg["show_intro"] is True
        assert cfg["intro_text"] == "Hi"
        assert cfg["intro_duration"] == DEFAULT_INTRO_DURATION
        assert cfg["show_outro"] is False
        assert cfg["outro_text"] == ""
        assert cfg["accent_color"] == DEFAULT_ACCENT_COLOR
    def test_truthy_coercion(self):
        """Non-bool truthy values should coerce to bool."""
        cfg = parse_template_config({"show_intro": 1, "show_outro": 0})
        assert cfg["show_intro"] is True
        assert cfg["show_outro"] is False
    def test_duration_coercion_from_int(self):
        cfg = parse_template_config({"intro_duration": 5})
        assert cfg["intro_duration"] == 5.0
        assert isinstance(cfg["intro_duration"], float)
 # ── extract_clip_with_template tests ─────────────────────────────────────────
 class TestExtractClipWithTemplate:
    """Tests for the shorts_generator.extract_clip_with_template function."""
    @patch("pipeline.shorts_generator.extract_clip")
    def test_no_cards_delegates_to_extract_clip(self, mock_extract):
        from pipeline.shorts_generator import extract_clip_with_template
        extract_clip_with_template(
            input_path=Path("/fake/input.mp4"),
            output_path=Path("/fake/output.mp4"),
            start_secs=10.0,
            end_secs=20.0,
            vf_filter="scale=1080:-2",
        )
        mock_extract.assert_called_once()
    @patch("pipeline.card_renderer.concat_segments")
    @patch("pipeline.shorts_generator.extract_clip")
    def test_with_intro_concats_two_segments(self, mock_extract, mock_concat, tmp_path):
        from pipeline.shorts_generator import extract_clip_with_template
        intro = tmp_path / "intro.mp4"
        intro.touch()
        out = tmp_path / "final.mp4"
        main_tmp = Path(str(out) + ".main.mp4")
        # Create the main clip temp file so cleanup doesn't error
        main_tmp.touch()
        mock_concat.return_value = out
        extract_clip_with_template(
            input_path=Path("/fake/input.mp4"),
            output_path=out,
            start_secs=10.0,
            end_secs=20.0,
            vf_filter="scale=1080:-2",
            intro_path=intro,
        )
        mock_extract.assert_called_once()
        mock_concat.assert_called_once()
        # Segments should be [intro, main_clip]
        segments = mock_concat.call_args[1]["segments"]
        assert len(segments) == 2
        assert segments[0] == intro
    @patch("pipeline.card_renderer.concat_segments")
    @patch("pipeline.shorts_generator.extract_clip")
    def test_with_intro_and_outro_concats_three_segments(
        self, mock_extract, mock_concat, tmp_path,
    ):
        from pipeline.shorts_generator import extract_clip_with_template
        intro = tmp_path / "intro.mp4"
        outro = tmp_path / "outro.mp4"
        intro.touch()
        outro.touch()
        out = tmp_path / "final.mp4"
        main_tmp = Path(str(out) + ".main.mp4")
        main_tmp.touch()
        mock_concat.return_value = out
        extract_clip_with_template(
            input_path=Path("/fake/input.mp4"),
            output_path=out,
            start_secs=10.0,
            end_secs=20.0,
            vf_filter="scale=1080:-2",
            intro_path=intro,
            outro_path=outro,
        )
        segments = mock_concat.call_args[1]["segments"]
        assert len(segments) == 3
        assert segments[0] == intro
        assert segments[2] == outro
--- a/backend/pipeline/test_citation_utils.py
+++ b/backend/pipeline/test_citation_utils.py
@ -0,0 +1,108 @@
 """Unit tests for citation extraction and validation utilities."""
 from __future__ import annotations
 import pytest
 from pipeline.citation_utils import extract_citations, validate_citations
 from pipeline.schemas import BodySection, BodySubSection
 # ── extract_citations ────────────────────────────────────────────────────────
 class TestExtractCitations:
    def test_single_markers(self):
        assert extract_citations("This uses reverb [0] and delay [2].") == [0, 2]
    def test_multi_marker(self):
        assert extract_citations("Combined approach [0,2] works well.") == [0, 2]
    def test_multi_marker_with_spaces(self):
        assert extract_citations("See [1, 3, 5] for details.") == [1, 3, 5]
    def test_no_citations(self):
        assert extract_citations("Plain text without citations.") == []
    def test_duplicate_indices_deduplicated(self):
        assert extract_citations("[1] and again [1] and [1,2]") == [1, 2]
    def test_returns_sorted(self):
        assert extract_citations("[5] then [1] then [3]") == [1, 3, 5]
    def test_adjacent_markers(self):
        assert extract_citations("[0][1][2]") == [0, 1, 2]
    def test_does_not_match_non_numeric_brackets(self):
        assert extract_citations("[abc] and [N] but [7] works") == [7]
 # ── validate_citations ──────────────────────────────────────────────────────
 def _make_sections(texts: list[str], sub_texts: list[list[str]] | None = None) -> list[BodySection]:
    """Helper to build BodySection lists for testing."""
    sections = []
    for i, text in enumerate(texts):
        subs = []
        if sub_texts and i < len(sub_texts):
            subs = [BodySubSection(heading=f"Sub {j}", content=t) for j, t in enumerate(sub_texts[i])]
        sections.append(BodySection(heading=f"Section {i}", content=text, subsections=subs))
    return sections
 class TestValidateCitations:
    def test_all_moments_cited(self):
        sections = _make_sections(["Uses [0] and [1].", "Also [2]."])
        result = validate_citations(sections, moment_count=3)
        assert result["valid"] is True
        assert result["total_citations"] == 3
        assert result["invalid_indices"] == []
        assert result["uncited_moments"] == []
        assert result["coverage_pct"] == 100.0
    def test_out_of_range_index(self):
        sections = _make_sections(["Reference [0] and [5]."])
        result = validate_citations(sections, moment_count=3)
        assert result["valid"] is False
        assert result["invalid_indices"] == [5]
        assert result["uncited_moments"] == [1, 2]
    def test_multi_citation_markers(self):
        sections = _make_sections(["Combined [0,2] technique."])
        result = validate_citations(sections, moment_count=3)
        assert result["valid"] is False  # moment 1 uncited
        assert result["total_citations"] == 2
        assert result["uncited_moments"] == [1]
        assert result["coverage_pct"] == pytest.approx(66.7, abs=0.1)
    def test_no_citations_at_all(self):
        sections = _make_sections(["Plain text with no markers."])
        result = validate_citations(sections, moment_count=2)
        assert result["valid"] is False
        assert result["total_citations"] == 0
        assert result["uncited_moments"] == [0, 1]
        assert result["coverage_pct"] == 0.0
    def test_empty_sections(self):
        result = validate_citations([], moment_count=0)
        assert result["valid"] is True
        assert result["total_citations"] == 0
        assert result["coverage_pct"] == 0.0
    def test_subsection_citations_counted(self):
        sections = _make_sections(
            ["Section text [0]."],
            sub_texts=[["Subsection cites [1] and [2]."]],
        )
        result = validate_citations(sections, moment_count=3)
        assert result["valid"] is True
        assert result["total_citations"] == 3
    def test_zero_moment_count_with_citations(self):
        """Citations exist but moment_count is 0 — all indices are out of range."""
        sections = _make_sections(["References [0] and [1]."])
        result = validate_citations(sections, moment_count=0)
        assert result["valid"] is False
        assert result["invalid_indices"] == [0, 1]
        assert result["coverage_pct"] == 0.0
--- a/backend/pipeline/test_compose_pipeline.py
+++ b/backend/pipeline/test_compose_pipeline.py
@ -0,0 +1,360 @@
 """Unit tests for compose pipeline logic in stage5_synthesis.
 Covers:
 - _build_compose_user_prompt(): XML structure, offset indices, empty existing, page JSON
 - Compose-or-create branching: compose triggered vs create fallback
 - body_sections_format='v2' on persisted pages
 - TechniquePageVideo insertion via pg_insert with on_conflict_do_nothing
 """
 from __future__ import annotations
 import json
 import uuid
 from collections import namedtuple
 from unittest.mock import MagicMock, patch
 import pytest
 # ── Lightweight mock objects ─────────────────────────────────────────────────
 class _MockContentType:
    """Mimics ContentType enum with .value."""
    def __init__(self, value: str) -> None:
        self.value = value
 MockKeyMoment = namedtuple("MockKeyMoment", [
    "id", "title", "summary", "content_type", "start_time", "end_time",
    "plugins", "raw_transcript", "technique_page_id", "source_video_id",
 ])
 def _moment(
    title: str = "Test Moment",
    summary: str = "A moment.",
    content_type: str = "technique_demo",
    start_time: float = 0.0,
    end_time: float = 10.0,
    plugins: list[str] | None = None,
    raw_transcript: str | None = "Some transcript text",
    technique_page_id: uuid.UUID | None = None,
    source_video_id: uuid.UUID | None = None,
 ) -> MockKeyMoment:
    return MockKeyMoment(
        id=uuid.uuid4(),
        title=title,
        summary=summary,
        content_type=_MockContentType(content_type),
        start_time=start_time,
        end_time=end_time,
        plugins=plugins or [],
        raw_transcript=raw_transcript or "",
        technique_page_id=technique_page_id,
        source_video_id=source_video_id,
    )
 class _MockSourceQuality:
    """Mimics source_quality enum with .value."""
    def __init__(self, value: str = "high") -> None:
        self.value = value
 class MockTechniquePage:
    """Lightweight stand-in for the ORM TechniquePage."""
    def __init__(
        self,
        title: str = "Reverb Techniques",
        slug: str = "reverb-techniques",
        topic_category: str = "Sound Design",
        summary: str = "A page about reverb.",
        body_sections: list | None = None,
        signal_chains: list | None = None,
        plugins: list[str] | None = None,
        source_quality: str = "high",
        creator_id: uuid.UUID | None = None,
        body_sections_format: str | None = None,
    ):
        self.id = uuid.uuid4()
        self.title = title
        self.slug = slug
        self.topic_category = topic_category
        self.summary = summary
        self.body_sections = body_sections or [{"heading": "Overview", "content": "Intro text."}]
        self.signal_chains = signal_chains or []
        self.plugins = plugins or ["Valhalla VintageVerb"]
        self.source_quality = _MockSourceQuality(source_quality)
        self.creator_id = creator_id or uuid.uuid4()
        self.body_sections_format = body_sections_format
 def _cls_info(tags: list[str] | None = None) -> dict:
    return {"topic_category": "Sound Design", "topic_tags": tags or ["reverb", "delay"]}
 # ── Import the function under test ───────────────────────────────────────────
 # We need to patch modules before importing stages in some tests.
 # For _build_compose_user_prompt we can import directly since it's a pure function
 # that only depends on _build_moments_text.
@pytest.fixture
 def build_compose_prompt():
    """Import _build_compose_user_prompt from stages."""
    from pipeline.stages import _build_compose_user_prompt
    return _build_compose_user_prompt
 # ── Tests for _build_compose_user_prompt ─────────────────────────────────────
 class TestBuildComposeUserPrompt:
    """Tests for _build_compose_user_prompt XML structure and offset math."""
    def test_compose_prompt_xml_structure(self, build_compose_prompt):
        """Verify output contains all required XML tags."""
        page = MockTechniquePage()
        existing = [_moment(title="Existing 1")]
        new = [(_moment(title="New 1"), _cls_info())]
        result = build_compose_prompt(page, existing, new, "COPYCATT")
        assert "<existing_page>" in result
        assert "</existing_page>" in result
        assert "<existing_moments>" in result
        assert "</existing_moments>" in result
        assert "<new_moments>" in result
        assert "</new_moments>" in result
        assert "<creator>" in result
        assert "</creator>" in result
        assert "COPYCATT" in result
    def test_compose_prompt_offset_indices(self, build_compose_prompt):
        """With 3 existing + 2 new moments, new moments should use [3] and [4]."""
        page = MockTechniquePage()
        existing = [
            _moment(title=f"Existing {i}") for i in range(3)
        ]
        new = [
            (_moment(title=f"New {i}"), _cls_info()) for i in range(2)
        ]
        result = build_compose_prompt(page, existing, new, "COPYCATT")
        # New moments section should have [3] and [4]
        new_section_start = result.index("<new_moments>")
        new_section_end = result.index("</new_moments>")
        new_section = result[new_section_start:new_section_end]
        assert "[3]" in new_section
        assert "[4]" in new_section
        # Should NOT have [0], [1], [2] in the new section
        assert "[0]" not in new_section
        assert "[1]" not in new_section
        assert "[2]" not in new_section
    def test_compose_prompt_empty_existing_moments(self, build_compose_prompt):
        """0 existing moments → new moments start at [0]."""
        page = MockTechniquePage()
        existing = []
        new = [
            (_moment(title="New A"), _cls_info()),
            (_moment(title="New B"), _cls_info()),
        ]
        result = build_compose_prompt(page, existing, new, "COPYCATT")
        new_section_start = result.index("<new_moments>")
        new_section_end = result.index("</new_moments>")
        new_section = result[new_section_start:new_section_end]
        assert "[0]" in new_section
        assert "[1]" in new_section
    def test_compose_prompt_page_json(self, build_compose_prompt):
        """Existing page should be serialized as JSON within <existing_page> tags."""
        page = MockTechniquePage(title="My Page", slug="my-page", topic_category="Mixing")
        result = build_compose_prompt(page, [], [(_moment(), _cls_info())], "Creator")
        page_section_start = result.index("<existing_page>") + len("<existing_page>")
        page_section_end = result.index("</existing_page>")
        page_json_str = result[page_section_start:page_section_end].strip()
        page_dict = json.loads(page_json_str)
        assert page_dict["title"] == "My Page"
        assert page_dict["slug"] == "my-page"
        assert page_dict["topic_category"] == "Mixing"
        assert "summary" in page_dict
        assert "body_sections" in page_dict
    def test_compose_prompt_new_moment_content(self, build_compose_prompt):
        """New moments section includes title, summary, time range, and tags."""
        page = MockTechniquePage()
        m = _moment(title="Sidechain Pump", summary="How to create a sidechain pump",
                     start_time=30.0, end_time=45.5, plugins=["FabFilter Pro-C 2"])
        new = [(m, _cls_info(tags=["compression", "sidechain"]))]
        result = build_compose_prompt(page, [], new, "Creator")
        new_section_start = result.index("<new_moments>")
        new_section_end = result.index("</new_moments>")
        new_section = result[new_section_start:new_section_end]
        assert "Sidechain Pump" in new_section
        assert "How to create a sidechain pump" in new_section
        assert "30.0s" in new_section
        assert "45.5s" in new_section
        assert "FabFilter Pro-C 2" in new_section
        assert "compression" in new_section
        assert "sidechain" in new_section
 # ── Tests for compose-or-create branching ────────────────────────────────────
 class TestComposeOrCreateBranching:
    """Tests for the compose-or-create detection and branching in stage5_synthesis.
    Full integration-level mocking of stage5_synthesis is fragile (many DB queries).
    Instead, we verify:
    1. The code structure has correct branching (compose_target check → two paths)
    2. _compose_into_existing calls the LLM with compose prompt and returns parsed result
    """
    def test_compose_branch_exists_in_source(self):
        """Verify stage5 has compose detection → _compose_into_existing call path."""
        from pathlib import Path
        src = Path("backend/pipeline/stages.py").read_text()
        # The compose detection block
        assert "compose_matches = session.execute(" in src
        assert "compose_target = compose_matches[0] if compose_matches else None" in src
        # The compose branch calls _compose_into_existing
        assert "if compose_target is not None:" in src
        assert "_compose_into_existing(" in src
        # The create branch calls _synthesize_chunk
        assert "elif len(moment_group) <= chunk_size:" in src
    def test_create_branch_when_no_compose_target(self):
        """Verify the else/elif branches call _synthesize_chunk, not _compose_into_existing."""
        from pathlib import Path
        src = Path("backend/pipeline/stages.py").read_text()
        # Find the compose branch and the create branch — they're mutually exclusive
        compose_branch_idx = src.index("if compose_target is not None:")
        create_branch_idx = src.index("elif len(moment_group) <= chunk_size:")
        # The create branch must come after the compose branch (same if/elif chain)
        assert create_branch_idx > compose_branch_idx
        # _synthesize_chunk should appear in the create branch, not compose
        create_block = src[create_branch_idx:create_branch_idx + 500]
        assert "_synthesize_chunk(" in create_block
    @patch("pipeline.stages._safe_parse_llm_response")
    @patch("pipeline.stages._make_llm_callback", return_value=lambda **kw: None)
    @patch("pipeline.stages.estimate_max_tokens", return_value=4000)
    @patch("pipeline.stages._load_prompt", return_value="compose system prompt")
    def test_compose_into_existing_calls_llm(
        self, mock_load_prompt, mock_estimate, mock_callback, mock_parse,
    ):
        """_compose_into_existing calls LLM with compose prompt and returns parsed result."""
        from pipeline.schemas import SynthesisResult, SynthesizedPage
        from pipeline.stages import _compose_into_existing
        mock_llm = MagicMock()
        mock_llm.complete.return_value = "raw response"
        synth_page = SynthesizedPage(
            title="Merged Page", slug="merged-page", topic_category="Sound Design",
            summary="Merged", body_sections=[], signal_chains=[], plugins=[],
            source_quality="high", moment_indices=[0, 1],
        )
        mock_parse.return_value = SynthesisResult(pages=[synth_page])
        page = MockTechniquePage()
        existing_moments = [_moment(title="Old Moment")]
        new_moments = [(_moment(title="New Moment"), _cls_info())]
        result = _compose_into_existing(
            page, existing_moments, new_moments,
            "Sound Design", "COPYCATT", "system prompt",
            mock_llm, None, "text", 8000, str(uuid.uuid4()), None,
        )
        # LLM was called
        mock_llm.complete.assert_called_once()
        # The compose prompt template was loaded
        mock_load_prompt.assert_called_once()
        call_args = mock_load_prompt.call_args
        assert "stage5_compose" in call_args[0][0]
        # Result has the expected page
        assert len(result.pages) == 1
        assert result.pages[0].title == "Merged Page"
 # ── Tests for body_sections_format and TechniquePageVideo ────────────────────
 class TestBodySectionsFormatAndTracking:
    """Tests for body_sections_format='v2' and TechniquePageVideo insertion."""
    def test_body_sections_format_v2_set_on_page(self):
        """Verify the persist section sets body_sections_format='v2' on pages."""
        # Read stages.py source and verify the assignment exists
        from pathlib import Path
        stages_src = Path("backend/pipeline/stages.py").read_text()
        # The line `page.body_sections_format = "v2"` must appear in the persist block
        assert 'page.body_sections_format = "v2"' in stages_src, (
            "body_sections_format = 'v2' assignment not found in stages.py"
        )
    def test_technique_page_video_pg_insert(self):
        """Verify TechniquePageVideo insertion uses pg_insert with on_conflict_do_nothing."""
        from pathlib import Path
        stages_src = Path("backend/pipeline/stages.py").read_text()
        assert "pg_insert(TechniquePageVideo.__table__)" in stages_src, (
            "pg_insert(TechniquePageVideo.__table__) not found in stages.py"
        )
        assert "on_conflict_do_nothing()" in stages_src, (
            "on_conflict_do_nothing() not found in stages.py"
        )
    def test_technique_page_video_values(self):
        """Verify TechniquePageVideo INSERT includes technique_page_id and source_video_id."""
        from pathlib import Path
        stages_src = Path("backend/pipeline/stages.py").read_text()
        # Find the pg_insert block
        idx = stages_src.index("pg_insert(TechniquePageVideo.__table__)")
        block = stages_src[idx:idx + 200]
        assert "technique_page_id" in block
        assert "source_video_id" in block
 # ── Tests for category case-insensitivity ────────────────────────────────────
 class TestCategoryCaseInsensitive:
    """Verify the compose detection query uses func.lower for category matching."""
    def test_compose_detection_uses_func_lower(self):
        """The compose detection query must use func.lower on both sides."""
        from pathlib import Path
        stages_src = Path("backend/pipeline/stages.py").read_text()
        # Find the compose detection block — need enough window to capture the full query
        idx = stages_src.index("Compose-or-create detection")
        block = stages_src[idx:idx + 600]
        assert "func.lower(TechniquePage.topic_category)" in block
        assert "func.lower(category)" in block
--- a/backend/pipeline/test_harness.py
+++ b/backend/pipeline/test_harness.py
@ -0,0 +1,830 @@
 """Offline prompt test harness for Chrysopedia synthesis.
 Loads a fixture JSON (exported by export_fixture.py) and a prompt file,
 calls the LLM, and outputs the synthesized result. No Docker, no database,
 no Redis, no Celery — just prompt + fixture + LLM endpoint.
 Usage:
    python -m pipeline.test_harness \\
      --fixture fixtures/real_video_xyz.json \\
      --prompt prompts/stage5_synthesis.txt \\
      --output /tmp/result.json
    # Run all categories in a fixture:
    python -m pipeline.test_harness --fixture fixtures/video.json
    # Run a specific category only:
    python -m pipeline.test_harness --fixture fixtures/video.json --category "Sound Design"
 Exit codes: 0=success, 1=LLM error, 2=parse error, 3=fixture error
 """
 from __future__ import annotations
 import argparse
 import json
 import sys
 import time
 from collections import Counter, defaultdict
 from dataclasses import dataclass
 from pathlib import Path
 from typing import NamedTuple
 from pydantic import ValidationError
 from config import get_settings
 from pipeline.citation_utils import validate_citations
 from pipeline.llm_client import LLMClient, estimate_max_tokens
 from pipeline.schemas import SynthesizedPage, SynthesisResult
 # ── Lightweight stand-in for KeyMoment ORM model ───────────────────────────
 class _MockContentType:
    """Mimics KeyMomentContentType enum with a .value property."""
    def __init__(self, value: str) -> None:
        self.value = value
 class MockKeyMoment(NamedTuple):
    """Lightweight stand-in for the ORM KeyMoment.
    Has the same attributes that _build_moments_text() accesses:
    title, summary, content_type, start_time, end_time, plugins, raw_transcript.
    """
    title: str
    summary: str
    content_type: object  # _MockContentType
    start_time: float
    end_time: float
    plugins: list[str]
    raw_transcript: str
 def _log(tag: str, msg: str, level: str = "INFO") -> None:
    """Write structured log line to stderr."""
    print(f"[HARNESS] [{level}] {tag}: {msg}", file=sys.stderr)
 # ── Moment text builder (mirrors stages.py _build_moments_text) ────────────
 def build_moments_text(
    moment_group: list[tuple[MockKeyMoment, dict]],
    category: str,
 ) -> tuple[str, set[str]]:
    """Build the moments prompt text — matches _build_moments_text in stages.py."""
    moments_lines = []
    all_tags: set[str] = set()
    for i, (m, cls_info) in enumerate(moment_group):
        tags = cls_info.get("topic_tags", [])
        all_tags.update(tags)
        moments_lines.append(
            f"[{i}] Title: {m.title}\n"
            f"    Summary: {m.summary}\n"
            f"    Content type: {m.content_type.value}\n"
            f"    Time: {m.start_time:.1f}s - {m.end_time:.1f}s\n"
            f"    Plugins: {', '.join(m.plugins) if m.plugins else 'none'}\n"
            f"    Category: {category}\n"
            f"    Tags: {', '.join(tags) if tags else 'none'}\n"
            f"    Transcript excerpt: {(m.raw_transcript or '')[:300]}"
        )
    return "\n\n".join(moments_lines), all_tags
 # ── Fixture loading ────────────────────────────────────────────────────────
@dataclass
 class FixtureData:
    """Parsed fixture with moments grouped by category."""
    creator_name: str
    video_id: str
    content_type: str
    filename: str
    # Groups: category -> list of (MockKeyMoment, cls_info_dict)
    groups: dict[str, list[tuple[MockKeyMoment, dict]]]
    total_moments: int
 def load_fixture(path: str) -> FixtureData:
    """Load and parse a fixture JSON file into grouped moments."""
    fixture_path = Path(path)
    if not fixture_path.exists():
        raise FileNotFoundError(f"Fixture not found: {path}")
    raw = fixture_path.read_text(encoding="utf-8")
    size_kb = len(raw.encode("utf-8")) / 1024
    data = json.loads(raw)
    moments_raw = data.get("moments", [])
    if not moments_raw:
        raise ValueError(f"Fixture has no moments: {path}")
    _log("FIXTURE", f"Loading: {path} ({size_kb:.1f} KB, {len(moments_raw)} moments)")
    # Build MockKeyMoment objects and group by category
    groups: dict[str, list[tuple[MockKeyMoment, dict]]] = defaultdict(list)
    for m in moments_raw:
        cls = m.get("classification", {})
        category = cls.get("topic_category", m.get("topic_category", "Uncategorized"))
        tags = cls.get("topic_tags", m.get("topic_tags", []))
        mock = MockKeyMoment(
            title=m.get("title", m.get("summary", "")[:80]),
            summary=m.get("summary", ""),
            content_type=_MockContentType(m.get("content_type", "technique")),
            start_time=m.get("start_time", 0.0),
            end_time=m.get("end_time", 0.0),
            plugins=m.get("plugins", []),
            raw_transcript=m.get("raw_transcript", m.get("transcript_excerpt", "")),
        )
        cls_info = {"topic_category": category, "topic_tags": tags}
        groups[category].append((mock, cls_info))
    # Log breakdown
    cat_counts = {cat: len(moms) for cat, moms in groups.items()}
    counts = list(cat_counts.values())
    _log(
        "FIXTURE",
        f"Breakdown: {len(groups)} categories, "
        f"moments per category: min={min(counts)}, max={max(counts)}, "
        f"avg={sum(counts)/len(counts):.1f}",
    )
    for cat, count in sorted(cat_counts.items(), key=lambda x: -x[1]):
        _log("FIXTURE", f"  {cat}: {count} moments")
    return FixtureData(
        creator_name=data.get("creator_name", "Unknown"),
        video_id=data.get("video_id", "unknown"),
        content_type=data.get("content_type", "tutorial"),
        filename=data.get("filename", "unknown"),
        groups=dict(groups),
        total_moments=len(moments_raw),
    )
 # ── Synthesis runner ───────────────────────────────────────────────────────
 def run_synthesis(
    fixture: FixtureData,
    prompt_path: str,
    category_filter: str | None = None,
    model_override: str | None = None,
    modality: str | None = None,
 ) -> tuple[list[dict], int]:
    """Run synthesis on fixture data, returns (pages, exit_code).
    Returns all synthesized pages as dicts plus an exit code.
    """
    # Load prompt
    prompt_file = Path(prompt_path)
    if not prompt_file.exists():
        _log("ERROR", f"Prompt file not found: {prompt_path}", level="ERROR")
        return [], 3
    system_prompt = prompt_file.read_text(encoding="utf-8")
    _log("PROMPT", f"Loading: {prompt_path} ({len(system_prompt)} chars)")
    # Setup LLM
    settings = get_settings()
    llm = LLMClient(settings)
    stage_model = model_override or settings.llm_stage5_model or settings.llm_model
    stage_modality = modality or settings.llm_stage5_modality or "thinking"
    hard_limit = settings.llm_max_tokens_hard_limit
    _log("LLM", f"Model: {stage_model}, modality: {stage_modality}, hard_limit: {hard_limit}")
    # Filter categories if requested
    categories = fixture.groups
    if category_filter:
        if category_filter not in categories:
            _log("ERROR", f"Category '{category_filter}' not found. Available: {list(categories.keys())}", level="ERROR")
            return [], 3
        categories = {category_filter: categories[category_filter]}
    all_pages: list[dict] = []
    total_prompt_tokens = 0
    total_completion_tokens = 0
    total_duration_ms = 0
    exit_code = 0
    for cat_idx, (category, moment_group) in enumerate(categories.items(), 1):
        _log("SYNTH", f"Category {cat_idx}/{len(categories)}: '{category}' ({len(moment_group)} moments)")
        # Build user prompt (same format as stages.py _synthesize_chunk)
        moments_text, all_tags = build_moments_text(moment_group, category)
        user_prompt = f"<creator>{fixture.creator_name}</creator>\n<moments>\n{moments_text}\n</moments>"
        estimated_tokens = estimate_max_tokens(
            system_prompt, user_prompt,
            stage="stage5_synthesis",
            hard_limit=hard_limit,
        )
        _log(
            "SYNTH",
            f"  Building prompt: {len(moment_group)} moments, "
            f"max_tokens={estimated_tokens}, tags={sorted(all_tags)[:5]}{'...' if len(all_tags) > 5 else ''}",
        )
        # Call LLM
        call_start = time.monotonic()
        _log("LLM", f"  Calling: model={stage_model}, max_tokens={estimated_tokens}, modality={stage_modality}")
        try:
            raw = llm.complete(
                system_prompt,
                user_prompt,
                response_model=SynthesisResult,
                modality=stage_modality,
                model_override=stage_model,
                max_tokens=estimated_tokens,
            )
        except Exception as exc:
            _log("ERROR", f"  LLM call failed: {exc}", level="ERROR")
            exit_code = 1
            continue
        call_duration_ms = int((time.monotonic() - call_start) * 1000)
        prompt_tokens = getattr(raw, "prompt_tokens", None) or 0
        completion_tokens = getattr(raw, "completion_tokens", None) or 0
        finish_reason = getattr(raw, "finish_reason", "unknown")
        total_prompt_tokens += prompt_tokens
        total_completion_tokens += completion_tokens
        total_duration_ms += call_duration_ms
        _log(
            "LLM",
            f"  Response: {prompt_tokens} prompt + {completion_tokens} completion tokens, "
            f"{call_duration_ms}ms, finish_reason={finish_reason}",
        )
        if finish_reason == "length":
            _log(
                "WARN",
                "  finish_reason=length — output likely truncated! "
                "Consider reducing fixture size or increasing max_tokens.",
                level="WARN",
            )
        # Parse response
        try:
            result = SynthesisResult.model_validate_json(str(raw))
        except (ValidationError, json.JSONDecodeError) as exc:
            _log("ERROR", f"  Parse failed: {exc}", level="ERROR")
            _log("ERROR", f"  Raw response (first 2000 chars): {str(raw)[:2000]}", level="ERROR")
            exit_code = 2
            continue
        # Log per-page summary
        _log("SYNTH", f"  Parsed: {len(result.pages)} pages synthesized")
        total_words = 0
        for page in result.pages:
            sections = page.body_sections or []
            section_count = len(sections)
            subsection_count = sum(len(s.subsections) for s in sections)
            word_count = sum(
                len(s.content.split()) + sum(len(sub.content.split()) for sub in s.subsections)
                for s in sections
            )
            total_words += word_count
            _log(
                "PAGE",
                f"    '{page.title}' ({page.slug}): "
                f"{section_count} sections ({subsection_count} subsections), "
                f"{word_count} words, "
                f"{len(page.moment_indices)} moments linked, "
                f"quality={page.source_quality}",
            )
            # Citation coverage reporting
            cit = validate_citations(sections, len(page.moment_indices))
            _log(
                "CITE",
                f"    Citations: {cit['total_citations']}/{len(page.moment_indices)} moments cited "
                f"({cit['coverage_pct']}% coverage)"
                + (f", invalid indices: {cit['invalid_indices']}" if cit['invalid_indices'] else "")
                + (f", uncited: {cit['uncited_moments']}" if cit['uncited_moments'] else ""),
            )
            all_pages.append(page.model_dump())
    # Summary
    _log("SUMMARY", f"Total: {len(all_pages)} pages across {len(categories)} categories")
    _log("SUMMARY", f"Tokens: {total_prompt_tokens} prompt + {total_completion_tokens} completion = {total_prompt_tokens + total_completion_tokens} total")
    _log("SUMMARY", f"Duration: {total_duration_ms}ms ({total_duration_ms / 1000:.1f}s)")
    return all_pages, exit_code
 # ── Compose: merge new moments into existing page ──────────────────────────
 def _count_page_words(page_dict: dict) -> int:
    """Count total words in a page's body sections."""
    return sum(
        len(s.get("content", "").split())
        + sum(len(sub.get("content", "").split()) for sub in s.get("subsections", []))
        for s in page_dict.get("body_sections", [])
    )
 def build_compose_prompt(
    existing_page: dict,
    existing_moments: list[tuple[MockKeyMoment, dict]],
    new_moments: list[tuple[MockKeyMoment, dict]],
    creator_name: str,
 ) -> str:
    """Build the user prompt for composition (merging new moments into an existing page).
    Existing moments keep indices [0]-[N-1].
    New moments get indices [N]-[N+M-1].
    Uses build_moments_text() for formatting, with index offsets applied for new moments.
    """
    category = existing_page.get("topic_category", "Uncategorized")
    # Format existing moments [0]-[N-1]
    existing_text, _ = build_moments_text(existing_moments, category)
    # Format new moments with offset indices [N]-[N+M-1]
    n = len(existing_moments)
    new_lines = []
    for i, (m, cls_info) in enumerate(new_moments):
        tags = cls_info.get("topic_tags", [])
        new_lines.append(
            f"[{n + i}] Title: {m.title}\n"
            f"    Summary: {m.summary}\n"
            f"    Content type: {m.content_type.value}\n"
            f"    Time: {m.start_time:.1f}s - {m.end_time:.1f}s\n"
            f"    Plugins: {', '.join(m.plugins) if m.plugins else 'none'}\n"
            f"    Category: {category}\n"
            f"    Tags: {', '.join(tags) if tags else 'none'}\n"
            f"    Transcript excerpt: {(m.raw_transcript or '')[:300]}"
        )
    new_text = "\n\n".join(new_lines)
    page_json = json.dumps(existing_page, indent=2, ensure_ascii=False)
    return (
        f"<existing_page>\n{page_json}\n</existing_page>\n"
        f"<existing_moments>\n{existing_text}\n</existing_moments>\n"
        f"<new_moments>\n{new_text}\n</new_moments>\n"
        f"<creator>{creator_name}</creator>"
    )
 def run_compose(
    existing_page_path: str,
    existing_fixture_path: str,
    new_fixture_path: str,
    prompt_path: str,
    category_filter: str | None = None,
    model_override: str | None = None,
    modality: str | None = None,
 ) -> tuple[list[dict], int]:
    """Run composition: merge new fixture moments into an existing page.
    Returns (pages, exit_code) — same shape as run_synthesis().
    """
    # Load existing page JSON
    existing_page_file = Path(existing_page_path)
    if not existing_page_file.exists():
        _log("ERROR", f"Existing page not found: {existing_page_path}", level="ERROR")
        return [], 3
    try:
        existing_raw = json.loads(existing_page_file.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        _log("ERROR", f"Invalid JSON in existing page: {exc}", level="ERROR")
        return [], 3
    # The existing page file might be a harness output (with .pages[]) or a raw SynthesizedPage
    if "pages" in existing_raw and isinstance(existing_raw["pages"], list):
        page_dicts = existing_raw["pages"]
        _log("COMPOSE", f"Loaded harness output with {len(page_dicts)} pages")
    elif "title" in existing_raw and "body_sections" in existing_raw:
        page_dicts = [existing_raw]
        _log("COMPOSE", "Loaded single SynthesizedPage")
    else:
        _log("ERROR", "Existing page JSON must be a SynthesizedPage or harness output with 'pages' key", level="ERROR")
        return [], 3
    # Validate each page against SynthesizedPage
    validated_pages: list[dict] = []
    for pd in page_dicts:
        try:
            SynthesizedPage.model_validate(pd)
            validated_pages.append(pd)
        except ValidationError as exc:
            _log("WARN", f"Skipping invalid page '{pd.get('title', '?')}': {exc}", level="WARN")
    if not validated_pages:
        _log("ERROR", "No valid SynthesizedPage found in existing page file", level="ERROR")
        return [], 3
    # Apply category filter
    if category_filter:
        validated_pages = [p for p in validated_pages if p.get("topic_category") == category_filter]
        if not validated_pages:
            _log("ERROR", f"No pages match category '{category_filter}'", level="ERROR")
            return [], 3
    # Load existing moments fixture (the original moments the page was built from)
    try:
        existing_fixture = load_fixture(existing_fixture_path)
    except (FileNotFoundError, ValueError, json.JSONDecodeError) as exc:
        _log("ERROR", f"Existing fixture error: {exc}", level="ERROR")
        return [], 3
    # Load new moments fixture
    try:
        new_fixture = load_fixture(new_fixture_path)
    except (FileNotFoundError, ValueError, json.JSONDecodeError) as exc:
        _log("ERROR", f"New fixture error: {exc}", level="ERROR")
        return [], 3
    # Load prompt
    prompt_file = Path(prompt_path)
    if not prompt_file.exists():
        _log("ERROR", f"Prompt file not found: {prompt_path}", level="ERROR")
        return [], 3
    system_prompt = prompt_file.read_text(encoding="utf-8")
    _log("PROMPT", f"Loading compose prompt: {prompt_path} ({len(system_prompt)} chars)")
    # Setup LLM
    settings = get_settings()
    llm = LLMClient(settings)
    stage_model = model_override or settings.llm_stage5_model or settings.llm_model
    stage_modality = modality or settings.llm_stage5_modality or "thinking"
    hard_limit = settings.llm_max_tokens_hard_limit
    _log("LLM", f"Model: {stage_model}, modality: {stage_modality}, hard_limit: {hard_limit}")
    all_pages: list[dict] = []
    exit_code = 0
    for page_idx, existing_page in enumerate(validated_pages, 1):
        page_category = existing_page.get("topic_category", "Uncategorized")
        page_title = existing_page.get("title", "Untitled")
        _log("COMPOSE", f"Page {page_idx}/{len(validated_pages)}: '{page_title}' ({page_category})")
        # Get existing moments for this page's category
        existing_moments = existing_fixture.groups.get(page_category, [])
        if not existing_moments:
            _log("WARN", f"  No existing moments found for category '{page_category}' — skipping", level="WARN")
            continue
        # Get new moments for this page's category
        new_moments = new_fixture.groups.get(page_category, [])
        if not new_moments:
            _log("WARN", f"  No new moments for category '{page_category}' — nothing to compose", level="WARN")
            all_pages.append(existing_page)
            continue
        n_existing = len(existing_moments)
        n_new = len(new_moments)
        total_moments = n_existing + n_new
        # Before metrics
        before_words = _count_page_words(existing_page)
        before_sections = len(existing_page.get("body_sections", []))
        _log(
            "COMPOSE",
            f"  Existing: {n_existing} moments, {before_sections} sections, {before_words} words | "
            f"New: {n_new} moments | Total citation space: [0]-[{total_moments - 1}]",
        )
        # Build compose prompt
        user_prompt = build_compose_prompt(
            existing_page=existing_page,
            existing_moments=existing_moments,
            new_moments=new_moments,
            creator_name=existing_fixture.creator_name,
        )
        estimated_tokens = estimate_max_tokens(
            system_prompt, user_prompt,
            stage="stage5_synthesis",
            hard_limit=hard_limit,
        )
        _log("COMPOSE", f"  Prompt built: {len(user_prompt)} chars, max_tokens={estimated_tokens}")
        # Call LLM
        call_start = time.monotonic()
        _log("LLM", f"  Calling: model={stage_model}, max_tokens={estimated_tokens}, modality={stage_modality}")
        try:
            raw = llm.complete(
                system_prompt,
                user_prompt,
                response_model=SynthesisResult,
                modality=stage_modality,
                model_override=stage_model,
                max_tokens=estimated_tokens,
            )
        except Exception as exc:
            _log("ERROR", f"  LLM call failed: {exc}", level="ERROR")
            exit_code = 1
            continue
        call_duration_ms = int((time.monotonic() - call_start) * 1000)
        prompt_tokens = getattr(raw, "prompt_tokens", None) or 0
        completion_tokens = getattr(raw, "completion_tokens", None) or 0
        finish_reason = getattr(raw, "finish_reason", "unknown")
        _log(
            "LLM",
            f"  Response: {prompt_tokens} prompt + {completion_tokens} completion tokens, "
            f"{call_duration_ms}ms, finish_reason={finish_reason}",
        )
        if finish_reason == "length":
            _log("WARN", "  finish_reason=length — output likely truncated!", level="WARN")
        # Parse response
        try:
            result = SynthesisResult.model_validate_json(str(raw))
        except (ValidationError, json.JSONDecodeError) as exc:
            _log("ERROR", f"  Parse failed: {exc}", level="ERROR")
            _log("ERROR", f"  Raw response (first 2000 chars): {str(raw)[:2000]}", level="ERROR")
            exit_code = 2
            continue
        # Log compose-specific metrics
        for page in result.pages:
            page_dict = page.model_dump()
            after_words = _count_page_words(page_dict)
            after_sections = len(page.body_sections or [])
            # Identify new sections (headings not in the original)
            existing_headings = {s.get("heading", "") for s in existing_page.get("body_sections", [])}
            new_section_headings = [
                s.heading for s in (page.body_sections or []) if s.heading not in existing_headings
            ]
            _log(
                "COMPOSE",
                f"  Result: '{page.title}' — "
                f"words {before_words}→{after_words} ({after_words - before_words:+d}), "
                f"sections {before_sections}→{after_sections} ({after_sections - before_sections:+d})"
                + (f", new sections: {new_section_headings}" if new_section_headings else ""),
            )
            # Citation validation with unified moment count
            cit = validate_citations(page.body_sections or [], total_moments)
            _log(
                "CITE",
                f"  Citations: {cit['total_citations']}/{total_moments} moments cited "
                f"({cit['coverage_pct']}% coverage)"
                + (f", invalid indices: {cit['invalid_indices']}" if cit['invalid_indices'] else "")
                + (f", uncited: {cit['uncited_moments']}" if cit['uncited_moments'] else ""),
            )
            all_pages.append(page_dict)
    _log("SUMMARY", f"Compose complete: {len(all_pages)} pages")
    return all_pages, exit_code
 # ── Promote: deploy a prompt to production ─────────────────────────────────
 _STAGE_PROMPT_MAP = {
    2: "stage2_segmentation.txt",
    3: "stage3_extraction.txt",
    4: "stage4_classification.txt",
    5: "stage5_synthesis.txt",
 }
 def promote_prompt(prompt_path: str, stage: int, reason: str, commit: bool = False) -> int:
    """Copy a winning prompt to the canonical path and create a backup.
    The worker reads prompts from disk at runtime — no restart needed.
    """
    import hashlib
    import shutil
    if stage not in _STAGE_PROMPT_MAP:
        _log("ERROR", f"Invalid stage {stage}. Valid: {sorted(_STAGE_PROMPT_MAP)}", level="ERROR")
        return 1
    settings = get_settings()
    template_name = _STAGE_PROMPT_MAP[stage]
    canonical = Path(settings.prompts_path) / template_name
    source = Path(prompt_path)
    if not source.exists():
        _log("ERROR", f"Source prompt not found: {prompt_path}", level="ERROR")
        return 1
    new_prompt = source.read_text(encoding="utf-8")
    new_hash = hashlib.sha256(new_prompt.encode()).hexdigest()[:12]
    # Backup current prompt
    old_prompt = ""
    old_hash = "none"
    if canonical.exists():
        old_prompt = canonical.read_text(encoding="utf-8")
        old_hash = hashlib.sha256(old_prompt.encode()).hexdigest()[:12]
        if old_prompt.strip() == new_prompt.strip():
            _log("PROMOTE", "No change — new prompt is identical to current prompt")
            return 0
        archive_dir = Path(settings.prompts_path) / "archive"
        archive_dir.mkdir(parents=True, exist_ok=True)
        ts = time.strftime("%Y%m%d_%H%M%S", time.gmtime())
        backup = archive_dir / f"{template_name.replace('.txt', '')}_{ts}.txt"
        shutil.copy2(canonical, backup)
        _log("PROMOTE", f"Backed up current prompt: {old_hash} -> {backup}")
    # Write new prompt
    canonical.write_text(new_prompt, encoding="utf-8")
    old_lines = old_prompt.strip().splitlines()
    new_lines = new_prompt.strip().splitlines()
    _log("PROMOTE", f"Installed new prompt: {new_hash} ({len(new_prompt)} chars, {len(new_lines)} lines)")
    _log("PROMOTE", f"Previous: {old_hash} ({len(old_prompt)} chars, {len(old_lines)} lines)")
    _log("PROMOTE", f"Reason: {reason}")
    _log("PROMOTE", "Worker reads prompts from disk at runtime — no restart needed")
    if commit:
        import subprocess
        try:
            subprocess.run(
                ["git", "add", str(canonical)],
                cwd=str(canonical.parent.parent),
                check=True, capture_output=True,
            )
            msg = f"prompt: promote stage{stage} — {reason}"
            subprocess.run(
                ["git", "commit", "-m", msg],
                cwd=str(canonical.parent.parent),
                check=True, capture_output=True,
            )
            _log("PROMOTE", f"Git commit created: {msg}")
        except subprocess.CalledProcessError as exc:
            _log("PROMOTE", f"Git commit failed: {exc}", level="WARN")
    return 0
 # ── CLI ────────────────────────────────────────────────────────────────────
 def main() -> int:
    parser = argparse.ArgumentParser(
        prog="pipeline.test_harness",
        description="Offline prompt test harness for Chrysopedia synthesis",
    )
    sub = parser.add_subparsers(dest="command")
    # -- run subcommand (default behavior) --
    run_parser = sub.add_parser("run", help="Run synthesis against a fixture")
    run_parser.add_argument("--fixture", "-f", type=str, required=True, help="Fixture JSON file")
    run_parser.add_argument("--prompt", "-p", type=str, default=None, help="Prompt file (default: stage5_synthesis.txt)")
    run_parser.add_argument("--output", "-o", type=str, default=None, help="Output file path")
    run_parser.add_argument("--category", "-c", type=str, default=None, help="Filter to a specific category")
    run_parser.add_argument("--model", type=str, default=None, help="Override LLM model")
    run_parser.add_argument("--modality", type=str, default=None, choices=["chat", "thinking"])
    # -- promote subcommand --
    promo_parser = sub.add_parser("promote", help="Deploy a winning prompt to production")
    promo_parser.add_argument("--prompt", "-p", type=str, required=True, help="Path to the winning prompt file")
    promo_parser.add_argument("--stage", "-s", type=int, default=5, help="Stage number (default: 5)")
    promo_parser.add_argument("--reason", "-r", type=str, required=True, help="Why this prompt is being promoted")
    promo_parser.add_argument("--commit", action="store_true", help="Also create a git commit")
    # -- compose subcommand --
    compose_parser = sub.add_parser("compose", help="Merge new moments into an existing page")
    compose_parser.add_argument("--existing-page", type=str, required=True, help="Existing page JSON (harness output or raw SynthesizedPage)")
    compose_parser.add_argument("--fixture", "-f", type=str, required=True, help="New moments fixture JSON")
    compose_parser.add_argument("--existing-fixture", type=str, required=True, help="Original moments fixture JSON (for citation context)")
    compose_parser.add_argument("--prompt", "-p", type=str, default=None, help="Compose prompt file (default: stage5_compose.txt)")
    compose_parser.add_argument("--output", "-o", type=str, default=None, help="Output file path")
    compose_parser.add_argument("--category", "-c", type=str, default=None, help="Filter to a specific category")
    compose_parser.add_argument("--model", type=str, default=None, help="Override LLM model")
    compose_parser.add_argument("--modality", type=str, default=None, choices=["chat", "thinking"])
    args = parser.parse_args()
    # If no subcommand, check for --fixture for backward compat
    if args.command is None:
        # Support running without subcommand for backward compat
        parser.print_help()
        return 1
    if args.command == "promote":
        return promote_prompt(args.prompt, args.stage, args.reason, args.commit)
    if args.command == "compose":
        # Resolve default compose prompt
        prompt_path = args.prompt
        if prompt_path is None:
            settings = get_settings()
            prompt_path = str(Path(settings.prompts_path) / "stage5_compose.txt")
        overall_start = time.monotonic()
        pages, exit_code = run_compose(
            existing_page_path=args.existing_page,
            existing_fixture_path=args.existing_fixture,
            new_fixture_path=args.fixture,
            prompt_path=prompt_path,
            category_filter=args.category,
            model_override=args.model,
            modality=args.modality,
        )
        if not pages and exit_code != 0:
            return exit_code
        output = {
            "existing_page_source": args.existing_page,
            "existing_fixture_source": args.existing_fixture,
            "new_fixture_source": args.fixture,
            "prompt_source": prompt_path,
            "category_filter": args.category,
            "pages": pages,
            "metadata": {
                "page_count": len(pages),
                "total_words": sum(_count_page_words(p) for p in pages),
                "elapsed_seconds": round(time.monotonic() - overall_start, 1),
            },
        }
        output_json = json.dumps(output, indent=2, ensure_ascii=False)
        if args.output:
            Path(args.output).parent.mkdir(parents=True, exist_ok=True)
            Path(args.output).write_text(output_json, encoding="utf-8")
            _log("OUTPUT", f"Written to: {args.output} ({len(output_json) / 1024:.1f} KB)")
        else:
            print(output_json)
            _log("OUTPUT", f"Printed to stdout ({len(output_json) / 1024:.1f} KB)")
        total_elapsed = time.monotonic() - overall_start
        _log("DONE", f"Compose completed in {total_elapsed:.1f}s (exit_code={exit_code})")
        return exit_code
    # -- run command --
    prompt_path = args.prompt
    if prompt_path is None:
        settings = get_settings()
        prompt_path = str(Path(settings.prompts_path) / "stage5_synthesis.txt")
    overall_start = time.monotonic()
    try:
        fixture = load_fixture(args.fixture)
    except (FileNotFoundError, ValueError, json.JSONDecodeError) as exc:
        _log("ERROR", f"Fixture error: {exc}", level="ERROR")
        return 3
    pages, exit_code = run_synthesis(
        fixture=fixture,
        prompt_path=prompt_path,
        category_filter=args.category,
        model_override=args.model,
        modality=args.modality,
    )
    if not pages and exit_code != 0:
        return exit_code
    output = {
        "fixture_source": args.fixture,
        "prompt_source": prompt_path,
        "creator_name": fixture.creator_name,
        "video_id": fixture.video_id,
        "category_filter": args.category,
        "pages": pages,
        "metadata": {
            "page_count": len(pages),
            "total_words": sum(
                sum(
                    len(s.get("content", "").split())
                    + sum(len(sub.get("content", "").split()) for sub in s.get("subsections", []))
                    for s in p.get("body_sections", [])
                )
                for p in pages
            ),
            "elapsed_seconds": round(time.monotonic() - overall_start, 1),
        },
    }
    output_json = json.dumps(output, indent=2, ensure_ascii=False)
    if args.output:
        Path(args.output).parent.mkdir(parents=True, exist_ok=True)
        Path(args.output).write_text(output_json, encoding="utf-8")
        _log("OUTPUT", f"Written to: {args.output} ({len(output_json) / 1024:.1f} KB)")
    else:
        print(output_json)
        _log("OUTPUT", f"Printed to stdout ({len(output_json) / 1024:.1f} KB)")
    total_elapsed = time.monotonic() - overall_start
    _log("DONE", f"Completed in {total_elapsed:.1f}s (exit_code={exit_code})")
    return exit_code
 if __name__ == "__main__":
    sys.exit(main())
--- a/backend/pipeline/test_harness_compose.py
+++ b/backend/pipeline/test_harness_compose.py
@ -0,0 +1,389 @@
 """Tests for compose-mode prompt building and validation.
 Covers prompt construction, citation re-indexing math, category filtering,
 and edge cases — no LLM calls required.
 """
 from __future__ import annotations
 import json
 import pytest
 from pipeline.citation_utils import validate_citations
 from pipeline.schemas import BodySection, BodySubSection, SynthesizedPage
 from pipeline.test_harness import (
    MockKeyMoment,
    _MockContentType,
    build_compose_prompt,
    build_moments_text,
 )
 # ── Fixtures / helpers ───────────────────────────────────────────────────────
 def _moment(
    title: str = "Test Moment",
    summary: str = "A moment.",
    content_type: str = "technique_demo",
    start_time: float = 0.0,
    end_time: float = 10.0,
    plugins: list[str] | None = None,
    raw_transcript: str | None = "Some transcript text",
 ) -> MockKeyMoment:
    return MockKeyMoment(
        title=title,
        summary=summary,
        content_type=_MockContentType(content_type),
        start_time=start_time,
        end_time=end_time,
        plugins=plugins or [],
        raw_transcript=raw_transcript or "",
    )
 def _cls_info(
    category: str = "Sound Design",
    tags: list[str] | None = None,
 ) -> dict:
    return {
        "topic_category": category,
        "topic_tags": tags or ["reverb", "delay"],
    }
 def _make_page(
    title: str = "Reverb Techniques",
    slug: str = "reverb-techniques",
    category: str = "Sound Design",
    sections: list[BodySection] | None = None,
    moment_indices: list[int] | None = None,
 ) -> dict:
    """Build a SynthesizedPage dict (as it would appear in harness output)."""
    if sections is None:
        sections = [
            BodySection(
                heading="Overview",
                content="Reverb is essential [0]. Basics of space [1].",
                subsections=[
                    BodySubSection(
                        heading="Room Types",
                        content="Rooms vary in character [2].",
                    )
                ],
            )
        ]
    page = SynthesizedPage(
        title=title,
        slug=slug,
        topic_category=category,
        summary="A page about reverb.",
        body_sections=sections,
        moment_indices=moment_indices or [0, 1, 2],
    )
    return json.loads(page.model_dump_json())
 # ── TestBuildComposePrompt ──────────────────────────────────────────────────
 class TestBuildComposePrompt:
    """Verify prompt construction for compose mode."""
    def test_prompt_contains_xml_tags(self):
        """Existing page + 3 old + 2 new → prompt has all required XML tags."""
        existing_moments = [(_moment(title=f"Old {i}"), _cls_info()) for i in range(3)]
        new_moments = [(_moment(title=f"New {i}"), _cls_info()) for i in range(2)]
        page = _make_page(moment_indices=[0, 1, 2])
        prompt = build_compose_prompt(
            existing_page=page,
            existing_moments=existing_moments,
            new_moments=new_moments,
            creator_name="DJ Test",
        )
        assert "<existing_page>" in prompt
        assert "</existing_page>" in prompt
        assert "<existing_moments>" in prompt
        assert "</existing_moments>" in prompt
        assert "<new_moments>" in prompt
        assert "</new_moments>" in prompt
        assert "<creator>" in prompt
        assert "</creator>" in prompt
    def test_old_moments_indexed_0_to_n(self):
        """3 old moments are indexed [0], [1], [2]."""
        existing_moments = [(_moment(title=f"Old {i}"), _cls_info()) for i in range(3)]
        new_moments = [(_moment(title=f"New {i}"), _cls_info()) for i in range(2)]
        page = _make_page()
        prompt = build_compose_prompt(
            existing_page=page,
            existing_moments=existing_moments,
            new_moments=new_moments,
            creator_name="DJ Test",
        )
        # Old moments section uses [0], [1], [2]
        existing_block = prompt.split("<existing_moments>")[1].split("</existing_moments>")[0]
        assert "[0] Title:" in existing_block
        assert "[1] Title:" in existing_block
        assert "[2] Title:" in existing_block
    def test_new_moments_indexed_n_to_n_plus_m(self):
        """2 new moments after 3 old → indexed [3] and [4]."""
        existing_moments = [(_moment(title=f"Old {i}"), _cls_info()) for i in range(3)]
        new_moments = [(_moment(title=f"New {i}"), _cls_info()) for i in range(2)]
        page = _make_page()
        prompt = build_compose_prompt(
            existing_page=page,
            existing_moments=existing_moments,
            new_moments=new_moments,
            creator_name="DJ Test",
        )
        new_block = prompt.split("<new_moments>")[1].split("</new_moments>")[0]
        assert "[3] Title:" in new_block
        assert "[4] Title:" in new_block
        # Should NOT contain [0]-[2] in new moments block
        assert "[0] Title:" not in new_block
    def test_creator_name_in_prompt(self):
        page = _make_page()
        prompt = build_compose_prompt(
            existing_page=page,
            existing_moments=[(_moment(), _cls_info())],
            new_moments=[(_moment(), _cls_info())],
            creator_name="Keota",
        )
        assert "<creator>Keota</creator>" in prompt
    def test_existing_page_json_valid(self):
        """Existing page JSON in the prompt is valid and parseable."""
        page = _make_page()
        prompt = build_compose_prompt(
            existing_page=page,
            existing_moments=[(_moment(), _cls_info())],
            new_moments=[(_moment(), _cls_info())],
            creator_name="Test",
        )
        page_block = prompt.split("<existing_page>")[1].split("</existing_page>")[0].strip()
        parsed = json.loads(page_block)
        assert parsed["title"] == "Reverb Techniques"
        assert parsed["slug"] == "reverb-techniques"
    def test_moment_format_matches_build_moments_text(self):
        """Existing moments format matches build_moments_text output."""
        moments = [
            (_moment(title="Delay Basics", plugins=["Valhalla"]), _cls_info(tags=["delay"])),
        ]
        page = _make_page()
        prompt = build_compose_prompt(
            existing_page=page,
            existing_moments=moments,
            new_moments=[(_moment(), _cls_info())],
            creator_name="Test",
        )
        # build_moments_text produces the same format for existing moments
        expected_text, _ = build_moments_text(moments, "Sound Design")
        existing_block = prompt.split("<existing_moments>")[1].split("</existing_moments>")[0].strip()
        assert expected_text.strip() == existing_block
 # ── TestCitationReindexing ──────────────────────────────────────────────────
 class TestCitationReindexing:
    """Verify citation index math for compose mode."""
    def test_5_old_3_new_valid_range(self):
        """5 old + 3 new → valid range is [0]-[7], moment_count=8."""
        # Build content that references all 8 indices
        sections = [
            BodySection(
                heading="Section",
                content="Refs [0] [1] [2] [3] [4] [5] [6] [7].",
            )
        ]
        result = validate_citations(sections, moment_count=8)
        assert result["valid"] is True
        assert result["total_citations"] == 8
        assert result["invalid_indices"] == []
    def test_accepts_citations_in_valid_range(self):
        """validate_citations with moment_count=8 accepts [0]-[7]."""
        sections = [
            BodySection(
                heading="S1",
                content="See [0] and [3] and [7].",
                subsections=[
                    BodySubSection(heading="Sub", content="Also [1] [2] [4] [5] [6].")
                ],
            )
        ]
        result = validate_citations(sections, moment_count=8)
        assert result["valid"] is True
        assert result["invalid_indices"] == []
    def test_rejects_out_of_range_citation(self):
        """validate_citations with moment_count=8 rejects [8]."""
        sections = [
            BodySection(
                heading="S1",
                content="Bad ref [8] and valid [0].",
            )
        ]
        result = validate_citations(sections, moment_count=8)
        assert result["valid"] is False
        assert 8 in result["invalid_indices"]
    def test_compose_offset_arithmetic(self):
        """Verify the offset math: N existing → new moments start at [N]."""
        n_existing = 5
        n_new = 3
        existing = [(_moment(title=f"E{i}"), _cls_info()) for i in range(n_existing)]
        new = [(_moment(title=f"N{i}"), _cls_info()) for i in range(n_new)]
        page = _make_page()
        prompt = build_compose_prompt(
            existing_page=page,
            existing_moments=existing,
            new_moments=new,
            creator_name="Test",
        )
        new_block = prompt.split("<new_moments>")[1].split("</new_moments>")[0]
        # First new moment should be [5], last should be [7]
        assert "[5] Title:" in new_block
        assert "[6] Title:" in new_block
        assert "[7] Title:" in new_block
        assert "[4] Title:" not in new_block  # last old moment, not in new block
 # ── TestCategoryFiltering ───────────────────────────────────────────────────
 class TestCategoryFiltering:
    """Verify that compose filters moments by category to match existing page."""
    def test_only_matching_category_moments_used(self):
        """Moments from category B are excluded when composing a category A page."""
        page = _make_page(category="Sound Design")
        existing = [(_moment(title="E0"), _cls_info(category="Sound Design"))]
        # Mix of matching and non-matching new moments
        new_sound = [(_moment(title="New SD"), _cls_info(category="Sound Design"))]
        new_mixing = [(_moment(title="New Mix"), _cls_info(category="Mixing"))]
        # build_compose_prompt doesn't filter by category — that's run_compose's job.
        # But we can verify the prompt only contains what we pass in.
        prompt = build_compose_prompt(
            existing_page=page,
            existing_moments=existing,
            new_moments=new_sound,  # Only Sound Design moments
            creator_name="Test",
        )
        new_block = prompt.split("<new_moments>")[1].split("</new_moments>")[0]
        assert "New SD" in new_block
        assert "New Mix" not in new_block
    def test_category_from_page_used_in_moments_text(self):
        """The page's topic_category is used in the moment formatting."""
        page = _make_page(category="Mixing")
        existing = [(_moment(), _cls_info(category="Mixing"))]
        new = [(_moment(), _cls_info(category="Mixing"))]
        prompt = build_compose_prompt(
            existing_page=page,
            existing_moments=existing,
            new_moments=new,
            creator_name="Test",
        )
        # The category in the formatted moments comes from the page's topic_category
        assert "Category: Mixing" in prompt
 # ── TestEdgeCases ──────────────────────────────────────────────────────────
 class TestEdgeCases:
    """Edge cases for compose prompt construction."""
    def test_empty_new_moments(self):
        """Empty new moments → prompt still valid with empty new_moments block."""
        page = _make_page()
        existing = [(_moment(), _cls_info())]
        prompt = build_compose_prompt(
            existing_page=page,
            existing_moments=existing,
            new_moments=[],
            creator_name="Test",
        )
        assert "<new_moments>" in prompt
        assert "</new_moments>" in prompt
        # Existing moments still present
        assert "[0] Title:" in prompt
    def test_single_new_moment_at_offset_n(self):
        """Single new moment after 2 existing → indexed [2]."""
        existing = [(_moment(title=f"E{i}"), _cls_info()) for i in range(2)]
        new = [(_moment(title="Single New"), _cls_info())]
        page = _make_page()
        prompt = build_compose_prompt(
            existing_page=page,
            existing_moments=existing,
            new_moments=new,
            creator_name="Test",
        )
        new_block = prompt.split("<new_moments>")[1].split("</new_moments>")[0]
        assert "[2] Title: Single New" in new_block
    def test_existing_page_no_subsections(self):
        """Page with sections but no subsections → handled correctly."""
        sections = [
            BodySection(heading="Flat Section", content="Content [0]."),
        ]
        page = _make_page(sections=sections, moment_indices=[0])
        existing = [(_moment(), _cls_info())]
        new = [(_moment(title="New One"), _cls_info())]
        prompt = build_compose_prompt(
            existing_page=page,
            existing_moments=existing,
            new_moments=new,
            creator_name="Test",
        )
        page_block = prompt.split("<existing_page>")[1].split("</existing_page>")[0].strip()
        parsed = json.loads(page_block)
        assert len(parsed["body_sections"]) == 1
        assert parsed["body_sections"][0]["subsections"] == []
    def test_large_offset_indices(self):
        """10 existing + 5 new → new moments indexed [10]-[14]."""
        existing = [(_moment(title=f"E{i}"), _cls_info()) for i in range(10)]
        new = [(_moment(title=f"N{i}"), _cls_info()) for i in range(5)]
        page = _make_page()
        prompt = build_compose_prompt(
            existing_page=page,
            existing_moments=existing,
            new_moments=new,
            creator_name="Test",
        )
        new_block = prompt.split("<new_moments>")[1].split("</new_moments>")[0]
        assert "[10] Title:" in new_block
        assert "[14] Title:" in new_block
        assert "[9] Title:" not in new_block  # last existing, not in new block
--- a/backend/pipeline/test_harness_v2_format.py
+++ b/backend/pipeline/test_harness_v2_format.py
@ -0,0 +1,213 @@
 """Tests for test_harness compatibility with v2 body_sections format.
 Validates that word-counting and citation integration work correctly
 with the list[BodySection] structure (v2) instead of the old dict format.
 """
 from __future__ import annotations
 import pytest
 from pipeline.citation_utils import validate_citations
 from pipeline.schemas import BodySection, BodySubSection, SynthesizedPage, SynthesisResult
 # ── Helpers ──────────────────────────────────────────────────────────────────
 def _make_page(
    body_sections: list[BodySection],
    moment_indices: list[int] | None = None,
    title: str = "Test Page",
    slug: str = "test-page",
 ) -> SynthesizedPage:
    return SynthesizedPage(
        title=title,
        slug=slug,
        topic_category="Testing",
        summary="A test page.",
        body_sections=body_sections,
        moment_indices=moment_indices or [],
    )
 def _count_words_v2(sections: list[BodySection]) -> int:
    """Replicate the word-counting logic from the updated test_harness."""
    return sum(
        len(s.content.split()) + sum(len(sub.content.split()) for sub in s.subsections)
        for s in sections
    )
 def _count_words_metadata(pages_dicts: list[dict]) -> int:
    """Replicate the metadata total_words logic (operates on dicts after model_dump)."""
    return sum(
        sum(
            len(s.get("content", "").split())
            + sum(len(sub.get("content", "").split()) for sub in s.get("subsections", []))
            for s in p.get("body_sections", [])
        )
        for p in pages_dicts
    )
 # ── Word counting tests ─────────────────────────────────────────────────────
 class TestWordCounting:
    def test_flat_sections_no_subsections(self):
        sections = [
            BodySection(heading="Intro", content="one two three"),
            BodySection(heading="Details", content="four five"),
        ]
        assert _count_words_v2(sections) == 5
    def test_sections_with_subsections(self):
        sections = [
            BodySection(
                heading="Main",
                content="alpha beta",  # 2 words
                subsections=[
                    BodySubSection(heading="Sub A", content="gamma delta epsilon"),  # 3 words
                    BodySubSection(heading="Sub B", content="zeta"),  # 1 word
                ],
            ),
        ]
        assert _count_words_v2(sections) == 6
    def test_empty_sections_list(self):
        assert _count_words_v2([]) == 0
    def test_section_with_empty_content(self):
        sections = [
            BodySection(heading="Empty", content=""),
        ]
        # "".split() returns [], len([]) == 0
        assert _count_words_v2(sections) == 0
    def test_metadata_word_count_matches(self):
        """Metadata total_words (from model_dump dicts) matches Pydantic object counting."""
        sections = [
            BodySection(
                heading="H2",
                content="one two three",
                subsections=[
                    BodySubSection(heading="H3", content="four five six seven"),
                ],
            ),
            BodySection(heading="Another", content="eight nine"),
        ]
        page = _make_page(sections, moment_indices=[0, 1])
        pages_dicts = [page.model_dump()]
        assert _count_words_v2(sections) == 9
        assert _count_words_metadata(pages_dicts) == 9
 # ── Section/subsection counting ─────────────────────────────────────────────
 class TestSectionCounting:
    def test_section_and_subsection_counts(self):
        sections = [
            BodySection(heading="A", content="text", subsections=[
                BodySubSection(heading="A.1", content="sub text"),
            ]),
            BodySection(heading="B", content="more text"),
            BodySection(heading="C", content="even more", subsections=[
                BodySubSection(heading="C.1", content="sub1"),
                BodySubSection(heading="C.2", content="sub2"),
            ]),
        ]
        section_count = len(sections)
        subsection_count = sum(len(s.subsections) for s in sections)
        assert section_count == 3
        assert subsection_count == 3
 # ── Citation integration ─────────────────────────────────────────────────────
 class TestCitationIntegration:
    def test_full_coverage(self):
        sections = [
            BodySection(heading="Intro", content="First point [0]. Second point [1]."),
            BodySection(heading="Details", content="More on [0] and [2]."),
        ]
        result = validate_citations(sections, moment_count=3)
        assert result["valid"] is True
        assert result["coverage_pct"] == 100.0
        assert result["invalid_indices"] == []
        assert result["uncited_moments"] == []
    def test_partial_coverage(self):
        sections = [
            BodySection(heading="Intro", content="Only cites [0]."),
        ]
        result = validate_citations(sections, moment_count=3)
        assert result["valid"] is False
        assert result["coverage_pct"] == pytest.approx(33.3, abs=0.1)
        assert result["uncited_moments"] == [1, 2]
    def test_invalid_index(self):
        sections = [
            BodySection(heading="Bad", content="Cites [0] and [99]."),
        ]
        result = validate_citations(sections, moment_count=2)
        assert result["invalid_indices"] == [99]
    def test_citations_in_subsections(self):
        sections = [
            BodySection(
                heading="Main",
                content="See [0].",
                subsections=[
                    BodySubSection(heading="Sub", content="Also [1] and [2]."),
                ],
            ),
        ]
        result = validate_citations(sections, moment_count=3)
        assert result["valid"] is True
        assert result["total_citations"] == 3
    def test_multi_citation_markers(self):
        sections = [
            BodySection(heading="X", content="Both sources agree [0,1]."),
        ]
        result = validate_citations(sections, moment_count=2)
        assert result["valid"] is True
        assert result["total_citations"] == 2
    def test_no_sections(self):
        result = validate_citations([], moment_count=0)
        assert result["valid"] is True
        assert result["coverage_pct"] == 0.0
 # ── End-to-end: SynthesisResult with v2 body_sections ───────────────────────
 class TestSynthesisResultV2:
    def test_round_trip_model_dump(self):
        """SynthesisResult with v2 body_sections round-trips through model_dump/validate."""
        sections = [
            BodySection(
                heading="Overview",
                content="This technique [0] is fundamental.",
                subsections=[
                    BodySubSection(heading="Key Concept", content="Detail [1]."),
                ],
            ),
        ]
        page = _make_page(sections, moment_indices=[0, 1])
        result = SynthesisResult(pages=[page])
        dumped = result.model_dump()
        restored = SynthesisResult.model_validate(dumped)
        assert len(restored.pages) == 1
        restored_page = restored.pages[0]
        assert len(restored_page.body_sections) == 1
        assert restored_page.body_sections[0].heading == "Overview"
        assert len(restored_page.body_sections[0].subsections) == 1
        assert restored_page.body_sections_format == "v2"
--- a/backend/pipeline/test_highlight_scorer.py
+++ b/backend/pipeline/test_highlight_scorer.py
@ -0,0 +1,521 @@
 """Tests for the highlight scoring engine.
 Verifies heuristic scoring produces sensible orderings and handles
 edge cases gracefully.
 """
 from __future__ import annotations
 import pytest
 from backend.pipeline.highlight_scorer import (
    _content_type_weight,
    _duration_fitness,
    _pause_density,
    _plugin_richness,
    _source_quality_weight,
    _speaking_pace_fitness,
    _specificity_density,
    _speech_rate_variance,
    _transcript_energy,
    _video_type_weight,
    extract_word_timings,
    score_moment,
 )
 # ── Fixture helpers ──────────────────────────────────────────────────────────
 def _ideal_moment() -> dict:
    """45s technique moment, 3 plugins, specific summary, structured source."""
    return dict(
        start_time=10.0,
        end_time=55.0,  # 45s duration
        content_type="technique",
        summary=(
            "Set the compressor threshold to -18 dB with a 4:1 ratio, "
            "then boost the high shelf at 12 kHz by 3.5 dB using FabFilter Pro-Q 3."
        ),
        plugins=["FabFilter Pro-Q 3", "SSL G-Bus Compressor", "Valhalla Room"],
        raw_transcript=(
            "The trick is to set the threshold low enough. Notice how "
            "the compressor grabs the transients. Because we want to preserve "
            "the dynamics, I always back off the ratio. The key is finding "
            "that sweet spot where it's controlling but not squashing."
        ),
        source_quality="structured",
        video_content_type="tutorial",
    )
 def _mediocre_moment() -> dict:
    """90s settings moment, 1 plugin, decent summary, mixed source."""
    return dict(
        start_time=120.0,
        end_time=210.0,  # 90s duration
        content_type="settings",
        summary="Adjust the EQ settings for the vocal track to get a clearer sound.",
        plugins=["FabFilter Pro-Q 3"],
        raw_transcript=(
            "So here we're just going to adjust this. I think it sounds "
            "better when we cut some of the low end. Let me show you what "
            "I mean. Yeah, that's better."
        ),
        source_quality="mixed",
        video_content_type="breakdown",
    )
 def _poor_moment() -> dict:
    """300s reasoning moment, 0 plugins, vague summary, unstructured source."""
    return dict(
        start_time=0.0,
        end_time=300.0,  # 300s duration → zero for duration_fitness
        content_type="reasoning",
        summary="General discussion about mixing philosophy and approach.",
        plugins=[],
        raw_transcript=(
            "I think mixing is really about taste. Everyone has their own "
            "approach. Some people like it loud, some people like it quiet. "
            "There's no right or wrong way to do it really."
        ),
        source_quality="unstructured",
        video_content_type="livestream",
    )
 def _make_word_timings(
    start: float = 0.0,
    count: int = 40,
    wps: float = 4.0,
    pause_every: int | None = None,
    pause_duration: float = 0.8,
 ) -> list[dict]:
    """Generate synthetic word-timing dicts for testing.
    Parameters
    ----------
    start : float
        Start time in seconds.
    count : int
        Number of words to generate.
    wps : float
        Words per second (base rate).
    pause_every : int | None
        Insert a pause every N words. None = no pauses.
    pause_duration : float
        Duration of each pause in seconds.
    """
    timings = []
    t = start
    word_dur = 1.0 / wps * 0.7  # 70% speaking, 30% normal gap
    gap = 1.0 / wps * 0.3
    for i in range(count):
        timings.append({"word": f"word{i}", "start": t, "end": t + word_dur})
        t += word_dur + gap
        if pause_every and (i + 1) % pause_every == 0:
            t += pause_duration
    return timings
 def _make_transcript_segments(word_timings: list[dict], words_per_segment: int = 10) -> list[dict]:
    """Group word timings into transcript segments for extract_word_timings tests."""
    segments = []
    for i in range(0, len(word_timings), words_per_segment):
        chunk = word_timings[i : i + words_per_segment]
        segments.append({"words": chunk})
    return segments
 # ── Tests ────────────────────────────────────────────────────────────────────
 class TestScoreMoment:
    def test_ideal_moment_scores_high(self):
        result = score_moment(**_ideal_moment())
        assert result["score"] > 0.7, f"Ideal moment scored {result['score']}, expected > 0.7"
    def test_poor_moment_scores_low(self):
        result = score_moment(**_poor_moment())
        assert result["score"] < 0.4, f"Poor moment scored {result['score']}, expected < 0.4"
    def test_ordering_is_sensible(self):
        ideal = score_moment(**_ideal_moment())
        mediocre = score_moment(**_mediocre_moment())
        poor = score_moment(**_poor_moment())
        assert ideal["score"] > mediocre["score"] > poor["score"], (
            f"Expected ideal ({ideal['score']:.3f}) > "
            f"mediocre ({mediocre['score']:.3f}) > "
            f"poor ({poor['score']:.3f})"
        )
    def test_score_bounds(self):
        """All scores in [0.0, 1.0] for edge cases."""
        edge_cases = [
            dict(start_time=0, end_time=0, summary="", plugins=None, raw_transcript=None),
            dict(start_time=0, end_time=500, summary=None, plugins=[], raw_transcript=""),
            dict(start_time=0, end_time=45, summary="x" * 10000, plugins=["a"] * 100),
            dict(start_time=100, end_time=100),  # zero duration
        ]
        for kwargs in edge_cases:
            result = score_moment(**kwargs)
            assert 0.0 <= result["score"] <= 1.0, f"Score {result['score']} out of bounds for {kwargs}"
            for dim, val in result["score_breakdown"].items():
                assert 0.0 <= val <= 1.0, f"{dim}={val} out of bounds for {kwargs}"
    def test_missing_optional_fields(self):
        """None raw_transcript and None plugins don't crash."""
        result = score_moment(
            start_time=10.0,
            end_time=55.0,
            content_type="technique",
            summary="A summary.",
            plugins=None,
            raw_transcript=None,
            source_quality=None,
            video_content_type=None,
        )
        assert 0.0 <= result["score"] <= 1.0
        assert result["duration_secs"] == 45.0
        assert len(result["score_breakdown"]) == 10
    def test_returns_duration_secs(self):
        result = score_moment(start_time=10.0, end_time=55.0)
        assert result["duration_secs"] == 45.0
    def test_breakdown_has_ten_dimensions(self):
        result = score_moment(**_ideal_moment())
        assert len(result["score_breakdown"]) == 10
        expected_keys = {
            "duration_score", "content_density_score", "technique_relevance_score",
            "plugin_diversity_score", "engagement_proxy_score", "position_score",
            "uniqueness_score", "speech_rate_variance_score", "pause_density_score",
            "speaking_pace_score",
        }
        assert set(result["score_breakdown"].keys()) == expected_keys
    def test_without_word_timings_audio_dims_are_neutral(self):
        """When word_timings is None, audio proxy dimensions score 0.5."""
        result = score_moment(start_time=10.0, end_time=55.0)
        bd = result["score_breakdown"]
        assert bd["speech_rate_variance_score"] == 0.5
        assert bd["pause_density_score"] == 0.5
        assert bd["speaking_pace_score"] == 0.5
    def test_with_word_timings_changes_score(self):
        """Providing word_timings should shift the composite score vs without."""
        base = _ideal_moment()
        without = score_moment(**base)
        # Add word timings at a good teaching pace (~4 WPS) with some pauses
        timings = _make_word_timings(start=10.0, count=120, wps=4.0, pause_every=15)
        with_timings = score_moment(**base, word_timings=timings)
        # Scores should differ since audio dims are no longer neutral
        assert with_timings["score"] != without["score"]
 class TestDurationFitness:
    def test_bell_curve_peak(self):
        """45s scores higher than 10s, 10s scores higher than 400s."""
        assert _duration_fitness(45) > _duration_fitness(10)
        assert _duration_fitness(10) > _duration_fitness(400)
    def test_sweet_spot(self):
        assert _duration_fitness(30) == 1.0
        assert _duration_fitness(45) == 1.0
        assert _duration_fitness(60) == 1.0
    def test_zero_at_extremes(self):
        assert _duration_fitness(0) == 0.0
        assert _duration_fitness(300) == 0.0
        assert _duration_fitness(500) == 0.0
    def test_negative_duration(self):
        assert _duration_fitness(-10) == 0.0
 class TestContentTypeWeight:
    def test_technique_highest(self):
        assert _content_type_weight("technique") == 1.0
    def test_reasoning_lowest_known(self):
        assert _content_type_weight("reasoning") == 0.4
    def test_unknown_gets_default(self):
        assert _content_type_weight("unknown") == 0.5
        assert _content_type_weight(None) == 0.5
 class TestSpecificityDensity:
    def test_specific_summary_scores_high(self):
        summary = "Set threshold to -18 dB with 4:1 ratio, boost 12 kHz by 3.5 dB"
        score = _specificity_density(summary)
        assert score > 0.5
    def test_vague_summary_scores_low(self):
        score = _specificity_density("General discussion about mixing philosophy.")
        assert score < 0.3
    def test_empty_returns_zero(self):
        assert _specificity_density("") == 0.0
        assert _specificity_density(None) == 0.0
 class TestPluginRichness:
    def test_three_plugins_maxes_out(self):
        assert _plugin_richness(["a", "b", "c"]) == 1.0
    def test_more_than_three_capped(self):
        assert _plugin_richness(["a", "b", "c", "d"]) == 1.0
    def test_empty(self):
        assert _plugin_richness([]) == 0.0
        assert _plugin_richness(None) == 0.0
 class TestTranscriptEnergy:
    def test_teaching_phrases_score_high(self):
        transcript = (
            "The trick is to notice how the compressor behaves. "
            "Because we want dynamics, I always set it gently. The key is balance."
        )
        score = _transcript_energy(transcript)
        assert score > 0.5
    def test_bland_transcript_scores_low(self):
        transcript = "And then we adjust this slider here. Okay that sounds fine."
        score = _transcript_energy(transcript)
        assert score < 0.3
    def test_empty(self):
        assert _transcript_energy("") == 0.0
        assert _transcript_energy(None) == 0.0
 class TestSourceQualityWeight:
    def test_structured_highest(self):
        assert _source_quality_weight("structured") == 1.0
    def test_none_default(self):
        assert _source_quality_weight(None) == 0.5
 class TestVideoTypeWeight:
    def test_tutorial_highest(self):
        assert _video_type_weight("tutorial") == 1.0
    def test_short_form_lowest(self):
        assert _video_type_weight("short_form") == 0.3
    def test_none_default(self):
        assert _video_type_weight(None) == 0.5
 # ── Audio proxy function tests ───────────────────────────────────────────────
 class TestExtractWordTimings:
    def test_filters_by_time_window(self):
        words = _make_word_timings(start=0.0, count=40, wps=4.0)
        segments = _make_transcript_segments(words)
        # Extract window 2.0–5.0s
        result = extract_word_timings(segments, start_time=2.0, end_time=5.0)
        for w in result:
            assert 2.0 <= w["start"] <= 5.0
    def test_returns_all_when_window_covers_entire_range(self):
        words = _make_word_timings(start=0.0, count=20, wps=4.0)
        segments = _make_transcript_segments(words)
        result = extract_word_timings(segments, start_time=0.0, end_time=100.0)
        assert len(result) == 20
    def test_empty_transcript_data(self):
        assert extract_word_timings([], start_time=0.0, end_time=10.0) == []
    def test_no_words_in_window(self):
        words = _make_word_timings(start=0.0, count=10, wps=4.0)
        segments = _make_transcript_segments(words)
        # Window far beyond the word timings
        result = extract_word_timings(segments, start_time=100.0, end_time=200.0)
        assert result == []
    def test_segments_without_words_key(self):
        """Segments missing 'words' are skipped gracefully."""
        segments = [{"text": "hello"}, {"words": [{"start": 1.0, "end": 1.2, "word": "a"}]}]
        result = extract_word_timings(segments, start_time=0.0, end_time=10.0)
        assert len(result) == 1
    def test_words_without_start_are_skipped(self):
        segments = [{"words": [{"end": 1.2, "word": "a"}, {"start": 2.0, "end": 2.2, "word": "b"}]}]
        result = extract_word_timings(segments, start_time=0.0, end_time=10.0)
        assert len(result) == 1
        assert result[0]["word"] == "b"
 class TestSpeechRateVariance:
    def test_none_returns_neutral(self):
        assert _speech_rate_variance(None) == 0.5
    def test_too_few_words_returns_neutral(self):
        timings = _make_word_timings(count=3, wps=4.0)
        assert _speech_rate_variance(timings) == 0.5
    def test_short_span_returns_neutral(self):
        """Words spanning <5s should return neutral."""
        timings = _make_word_timings(count=10, wps=4.0, start=0.0)
        # 10 words at 4 WPS = 2.5s span → too short
        assert _speech_rate_variance(timings) == 0.5
    def test_uniform_pace_scores_low(self):
        """Steady 4 WPS for 30s → low variance."""
        timings = _make_word_timings(start=0.0, count=120, wps=4.0)
        score = _speech_rate_variance(timings)
        assert score < 0.4, f"Uniform pace scored {score}, expected < 0.4"
    def test_varied_pace_scores_higher(self):
        """Alternating fast/slow sections → higher variance."""
        timings = []
        t = 0.0
        # Fast section: 6 WPS for 10s
        for i in range(60):
            dur = 0.12
            timings.append({"word": f"w{i}", "start": t, "end": t + dur})
            t += 1.0 / 6.0
        # Slow section: 2 WPS for 10s
        for i in range(20):
            dur = 0.3
            timings.append({"word": f"w{60+i}", "start": t, "end": t + dur})
            t += 0.5
        score = _speech_rate_variance(timings)
        uniform_score = _speech_rate_variance(
            _make_word_timings(start=0.0, count=80, wps=4.0)
        )
        assert score > uniform_score, (
            f"Varied pace ({score:.3f}) should be > uniform ({uniform_score:.3f})"
        )
    def test_score_bounded(self):
        timings = _make_word_timings(start=0.0, count=200, wps=4.0)
        score = _speech_rate_variance(timings)
        assert 0.0 <= score <= 1.0
 class TestPauseDensity:
    def test_none_returns_neutral(self):
        assert _pause_density(None) == 0.5
    def test_single_word_returns_neutral(self):
        assert _pause_density([{"start": 0.0, "end": 0.2}]) == 0.5
    def test_no_pauses_scores_zero(self):
        """Continuous speech with no gaps >0.5s → 0."""
        timings = _make_word_timings(start=0.0, count=60, wps=4.0)
        score = _pause_density(timings)
        assert score == 0.0
    def test_frequent_pauses_scores_high(self):
        """Pauses every 5 words → high density."""
        timings = _make_word_timings(start=0.0, count=60, wps=4.0, pause_every=5, pause_duration=0.8)
        score = _pause_density(timings)
        assert score > 0.5, f"Frequent pauses scored {score}, expected > 0.5"
    def test_long_pauses_weighted_more(self):
        """One 1.5s pause should score higher than one 0.6s pause in a longer segment."""
        # Build timings with one long pause at midpoint — 60 words for longer duration
        long_pause = []
        t = 0.0
        for i in range(60):
            long_pause.append({"word": f"w{i}", "start": t, "end": t + 0.15})
            t += 0.25
            if i == 29:
                t += 1.5  # long pause >1.0s
        # Build timings with one short pause — same word count
        short_pause = []
        t = 0.0
        for i in range(60):
            short_pause.append({"word": f"w{i}", "start": t, "end": t + 0.15})
            t += 0.25
            if i == 29:
                t += 0.6  # short pause >0.5s but <1.0s
        assert _pause_density(long_pause) > _pause_density(short_pause)
    def test_score_bounded(self):
        timings = _make_word_timings(start=0.0, count=60, wps=4.0, pause_every=3, pause_duration=1.5)
        score = _pause_density(timings)
        assert 0.0 <= score <= 1.0
 class TestSpeakingPaceFitness:
    def test_none_returns_neutral(self):
        assert _speaking_pace_fitness(None) == 0.5
    def test_single_word_returns_neutral(self):
        assert _speaking_pace_fitness([{"start": 0.0, "end": 0.2}]) == 0.5
    def test_optimal_pace_scores_high(self):
        """4 WPS (optimal teaching pace) → 1.0."""
        timings = _make_word_timings(start=0.0, count=40, wps=4.0)
        score = _speaking_pace_fitness(timings)
        assert score == 1.0, f"4 WPS scored {score}, expected 1.0"
    def test_three_wps_is_sweet_spot_edge(self):
        timings = _make_word_timings(start=0.0, count=30, wps=3.0)
        score = _speaking_pace_fitness(timings)
        assert score == 1.0
    def test_five_wps_is_sweet_spot_edge(self):
        timings = _make_word_timings(start=0.0, count=50, wps=5.0)
        score = _speaking_pace_fitness(timings)
        assert score > 0.95, f"5 WPS scored {score}, expected near 1.0"
    def test_too_slow_scores_lower(self):
        """1.5 WPS → below sweet spot."""
        timings = _make_word_timings(start=0.0, count=15, wps=1.5)
        score = _speaking_pace_fitness(timings)
        assert 0.4 < score < 0.6, f"1.5 WPS scored {score}, expected ~0.5"
    def test_too_fast_scores_lower(self):
        """8 WPS → above sweet spot."""
        timings = _make_word_timings(start=0.0, count=80, wps=8.0)
        score = _speaking_pace_fitness(timings)
        assert 0.0 < score < 1.0
    def test_very_fast_scores_zero(self):
        """10+ WPS → 0."""
        timings = _make_word_timings(start=0.0, count=110, wps=11.0)
        score = _speaking_pace_fitness(timings)
        assert score == 0.0
    def test_zero_wps_scores_zero(self):
        """Very short duration → neutral."""
        timings = [{"start": 0.0, "end": 0.01}, {"start": 0.005, "end": 0.015}]
        score = _speaking_pace_fitness(timings)
        # Duration ~0.015s → too short → 0.5 (neutral)
        assert score == 0.5
    def test_score_bounded(self):
        for wps in [0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0]:
            timings = _make_word_timings(start=0.0, count=max(10, int(wps * 10)), wps=wps)
            score = _speaking_pace_fitness(timings)
            assert 0.0 <= score <= 1.0, f"WPS {wps} scored {score} out of bounds"
 class TestBackwardCompatibility:
    """Ensure the weight rebalancing doesn't break existing relative orderings."""
    def test_ideal_still_beats_poor(self):
        ideal = score_moment(**_ideal_moment())
        poor = score_moment(**_poor_moment())
        assert ideal["score"] > poor["score"]
    def test_ideal_still_above_threshold(self):
        result = score_moment(**_ideal_moment())
        assert result["score"] > 0.6, f"Ideal scored {result['score']}, expected > 0.6"
    def test_poor_still_below_threshold(self):
        result = score_moment(**_poor_moment())
        assert result["score"] < 0.45, f"Poor scored {result['score']}, expected < 0.45"
    def test_weights_sum_to_one(self):
        from backend.pipeline.highlight_scorer import _WEIGHTS
        assert abs(sum(_WEIGHTS.values()) - 1.0) < 1e-9
--- a/backend/pipeline/test_section_embedding.py
+++ b/backend/pipeline/test_section_embedding.py
@ -0,0 +1,328 @@
 """Unit tests for per-section embedding in stage 6.
 Tests _slugify_heading, section embed text construction, delete-before-upsert
 ordering, v1 page skipping, upsert payload correctness, and deterministic UUIDs.
 """
 from __future__ import annotations
 import uuid
 from unittest.mock import MagicMock, call, patch
 import pytest
 # ── slugify tests ────────────────────────────────────────────────────────────
 from pipeline.stages import _slugify_heading
 class TestSlugifyHeading:
    """Verify _slugify_heading matches frontend TableOfContents.tsx slugify."""
    def test_simple_heading(self):
        assert _slugify_heading("Grain Position Control") == "grain-position-control"
    def test_ampersand_and_special_chars(self):
        # Consecutive non-alphanumeric chars collapse to a single hyphen
        assert _slugify_heading("LFO Routing & Modulation") == "lfo-routing-modulation"
    def test_leading_trailing_special(self):
        assert _slugify_heading("  —Hello World!  ") == "hello-world"
    def test_numbers_preserved(self):
        assert _slugify_heading("Step 1: Setup") == "step-1-setup"
    def test_empty_string(self):
        assert _slugify_heading("") == ""
    def test_only_special_chars(self):
        assert _slugify_heading("!@#$%") == ""
    def test_unicode_stripped(self):
        assert _slugify_heading("Café Sounds") == "caf-sounds"
    def test_multiple_hyphens_collapse(self):
        assert _slugify_heading("A -- B --- C") == "a-b-c"
 # ── Deterministic UUID tests ─────────────────────────────────────────────────
 _QDRANT_NAMESPACE = uuid.UUID("a1b2c3d4-e5f6-7890-abcd-ef1234567890")
 class TestDeterministicUUIDs:
    """Verify same page+section always produces the same point ID."""
    def test_same_input_same_uuid(self):
        id1 = str(uuid.uuid5(_QDRANT_NAMESPACE, "ts:page-abc:grain-position-control"))
        id2 = str(uuid.uuid5(_QDRANT_NAMESPACE, "ts:page-abc:grain-position-control"))
        assert id1 == id2
    def test_different_section_different_uuid(self):
        id1 = str(uuid.uuid5(_QDRANT_NAMESPACE, "ts:page-abc:section-a"))
        id2 = str(uuid.uuid5(_QDRANT_NAMESPACE, "ts:page-abc:section-b"))
        assert id1 != id2
 # ── QdrantManager section methods ────────────────────────────────────────────
 class TestQdrantManagerSections:
    """Test upsert_technique_sections and delete_sections_by_page_id."""
    def _make_manager(self):
        """Create a QdrantManager with a mocked client."""
        with patch("pipeline.qdrant_client.QdrantClient") as MockClient:
            mock_client = MockClient.return_value
            from pipeline.qdrant_client import QdrantManager
            settings = MagicMock()
            settings.qdrant_url = "http://localhost:6333"
            settings.qdrant_collection = "test_collection"
            settings.embedding_dimensions = 768
            mgr = QdrantManager(settings)
            mgr._client = mock_client
            return mgr, mock_client
    def test_upsert_builds_correct_payloads(self):
        mgr, mock_client = self._make_manager()
        sections = [
            {
                "page_id": "p1",
                "creator_id": "c1",
                "creator_name": "Keota",
                "title": "Granular Synthesis",
                "slug": "granular-synthesis",
                "section_heading": "Grain Position Control",
                "section_anchor": "grain-position-control",
                "topic_category": "Sound Design",
                "topic_tags": ["granular", "synthesis"],
                "summary": "Control the grain position parameter.",
            },
        ]
        vectors = [[0.1] * 768]
        mgr.upsert_technique_sections(sections, vectors)
        # Verify upsert was called
        assert mock_client.upsert.called
        points = mock_client.upsert.call_args[1]["points"]
        assert len(points) == 1
        payload = points[0].payload
        assert payload["type"] == "technique_section"
        assert payload["page_id"] == "p1"
        assert payload["section_heading"] == "Grain Position Control"
        assert payload["section_anchor"] == "grain-position-control"
        assert payload["slug"] == "granular-synthesis"
        # Verify deterministic UUID
        expected_id = str(uuid.uuid5(_QDRANT_NAMESPACE, "ts:p1:grain-position-control"))
        assert points[0].id == expected_id
    def test_upsert_count_mismatch_skips(self):
        mgr, mock_client = self._make_manager()
        mgr.upsert_technique_sections([{"page_id": "p1"}], [[0.1], [0.2]])
        assert not mock_client.upsert.called
    def test_upsert_empty_list_skips(self):
        mgr, mock_client = self._make_manager()
        mgr.upsert_technique_sections([], [])
        assert not mock_client.upsert.called
    def test_summary_truncated_to_200_chars(self):
        mgr, mock_client = self._make_manager()
        long_summary = "x" * 500
        sections = [{
            "page_id": "p1", "section_heading": "H", "section_anchor": "h",
            "summary": long_summary,
        }]
        vectors = [[0.1] * 768]
        mgr.upsert_technique_sections(sections, vectors)
        payload = mock_client.upsert.call_args[1]["points"][0].payload
        assert len(payload["summary"]) == 200
    def test_delete_sections_by_page_id(self):
        mgr, mock_client = self._make_manager()
        mgr.delete_sections_by_page_id("p1")
        assert mock_client.delete.called
        filter_arg = mock_client.delete.call_args[1]["points_selector"]
        # Verify filter has both page_id and type conditions
        must_conditions = filter_arg.must
        assert len(must_conditions) == 2
        keys = {c.key for c in must_conditions}
        assert keys == {"page_id", "type"}
    def test_delete_sections_logs_on_failure(self):
        mgr, mock_client = self._make_manager()
        mock_client.delete.side_effect = Exception("connection refused")
        # Should not raise
        mgr.delete_sections_by_page_id("p1")
 # ── Stage 6 section embedding logic ─────────────────────────────────────────
 class TestStage6SectionEmbedding:
    """Test the section embedding block within stage6_embed_and_index.
    Uses mocked DB, embedding client, and QdrantManager to verify:
    - v2 pages produce section points
    - v1 pages are skipped
    - delete is called before upsert
    - embed text includes creator/page/section context
    - sections with empty headings are skipped
    - subsection content is included in embed text
    """
    def _make_page(self, page_id="p1", creator_id="c1", format_="v2",
                   body_sections=None, title="Granular Synthesis",
                   slug="granular-synthesis"):
        """Create a mock TechniquePage-like object."""
        page = MagicMock()
        page.id = page_id
        page.creator_id = creator_id
        page.body_sections_format = format_
        page.body_sections = body_sections
        page.title = title
        page.slug = slug
        page.topic_category = "Sound Design"
        page.topic_tags = ["granular"]
        page.summary = "Page summary"
        return page
    def test_v1_page_produces_zero_sections(self):
        """Pages with body_sections_format != 'v2' should be skipped."""
        page = self._make_page(format_="v1", body_sections=[
            {"heading": "Section A", "content": "Content A"},
        ])
        v2_pages = [p for p in [page] if getattr(p, "body_sections_format", "v1") == "v2"]
        assert len(v2_pages) == 0
    def test_v2_page_none_body_sections(self):
        """Page with body_sections=None → skipped (not a list)."""
        page = self._make_page(format_="v2", body_sections=None)
        v2_pages = [p for p in [page] if getattr(p, "body_sections_format", "v1") == "v2"]
        assert len(v2_pages) == 1
        # body_sections is None → not a list → skipped in the loop
        assert not isinstance(page.body_sections, list)
    def test_section_empty_heading_skipped(self):
        """Sections with empty heading should be skipped."""
        page = self._make_page(body_sections=[
            {"heading": "", "content": "Orphan content"},
            {"heading": "Valid", "content": "Real content"},
        ])
        sections_with_heading = [
            s for s in page.body_sections
            if isinstance(s, dict) and s.get("heading", "").strip()
        ]
        assert len(sections_with_heading) == 1
        assert sections_with_heading[0]["heading"] == "Valid"
    def test_subsection_content_included_in_embed_text(self):
        """Section with subsections should include subsection content."""
        section = {
            "heading": "Grain Position Control",
            "content": "Main content",
            "subsections": [
                {"heading": "Fine Tuning", "content": "Fine tune the position."},
                {"heading": "Automation", "content": "Automate grain pos."},
            ],
        }
        # Reproduce the embed text construction from stage 6
        creator_name = "Keota"
        page_title = "Granular Synthesis"
        heading = section["heading"]
        section_content = section.get("content", "")
        subsection_parts = []
        for sub in section.get("subsections", []):
            if isinstance(sub, dict):
                sub_heading = sub.get("heading", "")
                sub_content = sub.get("content", "")
                if sub_heading:
                    subsection_parts.append(f"{sub_heading}: {sub_content}")
                elif sub_content:
                    subsection_parts.append(sub_content)
        embed_text = (
            f"{creator_name} {page_title} — {heading}: "
            f"{section_content} {' '.join(subsection_parts)}"
        ).strip()
        assert "Fine Tuning: Fine tune the position." in embed_text
        assert "Automation: Automate grain pos." in embed_text
        assert "Keota Granular Synthesis" in embed_text
    def test_subsection_no_direct_content(self):
        """Section with subsections but no direct content still embeds subsection text."""
        section = {
            "heading": "Advanced Techniques",
            "content": "",
            "subsections": [
                {"heading": "Sub A", "content": "Content A"},
            ],
        }
        heading = section["heading"]
        section_content = section.get("content", "")
        subsection_parts = []
        for sub in section.get("subsections", []):
            if isinstance(sub, dict):
                sub_heading = sub.get("heading", "")
                sub_content = sub.get("content", "")
                if sub_heading:
                    subsection_parts.append(f"{sub_heading}: {sub_content}")
                elif sub_content:
                    subsection_parts.append(sub_content)
        embed_text = (
            f"Creator Page — {heading}: "
            f"{section_content} {' '.join(subsection_parts)}"
        ).strip()
        assert "Sub A: Content A" in embed_text
    def test_delete_called_before_upsert_ordering(self):
        """Verify delete_sections_by_page_id is called before upsert_technique_sections."""
        call_order = []
        mock_qdrant = MagicMock()
        mock_qdrant.delete_sections_by_page_id.side_effect = lambda pid: call_order.append(("delete", pid))
        mock_qdrant.upsert_technique_sections.side_effect = lambda s, v: call_order.append(("upsert", len(s)))
        mock_embed = MagicMock()
        mock_embed.embed.return_value = [[0.1] * 768]  # One vector
        page = self._make_page(body_sections=[
            {"heading": "Section A", "content": "Content A"},
        ])
        creator_map = {str(page.creator_id): "TestCreator"}
        v2_pages = [page]
        page_id_str = str(page.id)
        # Simulate the section embedding block
        for p in v2_pages:
            body_sections = p.body_sections
            if not isinstance(body_sections, list):
                continue
            creator_name = creator_map.get(str(p.creator_id), "")
            mock_qdrant.delete_sections_by_page_id(str(p.id))
            section_texts = []
            section_dicts = []
            for section in body_sections:
                if not isinstance(section, dict):
                    continue
                heading = section.get("heading", "")
                if not heading or not heading.strip():
                    continue
                section_anchor = _slugify_heading(heading)
                section_texts.append(f"{creator_name} {p.title} — {heading}")
                section_dicts.append({"page_id": str(p.id), "section_anchor": section_anchor})
            if section_texts:
                vectors = mock_embed.embed(section_texts)
                if vectors:
                    mock_qdrant.upsert_technique_sections(section_dicts, vectors)
        assert call_order[0][0] == "delete"
        assert call_order[1][0] == "upsert"
--- a/backend/pytest.ini
+++ b/backend/pytest.ini
@ -0,0 +1,3 @@
 [pytest]
 asyncio_mode = auto
 testpaths = tests
--- a/backend/rate_limiter.py
+++ b/backend/rate_limiter.py
@ -0,0 +1,116 @@
 """Redis sliding-window rate limiter using sorted sets.
 Each rate limit key is a Redis sorted set where members are unique
 request identifiers (timestamps with microseconds) and scores are
 Unix timestamps. On each check, expired entries are pruned, the
 current request is added, and the count determines whether the
 request is allowed.
 Fail-open: If Redis is unavailable, requests are allowed through
 with a WARNING log.
 """
 from __future__ import annotations
 import logging
 import time
 from dataclasses import dataclass
 import redis.asyncio as aioredis
 logger = logging.getLogger("chrysopedia.rate_limiter")
 _KEY_PREFIX = "chrysopedia:ratelimit"
@dataclass
 class RateLimitResult:
    """Result of a rate limit check."""
    allowed: bool
    remaining: int
    retry_after: int  # seconds until the window slides enough to allow a request; 0 if allowed
 class RateLimiter:
    """Sliding-window rate limiter backed by Redis sorted sets.
    Usage::
        limiter = RateLimiter(redis)
        result = await limiter.check_rate_limit("user:abc123", limit=30, window_seconds=3600)
        if not result.allowed:
            return 429, result.retry_after
    """
    def __init__(self, redis: aioredis.Redis) -> None:
        self._redis = redis
    @staticmethod
    def key(scope: str, identifier: str) -> str:
        """Build a namespaced Redis key for a rate limit bucket."""
        return f"{_KEY_PREFIX}:{scope}:{identifier}"
    async def check_rate_limit(
        self,
        key: str,
        limit: int,
        window_seconds: int = 3600,
    ) -> RateLimitResult:
        """Check whether a request is within the rate limit.
        Uses a sorted set where:
        - ZREMRANGEBYSCORE prunes entries older than the window
        - ZCARD counts current entries
        - ZADD adds the current request if under limit
        Returns a RateLimitResult with allowed/remaining/retry_after.
        On Redis errors, fails open (allowed=True).
        """
        now = time.time()
        window_start = now - window_seconds
        try:
            pipe = self._redis.pipeline(transaction=True)
            # Remove expired entries
            pipe.zremrangebyscore(key, "-inf", window_start)
            # Count remaining entries
            pipe.zcard(key)
            results = await pipe.execute()
            current_count: int = results[1]
            if current_count >= limit:
                # Over limit — calculate retry_after from oldest entry
                oldest = await self._redis.zrange(key, 0, 0, withscores=True)
                if oldest:
                    oldest_score = oldest[0][1]
                    retry_after = int(oldest_score + window_seconds - now) + 1
                    retry_after = max(retry_after, 1)
                else:
                    retry_after = window_seconds
                return RateLimitResult(
                    allowed=False,
                    remaining=0,
                    retry_after=retry_after,
                )
            # Under limit — add this request
            member = f"{now}:{id(key)}"  # unique member per call
            await self._redis.zadd(key, {member: now})
            # Set TTL on the key so it auto-expires after the window
            await self._redis.expire(key, window_seconds + 60)
            remaining = limit - current_count - 1
            return RateLimitResult(
                allowed=True,
                remaining=max(remaining, 0),
                retry_after=0,
            )
        except Exception:
            logger.warning(
                "rate_limit_redis_error key=%s — failing open", key, exc_info=True
            )
            return RateLimitResult(allowed=True, remaining=limit, retry_after=0)
--- a/backend/redis_client.py
+++ b/backend/redis_client.py
@ -0,0 +1,15 @@
 """Async Redis client helper for Chrysopedia."""
 import redis.asyncio as aioredis
 from config import get_settings
 async def get_redis() -> aioredis.Redis:
    """Return an async Redis client from the configured URL.
    Callers should close the connection when done, or use it
    as a short-lived client within a request handler.
    """
    settings = get_settings()
    return aioredis.from_url(settings.redis_url, decode_responses=True)
--- a/backend/requirements.txt
+++ b/backend/requirements.txt
@ -0,0 +1,23 @@
 fastapi>=0.115.0,<1.0
 uvicorn[standard]>=0.32.0,<1.0
 sqlalchemy[asyncio]>=2.0,<3.0
 asyncpg>=0.30.0,<1.0
 alembic>=1.14.0,<2.0
 pydantic>=2.0,<3.0
 pydantic-settings>=2.0,<3.0
 celery[redis]>=5.4.0,<6.0
 redis>=5.0,<6.0
 python-dotenv>=1.0,<2.0
 python-multipart>=0.0.9,<1.0
 httpx>=0.27.0,<1.0
 openai>=1.0,<2.0
 qdrant-client>=1.9,<2.0
 pyyaml>=6.0,<7.0
 psycopg2-binary>=2.9,<3.0
 watchdog>=4.0,<5.0
 PyJWT>=2.8,<3.0
 bcrypt>=4.0,<6.0
 minio>=7.2,<8.0
 # Test dependencies
 pytest>=8.0,<10.0
 pytest-asyncio>=0.24,<1.0
--- a/backend/routers/init.py
+++ b/backend/routers/init.py
@ -0,0 +1 @@
 """Chrysopedia API routers package."""
--- a/backend/routers/admin.py
+++ b/backend/routers/admin.py
@ -0,0 +1,417 @@
 """Admin router — user management, impersonation, and usage analytics."""
 from __future__ import annotations
 import logging
 from datetime import datetime, timedelta, timezone
 from typing import Annotated
 from uuid import UUID
 from fastapi import APIRouter, Depends, HTTPException, Query, Request, status
 from pydantic import BaseModel
 from sqlalchemy import func, select
 from sqlalchemy.ext.asyncio import AsyncSession
 from sqlalchemy.orm import aliased
 from auth import (
    create_impersonation_token,
    decode_access_token,
    get_current_user,
    require_role,
 )
 from database import get_session
 from models import ChatUsageLog, ImpersonationLog, User, UserRole
 logger = logging.getLogger("chrysopedia.admin")
 router = APIRouter(prefix="/admin", tags=["admin"])
 _require_admin = require_role(UserRole.admin)
 # ── Schemas ──────────────────────────────────────────────────────────────────
 class UserListItem(BaseModel):
    id: str
    email: str
    display_name: str
    role: str
    creator_id: str | None
    is_active: bool
    class Config:
        from_attributes = True
 class ImpersonateResponse(BaseModel):
    access_token: str
    token_type: str = "bearer"
    target_user: UserListItem
 class StopImpersonateResponse(BaseModel):
    message: str
 class StartImpersonationRequest(BaseModel):
    write_mode: bool = False
 class ImpersonationLogItem(BaseModel):
    id: str
    admin_name: str
    target_name: str
    action: str
    write_mode: bool
    ip_address: str | None
    created_at: datetime
 # ── Helpers ──────────────────────────────────────────────────────────────────
 def _client_ip(request: Request) -> str | None:
    """Best-effort client IP from X-Forwarded-For or direct connection."""
    forwarded = request.headers.get("x-forwarded-for")
    if forwarded:
        return forwarded.split(",")[0].strip()
    if request.client:
        return request.client.host
    return None
 # ── Endpoints ────────────────────────────────────────────────────────────────
@router.get("/users", response_model=list[UserListItem])
 async def list_users(
    _admin: Annotated[User, Depends(_require_admin)],
    session: Annotated[AsyncSession, Depends(get_session)],
 ):
    """List all users. Admin only."""
    result = await session.execute(
        select(User).order_by(User.display_name)
    )
    users = result.scalars().all()
    return [
        UserListItem(
            id=str(u.id),
            email=u.email,
            display_name=u.display_name,
            role=u.role.value,
            creator_id=str(u.creator_id) if u.creator_id else None,
            is_active=u.is_active,
        )
        for u in users
    ]
@router.post("/impersonate/{user_id}", response_model=ImpersonateResponse)
 async def start_impersonation(
    user_id: UUID,
    request: Request,
    admin: Annotated[User, Depends(_require_admin)],
    session: Annotated[AsyncSession, Depends(get_session)],
    body: StartImpersonationRequest | None = None,
 ):
    """Start impersonating a user. Admin only. Returns a scoped JWT."""
    if body is None:
        body = StartImpersonationRequest()
    # Cannot impersonate yourself
    if admin.id == user_id:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail="Cannot impersonate yourself",
        )
    # Load target user
    result = await session.execute(select(User).where(User.id == user_id))
    target = result.scalar_one_or_none()
    if target is None:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail="Target user not found",
        )
    # Create impersonation token
    token = create_impersonation_token(
        admin_user_id=admin.id,
        target_user_id=target.id,
        target_role=target.role.value,
        write_mode=body.write_mode,
    )
    # Audit log
    session.add(ImpersonationLog(
        admin_user_id=admin.id,
        target_user_id=target.id,
        action="start",
        write_mode=body.write_mode,
        ip_address=_client_ip(request),
    ))
    await session.commit()
    logger.info(
        "Impersonation started: admin=%s target=%s write_mode=%s",
        admin.id, target.id, body.write_mode,
    )
    return ImpersonateResponse(
        access_token=token,
        target_user=UserListItem(
            id=str(target.id),
            email=target.email,
            display_name=target.display_name,
            role=target.role.value,
            creator_id=str(target.creator_id) if target.creator_id else None,
            is_active=target.is_active,
        ),
    )
@router.post("/impersonate/stop", response_model=StopImpersonateResponse)
 async def stop_impersonation(
    request: Request,
    current_user: Annotated[User, Depends(get_current_user)],
    session: Annotated[AsyncSession, Depends(get_session)],
 ):
    """Stop impersonation. Requires a valid impersonation token."""
    admin_id = getattr(current_user, "_impersonating_admin_id", None)
    if admin_id is None:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail="Not currently impersonating",
        )
    # Audit log
    session.add(ImpersonationLog(
        admin_user_id=admin_id,
        target_user_id=current_user.id,
        action="stop",
        ip_address=_client_ip(request),
    ))
    await session.commit()
    logger.info(
        "Impersonation stopped: admin=%s target=%s",
        admin_id, current_user.id,
    )
    return StopImpersonateResponse(message="Impersonation ended")
@router.get("/impersonation-log", response_model=list[ImpersonationLogItem])
 async def get_impersonation_log(
    _admin: Annotated[User, Depends(_require_admin)],
    session: Annotated[AsyncSession, Depends(get_session)],
    page: int = Query(1, ge=1),
    page_size: int = Query(50, ge=1, le=200),
 ):
    """Paginated impersonation audit log. Admin only."""
    AdminUser = aliased(User, name="admin_user")
    TargetUser = aliased(User, name="target_user")
    stmt = (
        select(ImpersonationLog, AdminUser.display_name, TargetUser.display_name)
        .join(AdminUser, ImpersonationLog.admin_user_id == AdminUser.id)
        .join(TargetUser, ImpersonationLog.target_user_id == TargetUser.id)
        .order_by(ImpersonationLog.created_at.desc())
        .offset((page - 1) * page_size)
        .limit(page_size)
    )
    result = await session.execute(stmt)
    rows = result.all()
    return [
        ImpersonationLogItem(
            id=str(log.id),
            admin_name=admin_name,
            target_name=target_name,
            action=log.action,
            write_mode=log.write_mode,
            ip_address=log.ip_address,
            created_at=log.created_at,
        )
        for log, admin_name, target_name in rows
    ]
@router.post("/creators/{slug}/extract-profile")
 async def extract_creator_profile(
    slug: str,
    _admin: Annotated[User, Depends(_require_admin)],
    session: Annotated[AsyncSession, Depends(get_session)],
 ):
    """Queue personality profile extraction for a creator. Admin only."""
    from models import Creator
    result = await session.execute(
        select(Creator).where(Creator.slug == slug)
    )
    creator = result.scalar_one_or_none()
    if creator is None:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail=f"Creator not found: {slug}",
        )
    from pipeline.stages import extract_personality_profile
    extract_personality_profile.delay(str(creator.id))
    logger.info("Queued personality extraction for creator=%s (%s)", slug, creator.id)
    return {"status": "queued", "creator_id": str(creator.id)}
 # ── Usage Analytics ──────────────────────────────────────────────────────────
 class _PeriodStats(BaseModel):
    request_count: int
    total_tokens: int
    prompt_tokens: int
    completion_tokens: int
 class _CreatorUsage(BaseModel):
    creator_slug: str
    request_count: int
    total_tokens: int
 class _UserUsage(BaseModel):
    identifier: str  # display_name or IP
    request_count: int
    total_tokens: int
 class _DailyCount(BaseModel):
    date: str  # ISO date YYYY-MM-DD
    request_count: int
 class UsageStatsResponse(BaseModel):
    today: _PeriodStats
    week: _PeriodStats
    month: _PeriodStats
    top_creators: list[_CreatorUsage]
    top_users: list[_UserUsage]
    daily_counts: list[_DailyCount]
 async def _period_stats(
    session: AsyncSession, since: datetime,
 ) -> _PeriodStats:
    """Aggregate token stats for chat usage since a given timestamp."""
    stmt = select(
        func.count().label("cnt"),
        func.coalesce(func.sum(ChatUsageLog.total_tokens), 0).label("total"),
        func.coalesce(func.sum(ChatUsageLog.prompt_tokens), 0).label("prompt"),
        func.coalesce(func.sum(ChatUsageLog.completion_tokens), 0).label("completion"),
    ).where(ChatUsageLog.created_at >= since)
    row = (await session.execute(stmt)).one()
    return _PeriodStats(
        request_count=row.cnt,
        total_tokens=row.total,
        prompt_tokens=row.prompt,
        completion_tokens=row.completion,
    )
@router.get("/usage", response_model=UsageStatsResponse)
 async def get_usage_stats(
    _admin: Annotated[User, Depends(_require_admin)],
    session: Annotated[AsyncSession, Depends(get_session)],
 ):
    """Aggregated chat usage statistics. Admin only."""
    now = datetime.now(timezone.utc).replace(tzinfo=None)
    today_start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    week_start = today_start - timedelta(days=today_start.weekday())  # Monday
    month_start = today_start.replace(day=1)
    today = await _period_stats(session, today_start)
    week = await _period_stats(session, week_start)
    month = await _period_stats(session, month_start)
    # Top 10 creators by total tokens (this month)
    creator_stmt = (
        select(
            ChatUsageLog.creator_slug,
            func.count().label("cnt"),
            func.coalesce(func.sum(ChatUsageLog.total_tokens), 0).label("total"),
        )
        .where(
            ChatUsageLog.created_at >= month_start,
            ChatUsageLog.creator_slug.isnot(None),
        )
        .group_by(ChatUsageLog.creator_slug)
        .order_by(func.sum(ChatUsageLog.total_tokens).desc())
        .limit(10)
    )
    creator_rows = (await session.execute(creator_stmt)).all()
    top_creators = [
        _CreatorUsage(creator_slug=r.creator_slug, request_count=r.cnt, total_tokens=r.total)
        for r in creator_rows
    ]
    # Top 10 users by request count (this month)
    # Join with users table to get display_name; fall back to IP for anonymous
    user_stmt = (
        select(
            ChatUsageLog.user_id,
            ChatUsageLog.client_ip,
            func.count().label("cnt"),
            func.coalesce(func.sum(ChatUsageLog.total_tokens), 0).label("total"),
        )
        .where(ChatUsageLog.created_at >= month_start)
        .group_by(ChatUsageLog.user_id, ChatUsageLog.client_ip)
        .order_by(func.count().desc())
        .limit(10)
    )
    user_rows = (await session.execute(user_stmt)).all()
    # Resolve user display names
    user_ids = [r.user_id for r in user_rows if r.user_id is not None]
    name_map: dict[str, str] = {}
    if user_ids:
        name_result = await session.execute(
            select(User.id, User.display_name).where(User.id.in_(user_ids))
        )
        for uid, name in name_result.all():
            name_map[str(uid)] = name
    top_users = [
        _UserUsage(
            identifier=name_map.get(str(r.user_id), r.client_ip or "anonymous")
            if r.user_id
            else (r.client_ip or "anonymous"),
            request_count=r.cnt,
            total_tokens=r.total,
        )
        for r in user_rows
    ]
    # Daily request counts for last 7 days
    seven_days_ago = today_start - timedelta(days=6)
    day_col = func.date_trunc("day", ChatUsageLog.created_at).label("day")
    daily_stmt = (
        select(day_col, func.count().label("cnt"))
        .where(ChatUsageLog.created_at >= seven_days_ago)
        .group_by(day_col)
        .order_by(day_col)
    )
    daily_rows = (await session.execute(daily_stmt)).all()
    daily_counts = [
        _DailyCount(date=r.day.strftime("%Y-%m-%d"), request_count=r.cnt)
        for r in daily_rows
    ]
    return UsageStatsResponse(
        today=today,
        week=week,
        month=month,
        top_creators=top_creators,
        top_users=top_users,
        daily_counts=daily_counts,
    )
--- a/backend/routers/auth.py
+++ b/backend/routers/auth.py
@ -0,0 +1,189 @@
 """Auth router — registration, login, profile management."""
 from __future__ import annotations
 import logging
 from datetime import datetime, timezone
 from typing import Annotated
 from fastapi import APIRouter, Depends, HTTPException, status
 from sqlalchemy import select
 from sqlalchemy.ext.asyncio import AsyncSession
 from auth import (
    create_access_token,
    get_current_user,
    hash_password,
    reject_impersonation,
    verify_password,
 )
 from database import get_session
 from models import Creator, InviteCode, User
 from schemas import (
    LoginRequest,
    RegisterRequest,
    TokenResponse,
    UpdateProfileRequest,
    UserResponse,
 )
 logger = logging.getLogger("chrysopedia.auth")
 router = APIRouter(prefix="/auth", tags=["auth"])
 # ── Registration ─────────────────────────────────────────────────────────────
@router.post("/register", response_model=UserResponse, status_code=status.HTTP_201_CREATED)
 async def register(
    body: RegisterRequest,
    session: Annotated[AsyncSession, Depends(get_session)],
 ):
    """Register a new user with a valid invite code."""
    # 1. Validate invite code
    result = await session.execute(
        select(InviteCode).where(InviteCode.code == body.invite_code)
    )
    invite = result.scalar_one_or_none()
    if invite is None:
        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail="Invalid invite code")
    now = datetime.now(timezone.utc).replace(tzinfo=None)
    if invite.expires_at is not None and invite.expires_at < now:
        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail="Invite code has expired")
    if invite.uses_remaining <= 0:
        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail="Invite code exhausted")
    # 2. Check email uniqueness
    existing = await session.execute(select(User).where(User.email == body.email))
    if existing.scalar_one_or_none() is not None:
        raise HTTPException(status_code=status.HTTP_409_CONFLICT, detail="Email already registered")
    # 3. Optionally resolve creator_id from slug
    creator_id = None
    if body.creator_slug:
        creator_result = await session.execute(
            select(Creator).where(Creator.slug == body.creator_slug)
        )
        creator = creator_result.scalar_one_or_none()
        if creator is not None:
            creator_id = creator.id
    # 4. Create user
    user = User(
        email=body.email,
        hashed_password=hash_password(body.password),
        display_name=body.display_name,
        creator_id=creator_id,
    )
    session.add(user)
    # 5. Decrement invite code uses
    invite.uses_remaining -= 1
    await session.commit()
    await session.refresh(user)
    logger.info("User registered: %s (email=%s)", user.id, user.email)
    return user
 # ── Login ────────────────────────────────────────────────────────────────────
@router.post("/login", response_model=TokenResponse)
 async def login(
    body: LoginRequest,
    session: Annotated[AsyncSession, Depends(get_session)],
 ):
    """Authenticate with email + password, return JWT."""
    result = await session.execute(select(User).where(User.email == body.email))
    user = result.scalar_one_or_none()
    if user is None or not verify_password(body.password, user.hashed_password):
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid email or password",
        )
    token = create_access_token(user.id, user.role.value)
    logger.info("User logged in: %s", user.id)
    return TokenResponse(access_token=token)
 # ── Profile ──────────────────────────────────────────────────────────────────
@router.get("/me", response_model=UserResponse)
 async def get_profile(
    current_user: Annotated[User, Depends(get_current_user)],
 ):
    """Return the current user's profile."""
    resp = UserResponse.model_validate(current_user)
    admin_id = getattr(current_user, "_impersonating_admin_id", None)
    if admin_id is not None:
        resp.impersonating = True
    return resp
@router.put("/me", response_model=UserResponse)
 async def update_profile(
    body: UpdateProfileRequest,
    current_user: Annotated[User, Depends(reject_impersonation)],
    session: Annotated[AsyncSession, Depends(get_session)],
 ):
    """Update the current user's display name and/or password."""
    if body.display_name is not None:
        current_user.display_name = body.display_name
    if body.new_password is not None:
        if body.current_password is None:
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail="Current password required to set new password",
            )
        if not verify_password(body.current_password, current_user.hashed_password):
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail="Current password is incorrect",
            )
        current_user.hashed_password = hash_password(body.new_password)
    await session.commit()
    await session.refresh(current_user)
    logger.info("Profile updated: %s", current_user.id)
    return current_user
 # ── Seed ─────────────────────────────────────────────────────────────────────
 async def seed_invite_codes(session: AsyncSession) -> None:
    """Create default invite code if none exist. Call from lifespan or CLI."""
    result = await session.execute(select(InviteCode))
    if result.scalar_one_or_none() is None:
        session.add(InviteCode(
            code="CHRYSOPEDIA-ALPHA-2026",
            uses_remaining=100,
        ))
        await session.commit()
        logger.info("Seeded default invite code: CHRYSOPEDIA-ALPHA-2026")
 # ── Onboarding ───────────────────────────────────────────────────────────────
@router.post("/onboarding-complete", response_model=UserResponse)
 async def complete_onboarding(
    current_user: Annotated[User, Depends(get_current_user)],
    session: Annotated[AsyncSession, Depends(get_session)],
 ):
    """Mark the current user's onboarding as completed."""
    current_user.onboarding_completed = True
    await session.commit()
    await session.refresh(current_user)
    logger.info("Onboarding completed: %s", current_user.id)
    return UserResponse.model_validate(current_user)
--- a/backend/routers/chat.py
+++ b/backend/routers/chat.py
@ -0,0 +1,145 @@
 """Chat endpoint — POST /api/v1/chat with SSE streaming response.
 Accepts a query and optional creator filter, returns a Server-Sent Events
 stream with sources, token, done, and error events.
 Rate limiting: per-user (authenticated), per-IP (anonymous), and per-creator.
 """
 from __future__ import annotations
 import logging
 from fastapi import APIRouter, Depends, Request
 from fastapi.responses import JSONResponse, StreamingResponse
 from pydantic import BaseModel, Field
 from sqlalchemy.ext.asyncio import AsyncSession
 from auth import get_optional_user
 from chat_service import ChatService
 from config import Settings, get_settings
 from database import get_session
 from models import User
 from rate_limiter import RateLimiter
 from redis_client import get_redis
 logger = logging.getLogger("chrysopedia.chat.router")
 router = APIRouter(prefix="/chat", tags=["chat"])
 class ChatRequest(BaseModel):
    """Request body for the chat endpoint."""
    query: str = Field(..., min_length=1, max_length=1000)
    creator: str | None = None
    conversation_id: str | None = None
    personality_weight: float = Field(default=0.0, ge=0.0, le=1.0)
 def _get_client_ip(request: Request) -> str:
    """Extract client IP, preferring X-Forwarded-For behind a reverse proxy."""
    forwarded = request.headers.get("x-forwarded-for")
    if forwarded:
        return forwarded.split(",")[0].strip()
    return request.client.host if request.client else "unknown"
@router.post("", response_model=None)
 async def chat(
    body: ChatRequest,
    request: Request,
    db: AsyncSession = Depends(get_session),
    settings: Settings = Depends(get_settings),
    user: User | None = Depends(get_optional_user),
 ):
    """Stream a chat response as Server-Sent Events.
    Rate limits are checked before processing:
    - Authenticated users: ``rate_limit_user_per_hour`` requests/hour
    - Anonymous (IP-based): ``rate_limit_ip_per_hour`` requests/hour
    - Per-creator (if creator filter set): ``rate_limit_creator_per_hour`` requests/hour
    SSE protocol:
    - ``event: sources`` — citation metadata array (sent first)
    - ``event: token``   — streamed text chunk (repeated)
    - ``event: done``    — completion metadata with cascade_tier, conversation_id
    - ``event: error``   — error message (on failure)
    """
    client_ip = _get_client_ip(request)
    user_id = user.id if user else None
    logger.info(
        "chat_request query=%r creator=%r cid=%r weight=%.2f user=%s ip=%s",
        body.query, body.creator, body.conversation_id,
        body.personality_weight, user_id, client_ip,
    )
    redis = await get_redis()
    # ── Rate limiting ───────────────────────────────────────────────────
    limiter = RateLimiter(redis)
    # User-based limit (authenticated) or IP-based limit (anonymous)
    if user_id:
        identity_key = RateLimiter.key("user", str(user_id))
        identity_limit = settings.rate_limit_user_per_hour
    else:
        identity_key = RateLimiter.key("ip", client_ip)
        identity_limit = settings.rate_limit_ip_per_hour
    result = await limiter.check_rate_limit(identity_key, identity_limit, window_seconds=3600)
    if not result.allowed:
        scope = "user" if user_id else "ip"
        logger.warning(
            "rate_limit_exceeded scope=%s key=%s remaining=%d retry_after=%d",
            scope, identity_key, result.remaining, result.retry_after,
        )
        return JSONResponse(
            status_code=429,
            content={
                "error": "Rate limit exceeded",
                "retry_after": result.retry_after,
            },
            headers={"Retry-After": str(result.retry_after)},
        )
    # Per-creator limit (if creator filter is provided)
    if body.creator:
        creator_key = RateLimiter.key("creator", body.creator)
        creator_result = await limiter.check_rate_limit(
            creator_key, settings.rate_limit_creator_per_hour, window_seconds=3600,
        )
        if not creator_result.allowed:
            logger.warning(
                "rate_limit_exceeded scope=creator key=%s retry_after=%d",
                creator_key, creator_result.retry_after,
            )
            return JSONResponse(
                status_code=429,
                content={
                    "error": "Creator rate limit exceeded",
                    "retry_after": creator_result.retry_after,
                },
                headers={"Retry-After": str(creator_result.retry_after)},
            )
    # ── Stream response ─────────────────────────────────────────────────
    service = ChatService(settings, redis=redis)
    return StreamingResponse(
        service.stream_response(
            query=body.query,
            db=db,
            creator=body.creator,
            conversation_id=body.conversation_id,
            personality_weight=body.personality_weight,
            user_id=user_id,
            client_ip=client_ip,
        ),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "X-Accel-Buffering": "no",
        },
    )
--- a/backend/routers/consent.py
+++ b/backend/routers/consent.py
@ -0,0 +1,322 @@
 """Consent router — per-video consent toggles with versioned audit trail.
 Creator endpoints (ownership-gated):
  GET  /consent/videos              List consent for the current creator's videos
  GET  /consent/videos/{video_id}   Single video consent status
  PUT  /consent/videos/{video_id}   Upsert consent (partial update, audit logged)
  GET  /consent/videos/{video_id}/history  Audit trail for a video
 Admin endpoint:
  GET  /consent/admin/summary       Aggregate consent flag counts
 """
 from __future__ import annotations
 import logging
 import uuid
 from typing import Annotated
 from fastapi import APIRouter, Depends, HTTPException, Query, Request, status
 from sqlalchemy import func, select
 from sqlalchemy.ext.asyncio import AsyncSession
 from sqlalchemy.orm import selectinload
 from auth import get_current_user, reject_impersonation, require_role
 from database import get_session
 from models import (
    ConsentAuditLog,
    ConsentField,
    SourceVideo,
    User,
    UserRole,
    VideoConsent,
 )
 from schemas import (
    ConsentAuditEntry,
    ConsentListResponse,
    ConsentSummary,
    VideoConsentRead,
    VideoConsentUpdate,
 )
 logger = logging.getLogger("chrysopedia.consent")
 router = APIRouter(prefix="/consent", tags=["consent"])
 # ── Helpers ──────────────────────────────────────────────────────────────────
 async def _verify_video_ownership(
    video_id: uuid.UUID,
    user: User,
    session: AsyncSession,
 ) -> SourceVideo:
    """Load a SourceVideo and verify the user owns it (or is admin).
    Returns the SourceVideo on success.
    Raises 403 if user has no creator_id or doesn't own the video.
    Raises 404 if video doesn't exist.
    """
    result = await session.execute(
        select(SourceVideo).where(SourceVideo.id == video_id)
    )
    video = result.scalar_one_or_none()
    if video is None:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail="Video not found",
        )
    # Admin bypasses ownership check
    if user.role == UserRole.admin:
        return video
    if user.creator_id is None:
        raise HTTPException(
            status_code=status.HTTP_403_FORBIDDEN,
            detail="User is not linked to a creator profile",
        )
    if video.creator_id != user.creator_id:
        raise HTTPException(
            status_code=status.HTTP_403_FORBIDDEN,
            detail="You do not own this video",
        )
    return video
 def _consent_to_read(consent: VideoConsent, filename: str) -> VideoConsentRead:
    """Map a VideoConsent ORM instance to the read schema."""
    return VideoConsentRead(
        source_video_id=consent.source_video_id,
        video_filename=filename,
        creator_id=consent.creator_id,
        kb_inclusion=consent.kb_inclusion,
        training_usage=consent.training_usage,
        public_display=consent.public_display,
        updated_at=consent.updated_at,
    )
 # ── Endpoints ────────────────────────────────────────────────────────────────
@router.get("/videos", response_model=ConsentListResponse)
 async def list_video_consents(
    current_user: Annotated[User, Depends(get_current_user)],
    session: Annotated[AsyncSession, Depends(get_session)],
    offset: Annotated[int, Query(ge=0)] = 0,
    limit: Annotated[int, Query(ge=1, le=100)] = 50,
 ):
    """List consent records for the current creator's videos."""
    if current_user.creator_id is None and current_user.role != UserRole.admin:
        raise HTTPException(
            status_code=status.HTTP_403_FORBIDDEN,
            detail="User is not linked to a creator profile",
        )
    stmt = (
        select(VideoConsent)
        .join(SourceVideo, VideoConsent.source_video_id == SourceVideo.id)
        .options(selectinload(VideoConsent.source_video))
    )
    # Non-admin sees only their own videos
    if current_user.role != UserRole.admin:
        stmt = stmt.where(VideoConsent.creator_id == current_user.creator_id)
    # Count
    count_stmt = select(func.count()).select_from(stmt.subquery())
    total = (await session.execute(count_stmt)).scalar() or 0
    # Fetch page
    stmt = stmt.order_by(VideoConsent.updated_at.desc())
    stmt = stmt.offset(offset).limit(limit)
    result = await session.execute(stmt)
    consents = result.scalars().all()
    items = [
        _consent_to_read(c, c.source_video.filename) for c in consents
    ]
    return ConsentListResponse(items=items, total=total)
@router.get("/videos/{video_id}", response_model=VideoConsentRead)
 async def get_video_consent(
    video_id: uuid.UUID,
    current_user: Annotated[User, Depends(get_current_user)],
    session: Annotated[AsyncSession, Depends(get_session)],
 ):
    """Get consent status for a single video."""
    video = await _verify_video_ownership(video_id, current_user, session)
    result = await session.execute(
        select(VideoConsent).where(VideoConsent.source_video_id == video_id)
    )
    consent = result.scalar_one_or_none()
    if consent is None:
        # No consent record yet — return defaults
        return VideoConsentRead(
            source_video_id=video_id,
            video_filename=video.filename,
            creator_id=video.creator_id,
            kb_inclusion=False,
            training_usage=False,
            public_display=True,
            updated_at=video.created_at,
        )
    return _consent_to_read(consent, video.filename)
@router.put("/videos/{video_id}", response_model=VideoConsentRead)
 async def update_video_consent(
    video_id: uuid.UUID,
    body: VideoConsentUpdate,
    current_user: Annotated[User, Depends(reject_impersonation)],
    session: Annotated[AsyncSession, Depends(get_session)],
    request: Request,
 ):
    """Upsert consent for a video. Only non-None fields are changed.
    Creates audit log entries for each changed field with incrementing
    version numbers.
    """
    video = await _verify_video_ownership(video_id, current_user, session)
    # Load or create consent record
    result = await session.execute(
        select(VideoConsent).where(VideoConsent.source_video_id == video_id)
    )
    consent = result.scalar_one_or_none()
    is_new = consent is None
    if is_new:
        consent = VideoConsent(
            source_video_id=video_id,
            creator_id=video.creator_id,
            updated_by=current_user.id,
        )
        session.add(consent)
        await session.flush()  # get consent.id for audit entries
    # Determine the next version number
    max_version_result = await session.execute(
        select(func.coalesce(func.max(ConsentAuditLog.version), 0)).where(
            ConsentAuditLog.video_consent_id == consent.id
        )
    )
    next_version = (max_version_result.scalar() or 0) + 1
    # Collect client IP for audit
    client_ip = request.client.host if request.client else None
    # Apply changes and build audit entries
    fields_changed: list[str] = []
    update_data = body.model_dump(exclude_none=True)
    for field_name, new_value in update_data.items():
        # Validate field name against the enum
        try:
            ConsentField(field_name)
        except ValueError:
            continue
        old_value = getattr(consent, field_name)
        # Skip if no actual change
        if old_value == new_value:
            continue
        # Update the consent record
        setattr(consent, field_name, new_value)
        fields_changed.append(field_name)
        # Create audit entry
        audit_entry = ConsentAuditLog(
            video_consent_id=consent.id,
            version=next_version,
            field_name=field_name,
            old_value=old_value if not is_new else None,
            new_value=new_value,
            changed_by=current_user.id,
            ip_address=client_ip,
        )
        session.add(audit_entry)
        next_version += 1
    if fields_changed:
        consent.updated_by = current_user.id
        await session.commit()
        await session.refresh(consent)
        logger.info(
            "Consent updated: video_id=%s fields_changed=%s user=%s",
            video_id,
            fields_changed,
            current_user.id,
        )
    else:
        # No actual changes — still commit if we created a new record
        if is_new:
            await session.commit()
            await session.refresh(consent)
        else:
            # Nothing changed, no audit entries
            pass
    return _consent_to_read(consent, video.filename)
@router.get("/videos/{video_id}/history", response_model=list[ConsentAuditEntry])
 async def get_consent_history(
    video_id: uuid.UUID,
    current_user: Annotated[User, Depends(get_current_user)],
    session: Annotated[AsyncSession, Depends(get_session)],
 ):
    """Get the audit trail for a video's consent changes."""
    await _verify_video_ownership(video_id, current_user, session)
    # Find the consent record
    result = await session.execute(
        select(VideoConsent).where(VideoConsent.source_video_id == video_id)
    )
    consent = result.scalar_one_or_none()
    if consent is None:
        return []
    # Fetch audit entries ordered by version
    audit_result = await session.execute(
        select(ConsentAuditLog)
        .where(ConsentAuditLog.video_consent_id == consent.id)
        .order_by(ConsentAuditLog.version.asc())
    )
    return audit_result.scalars().all()
@router.get(
    "/admin/summary",
    response_model=ConsentSummary,
    dependencies=[Depends(require_role(UserRole.admin))],
 )
 async def consent_admin_summary(
    session: Annotated[AsyncSession, Depends(get_session)],
 ):
    """Aggregate consent flag counts across all videos (admin only)."""
    result = await session.execute(
        select(
            func.count().label("total"),
            func.sum(VideoConsent.kb_inclusion.cast(int)).label("kb"),
            func.sum(VideoConsent.training_usage.cast(int)).label("tu"),
            func.sum(VideoConsent.public_display.cast(int)).label("pd"),
        )
    )
    row = result.one()
    return ConsentSummary(
        total_videos=row.total or 0,
        kb_inclusion_granted=row.kb or 0,
        training_usage_granted=row.tu or 0,
        public_display_granted=row.pd or 0,
    )
--- a/backend/routers/creator_chapters.py
+++ b/backend/routers/creator_chapters.py
@ -0,0 +1,172 @@
 """Creator chapter management endpoints — review, edit, reorder, approve chapters.
 Auth-guarded endpoints for creators to manage auto-detected chapters for
 their videos before publication.
 """
 import logging
 import uuid
 from typing import Annotated
 from fastapi import APIRouter, Depends, HTTPException
 from sqlalchemy import select, update
 from sqlalchemy.ext.asyncio import AsyncSession
 from auth import get_current_user
 from database import get_session
 from models import ChapterStatus, KeyMoment, SourceVideo, User
 from schemas import (
    ChapterBulkApproveRequest,
    ChapterMarkerRead,
    ChapterReorderRequest,
    ChapterUpdate,
    ChaptersResponse,
 )
 logger = logging.getLogger("chrysopedia.creator_chapters")
 router = APIRouter(prefix="/creator", tags=["creator-chapters"])
 async def _verify_creator_owns_video(
    current_user: User,
    video_id: uuid.UUID,
    db: AsyncSession,
 ) -> None:
    """Verify the user is a creator and owns the specified video."""
    if current_user.creator_id is None:
        raise HTTPException(status_code=403, detail="No creator profile linked")
    video = (await db.execute(
        select(SourceVideo).where(
            SourceVideo.id == video_id,
            SourceVideo.creator_id == current_user.creator_id,
        )
    )).scalar_one_or_none()
    if video is None:
        raise HTTPException(status_code=404, detail="Video not found or not owned by you")
@router.get("/{video_id}/chapters", response_model=ChaptersResponse)
 async def get_creator_chapters(
    video_id: uuid.UUID,
    current_user: Annotated[User, Depends(get_current_user)],
    db: AsyncSession = Depends(get_session),
 ) -> ChaptersResponse:
    """Return all chapters for a creator's video (all statuses)."""
    await _verify_creator_owns_video(current_user, video_id, db)
    stmt = (
        select(KeyMoment)
        .where(KeyMoment.source_video_id == video_id)
        .order_by(KeyMoment.sort_order, KeyMoment.start_time)
    )
    result = await db.execute(stmt)
    moments = result.scalars().all()
    logger.debug("Creator chapters for %s: %d", video_id, len(moments))
    return ChaptersResponse(
        video_id=video_id,
        chapters=[ChapterMarkerRead.model_validate(m) for m in moments],
    )
@router.patch("/chapters/{chapter_id}", response_model=ChapterMarkerRead)
 async def update_chapter(
    chapter_id: uuid.UUID,
    body: ChapterUpdate,
    current_user: Annotated[User, Depends(get_current_user)],
    db: AsyncSession = Depends(get_session),
 ) -> ChapterMarkerRead:
    """Update a single chapter (title, times, status)."""
    if current_user.creator_id is None:
        raise HTTPException(status_code=403, detail="No creator profile linked")
    # Fetch the chapter and verify ownership via the video
    chapter = (await db.execute(
        select(KeyMoment).where(KeyMoment.id == chapter_id)
    )).scalar_one_or_none()
    if chapter is None:
        raise HTTPException(status_code=404, detail="Chapter not found")
    await _verify_creator_owns_video(current_user, chapter.source_video_id, db)
    # Apply partial updates
    update_data = body.model_dump(exclude_unset=True)
    if "chapter_status" in update_data:
        update_data["chapter_status"] = ChapterStatus(update_data["chapter_status"])
    for field, value in update_data.items():
        setattr(chapter, field, value)
    await db.commit()
    await db.refresh(chapter)
    logger.info("Updated chapter %s: %s", chapter_id, list(update_data.keys()))
    return ChapterMarkerRead.model_validate(chapter)
@router.put("/{video_id}/chapters/reorder", response_model=ChaptersResponse)
 async def reorder_chapters(
    video_id: uuid.UUID,
    body: ChapterReorderRequest,
    current_user: Annotated[User, Depends(get_current_user)],
    db: AsyncSession = Depends(get_session),
 ) -> ChaptersResponse:
    """Reorder chapters for a video by setting sort_order values."""
    await _verify_creator_owns_video(current_user, video_id, db)
    for item in body.chapters:
        await db.execute(
            update(KeyMoment)
            .where(KeyMoment.id == item.id, KeyMoment.source_video_id == video_id)
            .values(sort_order=item.sort_order)
        )
    await db.commit()
    # Return updated list
    stmt = (
        select(KeyMoment)
        .where(KeyMoment.source_video_id == video_id)
        .order_by(KeyMoment.sort_order, KeyMoment.start_time)
    )
    result = await db.execute(stmt)
    moments = result.scalars().all()
    logger.info("Reordered %d chapters for video %s", len(body.chapters), video_id)
    return ChaptersResponse(
        video_id=video_id,
        chapters=[ChapterMarkerRead.model_validate(m) for m in moments],
    )
@router.post("/{video_id}/chapters/approve", response_model=ChaptersResponse)
 async def bulk_approve_chapters(
    video_id: uuid.UUID,
    body: ChapterBulkApproveRequest,
    current_user: Annotated[User, Depends(get_current_user)],
    db: AsyncSession = Depends(get_session),
 ) -> ChaptersResponse:
    """Bulk-approve chapters by ID list."""
    await _verify_creator_owns_video(current_user, video_id, db)
    if body.chapter_ids:
        await db.execute(
            update(KeyMoment)
            .where(
                KeyMoment.id.in_(body.chapter_ids),
                KeyMoment.source_video_id == video_id,
            )
            .values(chapter_status=ChapterStatus.approved)
        )
        await db.commit()
        logger.info("Bulk-approved %d chapters for video %s", len(body.chapter_ids), video_id)
    # Return updated list
    stmt = (
        select(KeyMoment)
        .where(KeyMoment.source_video_id == video_id)
        .order_by(KeyMoment.sort_order, KeyMoment.start_time)
    )
    result = await db.execute(stmt)
    moments = result.scalars().all()
    return ChaptersResponse(
        video_id=video_id,
        chapters=[ChapterMarkerRead.model_validate(m) for m in moments],
    )
--- a/Show more
+++ b/Show more