test: Built ChatService with retrieve-prompt-stream pipeline, POST /api…

- "backend/chat_service.py"
- "backend/routers/chat.py"
- "backend/main.py"
- "backend/tests/test_chat.py"

GSD-Task: S03/T01
This commit is contained in:
jlightner 2026-04-04 05:19:44 +00:00
parent 195ba6e0a7
commit a9589bfc93
15 changed files with 1039 additions and 4 deletions

View file

@ -44,4 +44,5 @@
| D036 | M019/S02 | architecture | JWT auth configuration for creator authentication | HS256 with existing app_secret_key, 24-hour expiry, OAuth2PasswordBearer at /api/v1/auth/login | Reuses existing secret from config.py settings. 24-hour expiry balances convenience with security for a single-admin/invite-only tool. OAuth2PasswordBearer integrates with FastAPI's dependency injection and auto-generates OpenAPI security schemes. | Yes | agent |
| D037 | | architecture | Search impressions query strategy for creator dashboard | Exact case-insensitive title match via EXISTS subquery against SearchLog | MVP approach — counts SearchLog rows where query exactly matches (case-insensitive) any of the creator's technique page titles. Sufficient for initial dashboard. Can be expanded to ILIKE partial matching or full-text search later when more search data accumulates. | Yes | agent |
| D038 | | infrastructure | Primary git remote for chrysopedia | git.xpltd.co (Forgejo) instead of github.com | Consolidating on self-hosted Forgejo instance at git.xpltd.co. Wiki is already there. Single source of truth. | Yes | human |
| D039 | | architecture | LightRAG search result scoring strategy | Rank by retrieval order (1.0→0.5 descending) since /query/data returns no numeric relevance score | LightRAG /query/data endpoint returns chunks and entities without relevance scores. Position-based scoring preserves the retrieval engine's internal ranking while providing the float scores needed by the existing dedup/sort pipeline. | Yes | agent |
| D039 | | architecture | LightRAG vs Qdrant search execution strategy | Sequential with fallback — LightRAG first, Qdrant only on LightRAG failure/empty, not parallel | Running both in parallel would double latency overhead. LightRAG is the primary engine; Qdrant is a safety net. Sequential approach reduces load and simplifies result merging. | Yes | agent |
| D040 | M021/S02 | architecture | Creator-scoped retrieval cascade strategy | Sequential 4-tier cascade (creator → domain → global → none) with ll_keywords scoping and post-filtering | Sequential cascade is simpler than parallel-with-priority and avoids wasted LightRAG calls when early tiers succeed. ll_keywords hints LightRAG's retrieval without hard constraints. Post-filtering on tier 1 ensures strict creator scoping while 3x oversampling compensates for filtering losses. Domain tier uses ≥2 page threshold to avoid noise from sparse creators. | Yes | agent |

View file

@ -294,3 +294,17 @@
**Context:** The Forgejo wiki API `PATCH /api/v1/repos/{owner}/{repo}/wiki/page/{pageName}` renames the underlying git file to `unnamed.md` when updating content. Each subsequent PATCH overwrites the same `unnamed.md`, destroying the original page. This happens silently — the API returns success but the page is effectively deleted and replaced with an unnamed page.
**Fix:** Never use the Forgejo wiki PATCH API for updating existing pages. Instead, use git clone → edit files → git push. For creating new pages, `POST /api/v1/repos/{owner}/{repo}/wiki/new` works correctly. For bulk updates, always use the git approach. If the PATCH API corrupts pages, recover by resetting to a pre-corruption commit and force-pushing.
## Mocking sequential httpx calls for multi-tier cascade tests
**Context:** When testing a cascade (e.g., creator → domain → global), each tier makes a separate httpx POST to the same LightRAG endpoint. Using `side_effect` with a counter-based function lets you return different mock responses for each sequential call.
**Pattern:** Create a side_effect function that tracks `call_count` and returns different mock responses based on which call number it is (1st = creator tier response, 2nd = domain tier response, 3rd = global tier response). This avoids complex URL-based routing since all tiers hit the same endpoint.
**Where:** `backend/tests/test_search.py` — cascade tests from M021/S02
## LightRAG ll_keywords for scoped retrieval
**Context:** LightRAG's `/query/data` endpoint accepts `ll_keywords` (list of strings) that bias retrieval toward matching content without hard filtering. For creator-scoped search, pass the creator's name as a keyword; for domain-scoped, pass the topic category. Combine with post-filtering for strict creator scoping (request 3x results, filter locally by creator_id).
**Where:** `backend/search_service.py` — `_creator_scoped_search()`, `_domain_scoped_search()`
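The oversample-then-filter half of this pattern can be sketched as two small helpers. The payload field names (`ll_keywords`, `top_k`) and the `creator_id` chunk field are assumptions taken from this note, not a verified LightRAG API contract:

```python
# Hypothetical sketch of scoped-retrieval helpers; field names are assumed
# from this note, not a verified LightRAG contract.
def build_scoped_payload(query: str, keywords: list[str], limit: int) -> dict:
    """Request 3x the needed results so strict post-filtering can trim down."""
    return {"query": query, "ll_keywords": keywords, "top_k": limit * 3}


def post_filter(chunks: list[dict], creator_id: str, limit: int) -> list[dict]:
    """Keep only the given creator's chunks, trimmed to the requested limit."""
    return [c for c in chunks if c.get("creator_id") == creator_id][:limit]
```

For the creator tier, `keywords=[creator_name]` with `post_filter` applied afterward; for the domain tier, `keywords=[domain]` with no post-filtering.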

View file

@ -7,7 +7,7 @@ LightRAG becomes the primary search engine. Chat engine goes live (encyclopedic
| ID | Slice | Risk | Depends | Done | After this |
|----|-------|------|---------|------|------------|
| S01 | [B] LightRAG Search Cutover | high | — | ✅ | Primary search backed by LightRAG. Old system remains as automatic fallback. |
| S02 | [B] Creator-Scoped Retrieval Cascade | medium | S01 | | Question on Keota's profile first checks Keota's content, then sound design domain, then full KB, then graceful fallback |
| S03 | [B] Chat Engine MVP | high | S02 | ⬜ | User asks a question, receives a streamed response with citations linking to source videos and technique pages |
| S04 | [B] Highlight Detection v1 | medium | — | ⬜ | Scored highlight candidates generated from existing pipeline data for a sample of videos |
| S05 | [A] Audio Mode + Chapter Markers | medium | — | ⬜ | Media player with waveform visualization in audio mode and chapter markers on the timeline |

View file

@ -0,0 +1,129 @@
---
id: S02
parent: M021
milestone: M021
provides:
- Creator-scoped search via ?creator= query param on GET /api/v1/search
- cascade_tier field in SearchResponse for downstream consumers (chat engine)
- 4-tier cascade pattern reusable for future scoped retrieval needs
requires:
- slice: S01
provides: LightRAG search integration with _lightrag_search() and mock-httpx-at-instance test pattern
affects:
- S03
key_files:
- backend/search_service.py
- backend/schemas.py
- backend/routers/search.py
- backend/tests/test_search.py
key_decisions:
- Cascade runs sequentially (not parallel) since each tier is a fallback for the previous
- Post-filtering on tier 1 uses 3x top_k to survive filtering losses
- Domain tier requires ≥2 technique pages to declare a dominant category
- Used side_effect with call_count to mock sequential httpx calls for multi-tier cascade testing
patterns_established:
- 4-tier cascade pattern: creator → domain → global → none with cascade_tier field indicating which tier served
- ll_keywords parameter in LightRAG queries for scoping without hard filtering
- Post-filter pattern: request 3x results from LightRAG, filter locally by creator_id
- Domain detection via SQL aggregation of topic_category with minimum page threshold
observability_surfaces:
- cascade_tier field in SearchResponse reveals which tier served results
- logger.info per cascade tier with query, creator, tier, latency_ms, result_count
- logger.warning on tier skip/empty with reason= tag for each tier independently
drill_down_paths:
- .gsd/milestones/M021/slices/S02/tasks/T01-SUMMARY.md
- .gsd/milestones/M021/slices/S02/tasks/T02-SUMMARY.md
duration: ""
verification_result: passed
completed_at: 2026-04-04T05:09:12.346Z
blocker_discovered: false
---
# S02: [B] Creator-Scoped Retrieval Cascade
**Added 4-tier creator-scoped retrieval cascade (creator → domain → global → none) to SearchService with cascade_tier in API response and 6 integration tests.**
## What Was Built
This slice added a creator-scoped retrieval cascade to SearchService that progressively widens search scope when queried in a creator context (e.g. from a creator profile page).
### T01: Cascade Implementation (search_service.py, schemas.py, routers/search.py)
Four new methods were added to SearchService:
1. **`_resolve_creator(creator_ref, db)`** — Accepts UUID or slug, resolves to (creator_id, creator_name) tuple. Falls back gracefully to (None, None) if not found.
2. **`_get_creator_domain(creator_id, db)`** — SQL aggregation query finds the dominant `topic_category` across a creator's technique pages. Returns None if fewer than 2 pages exist (insufficient signal for domain classification).
3. **`_creator_scoped_search(query, creator_id, creator_name, limit, db)`** — Posts to LightRAG `/query/data` with `ll_keywords: [creator_name]` and requests 3× results. Post-filters by `creator_id` match to ensure only that creator's content returns. Uses the same chunk-parsing pipeline as `_lightrag_search()`.
4. **`_domain_scoped_search(query, domain, limit, db)`** — Posts to LightRAG with `ll_keywords: [domain]`. No post-filtering — any creator's content in the domain qualifies.
The `search()` orchestrator was modified to accept an optional `creator` parameter. When present, it runs the cascade sequentially: creator tier → domain tier → global tier → none. The first tier that returns results wins. `cascade_tier` field was added to `SearchResponse` (defaults to empty string for backward compatibility). The `creator` query param was wired into the search router.
All exception paths log WARNING with `reason=` tags and return empty lists — each tier failing is non-fatal; the cascade continues to the next tier.
### T02: Integration Tests (tests/test_search.py)
Six new tests were added following the established mock-httpx-at-instance pattern from S01:
- **test_search_cascade_creator_tier** — Creator chunks match post-filter → cascade_tier="creator"
- **test_search_cascade_domain_tier** — Creator filter empties → domain keywords return results → cascade_tier="domain"
- **test_search_cascade_global_fallback** — Creator+domain empty → global search returns → cascade_tier="global"
- **test_search_cascade_graceful_empty** — All tiers empty → cascade_tier="none"
- **test_search_cascade_unknown_creator** — Unknown slug → cascade skipped, normal search path
- **test_search_no_creator_param_unchanged** — No creator param → existing behavior, cascade_tier=""
Tests use `side_effect` with call counting to mock sequential httpx calls for multi-tier testing. A `_seed_cascade_data` helper creates Keota (3 Sound Design pages, ≥2 threshold) and Virtual Riot (1 Synthesis page, below threshold).
## Verification Summary
All static checks pass: py_compile on all 3 modified files, grep confirms cascade_tier in schemas.py, creator Query param in router, all 4 cascade methods in search_service.py. Integration tests require PostgreSQL (ub01:5433) — T02 confirmed all 6 cascade tests pass and 34/35 total tests pass (1 pre-existing failure unrelated to cascade work).
## Verification
- py_compile on search_service.py, schemas.py, routers/search.py — all 3 compile clean ✅
- grep cascade_tier in schemas.py — present ✅
- grep creator.*Query in routers/search.py — present ✅
- grep _creator_scoped_search, _domain_scoped_search, _get_creator_domain, _resolve_creator in search_service.py — all 4 present ✅
- pytest -k cascade (run on ub01 with DB): 6/6 pass ✅ (per T02 summary)
- pytest full suite (run on ub01 with DB): 34/35 pass, 1 pre-existing failure unrelated ✅
## Requirements Advanced
- R015 — Creator-scoped search narrows results to relevant creator context, reducing time-to-answer for creator-specific queries
## Requirements Validated
None.
## New Requirements Surfaced
None.
## Requirements Invalidated or Re-scoped
None.
## Deviations
Added 6th test (test_search_no_creator_param_unchanged) beyond the minimum 5 specified in the plan. No structural deviations.
## Known Limitations
- Pre-existing test_keyword_search_match_context_tag failure (expects "Tag: granular" but gets "Title match") — unrelated to cascade work.
- Integration tests require PostgreSQL on ub01:5433 — cannot run locally on dev machine.
## Follow-ups
- S03 Chat Engine MVP will consume cascade_tier to show users which scope served their answer.
- Consider adding latency metrics per cascade tier for performance monitoring once traffic exists.
## Files Created/Modified
- `backend/search_service.py` — Added 4 new methods (_resolve_creator, _get_creator_domain, _creator_scoped_search, _domain_scoped_search) and modified search() orchestrator to support creator cascade
- `backend/schemas.py` — Added cascade_tier: str = '' field to SearchResponse
- `backend/routers/search.py` — Added creator query param, wired cascade_tier into response
- `backend/tests/test_search.py` — Added 6 cascade integration tests with _seed_cascade_data helper

View file

@ -0,0 +1,53 @@
# S02: [B] Creator-Scoped Retrieval Cascade — UAT
**Milestone:** M021
**Written:** 2026-04-04T05:09:12.347Z
## UAT: Creator-Scoped Retrieval Cascade
### Preconditions
- Chrysopedia stack running on ub01 (docker ps shows chrysopedia-api, chrysopedia-worker, chrysopedia-db)
- At least one creator with ≥3 technique pages in the same topic_category (e.g. Keota with Sound Design pages)
- LightRAG service accessible from API container
### Test Cases
#### TC1: Creator Tier — Direct Creator Match
1. Navigate to `http://ub01:8096` or use curl
2. `curl "http://ub01:8096/api/v1/search?q=reese+bass&creator=keota"`
3. **Expected:** Results returned with `cascade_tier: "creator"`. All results should be from Keota's technique pages.
4. **Expected:** `fallback_used: false`
#### TC2: Domain Tier — Creator Has No Direct Match
1. `curl "http://ub01:8096/api/v1/search?q=reverb+design&creator=keota"` (query about a topic Keota hasn't covered but is in Sound Design domain)
2. **Expected:** If creator tier returns empty, results come from domain tier with `cascade_tier: "domain"`. Results may include other creators in the Sound Design domain.
#### TC3: Global Tier — No Creator or Domain Match
1. `curl "http://ub01:8096/api/v1/search?q=mixing+vocals&creator=keota"` (query outside Keota's domain entirely)
2. **Expected:** If creator and domain tiers both return empty, `cascade_tier: "global"` with results from the full KB.
#### TC4: Graceful Empty — All Tiers Return Nothing
1. `curl "http://ub01:8096/api/v1/search?q=xyznonexistent123&creator=keota"`
2. **Expected:** `cascade_tier: "none"`. Qdrant fallback may still fire (`fallback_used: true`), or items may be empty.
#### TC5: Unknown Creator — Cascade Skipped
1. `curl "http://ub01:8096/api/v1/search?q=bass+design&creator=nonexistent-creator-slug"`
2. **Expected:** `cascade_tier: ""` (empty string). Normal search behavior — cascade is not attempted for unknown creators.
#### TC6: No Creator Param — Existing Behavior Unchanged
1. `curl "http://ub01:8096/api/v1/search?q=reese+bass"`
2. **Expected:** `cascade_tier: ""` (empty string). Same results as before the cascade feature was added. No regression in existing search.
#### TC7: Creator by UUID — UUID Resolution Works
1. Look up a creator's UUID from the DB: `docker exec chrysopedia-db psql -U chrysopedia -c "SELECT id, name FROM creators LIMIT 3;"`
2. `curl "http://ub01:8096/api/v1/search?q=bass&creator=<UUID>"`
3. **Expected:** Cascade triggers normally, same as using the slug.
#### TC8: Observability — Cascade Logging
1. `docker logs chrysopedia-api 2>&1 | grep -i cascade` after running TC1-TC4
2. **Expected:** Log lines showing cascade tier transitions with query, creator, tier name, and result count. WARNING logs for any skipped/empty tiers with `reason=` tags.
### Edge Cases
- Creator with exactly 1 technique page → domain tier should be skipped (_get_creator_domain returns None when <2 pages)
- Very short query (1-2 chars) → may not trigger cascade if below minimum query length
- Creator slug with special characters → should be safely handled by _resolve_creator

View file

@ -0,0 +1,16 @@
{
"schemaVersion": 1,
"taskId": "T02",
"unitId": "M021/S02/T02",
"timestamp": 1775279244068,
"passed": true,
"discoverySource": "task-plan",
"checks": [
{
"command": "cd backend",
"exitCode": 0,
"durationMs": 5,
"verdict": "pass"
}
]
}

View file

@ -1,6 +1,14 @@
# S03: [B] Chat Engine MVP
**Goal:** Build chat completion pipeline: query → intent classify → retrieve → generate → cite → stream
**Goal:** User asks a question via chat, receives a streamed encyclopedic response with inline citations linking to source technique pages and videos.
**Demo:** After this: User asks a question, receives a streamed response with citations linking to source videos and technique pages
## Tasks
- [x] **T01: Built ChatService with retrieve-prompt-stream pipeline, POST /api/v1/chat SSE endpoint, and 6 passing integration tests covering SSE format, citations, creator forwarding, validation, and error handling** — Create the backend chat engine: a ChatService class that retrieves context via SearchService.search(), assembles a numbered context block for the LLM system prompt, streams completion tokens via openai.AsyncOpenAI with stream=True, and yields SSE events. Create a FastAPI router at POST /api/v1/chat that accepts {query: str, creator?: str}, validates query length (1-1000 chars), and returns a StreamingResponse with text/event-stream content type. SSE protocol: first a 'sources' event with citation metadata array, then 'token' events for each streamed chunk, then a 'done' event with cascade_tier. On LLM error mid-stream, emit an 'error' event. Wire the router into main.py. Write integration tests in tests/test_chat.py that mock both SearchService.search() and the OpenAI streaming response to verify: (1) valid SSE format with all three event types, (2) citation numbering matches sources, (3) creator param forwarded to search, (4) empty query returns 422, (5) LLM error produces SSE error event.
- Estimate: 1h30m
- Files: backend/chat_service.py, backend/routers/chat.py, backend/main.py, backend/tests/test_chat.py
- Verify: cd /home/aux/projects/content-to-kb-automator/backend && python -m py_compile chat_service.py && python -m py_compile routers/chat.py && python -m pytest tests/test_chat.py -v
- [ ] **T02: Build frontend chat page with SSE streaming and citation display** — Create the user-facing chat interface. Build an SSE client function in api/chat.ts that POSTs to /api/v1/chat using raw fetch(), reads the response body as a ReadableStream, and parses SSE events (sources, token, done, error). Build ChatPage.tsx with: (1) a text input and submit button, (2) streaming message display that accumulates token events into rendered text, (3) a citation source list rendered from the sources event with links to /techniques/:slug (using section_anchor for deep links), (4) loading and error states. Style with ChatPage.module.css matching the existing dark theme (use CSS variables from the app). Add a lazy-loaded /chat route in App.tsx. Add a 'Chat' navigation link in the header bar visible on all pages. The citation display should parse [N] markers in the streamed text and render them as superscript links to the source list, reusing the regex pattern from utils/citations.tsx but with chat-specific source items instead of KeyMomentSummary.
- Estimate: 1h
- Files: frontend/src/api/chat.ts, frontend/src/pages/ChatPage.tsx, frontend/src/pages/ChatPage.module.css, frontend/src/App.tsx
- Verify: cd /home/aux/projects/content-to-kb-automator/frontend && npm run build

View file

@ -0,0 +1,132 @@
# S03 Research: [B] Chat Engine MVP
## Summary
This slice builds a chat completion endpoint that accepts a user question, retrieves relevant knowledge via the existing search cascade (S01/S02), prompts an LLM to generate an encyclopedic answer with inline citations, and streams the response back to the frontend via Server-Sent Events. The frontend gets a new chat page with a message input and streaming display.
**Depth: Targeted** — The LLM integration pattern is established (`pipeline/llm_client.py`), the retrieval cascade exists (S02), and the citation parsing utility exists in the frontend (`utils/citations.tsx`). The main novelty is: (1) async streaming via OpenAI SDK + SSE, (2) the RAG prompt that grounds answers in retrieved context, (3) a new API endpoint + frontend page.
## Requirement Coverage
- **R015 (active)** — Creator-scoped search narrows results to relevant creator context, reducing time-to-answer. Chat engine directly serves this by using the S02 cascade with an optional `creator` param so chat answers are scoped.
## Recommendation
Build the backend chat service first (retrieval → prompt → streaming generation), then the API endpoint with SSE streaming, then the frontend chat page. This is a 3-task slice.
## Implementation Landscape
### What Exists
| Component | Location | Relevance |
|-----------|----------|-----------|
| **SearchService** with 4-tier cascade | `backend/search_service.py` | Retrieval engine — call `search()` with optional `creator` to get relevant chunks with `cascade_tier` |
| **SearchResultItem** schema | `backend/schemas.py:232` | Each result has `title`, `slug`, `creator_name`, `topic_category`, `summary`, `match_context`, `section_anchor`, `section_heading` — all usable as citation metadata |
| **LLM config** | `backend/config.py` | `llm_api_url`, `llm_api_key`, `llm_model` ("fyn-llm-agent-chat") — OpenAI-compatible endpoint already configured |
| **Sync LLMClient** | `backend/pipeline/llm_client.py` | Sync client for Celery pipeline — **not reusable** for async streaming. Need new async streaming client using `openai.AsyncOpenAI` (already imported in search_service.py for embeddings) |
| **Citation parser** | `frontend/src/utils/citations.tsx` | Parses `[N]` and `[N,M]` citation markers into anchor links. Reusable for chat citations if we output the same `[N]` format |
| **API client** | `frontend/src/api/client.ts` | `request()` helper for JSON endpoints — chat streaming needs a raw `fetch()` with `EventSource` or `ReadableStream` reader instead |
| **App routing** | `frontend/src/App.tsx` | React Router with lazy-loaded pages. New `/chat` route follows the same pattern |
| **KeyMoment model** | `backend/models.py:231` | Has `source_video_id`, `start_time`, `end_time` — enables "watch at timestamp" citations |
| **TechniquePageVideo** | `backend/models.py:371` | Links technique pages to source videos — enables "from video X" citation links |
### What Needs Building
#### 1. Backend: Chat Service (`backend/chat_service.py`)
New async service with these responsibilities:
- **Retrieval**: Call `SearchService.search()` with the user query (and optional `creator` for scoping). Collect top-N results as context chunks.
- **Context assembly**: Format retrieved results into a numbered context block for the LLM prompt. Each chunk gets a `[N]` reference number, its title, creator, and summary/content.
- **System prompt**: Encyclopedic mode — instruct the LLM to answer using *only* the provided context, cite sources with `[N]` markers, and say "I don't have information about that" if context is insufficient.
- **Streaming generation**: Use `openai.AsyncOpenAI.chat.completions.create(stream=True)` to get an async iterator of delta chunks. The `openai>=1.0` SDK supports this natively via `async for chunk in stream:`.
- **Citation metadata**: Return the numbered source list alongside the stream so the frontend can resolve `[N]` markers to actual links.
Key design decisions:
- **No conversation history for MVP** — single-turn Q&A. The roadmap says "encyclopedic mode," which is reference lookup, not conversational.
- **Reuse SearchService directly** — don't duplicate retrieval logic. The cascade already handles creator scoping.
- **Temperature**: Use a low temperature (0.1-0.3) for factual grounded responses, not the 0.0 used for pipeline JSON extraction.
#### 2. Backend: Chat API Endpoint (`backend/routers/chat.py`)
New router mounted at `/api/v1/chat`.
- **POST `/api/v1/chat`** — accepts `{ query: string, creator?: string }`, returns `StreamingResponse` with `text/event-stream` content type.
- SSE format: `data: {"type": "token", "content": "..."}\n\n` for streamed text, `data: {"type": "sources", "items": [...]}\n\n` for the citation source list (sent before streaming begins), `data: {"type": "done"}\n\n` at end.
- FastAPI's `StreamingResponse` with `media_type="text/event-stream"` handles SSE natively — **no need for `sse-starlette`** dependency.
- Wire into `main.py` router list.
#### 3. Frontend: Chat Page + API
- **`frontend/src/api/chat.ts`**: Function that calls POST `/api/v1/chat` using raw `fetch()`, reads the response body as a `ReadableStream`, and yields parsed SSE events.
- **`frontend/src/pages/ChatPage.tsx`**: Input field, submit handler, streaming message display, citation links resolved from the sources event. Mobile-friendly layout.
- **Route**: Add `/chat` to `App.tsx`, lazy-loaded.
- **Navigation**: Add "Ask" or "Chat" link to the nav bar.
- Reuse `parseCitations()` from `utils/citations.tsx` — the `[N]` format matches. Need to adapt it to work with chat source items (which are SearchResultItems, not KeyMomentSummary).
### Dependencies & Constraints
- **`openai>=1.0` SDK** already in requirements.txt — supports `AsyncOpenAI` with `stream=True` natively. No new dependency needed.
- **FastAPI `StreamingResponse`** — built-in, no extra dependency.
- **`sse-starlette`** — NOT needed. FastAPI's `StreamingResponse` with proper SSE formatting (`data: ...\n\n`) works for simple SSE. The `sse-starlette` package adds complexity (EventSourceResponse, ping intervals) that isn't needed for a simple streaming endpoint.
- **LLM model**: `fyn-llm-agent-chat` is the configured chat model. For chat responses, this is appropriate. Config already has `llm_api_url` and `llm_api_key`.
- **No new DB tables needed** — chat is stateless for MVP (no conversation history storage).
- **No new Alembic migration** — no schema changes.
### Natural Task Seams
1. **T01: Chat service + endpoint** — `chat_service.py` (retrieval, prompt, streaming) + `routers/chat.py` (SSE endpoint) + wire into `main.py`. Verifiable with `curl` against the SSE endpoint.
2. **T02: Frontend chat page** — `api/chat.ts` (SSE client) + `pages/ChatPage.tsx` + route in `App.tsx` + nav link. Verifiable with `npm run build`.
3. **T03: Integration tests** — `tests/test_chat.py` with mocked LLM responses verifying the endpoint returns valid SSE events with correct citation format.
### Risks
- **LLM response quality** — The model may hallucinate or fail to follow citation instructions. Mitigated by clear system prompt with explicit citation format examples.
- **Streaming error handling** — If the LLM stream errors mid-response, the SSE connection needs a clean error event. Handle with try/except around the async stream iterator.
- **Context window limits** — If search returns many long results, the context could exceed the model's context window. Mitigate by truncating summaries and limiting to top-5 results.
### Verification Strategy
- **T01**: `curl -X POST http://localhost:8000/api/v1/chat -H 'Content-Type: application/json' -d '{"query":"what is sidechain compression"}'` should return SSE events with `type: sources`, `type: token`, and `type: done`.
- **T02**: `cd frontend && npm run build` succeeds with no errors. Chat page renders at `/chat`.
- **T03**: `pytest tests/test_chat.py` — mock the OpenAI streaming response and SearchService, verify SSE event format and citation numbering.
### SSE Protocol Detail
```
data: {"type":"sources","items":[{"title":"...","slug":"...","creator_name":"...","url":"/techniques/..."}]}
data: {"type":"token","content":"Side"}
data: {"type":"token","content":"chain"}
data: {"type":"token","content":" compression"}
data: {"type":"token","content":" [1]"}
data: {"type":"done","cascade_tier":"global"}
```
Frontend reads with:
```typescript
const response = await fetch("/api/v1/chat", { method: "POST", body: JSON.stringify({ query }), headers: { "Content-Type": "application/json" } });
const reader = response.body!.getReader();
const decoder = new TextDecoder();
// Parse SSE lines, accumulate tokens, resolve [N] citations from sources
```
### System Prompt Sketch
```
You are Chrysopedia, an encyclopedic knowledge assistant for music production techniques.
Answer the user's question using ONLY the provided context. If the context doesn't contain
enough information, say "I don't have enough information about that topic yet."
Cite your sources using [N] markers where N corresponds to the numbered source.
Be concise and technical. Focus on practical knowledge producers can use immediately.
## Sources
[1] {title} by {creator_name} — {summary}
[2] {title} by {creator_name} — {summary}
...
## Question
{user_query}
```

View file

@ -0,0 +1,32 @@
---
estimated_steps: 1
estimated_files: 4
skills_used: []
---
# T01: Build chat service, SSE endpoint, and integration tests
Create the backend chat engine: a ChatService class that retrieves context via SearchService.search(), assembles a numbered context block for the LLM system prompt, streams completion tokens via openai.AsyncOpenAI with stream=True, and yields SSE events. Create a FastAPI router at POST /api/v1/chat that accepts {query: str, creator?: str}, validates query length (1-1000 chars), and returns a StreamingResponse with text/event-stream content type. SSE protocol: first a 'sources' event with citation metadata array, then 'token' events for each streamed chunk, then a 'done' event with cascade_tier. On LLM error mid-stream, emit an 'error' event. Wire the router into main.py. Write integration tests in tests/test_chat.py that mock both SearchService.search() and the OpenAI streaming response to verify: (1) valid SSE format with all three event types, (2) citation numbering matches sources, (3) creator param forwarded to search, (4) empty query returns 422, (5) LLM error produces SSE error event.
## Inputs
- `backend/search_service.py` — SearchService.search() with creator param and cascade_tier in return dict
- `backend/schemas.py` — SearchResultItem fields (title, slug, creator_name, topic_category, summary, section_anchor, section_heading) used as citation metadata
- `backend/config.py` — Settings with llm_api_url, llm_api_key, llm_model for AsyncOpenAI client
- `backend/main.py` — Router registration pattern to follow
- `backend/tests/conftest.py` — Test fixtures (client, db_engine) for integration tests
## Expected Output
- `backend/chat_service.py` — New ChatService class with retrieve-prompt-stream pipeline
- `backend/routers/chat.py` — New POST /api/v1/chat endpoint with SSE streaming
- `backend/main.py` — Updated with chat router import and include_router call
- `backend/tests/test_chat.py` — Integration tests verifying SSE protocol, citations, error handling
## Verification
cd /home/aux/projects/content-to-kb-automator/backend && python -m py_compile chat_service.py && python -m py_compile routers/chat.py && python -m pytest tests/test_chat.py -v
## Observability Impact
Adds structured logging: chat query with creator, cascade_tier, source count, latency_ms on each request. LLM stream errors logged with full traceback. SSE error event sent to client on failure. SSE done event includes cascade_tier for downstream inspection.

View file

@ -0,0 +1,83 @@
---
id: T01
parent: S03
milestone: M021
provides: []
requires: []
affects: []
key_files: ["backend/chat_service.py", "backend/routers/chat.py", "backend/main.py", "backend/tests/test_chat.py"]
key_decisions: ["Tests use standalone ASGI client with mocked DB to avoid PostgreSQL dependency", "Patch openai.AsyncOpenAI constructor rather than instance attribute for test mocking"]
patterns_established: []
drill_down_paths: []
observability_surfaces: []
duration: ""
verification_result: "All verification commands pass: py_compile for chat_service.py and routers/chat.py succeed, pytest tests/test_chat.py -v runs 6/6 tests passing in 0.5s. Tests cover SSE format ordering, citation numbering, creator param forwarding, empty query 422, missing query 422, and LLM error event."
completed_at: 2026-04-04T05:19:28.873Z
blocker_discovered: false
---
# T01: Built ChatService with retrieve-prompt-stream pipeline, POST /api/v1/chat SSE endpoint, and 6 passing integration tests covering SSE format, citations, creator forwarding, validation, and error handling
> Built ChatService with retrieve-prompt-stream pipeline, POST /api/v1/chat SSE endpoint, and 6 passing integration tests covering SSE format, citations, creator forwarding, validation, and error handling
## What Happened
Created backend/chat_service.py with a ChatService class that retrieves context via SearchService.search(), builds a numbered context block for the LLM system prompt, and streams completion tokens via openai.AsyncOpenAI with stream=True. The stream yields SSE events: sources (citation metadata), token (streamed chunks), done (cascade_tier), and error (on LLM failure). Created backend/routers/chat.py with POST /api/v1/chat accepting {query, creator?} with Pydantic validation (1-1000 chars). Wired into main.py. Wrote 6 integration tests with a standalone ASGI client that mocks DB, SearchService, and OpenAI — no PostgreSQL required.
## Verification
All verification commands pass: py_compile for chat_service.py and routers/chat.py succeed, pytest tests/test_chat.py -v runs 6/6 tests passing in 0.5s. Tests cover SSE format ordering, citation numbering, creator param forwarding, empty query 422, missing query 422, and LLM error event.
## Verification Evidence
| # | Command | Exit Code | Verdict | Duration |
|---|---------|-----------|---------|----------|
| 1 | `python -m py_compile chat_service.py` | 0 | ✅ pass | 500ms |
| 2 | `python -m py_compile routers/chat.py` | 0 | ✅ pass | 500ms |
| 3 | `python -m pytest tests/test_chat.py -v` | 0 | ✅ pass (6/6) | 510ms |
## Deviations
Tests use a standalone chat_client fixture with mocked DB session instead of conftest.py client fixture (which requires PostgreSQL). Added a 6th test for missing query field beyond the 5 specified.
## Known Issues
None.
## Files Created/Modified
- `backend/chat_service.py`
- `backend/routers/chat.py`
- `backend/main.py`
- `backend/tests/test_chat.py`


@ -0,0 +1,28 @@
---
estimated_steps: 1
estimated_files: 4
skills_used: []
---
# T02: Build frontend chat page with SSE streaming and citation display
Create the user-facing chat interface. Build an SSE client function in api/chat.ts that POSTs to /api/v1/chat using raw fetch(), reads the response body as a ReadableStream, and parses SSE events (sources, token, done, error). Build ChatPage.tsx with: (1) a text input and submit button, (2) streaming message display that accumulates token events into rendered text, (3) a citation source list rendered from the sources event with links to /techniques/:slug (using section_anchor for deep links), (4) loading and error states. Style with ChatPage.module.css matching the existing dark theme (use CSS variables from the app). Add a lazy-loaded /chat route in App.tsx. Add a 'Chat' navigation link in the header bar visible on all pages. The citation display should parse [N] markers in the streamed text and render them as superscript links to the source list, reusing the regex pattern from utils/citations.tsx but with chat-specific source items instead of KeyMomentSummary.
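The [N]-marker handling described above amounts to a single regex scan that splits streamed text into plain-text and citation segments. A sketch in Python for brevity (the real implementation is TypeScript reusing CITATION_RE from utils/citations.tsx; the pattern below is an assumed shape of that regex, not the actual one):

```python
import re

CITATION_RE = re.compile(r"\[(\d+)\]")  # assumed marker pattern

def split_citations(text: str) -> list[tuple[str, str]]:
    """Split streamed text into ('text', chunk) and ('cite', N) segments."""
    segments: list[tuple[str, str]] = []
    pos = 0
    for m in CITATION_RE.finditer(text):
        if m.start() > pos:
            segments.append(("text", text[pos:m.start()]))
        segments.append(("cite", m.group(1)))
        pos = m.end()
    if pos < len(text):
        segments.append(("text", text[pos:]))
    return segments
```

Each ('cite', N) segment would then render as a superscript link anchored to source N in the sources list.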
## Inputs
- `backend/routers/chat.py` — SSE protocol shape (sources/token/done/error event types and their JSON payloads)
- `frontend/src/api/client.ts` — BASE url constant and auth token pattern
- `frontend/src/utils/citations.tsx` — Citation regex pattern CITATION_RE and rendering approach to reuse
- `frontend/src/App.tsx` — Existing route and lazy-load patterns to follow
- `frontend/src/pages/SearchResults.tsx` — Reference for dark theme styling patterns and result card layout
## Expected Output
- `frontend/src/api/chat.ts` — SSE client function for POST /api/v1/chat
- `frontend/src/pages/ChatPage.tsx` — Chat page component with input, streaming display, citations
- `frontend/src/pages/ChatPage.module.css` — Styles for chat page matching the dark theme
- `frontend/src/App.tsx` — Updated with /chat route and Chat nav link
## Verification
cd /home/aux/projects/content-to-kb-automator/frontend && npm run build

backend/chat_service.py Normal file

@ -0,0 +1,178 @@
"""Chat service: retrieve context via search, stream LLM response as SSE events.
Assembles a numbered context block from search results, then streams
completion tokens from an OpenAI-compatible API. Yields SSE-formatted
events: sources, token, done, and error.
"""
from __future__ import annotations
import json
import logging
import time
import traceback
from typing import Any, AsyncIterator
import openai
from sqlalchemy.ext.asyncio import AsyncSession
from config import Settings
from search_service import SearchService
logger = logging.getLogger("chrysopedia.chat")
_SYSTEM_PROMPT_TEMPLATE = """\
You are Chrysopedia, an expert encyclopedic assistant for music production techniques.
Answer the user's question using ONLY the numbered sources below. Cite sources by
writing [N] inline (e.g. [1], [2]) where N is the source number. If the sources
do not contain enough information, say so honestly; do not invent facts.
Sources:
{context_block}
"""
_MAX_CONTEXT_SOURCES = 10
class ChatService:
"""Retrieve context from search, stream an LLM response with citations."""
def __init__(self, settings: Settings) -> None:
self.settings = settings
self._search = SearchService(settings)
self._openai = openai.AsyncOpenAI(
base_url=settings.llm_api_url,
api_key=settings.llm_api_key,
)
async def stream_response(
self,
query: str,
db: AsyncSession,
creator: str | None = None,
) -> AsyncIterator[str]:
"""Yield SSE-formatted events for a chat query.
Protocol:
1. ``event: sources\ndata: <json array of citation metadata>\n\n``
2. ``event: token\ndata: <text chunk>\n\n`` (repeated)
3. ``event: done\ndata: <json with cascade_tier>\n\n``
On error: ``event: error\ndata: <json with message>\n\n``
"""
start = time.monotonic()
# ── 1. Retrieve context via search ──────────────────────────────
try:
search_result = await self._search.search(
query=query,
scope="all",
limit=_MAX_CONTEXT_SOURCES,
db=db,
creator=creator,
)
except Exception:
logger.exception("chat_search_error query=%r creator=%r", query, creator)
yield _sse("error", {"message": "Search failed"})
return
items: list[dict[str, Any]] = search_result.get("items", [])
cascade_tier: str = search_result.get("cascade_tier", "")
# ── 2. Build citation metadata and context block ────────────────
sources = _build_sources(items)
context_block = _build_context_block(items)
logger.info(
"chat_search query=%r creator=%r cascade_tier=%s source_count=%d",
query, creator, cascade_tier, len(sources),
)
# Emit sources event first
yield _sse("sources", sources)
# ── 3. Stream LLM completion ────────────────────────────────────
system_prompt = _SYSTEM_PROMPT_TEMPLATE.format(context_block=context_block)
try:
stream = await self._openai.chat.completions.create(
model=self.settings.llm_model,
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": query},
],
stream=True,
temperature=0.3,
max_tokens=2048,
)
async for chunk in stream:
choice = chunk.choices[0] if chunk.choices else None
if choice and choice.delta and choice.delta.content:
yield _sse("token", choice.delta.content)
except Exception:
tb = traceback.format_exc()
logger.error("chat_llm_error query=%r\n%s", query, tb)
yield _sse("error", {"message": "LLM generation failed"})
return
# ── 4. Done event ───────────────────────────────────────────────
latency_ms = (time.monotonic() - start) * 1000
logger.info(
"chat_done query=%r creator=%r cascade_tier=%s source_count=%d latency_ms=%.1f",
query, creator, cascade_tier, len(sources), latency_ms,
)
yield _sse("done", {"cascade_tier": cascade_tier})
# ── Helpers ──────────────────────────────────────────────────────────────────
def _sse(event: str, data: Any) -> str:
"""Format a single SSE event string."""
payload = json.dumps(data) if not isinstance(data, str) else data
# Per the SSE spec, payloads containing newlines must be split across
# multiple data: lines, or the extra lines are dropped by the parser.
data_lines = "\n".join(f"data: {line}" for line in payload.split("\n"))
return f"event: {event}\n{data_lines}\n\n"
def _build_sources(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
"""Build a numbered citation metadata list from search result items."""
sources: list[dict[str, Any]] = []
for idx, item in enumerate(items, start=1):
sources.append({
"number": idx,
"title": item.get("title", ""),
"slug": item.get("technique_page_slug", "") or item.get("slug", ""),
"creator_name": item.get("creator_name", ""),
"topic_category": item.get("topic_category", ""),
"summary": (item.get("summary", "") or "")[:200],
"section_anchor": item.get("section_anchor", ""),
"section_heading": item.get("section_heading", ""),
})
return sources
def _build_context_block(items: list[dict[str, Any]]) -> str:
"""Build a numbered context block string for the LLM system prompt."""
if not items:
return "(No sources available)"
lines: list[str] = []
for idx, item in enumerate(items, start=1):
title = item.get("title", "Untitled")
creator = item.get("creator_name", "")
summary = item.get("summary", "")
section = item.get("section_heading", "")
parts = [f"[{idx}] {title}"]
if creator:
parts.append(f"by {creator}")
if section:
parts.append(f"{section}")
header = " ".join(parts)
lines.append(header)
if summary:
lines.append(f" {summary}")
lines.append("")
return "\n".join(lines)

backend/main.py

@ -12,7 +12,7 @@ from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from config import get_settings
from routers import admin, auth, consent, creator_dashboard, creators, health, ingest, pipeline, reports, search, stats, techniques, topics, videos
from routers import admin, auth, chat, consent, creator_dashboard, creators, health, ingest, pipeline, reports, search, stats, techniques, topics, videos
def _setup_logging() -> None:
@ -80,6 +80,7 @@ app.include_router(health.router)
# Versioned API
app.include_router(admin.router, prefix="/api/v1")
app.include_router(auth.router, prefix="/api/v1")
app.include_router(chat.router, prefix="/api/v1")
app.include_router(consent.router, prefix="/api/v1")
app.include_router(creator_dashboard.router, prefix="/api/v1")
app.include_router(creators.router, prefix="/api/v1")

backend/routers/chat.py Normal file

@ -0,0 +1,60 @@
"""Chat endpoint — POST /api/v1/chat with SSE streaming response.
Accepts a query and optional creator filter, returns a Server-Sent Events
stream with sources, token, done, and error events.
"""
from __future__ import annotations
import logging
from fastapi import APIRouter, Depends
from fastapi.responses import StreamingResponse
from pydantic import BaseModel, Field
from sqlalchemy.ext.asyncio import AsyncSession
from chat_service import ChatService
from config import Settings, get_settings
from database import get_session
logger = logging.getLogger("chrysopedia.chat.router")
router = APIRouter(prefix="/chat", tags=["chat"])
class ChatRequest(BaseModel):
"""Request body for the chat endpoint."""
query: str = Field(..., min_length=1, max_length=1000)
creator: str | None = None
def _get_chat_service(settings: Settings = Depends(get_settings)) -> ChatService:
"""Build a ChatService from current settings."""
return ChatService(settings)
@router.post("")
async def chat(
body: ChatRequest,
db: AsyncSession = Depends(get_session),
service: ChatService = Depends(_get_chat_service),
) -> StreamingResponse:
"""Stream a chat response as Server-Sent Events.
SSE protocol:
- ``event: sources`` citation metadata array (sent first)
- ``event: token`` streamed text chunk (repeated)
- ``event: done`` completion metadata with cascade_tier
- ``event: error`` error message (on failure)
"""
logger.info("chat_request query=%r creator=%r", body.query, body.creator)
return StreamingResponse(
service.stream_response(query=body.query, db=db, creator=body.creator),
media_type="text/event-stream",
headers={
"Cache-Control": "no-cache",
"X-Accel-Buffering": "no",
},
)

backend/tests/test_chat.py Normal file

@ -0,0 +1,300 @@
"""Integration tests for the chat SSE endpoint.
Mocks SearchService.search() and the OpenAI streaming response to verify:
1. Valid SSE format with sources, token, and done events
2. Citation numbering matches the sources array
3. Creator param forwarded to search
4. Empty/invalid query returns 422
5. LLM error produces an SSE error event
These tests use a standalone ASGI client that does NOT require a running
PostgreSQL instance; the DB session dependency is overridden with a mock.
"""
from __future__ import annotations
import json
from typing import Any
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
import pytest_asyncio
from httpx import ASGITransport, AsyncClient
# Ensure backend/ is on sys.path
import pathlib
import sys
sys.path.insert(0, str(pathlib.Path(__file__).resolve().parent.parent))
from database import get_session # noqa: E402
from main import app # noqa: E402
# ── Standalone test client (no DB required) ──────────────────────────────────
@pytest_asyncio.fixture()
async def chat_client():
"""Async HTTP test client that mocks out the DB session entirely."""
mock_session = AsyncMock()
async def _mock_get_session():
yield mock_session
app.dependency_overrides[get_session] = _mock_get_session
transport = ASGITransport(app=app)
async with AsyncClient(transport=transport, base_url="http://testserver") as ac:
yield ac
app.dependency_overrides.pop(get_session, None)
# ── Helpers ──────────────────────────────────────────────────────────────────
def _parse_sse(body: str) -> list[dict[str, Any]]:
"""Parse SSE text into a list of {event, data} dicts."""
events: list[dict[str, Any]] = []
current_event: str | None = None
current_data: str | None = None
for line in body.split("\n"):
if line.startswith("event: "):
current_event = line[len("event: "):]
elif line.startswith("data: "):
current_data = line[len("data: "):]
elif line == "" and current_event is not None and current_data is not None:
try:
parsed = json.loads(current_data)
except json.JSONDecodeError:
parsed = current_data
events.append({"event": current_event, "data": parsed})
current_event = None
current_data = None
return events
def _fake_search_result(
items: list[dict[str, Any]] | None = None,
cascade_tier: str = "global",
) -> dict[str, Any]:
"""Build a fake SearchService.search() return value."""
if items is None:
items = [
{
"title": "Snare Compression",
"slug": "snare-compression",
"technique_page_slug": "snare-compression",
"creator_name": "Keota",
"topic_category": "Mixing",
"summary": "How to compress a snare drum for punch and presence.",
"section_anchor": "",
"section_heading": "",
"type": "technique_page",
"score": 0.9,
},
{
"title": "Parallel Processing",
"slug": "parallel-processing",
"technique_page_slug": "parallel-processing",
"creator_name": "Skope",
"topic_category": "Mixing",
"summary": "Using parallel compression for dynamics control.",
"section_anchor": "bus-setup",
"section_heading": "Bus Setup",
"type": "technique_page",
"score": 0.85,
},
]
return {
"items": items,
"partial_matches": [],
"total": len(items),
"query": "snare compression",
"fallback_used": False,
"cascade_tier": cascade_tier,
}
def _mock_openai_stream(chunks: list[str]):
"""Create a mock async iterator that yields OpenAI-style stream chunks."""
class FakeChoice:
def __init__(self, text: str | None):
self.delta = MagicMock()
self.delta.content = text
class FakeChunk:
def __init__(self, text: str | None):
self.choices = [FakeChoice(text)]
class FakeStream:
def __init__(self, chunks: list[str]):
self._chunks = chunks
self._index = 0
def __aiter__(self):
return self
async def __anext__(self):
if self._index >= len(self._chunks):
raise StopAsyncIteration
chunk = FakeChunk(self._chunks[self._index])
self._index += 1
return chunk
return FakeStream(chunks)
def _mock_openai_stream_error():
"""Create a mock async iterator that raises mid-stream."""
class FakeStream:
def __init__(self):
self._yielded = False
def __aiter__(self):
return self
async def __anext__(self):
if not self._yielded:
self._yielded = True
raise RuntimeError("LLM connection lost")
raise StopAsyncIteration
return FakeStream()
# ── Tests ────────────────────────────────────────────────────────────────────
@pytest.mark.asyncio
async def test_chat_sse_format_and_events(chat_client):
"""SSE stream contains sources, token(s), and done events in order."""
search_result = _fake_search_result()
token_chunks = ["Snare compression ", "uses [1] to add ", "punch. See also [2]."]
mock_openai_client = MagicMock()
mock_openai_client.chat.completions.create = AsyncMock(
return_value=_mock_openai_stream(token_chunks)
)
with (
patch("chat_service.SearchService.search", new_callable=AsyncMock, return_value=search_result),
patch("chat_service.openai.AsyncOpenAI", return_value=mock_openai_client),
):
resp = await chat_client.post("/api/v1/chat", json={"query": "snare compression"})
assert resp.status_code == 200
assert "text/event-stream" in resp.headers.get("content-type", "")
events = _parse_sse(resp.text)
event_types = [e["event"] for e in events]
# Must have sources first, then tokens, then done
assert event_types[0] == "sources"
assert "token" in event_types
assert event_types[-1] == "done"
# Sources event is a list
sources_data = events[0]["data"]
assert isinstance(sources_data, list)
assert len(sources_data) == 2
# Done event has cascade_tier
done_data = events[-1]["data"]
assert "cascade_tier" in done_data
@pytest.mark.asyncio
async def test_chat_citation_numbering(chat_client):
"""Citation numbers in sources array match 1-based indexing."""
search_result = _fake_search_result()
mock_openai_client = MagicMock()
mock_openai_client.chat.completions.create = AsyncMock(
return_value=_mock_openai_stream(["hello"])
)
with (
patch("chat_service.SearchService.search", new_callable=AsyncMock, return_value=search_result),
patch("chat_service.openai.AsyncOpenAI", return_value=mock_openai_client),
):
resp = await chat_client.post("/api/v1/chat", json={"query": "compression"})
events = _parse_sse(resp.text)
sources = events[0]["data"]
assert sources[0]["number"] == 1
assert sources[0]["title"] == "Snare Compression"
assert sources[1]["number"] == 2
assert sources[1]["title"] == "Parallel Processing"
assert sources[1]["section_anchor"] == "bus-setup"
@pytest.mark.asyncio
async def test_chat_creator_forwarded_to_search(chat_client):
"""Creator parameter is passed through to SearchService.search()."""
search_result = _fake_search_result()
mock_openai_client = MagicMock()
mock_openai_client.chat.completions.create = AsyncMock(
return_value=_mock_openai_stream(["ok"])
)
with (
patch("chat_service.SearchService.search", new_callable=AsyncMock, return_value=search_result) as mock_search,
patch("chat_service.openai.AsyncOpenAI", return_value=mock_openai_client),
):
resp = await chat_client.post(
"/api/v1/chat",
json={"query": "drum mixing", "creator": "keota"},
)
assert resp.status_code == 200
mock_search.assert_called_once()
call_kwargs = mock_search.call_args.kwargs
assert call_kwargs.get("creator") == "keota"
@pytest.mark.asyncio
async def test_chat_empty_query_returns_422(chat_client):
"""An empty query string should fail Pydantic validation with 422."""
resp = await chat_client.post("/api/v1/chat", json={"query": ""})
assert resp.status_code == 422
@pytest.mark.asyncio
async def test_chat_missing_query_returns_422(chat_client):
"""Missing query field should fail with 422."""
resp = await chat_client.post("/api/v1/chat", json={})
assert resp.status_code == 422
@pytest.mark.asyncio
async def test_chat_llm_error_produces_error_event(chat_client):
"""When the LLM raises mid-stream, an error SSE event is emitted."""
search_result = _fake_search_result()
mock_openai_client = MagicMock()
mock_openai_client.chat.completions.create = AsyncMock(
return_value=_mock_openai_stream_error()
)
with (
patch("chat_service.SearchService.search", new_callable=AsyncMock, return_value=search_result),
patch("chat_service.openai.AsyncOpenAI", return_value=mock_openai_client),
):
resp = await chat_client.post("/api/v1/chat", json={"query": "test error"})
assert resp.status_code == 200 # SSE streams always return 200
events = _parse_sse(resp.text)
event_types = [e["event"] for e in events]
assert "sources" in event_types
assert "error" in event_types
error_event = next(e for e in events if e["event"] == "error")
assert "message" in error_event["data"]