Chat Engine
jlightner edited this page 2026-04-04 10:31:50 -05:00

Streaming question-answering interface backed by LightRAG retrieval and LLM completion. Added in M021/S03, expanded with multi-turn memory in M022/S04 and chat widget in M022/S03.

Architecture

User types question in ChatPage or ChatWidget
        │
        ▼
POST /api/v1/chat  { query, creator?, conversation_id? }
        │
        ▼
ChatService.stream(query, creator?, conversation_id?)
        │
        ├─ 1. Load history: Redis chrysopedia:chat:{conversation_id}
        │
        ├─ 2. Retrieve: SearchService.search(query, creator)
        │     └─ Uses 4-tier cascade if creator provided (see [[Search-Retrieval]])
        │
        ├─ 3. Prompt: System prompt + history + numbered context + user message
        │     └─ Sources formatted as [1] Title — Summary for citation mapping
        │
        ├─ 4. Stream: openai.AsyncOpenAI with stream=True
        │     └─ Tokens streamed as SSE events in real-time
        │
        ├─ 5. Save history: Append user message + assistant response to Redis
        │
        ▼
SSE response → ChatPage/ChatWidget renders tokens + citation links

SSE Protocol

The chat endpoint returns a text/event-stream response with four event types in strict order:

| Event | Payload | When |
| --- | --- | --- |
| sources | [{title, slug, creator_name, summary, source_video_id, start_time, end_time, video_filename}] | First — citation metadata for link rendering. Video fields added in M024/S05 for timestamp badges. |
| token | string (text chunk) | Repeated — streamed LLM completion tokens |
| done | {cascade_tier, conversation_id} | Once — signals completion, includes retrieval tier and conversation ID |
| error | {message: string} | On failure — emitted if the LLM errors mid-stream |

The cascade_tier in the done event reveals which tier of the retrieval cascade served the context. The conversation_id enables the frontend to thread follow-up messages.
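The wire framing behind these events is plain SSE text: an `event:` line naming the type, a `data:` line with the payload, and a blank line as terminator. A minimal sketch (the `sse_event` helper name is illustrative, not the backend's actual function):

```python
import json

def sse_event(event: str, data) -> str:
    """Frame a payload as one Server-Sent Events message.

    String payloads (token chunks) pass through as-is; structured
    payloads (sources, done, error) are JSON-encoded.
    """
    payload = data if isinstance(data, str) else json.dumps(data)
    return f"event: {event}\ndata: {payload}\n\n"

# The event types in the order the endpoint emits them:
first = sse_event("sources", [{"title": "Reese Bass", "slug": "reese-bass"}])
chunk = sse_event("token", "A reese bass is")
last = sse_event("done", {"cascade_tier": 2, "conversation_id": "abc"})
```

The blank-line terminator is what lets the frontend's `ReadableStream` reader split the byte stream back into discrete events.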

Multi-Turn Conversation Memory (M022/S04)

Redis Storage

  • Key pattern: chrysopedia:chat:{conversation_id}
  • Format: Single JSON string containing a list of {role, content} message dicts
  • TTL: 1 hour, refreshed on each interaction
  • Cap: 10 turn pairs (20 messages) — oldest pairs trimmed when exceeded
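The storage rules above can be sketched in a few lines. This uses a plain dict standing in for Redis; the real service would use GET/SET with a 1-hour TTL refreshed on every call, and the function name is hypothetical:

```python
import json

MAX_TURN_PAIRS = 10  # cap from the spec: 20 messages total

def append_turn(store: dict, conversation_id: str,
                user_msg: str, assistant_msg: str) -> list:
    """Append one user/assistant turn pair, trimming the oldest pairs past the cap."""
    key = f"chrysopedia:chat:{conversation_id}"
    history = json.loads(store.get(key, "[]"))
    history.append({"role": "user", "content": user_msg})
    history.append({"role": "assistant", "content": assistant_msg})
    # Keep only the newest 10 pairs (20 messages), dropping the oldest first.
    history = history[-MAX_TURN_PAIRS * 2:]
    store[key] = json.dumps(history)  # single JSON string → atomic read/write
    return history

store = {}  # stand-in for Redis
for i in range(12):
    history = append_turn(store, "abc", f"question {i}", f"answer {i}")
# 12 pairs appended, but only the newest 10 pairs survive the trim
```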

Conversation Flow

  1. Client sends conversation_id in POST body (or omits for new conversation)
  2. Server auto-generates UUID when conversation_id is omitted
  3. History loaded from Redis and injected between system prompt and user message
  4. Assistant response accumulated during streaming
  5. User message + assistant response appended to history in Redis
  6. conversation_id returned in SSE done event for threading

Citation Format

The LLM is instructed to reference sources using numbered citations [N] in its response. The frontend parses these into superscript links:

  • [1] → links to /techniques/:slug for the corresponding source
  • Multiple citations supported: [1][3] or [1,3]
  • Citation regex: /\[(\d+)\]/g parsed by shared parseChatCitations() utility (M024/S05)
  • Timestamp badges — when start_time is defined, source cards show a badge linking to /watch/:id?t=N (M024/S05)
  • Video filename — displayed as subtle metadata on source cards (M024/S05)
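The citation mapping is simple to illustrate. A Python sketch of the same logic the frontend's `parseChatCitations()` applies (the `link_citations` helper here is illustrative, not the actual TypeScript implementation):

```python
import re

# Same pattern as the frontend's /\[(\d+)\]/g
CITATION_RE = re.compile(r"\[(\d+)\]")

def link_citations(text: str, sources: list) -> list:
    """Map [N] markers in a response to technique-page URLs."""
    links = []
    for match in CITATION_RE.finditer(text):
        n = int(match.group(1))
        if 1 <= n <= len(sources):  # citations are 1-based into the sources list
            links.append((n, f"/techniques/{sources[n - 1]['slug']}"))
    return links

sources = [{"slug": "reese-bass"}, {"slug": "sidechain"}]
links = link_citations("Detune two saws [1], then duck it [2].", sources)
# links → [(1, '/techniques/reese-bass'), (2, '/techniques/sidechain')]
```

Out-of-range markers (e.g. `[9]` with two sources) are silently skipped rather than rendered as dead links.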

API Endpoint

POST /api/v1/chat

| Field | Type | Required | Validation |
| --- | --- | --- | --- |
| query | string | Yes | 1–1000 characters |
| creator | string | No | Creator UUID or slug for scoped retrieval |
| conversation_id | string | No | UUID for multi-turn threading. Auto-generated if omitted. |

Response: text/event-stream (SSE)

Error responses:

  • 422 — Empty or missing query, or query exceeding 1000 characters
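The validation rules amount to a few checks. A hedged sketch (the real backend presumably uses a FastAPI/Pydantic request model; this standalone function is illustrative):

```python
import uuid

def validate_chat_request(body: dict) -> dict:
    """Validate a POST /api/v1/chat body per the field table above."""
    query = (body.get("query") or "").strip()
    if not query or len(query) > 1000:
        raise ValueError("422: query must be 1-1000 characters")
    return {
        "query": query,
        "creator": body.get("creator"),
        # Auto-generate a conversation_id when the client omits one,
        # so the done event always carries a thread ID.
        "conversation_id": body.get("conversation_id") or str(uuid.uuid4()),
    }
```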

Backend: ChatService

Located in backend/chat_service.py. The retrieve-prompt-stream pipeline:

  1. Load History — _load_history() reads from the Redis key chrysopedia:chat:{conversation_id}. Returns an empty list if the key is absent.
  2. Retrieve — Calls SearchService.search() with the query and optional creator parameter. Gets back ranked technique page results with the cascade_tier.
  3. Prompt — Builds message array: system prompt → conversation history → numbered context block → user message. System prompt instructs the LLM to act as a music production encyclopedia, cite sources with [N] notation, and stay grounded in the provided context.
  4. Stream — Opens an async streaming completion via openai.AsyncOpenAI. Yields SSE events as tokens arrive.
  5. Save History — _save_history() appends the user message and accumulated assistant response to Redis, trims to 10 turn pairs if exceeded, and refreshes the TTL to 1 hour.

Error handling: If the LLM fails mid-stream (after some tokens have been sent), an error event is emitted so the frontend can display a failure message rather than leaving the response hanging.
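The message-array assembly in step 3 can be sketched as follows. Whether the numbered context block rides in a second system message, as shown here, or is folded into the user turn is an assumption; the function name is illustrative:

```python
def build_messages(system_prompt, history, sources, user_query):
    """Assemble the completion messages in the documented order:
    system prompt → conversation history → numbered context → user message."""
    # Sources are numbered [1], [2], ... in the "[N] Title — Summary" format
    # so the LLM's citations map back to the sources event.
    context = "\n".join(
        f"[{i}] {s['title']} — {s['summary']}"
        for i, s in enumerate(sources, start=1)
    )
    return (
        [{"role": "system", "content": system_prompt}]
        + history
        + [{"role": "system", "content": f"Context:\n{context}"}]
        + [{"role": "user", "content": user_query}]
    )

msgs = build_messages(
    "You are a music production encyclopedia. Cite sources as [N].",
    [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}],
    [{"title": "Reese Bass", "summary": "Detuned saw layering."}],
    "How do I widen it?",
)
```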

Frontend: ChatPage

Route: /chat (lazy-loaded, code-split)

Components

  • Multi-message conversation UI — Messages array with conversation bubble layout
  • Conversation threading — conversationId state, "New conversation" button to reset
  • Streaming message display — Accumulates tokens with blinking cursor animation during streaming
  • Typing indicator — Three-dot animation while streaming
  • Citation markers — [N] parsed to superscript links targeting /techniques/:slug (per-message)
  • Source list — Numbered sources with creator attribution displayed below each response
  • Auto-scroll — Scrolls to bottom as new tokens arrive

SSE Client

Located in frontend/src/api/chat.ts. Uses fetch() + ReadableStream with typed callbacks:

streamChat(query, {
  onSources: (sources) => void,
  onToken: (token) => void,
  onDone: (data: ChatDoneMeta) => void,
  onError: (error) => void,
}, creatorName?, conversationId?)

ChatDoneMeta type includes cascade_tier and conversation_id fields.

Frontend: ChatWidget (M022/S03)

Floating chat bubble on creator detail pages. Fixed-position bottom-right.

Behavior

  • Bubble → click → slide-up panel with conversation UI
  • Creator-scoped: passes creatorName to streamChat() for retrieval cascade
  • Suggested questions generated client-side from technique titles and categories
  • Typing indicator — three-dot animation during streaming
  • Citation links — parsed from response, linked to technique pages
  • Responsive — full-width below 640px, 400px panel on desktop
  • Conversation threading — conversationId generated via crypto.randomUUID() on first send, threaded through streamChat(), updated from the done event
  • Reset on close — messages and conversationId cleared when panel closes

Personality Interpolation (M023/S02, S04)

The chat engine modulates the system prompt based on personality_weight (0.0–1.0) sent in the request. When a creator is specified, their personality_profile JSONB is loaded and progressively injected into the system prompt.

5-Tier System (D044)

| Weight Range | Tier | Profile Fields Included | Instruction |
| --- | --- | --- | --- |
| < 0.2 | None | (none) | Pure encyclopedic (no personality block) |
| 0.2–0.39 | Subtle Reference | Basic tone | "Subtly reference this creator's style" |
| 0.4–0.59 | Creator Tone | + descriptors, explanation approach | "Adopt this creator's teaching tone" |
| 0.6–0.79 | Creator Voice | + signature phrases (count scaled with weight) | "Channel this creator's voice" |
| 0.8–0.89 | Full Voice | + vocabulary, style markers | "Speak in this creator's voice" |
| 0.9–1.0 | Full Embodiment | + summary paragraph | "Fully embody this creator" |

Temperature Scaling

Linear: temperature = 0.3 + weight * 0.2

  • weight=0.0 → temperature 0.3 (encyclopedic, precise)
  • weight=1.0 → temperature 0.5 (still grounded, slight creative variance)
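Both mappings fit in a few lines. The tier boundaries and temperature formula below follow the table and the stated linear scaling; the function names are illustrative:

```python
def personality_tier(weight: float) -> str:
    """Map personality_weight to the 5-tier system (D044)."""
    if weight < 0.2:
        return "None"
    if weight < 0.4:
        return "Subtle Reference"
    if weight < 0.6:
        return "Creator Tone"
    if weight < 0.8:
        return "Creator Voice"
    if weight < 0.9:
        return "Full Voice"
    return "Full Embodiment"

def temperature_for(weight: float) -> float:
    """Linear scaling: 0.3 at weight 0.0, rising to 0.5 at weight 1.0."""
    return 0.3 + weight * 0.2

# personality_tier(0.7) → "Creator Voice"; temperature_for(0.7) ≈ 0.44
```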

Request Format

{
  "query": "How do I make a reese bass?",
  "creator": "creator-slug",
  "personality_weight": 0.7,
  "conversation_id": "uuid"
}

Graceful Fallback

If the creator has no personality_profile (null JSONB), the system falls back to pure encyclopedic mode regardless of weight value. DB errors during profile fetch are caught and logged — never crash the stream.

Citation Metadata Propagation (M024/S05)

End-to-end video metadata flow from search results through SSE to frontend source cards.

Backend

  • search_service.py — _enrich_qdrant_results() and _keyword_search_and() now batch-fetch SourceVideo filenames and include source_video_id, start_time, end_time, video_filename in result dicts. Non-key_moment types get empty/None values for uniform dict shape.
  • chat_service.py — _build_sources() passes all four video fields through to SSE source events.

Frontend

  • ChatSource interface (in api/chat.ts) extended with source_video_id, start_time, end_time, video_filename fields.
  • utils/chatCitations.tsx — shared parseChatCitations() replaces duplicate implementations in ChatPage and ChatWidget. Accepts CSS module styles as Record<string, string>.
  • utils/formatTime.ts — shared hour-aware time formatter used across timestamp badges and player controls.
  • Source cards now show: timestamp badge (links to /watch/:id?t=N when start_time defined) and video filename metadata.
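The hour-aware formatting and badge deep-link can be sketched as follows. This is a Python rendering of what utils/formatTime.ts presumably does; the actual TypeScript implementation may differ:

```python
def format_time(total_seconds: float) -> str:
    """Hour-aware timestamp: '2:05' under an hour, '1:02:05' at or above."""
    seconds = int(total_seconds)
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    if h:
        return f"{h}:{m:02d}:{s:02d}"
    return f"{m}:{s:02d}"

def watch_url(video_id: str, start_time: float) -> str:
    """Deep link for a timestamp badge: /watch/:id?t=N (whole seconds)."""
    return f"/watch/{video_id}?t={int(start_time)}"

# format_time(125) → '2:05'; format_time(3725) → '1:02:05'
```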

LLM Fallback Resilience (M025/S08)

ChatService maintains two AsyncOpenAI clients: primary (DGX endpoint) and fallback (Ollama). When the primary create() call fails with APIConnectionError, APITimeoutError, or InternalServerError, the entire streaming call is retried with the fallback client. The SSE done event includes fallback_used: true/false so the frontend and usage logging know which model actually served the response.

  • Config: LLM_FALLBACK_URL and LLM_FALLBACK_MODEL in docker-compose.yml
  • Logging: chat_llm_fallback WARNING when primary fails and fallback activates
  • Usage tracking: ChatUsageLog.model records actual model name (primary or fallback)
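The retry shape is worth sketching. The exception names below follow the openai SDK, but the clients are stand-in callables rather than real AsyncOpenAI objects, and the function name is illustrative:

```python
# Stand-ins for the openai SDK exception classes the service catches.
class APIConnectionError(Exception): ...
class APITimeoutError(Exception): ...
class InternalServerError(Exception): ...

RETRIABLE = (APIConnectionError, APITimeoutError, InternalServerError)

def stream_with_fallback(primary, fallback, messages):
    """Try the primary LLM client; on a retriable failure, rerun the
    whole call on the fallback and report which client served it."""
    try:
        return primary(messages), False   # fallback_used=False
    except RETRIABLE:
        # Real code would log a chat_llm_fallback WARNING here.
        return fallback(messages), True   # fallback_used=True

def flaky_primary(messages):
    raise APITimeoutError("DGX endpoint timed out")

def ollama_fallback(messages):
    return "response from fallback model"

text, fallback_used = stream_with_fallback(flaky_primary, ollama_fallback, [])
# fallback_used → True
```

Because the failure can occur before any token is sent, retrying the entire call (rather than resuming mid-stream) keeps the SSE event sequence consistent for the client.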

Refined System Prompt (M025/S09)

The system prompt was rewritten from 5 lines to a structured template covering:

  • Citation density: Cite every factual claim inline with [N] markers
  • Response format: Short paragraphs, bullet lists for step-by-step, bold key terms
  • Domain terminology: Music production context awareness
  • Conflicting sources: Present both perspectives with attribution
  • Response length: 2-4 paragraphs default, adjust to query complexity

Kept under 20 lines using markdown headers for structure. All 26 existing chat tests pass unchanged.

Chat Quality Evaluation Toolkit (M025/S09)

A 5-dimension LLM-as-judge evaluation framework:

  • Scorer (backend/pipeline/quality/chat_scorer.py): Grades responses on citation_accuracy, response_structure, domain_expertise, source_grounding, personality_fidelity
  • Eval Harness (backend/pipeline/quality/chat_eval.py): SSE-parsing runner that calls the live chat endpoint and feeds responses to the scorer
  • Test Suite (backend/pipeline/quality/fixtures/chat_test_suite.yaml): 10 queries across technical, conceptual, creator-scoped, and cross-creator categories
  • CLI: python -m pipeline.quality chat_eval subcommand for automated evaluation runs
  • Baseline report: Documented in S09-QUALITY-REPORT.md with JSON results

Key Files

  • backend/chat_service.py — ChatService with history load/save, retrieve-prompt-stream pipeline
  • backend/routers/chat.py — POST /api/v1/chat endpoint with conversation_id support
  • backend/tests/test_chat.py — 13 tests (6 streaming + 7 conversation memory)
  • frontend/src/api/chat.ts — SSE client with conversationId param and ChatDoneMeta type
  • frontend/src/pages/ChatPage.tsx — Multi-message conversation UI
  • frontend/src/pages/ChatPage.module.css — Conversation bubble layout styles
  • frontend/src/components/ChatWidget.tsx — Floating chat widget component
  • frontend/src/components/ChatWidget.module.css — Widget styles (38 custom property refs)
  • frontend/src/utils/chatCitations.tsx — Shared citation parser (M024/S05)
  • frontend/src/utils/formatTime.ts — Shared time formatter (M024/S05)

Design Decisions

  • Redis JSON string — Conversation history stored as single JSON value (atomic read/write) rather than Redis list type
  • Auto-generate conversation_id — Server creates UUID when client omits it, ensuring consistent done event shape
  • Widget resets on close — Clean slate UX; no persistence across open/close cycles
  • Client-side suggested questions — Generated from technique titles/categories without API call
  • 5-tier interpolation — Progressive field inclusion replaces 3-tier step function (D044 supersedes D043)
  • Shared citation parsing — parseChatCitations() in utils/chatCitations.tsx replaces duplicate implementations (M024/S05)
  • Standalone ASGI test client — Tests use mocked DB to avoid PostgreSQL dependency

See also: Search-Retrieval, API-Surface, Frontend