Table of Contents
- Chat Engine
- Architecture
- SSE Protocol
- Multi-Turn Conversation Memory (M022/S04)
- Citation Format
- API Endpoint
- Backend: ChatService
- Frontend: ChatPage
- Frontend: ChatWidget (M022/S03)
- Personality Interpolation (M023/S02, S04)
- Citation Metadata Propagation (M024/S05)
- LLM Fallback Resilience (M025/S08)
- Refined System Prompt (M025/S09)
- Chat Quality Evaluation Toolkit (M025/S09)
- Key Files
- Design Decisions
Chat Engine
Streaming question-answering interface backed by LightRAG retrieval and LLM completion. Added in M021/S03, expanded with multi-turn memory in M022/S04 and chat widget in M022/S03.
Architecture
User types question in ChatPage or ChatWidget
│
▼
POST /api/v1/chat { query, creator?, conversation_id? }
│
▼
ChatService.stream(query, creator?, conversation_id?)
│
├─ 1. Load history: Redis chrysopedia:chat:{conversation_id}
│
├─ 2. Retrieve: SearchService.search(query, creator)
│ └─ Uses 4-tier cascade if creator provided (see [[Search-Retrieval]])
│
├─ 3. Prompt: System prompt + history + numbered context + user message
│ └─ Sources formatted as [1] Title — Summary for citation mapping
│
├─ 4. Stream: openai.AsyncOpenAI with stream=True
│ └─ Tokens streamed as SSE events in real-time
│
├─ 5. Save history: Append user message + assistant response to Redis
│
▼
SSE response → ChatPage/ChatWidget renders tokens + citation links
SSE Protocol
The chat endpoint returns a text/event-stream response with four event types in strict order:
| Event | Payload | When |
|---|---|---|
| `sources` | `[{title, slug, creator_name, summary, source_video_id, start_time, end_time, video_filename}]` | First — citation metadata for link rendering. Video fields added in M024/S05 for timestamp badges. |
| `token` | string (text chunk) | Repeated — streamed LLM completion tokens |
| `done` | `{cascade_tier, conversation_id}` | Once — signals completion, includes retrieval tier and conversation ID |
| `error` | `{message: string}` | On failure — emitted if LLM errors mid-stream |
The `cascade_tier` in the `done` event reveals which tier of the retrieval cascade served the context. The `conversation_id` enables the frontend to thread follow-up messages.
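The event ordering above can be sketched as a plain generator. This is an illustrative reconstruction, not the actual ChatService internals; `format_sse` and `stream_events` are hypothetical names:

```python
import json

def format_sse(event: str, data) -> str:
    """Serialize one text/event-stream frame (illustrative helper)."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

def stream_events(sources, tokens, cascade_tier, conversation_id):
    """Yield SSE frames in the documented order: sources, token*, done."""
    yield format_sse("sources", sources)
    for tok in tokens:
        yield format_sse("token", tok)
    yield format_sse("done", {"cascade_tier": cascade_tier,
                              "conversation_id": conversation_id})
```

An `error` frame would replace the `done` frame on mid-stream failure (see Error handling below the ChatService pipeline).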
Multi-Turn Conversation Memory (M022/S04)
Redis Storage
- Key pattern: `chrysopedia:chat:{conversation_id}`
- Format: Single JSON string containing a list of `{role, content}` message dicts
- TTL: 1 hour, refreshed on each interaction
- Cap: 10 turn pairs (20 messages) — oldest pairs trimmed when exceeded
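The load/trim/save cycle above can be sketched as follows, assuming a redis-py-style client. Function names here are illustrative, not the actual `_load_history()`/`_save_history()` implementations:

```python
import json

MAX_TURN_PAIRS = 10          # cap: 10 pairs = 20 messages
HISTORY_TTL_SECONDS = 3600   # 1 hour, refreshed on each interaction

def load_history(redis_client, conversation_id: str) -> list:
    """Read the history list from its Redis key; empty list if absent."""
    raw = redis_client.get(f"chrysopedia:chat:{conversation_id}")
    return json.loads(raw) if raw else []

def save_history(redis_client, conversation_id: str,
                 history: list, user_msg: str, assistant_msg: str) -> None:
    """Append one turn pair, trim oldest pairs past the cap, refresh TTL."""
    history = history + [{"role": "user", "content": user_msg},
                         {"role": "assistant", "content": assistant_msg}]
    history = history[-MAX_TURN_PAIRS * 2:]   # keep the newest 10 pairs
    redis_client.set(f"chrysopedia:chat:{conversation_id}",
                     json.dumps(history), ex=HISTORY_TTL_SECONDS)
```

Storing the whole list as one JSON string keeps read-modify-write simple and lets a single `SET ... EX` refresh the TTL atomically with the write.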
Conversation Flow
- Client sends `conversation_id` in POST body (or omits for new conversation)
- Server auto-generates UUID when `conversation_id` is omitted
- History loaded from Redis and injected between system prompt and user message
- Assistant response accumulated during streaming
- User message + assistant response appended to history in Redis
- `conversation_id` returned in SSE `done` event for threading
Citation Format
The LLM is instructed to reference sources using numbered citations [N] in its response. The frontend parses these into superscript links:
- `[1]` → links to `/techniques/:slug` for the corresponding source
- Multiple citations supported: `[1][3]` or `[1,3]`
- Citation regex: `/\[(\d+)\]/g` parsed by shared `parseChatCitations()` utility (M024/S05)
- Timestamp badges — when `start_time` is defined, source cards show a badge linking to `/watch/:id?t=N` (M024/S05)
- Video filename — displayed as subtle metadata on source cards (M024/S05)
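The shipped parser is the TypeScript `parseChatCitations()`, but the same regex logic can be sketched in Python for illustration (both function names below are hypothetical):

```python
import re

# Mirrors the documented frontend regex /\[(\d+)\]/g
CITATION_RE = re.compile(r"\[(\d+)\]")

def extract_citations(text: str) -> list:
    """Return the source numbers referenced by [N] markers, in order."""
    return [int(n) for n in CITATION_RE.findall(text)]

def link_citations(text: str, slugs: list) -> str:
    """Replace each [N] with a markdown link to /techniques/:slug.

    `slugs` is the 1-indexed source list delivered by the SSE sources event.
    Out-of-range markers are left untouched.
    """
    def repl(m):
        n = int(m.group(1))
        if 1 <= n <= len(slugs):
            return f"[[{n}]](/techniques/{slugs[n - 1]})"
        return m.group(0)
    return CITATION_RE.sub(repl, text)
```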
API Endpoint
POST /api/v1/chat
| Field | Type | Required | Validation |
|---|---|---|---|
| `query` | string | Yes | 1–1000 characters |
| `creator` | string | No | Creator UUID or slug for scoped retrieval |
| `conversation_id` | string | No | UUID for multi-turn threading. Auto-generated if omitted. |
Response: text/event-stream (SSE)
Error responses:
- `422` — Empty or missing query, query exceeds 1000 chars
Backend: ChatService
Located in `backend/chat_service.py`. The retrieve-prompt-stream pipeline:
1. Load History — `_load_history()` reads from Redis key `chrysopedia:chat:{conversation_id}`. Returns empty list if key absent.
2. Retrieve — Calls `SearchService.search()` with the query and optional creator parameter. Gets back ranked technique page results with the cascade_tier.
3. Prompt — Builds message array: system prompt → conversation history → numbered context block → user message. System prompt instructs the LLM to act as a music production encyclopedia, cite sources with `[N]` notation, and stay grounded in the provided context.
4. Stream — Opens an async streaming completion via `openai.AsyncOpenAI`. Yields SSE events as tokens arrive.
5. Save History — `_save_history()` appends the user message and accumulated assistant response to Redis. Trims to 10 turn pairs if exceeded. Refreshes TTL to 1 hour.
Error handling: If the LLM fails mid-stream (after some tokens have been sent), an error event is emitted so the frontend can display a failure message rather than leaving the response hanging.
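The mid-stream guard can be sketched as an async generator wrapper. This is an illustrative reconstruction, not the actual service code:

```python
import json

async def stream_with_error_guard(token_stream):
    """Forward LLM tokens as SSE frames; emit an error event on mid-stream failure.

    Because HTTP status is already committed once streaming begins, a failure
    can only be signaled in-band, via the SSE error event.
    """
    try:
        async for token in token_stream:
            yield f"event: token\ndata: {json.dumps(token)}\n\n"
    except Exception as exc:  # surface any LLM failure to the client
        yield f"event: error\ndata: {json.dumps({'message': str(exc)})}\n\n"
```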
Frontend: ChatPage
Route: /chat (lazy-loaded, code-split)
Components
- Multi-message conversation UI — Messages array with conversation bubble layout
- Conversation threading — `conversationId` state, "New conversation" button to reset
- Streaming message display — Accumulates tokens with blinking cursor animation during streaming
- Typing indicator — Three-dot animation while streaming
- Citation markers — `[N]` parsed to superscript links targeting `/techniques/:slug` (per-message)
- Source list — Numbered sources with creator attribution displayed below each response
- Auto-scroll — Scrolls to bottom as new tokens arrive
SSE Client
Located in `frontend/src/api/chat.ts`. Uses `fetch()` + `ReadableStream` with typed callbacks:

```typescript
streamChat(query, {
  onSources: (sources) => void,
  onToken: (token) => void,
  onDone: (data: ChatDoneMeta) => void,
  onError: (error) => void,
}, creatorName?, conversationId?)
```
`ChatDoneMeta` type includes `cascade_tier` and `conversation_id` fields.
Frontend: ChatWidget (M022/S03)
Floating chat bubble on creator detail pages. Fixed-position bottom-right.
Behavior
- Bubble → click → slide-up panel with conversation UI
- Creator-scoped: passes `creatorName` to `streamChat()` for retrieval cascade
- Suggested questions generated client-side from technique titles and categories
- Typing indicator — three-dot animation during streaming
- Citation links — parsed from response, linked to technique pages
- Responsive — full-width below 640px, 400px panel on desktop
- Conversation threading — `conversationId` generated via `crypto.randomUUID()` on first send, threaded through `streamChat()`, updated from done event
- Reset on close — messages and conversationId cleared when panel closes
Personality Interpolation (M023/S02, S04)
The chat engine modulates the system prompt based on personality_weight (0.0–1.0) sent in the request. When a creator is specified, their personality_profile JSONB is loaded and progressively injected into the system prompt.
5-Tier System (D044)
| Weight Range | Tier | Profile Fields Included | Instruction |
|---|---|---|---|
| < 0.2 | None | — | Pure encyclopedic (no personality block) |
| 0.2–0.39 | Subtle Reference | Basic tone | "Subtly reference this creator's style" |
| 0.4–0.59 | Creator Tone | + descriptors, explanation approach | "Adopt this creator's teaching tone" |
| 0.6–0.79 | Creator Voice | + signature phrases (count scaled with weight) | "Channel this creator's voice" |
| 0.8–0.89 | Full Voice | + vocabulary, style markers | "Speak in this creator's voice" |
| 0.9–1.0 | Full Embodiment | + summary paragraph | "Fully embody this creator" |
Temperature Scaling
Linear: `temperature = 0.3 + weight * 0.2`
- weight=0.0 → temperature 0.3 (encyclopedic, precise)
- weight=1.0 → temperature 0.5 (still grounded, slight creative variance)
Request Format
```json
{
  "query": "How do I make a reese bass?",
  "creator": "creator-slug",
  "personality_weight": 0.7,
  "conversation_id": "uuid"
}
```
Graceful Fallback
If the creator has no personality_profile (null JSONB), the system falls back to pure encyclopedic mode regardless of weight value. DB errors during profile fetch are caught and logged — never crash the stream.
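The tier table, temperature formula, and fallback rule above can be sketched as follows (function names are illustrative, not the actual ChatService helpers):

```python
def select_tier(weight: float, profile) -> str:
    """Map personality_weight to the D044 tier; null profile falls back to none."""
    if profile is None or weight < 0.2:
        return "none"                # pure encyclopedic
    if weight < 0.4:
        return "subtle_reference"
    if weight < 0.6:
        return "creator_tone"
    if weight < 0.8:
        return "creator_voice"
    if weight < 0.9:
        return "full_voice"
    return "full_embodiment"

def scale_temperature(weight: float) -> float:
    """Linear scaling: 0.3 at weight 0.0, 0.5 at weight 1.0."""
    return 0.3 + weight * 0.2
```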
Citation Metadata Propagation (M024/S05)
End-to-end video metadata flow from search results through SSE to frontend source cards.
Backend
- `search_service.py` — `_enrich_qdrant_results()` and `_keyword_search_and()` now batch-fetch `SourceVideo` filenames and include `source_video_id`, `start_time`, `end_time`, `video_filename` in result dicts. Non-key_moment types get empty/None values for uniform dict shape.
- `chat_service.py` — `_build_sources()` passes all four video fields through to SSE source events.
Frontend
- `ChatSource` interface (in `api/chat.ts`) extended with `source_video_id`, `start_time`, `end_time`, `video_filename` fields.
- `utils/chatCitations.tsx` — shared `parseChatCitations()` replaces duplicate implementations in ChatPage and ChatWidget. Accepts CSS module styles as `Record<string, string>`.
- `utils/formatTime.ts` — shared hour-aware time formatter used across timestamp badges and player controls.
- Source cards now show: timestamp badge (links to `/watch/:id?t=N` when `start_time` defined) and video filename metadata.
LLM Fallback Resilience (M025/S08)
ChatService maintains two `AsyncOpenAI` clients: primary (DGX endpoint) and fallback (Ollama). When the primary `create()` call fails with `APIConnectionError`, `APITimeoutError`, or `InternalServerError`, the entire streaming call is retried with the fallback client. The SSE `done` event includes `fallback_used: true/false` so the frontend and usage logging know which model actually served the response.
- Config: `LLM_FALLBACK_URL` and `LLM_FALLBACK_MODEL` in docker-compose.yml
- Logging: `chat_llm_fallback` WARNING when primary fails and fallback activates
- Usage tracking: `ChatUsageLog.model` records actual model name (primary or fallback)
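The retry logic can be sketched as below. `RetryableLLMError` is a stand-in for the openai exception classes named above, and the callables are illustrative, not the actual client objects:

```python
class RetryableLLMError(Exception):
    """Stand-in for APIConnectionError / APITimeoutError / InternalServerError."""

async def create_with_fallback(primary_create, fallback_create, **kwargs):
    """Try the primary streaming call; on a retryable error, retry with fallback.

    Returns (stream, fallback_used) so the SSE done event can report which
    client actually served the response.
    """
    try:
        return await primary_create(stream=True, **kwargs), False
    except RetryableLLMError:
        return await fallback_create(stream=True, **kwargs), True
```

Retrying the whole `create()` call (rather than resuming mid-stream) keeps the client-side protocol simple: the frontend only ever sees one uninterrupted token stream.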
Refined System Prompt (M025/S09)
The system prompt was rewritten from 5 lines to a structured template covering:
- Citation density: Cite every factual claim inline with `[N]` markers
- Response format: Short paragraphs, bullet lists for step-by-step, bold key terms
- Domain terminology: Music production context awareness
- Conflicting sources: Present both perspectives with attribution
- Response length: 2–4 paragraphs default, adjusted to query complexity
Kept under 20 lines using markdown headers for structure. All 26 existing chat tests pass unchanged.
Chat Quality Evaluation Toolkit (M025/S09)
A 5-dimension LLM-as-judge evaluation framework:
- Scorer (`backend/pipeline/quality/chat_scorer.py`): Grades responses on citation_accuracy, response_structure, domain_expertise, source_grounding, personality_fidelity
- Eval Harness (`backend/pipeline/quality/chat_eval.py`): SSE-parsing runner that calls the live chat endpoint and feeds responses to the scorer
- Test Suite (`backend/pipeline/quality/fixtures/chat_test_suite.yaml`): 10 queries across technical, conceptual, creator-scoped, and cross-creator categories
- CLI: `python -m pipeline.quality chat_eval` subcommand for automated evaluation runs
- Baseline report: Documented in S09-QUALITY-REPORT.md with JSON results
Key Files
- `backend/chat_service.py` — ChatService with history load/save, retrieve-prompt-stream pipeline
- `backend/routers/chat.py` — POST /api/v1/chat endpoint with conversation_id support
- `backend/tests/test_chat.py` — 13 tests (6 streaming + 7 conversation memory)
- `frontend/src/api/chat.ts` — SSE client with conversationId param and ChatDoneMeta type
- `frontend/src/pages/ChatPage.tsx` — Multi-message conversation UI
- `frontend/src/pages/ChatPage.module.css` — Conversation bubble layout styles
- `frontend/src/components/ChatWidget.tsx` — Floating chat widget component
- `frontend/src/components/ChatWidget.module.css` — Widget styles (38 custom property refs)
- `frontend/src/utils/chatCitations.tsx` — Shared citation parser (M024/S05)
- `frontend/src/utils/formatTime.ts` — Shared time formatter (M024/S05)
Design Decisions
- Redis JSON string — Conversation history stored as single JSON value (atomic read/write) rather than Redis list type
- Auto-generate conversation_id — Server creates UUID when client omits it, ensuring consistent `done` event shape
- Widget resets on close — Clean slate UX; no persistence across open/close cycles
- Client-side suggested questions — Generated from technique titles/categories without API call
- 5-tier interpolation — Progressive field inclusion replaces 3-tier step function (D044 supersedes D043)
- Shared citation parsing — `parseChatCitations()` in `utils/chatCitations.tsx` replaces duplicate implementations (M024/S05)
- Standalone ASGI test client — Tests use mocked DB to avoid PostgreSQL dependency
See also: [[Search-Retrieval]], [[API-Surface]], [[Frontend]]