test: Added multi-turn conversation memory with Redis-backed history (1…

- "backend/chat_service.py"
- "backend/routers/chat.py"
- "backend/tests/test_chat.py"

GSD-Task: S04/T01
jlightner 2026-04-04 07:50:30 +00:00
parent 3226709382
commit d13d6c3aa1
12 changed files with 778 additions and 28 deletions


@ -8,7 +8,7 @@ Creator-facing tools take shape: shorts queue, follow system, chat widget (UI on
|----|-------|------|---------|------|------------|
| S01 | [A] Highlight Reel + Shorts Queue UI | medium | — | ✅ | Creator reviews auto-detected highlights and short candidates in a review queue — approve, trim, discard |
| S02 | [A] Follow System + Tier UI (Demo Placeholders) | medium | — | ✅ | Users can follow creators. Tier config page has styled Coming Soon payment placeholders. |
| S03 | [A] Chat Widget Shell (UI Only) | low | — | | Chat bubble on creator profile pages with conversation UI, typing indicator, suggested questions |
| S04 | [B] Multi-Turn Conversation Memory | medium | — | ⬜ | Multi-turn conversations maintain context across messages using Redis-backed history |
| S05 | [B] Highlight Detection v2 (Audio Signals) | medium | — | ⬜ | Highlight detection uses audio energy analysis (librosa) alongside transcript signals for improved scoring |
| S06 | [B] Personality Profile Extraction | high | — | ⬜ | Personality profiles extracted for 3+ creators showing distinct vocabulary, tone, and style markers |


@ -0,0 +1,78 @@
---
id: S03
parent: M022
milestone: M022
provides:
- ChatWidget component mountable on any page with creatorName + techniques props
- Creator-scoped chat UI ready for multi-turn memory (S04)
requires:
[]
affects:
- S04
- S07
key_files:
- frontend/src/components/ChatWidget.tsx
- frontend/src/components/ChatWidget.module.css
- frontend/src/pages/CreatorDetail.tsx
key_decisions:
- Suggestions generated client-side from technique titles and categories — no API call needed
- Messages stored as local state array with per-message sources/done tracking
- Citation parsing duplicated from ChatPage to keep widget self-contained
patterns_established:
- Fixed-position floating widget pattern: bubble → slide-up panel with responsive breakpoint at 640px
- Creator-scoped chat via streamChat(query, callbacks, creatorName) — same SSE protocol as ChatPage
observability_surfaces:
- none
drill_down_paths:
- .gsd/milestones/M022/slices/S03/tasks/T01-SUMMARY.md
duration: ""
verification_result: passed
completed_at: 2026-04-04T07:43:02.824Z
blocker_discovered: false
---
# S03: [A] Chat Widget Shell (UI Only)
**Floating chat widget on creator profile pages with streaming SSE responses, suggested questions, typing indicator, citation links, and responsive layout.**
## What Happened
Single-task slice delivering the ChatWidget component — a fixed-position chat bubble (bottom-right) that expands into a conversation panel on creator profile pages. The widget calls `streamChat()` scoped to the creator, renders tokens in real-time with a blinking cursor, shows a three-dot typing indicator during streaming, and parses citations into superscript links to technique pages. Suggested questions are generated client-side from the creator's technique titles and categories (no extra API call). Messages accumulate in local state (no persistence). Responsive: full-width below 640px, 400px panel on desktop. CSS module uses 38 `var(--color-*)` references for full theme consistency. Mounted in CreatorDetail.tsx via props.
## Verification
TypeScript compilation (`npx tsc --noEmit`) passed with zero errors. Production build (`npm run build`) succeeded — 118 modules transformed in 2.12s. All three expected files exist: ChatWidget.tsx (11KB), ChatWidget.module.css (8.6KB), CreatorDetail.tsx (imports and mounts widget).
## Requirements Advanced
None.
## Requirements Validated
None.
## New Requirements Surfaced
None.
## Requirements Invalidated or Re-scoped
None.
## Deviations
None.
## Known Limitations
Messages are local state only — no persistence across page navigations or refreshes. Citation parsing logic is duplicated from ChatPage rather than shared.
## Follow-ups
Consider extracting shared citation parsing into a utility when ChatPage and ChatWidget diverge or need updates. Multi-turn memory (S04) will make the widget conversations more useful.
## Files Created/Modified
- `frontend/src/components/ChatWidget.tsx` — New floating chat widget component with streaming SSE, citations, suggested questions, typing indicator
- `frontend/src/components/ChatWidget.module.css` — CSS module for chat widget — 38 custom property refs, responsive layout, animations
- `frontend/src/pages/CreatorDetail.tsx` — Added ChatWidget import and mount with creator name + techniques props


@ -0,0 +1,60 @@
# S03: [A] Chat Widget Shell (UI Only) — UAT
**Milestone:** M022
**Written:** 2026-04-04T07:43:02.824Z
## UAT: Chat Widget Shell (UI Only)
### Preconditions
- Frontend built and running (e.g., `http://ub01:8096`)
- At least one creator with techniques exists in the database
### Test 1: Chat Bubble Visibility
1. Navigate to any creator detail page (e.g., `/creators/copycatt`)
2. **Expected:** A circular chat bubble icon is visible in the bottom-right corner of the viewport
3. Scroll down — bubble remains fixed in position
### Test 2: Panel Open/Close
1. Click the chat bubble
2. **Expected:** Panel slides up from the bubble — shows creator name in header, close button, input field, and 3 suggested questions
3. Click the close button (×)
4. **Expected:** Panel collapses back to bubble
5. Click bubble again to reopen, press Escape key
6. **Expected:** Panel collapses back to bubble
### Test 3: Suggested Questions
1. Open the chat panel on a creator page
2. **Expected:** 3 suggested questions displayed, derived from the creator's technique titles/categories
3. Click one suggested question
4. **Expected:** Question appears as a user message, streaming response begins
### Test 4: Streaming Response
1. Type a question in the input field and press Enter (or click send)
2. **Expected:** Input is disabled during streaming. Typing indicator (three bouncing dots) appears. Tokens render incrementally with a blinking cursor
3. Wait for response to complete
4. **Expected:** Typing indicator disappears. Input re-enables. If citations exist, superscript links appear in the response text. Sources list appears below the response
### Test 5: Citation Links
1. After receiving a response with citations (e.g., ask about a specific technique)
2. **Expected:** Citation numbers render as superscript links (e.g., ¹, ²)
3. Click a citation link
4. **Expected:** Navigates to the referenced technique page
### Test 6: Multi-Message Conversation
1. Send 3+ messages in sequence
2. **Expected:** All messages and responses accumulate in a scrollable list. Panel auto-scrolls to latest message
### Test 7: Responsive Layout
1. Resize browser to < 640px width (or use mobile viewport)
2. Open the chat panel
3. **Expected:** Panel is full-width instead of 400px fixed width
### Test 8: Creator Scoping
1. Open chat on Creator A's page, ask a question
2. Navigate to Creator B's page
3. **Expected:** Chat bubble appears. Opening it shows a fresh state (no messages from Creator A). Suggested questions reflect Creator B's techniques
### Edge Cases
- **Empty input:** Pressing Enter with no text should not send a message
- **Rapid clicks:** Clicking the bubble rapidly should toggle open/close without glitches
- **No techniques:** If a creator somehow has zero techniques, the suggested questions section should degrade gracefully (render nothing rather than broken suggestions)


@ -0,0 +1,30 @@
{
"schemaVersion": 1,
"taskId": "T01",
"unitId": "M022/S03/T01",
"timestamp": 1775288520209,
"passed": false,
"discoverySource": "task-plan",
"checks": [
{
"command": "cd frontend",
"exitCode": 0,
"durationMs": 6,
"verdict": "pass"
},
{
"command": "npx tsc --noEmit",
"exitCode": 1,
"durationMs": 882,
"verdict": "fail"
},
{
"command": "npm run build",
"exitCode": 254,
"durationMs": 96,
"verdict": "fail"
}
],
"retryAttempt": 1,
"maxRetries": 2
}


@ -1,6 +1,14 @@
# S04: [B] Multi-Turn Conversation Memory
**Goal:** Add conversation memory to chat engine — Redis-backed context windows with TTL
**Goal:** Multi-turn conversations maintain context across messages using Redis-backed history, with conversation_id threading through API and both frontend chat surfaces.
**Demo:** After this: Multi-turn conversations maintain context across messages using Redis-backed history
## Tasks
- [x] **T01: Added multi-turn conversation memory with Redis-backed history (10-pair cap, 1h TTL), conversation_id threading through API and SSE done event, and 7 new tests** — Add multi-turn conversation memory to the backend. The ChatRequest model gains an optional `conversation_id` field. ChatService.stream_response() loads history from Redis before the LLM call, injects it into the messages array, accumulates the full assistant response during streaming, and saves both user+assistant messages to Redis after the done event. History is capped at 10 turn pairs. Redis key is `chrysopedia:chat:{conversation_id}` as a JSON string with 1-hour TTL refreshed on each interaction. The `done` SSE event includes the `conversation_id`. Omitting `conversation_id` preserves existing single-turn behavior. Includes 6+ new tests using the existing standalone ASGI test pattern with mocked Redis.
- Estimate: 1.5h
- Files: backend/routers/chat.py, backend/chat_service.py, backend/tests/test_chat.py
- Verify: cd backend && python -m pytest tests/test_chat.py -v 2>&1 | tail -30
- [ ] **T02: Thread conversation_id through frontend chat API and both chat UIs** — Update the frontend to support multi-turn conversations. (1) `streamChat()` in `api/chat.ts` gains a `conversationId` param sent in the POST body and parsed from the `done` event. (2) ChatWidget already has a local messages array — add `conversationId` state (generated on first send via `crypto.randomUUID()`), thread it through `streamChat()` calls, update it from `done` event response, reset on panel close. (3) ChatPage converts from single-response to multi-message UI: replace single `responseText`/`sources` state with a `messages[]` array (same shape as ChatWidget), add `conversationId` state, render message history with existing `parseChatCitations()`, add a 'New conversation' button. Reuse ChatWidget's message rendering pattern.
- Estimate: 1.5h
- Files: frontend/src/api/chat.ts, frontend/src/components/ChatWidget.tsx, frontend/src/pages/ChatPage.tsx, frontend/src/pages/ChatPage.module.css
- Verify: cd frontend && npm run build 2>&1 | tail -20


@ -0,0 +1,101 @@
# S04 Research: Multi-Turn Conversation Memory
## Summary
The current chat system is single-turn: each request sends `[system_prompt, user_message]` to the LLM with no prior context. Both the ChatWidget (creator-scoped floating panel) and ChatPage (standalone page) discard history on each submission. Adding multi-turn memory requires: (1) Redis-backed conversation history storage, (2) conversation_id threading through the API, (3) injecting prior turns into the LLM messages array, (4) frontend conversation_id management.
This is a well-understood pattern applied to existing, stable infrastructure. Redis is already in the stack (used for caching and pipeline data). The OpenAI-compatible API accepts standard `messages` arrays. Risk is low.
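The payload change this implies can be sketched in a few lines — message contents below are hypothetical, for illustration only:

```python
# Illustrative sketch: how the LLM `messages` payload changes when prior
# turns are injected between the system prompt and the current user message.
system = {"role": "system", "content": "<system prompt with fresh sources>"}
user = {"role": "user", "content": "How should I trim a highlight?"}

# Current single-turn behavior: exactly two messages per request.
single_turn = [system, user]

# Prior turns loaded from Redis by conversation_id (hypothetical history).
history = [
    {"role": "user", "content": "What does the shorts queue do?"},
    {"role": "assistant", "content": "It lists auto-detected short candidates."},
]

# Multi-turn: history sits between the system prompt and the new user message.
multi_turn = [system, *history, user]
```

Because the system prompt is rebuilt with fresh sources each turn, only the user/assistant pairs ever enter `history`.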
## Recommendation
Implement Redis-backed conversation memory with these design choices:
- **Conversation ID**: UUID4 generated client-side, sent as optional field in ChatRequest. Absent = single-turn (backward compatible).
- **Storage**: Redis string at `chrysopedia:chat:{conversation_id}` — a JSON array of `{role, content}` objects. TTL of 1 hour (conversations are ephemeral, not archival).
- **History injection**: Before calling the LLM, load history from Redis, prepend to messages array as `[system, ...history, user]`. Cap at last N turns (e.g., 10 message pairs = 20 messages) to stay within context window limits.
- **Write-after-stream**: After LLM streaming completes (done event), append both the user message and full assistant response to Redis. This means partial/errored responses don't pollute history.
- **Context refresh**: Re-run search for each turn (not cached from prior turns). The system prompt with fresh sources changes each turn — only the conversation flow persists.
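The cap and write-after-stream choices combine into one small policy, sketched here as a pure function (the name `append_turn` and the constant are illustrative, not the real code):

```python
# Hypothetical helper sketching the write-after-stream policy: the completed
# user+assistant pair is appended only after streaming finishes, then the
# history is trimmed to the most recent turn pairs.
MAX_TURN_PAIRS = 10  # 10 pairs = 20 messages

def append_turn(history, user_msg, assistant_msg):
    new_history = history + [
        {"role": "user", "content": user_msg},
        {"role": "assistant", "content": assistant_msg},
    ]
    # Keep only the most recent MAX_TURN_PAIRS pairs.
    return new_history[-MAX_TURN_PAIRS * 2:]
```

An errored or aborted stream simply never calls this, which is what keeps partial responses out of history.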
## Implementation Landscape
### Files to Modify
**Backend (core changes):**
| File | What Changes |
|------|-------------|
| `backend/chat_service.py` | Add `conversation_id` param to `stream_response()`. Load history from Redis before LLM call. Accumulate full response text during streaming. Append user+assistant messages to Redis after done. |
| `backend/routers/chat.py` | Add `conversation_id: str \| None = None` to `ChatRequest`. Pass to service. Return `conversation_id` in done event so frontend can track it. |
| `backend/redis_client.py` | No changes needed — `get_redis()` already returns async client. |
**Frontend (conversation threading):**
| File | What Changes |
|------|-------------|
| `frontend/src/api/chat.ts` | Add `conversation_id` to request body and `streamChat()` params. Parse `conversation_id` from done event. |
| `frontend/src/components/ChatWidget.tsx` | Generate `conversation_id` (uuid) on first message send. Thread through subsequent `streamChat()` calls. Reset on panel close or explicit "New conversation" action. |
| `frontend/src/pages/ChatPage.tsx` | Convert from single-response to multi-message UI (similar to ChatWidget's existing message array pattern). Generate and thread `conversation_id`. |
**Tests:**
| File | What Changes |
|------|-------------|
| `backend/tests/test_chat.py` | Add tests for: conversation_id round-trip, history loaded from Redis, history capped at max turns, no conversation_id = single-turn, TTL set on Redis key. |
### Key Code Points
**ChatService.stream_response()** (backend/chat_service.py:56-117): The LLM call at line 96-105 currently sends exactly 2 messages. Multi-turn adds history between system and user:
```python
messages=[
{"role": "system", "content": system_prompt},
*history, # ← injected from Redis
{"role": "user", "content": query},
]
```
**ChatRequest** (backend/routers/chat.py:24-27): Add optional `conversation_id` field. Pydantic validates it.
**streamChat()** (frontend/src/api/chat.ts:42): Already accepts `query` and `creator`. Add `conversationId` param, include in POST body.
**ChatWidget messages state** (frontend/src/components/ChatWidget.tsx:115): Already maintains a local `messages[]` array with user/assistant roles. The widget already looks multi-turn visually — it just doesn't send history to the backend. Threading `conversation_id` through `streamChat()` calls is the only change needed here.
**ChatPage** (frontend/src/pages/ChatPage.tsx): Currently single-response (one query → one response, replacing previous). Needs conversion to a message-list UI. Can reuse the ChatWidget's message pattern (it's ~30 lines of state management + map rendering). This is the largest frontend change.
### Redis Key Design
```
Key: chrysopedia:chat:{conversation_id}
Type: String (JSON array of message objects)
TTL: 3600 seconds (1 hour)
Value: [{"role":"user","content":"..."}, {"role":"assistant","content":"..."}]
```
Using a single JSON string (not Redis list) because:
- Reads and writes are atomic (no partial list corruption)
- Matches the existing Redis usage pattern in the codebase (JSON string with `set`/`get`)
- History is always loaded in full, never by index
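Under those choices, load/save reduces to direct `get`/`set` calls on the async client. A self-contained sketch — `FakeRedis` stands in for `redis.asyncio.Redis`, and the function names are illustrative:

```python
# Sketch of the JSON-string storage pattern on an async Redis-like client.
# FakeRedis is a stand-in so the example runs without a Redis server.
import asyncio
import json

class FakeRedis:
    def __init__(self):
        self._data = {}
    async def get(self, key):
        return self._data.get(key)
    async def set(self, key, value, ex=None):
        self._data[key] = value  # TTL (ex) is ignored by the stand-in

def _key(conversation_id):
    return f"chrysopedia:chat:{conversation_id}"

async def load_history(r, conversation_id):
    raw = await r.get(_key(conversation_id))
    return json.loads(raw) if raw else []  # cache miss = empty history

async def save_history(r, conversation_id, history):
    # TTL refreshed on every write so active conversations stay alive.
    await r.set(_key(conversation_id), json.dumps(history), ex=3600)

async def demo():
    r = FakeRedis()
    await save_history(r, "abc", [{"role": "user", "content": "hi"}])
    return await load_history(r, "abc")
```

A whole-value `set` also makes the turn cap trivial: slice the list in Python before serializing, rather than trimming a Redis list server-side.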
### Constraints & Guardrails
- **Context window**: The LLM model (`fyn-llm-agent-chat`) has a token limit. Cap history at 10 turn pairs (20 messages). If history exceeds this, keep only the most recent N pairs.
- **System prompt changes per turn**: Each turn re-runs search and rebuilds the system prompt with fresh sources. This means the system prompt isn't in history — it's always current. Only user/assistant pairs are stored.
- **Backward compatibility**: `conversation_id` is optional. Omitting it produces the existing single-turn behavior. No migration needed.
- **No authentication coupling**: Conversations are keyed by UUID, not user. Anyone with the conversation_id can continue it. This is fine for the current single-admin + public-read model.
- **TTL refresh on each interaction**: Reset TTL on every read/write so active conversations don't expire mid-use.
### Natural Task Seams
1. **Backend conversation memory** — ChatService + Redis history (load/save/cap). ChatRequest model update. This is the core slice and should be built first. Verifiable with curl/tests.
2. **Backend tests** — Standalone ASGI tests with mocked Redis (existing test pattern). Covers round-trip, history injection, cap, TTL.
3. **Frontend conversation threading** — ChatWidget gets `conversationId` state, ChatPage converts to multi-message UI, `streamChat()` API updated. Verifiable with build + manual test.
### Dependencies
- `uuid` — Python stdlib, already available
- `redis.asyncio` — already installed and used (`backend/redis_client.py`)
- Frontend: no new dependencies. `crypto.randomUUID()` available in all modern browsers for generating conversation IDs.
### Don't Hand-Roll
- **Redis conversation store**: Don't build a custom abstraction layer. Direct `redis.get`/`redis.set` with JSON is the established pattern in this codebase (see `routers/search.py:192-198`, `routers/techniques.py:132-138`). Keep it consistent.
- **Message history format**: Don't invent a custom format. Use the OpenAI messages array format directly `[{role, content}]` — it maps 1:1 to what the LLM API expects.


@ -0,0 +1,26 @@
---
estimated_steps: 1
estimated_files: 3
skills_used: []
---
# T01: Add Redis-backed conversation memory to ChatService and chat router
Add multi-turn conversation memory to the backend. The ChatRequest model gains an optional `conversation_id` field. ChatService.stream_response() loads history from Redis before the LLM call, injects it into the messages array, accumulates the full assistant response during streaming, and saves both user+assistant messages to Redis after the done event. History is capped at 10 turn pairs. Redis key is `chrysopedia:chat:{conversation_id}` as a JSON string with 1-hour TTL refreshed on each interaction. The `done` SSE event includes the `conversation_id`. Omitting `conversation_id` preserves existing single-turn behavior. Includes 6+ new tests using the existing standalone ASGI test pattern with mocked Redis.
## Inputs
- `backend/chat_service.py` — current `ChatService.stream_response()` with single-turn LLM call
- `backend/routers/chat.py` — current `ChatRequest` model and chat endpoint
- `backend/tests/test_chat.py` — existing test patterns (standalone ASGI client, mock OpenAI stream, SSE parser)
- `backend/redis_client.py` — `get_redis()` async client helper
## Expected Output
- `backend/routers/chat.py` — `ChatRequest` with optional `conversation_id` field
- `backend/chat_service.py` — `stream_response()` with Redis history load/save, turn cap, TTL refresh, response accumulation
- `backend/tests/test_chat.py` — 6+ new tests for conversation memory (round-trip, history injection, cap, TTL, single-turn fallback, done event includes conversation_id)
## Verification
cd backend && python -m pytest tests/test_chat.py -v 2>&1 | tail -30


@ -0,0 +1,80 @@
---
id: T01
parent: S04
milestone: M022
provides: []
requires: []
affects: []
key_files: ["backend/chat_service.py", "backend/routers/chat.py", "backend/tests/test_chat.py"]
key_decisions: ["Redis injected via constructor param rather than Depends for simpler testing", "Auto-generate UUID conversation_id when omitted for consistent done event shape", "History stored as single JSON string with list slice cap rather than Redis list type"]
patterns_established: []
drill_down_paths: []
observability_surfaces: []
duration: ""
verification_result: "cd backend && python -m pytest tests/test_chat.py -v — 13 passed in 0.74s (6 existing + 7 new conversation memory tests)"
completed_at: 2026-04-04T07:50:23.566Z
blocker_discovered: false
---
# T01: Added multi-turn conversation memory with Redis-backed history (10-pair cap, 1h TTL), conversation_id threading through API and SSE done event, and 7 new tests
> Added multi-turn conversation memory with Redis-backed history (10-pair cap, 1h TTL), conversation_id threading through API and SSE done event, and 7 new tests
## What Happened
Updated ChatService with _load_history/_save_history methods using Redis JSON storage keyed by chrysopedia:chat:{conversation_id}. History is injected between system prompt and current user message in the LLM messages array. Response is accumulated during streaming and saved with the user message after the done event. History capped at 10 turn pairs, TTL refreshed to 1 hour on each interaction. ChatRequest model gained optional conversation_id field. Auto-generates UUID when omitted. Done SSE event now includes conversation_id for frontend threading. Added mock_redis fixture and 7 new tests covering round-trip save, history injection, cap enforcement, TTL refresh, auto-ID generation, and single-turn fallback.
## Verification
cd backend && python -m pytest tests/test_chat.py -v — 13 passed in 0.74s (6 existing + 7 new conversation memory tests)
## Verification Evidence
| # | Command | Exit Code | Verdict | Duration |
|---|---------|-----------|---------|----------|
| 1 | `cd backend && python -m pytest tests/test_chat.py -v` | 0 | ✅ pass | 4500ms |
## Deviations
Removed _get_chat_service Depends factory in favor of direct construction with Redis injection. chat_client fixture updated to accept mock_redis and patch routers.chat.get_redis.
## Known Issues
None.
## Files Created/Modified
- `backend/chat_service.py`
- `backend/routers/chat.py`
- `backend/tests/test_chat.py`


@ -0,0 +1,28 @@
---
estimated_steps: 1
estimated_files: 4
skills_used: []
---
# T02: Thread conversation_id through frontend chat API and both chat UIs
Update the frontend to support multi-turn conversations. (1) `streamChat()` in `api/chat.ts` gains a `conversationId` param sent in the POST body and parsed from the `done` event. (2) ChatWidget already has a local messages array — add `conversationId` state (generated on first send via `crypto.randomUUID()`), thread it through `streamChat()` calls, update it from `done` event response, reset on panel close. (3) ChatPage converts from single-response to multi-message UI: replace single `responseText`/`sources` state with a `messages[]` array (same shape as ChatWidget), add `conversationId` state, render message history with existing `parseChatCitations()`, add a 'New conversation' button. Reuse ChatWidget's message rendering pattern.
## Inputs
- `frontend/src/api/chat.ts` — current `streamChat()` function and `ChatCallbacks` interface
- `frontend/src/components/ChatWidget.tsx` — existing multi-message UI with local `messages[]` state
- `frontend/src/pages/ChatPage.tsx` — current single-response ChatPage
- `frontend/src/pages/ChatPage.module.css` — existing ChatPage styles
- `backend/routers/chat.py` — updated `ChatRequest` with `conversation_id` (from T01)
## Expected Output
- `frontend/src/api/chat.ts` — `streamChat()` accepts `conversationId`, sends it in the body, parses it from the done event
- `frontend/src/components/ChatWidget.tsx` — `conversation_id` state threaded through messages, reset on close
- `frontend/src/pages/ChatPage.tsx` — multi-message UI with conversation threading and new-conversation button
- `frontend/src/pages/ChatPage.module.css` — styles for multi-message layout
## Verification
cd frontend && npm run build 2>&1 | tail -20


@ -3,6 +3,12 @@
Assembles a numbered context block from search results, then streams
completion tokens from an OpenAI-compatible API. Yields SSE-formatted
events: sources, token, done, and error.
Multi-turn memory: When a conversation_id is provided, prior messages are
loaded from Redis, injected into the LLM messages array, and the new
user+assistant turn is appended after streaming completes. History is
capped at 10 turn pairs (20 messages) and expires after 1 hour of
inactivity.
"""
from __future__ import annotations
@ -11,6 +17,7 @@ import json
import logging
import time
import traceback
import uuid
from typing import Any, AsyncIterator
import openai
@ -32,35 +39,86 @@ Sources:
"""
_MAX_CONTEXT_SOURCES = 10
_MAX_TURN_PAIRS = 10
_HISTORY_TTL_SECONDS = 3600 # 1 hour
def _redis_key(conversation_id: str) -> str:
return f"chrysopedia:chat:{conversation_id}"
class ChatService:
"""Retrieve context from search, stream an LLM response with citations."""
def __init__(self, settings: Settings) -> None:
def __init__(self, settings: Settings, redis=None) -> None:
self.settings = settings
self._search = SearchService(settings)
self._openai = openai.AsyncOpenAI(
base_url=settings.llm_api_url,
api_key=settings.llm_api_key,
)
self._redis = redis
async def _load_history(self, conversation_id: str) -> list[dict[str, str]]:
"""Load conversation history from Redis. Returns empty list on miss."""
if not self._redis:
return []
try:
raw = await self._redis.get(_redis_key(conversation_id))
if raw:
return json.loads(raw)
except Exception:
logger.warning("chat_history_load_error cid=%s", conversation_id, exc_info=True)
return []
async def _save_history(
self,
conversation_id: str,
history: list[dict[str, str]],
user_msg: str,
assistant_msg: str,
) -> None:
"""Append the new turn pair and persist to Redis with TTL refresh."""
if not self._redis:
return
history.append({"role": "user", "content": user_msg})
history.append({"role": "assistant", "content": assistant_msg})
# Cap at _MAX_TURN_PAIRS (keep most recent)
if len(history) > _MAX_TURN_PAIRS * 2:
history = history[-_MAX_TURN_PAIRS * 2:]
try:
await self._redis.set(
_redis_key(conversation_id),
json.dumps(history),
ex=_HISTORY_TTL_SECONDS,
)
except Exception:
logger.warning("chat_history_save_error cid=%s", conversation_id, exc_info=True)
async def stream_response(
self,
query: str,
db: AsyncSession,
creator: str | None = None,
conversation_id: str | None = None,
) -> AsyncIterator[str]:
"""Yield SSE-formatted events for a chat query.
Protocol:
1. ``event: sources\ndata: <json array of citation metadata>\n\n``
2. ``event: token\ndata: <text chunk>\n\n`` (repeated)
3. ``event: done\ndata: <json with cascade_tier>\n\n``
3. ``event: done\ndata: <json with cascade_tier, conversation_id>\n\n``
On error: ``event: error\ndata: <json with message>\n\n``
"""
start = time.monotonic()
# Assign conversation_id if not provided (single-turn becomes trackable)
if conversation_id is None:
conversation_id = str(uuid.uuid4())
# ── 0. Load conversation history ────────────────────────────────
history = await self._load_history(conversation_id)
# ── 1. Retrieve context via search ──────────────────────────────
try:
search_result = await self._search.search(
@ -83,8 +141,8 @@ class ChatService:
context_block = _build_context_block(items)
logger.info(
"chat_search query=%r creator=%r cascade_tier=%s source_count=%d",
query, creator, cascade_tier, len(sources),
"chat_search query=%r creator=%r cascade_tier=%s source_count=%d cid=%s",
query, creator, cascade_tier, len(sources), conversation_id,
)
# Emit sources event first
@ -93,13 +151,19 @@ class ChatService:
# ── 3. Stream LLM completion ────────────────────────────────────
system_prompt = _SYSTEM_PROMPT_TEMPLATE.format(context_block=context_block)
messages: list[dict[str, str]] = [
{"role": "system", "content": system_prompt},
]
# Inject conversation history between system prompt and current query
messages.extend(history)
messages.append({"role": "user", "content": query})
accumulated_response = ""
try:
stream = await self._openai.chat.completions.create(
model=self.settings.llm_model,
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": query},
],
messages=messages,
stream=True,
temperature=0.3,
max_tokens=2048,
@ -108,21 +172,26 @@ class ChatService:
async for chunk in stream:
choice = chunk.choices[0] if chunk.choices else None
if choice and choice.delta and choice.delta.content:
yield _sse("token", choice.delta.content)
text = choice.delta.content
accumulated_response += text
yield _sse("token", text)
except Exception:
tb = traceback.format_exc()
logger.error("chat_llm_error query=%r\n%s", query, tb)
logger.error("chat_llm_error query=%r cid=%s\n%s", query, conversation_id, tb)
yield _sse("error", {"message": "LLM generation failed"})
return
# ── 4. Done event ───────────────────────────────────────────────
# ── 4. Save conversation history ────────────────────────────────
await self._save_history(conversation_id, history, query, accumulated_response)
# ── 5. Done event ───────────────────────────────────────────────
latency_ms = (time.monotonic() - start) * 1000
logger.info(
"chat_done query=%r creator=%r cascade_tier=%s source_count=%d latency_ms=%.1f",
query, creator, cascade_tier, len(sources), latency_ms,
"chat_done query=%r creator=%r cascade_tier=%s source_count=%d latency_ms=%.1f cid=%s",
query, creator, cascade_tier, len(sources), latency_ms, conversation_id,
)
yield _sse("done", {"cascade_tier": cascade_tier})
yield _sse("done", {"cascade_tier": cascade_tier, "conversation_id": conversation_id})
# ── Helpers ──────────────────────────────────────────────────────────────────


@ -16,6 +16,7 @@ from sqlalchemy.ext.asyncio import AsyncSession
from chat_service import ChatService
from config import Settings, get_settings
from database import get_session
from redis_client import get_redis
logger = logging.getLogger("chrysopedia.chat.router")
@ -27,31 +28,35 @@ class ChatRequest(BaseModel):
query: str = Field(..., min_length=1, max_length=1000)
creator: str | None = None
def _get_chat_service(settings: Settings = Depends(get_settings)) -> ChatService:
"""Build a ChatService from current settings."""
return ChatService(settings)
conversation_id: str | None = None
@router.post("")
async def chat(
body: ChatRequest,
db: AsyncSession = Depends(get_session),
service: ChatService = Depends(_get_chat_service),
settings: Settings = Depends(get_settings),
) -> StreamingResponse:
"""Stream a chat response as Server-Sent Events.
SSE protocol:
- ``event: sources`` citation metadata array (sent first)
- ``event: token`` streamed text chunk (repeated)
- ``event: done`` completion metadata with cascade_tier
- ``event: done`` completion metadata with cascade_tier, conversation_id
- ``event: error`` error message (on failure)
"""
-    logger.info("chat_request query=%r creator=%r", body.query, body.creator)
+    logger.info("chat_request query=%r creator=%r cid=%r", body.query, body.creator, body.conversation_id)
+    redis = await get_redis()
+    service = ChatService(settings, redis=redis)
    return StreamingResponse(
-        service.stream_response(query=body.query, db=db, creator=body.creator),
+        service.stream_response(
+            query=body.query,
+            db=db,
+            creator=body.creator,
+            conversation_id=body.conversation_id,
+        ),
media_type="text/event-stream",
headers={
"Cache-Control": "no-cache",

backend/tests/test_chat.py

@ -6,6 +6,7 @@ Mocks SearchService.search() and the OpenAI streaming response to verify:
3. Creator param forwarded to search
4. Empty/invalid query returns 422
5. LLM error produces an SSE error event
6. Multi-turn conversation memory via Redis (load/save/cap/TTL)
These tests use a standalone ASGI client that does NOT require a running
PostgreSQL instance; the DB session dependency is overridden with a mock.
@ -33,8 +34,28 @@ from main import app # noqa: E402
# ── Standalone test client (no DB required) ──────────────────────────────────
+@pytest_asyncio.fixture()
+async def mock_redis():
+    """In-memory mock of an async Redis client (get/set)."""
+    store: dict[str, tuple[str, int | None]] = {}
+    mock = AsyncMock()
+
+    async def _get(key):
+        entry = store.get(key)
+        return entry[0] if entry else None
+
+    async def _set(key, value, ex=None):
+        store[key] = (value, ex)
+
+    mock.get = AsyncMock(side_effect=_get)
+    mock.set = AsyncMock(side_effect=_set)
+    mock._store = store  # exposed for test assertions
+    return mock
+
+
 @pytest_asyncio.fixture()
-async def chat_client():
-    """Async HTTP test client that mocks out the DB session entirely."""
+async def chat_client(mock_redis):
+    """Async HTTP test client that mocks out the DB session and Redis."""
mock_session = AsyncMock()
async def _mock_get_session():
@ -44,7 +65,8 @@ async def chat_client():
transport = ASGITransport(app=app)
async with AsyncClient(transport=transport, base_url="http://testserver") as ac:
-        yield ac
+        with patch("routers.chat.get_redis", new_callable=AsyncMock, return_value=mock_redis):
+            yield ac
app.dependency_overrides.pop(get_session, None)
@ -298,3 +320,246 @@ async def test_chat_llm_error_produces_error_event(chat_client):
error_event = next(e for e in events if e["event"] == "error")
assert "message" in error_event["data"]
# ── Multi-turn conversation memory tests ─────────────────────────────────────
def _chat_request_with_mocks(query, conversation_id=None, creator=None, token_chunks=None):
"""Helper: build a request JSON and patched mocks for a single chat call."""
if token_chunks is None:
token_chunks = ["ok"]
body: dict[str, Any] = {"query": query}
if conversation_id is not None:
body["conversation_id"] = conversation_id
if creator is not None:
body["creator"] = creator
return body, token_chunks
@pytest.mark.asyncio
async def test_conversation_done_event_includes_conversation_id(chat_client):
"""The done SSE event includes conversation_id — both when provided and auto-generated."""
search_result = _fake_search_result()
mock_openai_client = MagicMock()
mock_openai_client.chat.completions.create = AsyncMock(
return_value=_mock_openai_stream(["hello"])
)
# With explicit conversation_id
with (
patch("chat_service.SearchService.search", new_callable=AsyncMock, return_value=search_result),
patch("chat_service.openai.AsyncOpenAI", return_value=mock_openai_client),
):
resp = await chat_client.post(
"/api/v1/chat",
json={"query": "test", "conversation_id": "conv-abc-123"},
)
events = _parse_sse(resp.text)
done_data = next(e for e in events if e["event"] == "done")["data"]
assert done_data["conversation_id"] == "conv-abc-123"
@pytest.mark.asyncio
async def test_conversation_auto_generates_id_when_omitted(chat_client):
"""When conversation_id is not provided, the done event still includes one (auto-generated)."""
search_result = _fake_search_result()
mock_openai_client = MagicMock()
mock_openai_client.chat.completions.create = AsyncMock(
return_value=_mock_openai_stream(["world"])
)
with (
patch("chat_service.SearchService.search", new_callable=AsyncMock, return_value=search_result),
patch("chat_service.openai.AsyncOpenAI", return_value=mock_openai_client),
):
resp = await chat_client.post("/api/v1/chat", json={"query": "no cid"})
events = _parse_sse(resp.text)
done_data = next(e for e in events if e["event"] == "done")["data"]
# Auto-generated UUID format
cid = done_data["conversation_id"]
assert isinstance(cid, str)
assert len(cid) == 36 # UUID4 format
@pytest.mark.asyncio
async def test_conversation_history_saved_to_redis(chat_client, mock_redis):
"""After a successful chat, user+assistant messages are saved to Redis."""
search_result = _fake_search_result()
mock_openai_client = MagicMock()
mock_openai_client.chat.completions.create = AsyncMock(
return_value=_mock_openai_stream(["The answer ", "is 42."])
)
cid = "conv-save-test"
with (
patch("chat_service.SearchService.search", new_callable=AsyncMock, return_value=search_result),
patch("chat_service.openai.AsyncOpenAI", return_value=mock_openai_client),
):
resp = await chat_client.post(
"/api/v1/chat",
json={"query": "what is the answer", "conversation_id": cid},
)
assert resp.status_code == 200
# Verify Redis received the history
redis_key = f"chrysopedia:chat:{cid}"
assert redis_key in mock_redis._store
stored_json, ttl = mock_redis._store[redis_key]
history = json.loads(stored_json)
assert len(history) == 2
assert history[0] == {"role": "user", "content": "what is the answer"}
assert history[1] == {"role": "assistant", "content": "The answer is 42."}
assert ttl == 3600
@pytest.mark.asyncio
async def test_conversation_history_injected_into_llm_messages(chat_client, mock_redis):
"""Prior conversation history is injected between system prompt and user message."""
search_result = _fake_search_result()
# Pre-populate Redis with conversation history
cid = "conv-inject-test"
prior_history = [
{"role": "user", "content": "what is reverb"},
{"role": "assistant", "content": "Reverb simulates acoustic spaces."},
]
mock_redis._store[f"chrysopedia:chat:{cid}"] = (json.dumps(prior_history), 3600)
captured_messages = []
mock_openai_client = MagicMock()
async def _capture_create(**kwargs):
captured_messages.extend(kwargs.get("messages", []))
return _mock_openai_stream(["follow-up answer"])
mock_openai_client.chat.completions.create = AsyncMock(side_effect=_capture_create)
with (
patch("chat_service.SearchService.search", new_callable=AsyncMock, return_value=search_result),
patch("chat_service.openai.AsyncOpenAI", return_value=mock_openai_client),
):
resp = await chat_client.post(
"/api/v1/chat",
json={"query": "how do I use it on drums", "conversation_id": cid},
)
assert resp.status_code == 200
# Messages should be: system, prior_user, prior_assistant, current_user
assert len(captured_messages) == 4
assert captured_messages[0]["role"] == "system"
assert captured_messages[1] == {"role": "user", "content": "what is reverb"}
assert captured_messages[2] == {"role": "assistant", "content": "Reverb simulates acoustic spaces."}
assert captured_messages[3] == {"role": "user", "content": "how do I use it on drums"}
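The message ordering this test asserts can be sketched as a small pure helper. The name `build_messages` is hypothetical; the real assembly lives inside `ChatService.stream_response`, which is not shown in this hunk:

```python
def build_messages(system_prompt: str, history: list[dict], query: str) -> list[dict]:
    # Ordering asserted by the test: system prompt first, then prior turns
    # in chronological order, then the current user message last.
    return (
        [{"role": "system", "content": system_prompt}]
        + history
        + [{"role": "user", "content": query}]
    )
```

Keeping the assembly pure (no Redis or OpenAI calls) is what makes the ordering trivially assertable in tests like the one above.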
@pytest.mark.asyncio
async def test_conversation_history_capped_at_10_pairs(chat_client, mock_redis):
"""History is capped at 10 turn pairs (20 messages). Oldest turns are dropped."""
search_result = _fake_search_result()
cid = "conv-cap-test"
# Pre-populate with 10 turn pairs (20 messages) — at the cap
prior_history = []
for i in range(10):
prior_history.append({"role": "user", "content": f"question {i}"})
prior_history.append({"role": "assistant", "content": f"answer {i}"})
mock_redis._store[f"chrysopedia:chat:{cid}"] = (json.dumps(prior_history), 3600)
mock_openai_client = MagicMock()
mock_openai_client.chat.completions.create = AsyncMock(
return_value=_mock_openai_stream(["capped reply"])
)
with (
patch("chat_service.SearchService.search", new_callable=AsyncMock, return_value=search_result),
patch("chat_service.openai.AsyncOpenAI", return_value=mock_openai_client),
):
resp = await chat_client.post(
"/api/v1/chat",
json={"query": "turn 11", "conversation_id": cid},
)
assert resp.status_code == 200
# After adding turn 11, history should still be 20 messages (10 pairs)
redis_key = f"chrysopedia:chat:{cid}"
stored_json, _ = mock_redis._store[redis_key]
history = json.loads(stored_json)
assert len(history) == 20 # 10 pairs
# Oldest pair (question 0 / answer 0) should be dropped
assert history[0] == {"role": "user", "content": "question 1"}
# New pair should be at the end
assert history[-2] == {"role": "user", "content": "turn 11"}
assert history[-1] == {"role": "assistant", "content": "capped reply"}
@pytest.mark.asyncio
async def test_conversation_ttl_refreshed_on_interaction(chat_client, mock_redis):
"""Each interaction refreshes the Redis TTL to 1 hour."""
search_result = _fake_search_result()
cid = "conv-ttl-test"
# Pre-populate with a short simulated TTL
prior_history = [
{"role": "user", "content": "old message"},
{"role": "assistant", "content": "old reply"},
]
mock_redis._store[f"chrysopedia:chat:{cid}"] = (json.dumps(prior_history), 100)
mock_openai_client = MagicMock()
mock_openai_client.chat.completions.create = AsyncMock(
return_value=_mock_openai_stream(["refreshed"])
)
with (
patch("chat_service.SearchService.search", new_callable=AsyncMock, return_value=search_result),
patch("chat_service.openai.AsyncOpenAI", return_value=mock_openai_client),
):
resp = await chat_client.post(
"/api/v1/chat",
json={"query": "refresh test", "conversation_id": cid},
)
assert resp.status_code == 200
# TTL should be refreshed to 3600
redis_key = f"chrysopedia:chat:{cid}"
_, ttl = mock_redis._store[redis_key]
assert ttl == 3600
@pytest.mark.asyncio
async def test_single_turn_fallback_no_redis_history(chat_client, mock_redis):
"""When no conversation_id is provided, no history is loaded and behavior matches single-turn."""
search_result = _fake_search_result()
captured_messages = []
mock_openai_client = MagicMock()
async def _capture_create(**kwargs):
captured_messages.extend(kwargs.get("messages", []))
return _mock_openai_stream(["standalone answer"])
mock_openai_client.chat.completions.create = AsyncMock(side_effect=_capture_create)
with (
patch("chat_service.SearchService.search", new_callable=AsyncMock, return_value=search_result),
patch("chat_service.openai.AsyncOpenAI", return_value=mock_openai_client),
):
resp = await chat_client.post("/api/v1/chat", json={"query": "standalone question"})
assert resp.status_code == 200
# Should only have system + user (no history injected since auto-generated cid is fresh)
assert len(captured_messages) == 2
assert captured_messages[0]["role"] == "system"
assert captured_messages[1]["role"] == "user"
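The assertions above go through a `_parse_sse` helper defined earlier in the test module, outside this hunk. A minimal parser consistent with how these tests consume the stream, assuming each message carries one `event:` line and one JSON `data:` line, might be:

```python
import json

def parse_sse(text: str) -> list[dict]:
    """Parse a text/event-stream payload into [{"event": ..., "data": ...}] dicts."""
    events = []
    for chunk in text.strip().split("\n\n"):  # SSE messages are blank-line separated
        event, data = None, None
        for line in chunk.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data = json.loads(line[len("data:"):].strip())
        if event is not None:
            events.append({"event": event, "data": data})
    return events
```

A client would apply the same split to the streamed body, then read `conversation_id` from the `done` event and echo it back on the next request to continue the conversation.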