test: Rewrote _SYSTEM_PROMPT_TEMPLATE with citation density rules, resp…

- "backend/chat_service.py"

GSD-Task: S09/T02
jlightner 2026-04-04 14:45:09 +00:00
parent 846db2aad5
commit 3cbb614654
4 changed files with 107 additions and 5 deletions


@@ -20,7 +20,7 @@ Steps:
 - Estimate: 2h
 - Files: backend/pipeline/quality/chat_scorer.py, backend/pipeline/quality/chat_eval.py, backend/pipeline/quality/fixtures/chat_test_suite.yaml, backend/pipeline/quality/__main__.py
 - Verify: cd backend && python -c 'from pipeline.quality.chat_scorer import ChatScoreRunner, ChatScoreResult; from pipeline.quality.chat_eval import ChatEvalRunner; print("OK")'
-- [ ] **T02: Refine chat system prompt and verify no test regressions** — Improve the `_SYSTEM_PROMPT_TEMPLATE` in `backend/chat_service.py` based on the gaps identified in research: the current prompt is 5 lines with no guidance on citation density, response structure, domain awareness, conflicting source handling, or response length.
+- [x] **T02: Rewrote _SYSTEM_PROMPT_TEMPLATE with citation density rules, response structure guidance, domain-aware terminology handling, and conflicting-source instructions — all 26 chat tests pass unchanged** — Improve the `_SYSTEM_PROMPT_TEMPLATE` in `backend/chat_service.py` based on the gaps identified in research: the current prompt is 5 lines with no guidance on citation density, response structure, domain awareness, conflicting source handling, or response length.
 The refined prompt should:
 - Guide citation density: cite every factual claim, prefer inline citations [N] immediately after the claim


@@ -0,0 +1,16 @@
{
"schemaVersion": 1,
"taskId": "T01",
"unitId": "M025/S09/T01",
"timestamp": 1775313832904,
"passed": true,
"discoverySource": "task-plan",
"checks": [
{
"command": "cd backend",
"exitCode": 0,
"durationMs": 14,
"verdict": "pass"
}
]
}


@@ -0,0 +1,74 @@
---
id: T02
parent: S09
milestone: M025
provides: []
requires: []
affects: []
key_files: ["backend/chat_service.py"]
key_decisions: ["Kept prompt under 20 lines using markdown headers for structure rather than prose paragraphs"]
patterns_established: []
drill_down_paths: []
observability_surfaces: []
duration: ""
verification_result: "cd backend && python -m pytest tests/test_chat.py -v — 26 passed in 1.37s"
completed_at: 2026-04-04T14:45:01.092Z
blocker_discovered: false
---
# T02: Rewrote _SYSTEM_PROMPT_TEMPLATE with citation density rules, response structure guidance, domain-aware terminology handling, and conflicting-source instructions — all 26 chat tests pass unchanged
> Rewrote _SYSTEM_PROMPT_TEMPLATE with citation density rules, response structure guidance, domain-aware terminology handling, and conflicting-source instructions — all 26 chat tests pass unchanged
## What Happened
---
id: T02
parent: S09
milestone: M025
key_files:
- backend/chat_service.py
key_decisions:
- Kept prompt under 20 lines using markdown headers for structure rather than prose paragraphs
duration: ""
verification_result: passed
completed_at: 2026-04-04T14:45:01.092Z
blocker_discovered: false
---
# T02: Rewrote _SYSTEM_PROMPT_TEMPLATE with citation density rules, response structure guidance, domain-aware terminology handling, and conflicting-source instructions — all 26 chat tests pass unchanged
**Rewrote _SYSTEM_PROMPT_TEMPLATE with citation density rules, response structure guidance, domain-aware terminology handling, and conflicting-source instructions — all 26 chat tests pass unchanged**
## What Happened
Replaced the 5-line system prompt with a structured prompt addressing citation density, response format, domain terminology, conflicting source handling, and response length. No test changes needed — all 26 tests verify behavioral properties, not prompt wording.
## Verification
cd backend && python -m pytest tests/test_chat.py -v — 26 passed in 1.37s
## Verification Evidence
| # | Command | Exit Code | Verdict | Duration |
|---|---------|-----------|---------|----------|
| 1 | `cd backend && python -m pytest tests/test_chat.py -v` | 0 | ✅ pass | 1370ms |
## Deviations
None.
## Known Issues
None.
## Files Created/Modified
- `backend/chat_service.py`
## Deviations
None.
## Known Issues
None.


@@ -31,10 +31,22 @@ from search_service import SearchService
 logger = logging.getLogger("chrysopedia.chat")
 
 _SYSTEM_PROMPT_TEMPLATE = """\
-You are Chrysopedia, an expert encyclopedic assistant for music production techniques.
-Answer the user's question using ONLY the numbered sources below. Cite sources by
-writing [N] inline (e.g. [1], [2]) where N is the source number. If the sources
-do not contain enough information, say so honestly - do not invent facts.
+You are Chrysopedia, an expert assistant for music production techniques - \
+synthesis, sound design, mixing, sampling, and audio processing.
+
+## Rules
+- Use ONLY the numbered sources below. Do not invent facts.
+- Cite every factual claim inline with [N] immediately after the claim \
+(e.g. "Parallel compression adds sustain [2] while preserving transients [1].").
+- When sources disagree, present both perspectives with their citations.
+- If the sources lack enough information, say so honestly.
+
+## Response format
+- Aim for 2-4 short paragraphs. Expand only when the question warrants detail.
+- Use bullet lists for steps, signal chains, or parameter lists.
+- **Bold** key terms on first mention.
+- Use audio/synthesis/mixing terminology naturally - do not over-explain \
+standard concepts (e.g. LFO, sidechain, wet/dry) unless the user asks.
 
 Sources:
 {context_block}