test: Rewrote _SYSTEM_PROMPT_TEMPLATE with citation density rules, resp…
- "backend/chat_service.py" GSD-Task: S09/T02
parent 846db2aad5
commit 3cbb614654
4 changed files with 107 additions and 5 deletions
@@ -20,7 +20,7 @@ Steps:
 - Estimate: 2h
 - Files: backend/pipeline/quality/chat_scorer.py, backend/pipeline/quality/chat_eval.py, backend/pipeline/quality/fixtures/chat_test_suite.yaml, backend/pipeline/quality/__main__.py
 - Verify: cd backend && python -c 'from pipeline.quality.chat_scorer import ChatScoreRunner, ChatScoreResult; from pipeline.quality.chat_eval import ChatEvalRunner; print("OK")'
-- [ ] **T02: Refine chat system prompt and verify no test regressions** — Improve the `_SYSTEM_PROMPT_TEMPLATE` in `backend/chat_service.py` based on the gaps identified in research: the current prompt is 5 lines with no guidance on citation density, response structure, domain awareness, conflicting source handling, or response length.
+- [x] **T02: Rewrote _SYSTEM_PROMPT_TEMPLATE with citation density rules, response structure guidance, domain-aware terminology handling, and conflicting-source instructions — all 26 chat tests pass unchanged** — Improve the `_SYSTEM_PROMPT_TEMPLATE` in `backend/chat_service.py` based on the gaps identified in research: the current prompt is 5 lines with no guidance on citation density, response structure, domain awareness, conflicting source handling, or response length.
 
 The refined prompt should:
 - Guide citation density: cite every factual claim, prefer inline citations [N] immediately after the claim
16
.gsd/milestones/M025/slices/S09/tasks/T01-VERIFY.json
Normal file
@@ -0,0 +1,16 @@
+{
+  "schemaVersion": 1,
+  "taskId": "T01",
+  "unitId": "M025/S09/T01",
+  "timestamp": 1775313832904,
+  "passed": true,
+  "discoverySource": "task-plan",
+  "checks": [
+    {
+      "command": "cd backend",
+      "exitCode": 0,
+      "durationMs": 14,
+      "verdict": "pass"
+    }
+  ]
+}
74
.gsd/milestones/M025/slices/S09/tasks/T02-SUMMARY.md
Normal file
@@ -0,0 +1,74 @@
+---
+id: T02
+parent: S09
+milestone: M025
+provides: []
+requires: []
+affects: []
+key_files: ["backend/chat_service.py"]
+key_decisions: ["Kept prompt under 20 lines using markdown headers for structure rather than prose paragraphs"]
+patterns_established: []
+drill_down_paths: []
+observability_surfaces: []
+duration: ""
+verification_result: "cd backend && python -m pytest tests/test_chat.py -v — 26 passed in 1.37s"
+completed_at: 2026-04-04T14:45:01.092Z
+blocker_discovered: false
+---
+
+# T02: Rewrote _SYSTEM_PROMPT_TEMPLATE with citation density rules, response structure guidance, domain-aware terminology handling, and conflicting-source instructions — all 26 chat tests pass unchanged
+
+> Rewrote _SYSTEM_PROMPT_TEMPLATE with citation density rules, response structure guidance, domain-aware terminology handling, and conflicting-source instructions — all 26 chat tests pass unchanged
+
+## What Happened
+---
+id: T02
+parent: S09
+milestone: M025
+key_files:
+- backend/chat_service.py
+key_decisions:
+- Kept prompt under 20 lines using markdown headers for structure rather than prose paragraphs
+duration: ""
+verification_result: passed
+completed_at: 2026-04-04T14:45:01.092Z
+blocker_discovered: false
+---
+
+# T02: Rewrote _SYSTEM_PROMPT_TEMPLATE with citation density rules, response structure guidance, domain-aware terminology handling, and conflicting-source instructions — all 26 chat tests pass unchanged
+
+**Rewrote _SYSTEM_PROMPT_TEMPLATE with citation density rules, response structure guidance, domain-aware terminology handling, and conflicting-source instructions — all 26 chat tests pass unchanged**
+
+## What Happened
+
+Replaced the 5-line system prompt with a structured prompt addressing citation density, response format, domain terminology, conflicting source handling, and response length. No test changes needed — all 26 tests verify behavioral properties, not prompt wording.
+
+## Verification
+
+cd backend && python -m pytest tests/test_chat.py -v — 26 passed in 1.37s
+
+## Verification Evidence
+
+| # | Command | Exit Code | Verdict | Duration |
+|---|---------|-----------|---------|----------|
+| 1 | `cd backend && python -m pytest tests/test_chat.py -v` | 0 | ✅ pass | 1370ms |
+
+
+## Deviations
+
+None.
+
+## Known Issues
+
+None.
+
+## Files Created/Modified
+
+- `backend/chat_service.py`
+
+
+## Deviations
+None.
+
+## Known Issues
+None.
@@ -31,10 +31,22 @@ from search_service import SearchService
 logger = logging.getLogger("chrysopedia.chat")
 
 _SYSTEM_PROMPT_TEMPLATE = """\
-You are Chrysopedia, an expert encyclopedic assistant for music production techniques.
-Answer the user's question using ONLY the numbered sources below. Cite sources by
-writing [N] inline (e.g. [1], [2]) where N is the source number. If the sources
-do not contain enough information, say so honestly — do not invent facts.
+You are Chrysopedia, an expert assistant for music production techniques — \
+synthesis, sound design, mixing, sampling, and audio processing.
+
+## Rules
+- Use ONLY the numbered sources below. Do not invent facts.
+- Cite every factual claim inline with [N] immediately after the claim \
+(e.g. "Parallel compression adds sustain [2] while preserving transients [1].").
+- When sources disagree, present both perspectives with their citations.
+- If the sources lack enough information, say so honestly.
+
+## Response format
+- Aim for 2–4 short paragraphs. Expand only when the question warrants detail.
+- Use bullet lists for steps, signal chains, or parameter lists.
+- **Bold** key terms on first mention.
+- Use audio/synthesis/mixing terminology naturally — do not over-explain \
+standard concepts (e.g. LFO, sidechain, wet/dry) unless the user asks.
 
 Sources:
 {context_block}
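
The template above leaves a single `{context_block}` placeholder and asks for inline `[N]` citations. The following is a minimal sketch of how such a template might be filled and how the cited numbers in a reply could be checked against the supplied sources. It is an illustration under assumptions, not the code in `backend/chat_service.py`: `TEMPLATE` is an abbreviated stand-in for `_SYSTEM_PROMPT_TEMPLATE`, and `build_context_block`, `build_system_prompt`, and `invalid_citations` are hypothetical helper names.

```python
import re

# Abbreviated, hypothetical stand-in for _SYSTEM_PROMPT_TEMPLATE.
# Only the {context_block} placeholder is taken from the diff above.
TEMPLATE = """\
You are Chrysopedia, an expert assistant for music production techniques.

## Rules
- Use ONLY the numbered sources below. Do not invent facts.
- Cite every factual claim inline with [N] immediately after the claim.

Sources:
{context_block}
"""


def build_context_block(sources: list[str]) -> str:
    """Number sources from 1 so an inline [N] maps back to sources[N - 1]."""
    return "\n".join(f"[{i}] {text}" for i, text in enumerate(sources, start=1))


def build_system_prompt(sources: list[str]) -> str:
    """Fill the single {context_block} placeholder in the template."""
    return TEMPLATE.format(context_block=build_context_block(sources))


def invalid_citations(answer: str, source_count: int) -> set[int]:
    """Return cited numbers that do not correspond to any supplied source."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return {n for n in cited if not 1 <= n <= source_count}
```

A check of this kind looks at behavior (whether citations point at real sources) rather than at the prompt wording, which is consistent with the summary's note that the existing tests did not need to change.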