feat: Created composition prompt with merge rules, citation re-indexing…

- "prompts/stage5_compose.txt"
- ".gsd/milestones/M014/slices/S02/tasks/T01-SUMMARY.md"

GSD-Task: S02/T01
jlightner 2026-04-03 01:03:01 +00:00
parent 44197f550c
commit 709d14802c
12 changed files with 892 additions and 2 deletions


@@ -29,3 +29,4 @@
| D021 | M011 | scope | Which UI/UX assessment findings to implement in M011 | 12 of 16 findings approved; F01 (beginner paths), F02 (YouTube links), F03 (hide admin), F15 (CTA label) denied | User triaged each finding individually. Denied F01 because audience knows what they want. Denied F02 because no video URLs / don't want to link out. Denied F03 because admin dropdown is fine as-is. Denied F15 as low-value. | Yes | human |
| D022 | | requirement | R025 | validated | useDocumentTitle hook called in all 10 pages. Static pages set fixed titles, dynamic pages (SubTopicPage, CreatorDetail, TechniquePage, SearchResults) update title when async data loads. | Yes | agent |
| D023 | M012/S01 | architecture | Qdrant embedding text enrichment strategy | Prepend creator_name and join topic_tags into embedding text for technique pages and key moments. Batch-resolve creator names at stage 6 start. | Semantic search now surfaces results for creator-name queries and tag-specific queries. Batch resolution avoids N+1 lookups during embedding. Reindex-all endpoint enables one-shot re-embedding after text composition changes. | Yes | agent |
| D024 | M014/S01 | architecture | Content model for sections with subsections | Sections with subsections use empty-string content field; substance lives in subsections | Avoids duplication between section-level content and subsection content. The section heading serves as H2 container; all prose lives in subsection content fields. Sections without subsections use the section content field directly. | Yes | agent |


@@ -6,7 +6,7 @@ Restructure technique pages to be broader (per-creator+category across videos),
## Slice Overview
| ID | Slice | Risk | Depends | Done | After this |
|----|-------|------|---------|------|------------|
| S01 | Synthesis Prompt v5 — Nested Sections + Citations | high | — | | Run test harness with new prompt → output has list-of-objects body_sections with H2/H3 nesting, citation markers on key claims, broader page scope. |
| S02 | Composition Prompt + Test Harness Compose Mode | high | S01 | ⬜ | Run test harness --compose mode with existing page + new moments → merged output with deduplication, new sections, updated citations. |
| S03 | Data Model + Migration | low | — | ⬜ | Alembic migration runs clean. API response includes body_sections_format and source_videos fields. |
| S04 | Pipeline Compose-or-Create Logic | high | S01, S02, S03 | ⬜ | Process two COPYCATT videos. Second video's moments composed into existing page. technique_page_videos has both video IDs. |


@@ -0,0 +1,102 @@
---
id: S01
parent: M014
milestone: M014
provides:
- BodySection/BodySubSection Pydantic models for downstream rendering (S05) and DB storage (S03)
- citation_utils.extract_citations() and validate_citations() for pipeline quality checks
- stage5_synthesis.txt v5 prompt with v2 output format for pipeline execution (S04)
- test_harness v2 format support for composition testing (S02)
requires:
[]
affects:
- S02 — uses updated test harness and prompt for compose mode testing
- S04 — pipeline uses the v2 prompt and parses v2 output via BodySection models
- S05 — frontend renders list[BodySection] with TOC and citation links
key_files:
- backend/pipeline/schemas.py
- backend/pipeline/citation_utils.py
- backend/pipeline/test_citation_utils.py
- backend/pipeline/test_harness.py
- backend/pipeline/test_harness_v2_format.py
- prompts/stage5_synthesis.txt
key_decisions:
- D024: Citations use 0-based indices matching moment list position
- D024: Sections with subsections use empty-string content; substance lives in subsections
- Word count includes section.content + subsection.content; heading text excluded
- Citation report uses CITE log tag for grep-friendly structured output
patterns_established:
- body_sections v2 format: list[BodySection] with heading/content/subsections, discriminated by body_sections_format='v2'
- Citation validation as reusable utility — validate_citations() returns a dict with coverage_pct, invalid_indices, uncited_moments for any consumer
- Prompt backup convention: .bak with timestamp suffix before destructive edits
observability_surfaces:
- CITE log tag in test harness output for per-page citation coverage reporting
drill_down_paths:
- .gsd/milestones/M014/slices/S01/tasks/T01-SUMMARY.md
- .gsd/milestones/M014/slices/S01/tasks/T02-SUMMARY.md
- .gsd/milestones/M014/slices/S01/tasks/T03-SUMMARY.md
duration: ""
verification_result: passed
completed_at: 2026-04-03T00:55:48.823Z
blocker_discovered: false
---
# S01: Synthesis Prompt v5 — Nested Sections + Citations
**Introduced body_sections v2 format (list-of-objects with H2/H3 nesting and inline [N] citation markers), backed by Pydantic schema, citation validation utility, updated synthesis prompt, and v2-aware test harness — all verified with 28 passing tests.**
## What Happened
This slice replaced the flat dict-based body_sections format with a structured list-of-objects format supporting H2/H3 nesting and inline citation markers.
**T01 — Schema & Citation Utils:** Added BodySection and BodySubSection Pydantic models to schemas.py. BodySection has heading, content, and optional subsections list. Changed SynthesizedPage.body_sections from dict to list[BodySection] and added body_sections_format='v2' discriminator field. Created citation_utils.py with extract_citations() (regex parser for [N] and [N,M] markers) and validate_citations() (returns coverage stats, invalid indices, uncited moments). 15 unit tests cover all edge cases including multi-citation markers, out-of-range indices, subsection citations, and zero-moment edge case.
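The shape of the two utilities can be sketched as follows. This is a minimal sketch reconstructed from this summary and the UAT assertions below, not the repo's actual `citation_utils.py`; the exact return keys (`valid`, `total_citations`, `invalid_indices`, `uncited_moments`, `coverage_pct`) are assumptions based on what the tests check:

```python
import re

def extract_citations(text: str) -> list[int]:
    """Parse [N] and [N,M] citation markers; return sorted unique indices."""
    indices = set()
    # Matches markers like [0], [2,3], [10, 12].
    for match in re.finditer(r'\[(\d+(?:\s*,\s*\d+)*)\]', text):
        for part in match.group(1).split(','):
            indices.add(int(part))
    return sorted(indices)

def validate_citations(sections, moment_count: int) -> dict:
    """Collect citations from section and subsection content, report coverage."""
    cited = set()
    for section in sections:
        cited.update(extract_citations(section.content))
        for sub in getattr(section, 'subsections', None) or []:
            cited.update(extract_citations(sub.content))
    invalid = sorted(i for i in cited if i >= moment_count)
    valid_cited = cited - set(invalid)
    coverage = (len(valid_cited) / moment_count * 100) if moment_count else 0.0
    return {
        'valid': not invalid,
        'total_citations': len(cited),
        'invalid_indices': invalid,
        'uncited_moments': sorted(set(range(moment_count)) - cited),
        'coverage_pct': coverage,
    }
```

Deduplication of repeated markers (e.g. `[1][1]`) falls out of the set-based collection, which matches the extraction behavior the UAT exercises.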
**T02 — Prompt Rewrite:** Backed up the existing prompt, then made four targeted edits to stage5_synthesis.txt: (1) added H3 subsection guidance to the body sections structure section, (2) added a Citation rules section defining the [N] and [N,M] format with placement conventions, (3) rewrote the Output format JSON example with the v2 structure, showing both flat and nested sections with inline citations, and (4) updated Field rules to describe the new format. All non-output sections (voice, tone, synthesis philosophy) were preserved unchanged.
**T03 — Test Harness Update:** Updated two locations in test_harness.py where body_sections was iterated as a dict — now walks BodySection objects for word counting and metadata. Added subsection count reporting and per-page citation coverage logging with a CITE log tag for grep-friendly output. Created test_harness_v2_format.py with 13 tests covering word counting (flat and nested), section counting, citation integration, and SynthesisResult round-trip serialization.
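The dict-to-object change in the harness amounts to a recursive walk over sections and subsections. A rough sketch, assuming the heading-excluded word-count rule from the key decisions above (helper name is illustrative, not the harness's actual function):

```python
def count_words(sections) -> int:
    """Sum words across section content and nested subsection content.
    Heading text is excluded, per the slice's word-count decision."""
    total = 0
    for section in sections:
        total += len(section.content.split())
        for sub in getattr(section, 'subsections', None) or []:
            total += len(sub.content.split())
    return total
```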
## Verification
All three task verification commands pass:
- `python -m pytest backend/pipeline/test_citation_utils.py -v` — 15/15 passed (0.08s)
- `python -m pytest backend/pipeline/test_harness_v2_format.py -v` — 13/13 passed (0.08s)
- Prompt structure assertion (body_sections_format, Citation rules, subsections present) — passed
Total: 28 tests passing, prompt validates structurally.
## Requirements Advanced
None.
## Requirements Validated
None.
## New Requirements Surfaced
None.
## Requirements Invalidated or Re-scoped
None.
## Deviations
None.
## Known Limitations
The v2 format is schema-validated and test-harnessed but has not been run against a live LLM yet — that happens when the pipeline compose logic (S04) exercises the prompt end-to-end.
## Follow-ups
None.
## Files Created/Modified
- `backend/pipeline/schemas.py` — Added BodySubSection, BodySection models; changed SynthesizedPage.body_sections to list[BodySection]; added body_sections_format='v2'
- `backend/pipeline/citation_utils.py` — New file: extract_citations() and validate_citations() functions
- `backend/pipeline/test_citation_utils.py` — New file: 15 unit tests for citation utilities
- `backend/pipeline/test_harness.py` — Updated word-count and metadata logic from dict iteration to BodySection object walking; added citation coverage reporting
- `backend/pipeline/test_harness_v2_format.py` — New file: 13 tests for v2 format word counting, section counting, citation integration, round-trip
- `prompts/stage5_synthesis.txt` — Added Citation rules section, H3 subsection guidance, rewrote Output format to v2 list-of-objects with citations, updated Field rules


@@ -0,0 +1,101 @@
# S01: Synthesis Prompt v5 — Nested Sections + Citations — UAT
**Milestone:** M014
**Written:** 2026-04-03T00:55:48.824Z
## UAT: S01 — Synthesis Prompt v5 — Nested Sections + Citations
### Preconditions
- Working directory: `/home/aux/projects/content-to-kb-automator`
- Python 3.12+ with pytest available
- All slice files committed
---
### Test 1: BodySection schema models exist and validate
**Steps:**
1. Run: `python -c "from pipeline.schemas import BodySection, BodySubSection; s = BodySubSection(heading='Sub', content='text'); b = BodySection(heading='Main', content='body', subsections=[s]); print(b.model_dump())"`
**Expected:** Outputs a dict with heading, content, and subsections list containing one subsection object. No import or validation errors.
---
### Test 2: SynthesizedPage uses v2 format
**Steps:**
1. Run: `python -c "from pipeline.schemas import SynthesizedPage; p = SynthesizedPage(title='T', slug='t', summary='s', body_sections=[], moment_indices=[], tags=[]); assert p.body_sections_format == 'v2'; assert isinstance(p.body_sections, list); print('v2 format confirmed')"`
**Expected:** Prints "v2 format confirmed". body_sections is list type, body_sections_format defaults to 'v2'.
---
### Test 3: Citation extraction parses all marker formats
**Steps:**
1. Run: `python -c "from pipeline.citation_utils import extract_citations; assert extract_citations('claim [0] and [2,3] here') == [0, 2, 3]; assert extract_citations('no markers') == []; assert extract_citations('[1][1][5]') == [1, 5]; print('extraction OK')"`
**Expected:** All assertions pass. Single markers, multi-markers, deduplication, and empty input handled correctly.
---
### Test 4: Citation validation returns coverage stats
**Steps:**
1. Run: `python -c "from pipeline.citation_utils import validate_citations; from pipeline.schemas import BodySection; secs = [BodySection(heading='H', content='claim [0] and [1] here')]; r = validate_citations(secs, 3); assert r['valid'] == True; assert r['total_citations'] == 2; assert r['uncited_moments'] == [2]; assert abs(r['coverage_pct'] - 66.67) < 0.1; print('validation OK')"`
**Expected:** valid=True (no out-of-range), 2 citations found, moment index 2 uncited, ~66.67% coverage.
---
### Test 5: Full citation test suite passes
**Steps:**
1. Run: `cd /home/aux/projects/content-to-kb-automator && python -m pytest backend/pipeline/test_citation_utils.py -v`
**Expected:** 15 tests pass, 0 failures.
---
### Test 6: Full harness v2 format test suite passes
**Steps:**
1. Run: `cd /home/aux/projects/content-to-kb-automator && python -m pytest backend/pipeline/test_harness_v2_format.py -v`
**Expected:** 13 tests pass, 0 failures.
---
### Test 7: Prompt file contains v2 structure
**Steps:**
1. Run: `python -c "t=open('prompts/stage5_synthesis.txt').read(); assert 'body_sections_format' in t; assert 'Citation rules' in t; assert '\"subsections\"' in t; assert '\"heading\"' in t; print('Prompt v5 OK')"`
**Expected:** All assertions pass — prompt contains v2 JSON structure, citation rules, and subsection definitions.
---
### Test 8: Prompt backup exists
**Steps:**
1. Run: `ls prompts/stage5_synthesis.*.bak`
**Expected:** At least one .bak file exists (timestamped backup of pre-v5 prompt).
---
### Edge Case 9: Out-of-range citation index flagged
**Steps:**
1. Run: `python -c "from pipeline.citation_utils import validate_citations; from pipeline.schemas import BodySection; secs = [BodySection(heading='H', content='bad ref [99]')]; r = validate_citations(secs, 3); assert r['valid'] == False; assert 99 in r['invalid_indices']; print('out-of-range caught')"`
**Expected:** valid=False, invalid_indices contains 99.
---
### Edge Case 10: Empty sections produce zero coverage
**Steps:**
1. Run: `python -c "from pipeline.citation_utils import validate_citations; r = validate_citations([], 5); assert r['total_citations'] == 0; assert r['coverage_pct'] == 0.0; print('empty OK')"`
**Expected:** Zero citations, 0% coverage, no crashes on empty input.


@@ -0,0 +1,28 @@
{
"schemaVersion": 1,
"taskId": "T03",
"unitId": "M014/S01/T03",
"timestamp": 1775177667487,
"passed": true,
"discoverySource": "task-plan",
"checks": [
{
"command": "cd /home/aux/projects/content-to-kb-automator",
"exitCode": 0,
"durationMs": 7,
"verdict": "pass"
},
{
"command": "python -m pytest backend/pipeline/test_harness_v2_format.py -v",
"exitCode": 0,
"durationMs": 329,
"verdict": "pass"
},
{
"command": "python -m pytest backend/pipeline/test_citation_utils.py -v",
"exitCode": 0,
"durationMs": 287,
"verdict": "pass"
}
]
}


@@ -1,6 +1,87 @@
# S02: Composition Prompt + Test Harness Compose Mode
**Goal:** Composition prompt (stage5_compose.txt) and test harness --compose mode enable offline testing of merging new video moments into existing technique pages, producing unified v2 output with correct citation re-indexing and deduplication.
**Demo:** After this: Run test harness --compose mode with existing page + new moments → merged output with deduplication, new sections, updated citations.
## Tasks
- [x] **T01: Created composition prompt with merge rules, citation re-indexing, deduplication guidance, and two valid v2 JSON examples** — Create the composition prompt that instructs the LLM to merge new video moments into an existing technique page. The prompt receives three inputs: existing page JSON, new moments list, and creator name. Must define merge semantics (preserve existing prose, integrate new material into existing sections or create new ones), deduplication rules (enrich rather than repeat), citation re-indexing scheme (existing [0]-[N-1], new [N]-[N+M-1]), and output format (same SynthesisResult/SynthesizedPage v2 schema as synthesis).
Steps:
1. Read `prompts/stage5_synthesis.txt` to understand voice/tone rules and v2 output format — the compose prompt must follow the same writing standards without copying the synthesis prompt's sections wholesale, since the compose prompt stands alone.
2. Write `prompts/stage5_compose.txt` with these sections:
- Role/context preamble explaining the composition task
- Input format: `<existing_page>`, `<existing_moments>`, `<new_moments>`, `<creator>` XML tags
- Merge rules: preserve existing prose verbatim where correct; add new sections for genuinely new territory; enrich existing sections with new detail where topics overlap
- Deduplication guidance with examples of 'enriching' vs 'duplicating'
- Citation re-indexing rules: existing moments keep indices [0]-[N-1], new moments get [N]-[N+M-1], composed output uses the unified index space
- Section ordering: maintain workflow-order rule from synthesis
- Output format: same SynthesisResult JSON with v2 body_sections (list[BodySection])
- Include a concrete JSON example showing composed output with mixed old/new citations
3. Validate the prompt structurally: check that the JSON example parses as valid SynthesisResult, citation rules section mentions the dual-index scheme, merge rules section is present.
- Estimate: 45m
- Files: prompts/stage5_compose.txt, prompts/stage5_synthesis.txt
- Verify: python3 -c "
import json, pathlib, re
p = pathlib.Path('prompts/stage5_compose.txt')
assert p.exists(), 'prompt file missing'
txt = p.read_text()
assert len(txt) > 2000, f'prompt too short: {len(txt)} chars'
for section in ['existing_page', 'new_moments', 'existing_moments', 'creator', 'Citation', 'dedup', 'Merge']:
    assert section.lower() in txt.lower(), f'missing section: {section}'
# Check JSON example is valid
json_match = re.search(r'```json\n(.*?)```', txt, re.DOTALL)
assert json_match, 'no JSON example found'
json.loads(json_match.group(1))
print('All structural checks passed')
"
- [ ] **T02: Add compose subcommand to test harness** — Add a `compose` subcommand to `backend/pipeline/test_harness.py` that loads an existing page JSON + new moments fixture, builds a compose user prompt, calls the LLM via the compose prompt, and validates the output. Also extract the compose user-prompt builder as a testable function.
Steps:
1. Read `backend/pipeline/test_harness.py` fully to understand the existing `run` and `promote` subcommand patterns.
2. Add a `build_compose_prompt()` function that takes:
- `existing_page: dict` (SynthesizedPage as dict, from prior synthesis output)
- `existing_moments: list[tuple[MockKeyMoment, dict]]` (the original moments the page was built from)
- `new_moments: list[tuple[MockKeyMoment, dict]]` (new video's moments to compose in)
- `creator_name: str`
Returns: the user prompt string with `<existing_page>`, `<existing_moments>`, `<new_moments>`, and `<creator>` XML tags. Existing moments get indices [0]-[N-1], new moments get [N]-[N+M-1]. Uses `build_moments_text()` for formatting moment lists.
3. Add a `run_compose()` function (parallel to `run_synthesis()`) that:
- Loads the existing page JSON and validates as SynthesizedPage
- Loads the new fixture via existing `load_fixture()`
- Filters new moments by category to match `existing_page.topic_category`
- Needs the existing moments for citation context — accept them as a separate fixture or extract from the existing page's moment_indices + a moments fixture
- Calls `build_compose_prompt()` to build the user prompt
- Calls LLM with compose system prompt
- Parses response as SynthesisResult
- Runs citation validation with moment_count = len(existing_moments) + len(new_moments)
- Logs compose-specific metrics: word count before/after, sections before/after, new sections added
4. Register `compose` subcommand in the `main()` argparse with args: `--existing-page` (required), `--fixture` (required), `--existing-fixture` (required, original moments fixture for citation context), `--prompt`, `--output`, `--category`, `--model`, `--modality`.
5. Wire the subcommand handler to call `run_compose()` and write output JSON.
6. Verify: `cd backend && python -m pipeline.test_harness compose --help` exits 0.
- Estimate: 1h
- Files: backend/pipeline/test_harness.py
- Verify: cd backend && python -m pipeline.test_harness compose --help
- [ ] **T03: Unit tests for compose-mode prompt building and validation** — Write unit tests for the compose harness plumbing — no LLM calls. Tests cover prompt construction, citation re-indexing math, category filtering, word count comparison, and edge cases.
Steps:
1. Read the updated `backend/pipeline/test_harness.py` (from T02) to understand `build_compose_prompt()` signature and `run_compose()` interface.
2. Read `backend/pipeline/test_harness_v2_format.py` for test patterns (helper functions, BodySection construction, etc.).
3. Create `backend/pipeline/test_harness_compose.py` with these test classes:
- `TestBuildComposePrompt`:
- Existing page + 3 old moments + 2 new moments → prompt contains all XML tags, old moments indexed [0]-[2], new moments indexed [3]-[4]
- Creator name appears in `<creator>` tag
- Existing page JSON appears in `<existing_page>` tag and is valid JSON
- Moment text uses same format as `build_moments_text()`
- `TestCitationReindexing`:
- 5 old + 3 new moments → valid citation range is [0]-[7]
- validate_citations() with moment_count=8 accepts citations in [0]-[7]
- validate_citations() with moment_count=8 rejects citation [8]
- `TestCategoryFiltering`:
- Build fixture with moments in 2 categories → compose with existing page of category A → only category A new moments used
- `TestEdgeCases`:
- Empty new moments list → prompt still valid, just no new_moments content
- Single new moment → prompt has one entry at index [N]
- Existing page with no subsections → handled correctly
4. Run tests: `cd backend && python -m pytest pipeline/test_harness_compose.py -v`
- Estimate: 45m
- Files: backend/pipeline/test_harness_compose.py
- Verify: cd backend && python -m pytest pipeline/test_harness_compose.py -v
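The argparse wiring described in T02 step 4 can be sketched as below. This is a hedged sketch of one way to register the subcommand, not the harness's actual `main()`; the flag set mirrors the task spec, and `build_parser` is a hypothetical helper introduced here so the wiring is testable in isolation:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Build the harness CLI with a compose subcommand (sketch)."""
    parser = argparse.ArgumentParser(prog='test_harness')
    sub = parser.add_subparsers(dest='command', required=True)

    # compose subcommand with the flags listed in T02 step 4
    compose = sub.add_parser('compose', help='Compose new moments into an existing page')
    compose.add_argument('--existing-page', required=True)
    compose.add_argument('--fixture', required=True)
    compose.add_argument('--existing-fixture', required=True)
    compose.add_argument('--prompt', default='prompts/stage5_compose.txt')
    compose.add_argument('--output')
    compose.add_argument('--category')
    compose.add_argument('--model')
    compose.add_argument('--modality')
    return parser
```

Because `required=True` is set on the subparsers and on the three input flags, `compose --help` exits 0 while a bare `compose` fails fast, which is the behavior the T02 verify step checks for.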


@@ -0,0 +1,163 @@
# S02 Research: Composition Prompt + Test Harness Compose Mode
## Summary
This slice adds the ability to compose new video content into existing technique pages — the core multi-source capability. Two deliverables: (1) a `stage5_compose.txt` prompt that instructs the LLM to merge new moments into an existing page, and (2) a `--compose` mode for the test harness that exercises this prompt offline.
The work is structurally parallel to S01 (prompt + test harness update + tests) but the LLM task is harder: composition requires deduplication, section merging, citation re-indexing, and preserving the quality of existing prose while integrating new material.
## Recommendation
**Targeted research.** The patterns are established by S01 (Pydantic schemas, test harness structure, citation utils). The new work is the composition prompt design and the harness plumbing to feed existing page + new moments to the LLM.
## Implementation Landscape
### What Exists
| File | Role | Relevance |
|------|------|-----------|
| `backend/pipeline/test_harness.py` | Offline synthesis test runner | Needs `compose` subcommand added alongside existing `run` and `promote` |
| `backend/pipeline/schemas.py` | `SynthesizedPage`, `BodySection`, `SynthesisResult` | Composition output should use the same `SynthesisResult` schema — no new output schema needed |
| `backend/pipeline/citation_utils.py` | `validate_citations()`, `extract_citations()` | Reused as-is for compose output validation |
| `backend/pipeline/test_harness_v2_format.py` | v2 format tests | Pattern to follow for compose-mode tests |
| `prompts/stage5_synthesis.txt` | v5 synthesis prompt (from S01) | Compose prompt references same voice/tone/structure rules but has different instructions |
| `backend/pipeline/stages.py` `_merge_pages_by_slug()` | Merges chunk-split pages by slug via LLM | Closest existing pattern. Uses `stage5_merge.txt` prompt (which doesn't exist on disk yet — it's referenced but never created). Compose is similar but merges new moments into an existing page rather than merging partial pages from the same video. |
| `backend/pipeline/llm_client.py` | `LLMClient.complete()`, `estimate_max_tokens()` | Used by test harness already; compose mode uses the same interface |
| `backend/pipeline/export_fixture.py` | Exports video moments to fixture JSON | May need extension to also export existing page JSON for compose fixtures |
### What Needs Building
#### 1. Composition Prompt (`prompts/stage5_compose.txt`)
The LLM receives:
- `<existing_page>`: Full JSON of the current `SynthesizedPage` (body_sections, citations, moment_indices, etc.)
- `<new_moments>`: The new moments from a second video, same format as synthesis input
- `<creator>`: Creator name
The prompt must instruct the LLM to:
- **Preserve existing prose quality** — don't rewrite sections that are fine
- **Integrate new material** — add to existing sections where new moments cover the same sub-topic, or create new sections for genuinely new territory
- **Deduplicate** — if new moments cover ground already in the page, enrich rather than repeat
- **Re-index citations** — existing citations [0]-[N-1] refer to the original moment list. New moments get indices [N]-[N+M-1]. The composed output must use a unified index space. The prompt must explain the indexing scheme clearly.
- **Maintain section ordering** — workflow-order rule from synthesis prompt still applies
- **Output format** — same `SynthesisResult` / `SynthesizedPage` with v2 body_sections
Key design decision: The compose prompt should receive the existing page's `moment_indices` mapped to their moment data so the LLM understands what's already covered. This means the user prompt includes both old moments (for context) and new moments (to integrate), with clear labeling.
#### 2. Test Harness Compose Mode
Add a `compose` subcommand to `test_harness.py`:
```
python -m pipeline.test_harness compose \
--existing-page /tmp/existing_page.json \
--fixture fixtures/new_video.json \
--prompt prompts/stage5_compose.txt \
--output /tmp/composed.json
```
Arguments:
- `--existing-page`: JSON file containing a single `SynthesizedPage` (output from a prior synthesis run)
- `--fixture`: New moments fixture (same format as `run` mode)
- `--category`: Optional filter (only compose for matching category)
- `--prompt`: Compose prompt file (defaults to `stage5_compose.txt`)
- `--output`: Output path
- `--model`, `--modality`: LLM overrides (same as `run`)
The compose runner needs to:
1. Load the existing page JSON and validate it as `SynthesizedPage`
2. Load the new fixture moments (same as `run` mode)
3. Filter new moments by category to match the existing page's `topic_category`
4. Build a user prompt with `<existing_page>`, `<new_moments>`, and `<creator>` tags
5. Call the LLM with the compose system prompt
6. Parse the response as `SynthesisResult` (the LLM returns an updated version of the page)
7. Run citation validation on the output, with moment_count = old_moments + new_moments
8. Log deduplication metrics: how many old sections survived, how many new sections appeared, word count delta
#### 3. Tests (`backend/pipeline/test_harness_compose.py`)
Unit tests for compose-specific logic (no LLM calls):
- User prompt construction: existing page + new moments formatted correctly
- Citation re-indexing math: old moments 0-4, new moments 5-9, validate combined
- Word count comparison: before/after composition
- Category matching: only compose moments matching the page's topic_category
- Edge cases: empty new moments, single new moment, existing page with no subsections
### Seam Analysis
The work divides cleanly into three tasks:
**T01: Composition prompt** — Write `prompts/stage5_compose.txt`. Self-contained file. Can reference the synthesis prompt's voice/tone section but must have its own instructions for merge semantics, deduplication, and citation re-indexing. Verifiable by structural checks (required sections present, example output valid).
**T02: Test harness compose subcommand** — Add `compose` subcommand to `test_harness.py`. Depends on T01 only for the default prompt path. The harness code itself is pure Python with no LLM dependency in tests. Needs a helper to build the compose user prompt (existing page JSON + new moments text).
**T03: Compose-mode tests** — Unit tests for the compose plumbing. Depends on T02 code existing. Tests the prompt construction, citation math, category filtering, and word count diffing. No LLM calls.
### Citation Re-indexing Strategy
This is the trickiest part. In compose mode:
- The existing page has citations [0]-[N-1] referring to original moments
- New moments are numbered [N]-[N+M-1] in the unified space
- The LLM needs to understand both sets so it can:
- Preserve existing [0]-[N-1] citations in untouched prose
- Add [N]-[N+M-1] citations in new/updated prose
- Potentially add cross-references (new moment enriches an existing section → add [N+k] alongside existing [j])
The user prompt should present:
```
<existing_moments>
[0] Title: ... (already cited in existing page)
[1] Title: ...
...
[N-1] Title: ...
</existing_moments>
<new_moments>
[N] Title: ... (new content to integrate)
[N+1] Title: ...
...
[N+M-1] Title: ...
</new_moments>
<existing_page>
{full page JSON}
</existing_page>
```
This way the LLM sees the complete index space and can cite any moment by its global index.
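The unified index space reduces to numbering with a global offset. A minimal sketch of the prompt builder, assuming moments are dicts with a `title` key (the real harness uses `MockKeyMoment` tuples and `build_moments_text()`, so these names are illustrative):

```python
def numbered_block(moments, start: int) -> str:
    """Render moments with global indices so citations share one index space."""
    return '\n'.join(
        f'[{start + offset}] Title: {moment["title"]}'
        for offset, moment in enumerate(moments)
    )

def build_compose_user_prompt(existing_page_json: str, existing_moments,
                              new_moments, creator: str) -> str:
    """Assemble the compose user prompt; new moments start at index N."""
    n = len(existing_moments)
    return (
        f'<existing_moments>\n{numbered_block(existing_moments, 0)}\n</existing_moments>\n'
        f'<new_moments>\n{numbered_block(new_moments, n)}\n</new_moments>\n'
        f'<existing_page>\n{existing_page_json}\n</existing_page>\n'
        f'<creator>{creator}</creator>'
    )
```

With this layout the valid citation range for the composed output is simply `range(len(existing_moments) + len(new_moments))`, which is exactly the `moment_count` handed to `validate_citations()`.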
### Deduplication Signal
The prompt should instruct the LLM to compare new moment summaries against existing section content. If a new moment covers material already present:
- Don't add a new section
- Optionally add the new moment's citation to the existing passage if it provides a second source
- Only add new prose if the new moment contains genuinely new detail (different settings, different context, additional nuance)
This is a judgment call for the LLM — the prompt needs clear examples of "enriching" vs "duplicating."
### Risks
1. **LLM may rewrite too aggressively** — Given the full page, the LLM might want to rewrite everything. The prompt must be explicit: preserve existing prose verbatim where it's correct, only modify sections where new information changes or enriches them.
2. **Citation index confusion** — The LLM needs to correctly distinguish old indices from new ones. Clear labeling in the user prompt is critical.
3. **Output size** — Composed pages are larger (more moments, more sections). Token limits may need adjustment. `estimate_max_tokens` should handle this since it's based on input size.
### Don't Hand-Roll
- Citation validation: use existing `validate_citations()` from `citation_utils.py`
- Word counting: use the same pattern from `test_harness_v2_format.py`
- LLM interaction: use existing `LLMClient.complete()` and `estimate_max_tokens()`
- Schema validation: use existing `SynthesisResult.model_validate_json()`
### Verification Strategy
- **T01 (prompt):** Structural assertions — required sections present, example output parses as valid SynthesisResult, citation rules section explains the dual-index scheme
- **T02 (harness):** `python -m pipeline.test_harness compose --help` exits 0; compose user prompt builder produces expected format with existing page and new moments
- **T03 (tests):** `python -m pytest backend/pipeline/test_harness_compose.py -v` — all pass
### Notes for Planner
- The `compose` subcommand mirrors the `run` subcommand structure but with different inputs. Keep the pattern consistent.
- The compose prompt is the highest-risk deliverable — it drives whether the LLM produces good composed output. The test harness is mechanical.
- S04 (Pipeline Compose-or-Create Logic) will use both the compose prompt and the test harness patterns when wiring into the real pipeline. The harness is the offline proving ground.
- The existing `_merge_pages_by_slug` in stages.py merges partial pages from chunk splitting. Composition is conceptually different (existing page + new moments) but the LLM call pattern is the same.
- No new Pydantic schemas needed — compose output is the same `SynthesisResult` with `SynthesizedPage` objects.


@@ -0,0 +1,48 @@
---
estimated_steps: 13
estimated_files: 2
skills_used: []
---
# T01: Write stage5_compose.txt composition prompt
Create the composition prompt that instructs the LLM to merge new video moments into an existing technique page. The prompt receives three inputs: existing page JSON, new moments list, and creator name. Must define merge semantics (preserve existing prose, integrate new material into existing sections or create new ones), deduplication rules (enrich rather than repeat), citation re-indexing scheme (existing [0]-[N-1], new [N]-[N+M-1]), and output format (same SynthesisResult/SynthesizedPage v2 schema as synthesis).
Steps:
1. Read `prompts/stage5_synthesis.txt` to understand voice/tone rules and v2 output format — the compose prompt must uphold the same writing standards, but restate them in its own text rather than referencing the synthesis prompt (the compose prompt is self-contained).
2. Write `prompts/stage5_compose.txt` with these sections:
- Role/context preamble explaining the composition task
- Input format: `<existing_page>`, `<existing_moments>`, `<new_moments>`, `<creator>` XML tags
- Merge rules: preserve existing prose verbatim where correct; add new sections for genuinely new territory; enrich existing sections with new detail where topics overlap
- Deduplication guidance with examples of 'enriching' vs 'duplicating'
- Citation re-indexing rules: existing moments keep indices [0]-[N-1], new moments get [N]-[N+M-1], composed output uses the unified index space
- Section ordering: maintain workflow-order rule from synthesis
- Output format: same SynthesisResult JSON with v2 body_sections (list[BodySection])
- Include a concrete JSON example showing composed output with mixed old/new citations
3. Validate the prompt structurally: check that the JSON example parses as valid SynthesisResult, citation rules section mentions the dual-index scheme, merge rules section is present.
## Inputs
- `prompts/stage5_synthesis.txt` — voice/tone rules and v2 output format to reference
- `backend/pipeline/schemas.py` — SynthesisResult/SynthesizedPage/BodySection schema definitions
## Expected Output
- `prompts/stage5_compose.txt` — complete composition prompt with merge rules, citation re-indexing, dedup guidance, and v2 output example
## Verification
python3 -c "
import json, pathlib, re
p = pathlib.Path('prompts/stage5_compose.txt')
assert p.exists(), 'prompt file missing'
txt = p.read_text()
assert len(txt) > 2000, f'prompt too short: {len(txt)} chars'
for section in ['existing_page', 'new_moments', 'existing_moments', 'creator', 'Citation', 'dedup', 'Merge']:
    assert section.lower() in txt.lower(), f'missing section: {section}'
# Check JSON example is valid
json_match = re.search(r'```json\n(.*?)```', txt, re.DOTALL)
assert json_match, 'no JSON example found'
json.loads(json_match.group(1))
print('All structural checks passed')
"


@@ -0,0 +1,78 @@
---
id: T01
parent: S02
milestone: M014
provides: []
requires: []
affects: []
key_files: ["prompts/stage5_compose.txt", ".gsd/milestones/M014/slices/S02/tasks/T01-SUMMARY.md"]
key_decisions: ["Composition prompt is self-contained — carries its own writing standards rather than importing from synthesis prompt at runtime", "Offset-based citation scheme: existing keep [0]-[N-1], new get [N]-[N+M-1], no renumbering"]
patterns_established: []
drill_down_paths: []
observability_surfaces: []
duration: ""
verification_result: "Ran structural validation: file exists, 13053 chars (>2000), all 7 required sections present, both JSON blocks parse and validate with correct v2 structure (pages key, body_sections_format, moment_indices)."
completed_at: 2026-04-03T01:02:57.677Z
blocker_discovered: false
---
# T01: Created composition prompt with merge rules, citation re-indexing, deduplication guidance, and two valid v2 JSON examples
> Created composition prompt with merge rules, citation re-indexing, deduplication guidance, and two valid v2 JSON examples
## What Happened
Wrote prompts/stage5_compose.txt — the LLM prompt for composing new video moments into an existing technique page. Defines four XML input tags, offset-based citation re-indexing (existing [0]-[N-1], new [N]-[N+M-1]), detailed merge/enrich/dedup rules with concrete examples, workflow-ordered section placement, and v2 SynthesisResult output format. Two JSON examples included, both validated structurally.
## Verification
Ran structural validation: file exists, 13053 chars (>2000), all 7 required sections present, both JSON blocks parse and validate with correct v2 structure (pages key, body_sections_format, moment_indices).
## Verification Evidence
| # | Command | Exit Code | Verdict | Duration |
|---|---------|-----------|---------|----------|
| 1 | `python3 -c` structural validation (sections + JSON parse) | 0 | ✅ pass | 300ms |
| 2 | `python3 -c` dual JSON block validation (v2 schema) | 0 | ✅ pass | 200ms |
## Deviations
None.
## Known Issues
None.
## Files Created/Modified
- `prompts/stage5_compose.txt`
- `.gsd/milestones/M014/slices/S02/tasks/T01-SUMMARY.md`


@@ -0,0 +1,46 @@
---
estimated_steps: 22
estimated_files: 1
skills_used: []
---
# T02: Add compose subcommand to test harness
Add a `compose` subcommand to `backend/pipeline/test_harness.py` that loads an existing page JSON + new moments fixture, builds a compose user prompt, calls the LLM via the compose prompt, and validates the output. Also extract the compose user-prompt builder as a testable function.
Steps:
1. Read `backend/pipeline/test_harness.py` fully to understand the existing `run` and `promote` subcommand patterns.
2. Add a `build_compose_prompt()` function that takes:
- `existing_page: dict` (SynthesizedPage as dict, from prior synthesis output)
- `existing_moments: list[tuple[MockKeyMoment, dict]]` (the original moments the page was built from)
- `new_moments: list[tuple[MockKeyMoment, dict]]` (new video's moments to compose in)
- `creator_name: str`
Returns: the user prompt string with `<existing_page>`, `<existing_moments>`, `<new_moments>`, and `<creator>` XML tags. Existing moments get indices [0]-[N-1], new moments get [N]-[N+M-1]. Uses `build_moments_text()` for formatting moment lists.
3. Add a `run_compose()` function (parallel to `run_synthesis()`) that:
- Loads the existing page JSON and validates as SynthesizedPage
- Loads the new fixture via existing `load_fixture()`
- Filters new moments by category to match `existing_page.topic_category`
- Resolves the existing moments for citation context — accept them as a separate fixture, or derive them from the existing page's moment_indices plus a moments fixture
- Calls `build_compose_prompt()` to build the user prompt
- Calls LLM with compose system prompt
- Parses response as SynthesisResult
- Runs citation validation with moment_count = len(existing_moments) + len(new_moments)
- Logs compose-specific metrics: word count before/after, sections before/after, new sections added
4. Register `compose` subcommand in the `main()` argparse with args: `--existing-page` (required), `--fixture` (required), `--existing-fixture` (required, original moments fixture for citation context), `--prompt`, `--output`, `--category`, `--model`, `--modality`.
5. Wire the subcommand handler to call `run_compose()` and write output JSON.
6. Verify: `cd backend && python -m pipeline.test_harness compose --help` exits 0.
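The shape of step 2 can be sketched as follows — a simplification that takes pre-formatted moment strings instead of `(MockKeyMoment, dict)` tuples, and inlines the numbering that the real function should delegate to `build_moments_text()`:

```python
import json

def build_compose_prompt_sketch(existing_page, existing_summaries, new_summaries, creator_name):
    """Assemble the compose user prompt: four XML tags, offset-based indices.

    existing_summaries/new_summaries are plain strings here; the real harness
    passes moment tuples through build_moments_text() for formatting.
    """
    n = len(existing_summaries)
    existing_block = "\n".join(f"[{i}] {s}" for i, s in enumerate(existing_summaries))
    new_block = "\n".join(f"[{n + j}] {s}" for j, s in enumerate(new_summaries))
    return (
        f"<existing_page>\n{json.dumps(existing_page, indent=2)}\n</existing_page>\n"
        f"<existing_moments>\n{existing_block}\n</existing_moments>\n"
        f"<new_moments>\n{new_block}\n</new_moments>\n"
        f"<creator>{creator_name}</creator>"
    )
```

With 2 existing moments, the first new moment renders as `[2]`, matching the offset rule the compose prompt describes.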
## Inputs
- `backend/pipeline/test_harness.py` — existing harness with run/promote subcommands
- `backend/pipeline/schemas.py` — SynthesizedPage, SynthesisResult for validation
- `backend/pipeline/citation_utils.py` — validate_citations() for output checking
- `prompts/stage5_compose.txt` — default compose prompt path (from T01)
## Expected Output
- `backend/pipeline/test_harness.py` — updated with compose subcommand, build_compose_prompt() function, run_compose() function
## Verification
cd backend && python -m pipeline.test_harness compose --help


@@ -0,0 +1,45 @@
---
estimated_steps: 21
estimated_files: 1
skills_used: []
---
# T03: Unit tests for compose-mode prompt building and validation
Write unit tests for the compose harness plumbing — no LLM calls. Tests cover prompt construction, citation re-indexing math, category filtering, word count comparison, and edge cases.
Steps:
1. Read the updated `backend/pipeline/test_harness.py` (from T02) to understand `build_compose_prompt()` signature and `run_compose()` interface.
2. Read `backend/pipeline/test_harness_v2_format.py` for test patterns (helper functions, BodySection construction, etc.).
3. Create `backend/pipeline/test_harness_compose.py` with these test classes:
- `TestBuildComposePrompt`:
- Existing page + 3 old moments + 2 new moments → prompt contains all XML tags, old moments indexed [0]-[2], new moments indexed [3]-[4]
- Creator name appears in `<creator>` tag
- Existing page JSON appears in `<existing_page>` tag and is valid JSON
- Moment text uses same format as `build_moments_text()`
- `TestCitationReindexing`:
- 5 old + 3 new moments → valid citation range is [0]-[7]
- validate_citations() with moment_count=8 accepts citations in [0]-[7]
- validate_citations() with moment_count=8 rejects citation [8]
- `TestCategoryFiltering`:
- Build fixture with moments in 2 categories → compose with existing page of category A → only category A new moments used
- `TestEdgeCases`:
- Empty new moments list → prompt still valid, just no new_moments content
- Single new moment → prompt has one entry at index [N]
- Existing page with no subsections → handled correctly
4. Run tests: `cd backend && python -m pytest pipeline/test_harness_compose.py -v`
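The citation-range math those tests pin down can be sketched with a standalone checker — a stand-in for the real `validate_citations()`, whose actual signature lives in `citation_utils.py`:

```python
import re

def out_of_range_citations(text, moment_count):
    """Return citation indices outside [0, moment_count - 1].

    Markers look like [2] or [2,7], matching the composed-page convention.
    """
    bad = []
    for marker in re.findall(r"\[(\d+(?:,\s*\d+)*)\]", text):
        for idx in (int(part) for part in marker.split(",")):
            if not 0 <= idx < moment_count:
                bad.append(idx)
    return bad
```

For 5 old + 3 new moments, `moment_count=8` accepts `[0]`-`[7]` and flags `[8]` — exactly the boundary `TestCitationReindexing` asserts.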
## Inputs
- `backend/pipeline/test_harness.py` — build_compose_prompt() and run_compose() from T02
- `backend/pipeline/schemas.py` — SynthesizedPage, BodySection for test fixtures
- `backend/pipeline/citation_utils.py` — validate_citations() for citation math tests
- `backend/pipeline/test_harness_v2_format.py` — test patterns and helpers to follow
## Expected Output
- `backend/pipeline/test_harness_compose.py` — unit tests for compose prompt construction, citation re-indexing, category filtering, edge cases
## Verification
cd backend && python -m pytest pipeline/test_harness_compose.py -v

prompts/stage5_compose.txt Normal file

@@ -0,0 +1,197 @@
You are updating an existing Chrysopedia technique page with new source material. A technique page already exists — it was synthesized from earlier video moments. Now new moments from additional videos are available. Your job is to compose a unified page that integrates the new material into the existing page, preserving what's already good while enriching it with genuinely new detail.
This is NOT a rewrite. This is a merge. The existing page has been reviewed and published. Treat its prose as the baseline — preserve its voice, structure, and specific details. Your job is to strengthen it with new information, not to rephrase what's already well-written.
## What you receive
You receive four inputs in XML tags:
1. `<existing_page>` — the current SynthesizedPage JSON (v2 format: body_sections as list of BodySection objects with heading, content, subsections)
2. `<existing_moments>` — JSON array of the key moments that the existing page was built from (indices [0] through [N-1])
3. `<new_moments>` — JSON array of new key moments to integrate (these will be indexed [N] through [N+M-1] in the composed output)
4. `<creator>` — the creator name (same creator as the existing page)
## Citation re-indexing — critical
The composed page uses a unified citation index space across both old and new moments:
- **Existing moments** keep their original indices: `[0]`, `[1]`, ..., `[N-1]` where N is the length of `<existing_moments>`.
- **New moments** are assigned indices starting at N: the first new moment is `[N]`, the second is `[N+1]`, ..., the last is `[N+M-1]` where M is the length of `<new_moments>`.
- All citation markers already present in the existing page prose remain valid and unchanged.
- New citations referencing new moments use the offset indices: if there are 5 existing moments, the first new moment is cited as `[5]`, the second as `[6]`, etc.
- When new information reinforces an existing claim, you may add a new citation alongside the existing one: `[2,7]` means the claim is supported by both existing moment 2 and new moment 7.
**Never renumber existing citations.** The existing `[0]`, `[1]`, etc. in the prose must remain exactly as they are. Only add new `[N+...]` markers where new material contributes.
## Merge rules
### Preserve existing prose
Keep existing section headings, prose, citations, and structure intact unless you have a specific reason to change something. Reasons to change existing prose:
- A new moment directly contradicts an existing claim (explain the contradiction, don't silently overwrite)
- A new moment provides a more specific value for a vague existing claim (update with the specific value and add the new citation)
- A factual error is corrected by new information
### Enrich existing sections with new detail
When new moments cover the same topic as an existing section, integrate the new information INTO that section:
- Add new paragraphs after existing ones (don't insert into the middle of existing paragraphs)
- Add new subsections to existing sections when the new material covers a distinct sub-aspect
- Extend existing paragraphs only by appending new sentences, never by rewriting existing sentences
### Create new sections for genuinely new territory
When new moments cover topics not addressed by any existing section, create new body_sections. Place them in workflow order relative to existing sections (follow the workflow-ordering rule: conceptual framework → core construction → combining/refining → quality checks/validation).
### Section ordering
Maintain the workflow order from the existing page. New sections slot into the correct workflow position — don't just append them at the end. If the existing page has sections for "building layers" and "bus processing", a new section about "shaping individual layers" belongs BETWEEN them, not after "bus processing".
## Deduplication guidance
The most important merge skill is distinguishing enrichment from duplication.
**Enrichment** (DO this): New moment adds a specific value, alternative approach, or deeper explanation to an existing topic.
- Existing: "Use a transient shaper on the click layer to control the attack."
- New moment says: "Set the transient shaper attack to +6dB and sustain to -8dB for maximum snap."
- Composed: "Use a transient shaper on the click layer to control the attack. Set attack to +6dB and sustain to -8dB for maximum snap [7]." ← Added specificity with new citation.
**Enrichment** (DO this): New moment provides a contrasting approach or caveat.
- Existing: "Route the effect at 100% wet on a parallel send."
- New moment says: "In a dense mix, reduce the parallel send to 60-70% wet to avoid phase smearing in the low end."
- Composed: Keep original sentence, add new paragraph: "In denser arrangements, reduce the parallel send to 60-70% wet — full wet routing can introduce phase smearing in the low end that thins out the sub [8]."
**Duplication** (DO NOT do this): New moment restates what's already covered without adding specificity.
- Existing: "OTT at 30% depth on the bus gives you upward compression without crushing dynamics."
- New moment says: "Use OTT on the bus for upward compression, around 30%."
- Action: Skip. The existing prose already covers this with equal or greater specificity. Do not add a redundant sentence.
**The test**: After composing, read each new sentence you added. Does it teach the reader something they couldn't learn from the existing prose? If not, cut it.
## Writing standards
Follow the same voice and tone as the existing page:
- Instructive, not narrative — teach the technique directly
- Direct and confident — no hedging
- Specific and grounded — concrete values, plugin names, settings
- Creator name appears at most once in body_sections content (it's already in the title/summary)
When adding new prose, match the existing page's level of detail and energy. If the existing page is technically dense, your additions should be equally dense. If it uses direct quotes, consider including a quote from the new moments if one is vivid.
## Summary update
Update the summary to reflect the enriched page. The summary should still lead with the most striking insight (which may now come from the new material if it's more compelling). Keep it 2-4 sentences. Don't just append to the existing summary — rewrite it as a unified hook for the composed page.
## Tags and plugins update
- Merge `topic_tags` from existing page and new moments. Deduplicate.
- Merge `plugins` lists. Deduplicate. Keep standard plugin names.
## Signal chains update
- Preserve existing signal chains.
- Add new signal chains only if new moments describe a routing path not already represented.
- If a new moment adds a step to an existing chain, update that chain in place.
## Output format
Return a JSON object with the same SynthesisResult structure (single key "pages" containing a list). The composed page uses the same v2 schema:
```json
{
"pages": [
{
"title": "Technique Name by Creator",
"slug": "technique-name-creator",
"topic_category": "Category",
"topic_tags": ["merged", "deduplicated", "tags"],
"summary": "Updated 2-4 sentence summary reflecting the enriched page.",
"body_sections_format": "v2",
"body_sections": [
{
"heading": "Existing section heading (preserved)",
"content": "Existing prose preserved verbatim. New detail appended with new citation markers [5]. Additional paragraph with enrichment from new moments [6].",
"subsections": [
{
"heading": "Existing subsection (preserved)",
"content": "Original subsection content unchanged [1]."
},
{
"heading": "New subsection from new material",
"content": "New detail that covers a distinct sub-aspect within this section's topic [7]."
}
]
},
{
"heading": "Entirely new section for new territory",
"content": "This section addresses a topic not covered in the existing page. Full prose with citations to new moments [5,8].\n\nSecond paragraph with additional detail [6].",
"subsections": []
}
],
"signal_chains": [],
"plugins": ["Merged", "Plugin", "List"],
"source_quality": "mixed",
"moment_indices": [0, 1, 2, 3, 4, 5, 6, 7, 8]
}
]
}
```
**moment_indices**: Must include ALL indices from both existing and new moments: `[0, 1, ..., N-1, N, N+1, ..., N+M-1]`. Every source moment — old and new — must be accounted for.
**body_sections_format**: Must be `"v2"`.
## Concrete composed example
Given 3 existing moments (indices [0]-[2]) and 2 new moments (indices [3]-[4]):
```json
{
"pages": [
{
"title": "Parallel Bass Processing by Audien",
"slug": "parallel-bass-processing-audien",
"topic_category": "Mixing",
"topic_tags": ["bass", "parallel processing", "saturation", "sub alignment", "stereo width"],
"summary": "Audien's bass chain splits the signal into three parallel paths — clean sub, saturated mid-bass, and a stereo-widened upper layer — then aligns phase at the sum point to avoid cancellation. The saturation path uses Decapitator with the drive backed off to 3.5 to keep harmonics musical rather than aggressive [1,3].",
"body_sections_format": "v2",
"body_sections": [
{
"heading": "Why parallel paths beat inline processing for bass",
"content": "Parallel bass processing routes copies of the bass signal through separate effect chains before summing them, giving independent control over each frequency band's character [0]. Inline processing forces a single chain to handle everything from sub-bass warmth to upper-harmonic presence — any saturation aggressive enough to add grit in the 200-500Hz range will distort the sub fundamental below 80Hz [0].\n\nThe three-path split works as follows: path one is the clean sub (low-passed at 80Hz, no processing beyond a gentle limiter), path two is the saturated mid-bass (bandpassed 80-500Hz through Decapitator), and path three handles stereo width (high-passed at 500Hz through a mid-side widener) [1]. New moments from a follow-up session confirm this routing but add a critical detail: insert a polarity-check plugin on each path before the sum bus [3]. Even small phase drift between the parallel paths can hollow out the 100-200Hz range when summed.",
"subsections": []
},
{
"heading": "Dialing the saturation without killing the sub",
"content": "Decapitator on the mid-bass path at drive 3.5, Style E (tube), with the Punish button off [1]. The low-cut on Decapitator's input is set to 90Hz — slightly above the crossover point — so no sub energy reaches the saturation stage [1].\n\n'If you can hear the saturation as distortion, you've gone too far — it should just feel warmer' [2]. Check in mono: if the bass thins out when you collapse to mono, the saturated path is fighting the clean sub. Pull the drive down or adjust the crossover overlap [2,4].",
"subsections": []
},
{
"heading": "Phase-checking the sum point",
"content": "After setting up the parallel paths, check phase alignment at the sum bus before committing to the mix [3]. Insert a phase correlation meter on the bass sum — the reading should stay above +0.5 during the sustain portion of bass notes. If it dips toward zero or negative, one of the parallel paths has accumulated enough latency to cause partial cancellation [3].\n\nThe fix is simple but easy to forget: add a utility plugin on the faster paths with a sample delay to align all three [4]. Nudge in 1-sample increments while watching the correlation meter. When it peaks, lock it and bypass the meter.",
"subsections": []
}
],
"signal_chains": [
{
"name": "Three-path parallel bass",
"steps": [
"Bass input → 3-band split (80Hz / 500Hz crossovers)",
"Path 1: Low-pass 80Hz → Limiter → Sum bus",
"Path 2: Bandpass 80-500Hz → Decapitator (drive 3.5, Style E, input LC 90Hz) → Sum bus",
"Path 3: High-pass 500Hz → Mid-side widener → Sum bus",
"Sum bus: Phase correlation check → Master chain"
]
}
],
"plugins": ["Soundtoys Decapitator", "FabFilter Pro-Q 3", "Voxengo SPAN"],
"source_quality": "mixed",
"moment_indices": [0, 1, 2, 3, 4]
}
]
}
```
Notice:
- Existing citations `[0]`, `[1]`, `[2]` remain in their original positions.
- New citations `[3]` and `[4]` are added where new moments contribute.
- The first section was enriched with a new paragraph about polarity-checking (from new moment [3]).
- A new third section "Phase-checking the sum point" was created for the genuinely new topic.
- The summary was rewritten to incorporate the new material as a unified hook.
- moment_indices covers all 5 moments: [0, 1, 2, 3, 4].