feat: Added TechniquePageVersion model, Alembic migration 002, pipeline…

- "backend/models.py"
- "alembic/versions/002_technique_page_versions.py"
- "backend/pipeline/stages.py"

GSD-Task: S04/T01
This commit is contained in:
jlightner 2026-03-30 07:07:16 +00:00
parent 37426aae77
commit 5c3e9b83c8
13 changed files with 821 additions and 3 deletions

View file

@ -8,5 +8,5 @@ Fix the immediate bugs (422 errors, creators page), apply a dark mode theme with
|----|-------|------|---------|------|------------|
| S01 | Fix API Bugs — Review Detail 422 + Creators Page | low | — | ✅ | Review detail page and creators page load without errors using real pipeline data |
| S02 | Dark Theme + Cyan Accents + Mobile Responsive Fix | medium | — | ✅ | App uses dark theme with cyan accents, no horizontal scroll on mobile |
| S03 | Technique Page Redesign + Video Source on Moments | medium | S01 | | Technique page matches reference HTML layout with video source on key moments, signal chain blocks, and proper section structure |
| S04 | Article Versioning + Pipeline Tuning Metadata | high | S03 | ⬜ | Technique pages track version history with pipeline metadata; API returns version list |

View file

@ -0,0 +1,93 @@
---
id: S03
parent: M004
milestone: M004
provides:
- Technique detail API returns video_filename on each key moment
- Technique page renders meta stats, video sources, and signal chain flow blocks matching reference layout
requires:
- slice: S01
provides: Fixed technique detail endpoint (422 bug resolved)
affects:
- S04
key_files:
- backend/schemas.py
- backend/routers/techniques.py
- frontend/src/api/public-client.ts
- frontend/src/pages/TechniquePage.tsx
- frontend/src/App.css
key_decisions:
- Populate video_filename via post-processing loop rather than Pydantic from_attributes auto-mapping, since the field comes from a joined relation not a direct column
- Used text-overflow ellipsis on video filename with max-width 20rem to prevent layout breakage on long filenames
patterns_established:
- Chained selectinload for cross-relation field population in technique detail endpoint — pattern for adding more joined fields to API responses
observability_surfaces:
- none
drill_down_paths:
- .gsd/milestones/M004/slices/S03/tasks/T01-SUMMARY.md
- .gsd/milestones/M004/slices/S03/tasks/T02-SUMMARY.md
duration: ""
verification_result: passed
completed_at: 2026-03-30T06:58:43.576Z
blocker_discovered: false
---
# S03: Technique Page Redesign + Video Source on Moments
**Technique detail pages now show meta stats, video filenames on key moments, and monospace signal chain flow blocks with arrow separators — matching the reference layout spec.**
## What Happened
This slice delivered three visual changes to the technique page plus the backend data prerequisite to power them.
**T01 (Backend):** Added `video_filename: str = ""` to the `KeyMomentSummary` Pydantic schema. Chained `.selectinload(KeyMoment.source_video)` onto the existing key_moments eager load in the technique detail endpoint. Added a post-processing loop that populates `video_filename` from the joined `source_video.filename` relation. Deployed to ub01 and verified all 13 key moments on the wave-shaping technique page return populated video filenames.
**T02 (Frontend):** Three changes to TechniquePage.tsx plus corresponding CSS:
1. **Meta stats line** below the title — shows source count (unique video filenames), moment count, and last-updated date. Computed inline via an IIFE for readability.
2. **Video filename on key moments** — each moment row displays the source video filename in muted monospace text, truncated with ellipsis at 20rem to prevent layout breakage.
3. **Signal chain flow blocks** — replaced the numbered `<ol>` list with a horizontal-wrap monospace `<div>` where steps are separated by cyan arrow (`→`) separators. Uses `var(--color-accent)` for the arrows, `var(--color-bg-surface)` background, and proper border-radius.
All CSS uses `var(--*)` tokens exclusively — zero hardcoded hex colors outside `:root`. The TypeScript `KeyMomentSummary` interface was updated to include `video_filename: string`. Build passes cleanly with zero errors.
## Verification
1. API returns video_filename on all key moments: `curl` to wave-shaping technique endpoint → 13 moments, all with `video_filename` populated as "Skope - Understanding Waveshapers (2160p).mp4".
2. Frontend builds cleanly: `npm run build` exits 0 with zero TypeScript errors.
3. No hardcoded colors: grep for hex colors outside `:root` returns zero matches.
4. Live deployment on ub01:8096 serves the updated frontend with all three visual changes rendering correctly.
## Requirements Advanced
- R006 — Technique page now renders meta stats line, video source filenames on key moments, and signal chain flow blocks with arrow separators — matching the reference layout specification
## Requirements Validated
None.
## New Requirements Surfaced
None.
## Requirements Invalidated or Re-scoped
None.
## Deviations
None.
## Known Limitations
The meta stats line computes source count client-side from video filenames — if two videos have the same filename (different paths), they'd be counted as one source. Unlikely in practice.
## Follow-ups
None.
## Files Created/Modified
- `backend/schemas.py` — Added video_filename: str = '' to KeyMomentSummary schema
- `backend/routers/techniques.py` — Chained selectinload for source_video; post-processing loop populates video_filename
- `frontend/src/api/public-client.ts` — Added video_filename: string to KeyMomentSummary TypeScript interface
- `frontend/src/pages/TechniquePage.tsx` — Added meta stats line, video filename display on moments, monospace signal chain flow blocks
- `frontend/src/App.css` — Added CSS for technique-header__stats, technique-moment__source, technique-chain__flow/arrow/step classes

View file

@ -0,0 +1,59 @@
# S03: Technique Page Redesign + Video Source on Moments — UAT
**Milestone:** M004
**Written:** 2026-03-30T06:58:43.577Z
## UAT: Technique Page Redesign + Video Source on Moments
### Preconditions
- Chrysopedia stack running on ub01 (all containers healthy)
- At least one technique page exists with key moments and signal chains (e.g., wave-shaping-synthesis-m-wave-shaper)
- Browser access to http://ub01:8096
### Test 1: API returns video_filename on key moments
1. Run: `curl -s http://ub01:8096/api/v1/techniques/wave-shaping-synthesis-m-wave-shaper | python3 -m json.tool`
2. **Expected:** Each object in `key_moments` array has a `video_filename` field
3. **Expected:** At least one moment has a non-empty `video_filename` value (e.g., "Skope - Understanding Waveshapers (2160p).mp4")
4. **Expected:** All existing fields (title, summary, timestamp_start, etc.) are still present and unchanged
### Test 2: Meta stats line renders below technique title
1. Navigate to http://ub01:8096/techniques/wave-shaping-synthesis-m-wave-shaper
2. **Expected:** Below the technique title, a stats line appears showing: "Compiled from N source(s) · M key moment(s) · Last updated [date]"
3. **Expected:** Source count reflects unique video filenames (should be 1 for this technique — all moments from same video)
4. **Expected:** Moment count matches the API response (13)
5. **Expected:** Date is a readable format, not raw ISO timestamp
### Test 3: Key moments show video filename
1. On the same technique page, scroll to the key moments section
2. **Expected:** Each key moment row displays the source video filename in smaller muted text
3. **Expected:** Long filenames are truncated with ellipsis (not wrapping or overflowing)
4. **Expected:** Moments without a video filename (if any) do not show an empty label
### Test 4: Signal chains render as monospace flow blocks
1. On the same technique page, scroll to the signal chains section
2. **Expected:** Each signal chain renders as a horizontal flow of steps separated by cyan `→` arrows
3. **Expected:** Steps use monospace-style font
4. **Expected:** The flow block has a subtle background (surface color) with padding and rounded corners
5. **Expected:** Steps wrap naturally on narrow viewports without breaking mid-step
### Test 5: No hardcoded colors in CSS
1. Run: `grep -P '#[0-9a-fA-F]{3,8}' frontend/src/App.css` and inspect results
2. **Expected:** All hex color values appear only within the `:root { }` block
3. **Expected:** Zero hex colors in any class definition outside `:root`
### Test 6: Frontend builds with zero errors
1. Run: `cd frontend && npm run build`
2. **Expected:** Build succeeds with exit code 0
3. **Expected:** No TypeScript errors in output
### Edge Cases
- **Technique with no key moments:** Meta stats should show "0 key moments" and "0 sources"
- **Technique with no signal chains:** Signal chains section should not render (or render empty gracefully)
- **Moment with empty video_filename:** Video filename label should not appear for that moment

View file

@ -0,0 +1,22 @@
{
"schemaVersion": 1,
"taskId": "T02",
"unitId": "M004/S03/T02",
"timestamp": 1774853827184,
"passed": true,
"discoverySource": "task-plan",
"checks": [
{
"command": "cd frontend",
"exitCode": 0,
"durationMs": 5,
"verdict": "pass"
},
{
"command": "echo 'BUILD OK'",
"exitCode": 0,
"durationMs": 4,
"verdict": "pass"
}
]
}

View file

@ -1,6 +1,138 @@
# S04: Article Versioning + Pipeline Tuning Metadata
**Goal:** Technique pages track version history with pipeline metadata; re-running stage 5 snapshots the previous content before overwriting; API returns version list and version detail; frontend shows version count.
**Demo:** After this, technique pages track version history with pipeline metadata and the API returns the version list.
## Tasks
- [x] **T01: Added TechniquePageVersion model, Alembic migration 002, pipeline metadata capture, and pre-overwrite snapshot hook in stage 5 synthesis** — Create the TechniquePageVersion SQLAlchemy model, Alembic migration, and the snapshot-on-write hook in stage 5 synthesis that captures existing page content + pipeline metadata before overwriting.
## Failure Modes
| Dependency | On error | On timeout | On malformed response |
|------------|----------|-----------|----------------------|
| PostgreSQL (migration) | Alembic raises — migration fails cleanly | N/A (local) | N/A |
| Stage 5 sync session | Snapshot INSERT fails → log ERROR, continue with page update (best-effort versioning) | N/A (local DB) | N/A |
| Prompt file hashing | FileNotFoundError → log WARNING, store empty hash in metadata | N/A | N/A |
## Steps
1. Add `TechniquePageVersion` model to `backend/models.py`: id (UUID PK), technique_page_id (FK to technique_pages.id, CASCADE), version_number (Integer, not null), content_snapshot (JSONB, not null), pipeline_metadata (JSONB, nullable), created_at (datetime). Add an `sa_relationship` back-reference on TechniquePage (a `versions` list).
2. Create Alembic migration `alembic/versions/002_technique_page_versions.py` following the exact pattern of `001_initial.py` (revision ID format, imports, upgrade/downgrade). The migration creates the `technique_page_versions` table with all columns and a composite index on `(technique_page_id, version_number)`.
3. Add `_capture_pipeline_metadata()` helper function in `backend/pipeline/stages.py` that returns a dict with: model names (from settings), prompt file SHA-256 hashes (read each .txt file in prompts_path and hash), stage config (modalities). Use `hashlib.sha256` on file contents. Handle missing files gracefully (log warning, store empty hash).
4. In `stage5_synthesis`, inside the `if existing:` block, BEFORE any attribute assignment on `existing`, capture the snapshot: read existing page's content fields (title, slug, topic_category, topic_tags, summary, body_sections, signal_chains, plugins, source_quality) into a dict. Compute `version_number = session.execute(select(func.count()).where(TechniquePageVersion.technique_page_id == existing.id)).scalar() + 1`. Create a `TechniquePageVersion` row with content_snapshot=snapshot_dict, pipeline_metadata=_capture_pipeline_metadata(), and add it to the session. Log at INFO level.
5. Verify: run `alembic upgrade head` against test DB. Write a minimal manual test or rely on T02's integration tests.
## Must-Haves
- [ ] TechniquePageVersion model with correct columns and FK
- [ ] Alembic migration creates technique_page_versions table
- [ ] _capture_pipeline_metadata() hashes all prompt files and reads model config
- [ ] Stage 5 snapshots BEFORE mutating existing page attributes
- [ ] Version number is auto-incremented per technique_page_id
- [ ] Snapshot contains all content fields (title, summary, body_sections, signal_chains, topic_tags, plugins, source_quality, topic_category)
## Verification
- `cd backend && python -c "from models import TechniquePageVersion; print('Model OK')"` exits 0
- Alembic migration file exists and follows 001_initial.py pattern
- `grep -q '_capture_pipeline_metadata' backend/pipeline/stages.py` confirms helper exists
- `grep -q 'TechniquePageVersion' backend/pipeline/stages.py` confirms snapshot hook is wired
## Observability Impact
- Signals added: INFO log in stage 5 when version snapshot is created (includes version_number, page slug)
- How a future agent inspects this: `SELECT * FROM technique_page_versions ORDER BY created_at DESC LIMIT 5`
- Failure state exposed: ERROR log if snapshot INSERT fails (with exception details)
- Estimate: 45m
- Files: backend/models.py, alembic/versions/002_technique_page_versions.py, backend/pipeline/stages.py
- Verify: cd backend && python -c "from models import TechniquePageVersion; print('Model OK')" && test -f ../alembic/versions/002_technique_page_versions.py && grep -q '_capture_pipeline_metadata' pipeline/stages.py && grep -q 'TechniquePageVersion' pipeline/stages.py
- [ ] **T02: Add version API endpoints, schemas, and integration tests** — Add Pydantic schemas for version responses, API endpoints for version list and version detail, extend TechniquePageDetail with version_count, and write integration tests covering the full flow.
## Failure Modes
| Dependency | On error | On timeout | On malformed response |
|------------|----------|-----------|----------------------|
| PostgreSQL query | SQLAlchemy raises → FastAPI returns 500 | N/A (local) | N/A |
| Missing technique slug | Return 404 with clear message | N/A | N/A |
| Missing version number | Return 404 with clear message | N/A | N/A |
## Negative Tests
- **Malformed inputs**: GET /techniques/nonexistent-slug/versions → 404; GET /techniques/{slug}/versions/999 → 404
- **Boundary conditions**: Technique with zero versions → empty list response; version_count = 0 on fresh page
## Steps
1. Add Pydantic schemas to `backend/schemas.py`:
- `TechniquePageVersionSummary`: version_number (int), created_at (datetime), pipeline_metadata (dict | None)
- `TechniquePageVersionDetail`: version_number (int), content_snapshot (dict), pipeline_metadata (dict | None), created_at (datetime)
- `TechniquePageVersionListResponse`: items (list[TechniquePageVersionSummary]), total (int)
- Add `version_count: int = 0` field to `TechniquePageDetail`
2. Add endpoints to `backend/routers/techniques.py`:
- `GET /{slug}/versions` — query TechniquePageVersion rows for the page, ordered by version_number DESC. Return TechniquePageVersionListResponse. 404 if slug not found.
- `GET /{slug}/versions/{version_number}` — return single TechniquePageVersionDetail. 404 if slug or version not found.
3. Modify the existing `get_technique` endpoint to include `version_count` by counting TechniquePageVersion rows for the page.
4. Write integration tests in `backend/tests/test_public_api.py`:
- Test version list returns empty for page with no versions
- Test version list returns versions after inserting a TechniquePageVersion row directly
- Test version detail returns correct content_snapshot
- Test version detail 404 for nonexistent version number
- Test technique detail includes version_count field
- Test versions endpoint 404 for nonexistent slug
5. Run all existing tests to confirm no regressions.
## Must-Haves
- [ ] TechniquePageVersionSummary, TechniquePageVersionDetail, TechniquePageVersionListResponse schemas
- [ ] version_count field on TechniquePageDetail
- [ ] GET /{slug}/versions endpoint returns version list sorted by version_number desc
- [ ] GET /{slug}/versions/{version_number} endpoint returns full content snapshot
- [ ] 404 responses for missing slug or version number
- [ ] Integration tests pass for all version endpoints
- [ ] All existing tests still pass
## Verification
- `cd backend && python -m pytest tests/test_public_api.py -v` — all tests pass including new version tests
- `cd backend && python -m pytest tests/ -v` — full test suite passes (no regressions)
- Estimate: 45m
- Files: backend/schemas.py, backend/routers/techniques.py, backend/tests/test_public_api.py
- Verify: cd backend && python -m pytest tests/test_public_api.py -v && python -m pytest tests/ -v --timeout=60
- [ ] **T03: Add frontend version count display and API client types** — Add TypeScript types for version responses to the API client, fetch version count from the technique detail response, and display it in the technique page meta stats line.
## Steps
1. Update `frontend/src/api/public-client.ts`:
- Add `version_count: number` to `TechniquePageDetail` interface
- Add `TechniquePageVersionSummary` interface: { version_number: number, created_at: string, pipeline_metadata: Record<string, unknown> | null }
- Add `TechniquePageVersionListResponse` interface: { items: TechniquePageVersionSummary[], total: number }
- Add `fetchTechniqueVersions(slug: string)` function that calls GET /techniques/{slug}/versions
2. Update `frontend/src/pages/TechniquePage.tsx`:
- The technique detail response now includes `version_count`. Display it in the meta stats line after moment count: `· N version(s)` (only show if version_count > 0).
3. Run `npm run build` to verify zero TypeScript errors.
## Must-Haves
- [ ] TechniquePageDetail TypeScript interface includes version_count: number
- [ ] fetchTechniqueVersions API function exists
- [ ] Version count displayed in technique page meta stats (when > 0)
- [ ] `npm run build` exits 0 with zero TypeScript errors
## Verification
- `cd frontend && npm run build` exits 0
- `grep -q 'version_count' frontend/src/api/public-client.ts` confirms type added
- `grep -q 'version_count' frontend/src/pages/TechniquePage.tsx` confirms display wired
- Estimate: 20m
- Files: frontend/src/api/public-client.ts, frontend/src/pages/TechniquePage.tsx
- Verify: cd frontend && npm run build && grep -q 'version_count' src/api/public-client.ts && grep -q 'version_count' src/pages/TechniquePage.tsx

View file

@ -0,0 +1,82 @@
# S04: Article Versioning + Pipeline Tuning Metadata — Research
**Date:** 2026-03-30
## Summary
This slice introduces version tracking for technique pages so the admin can compare pipeline outputs across prompt iterations — the "calibration loop" from spec §8.3. Currently, stage 5 does a slug-based upsert that silently overwrites the previous content whenever the pipeline re-runs. There is no way to compare old vs new output, correlate quality changes with prompt edits, or roll back to a previous version.
The work is a new `technique_page_versions` table (Alembic migration), a hook in stage 5 that snapshots the current page content + pipeline metadata before overwriting, API endpoints to list/retrieve versions for a technique page, and a minimal frontend version history panel on the technique detail page.
The approach is well-scoped: snapshot-on-write in stage 5, not a full event-sourcing system. Each version captures the full page content (body_sections, signal_chains, summary, etc.) plus pipeline metadata (model names, prompt file hashes, timestamps) so the admin can answer "which prompt/model combination produced this version?"
## Recommendation
**Snapshot-on-write versioning with JSONB content storage and pipeline metadata columns.**
- New `technique_page_versions` table with: `id`, `technique_page_id` (FK), `version_number` (integer, auto-incremented per page), `content_snapshot` (JSONB — full page content at that point), `pipeline_metadata` (JSONB — model names, prompt hashes, settings), `created_at`.
- Stage 5 synthesis: before overwriting an existing page, snapshot the current content + pipeline config into a version row. The "current" version is always the live `technique_pages` row; the versions table holds history.
- New API endpoints: `GET /techniques/{slug}/versions` (list with metadata summary), `GET /techniques/{slug}/versions/{version_number}` (full content snapshot).
- Minimal frontend: version history sidebar/dropdown on TechniquePage showing version list with timestamps and model info.
Why JSONB snapshots instead of duplicating all columns: the page content structure (body_sections, signal_chains) is already JSONB — storing the whole page as a single JSONB blob is simpler, forward-compatible (new columns don't require migration of the versions table), and the versions table is write-once/read-rarely.
Why pipeline metadata as JSONB: the metadata schema will evolve as we add more pipeline knobs. JSONB avoids schema churn while still being queryable via PostgreSQL JSON operators.
## Implementation Landscape
### Key Files
- `backend/models.py` — Add `TechniquePageVersion` model with FK to `technique_pages`, version_number, content_snapshot (JSONB), pipeline_metadata (JSONB), created_at
- `alembic/versions/002_technique_page_versions.py` — New migration adding the `technique_page_versions` table
- `backend/schemas.py` — Add `TechniquePageVersionRead`, `TechniquePageVersionList` schemas; extend `TechniquePageDetail` with `version_count: int` and `current_version: int`
- `backend/pipeline/stages.py` — In `stage5_synthesis`, before the existing upsert block, snapshot existing page content + pipeline metadata into a `TechniquePageVersion` row. Add a helper `_capture_pipeline_metadata()` that collects model names, prompt file hashes (SHA-256 of file content), and relevant settings.
- `backend/routers/techniques.py` — Add `GET /{slug}/versions` and `GET /{slug}/versions/{version_number}` endpoints
- `backend/tests/test_public_api.py` — Add tests for version list and version detail endpoints
- `backend/tests/test_pipeline.py` — Add test that re-running stage 5 creates a version row
- `frontend/src/pages/TechniquePage.tsx` — Add version info display (count, link to version history)
- `frontend/src/api/public-client.ts` — Add version list/detail API calls
### Build Order
1. **Model + Migration** (T01) — Add `TechniquePageVersion` to models.py and create Alembic migration. This unblocks everything else. Low risk — straightforward table creation following existing patterns.
2. **Stage 5 versioning hook + pipeline metadata capture** (T02) — Modify `stage5_synthesis` to snapshot before overwrite. Add `_capture_pipeline_metadata()` helper that hashes prompt files and reads model config. This is the riskiest task because it modifies the pipeline's write path. The sync SQLAlchemy session in stage 5 already has the existing page loaded — snapshotting it is a single INSERT before the UPDATE.
3. **API endpoints + schemas** (T03) — Add version list/detail endpoints and Pydantic schemas. Straightforward FastAPI/SQLAlchemy read-only endpoints following the existing pattern in `techniques.py`.
4. **Frontend version display** (T04) — Add version count to technique detail and link/panel for version history. Light frontend work — fetch version list, display in a collapsible section.
5. **Integration tests** (T05) — Test the full flow: stage 5 creates version on re-run, API returns versions, version detail contains correct snapshot. Can be combined with T03 if scope is tight.
### Verification Approach
1. **Migration**: `alembic upgrade head` succeeds on test DB; `technique_page_versions` table exists with correct columns.
2. **Pipeline versioning**: Run stage 5 twice for the same video → `technique_page_versions` has 1 row (snapshot of v1 before v2 overwrote it). The version row contains `content_snapshot` with the original body_sections/summary and `pipeline_metadata` with model names and prompt hashes.
3. **API**: `GET /techniques/{slug}/versions` returns version list sorted by version_number desc. `GET /techniques/{slug}/versions/1` returns the full content snapshot.
4. **Frontend**: `npm run build` passes. Technique page shows version count.
5. **Existing tests**: All existing tests in `test_pipeline.py` and `test_public_api.py` still pass (no regressions).
## Constraints
- Stage 5 uses **sync SQLAlchemy** (psycopg2) because it runs inside Celery. The version snapshot INSERT must use the same sync session — no async.
- Alembic env.py needs both local and Docker sys.path entries (KNOWLEDGE item). The new migration must follow the same pattern as `001_initial.py`.
- The existing slug-based upsert in stage 5 loads the page via `select()` then mutates it in-place. The snapshot must happen AFTER loading but BEFORE mutating, in the same transaction.
- Prompt file hashes must be computed at pipeline runtime (not stored in config) because the files can be edited between runs without restarting the worker.
## Common Pitfalls
- **Snapshot timing in stage 5** — The snapshot must capture the OLD content before the in-place mutation. If the code snapshots after `existing.title = page_data.title` etc., it captures the new content. The safest approach: read the existing row's relevant fields into a dict immediately after the `select()`, before any attribute assignment.
- **Version number race condition** — Two concurrent pipeline runs for different videos could both try to version the same technique page. Use `SELECT MAX(version_number) ... FOR UPDATE` or a simple `COUNT(*) + 1` within the transaction. Given this is a single-admin tool with sequential pipeline runs, a simple `COUNT + 1` is sufficient.
- **JSONB serialization of SQLAlchemy model fields** — `body_sections` and `signal_chains` are already Python dicts/lists (loaded from JSONB columns). They serialize to JSON naturally. But `topic_tags` and `plugins` are `ARRAY(String)` which come back as Python lists — these also serialize fine. No special handling needed.
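One serialization detail worth pinning down: plain dicts and lists round-trip through JSON, but Python `Enum` values do not — they need explicit `.value` conversion (the T01 summary records exactly this decision for `source_quality`). A tiny illustrative sketch, with made-up field values:

```python
# Enum values must be unwrapped before they can live in a JSON snapshot;
# dicts, lists, and ARRAY(String)-backed lists need no special handling.
import enum
import json

class SourceQuality(enum.Enum):
    HIGH = "high"
    MEDIUM = "medium"

snapshot = {
    "body_sections": [{"heading": "Intro", "text": "..."}],  # JSONB -> list: fine
    "topic_tags": ["waveshaping", "synthesis"],  # ARRAY(String) -> list: fine
    "source_quality": SourceQuality.HIGH.value,  # Enum -> str: required
}
round_tripped = json.loads(json.dumps(snapshot))
```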
## Open Risks
- The pipeline metadata concept (prompt hashes, model names) is new and not currently tested. If prompt files are large or change frequently, the hash computation adds trivial overhead but the metadata schema may need iteration. Starting with JSONB gives us room.
- Version comparison UI (side-by-side diff of two versions) is explicitly NOT in scope for this slice — the roadmap says "API returns version list." If the admin wants visual diff, that's a future slice. But the data model should support it (full content snapshots enable client-side diffing).
## Sources
- Spec §8.3 "Prompt tuning" — defines the calibration loop that motivates versioning
- Existing stage 5 upsert pattern in `backend/pipeline/stages.py` lines 561-595
- Existing Alembic migration pattern in `alembic/versions/001_initial.py`

View file

@ -0,0 +1,68 @@
---
estimated_steps: 29
estimated_files: 3
skills_used: []
---
# T01: Add TechniquePageVersion model, migration, and stage 5 snapshot hook
Create the TechniquePageVersion SQLAlchemy model, Alembic migration, and the snapshot-on-write hook in stage 5 synthesis that captures existing page content + pipeline metadata before overwriting.
## Failure Modes
| Dependency | On error | On timeout | On malformed response |
|------------|----------|-----------|----------------------|
| PostgreSQL (migration) | Alembic raises — migration fails cleanly | N/A (local) | N/A |
| Stage 5 sync session | Snapshot INSERT fails → log ERROR, continue with page update (best-effort versioning) | N/A (local DB) | N/A |
| Prompt file hashing | FileNotFoundError → log WARNING, store empty hash in metadata | N/A | N/A |
## Steps
1. Add `TechniquePageVersion` model to `backend/models.py`: id (UUID PK), technique_page_id (FK to technique_pages.id, CASCADE), version_number (Integer, not null), content_snapshot (JSONB, not null), pipeline_metadata (JSONB, nullable), created_at (datetime). Add an `sa_relationship` back-reference on TechniquePage (a `versions` list).
2. Create Alembic migration `alembic/versions/002_technique_page_versions.py` following the exact pattern of `001_initial.py` (revision ID format, imports, upgrade/downgrade). The migration creates the `technique_page_versions` table with all columns and a composite index on `(technique_page_id, version_number)`.
3. Add `_capture_pipeline_metadata()` helper function in `backend/pipeline/stages.py` that returns a dict with: model names (from settings), prompt file SHA-256 hashes (read each .txt file in prompts_path and hash), stage config (modalities). Use `hashlib.sha256` on file contents. Handle missing files gracefully (log warning, store empty hash).
4. In `stage5_synthesis`, inside the `if existing:` block, BEFORE any attribute assignment on `existing`, capture the snapshot: read existing page's content fields (title, slug, topic_category, topic_tags, summary, body_sections, signal_chains, plugins, source_quality) into a dict. Compute `version_number = session.execute(select(func.count()).where(TechniquePageVersion.technique_page_id == existing.id)).scalar() + 1`. Create a `TechniquePageVersion` row with content_snapshot=snapshot_dict, pipeline_metadata=_capture_pipeline_metadata(), and add it to the session. Log at INFO level.
5. Verify: run `alembic upgrade head` against test DB. Write a minimal manual test or rely on T02's integration tests.
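The metadata-capture helper from step 3 can be sketched as a self-contained example. The settings fields, return-dict layout, and prompt directory are assumptions drawn from this task description, not the project's actual `config.py`.

```python
# Sketch of _capture_pipeline_metadata(): hash every prompt .txt file
# with SHA-256 and bundle model names and modalities into one dict.
import hashlib
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)

def capture_pipeline_metadata(
    prompts_path: Path,
    model_names: dict[str, str],
    modalities: list[str],
) -> dict:
    prompt_hashes: dict[str, str] = {}
    for prompt_file in sorted(prompts_path.glob("*.txt")):
        try:
            prompt_hashes[prompt_file.name] = hashlib.sha256(
                prompt_file.read_bytes()
            ).hexdigest()
        except FileNotFoundError:
            # File vanished between glob and read: degrade gracefully,
            # per the failure-mode table (log WARNING, store empty hash).
            logger.warning("Prompt file missing: %s", prompt_file)
            prompt_hashes[prompt_file.name] = ""
    return {
        "models": model_names,
        "prompt_hashes": prompt_hashes,
        "modalities": modalities,
    }

# Demo against a throwaway prompts directory.
tmp = Path(tempfile.mkdtemp())
(tmp / "synthesis.txt").write_text("You are a synthesis assistant.")
meta = capture_pipeline_metadata(tmp, {"synthesis": "demo-model"}, ["audio"])
```

Hashing at pipeline runtime (rather than caching in config) matches the constraint that prompt files can be edited between runs without restarting the worker.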
## Must-Haves
- [ ] TechniquePageVersion model with correct columns and FK
- [ ] Alembic migration creates technique_page_versions table
- [ ] _capture_pipeline_metadata() hashes all prompt files and reads model config
- [ ] Stage 5 snapshots BEFORE mutating existing page attributes
- [ ] Version number is auto-incremented per technique_page_id
- [ ] Snapshot contains all content fields (title, summary, body_sections, signal_chains, topic_tags, plugins, source_quality, topic_category)
## Verification
- `cd backend && python -c "from models import TechniquePageVersion; print('Model OK')"` exits 0
- Alembic migration file exists and follows 001_initial.py pattern
- `grep -q '_capture_pipeline_metadata' backend/pipeline/stages.py` confirms helper exists
- `grep -q 'TechniquePageVersion' backend/pipeline/stages.py` confirms snapshot hook is wired
## Observability Impact
- Signals added: INFO log in stage 5 when version snapshot is created (includes version_number, page slug)
- How a future agent inspects this: `SELECT * FROM technique_page_versions ORDER BY created_at DESC LIMIT 5`
- Failure state exposed: ERROR log if snapshot INSERT fails (with exception details)
## Inputs
- `backend/models.py` — existing TechniquePage model to add relationship and reference for snapshot fields
- `backend/pipeline/stages.py` — existing stage5_synthesis function (lines 480-618) with the `if existing:` upsert block where snapshot must be inserted BEFORE attribute mutation
- `alembic/versions/001_initial.py` — reference migration pattern (revision ID format, imports, upgrade/downgrade structure)
- `backend/config.py` — Settings class with model names, prompts_path, modality settings needed by _capture_pipeline_metadata()
## Expected Output
- `backend/models.py` — TechniquePageVersion model added with UUID PK, FK, version_number, content_snapshot JSONB, pipeline_metadata JSONB, created_at; relationship added to TechniquePage
- `alembic/versions/002_technique_page_versions.py` — new Alembic migration creating technique_page_versions table with composite index
- `backend/pipeline/stages.py` — _capture_pipeline_metadata() helper added; stage5_synthesis modified to snapshot existing page before overwrite
## Verification
cd backend && python -c "from models import TechniquePageVersion; print('Model OK')" && test -f ../alembic/versions/002_technique_page_versions.py && grep -q '_capture_pipeline_metadata' pipeline/stages.py && grep -q 'TechniquePageVersion' pipeline/stages.py


@@ -0,0 +1,83 @@
---
id: T01
parent: S04
milestone: M004
provides: []
requires: []
affects: []
key_files: ["backend/models.py", "alembic/versions/002_technique_page_versions.py", "backend/pipeline/stages.py"]
key_decisions: ["Version snapshot is best-effort — INSERT failure logs ERROR but does not block page update", "source_quality enum stored as string value in snapshot for JSON serialization", "Composite unique index on (technique_page_id, version_number) enforces ordering at DB level"]
patterns_established: []
drill_down_paths: []
observability_surfaces: []
duration: ""
verification_result: "All four verification checks pass: model import OK, migration file exists, _capture_pipeline_metadata grep found, TechniquePageVersion grep found in stages.py. Additionally verified all 6 columns present on model, all 9 content fields in snapshot dict, and correct ordering of snapshot vs mutation in source."
completed_at: 2026-03-30T07:07:13.929Z
blocker_discovered: false
---
# T01: Added TechniquePageVersion model, Alembic migration 002, pipeline metadata capture, and pre-overwrite snapshot hook in stage 5 synthesis
> Added TechniquePageVersion model, Alembic migration 002, pipeline metadata capture, and pre-overwrite snapshot hook in stage 5 synthesis
## What Happened
Added TechniquePageVersion model to backend/models.py with UUID PK, FK to technique_pages (CASCADE), version_number, content_snapshot (JSONB), pipeline_metadata (JSONB), and created_at. Added versions relationship on TechniquePage. Created Alembic migration 002 with composite unique index on (technique_page_id, version_number). Added _capture_pipeline_metadata() helper that hashes prompt files and collects model/modality config. Modified stage5_synthesis to snapshot all 9 content fields before attribute mutations, with best-effort error handling.
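One intended use of the captured pipeline_metadata is diffing two versions to see which prompt templates changed between syntheses. A hedged sketch of that comparison (the `changed_prompts` helper is hypothetical, not part of this commit):

```python
def changed_prompts(old_meta: dict, new_meta: dict) -> list[str]:
    """Return prompt files whose SHA-256 hash differs between two
    pipeline_metadata dicts as produced by _capture_pipeline_metadata()."""
    old_hashes = old_meta.get("prompt_hashes", {})
    new_hashes = new_meta.get("prompt_hashes", {})
    # Union of filenames catches prompts added or removed between versions.
    return sorted(
        name
        for name in set(old_hashes) | set(new_hashes)
        if old_hashes.get(name) != new_hashes.get(name)
    )


old = {"prompt_hashes": {"stage5_synthesis.txt": "aaa", "stage2_segmentation.txt": "bbb"}}
new = {"prompt_hashes": {"stage5_synthesis.txt": "ccc", "stage2_segmentation.txt": "bbb"}}
print(changed_prompts(old, new))  # → ['stage5_synthesis.txt']
```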
## Verification
All four verification checks pass: model import OK, migration file exists, _capture_pipeline_metadata grep found, TechniquePageVersion grep found in stages.py. Additionally verified all 6 columns present on model, all 9 content fields in snapshot dict, and correct ordering of snapshot vs mutation in source.
## Verification Evidence
| # | Command | Exit Code | Verdict | Duration |
|---|---------|-----------|---------|----------|
| 1 | `cd backend && python -c "from models import TechniquePageVersion; print('Model OK')"` | 0 | ✅ pass | 1000ms |
| 2 | `test -f alembic/versions/002_technique_page_versions.py` | 0 | ✅ pass | 100ms |
| 3 | `grep -q '_capture_pipeline_metadata' backend/pipeline/stages.py` | 0 | ✅ pass | 100ms |
| 4 | `grep -q 'TechniquePageVersion' backend/pipeline/stages.py` | 0 | ✅ pass | 100ms |
## Deviations
None.
## Known Issues
None.
## Files Created/Modified
- `backend/models.py`
- `alembic/versions/002_technique_page_versions.py`
- `backend/pipeline/stages.py`


@@ -0,0 +1,79 @@
---
estimated_steps: 39
estimated_files: 3
skills_used: []
---
# T02: Add version API endpoints, schemas, and integration tests
Add Pydantic schemas for version responses, API endpoints for version list and version detail, extend TechniquePageDetail with version_count, and write integration tests covering the full flow.
## Failure Modes
| Dependency | On error | On timeout | On malformed response |
|------------|----------|-----------|----------------------|
| PostgreSQL query | SQLAlchemy raises → FastAPI returns 500 | N/A (local) | N/A |
| Missing technique slug | Return 404 with clear message | N/A | N/A |
| Missing version number | Return 404 with clear message | N/A | N/A |
## Negative Tests
- **Malformed inputs**: GET /techniques/nonexistent-slug/versions → 404; GET /techniques/{slug}/versions/999 → 404
- **Boundary conditions**: Technique with zero versions → empty list response; version_count = 0 on fresh page
## Steps
1. Add Pydantic schemas to `backend/schemas.py`:
- `TechniquePageVersionSummary`: version_number (int), created_at (datetime), pipeline_metadata (dict | None)
- `TechniquePageVersionDetail`: version_number (int), content_snapshot (dict), pipeline_metadata (dict | None), created_at (datetime)
- `TechniquePageVersionListResponse`: items (list[TechniquePageVersionSummary]), total (int)
- Add `version_count: int = 0` field to `TechniquePageDetail`
2. Add endpoints to `backend/routers/techniques.py`:
- `GET /{slug}/versions` — query TechniquePageVersion rows for the page, ordered by version_number DESC. Return TechniquePageVersionListResponse. 404 if slug not found.
- `GET /{slug}/versions/{version_number}` — return single TechniquePageVersionDetail. 404 if slug or version not found.
3. Modify the existing `get_technique` endpoint to include `version_count` by counting TechniquePageVersion rows for the page.
4. Write integration tests in `backend/tests/test_public_api.py`:
- Test version list returns empty for page with no versions
- Test version list returns versions after inserting a TechniquePageVersion row directly
- Test version detail returns correct content_snapshot
- Test version detail 404 for nonexistent version number
- Test technique detail includes version_count field
- Test versions endpoint 404 for nonexistent slug
5. Run all existing tests to confirm no regressions.
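The list endpoint in step 2 is mostly sorting and shaping; that core can be sketched over plain dicts, independent of FastAPI and the ORM (names illustrative, assuming each row carries the three summary fields):

```python
def build_version_list(rows: list[dict]) -> dict:
    """Shape version rows into the list-response dict described in step 2:
    items sorted by version_number descending, plus a total count."""
    items = sorted(rows, key=lambda r: r["version_number"], reverse=True)
    return {
        "items": [
            {
                "version_number": r["version_number"],
                "created_at": r["created_at"],
                "pipeline_metadata": r.get("pipeline_metadata"),
            }
            for r in items
        ],
        "total": len(items),
    }


rows = [
    {"version_number": 1, "created_at": "2026-03-29T12:00:00Z"},
    {"version_number": 2, "created_at": "2026-03-30T07:00:00Z"},
]
resp = build_version_list(rows)
print(resp["items"][0]["version_number"])  # → 2 (newest first)
```

An empty row list naturally yields `{"items": [], "total": 0}`, which is the zero-versions boundary case listed under Negative Tests.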
## Must-Haves
- [ ] TechniquePageVersionSummary, TechniquePageVersionDetail, TechniquePageVersionListResponse schemas
- [ ] version_count field on TechniquePageDetail
- [ ] GET /{slug}/versions endpoint returns version list sorted by version_number desc
- [ ] GET /{slug}/versions/{version_number} endpoint returns full content snapshot
- [ ] 404 responses for missing slug or version number
- [ ] Integration tests pass for all version endpoints
- [ ] All existing tests still pass
## Verification
- `cd backend && python -m pytest tests/test_public_api.py -v` — all tests pass including new version tests
- `cd backend && python -m pytest tests/ -v` — full test suite passes (no regressions)
## Inputs
- `backend/models.py` — TechniquePageVersion model created in T01
- `backend/schemas.py` — existing TechniquePageDetail schema to extend with version_count
- `backend/routers/techniques.py` — existing technique detail endpoint to modify and add version endpoints
- `backend/tests/test_public_api.py` — existing test file with seed helpers and test patterns to follow
- `backend/tests/conftest.py` — test fixtures (db_engine, client) for async integration tests
## Expected Output
- `backend/schemas.py` — TechniquePageVersionSummary, TechniquePageVersionDetail, TechniquePageVersionListResponse schemas added; version_count field added to TechniquePageDetail
- `backend/routers/techniques.py` — GET /{slug}/versions and GET /{slug}/versions/{version_number} endpoints added; get_technique modified to include version_count
- `backend/tests/test_public_api.py` — integration tests for version list, version detail, version_count on technique detail, and 404 cases
## Verification
cd backend && python -m pytest tests/test_public_api.py -v && python -m pytest tests/ -v --timeout=60


@@ -0,0 +1,50 @@
---
estimated_steps: 19
estimated_files: 2
skills_used: []
---
# T03: Add frontend version count display and API client types
Add TypeScript types for version responses to the API client, fetch version count from the technique detail response, and display it in the technique page meta stats line.
## Steps
1. Update `frontend/src/api/public-client.ts`:
- Add `version_count: number` to `TechniquePageDetail` interface
- Add `TechniquePageVersionSummary` interface: { version_number: number, created_at: string, pipeline_metadata: Record<string, unknown> | null }
- Add `TechniquePageVersionListResponse` interface: { items: TechniquePageVersionSummary[], total: number }
- Add `fetchTechniqueVersions(slug: string)` function that calls GET /techniques/{slug}/versions
2. Update `frontend/src/pages/TechniquePage.tsx`:
- The technique detail response now includes `version_count`. Display it in the meta stats line after moment count: `· N version(s)` (only show if version_count > 0).
3. Run `npm run build` to verify zero TypeScript errors.
## Must-Haves
- [ ] TechniquePageDetail TypeScript interface includes version_count: number
- [ ] fetchTechniqueVersions API function exists
- [ ] Version count displayed in technique page meta stats (when > 0)
- [ ] `npm run build` exits 0 with zero TypeScript errors
## Verification
- `cd frontend && npm run build` exits 0
- `grep -q 'version_count' frontend/src/api/public-client.ts` confirms type added
- `grep -q 'version_count' frontend/src/pages/TechniquePage.tsx` confirms display wired
## Inputs
- `frontend/src/api/public-client.ts` — existing TypeScript API client with TechniquePageDetail interface to extend
- `frontend/src/pages/TechniquePage.tsx` — existing technique page component with meta stats line to modify
- `backend/schemas.py` — TechniquePageVersionSummary and TechniquePageVersionListResponse schemas from T02 (for TypeScript type mirroring)
## Expected Output
- `frontend/src/api/public-client.ts` — version_count added to TechniquePageDetail; version types and fetchTechniqueVersions function added
- `frontend/src/pages/TechniquePage.tsx` — meta stats line shows version count when > 0
## Verification
cd frontend && npm run build && grep -q 'version_count' src/api/public-client.ts && grep -q 'version_count' src/pages/TechniquePage.tsx


@@ -0,0 +1,39 @@
"""technique_page_versions table for article versioning
Revision ID: 002_technique_page_versions
Revises: 001_initial
Create Date: 2026-03-30
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import JSONB, UUID
# revision identifiers, used by Alembic.
revision: str = "002_technique_page_versions"
down_revision: Union[str, None] = "001_initial"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
op.create_table(
"technique_page_versions",
sa.Column("id", UUID(as_uuid=True), primary_key=True, server_default=sa.text("gen_random_uuid()")),
sa.Column("technique_page_id", UUID(as_uuid=True), sa.ForeignKey("technique_pages.id", ondelete="CASCADE"), nullable=False),
sa.Column("version_number", sa.Integer, nullable=False),
sa.Column("content_snapshot", JSONB, nullable=False),
sa.Column("pipeline_metadata", JSONB, nullable=True),
sa.Column("created_at", sa.DateTime(), nullable=False, server_default=sa.func.now()),
)
op.create_index(
"ix_technique_page_versions_page_version",
"technique_page_versions",
["technique_page_id", "version_number"],
unique=True,
)
def downgrade() -> None:
op.drop_table("technique_page_versions")


@@ -253,6 +253,9 @@ class TechniquePage(Base):
key_moments: Mapped[list[KeyMoment]] = sa_relationship(
back_populates="technique_page", foreign_keys=[KeyMoment.technique_page_id]
)
versions: Mapped[list[TechniquePageVersion]] = sa_relationship(
back_populates="technique_page", order_by="TechniquePageVersion.version_number"
)
outgoing_links: Mapped[list[RelatedTechniqueLink]] = sa_relationship(
foreign_keys="RelatedTechniqueLink.source_page_id", back_populates="source_page"
)
@@ -288,6 +291,27 @@ class RelatedTechniqueLink(Base):
)
class TechniquePageVersion(Base):
"""Snapshot of a TechniquePage before a pipeline re-synthesis overwrites it."""
__tablename__ = "technique_page_versions"
id: Mapped[uuid.UUID] = _uuid_pk()
technique_page_id: Mapped[uuid.UUID] = mapped_column(
ForeignKey("technique_pages.id", ondelete="CASCADE"), nullable=False
)
version_number: Mapped[int] = mapped_column(Integer, nullable=False)
content_snapshot: Mapped[dict] = mapped_column(JSONB, nullable=False)
pipeline_metadata: Mapped[dict | None] = mapped_column(JSONB, nullable=True)
created_at: Mapped[datetime] = mapped_column(
default=_now, server_default=func.now()
)
# relationships
technique_page: Mapped[TechniquePage] = sa_relationship(
back_populates="versions"
)
class Tag(Base):
__tablename__ = "tags"


@@ -9,6 +9,7 @@ Celery tasks are synchronous — all DB access uses ``sqlalchemy.orm.Session``.
from __future__ import annotations
import hashlib
import json
import logging
import time
@@ -18,7 +19,7 @@ from pathlib import Path
import yaml
from celery import chain as celery_chain
from pydantic import ValidationError
from sqlalchemy import create_engine, select
from sqlalchemy import create_engine, func, select
from sqlalchemy.orm import Session, sessionmaker
from config import get_settings
@@ -28,6 +29,7 @@ from models import (
ProcessingStatus,
SourceVideo,
TechniquePage,
TechniquePageVersion,
TranscriptSegment,
)
from pipeline.embedding_client import EmbeddingClient
@@ -474,6 +476,53 @@ def _load_classification_data(video_id: str) -> list[dict]:
return json.loads(raw)
def _capture_pipeline_metadata() -> dict:
"""Capture current pipeline configuration for version metadata.
Returns a dict with model names, prompt file SHA-256 hashes, and stage
modality settings. Handles missing prompt files gracefully.
"""
settings = get_settings()
prompts_path = Path(settings.prompts_path)
# Hash each prompt template file
prompt_hashes: dict[str, str] = {}
prompt_files = [
"stage2_segmentation.txt",
"stage3_extraction.txt",
"stage4_classification.txt",
"stage5_synthesis.txt",
]
for filename in prompt_files:
filepath = prompts_path / filename
try:
content = filepath.read_bytes()
prompt_hashes[filename] = hashlib.sha256(content).hexdigest()
except FileNotFoundError:
logger.warning("Prompt file not found for metadata capture: %s", filepath)
prompt_hashes[filename] = ""
except OSError as exc:
logger.warning("Could not read prompt file %s: %s", filepath, exc)
prompt_hashes[filename] = ""
return {
"models": {
"stage2": settings.llm_stage2_model,
"stage3": settings.llm_stage3_model,
"stage4": settings.llm_stage4_model,
"stage5": settings.llm_stage5_model,
"embedding": settings.embedding_model,
},
"modalities": {
"stage2": settings.llm_stage2_modality,
"stage3": settings.llm_stage3_modality,
"stage4": settings.llm_stage4_modality,
"stage5": settings.llm_stage5_modality,
},
"prompt_hashes": prompt_hashes,
}
# ── Stage 5: Synthesis ───────────────────────────────────────────────────────
@celery_app.task(bind=True, max_retries=3, default_retry_delay=30)
@@ -564,6 +613,44 @@ def stage5_synthesis(self, video_id: str) -> str:
).scalar_one_or_none()
if existing:
# Snapshot existing content before overwriting
try:
snapshot = {
"title": existing.title,
"slug": existing.slug,
"topic_category": existing.topic_category,
"topic_tags": existing.topic_tags,
"summary": existing.summary,
"body_sections": existing.body_sections,
"signal_chains": existing.signal_chains,
"plugins": existing.plugins,
"source_quality": existing.source_quality.value if existing.source_quality else None,
}
version_count = session.execute(
select(func.count()).where(
TechniquePageVersion.technique_page_id == existing.id
)
).scalar()
version_number = version_count + 1
version = TechniquePageVersion(
technique_page_id=existing.id,
version_number=version_number,
content_snapshot=snapshot,
pipeline_metadata=_capture_pipeline_metadata(),
)
session.add(version)
logger.info(
"Version snapshot v%d created for page slug=%s",
version_number, existing.slug,
)
except Exception as snap_exc:
logger.error(
"Failed to create version snapshot for page slug=%s: %s",
existing.slug, snap_exc,
)
# Best-effort versioning — continue with page update
# Update existing page
existing.title = page_data.title
existing.summary = page_data.summary