diff --git a/.gsd/DECISIONS.md b/.gsd/DECISIONS.md index 0de68b9..72469ab 100644 --- a/.gsd/DECISIONS.md +++ b/.gsd/DECISIONS.md @@ -49,3 +49,4 @@ | D041 | M022/S05 | architecture | Highlight scorer weight distribution for 10-dimension model | Original 7 dimensions reduced proportionally, new 3 audio proxy dimensions (speech_rate_variance, pause_density, speaking_pace) allocated 0.22 total weight. Audio dims default to 0.5 (neutral) when word_timings unavailable for backward compatibility. | Audio proxy signals derived from word-level timing data provide meaningful highlight quality indicators without requiring raw audio analysis (librosa). Neutral fallback ensures existing scoring paths are unaffected. | Yes | agent | | D042 | M023/S01 | architecture | Rich text editor for creator posts | Tiptap (headless, React) with StarterKit + Link + Placeholder extensions. Store Tiptap JSON as canonical format in JSONB column, render client-side via @tiptap/html. | Headless architecture fits dark theme customization. Large ecosystem, well-maintained. JSON storage is lossless and enables future server-side rendering. No HTML sanitization needed since canonical format is structured JSON. | Yes | agent | | D043 | M023/S02 | architecture | Personality weight → system prompt modulation strategy | 3-tier intensity (<0.4 subtle reference, 0.4-0.8 adopt voice, ≥0.8 fully embody) with temperature scaling 0.3–0.5 linear on weight | Stepped intensity prevents jarring persona at low weights while allowing full creator voice at high values. Temperature stays in 0.3-0.5 range to keep responses factually grounded even at maximum personality — wider ranges risk hallucination in a knowledge-base context. | Yes | agent | +| D044 | M023/S04 | architecture | Personality weight → system prompt modulation strategy (revision) | 5-tier continuous interpolation replacing 3-tier step function. 
Progressive field inclusion: weight < 0.2 = no personality block; 0.2+ adds basic tone; 0.4+ adds descriptors/explanation approach; 0.6+ adds signature phrases (count scaled with weight); 0.8+ adds full vocabulary/style markers; 0.9+ adds summary paragraph. Temperature scaling unchanged (0.3 + weight * 0.2). | 3-tier step function had jarring transitions at 0.4 and 0.8 boundaries. Continuous interpolation with progressive field inclusion gives finer control — encyclopedic responses stay clean at low weights while high weights pull in the full personality profile gradually. The 0.0-0.19 dead zone ensures purely encyclopedic mode remains truly encyclopedic with zero personality artifacts. | Yes | agent | diff --git a/.gsd/milestones/M023/M023-ROADMAP.md b/.gsd/milestones/M023/M023-ROADMAP.md index b96c9e0..d837269 100644 --- a/.gsd/milestones/M023/M023-ROADMAP.md +++ b/.gsd/milestones/M023/M023-ROADMAP.md @@ -8,6 +8,6 @@ The demo MVP comes together. Chat widget wires to the intelligence layer (INT-1) |----|-------|------|---------|------|------------| | S01 | [A] Post Editor + File Sharing | high | — | ✅ | Creator writes rich text posts with file attachments (presets, sample packs). Followers see posts in feed. Files downloadable via signed URLs. | | S02 | [A] Chat Widget ↔ Chat Engine Wiring (INT-1) | high | — | ✅ | Chat widget on creator profile wired to chat engine. Personality slider adjusts response style. Citations link to sources. | -| S03 | [B] Shorts Generation Pipeline v1 | medium | — | ⬜ | Shorts pipeline extracts clips from highlight boundaries in 3 format presets (vertical, square, horizontal) | +| S03 | [B] Shorts Generation Pipeline v1 | medium | — | ✅ | Shorts pipeline extracts clips from highlight boundaries in 3 format presets (vertical, square, horizontal) | | S04 | [B] Personality Slider (Full Interpolation) | medium | — | ⬜ | Personality slider at 0.0 gives encyclopedic response. At 1.0 gives creator-voiced response with their speech patterns. 
| | S05 | Forgejo KB Update — Demo Build Docs | low | S01, S02, S03, S04 | ⬜ | Forgejo wiki updated with post editor, MinIO, chat integration, shorts pipeline, personality system | diff --git a/.gsd/milestones/M023/slices/S03/S03-SUMMARY.md b/.gsd/milestones/M023/slices/S03/S03-SUMMARY.md new file mode 100644 index 0000000..fdd2a86 --- /dev/null +++ b/.gsd/milestones/M023/slices/S03/S03-SUMMARY.md @@ -0,0 +1,122 @@ +--- +id: S03 +parent: M023 +milestone: M023 +provides: + - GeneratedShort model with FormatPreset/ShortStatus enums + - stage_generate_shorts Celery task + - Shorts API endpoints (generate/list/download) + - Frontend shorts UI in HighlightQueue + - ffmpeg in Docker image + /videos volume mount +requires: + [] +affects: + - S05 +key_files: + - backend/models.py + - backend/config.py + - backend/pipeline/shorts_generator.py + - backend/pipeline/stages.py + - backend/routers/shorts.py + - backend/main.py + - frontend/src/api/shorts.ts + - frontend/src/pages/HighlightQueue.tsx + - frontend/src/pages/HighlightQueue.module.css + - docker/Dockerfile.api + - docker-compose.yml + - alembic/versions/025_add_generated_shorts.py +key_decisions: + - Used explicit enum creation in migration for clean up/down lifecycle + - Lazy imports inside Celery task for shorts_generator and model types to avoid circular imports + - Per-preset independent processing with isolated error handling — one failure doesn't block others + - Show generate button only on approved highlights with no in-progress shorts (or all-failed) + - Poll every 5s only for highlights with pending/processing shorts, stop when all settle + - Download opens presigned MinIO URL in new tab +patterns_established: + - Celery task with per-item independent error handling (generate all 3 presets, each in its own try/catch) + - ffmpeg subprocess wrapper with timeout and stderr capture for diagnostics + - Frontend polling pattern: 5s interval while processing, auto-stop when all settle +observability_surfaces: + - 
generated_shorts table: status and error_message columns per preset per highlight + - Celery worker logs: per-preset structured log lines with highlight_id, preset, status, duration_ms, file_size or error + - API endpoints: GET /admin/shorts/{highlight_id} returns full status for all presets +drill_down_paths: + - .gsd/milestones/M023/slices/S03/tasks/T01-SUMMARY.md + - .gsd/milestones/M023/slices/S03/tasks/T02-SUMMARY.md + - .gsd/milestones/M023/slices/S03/tasks/T03-SUMMARY.md +duration: "" +verification_result: passed +completed_at: 2026-04-04T09:54:34.807Z +blocker_discovered: false +--- + +# S03: [B] Shorts Generation Pipeline v1 + +**Shorts pipeline extracts video clips from approved highlights in 3 format presets (vertical, square, horizontal), stores in MinIO, and exposes generate/list/download through API and HighlightQueue UI.** + +## What Happened + +Three tasks delivered the full shorts generation pipeline end-to-end. + +T01 laid the infrastructure: GeneratedShort model with FormatPreset (vertical/square/horizontal) and ShortStatus (pending/processing/complete/failed) enums, Alembic migration 025, video_source_path config setting, ffmpeg in the Docker image, and /videos volume mount on both API and worker services. + +T02 built the generation engine: shorts_generator.py with PRESETS dict defining ffmpeg video filter chains for each format (vertical 1080×1920, square 1080×1080, horizontal 1920×1080), extract_clip() with 300s subprocess timeout, and resolve_video_path() with file existence validation. The stage_generate_shorts Celery task loads an approved HighlightCandidate, resolves the source video file, and processes each preset independently — creating GeneratedShort rows, extracting clips to /tmp, uploading to MinIO under shorts/{highlight_id}/{preset}.mp4, and updating status. Each preset failure is isolated so one bad encode doesn't block others. Temp files are cleaned in finally blocks. 
+ +T03 wired the API and frontend: three endpoints on /api/v1/admin/shorts (POST generate trigger with 202 response, GET list per highlight, GET download returning presigned MinIO URL). Frontend API client with TypeScript types. HighlightQueue.tsx updated with generate button (visible only on approved highlights with no in-progress shorts), per-preset status badges (color-coded pending/processing/complete/failed with pulsing animation for processing), download links opening presigned URLs in new tabs, and 5s polling while any shorts are processing. + +## Verification + +All slice-level verification checks pass: +- Model imports (GeneratedShort, FormatPreset, ShortStatus): exit 0 +- Generator module imports (extract_clip, PRESETS, resolve_video_path): exit 0 +- Celery task imports (stage_generate_shorts): exit 0 +- Router imports (routers.shorts.router): exit 0 +- Router registered in main.py: confirmed via grep +- ffmpeg in Dockerfile.api: confirmed via grep +- video_source_path in config.py: confirmed via grep +- chrysopedia_videos volume mount in docker-compose.yml: confirmed via grep +- TypeScript compilation (npx tsc --noEmit): exit 0 +- Frontend production build (npm run build): exit 0 + +## Requirements Advanced + +None. + +## Requirements Validated + +None. + +## New Requirements Surfaced + +None. + +## Requirements Invalidated or Re-scoped + +None. + +## Deviations + +None. + +## Known Limitations + +Video source files must exist at the configured video_source_path (/videos mount). No retry mechanism on ffmpeg failures — preset is marked failed and requires manual re-trigger. Single Celery worker concurrency means generation jobs queue sequentially. + +## Follow-ups + +None. 
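The download rule described above (presigned URL only for completed shorts) reduces to a small guard. A sketch with the MinIO client stubbed out as a `presign` callable; the record fields are assumed from the model description, not copied from models.py.

```python
from dataclasses import dataclass

@dataclass
class ShortRecord:
    id: str
    format_preset: str   # vertical | square | horizontal
    status: str          # pending | processing | complete | failed
    object_key: str

def download_url(short: ShortRecord, presign) -> str:
    """Presign only terminal, successful shorts; everything else errors.

    The real download endpoint returns this URL so the frontend can
    open it in a new tab.
    """
    if short.status != "complete":
        raise ValueError(f"short {short.id} is {short.status}, not complete")
    return presign(short.object_key)
```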
+ +## Files Created/Modified + +- `backend/models.py` — Added FormatPreset, ShortStatus enums and GeneratedShort model with FK to highlight_candidates +- `backend/config.py` — Added video_source_path setting (default /videos) +- `docker/Dockerfile.api` — Added ffmpeg to apt-get install +- `docker-compose.yml` — Added /vmPool/r/services/chrysopedia_videos:/videos:ro volume mount to API and worker services +- `alembic/versions/025_add_generated_shorts.py` — Migration creating generated_shorts table with formatpreset and shortstatus enums +- `backend/pipeline/shorts_generator.py` — New ffmpeg wrapper: PRESETS dict, extract_clip(), resolve_video_path() +- `backend/pipeline/stages.py` — Added stage_generate_shorts Celery task with per-preset processing +- `backend/routers/shorts.py` — New router: POST generate, GET list, GET download endpoints +- `backend/main.py` — Registered shorts router +- `frontend/src/api/shorts.ts` — New API client with typed generateShorts, fetchShorts, getShortDownloadUrl +- `frontend/src/pages/HighlightQueue.tsx` — Added generate button, per-preset status badges, download links, 5s polling +- `frontend/src/pages/HighlightQueue.module.css` — Styles for shorts UI: badges, buttons, pulsing processing animation diff --git a/.gsd/milestones/M023/slices/S03/S03-UAT.md b/.gsd/milestones/M023/slices/S03/S03-UAT.md new file mode 100644 index 0000000..0f7461a --- /dev/null +++ b/.gsd/milestones/M023/slices/S03/S03-UAT.md @@ -0,0 +1,84 @@ +# S03: [B] Shorts Generation Pipeline v1 — UAT + +**Milestone:** M023 +**Written:** 2026-04-04T09:54:34.807Z + +# S03 UAT: Shorts Generation Pipeline v1 + +## Preconditions +- Chrysopedia stack running on ub01 (docker compose up -d) +- Migration 025 applied (docker exec chrysopedia-api alembic upgrade head) +- At least one approved highlight candidate exists in the review queue +- Video source files present at /vmPool/r/services/chrysopedia_videos/ +- MinIO running and accessible + +## Test Cases + +### TC-01: Model 
and Infrastructure Verification +1. SSH to ub01, exec into API container +2. Run: `python -c "from models import GeneratedShort, FormatPreset, ShortStatus; print(FormatPreset.vertical.value, FormatPreset.square.value, FormatPreset.horizontal.value)"` + - **Expected:** Prints "vertical square horizontal" +3. Run: `which ffmpeg` + - **Expected:** Returns path (e.g., /usr/bin/ffmpeg) +4. Check volume mount: `ls /videos/` + - **Expected:** Lists video files from the host mount + +### TC-02: Generate Shorts — Happy Path +1. Navigate to http://ub01:8096, log in as admin +2. Open Highlight Queue from admin menu +3. Find an approved highlight candidate +4. **Expected:** "Generate Shorts" button is visible +5. Click "Generate Shorts" +6. **Expected:** Button disappears, three preset badges appear (vertical, square, horizontal) showing "pending" or "processing" state +7. Wait for processing (badges should pulse during processing) +8. **Expected:** All three badges transition to "complete" with green color +9. **Expected:** Download links appear next to each completed preset + +### TC-03: Download Generated Short +1. After TC-02 completes, click a download link for any completed preset +2. **Expected:** New tab opens with the video file (presigned MinIO URL) +3. Video should play and match the expected format (e.g., vertical = portrait orientation) + +### TC-04: Generate Button Visibility Rules +1. Find a highlight that is NOT approved (pending/rejected) +2. **Expected:** No "Generate Shorts" button visible +3. Find an approved highlight that already has completed shorts +4. **Expected:** No "Generate Shorts" button (shorts already exist); status badges and download links shown instead + +### TC-05: Re-generate After All Failed +1. If a highlight has all three presets in "failed" state (e.g., due to missing video file) +2. **Expected:** "Generate Shorts" button reappears, allowing retry + +### TC-06: Missing Video File +1. 
Trigger generation for a highlight whose source video file doesn't exist at the /videos mount path +2. **Expected:** All three presets show "failed" status with error messages +3. Check API: `GET /api/v1/admin/shorts/{highlight_id}` +4. **Expected:** Response includes error_message for each preset indicating file not found + +### TC-07: API Endpoint Validation +1. `POST /api/v1/admin/shorts/generate/{highlight_id}` with an approved highlight + - **Expected:** 202 Accepted with status message +2. `POST /api/v1/admin/shorts/generate/{highlight_id}` with a non-approved highlight + - **Expected:** 400 or 404 error +3. `GET /api/v1/admin/shorts/{highlight_id}` + - **Expected:** JSON array of GeneratedShort objects with all fields (id, format_preset, status, etc.) +4. `GET /api/v1/admin/shorts/download/{short_id}` for a completed short + - **Expected:** Presigned URL returned +5. `GET /api/v1/admin/shorts/download/{short_id}` for a non-complete short + - **Expected:** Error response + +### TC-08: Polling Behavior +1. Trigger shorts generation, observe network tab in browser devtools +2. **Expected:** Polling requests every ~5 seconds to GET /admin/shorts/{highlight_id} +3. Once all presets reach terminal state (complete/failed), polling should stop +4. **Expected:** No further network requests after all presets settle + +### TC-09: Partial Failure Independence +1. Set up a scenario where one preset's ffmpeg command would fail (e.g., corrupt source at specific timestamp) +2. **Expected:** Failed preset shows "failed" badge with error, other presets still process to completion +3. 
**Expected:** Completed presets have download links; failed preset shows error message + +### Edge Cases +- Highlight with 0-second duration: generation should either fail gracefully or produce minimal clip +- Concurrent generation attempts on same highlight: second request should be rejected (no duplicate processing) +- Very long highlight (>5 min): ffmpeg should still complete within 300s timeout for most presets diff --git a/.gsd/milestones/M023/slices/S03/tasks/T03-VERIFY.json b/.gsd/milestones/M023/slices/S03/tasks/T03-VERIFY.json new file mode 100644 index 0000000..adf6f5a --- /dev/null +++ b/.gsd/milestones/M023/slices/S03/tasks/T03-VERIFY.json @@ -0,0 +1,36 @@ +{ + "schemaVersion": 1, + "taskId": "T03", + "unitId": "M023/S03/T03", + "timestamp": 1775296321161, + "passed": false, + "discoverySource": "task-plan", + "checks": [ + { + "command": "grep -q 'shorts' backend/main.py", + "exitCode": 0, + "durationMs": 7, + "verdict": "pass" + }, + { + "command": "cd backend", + "exitCode": 0, + "durationMs": 6, + "verdict": "pass" + }, + { + "command": "cd ../frontend", + "exitCode": 2, + "durationMs": 4, + "verdict": "fail" + }, + { + "command": "npx tsc --noEmit", + "exitCode": 1, + "durationMs": 788, + "verdict": "fail" + } + ], + "retryAttempt": 1, + "maxRetries": 2 +} diff --git a/.gsd/milestones/M023/slices/S04/S04-PLAN.md b/.gsd/milestones/M023/slices/S04/S04-PLAN.md index 87061c8..f552d75 100644 --- a/.gsd/milestones/M023/slices/S04/S04-PLAN.md +++ b/.gsd/milestones/M023/slices/S04/S04-PLAN.md @@ -1,6 +1,54 @@ # S04: [B] Personality Slider (Full Interpolation) -**Goal:** Build full personality interpolation from extracted profiles into chat system prompts +**Goal:** Personality slider produces continuous interpolation from encyclopedic (0.0) to fully creator-voiced (1.0), with progressive profile field inclusion and enhanced slider UX feedback. **Demo:** After this: Personality slider at 0.0 gives encyclopedic response. 
At 1.0 gives creator-voiced response with their speech patterns. ## Tasks +- [x] **T01: Replaced 3-tier step function with 5-tier continuous interpolation in _build_personality_block(), adding progressive field inclusion and scaled phrase counts** — Replace the 3-tier step function in `_build_personality_block()` with continuous interpolation. The new function progressively includes more personality profile fields as weight increases: + +- weight < 0.2: return empty string (no personality block) +- weight 0.2–0.39: basic tone — teaching_style, formality, energy + subtle instruction +- weight 0.4–0.59: + descriptors, explanation_approach + adopt-voice instruction +- weight 0.6–0.79: + signature_phrases (count scaled: `max(2, round(weight * len(phrases)))`) + creator-voice instruction +- weight 0.8–0.89: + distinctive_terms, sound_descriptions, sound_words, self_references, pacing + fully-embody instruction +- weight >= 0.9: + full summary paragraph + +Instruction text per tier: +- 0.2–0.39: "When relevant, subtly reference {name}'s communication style" +- 0.4–0.59: "Adopt {name}'s tone and communication style" +- 0.6–0.79: "Respond as {name} would, using their voice and manner" +- 0.8+: "Fully embody {name} — use their exact phrases, energy, and teaching approach" + +Keep existing temperature scaling (already linear, no changes needed). Keep uses_analogies and audience_engagement at weight >= 0.4. + +Update existing test `test_personality_prompt_injected_when_weight_and_profile` (uses weight=0.7, so now in the 0.6–0.79 tier — assert signature phrases present, but NOT distinctive_terms/sound_descriptions). Add new parametrized test covering weights 0.1, 0.3, 0.5, 0.7, 0.9 asserting correct progressive field presence/absence at each tier. Add test that weight=0.0 and weight=0.15 produce no personality block. 
+ - Estimate: 45m + - Files: backend/chat_service.py, backend/tests/test_chat.py + - Verify: cd backend && python -m pytest tests/test_chat.py -v -k personality +- [ ] **T02: Enhance slider UX with dynamic tier label, value indicator, and gradient track** — Enhance the personality slider in ChatWidget.tsx with three visual feedback features: + +1. **Dynamic tier label**: Below the slider, show a centered label that changes based on current weight value: + - 0.0–0.19: "Encyclopedic" + - 0.2–0.39: "Subtle Reference" + - 0.4–0.59: "Creator Tone" + - 0.6–0.79: "Creator Voice" + - 0.8–1.0: "Full Embodiment" + +2. **Value indicator**: Show the numeric value (e.g., "0.7") next to or integrated with the tier label. Small, secondary text style. + +3. **Gradient track fill**: Use an inline style to set a CSS custom property `--slider-fill` based on the current value (percentage). In CSS, use `background: linear-gradient(to right, var(--color-accent) var(--slider-fill), var(--color-border) var(--slider-fill))` on the slider track. + +Implementation in ChatWidget.tsx: +- Add a `getTierLabel(weight: number): string` helper function +- Add `style={{ '--slider-fill': `${personalityWeight * 100}%` } as React.CSSProperties}` to the slider input +- Add a new div below the slider row with the tier label + value + +CSS changes in ChatWidget.module.css: +- `.slider` background becomes `linear-gradient(to right, var(--color-accent) var(--slider-fill, 0%), var(--color-border) var(--slider-fill, 0%))` +- Add `.tierLabel` class: centered, small font, color transitions +- Add `.tierValue` class: numeric value, even smaller, secondary color + +Keep existing 'Encyclopedic' and 'Creator Voice' endpoint labels as-is — they frame the slider range. 
+ - Estimate: 30m + - Files: frontend/src/components/ChatWidget.tsx, frontend/src/components/ChatWidget.module.css + - Verify: cd frontend && npm run build diff --git a/.gsd/milestones/M023/slices/S04/S04-RESEARCH.md b/.gsd/milestones/M023/slices/S04/S04-RESEARCH.md new file mode 100644 index 0000000..3e80ddb --- /dev/null +++ b/.gsd/milestones/M023/slices/S04/S04-RESEARCH.md @@ -0,0 +1,94 @@ +# S04 Research: Personality Slider (Full Interpolation) + +## Summary + +This is **light research** — the infrastructure is fully built. S02 delivered the complete personality pipeline: API field, service-layer modulation, prompt injection, temperature scaling, frontend slider UI, and 22 tests. S04's job is to refine the 3-tier step function into continuous interpolation and enhance the slider UI feedback. + +The S02 summary explicitly states: "S04 will refine the prompt modulation with continuous interpolation rather than the current 3-tier step function." + +## Recommendation + +Three clean tasks: +1. **Backend**: Replace `_build_personality_block()` step function with continuous interpolation — progressively include more profile fields as weight increases, use linear blending for instruction intensity. +2. **Frontend slider UX**: Add visual feedback (current weight value indicator, gradient track fill, tier label that changes dynamically). +3. **Test updates**: Update existing personality tests and add new ones for continuous interpolation behavior at several weight points. 
+ +## Implementation Landscape + +### What Exists (built in S02) + +**Backend** (`backend/chat_service.py`): +- `ChatRequest.personality_weight`: float 0.0-1.0, Pydantic validated +- `ChatService._inject_personality()`: queries `Creator.personality_profile` JSONB, calls `_build_personality_block()` +- `_build_personality_block()`: **current 3-tier step function** at lines ~200-240 + - `weight >= 0.8`: "Fully embody {name}'s voice and style" + - `weight >= 0.4`: "Respond in {name}'s voice" + - `weight < 0.4`: "Subtly adopt {name}'s communication style" + - Always includes: teaching_style, descriptors (max 5), phrases (max 6), formality+energy, analogies, audience_engagement +- Temperature scaling: `0.3 + (weight * 0.2)` — already linear, good as-is + +**Profile fields available** (from `personality_extraction.txt` prompt schema): +- Currently used: `vocabulary.signature_phrases`, `tone.descriptors`, `tone.teaching_style`, `tone.energy`, `tone.formality`, `style_markers.uses_analogies`, `style_markers.audience_engagement` +- **Unused but available for higher weights**: `vocabulary.distinctive_terms`, `vocabulary.sound_descriptions`, `vocabulary.jargon_level`, `style_markers.explanation_approach`, `style_markers.sound_words`, `style_markers.pacing`, `style_markers.self_references`, `summary` (full personality summary paragraph) + +**Frontend** (`frontend/src/components/ChatWidget.tsx`): +- `useState(0)` for personalityWeight +- Range input: min=0, max=1, step=0.1 +- Labels: "Encyclopedic" (left), "Creator Voice" (right) +- CSS: `ChatWidget.module.css` has `.sliderRow`, `.sliderLabel`, `.slider` with custom webkit/moz thumb styling in cyan accent + +**Frontend** (`frontend/src/pages/ChatPage.tsx`): +- **No personality slider** — this is the global chat page (no creator context), so no slider makes sense. No work needed here. 
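Since the temperature scaling is kept as-is, its behavior is worth pinning down. A one-function sketch of the formula (the real code lives in chat_service.py):

```python
def scale_temperature(weight: float) -> float:
    """Map personality weight 0.0-1.0 onto sampling temperature 0.3-0.5.

    Linear, so the slider's effect on randomness is smooth: 0.0 stays
    factual and encyclopedic, 1.0 allows the loosest creator-voiced
    phrasing, and the narrow ceiling keeps answers grounded in the
    knowledge base.
    """
    return 0.3 + weight * 0.2
```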
+ +**Tests** (`backend/tests/test_chat.py`): +- 22 tests total, 9 personality-specific +- Tests cover: weight forwarding, prompt injection content check, null/missing creator fallback, weight=0 skips query, temperature scaling at 0.0 and 1.0, validation boundaries (>1.0, <0.0, string) + +### Continuous Interpolation Design + +Replace the 3-tier if/elif/else with a continuous approach: + +**Intensity instruction** (the opening sentence): +- Use `weight` to linearly interpolate between instruction strengths +- 0.0-0.2: no personality block at all (purely encyclopedic) +- 0.2-0.4: "When relevant, subtly reference {name}'s communication style" +- 0.4-0.6: "Adopt {name}'s tone and communication style" +- 0.6-0.8: "Respond as {name} would, using their voice and manner" +- 0.8-1.0: "Fully embody {name} — use their exact phrases, energy, and teaching approach" + +**Progressive field inclusion** (more profile data at higher weights): +- weight ≥ 0.2: teaching_style + formality + energy (basic tone info) +- weight ≥ 0.4: + descriptors + explanation_approach (how they teach) +- weight ≥ 0.6: + signature_phrases (limited count scaling: `min(round(weight * 8), len(phrases))`) +- weight ≥ 0.8: + distinctive_terms, sound_descriptions, sound_words, self_references, pacing +- weight ≥ 0.9: + full summary paragraph as context + +**Phrase count scaling**: Instead of a fixed max of 6, scale with weight: `max(2, round(weight * len(phrases)))`. + +### Slider UX Enhancement + +Current slider is functional but gives no feedback about what each position means. 
Enhancement: +- **Dynamic label** below the slider showing the current tier ("Encyclopedic", "Subtle Reference", "Creator Tone", "Creator Voice", "Full Embodiment") +- **Track fill gradient** — CSS `background: linear-gradient()` using `--slider-fill` variable set via inline style based on value +- **Value indicator** — small bubble or text showing current numeric value (0.0-1.0) or percentage + +### File Change Map + +| File | Change | Risk | +|------|--------|------| +| `backend/chat_service.py` | Rewrite `_build_personality_block()` for continuous interpolation | Low — pure function, well-tested | +| `backend/tests/test_chat.py` | Update `test_personality_prompt_injected_when_weight_and_profile` assertions, add tests for intermediate weights (0.2, 0.5, 0.9) checking progressive field inclusion | Low | +| `frontend/src/components/ChatWidget.tsx` | Add dynamic label, enhance slider visual feedback | Low | +| `frontend/src/components/ChatWidget.module.css` | Slider track fill gradient, value indicator styling | Low | + +### Verification + +- `cd backend && python -m pytest tests/test_chat.py -v` — all tests pass including new interpolation tests +- `cd frontend && npm run build` — 0 errors +- Manual: at weight 0.2, system prompt should have minimal personality cues; at weight 1.0 should have full profile dump including summary + +### Natural Task Boundaries + +1. **T01: Backend continuous interpolation** — rewrite `_build_personality_block()`, update and add tests +2. **T02: Frontend slider UX** — dynamic label, gradient track, visual feedback +3. Both tasks are independent and could run in parallel. 
diff --git a/.gsd/milestones/M023/slices/S04/tasks/T01-PLAN.md b/.gsd/milestones/M023/slices/S04/tasks/T01-PLAN.md new file mode 100644 index 0000000..fb556e4 --- /dev/null +++ b/.gsd/milestones/M023/slices/S04/tasks/T01-PLAN.md @@ -0,0 +1,40 @@ +--- +estimated_steps: 14 +estimated_files: 2 +skills_used: [] +--- + +# T01: Rewrite _build_personality_block() for continuous interpolation + update tests + +Replace the 3-tier step function in `_build_personality_block()` with continuous interpolation. The new function progressively includes more personality profile fields as weight increases: + +- weight < 0.2: return empty string (no personality block) +- weight 0.2–0.39: basic tone — teaching_style, formality, energy + subtle instruction +- weight 0.4–0.59: + descriptors, explanation_approach + adopt-voice instruction +- weight 0.6–0.79: + signature_phrases (count scaled: `max(2, round(weight * len(phrases)))`) + creator-voice instruction +- weight 0.8–0.89: + distinctive_terms, sound_descriptions, sound_words, self_references, pacing + fully-embody instruction +- weight >= 0.9: + full summary paragraph + +Instruction text per tier: +- 0.2–0.39: "When relevant, subtly reference {name}'s communication style" +- 0.4–0.59: "Adopt {name}'s tone and communication style" +- 0.6–0.79: "Respond as {name} would, using their voice and manner" +- 0.8+: "Fully embody {name} — use their exact phrases, energy, and teaching approach" + +Keep existing temperature scaling (already linear, no changes needed). Keep uses_analogies and audience_engagement at weight >= 0.4. + +Update existing test `test_personality_prompt_injected_when_weight_and_profile` (uses weight=0.7, so now in the 0.6–0.79 tier — assert signature phrases present, but NOT distinctive_terms/sound_descriptions). Add new parametrized test covering weights 0.1, 0.3, 0.5, 0.7, 0.9 asserting correct progressive field presence/absence at each tier. Add test that weight=0.0 and weight=0.15 produce no personality block. 
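The tier spec above can be condensed into a working sketch. Field handling is simplified (the real `_build_personality_block()` formats fuller prose and also gates uses_analogies/audience_engagement at 0.4+), but the boundaries, instruction strings, and phrase-count scaling follow the plan:

```python
def build_personality_block(name: str, profile: dict, weight: float) -> str:
    """Progressive field inclusion per the 5-tier spec (simplified sketch)."""
    if weight < 0.2:
        return ""  # dead zone: purely encyclopedic, no personality block

    tone = profile.get("tone", {})
    vocab = profile.get("vocabulary", {})
    style = profile.get("style_markers", {})

    if weight < 0.4:
        instruction = f"When relevant, subtly reference {name}'s communication style"
    elif weight < 0.6:
        instruction = f"Adopt {name}'s tone and communication style"
    elif weight < 0.8:
        instruction = f"Respond as {name} would, using their voice and manner"
    else:
        instruction = (f"Fully embody {name} — use their exact phrases, "
                       "energy, and teaching approach")

    lines = [instruction]
    # 0.2+: basic tone fields
    for field in ("teaching_style", "formality", "energy"):
        if tone.get(field):
            lines.append(f"{field}: {tone[field]}")
    if weight >= 0.4:
        if tone.get("descriptors"):
            lines.append("descriptors: " + ", ".join(tone["descriptors"]))
        if style.get("explanation_approach"):
            lines.append(f"explanation_approach: {style['explanation_approach']}")
    if weight >= 0.6:
        phrases = vocab.get("signature_phrases", [])
        if phrases:
            count = max(2, round(weight * len(phrases)))  # scaled with weight
            lines.append("signature_phrases: " + ", ".join(phrases[:count]))
    if weight >= 0.8:
        for field in ("distinctive_terms", "sound_descriptions"):
            if vocab.get(field):
                lines.append(f"{field}: " + ", ".join(vocab[field]))
        for field in ("sound_words", "self_references", "pacing"):
            if style.get(field):
                val = style[field]
                lines.append(f"{field}: {val if isinstance(val, str) else ', '.join(val)}")
    if weight >= 0.9 and profile.get("summary"):
        lines.append(profile["summary"])
    return "\n".join(lines)
```

Monotonicity falls out for free: each tier only appends to the previous tier's fields, so sliding the weight up never removes information.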
+ +## Inputs + +- `backend/chat_service.py` — existing `_build_personality_block()` at line 292 +- `backend/tests/test_chat.py` — existing personality tests starting at line 617, `_FAKE_PERSONALITY_PROFILE` fixture + +## Expected Output + +- `backend/chat_service.py` — rewritten `_build_personality_block()` with 5-tier continuous interpolation +- `backend/tests/test_chat.py` — updated and new personality interpolation tests + +## Verification + +cd backend && python -m pytest tests/test_chat.py -v -k personality diff --git a/.gsd/milestones/M023/slices/S04/tasks/T01-SUMMARY.md b/.gsd/milestones/M023/slices/S04/tasks/T01-SUMMARY.md new file mode 100644 index 0000000..14e316b --- /dev/null +++ b/.gsd/milestones/M023/slices/S04/tasks/T01-SUMMARY.md @@ -0,0 +1,77 @@ +--- +id: T01 +parent: S04 +milestone: M023 +provides: [] +requires: [] +affects: [] +key_files: ["backend/chat_service.py", "backend/tests/test_chat.py"] +key_decisions: ["5-tier boundaries at 0.2/0.4/0.6/0.8/0.9 with distinct instruction text per tier", "Phrase count scaling: max(2, round(weight * len(phrases)))"] +patterns_established: [] +drill_down_paths: [] +observability_surfaces: [] +duration: "" +verification_result: "Ran `cd backend && python -m pytest tests/test_chat.py -v -k personality` — all 11 personality tests pass (9 existing + 2 new)." 
completed_at: 2026-04-04T10:04:37.100Z +blocker_discovered: false +--- + +# T01: Replaced 3-tier step function with 5-tier continuous interpolation in _build_personality_block(), adding progressive field inclusion and scaled phrase counts + +> Replaced 3-tier step function with 5-tier continuous interpolation in _build_personality_block(), adding progressive field inclusion and scaled phrase counts + +## What Happened + +Rewrote _build_personality_block() from a coarse 3-tier step function to 5-tier continuous interpolation. Weight < 0.2 returns empty string. Tiers progressively add profile fields: basic tone → descriptors/explanation_approach → signature phrases (count scaled by weight) → distinctive_terms/sound_descriptions/sound_words/self_references/pacing → full summary paragraph. Instruction text escalates per tier. Updated existing test for weight=0.7 tier and added 2 new test functions covering all tiers and phrase count scaling. + +## Verification + +Ran `cd backend && python -m pytest tests/test_chat.py -v -k personality` — all 11 personality tests pass (9 existing + 2 new). 
+ +## Verification Evidence + +| # | Command | Exit Code | Verdict | Duration | +|---|---------|-----------|---------|----------| +| 1 | `cd backend && python -m pytest tests/test_chat.py -v -k personality` | 0 | ✅ pass | 5800ms | + + +## Deviations + +Moved uses_analogies and audience_engagement from unconditional to weight >= 0.4 gate, and gated descriptors at 0.4+, matching the task plan's tier design. + +## Known Issues + +None. + +## Files Created/Modified + +- `backend/chat_service.py` +- `backend/tests/test_chat.py` diff --git a/.gsd/milestones/M023/slices/S04/tasks/T02-PLAN.md b/.gsd/milestones/M023/slices/S04/tasks/T02-PLAN.md new file mode 100644 index 0000000..84a7b11 --- /dev/null +++ b/.gsd/milestones/M023/slices/S04/tasks/T02-PLAN.md @@ -0,0 +1,46 @@ +--- +estimated_steps: 18 +estimated_files: 2 +skills_used: [] +--- + +# T02: Enhance slider UX with dynamic tier label, value indicator, and gradient track + +Enhance the personality slider in ChatWidget.tsx with three visual feedback features: + +1. **Dynamic tier label**: Below the slider, show a centered label that changes based on the current weight value: + - 0.0–0.19: "Encyclopedic" + - 0.2–0.39: "Subtle Reference" + - 0.4–0.59: "Creator Tone" + - 0.6–0.79: "Creator Voice" + - 0.8–1.0: "Full Embodiment" + +2. **Value indicator**: Show the numeric value (e.g., "0.7") next to or integrated with the tier label. Small, secondary text style. + +3. **Gradient track fill**: Use an inline style to set a CSS custom property `--slider-fill` based on the current value (percentage). In CSS, use `background: linear-gradient(to right, var(--color-accent) var(--slider-fill), var(--color-border) var(--slider-fill))` on the slider track.
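The tier mapping and track-fill math above can be sketched as follows — a minimal sketch only; `getTierLabel` and `sliderFill` are helper names assumed from this plan, not code that exists in ChatWidget.tsx yet:

```typescript
// Maps a personality weight (0.0–1.0) to the tier label shown under the
// slider. Boundaries mirror the five ranges listed in this plan.
function getTierLabel(weight: number): string {
  if (weight < 0.2) return "Encyclopedic";
  if (weight < 0.4) return "Subtle Reference";
  if (weight < 0.6) return "Creator Tone";
  if (weight < 0.8) return "Creator Voice";
  return "Full Embodiment";
}

// Percentage string for the --slider-fill CSS custom property, so the
// gradient track splits at the thumb position.
function sliderFill(weight: number): string {
  return `${Math.round(weight * 100)}%`;
}

console.log(getTierLabel(0.7), sliderFill(0.7)); // Creator Voice 70%
```

Note the UI uses five labels even though the backend adds a sixth threshold at 0.9 (summary paragraph); per this plan, the whole 0.8–1.0 range reads "Full Embodiment".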
+ +Implementation in ChatWidget.tsx: +- Add a `getTierLabel(weight: number): string` helper function +- Add ``style={{ '--slider-fill': `${personalityWeight * 100}%` } as React.CSSProperties}`` to the slider input +- Add a new div below the slider row with the tier label + value + +CSS changes in ChatWidget.module.css: +- `.slider` background becomes `linear-gradient(to right, var(--color-accent) var(--slider-fill, 0%), var(--color-border) var(--slider-fill, 0%))` +- Add `.tierLabel` class: centered, small font, color transitions +- Add `.tierValue` class: numeric value, even smaller, secondary color + +Keep existing 'Encyclopedic' and 'Creator Voice' endpoint labels as-is — they frame the slider range. + +## Inputs + +- `frontend/src/components/ChatWidget.tsx` — existing slider at line 277–291 +- `frontend/src/components/ChatWidget.module.css` — existing slider styles at line 70–135 + +## Expected Output + +- `frontend/src/components/ChatWidget.tsx` — enhanced slider with tier label, value indicator, gradient track +- `frontend/src/components/ChatWidget.module.css` — new tierLabel/tierValue classes, gradient track background + +## Verification + +`cd frontend && npm run build` diff --git a/backend/chat_service.py b/backend/chat_service.py index 13dbea9..8e8d144 100644 --- a/backend/chat_service.py +++ b/backend/chat_service.py @@ -292,42 +292,91 @@ def _build_context_block(items: list[dict[str, Any]]) -> str: def _build_personality_block(creator_name: str, profile: dict[str, Any], weight: float) -> str: """Build a personality voice injection block from a creator's personality_profile JSONB. - The ``weight`` (0.0–1.0) determines how strongly the personality should - come through. At low weights the instruction is softer ("subtly adopt"); - at high weights it is emphatic ("fully embody").
+ The ``weight`` (0.0–1.0) controls progressive inclusion of personality + fields via 5 tiers of continuous interpolation: + + - < 0.2: no personality block (empty string) + - 0.2–0.39: basic tone — teaching_style, formality, energy + subtle hint + - 0.4–0.59: + descriptors, explanation_approach + adopt-voice instruction + - 0.6–0.79: + signature_phrases (count scaled by weight) + creator-voice + - 0.8–0.89: + distinctive_terms, sound_descriptions, sound_words, + self_references, pacing + fully-embody instruction + - >= 0.9: + full summary paragraph """ + if weight < 0.2: + return "" + vocab = profile.get("vocabulary", {}) tone = profile.get("tone", {}) style = profile.get("style_markers", {}) - phrases = vocab.get("signature_phrases", []) - descriptors = tone.get("descriptors", []) teaching_style = tone.get("teaching_style", "") energy = tone.get("energy", "moderate") formality = tone.get("formality", "conversational") + descriptors = tone.get("descriptors", []) + phrases = vocab.get("signature_phrases", []) parts: list[str] = [] - # Intensity qualifier - if weight >= 0.8: - parts.append(f"Fully embody {creator_name}'s voice and style.") - elif weight >= 0.4: - parts.append(f"Respond in {creator_name}'s voice.") + # --- Tier 1 (0.2–0.39): basic tone --- + if weight < 0.4: + parts.append( + f"When relevant, subtly reference {creator_name}'s communication style." + ) + elif weight < 0.6: + parts.append(f"Adopt {creator_name}'s tone and communication style.") + elif weight < 0.8: + parts.append( + f"Respond as {creator_name} would, using their voice and manner." + ) else: - parts.append(f"Subtly adopt {creator_name}'s communication style.") + parts.append( + f"Fully embody {creator_name} — use their exact phrases, energy, and teaching approach." 
+ ) if teaching_style: parts.append(f"Teaching style: {teaching_style}.") - if descriptors: - parts.append(f"Tone: {', '.join(descriptors[:5])}.") - if phrases: - parts.append(f"Use their signature phrases: {', '.join(phrases[:6])}.") parts.append(f"Match their {formality} {energy} tone.") - # Style markers - if style.get("uses_analogies"): - parts.append("Use analogies when helpful.") - if style.get("audience_engagement"): - parts.append(f"Audience engagement: {style['audience_engagement']}.") + # --- Tier 2 (0.4+): descriptors, explanation_approach, uses_analogies, audience_engagement --- + if weight >= 0.4: + if descriptors: + parts.append(f"Tone: {', '.join(descriptors[:5])}.") + explanation = style.get("explanation_approach", "") + if explanation: + parts.append(f"Explanation approach: {explanation}.") + if style.get("uses_analogies"): + parts.append("Use analogies when helpful.") + if style.get("audience_engagement"): + parts.append(f"Audience engagement: {style['audience_engagement']}.") + + # --- Tier 3 (0.6+): signature phrases (count scaled by weight) --- + if weight >= 0.6 and phrases: + count = max(2, round(weight * len(phrases))) + parts.append(f"Use their signature phrases: {', '.join(phrases[:count])}.") + + # --- Tier 4 (0.8+): distinctive_terms, sound_descriptions, sound_words, self_references, pacing --- + if weight >= 0.8: + distinctive = vocab.get("distinctive_terms", []) + if distinctive: + parts.append(f"Distinctive terms: {', '.join(distinctive)}.") + sound_desc = vocab.get("sound_descriptions", []) + if sound_desc: + parts.append(f"Sound descriptions: {', '.join(sound_desc)}.") + sound_words = style.get("sound_words", []) + if sound_words: + parts.append(f"Sound words: {', '.join(sound_words)}.") + self_refs = style.get("self_references", "") + if self_refs: + parts.append(f"Self-references: {self_refs}.") + pacing = style.get("pacing", "") + if pacing: + parts.append(f"Pacing: {pacing}.") + + # --- Tier 5 (0.9+): full summary paragraph --- 
+ if weight >= 0.9: + summary = profile.get("summary", "") + if summary: + parts.append(summary) return " ".join(parts) diff --git a/backend/tests/test_chat.py b/backend/tests/test_chat.py index 6564655..16aeb0b 100644 --- a/backend/tests/test_chat.py +++ b/backend/tests/test_chat.py @@ -681,12 +681,103 @@ async def test_personality_prompt_injected_when_weight_and_profile(chat_client): assert len(captured_messages) >= 2 system_prompt = captured_messages[0]["content"] - # Personality block should be appended + # weight=0.7 → tier 3: signature phrases YES, distinctive_terms NO assert "Keota" in system_prompt - assert "let's gooo" in system_prompt + assert "Respond as Keota would" in system_prompt assert "hands-on demo-driven" in system_prompt assert "casual" in system_prompt assert "high" in system_prompt + assert "let's gooo" in system_prompt # signature phrases included at 0.6+ + assert "enthusiastic" in system_prompt # descriptors at 0.4+ + assert "example-first" in system_prompt # explanation_approach at 0.4+ + # Tier 4 fields (0.8+) should NOT be present + assert "sauce" not in system_prompt # distinctive_terms + assert "crispy" not in system_prompt # sound_descriptions + assert "brrr" not in system_prompt # sound_words + + +def test_personality_block_continuous_interpolation_tiers(): + """Progressive field inclusion across 5 interpolation tiers.""" + from chat_service import _build_personality_block + + profile = _FAKE_PERSONALITY_PROFILE + + # weight < 0.2: empty + for w in (0.0, 0.1, 0.15, 0.19): + result = _build_personality_block("Keota", profile, w) + assert result == "", f"weight={w} should produce empty block" + + # weight 0.2–0.39: basic tone only + for w in (0.2, 0.3, 0.39): + result = _build_personality_block("Keota", profile, w) + assert "subtly reference Keota" in result + assert "hands-on demo-driven" in result + assert "casual" in result and "high" in result + # Should NOT include descriptors, explanation_approach, phrases + assert "enthusiastic" 
not in result + assert "example-first" not in result + assert "let's gooo" not in result + + # weight 0.4–0.59: + descriptors, explanation_approach + for w in (0.4, 0.5, 0.59): + result = _build_personality_block("Keota", profile, w) + assert "Adopt Keota" in result + assert "enthusiastic" in result # descriptors + assert "example-first" in result # explanation_approach + assert "analogies" in result # uses_analogies + # Should NOT include phrases or tier-4 fields + assert "let's gooo" not in result + assert "sauce" not in result + + # weight 0.6–0.79: + signature phrases + for w in (0.6, 0.7, 0.79): + result = _build_personality_block("Keota", profile, w) + assert "Respond as Keota would" in result + assert "let's gooo" in result # signature phrases + assert "enthusiastic" in result # still has descriptors + # Should NOT include tier-4 fields + assert "sauce" not in result + assert "crispy" not in result + + # weight 0.8–0.89: + distinctive_terms, sound_descriptions, etc. + for w in (0.8, 0.85, 0.89): + result = _build_personality_block("Keota", profile, w) + assert "Fully embody Keota" in result + assert "sauce" in result # distinctive_terms + assert "crispy" in result # sound_descriptions + assert "brrr" in result # sound_words + assert "I always" in result # self_references + assert "fast" in result # pacing + # Should NOT include summary + assert "High-energy producer" not in result + + # weight >= 0.9: + summary + for w in (0.9, 0.95, 1.0): + result = _build_personality_block("Keota", profile, w) + assert "Fully embody Keota" in result + assert "High-energy producer" in result # summary + + +def test_personality_block_phrase_count_scales_with_weight(): + """Signature phrase count = max(2, round(weight * len(phrases))).""" + from chat_service import _build_personality_block + + # Profile with 6 phrases to make scaling visible + profile = { + "vocabulary": { + "signature_phrases": ["p1", "p2", "p3", "p4", "p5", "p6"], + }, + "tone": {}, + "style_markers": {}, + 
} + # weight=0.6: max(2, round(0.6*6)) = max(2,4) = 4 → first 4 phrases + result = _build_personality_block("Test", profile, 0.6) + assert "p4" in result + assert "p5" not in result + + # weight=1.0: max(2, round(1.0*6)) = 6 → all phrases + result = _build_personality_block("Test", profile, 1.0) + assert "p6" in result @pytest.mark.asyncio