docs: M022 wiki update — follow system, personality profiles, highlight v2, chat widget, multi-turn memory, creator tiers

New page: Personality-Profiles (extraction pipeline, JSONB schema, frontend component) Updated: Home (M022 features), Highlights (10 dimensions, creator endpoints, trim), Chat-Engine (multi-turn memory, ChatWidget), Data-Model (CreatorFollow, personality_profile, trim columns), API-Surface (follow, creator highlight, personality endpoints), Frontend (new components/pages), Decisions (D036-D041), _Sidebar (Personality-Profiles link)
2026-04-04 03:44:23 -05:00 · 2026-04-04 03:44:23 -05:00 · 49ab6d029a
commit 49ab6d029a
parent eec99b6c7d
9 changed files with 494 additions and 193 deletions
--- a/API-Surface.md
+++ b/API-Surface.md
@ -1,6 +1,6 @@
 # API Surface

-50 API endpoints grouped by domain. All served by FastAPI under `/api/v1/`.
+61 API endpoints grouped by domain. All served by FastAPI under `/api/v1/`.

 ## Public Endpoints (10)

@ -26,11 +26,19 @@ title, slug, topic_category, topic_tags, summary, body_sections, body_sections_f
 | Method | Path | Response Shape | Notes |
 |--------|------|---------------|-------|
 | GET | `/api/v1/creators?sort=&genre=` | `{items, total, offset, limit}` | sort: random\|alpha\|views |
-| GET | `/api/v1/creators/{slug}` | 16-field object | Includes genre_breakdown, techniques, social_links |
+| GET | `/api/v1/creators/{slug}` | 16-field object | Includes genre_breakdown, techniques, social_links, follower_count, personality_profile |
 | GET | `/api/v1/topics` | `[{name, description, sub_topics}]` | ⚠️ Bare list (not paginated) |
 | GET | `/api/v1/topics/{cat}/{sub}` | `{items, total, offset, limit}` | Subtopic techniques |
 | GET | `/api/v1/topics/{cat}` | `{items, total, offset, limit}` | Category techniques |

+## Chat Endpoint (1)
+
+| Method | Path | Auth | Purpose |
+|--------|------|------|---------|
+| POST | `/api/v1/chat` | None | Streaming Q&A — SSE response with sources, tokens, done event. See [[Chat-Engine]] |
+
+**Request fields:** `query` (required, 1-1000 chars), `creator` (optional slug/UUID), `conversation_id` (optional UUID for multi-turn threading)
+
 ## Auth Endpoints (4)

 All under prefix `/api/v1/auth/`. JWT-protected except registration and login.
@ -42,6 +50,28 @@ All under prefix `/api/v1/auth/`. JWT-protected except registration and login.
 | GET | `/auth/me` | Bearer JWT | Current user profile. Returns UserResponse. |
 | PUT | `/auth/me` | Bearer JWT | Update display_name and/or password (requires current_password for password changes). Returns UserResponse. |

+## Follow Endpoints (4) — M022/S02
+
+All require Bearer JWT.
+
+| Method | Path | Purpose |
+|--------|------|---------|
+| POST | `/api/v1/follows/{creator_id}` | Follow a creator (idempotent via INSERT ON CONFLICT DO NOTHING) |
+| DELETE | `/api/v1/follows/{creator_id}` | Unfollow a creator |
+| GET | `/api/v1/follows/{creator_id}/status` | Check if current user follows this creator |
+| GET | `/api/v1/follows/me` | List creators the current user follows |
+
+## Creator Highlight Endpoints (4) — M022/S01
+
+Creator-scoped highlight review. Requires Bearer JWT with creator ownership.
+
+| Method | Path | Purpose |
+|--------|------|---------|
+| GET | `/api/v1/creator/highlights` | List highlights for authenticated creator (status/shorts_only filters) |
+| GET | `/api/v1/creator/highlights/{id}` | Detail with score_breakdown and key_moment |
+| PATCH | `/api/v1/creator/highlights/{id}/status` | Update status (approve/reject) |
+| PATCH | `/api/v1/creator/highlights/{id}/trim` | Update trim_start/trim_end |
+
 ## Consent Endpoints (5)

 All under prefix `/api/v1/consent/`. All require Bearer JWT.
@ -54,16 +84,6 @@ All under prefix `/api/v1/consent/`. All require Bearer JWT.
 | GET | `/consent/videos/{video_id}/history` | Creator (owner) or Admin | Versioned audit trail of consent changes for a video. |
 | GET | `/consent/admin/summary` | Admin only | Aggregate consent flag counts across all videos. |

-### Consent Fields
-
-Three boolean consent flags per video, each independently toggleable:
-
-| Field | Default | Meaning |
-|-------|---------|---------|
-| `kb_inclusion` | false | Allow indexing into knowledge base |
-| `training_usage` | false | Allow use for model training |
-| `public_display` | true | Allow public display on site |
-
 ## Report Endpoints (3)

 | Method | Path | Purpose |
@ -72,7 +92,9 @@ Three boolean consent flags per video, each independently toggleable:
 | GET | `/api/v1/admin/reports` | List all reports |
 | PATCH | `/api/v1/admin/reports/{id}` | Update report status |

-## Pipeline Admin Endpoints (20+)
+## Admin Endpoints
+
+### Pipeline Admin (20+)

 All under prefix `/api/v1/admin/pipeline/`.

@ -100,52 +122,20 @@ All under prefix `/api/v1/admin/pipeline/`.
 | POST | `/admin/pipeline/creator-profile/{creator_id}` | Update creator profile |
 | POST | `/admin/pipeline/avatar-fetch/{creator_id}` | Fetch creator avatar |

-## Other Endpoints (2)
+### Highlight Admin (4)

-| Method | Path | Notes |
-|--------|------|-------|
-| POST | `/api/v1/ingest` | Transcript upload |
-| GET | `/api/v1/videos` | ⚠️ Bare list (not paginated) |
+| Method | Path | Purpose |
+|--------|------|---------|
+| POST | `/admin/highlights/detect/{video_id}` | Score all KeyMoments for a video |
+| POST | `/admin/highlights/detect-all` | Score all videos |
+| GET | `/admin/highlights/candidates` | Paginated candidate list |
+| GET | `/admin/highlights/candidates/{id}` | Single candidate with score_breakdown |

-## Response Conventions
+### Personality Extraction (1) — M022/S06

-**Standard paginated response:**
-```json
-{
-  "items": [...],
-  "total": 83,
-  "offset": 0,
-  "limit": 20
-}
-```
-
-**Known inconsistencies:**
- `GET /topics` returns bare list instead of paginated dict
- `GET /videos` returns bare list instead of paginated dict
- Search uses `items` key (not `results`)
- `/techniques/random` returns JSON `{slug}` (not HTTP redirect)
-
-**New endpoints should follow the `{items, total, offset, limit}` paginated pattern.**
-
-## Authentication
-
-JWT-based authentication added in M019. See [[Authentication]] for full details.
-
- **Public endpoints** (search, browse, techniques) require no auth
- **Auth endpoints** (`/auth/register`, `/auth/login`) are open; `/auth/me` requires Bearer JWT
- **Consent endpoints** require Bearer JWT with ownership verification (creator must own the video, or be admin)
- **Admin endpoints** (`/admin/*`) are accessible to anyone with network access (auth planned for future milestone)
-
---
-
-*See also: [[Architecture]], [[Data-Model]], [[Frontend]], [[Authentication]]*
-utput` | Delete all pipeline output |
-| POST | `/admin/pipeline/optimize-prompt` | Trigger prompt optimization |
-| POST | `/admin/pipeline/reindex-all` | Rebuild Qdrant index |
-| GET | `/admin/pipeline/worker-status` | Celery worker health |
-| GET | `/admin/pipeline/recent-activity` | Recent pipeline events |
-| POST | `/admin/pipeline/creator-profile/{creator_id}` | Update creator profile |
-| POST | `/admin/pipeline/avatar-fetch/{creator_id}` | Fetch creator avatar |
+| Method | Path | Purpose |
+|--------|------|---------|
+| POST | `/api/v1/admin/creators/{slug}/extract-profile` | Queue personality profile extraction task |

 ## Other Endpoints (2)

@ -178,9 +168,11 @@ utput` | Delete all pipeline output |

 JWT-based authentication added in M019. See [[Authentication]] for full details.

- **Public endpoints** (search, browse, techniques) require no auth
+- **Public endpoints** (search, browse, techniques, chat) require no auth
 - **Auth endpoints** (`/auth/register`, `/auth/login`) are open; `/auth/me` requires Bearer JWT
- **Consent endpoints** require Bearer JWT with ownership verification (creator must own the video, or be admin)
+- **Follow endpoints** require Bearer JWT
+- **Creator endpoints** (`/creator/*`) require Bearer JWT with creator ownership verification
+- **Consent endpoints** require Bearer JWT with ownership verification
 - **Admin endpoints** (`/admin/*`) are accessible to anyone with network access (auth planned for future milestone)

 ---
--- a/Chat-Engine.md
+++ b/Chat-Engine.md
@ -1,29 +1,33 @@
 # Chat Engine

-Streaming question-answering interface backed by LightRAG retrieval and LLM completion. Added in M021/S03.
+Streaming question-answering interface backed by LightRAG retrieval and LLM completion. Added in M021/S03, expanded with multi-turn memory in M022/S04 and chat widget in M022/S03.

 ## Architecture

 ```
-User types question in ChatPage
+User types question in ChatPage or ChatWidget
        │
        ▼
-POST /api/v1/chat  { query: "...", creator?: "..." }
+POST /api/v1/chat  { query, creator?, conversation_id? }
        │
        ▼
-ChatService.stream(query, creator?)
+ChatService.stream(query, creator?, conversation_id?)
        │
-        ├─ 1. Retrieve: SearchService.search(query, creator)
+        ├─ 1. Load history: Redis chrysopedia:chat:{conversation_id}
+        │
+        ├─ 2. Retrieve: SearchService.search(query, creator)
        │     └─ Uses 4-tier cascade if creator provided (see [[Search-Retrieval]])
        │
-        ├─ 2. Prompt: Assemble numbered context block into encyclopedic system prompt
+        ├─ 3. Prompt: System prompt + history + numbered context + user message
        │     └─ Sources formatted as [1] Title — Summary for citation mapping
        │
-        ├─ 3. Stream: openai.AsyncOpenAI with stream=True
+        ├─ 4. Stream: openai.AsyncOpenAI with stream=True
        │     └─ Tokens streamed as SSE events in real-time
        │
+        ├─ 5. Save history: Append user message + assistant response to Redis
+        │
        ▼
-SSE response → ChatPage renders tokens + citation links
+SSE response → ChatPage/ChatWidget renders tokens + citation links
 ```

 ## SSE Protocol
@ -34,10 +38,28 @@ The chat endpoint returns a `text/event-stream` response with four event types i
 |-------|---------|------|
 | `sources` | `[{title, slug, creator_name, summary}]` | First — citation metadata for link rendering |
 | `token` | `string` (text chunk) | Repeated — streamed LLM completion tokens |
-| `done` | `{cascade_tier: "creator"\|"domain"\|"global"\|"none"\|""}` | Once — signals completion, includes which retrieval tier answered |
+| `done` | `{cascade_tier, conversation_id}` | Once — signals completion, includes retrieval tier and conversation ID |
 | `error` | `{message: string}` | On failure — emitted if LLM errors mid-stream |

-The `cascade_tier` in the `done` event reveals which tier of the retrieval cascade served the context (see [[Search-Retrieval]]).
+The `cascade_tier` in the `done` event reveals which tier of the retrieval cascade served the context. The `conversation_id` enables the frontend to thread follow-up messages.
+
+## Multi-Turn Conversation Memory (M022/S04)
+
+### Redis Storage
+
+- **Key pattern:** `chrysopedia:chat:{conversation_id}` 
+- **Format:** Single JSON string containing a list of `{role, content}` message dicts
+- **TTL:** 1 hour, refreshed on each interaction
+- **Cap:** 10 turn pairs (20 messages) — oldest pairs trimmed when exceeded
+
+### Conversation Flow
+
+1. Client sends `conversation_id` in POST body (or omits for new conversation)
+2. Server auto-generates UUID when `conversation_id` is omitted
+3. History loaded from Redis and injected between system prompt and user message
+4. Assistant response accumulated during streaming
+5. User message + assistant response appended to history in Redis
+6. `conversation_id` returned in SSE `done` event for threading

 ## Citation Format

@ -45,7 +67,7 @@ The LLM is instructed to reference sources using numbered citations `[N]` in its

 - `[1]` → links to `/techniques/:slug` for the corresponding source
 - Multiple citations supported: `[1][3]` or `[1,3]`
- Citation regex: `/\[(\d+)\]/g` parsed locally in ChatPage
+- Citation regex: `/\[(\d+)\]/g` parsed locally in both ChatPage and ChatWidget

 ## API Endpoint

@ -55,6 +77,7 @@ The LLM is instructed to reference sources using numbered citations `[N]` in its
 |-------|------|----------|------------|
 | `query` | string | Yes | 1–1000 characters |
 | `creator` | string | No | Creator UUID or slug for scoped retrieval |
+| `conversation_id` | string | No | UUID for multi-turn threading. Auto-generated if omitted. |

 **Response:** `text/event-stream` (SSE)

@ -65,9 +88,11 @@ The LLM is instructed to reference sources using numbered citations `[N]` in its

 Located in `backend/chat_service.py`. The retrieve-prompt-stream pipeline:

-1. **Retrieve** — Calls `SearchService.search()` with the query and optional creator parameter. Gets back ranked technique page results with the cascade_tier.
-2. **Prompt** — Builds a numbered context block from search results. System prompt instructs the LLM to act as a music production encyclopedia, cite sources with `[N]` notation, and stay grounded in the provided context.
-3. **Stream** — Opens an async streaming completion via `openai.AsyncOpenAI` (configured to point at DGX Sparks Qwen or local Ollama). Yields SSE events as tokens arrive.
+1. **Load History** — `_load_history()` reads from Redis key `chrysopedia:chat:{conversation_id}`. Returns empty list if key absent.
+2. **Retrieve** — Calls `SearchService.search()` with the query and optional creator parameter. Gets back ranked technique page results with the cascade_tier.
+3. **Prompt** — Builds message array: system prompt → conversation history → numbered context block → user message. System prompt instructs the LLM to act as a music production encyclopedia, cite sources with `[N]` notation, and stay grounded in the provided context.
+4. **Stream** — Opens an async streaming completion via `openai.AsyncOpenAI`. Yields SSE events as tokens arrive.
+5. **Save History** — `_save_history()` appends the user message and accumulated assistant response to Redis. Trims to 10 turn pairs if exceeded. Refreshes TTL to 1 hour.

 Error handling: If the LLM fails mid-stream (after some tokens have been sent), an `error` event is emitted so the frontend can display a failure message rather than leaving the response hanging.

@ -77,38 +102,63 @@ Route: `/chat` (lazy-loaded, code-split)

 ### Components

- **Text input + submit button** — Query entry with Enter-to-submit
+- **Multi-message conversation UI** — Messages array with conversation bubble layout
+- **Conversation threading** — `conversationId` state, "New conversation" button to reset
 - **Streaming message display** — Accumulates tokens with blinking cursor animation during streaming
- **Citation markers** — `[N]` parsed to superscript links targeting `/techniques/:slug`
- **Source list** — Numbered sources with creator attribution displayed below the response
- **States:** Loading (streaming indicator), error (message display), empty (placeholder prompt)
+- **Typing indicator** — Three-dot animation while streaming
+- **Citation markers** — `[N]` parsed to superscript links targeting `/techniques/:slug` (per-message)
+- **Source list** — Numbered sources with creator attribution displayed below each response
+- **Auto-scroll** — Scrolls to bottom as new tokens arrive

 ### SSE Client

 Located in `frontend/src/api/chat.ts`. Uses `fetch()` + `ReadableStream` with typed callbacks:

 ```typescript
-streamChat(query, creator?, {
+streamChat(query, {
  onSources: (sources) => void,
  onToken: (token) => void,
-  onDone: (data) => void,
+  onDone: (data: ChatDoneMeta) => void,
  onError: (error) => void,
-})
+}, creatorName?, conversationId?)
 ```

+`ChatDoneMeta` type includes `cascade_tier` and `conversation_id` fields.
+
+## Frontend: ChatWidget (M022/S03)
+
+Floating chat bubble on creator detail pages. Fixed-position bottom-right.
+
+### Behavior
+
+- **Bubble** → click → **slide-up panel** with conversation UI
+- Creator-scoped: passes `creatorName` to `streamChat()` for retrieval cascade
+- **Suggested questions** generated client-side from technique titles and categories
+- **Typing indicator** — three-dot animation during streaming
+- **Citation links** — parsed from response, linked to technique pages
+- **Responsive** — full-width below 640px, 400px panel on desktop
+- **Conversation threading** — `conversationId` generated via `crypto.randomUUID()` on first send, threaded through `streamChat()`, updated from done event
+- **Reset on close** — messages and conversationId cleared when panel closes
+
 ## Key Files

- `backend/chat_service.py` — ChatService retrieve-prompt-stream pipeline
- `backend/routers/chat.py` — POST /api/v1/chat endpoint
- `frontend/src/api/chat.ts` — SSE client utility
- `frontend/src/pages/ChatPage.tsx` — Chat UI page component
- `frontend/src/pages/ChatPage.module.css` — Chat page styles
+- `backend/chat_service.py` — ChatService with history load/save, retrieve-prompt-stream pipeline
+- `backend/routers/chat.py` — POST /api/v1/chat endpoint with conversation_id support
+- `backend/tests/test_chat.py` — 13 tests (6 streaming + 7 conversation memory)
+- `frontend/src/api/chat.ts` — SSE client with conversationId param and ChatDoneMeta type
+- `frontend/src/pages/ChatPage.tsx` — Multi-message conversation UI
+- `frontend/src/pages/ChatPage.module.css` — Conversation bubble layout styles
+- `frontend/src/components/ChatWidget.tsx` — Floating chat widget component
+- `frontend/src/components/ChatWidget.module.css` — Widget styles (38 custom property refs)

 ## Design Decisions

- **Standalone ASGI test client pattern** — Tests use mocked DB to avoid PostgreSQL dependency, enabling fast CI runs
- **Patch `openai.AsyncOpenAI` constructor** rather than instance attribute for reliable test mocking
- **Local citation regex** in ChatPage rather than importing from utils — link targets differ from technique page citations
+- **Redis JSON string** — Conversation history stored as single JSON value (atomic read/write) rather than Redis list type
+- **Auto-generate conversation_id** — Server creates UUID when client omits it, ensuring consistent `done` event shape
+- **Widget resets on close** — Clean slate UX; no persistence across open/close cycles
+- **Client-side suggested questions** — Generated from technique titles/categories without API call
+- **Citation parsing duplicated** — ChatPage and ChatWidget each parse citations independently (extracted utility deferred)
+- **Standalone ASGI test client** — Tests use mocked DB to avoid PostgreSQL dependency

 ---

--- a/Data-Model.md
+++ b/Data-Model.md
@ -1,6 +1,6 @@
 # Data Model

-18 SQLAlchemy models in `backend/models.py`.
+20 SQLAlchemy models in `backend/models.py`.

 ## Entity Relationship Overview

@ -17,6 +17,8 @@ Creator (1) ──→ (N) SourceVideo (1) ──→ (N) TranscriptSegment
    │           ├──→ (N) RelatedTechniqueLink
    │           └──→ (M:N) SourceVideo  (via TechniquePageVideo)
    │
+    ├──→ (N) CreatorFollow ←── User
+    │
    └──→ (0..1) User ──→ (N) InviteCode (created_by)
 ```

@ -34,6 +36,7 @@ Creator (1) ──→ (N) SourceVideo (1) ──→ (N) TranscriptSegment
 | bio | Text | Admin-editable |
 | social_links | JSONB | Platform → URL mapping |
 | featured | Boolean | For homepage spotlight |
+| personality_profile | JSONB | LLM-extracted personality data (M022/S06). See [[Personality-Profiles]] |

 ### SourceVideo

@ -101,6 +104,33 @@ Creator (1) ──→ (N) SourceVideo (1) ──→ (N) TranscriptSegment
 | content_snapshot | JSONB | Full page state at version time |
 | pipeline_metadata | JSONB | Prompt SHA-256 hashes, model config |

+### HighlightCandidate
+
+| Field | Type | Notes |
+|-------|------|-------|
+| id | UUID PK | |
+| key_moment_id | FK → KeyMoment | Unique constraint |
+| source_video_id | FK → SourceVideo | Indexed |
+| score | Float | Composite score 0.0–1.0 |
+| score_breakdown | JSONB | Per-dimension scores (10 fields, see [[Highlights]]) |
+| duration_secs | Float | Cached from KeyMoment |
+| status | Enum(HighlightStatus) | candidate / approved / rejected |
+| trim_start | Float | Nullable — trim offset in seconds (M022/S01) |
+| trim_end | Float | Nullable — trim offset in seconds (M022/S01) |
+| created_at | Timestamp | |
+| updated_at | Timestamp | |
+
+### CreatorFollow (M022/S02)
+
+| Field | Type | Notes |
+|-------|------|-------|
+| id | UUID PK | |
+| user_id | FK → User | Part of unique constraint |
+| creator_id | FK → Creator | Part of unique constraint |
+| created_at | Timestamp | |
+
+Unique constraint on `(user_id, creator_id)`. Idempotent follow via `INSERT ON CONFLICT DO NOTHING`.
+
 ## Authentication & User Models

 ### User
@ -192,20 +222,17 @@ Append-only versioned record of per-field consent changes.
 | **HighlightStatus** | candidate, approved, rejected (M021/S04) |
 | **ChapterStatus** | draft, approved, hidden (M021/S06) |

+## Migrations
+
+| Migration | Description |
+|-----------|-------------|
+| 019 | Add highlight_candidates table |
+| 021 | Add trim_start/trim_end to highlight_candidates (M022/S01) |
+| 022 | Add creator_follows table (M022/S02) |
+| 023 | Add personality_profile JSONB to creators (M022/S06) |
+
 ## Schema Notes

- **No Alembic migrations** — schema changes currently require manual DDL
- **body_sections_format** discriminator enables v1/v2 format coexistence (D024)
- **topic_category casing** is inconsistent across records (e.g., "Sound design" vs "Sound Design") — known data quality issue
- **Stage 4 classification data** (per-moment topic_tags) stored in Redis with 24h TTL, not DB columns
- **Timestamp convention:** `datetime.now(timezone.utc).replace(tzinfo=None)` — asyncpg rejects timezone-aware datetimes for TIMESTAMP WITHOUT TIME ZONE columns (D002)
- **User passwords** are stored as bcrypt hashes via `bcrypt.hashpw()`
- **Consent audit** uses version numbers assigned in application code (`max(version) + 1` per video_consent_id)
-
---
-
-*See also: [[Architecture]], [[API-Surface]], [[Pipeline]], [[Authentication]]*
- changes currently require manual DDL
 - **body_sections_format** discriminator enables v1/v2 format coexistence (D024)
 - **topic_category casing** is inconsistent across records (e.g., "Sound design" vs "Sound Design") — known data quality issue
 - **Stage 4 classification data** (per-moment topic_tags) stored in Redis with 24h TTL, not DB columns
--- a/Decisions.md
+++ b/Decisions.md
@ -31,12 +31,26 @@ Architectural and pattern decisions made during Chrysopedia development. Append-
 | D034 | Documentation strategy | Forgejo wiki, KB slice at end of every milestone | Incremental docs stay current; final pass in M025 |
 | D035 | File/object storage | MinIO (S3-compatible) self-hosted | Docker-native, signed URLs, fits existing infrastructure |

-## M021 Decisions
+## Authentication & Infrastructure Decisions

 | # | When | Decision | Choice | Rationale |
 |---|------|----------|--------|-----------|
-| D039 | M021/S01 | LightRAG scoring strategy | Position-based (1.0 → 0.5 descending), sequential Qdrant fallback | `/query/data` has no numeric relevance score; retrieval order is the only signal |
-| D040 | M021/S02 | Creator-scoped retrieval strategy | 4-tier cascade: creator → domain → global → none | Progressive widening ensures results while preferring creator context; `ll_keywords` for soft scoping; 3x oversampling for post-filter survival |
+| D036 | M019/S02 | JWT auth configuration | HS256 with existing app_secret_key, 24h expiry, OAuth2PasswordBearer | Reuses existing secret; integrates with FastAPI dependency injection |
+| D037 | — | Search impressions query | Exact case-insensitive title match via EXISTS subquery against SearchLog | MVP approach; expandable to ILIKE later |
+| D038 | — | Primary git remote | git.xpltd.co (Forgejo) instead of github.com | Consolidating on self-hosted Forgejo; wiki already there |
+
+## Search & Retrieval Decisions
+
+| # | When | Decision | Choice | Rationale |
+|---|------|----------|--------|-----------|
+| D039 | M021/S01 | LightRAG scoring strategy | Position-based (1.0 → 0.5 descending), sequential Qdrant fallback | `/query/data` has no numeric relevance score |
+| D040 | M021/S02 | Creator-scoped retrieval | 4-tier cascade: creator → domain → global → none | Progressive widening; `ll_keywords` for soft scoping; 3x oversampling for post-filter survival |
+
+## M022 Decisions
+
+| # | When | Decision | Choice | Rationale |
+|---|------|----------|--------|-----------|
+| D041 | M022/S05 | Highlight scorer weight distribution | 10 dimensions: original 7 reduced proportionally, 3 audio proxy dims get 0.22 total weight. Neutral fallback (0.5) when word_timings unavailable. | Audio proxy signals from word-level timing data; neutral fallback preserves backward compatibility |

 ## UI/UX Decisions

--- a/Frontend.md
+++ b/Frontend.md
@ -10,10 +10,13 @@ React 18 + TypeScript + Vite SPA. No UI library, no state management library, no
 | `/search` | SearchResults | Public | Sort, highlights, partial matches |
 | `/techniques/:slug` | TechniquePage | Public | v2 body sections, ToC sidebar, citations |
 | `/creators` | CreatorsBrowse | Public | Random default sort, genre filters |
-| `/creators/:slug` | CreatorDetail | Public | Avatar, stats, technique list |
+| `/creators/:slug` | CreatorDetail | Public | Avatar, stats, technique list, follow button, personality profile, chat widget |
 | `/topics` | TopicsBrowse | Public | 7 category cards, expandable sub-topics |
 | `/topics/:category/:subtopic` | SubTopicPage | Public | Creator-grouped techniques |
+| `/chat` | ChatPage | Public | Multi-message conversation UI with threading |
 | `/about` | About | Public | Static project info |
+| `/creator/highlights` | HighlightQueue | Creator JWT | Highlight review queue with filter tabs (M022/S01) |
+| `/creator/tiers` | CreatorTiers | Creator JWT | Free/Pro/Premium tier cards with Coming Soon modals (M022/S02) |
 | `/admin/reports` | AdminReports | Admin* | Content reports |
 | `/admin/pipeline` | AdminPipeline | Admin* | Pipeline management |
 | `/admin/techniques` | AdminTechniquePages | Admin* | Technique page admin |
@ -38,6 +41,51 @@ React 18 + TypeScript + Vite SPA. No UI library, no state management library, no
 | CopyLinkButton | Clipboard copy with tooltip |
 | SocialIcons | Social media link icons (9 platforms) |
 | ReportIssueModal | Content report submission |
+| ChatWidget | Floating chat bubble on creator pages — SSE streaming, citations, suggested questions (M022/S03) |
+| PersonalityProfile | Collapsible creator personality display — 3 sub-cards (Teaching Style, Vocabulary, Style) (M022/S06) |
+
+## Feature Pages (M022)
+
+### HighlightQueue (M022/S01)
+
+Creator-scoped highlight review page at `/creator/highlights`.
+
+- **Filter tabs** — All / Shorts / Approved / Rejected
+- **Candidate cards** — Title, duration, composite score, status badge
+- **Score breakdown bars** — 10-dimension visual bars (fetched lazily on expand)
+- **Action buttons** — Approve / Discard with ownership verification
+- **Inline trim panel** — Validated trim_start / trim_end inputs
+- **Files:** `HighlightQueue.tsx`, `HighlightQueue.module.css`, `highlights.ts` (API)
+
+### CreatorTiers (M022/S02)
+
+Tier configuration at `/creator/tiers`.
+
+- **Three cards** — Free (active), Pro, Premium
+- **Coming Soon modals** — Styled placeholders per D033 (Stripe deferred to Phase 3)
+- **Files:** `CreatorTiers.tsx`, `CreatorTiers.module.css`
+
+### ChatWidget (M022/S03)
+
+Floating chat on creator detail pages.
+
+- **Fixed-position bubble** (bottom-right) → slide-up conversation panel
+- **Creator-scoped** — passes creatorName to streamChat() for retrieval cascade
+- **Suggested questions** — client-side from technique titles/categories
+- **Streaming SSE** — tokens, citations, typing indicator
+- **Responsive** — full-width below 640px, 400px panel on desktop
+- **Conversation threading** — conversationId via crypto.randomUUID(), resets on close
+- **Files:** `ChatWidget.tsx`, `ChatWidget.module.css`
+
+### PersonalityProfile (M022/S06)
+
+Collapsible personality display on creator detail pages.
+
+- **Grid-template-rows animation** — 0fr → 1fr for smooth expand/collapse
+- **Three sub-cards:** Teaching Style, Vocabulary, Style
+- **Pill badges** for phrases/terms, checkmark/cross for boolean markers
+- **Gracefully hidden** when profile is null
+- **Files:** `PersonalityProfile.tsx`, styles in `App.css`

 ## Hooks

@ -45,19 +93,22 @@ React 18 + TypeScript + Vite SPA. No UI library, no state management library, no
 |------|---------|
 | useCountUp | Animated counter for homepage stats |
 | useSortPreference | Persists sort preference in localStorage |
-| useDocumentTitle | Sets `<title>` per page (all 10 pages instrumented) |
+| useDocumentTitle | Sets `<title>` per page (all pages instrumented) |

 ## State Management

-Local component state only (`useState`/`useEffect`). No Redux, Zustand, Context providers, or external state management library.
+Local component state only (`useState`/`useEffect`). No Redux, Zustand, Context providers, or external state management library. AuthProvider context for JWT auth state.

 ## API Client

-Two API modules:
+API modules:
 - `public-client.ts` (~600 lines) — typed `request<T>` helper for REST endpoints
- `chat.ts` — SSE streaming client for POST /api/v1/chat using `fetch()` + `ReadableStream`
- `videos.ts` — chapter management functions (fetchChapters, fetchCreatorChapters, updateChapter, reorderChapters, approveChapters)
- `auth.ts` — authentication + impersonation functions including `fetchImpersonationLog()`
+- `chat.ts` — SSE streaming client for POST /api/v1/chat using `fetch()` + `ReadableStream`, `ChatDoneMeta` type
+- `videos.ts` — chapter management functions
+- `auth.ts` — authentication + impersonation functions
+- `highlights.ts` — creator highlight review functions (M022/S01)
+- `follows.ts` — follow/unfollow/status/list functions (M022/S02)
+- `creators.ts` — creator detail with personality_profile and follower_count types (M022/S02, S06)

 Relative `/api/v1` base URL (nginx proxies to API container).

@ -66,26 +117,13 @@ Relative `/api/v1` base URL (nginx proxies to API container).
 | Property | Value |
 |----------|-------|
 | File | `frontend/src/App.css` |
-| Lines | 5,820 |
-| Unique classes | ~589 |
+| Lines | ~6,500+ |
 | Naming | BEM (`block__element--modifier`) |
 | Theme | Dark-only (no light mode) |
 | Custom properties | 77 in `:root` (D017) |
 | Accent color | Cyan `#22d3ee` |
 | Font stack | System fonts |
-| Preprocessor | None |
-| CSS Modules | None |
-
-### Custom Property Categories (77 total)
-
- **Surface colors:** page background, card backgrounds, nav, footer, input
- **Text colors:** primary, secondary, muted, inverse, link, heading
- **Accent colors:** primary cyan, hover/active, focus rings
- **Badge colors:** Per-category pairs (bg + text) for 7 topic categories
- **Status colors:** Success/warning/error/info
- **Border colors:** Default, hover, focus, divider
- **Shadow colors:** Elevation, glow effects
- **Overlay colors:** Modal/dropdown overlays
+| CSS Modules | Used for new components (HighlightQueue, CreatorTiers, ChatWidget, ChatPage) |

 ### Breakpoints

@ -93,7 +131,7 @@ Relative `/api/v1` base URL (nginx proxies to API container).
 |-----------|-------|
 | 480px | Narrow mobile — compact cards |
 | 600px | Wider mobile — grid adjustments |
-| 640px | Small tablet — content width |
+| 640px | Small tablet / chat widget responsive break |
 | 768px | Desktop ↔ mobile transition — sidebar collapse |

 ### Layout Patterns
@ -114,10 +152,3 @@ Relative `/api/v1` base URL (nginx proxies to API container).
 ---

 *See also: [[Architecture]], [[API-Surface]], [[Development-Guide]]*
-*See also: [[Architecture]], [[API-Surface]], [[Development-Guide]]*
-ocalhost:8001`
- **Production:** nginx serves static `dist/` bundle, proxies `/api` to FastAPI container
-
---
-
-*See also: [[Architecture]], [[API-Surface]], [[Development-Guide]]*
--- a/Highlights.md
+++ b/Highlights.md
@ -1,10 +1,10 @@
 # Highlight Detection

-Heuristic scoring engine that ranks KeyMoment records into highlight candidates using 7 weighted dimensions. Added in M021/S04.
+Heuristic scoring engine that ranks KeyMoment records into highlight candidates using 10 weighted dimensions. Originally added in M021/S04 with 7 dimensions, expanded to 10 in M022/S05.

 ## Overview

-Highlight detection scores every KeyMoment in a video to identify the most "highlightable" segments — moments that would work well as standalone clips or featured content. The scoring is a pure function (no ML model, no external API) based on 7 dimensions derived from existing KeyMoment metadata.
+Highlight detection scores every KeyMoment in a video to identify the most "highlightable" segments — moments that would work well as standalone clips or featured content. The scoring is a pure function (no ML model, no external API) based on 10 dimensions derived from existing KeyMoment metadata and word-level transcript timing data.

 ## Scoring Dimensions

@ -12,13 +12,22 @@ Total weight sums to 1.0. Each dimension produces a 0.0–1.0 score.

 | Dimension | Weight | What It Measures |
 |-----------|--------|-----------------|
-| `duration_fitness` | 0.25 | Piecewise linear curve peaking at 30–60 seconds (ideal clip length) |
-| `content_type` | 0.20 | Content type favorability: tutorial > tip > walkthrough > exploration |
-| `specificity_density` | 0.20 | Regex-based counting of specific units, ratios, and named parameters in summary text |
-| `plugin_richness` | 0.10 | Number of plugins/VSTs referenced (more = more actionable) |
-| `transcript_energy` | 0.10 | Teaching-phrase detection in transcript text (e.g., "the trick is", "key thing") |
-| `source_quality` | 0.10 | Source quality rating: high=1.0, medium=0.6, low=0.3 |
-| `video_type` | 0.05 | Video type favorability mapping |
+| `duration_fitness` | 0.20 | Piecewise linear curve peaking at 30–60 seconds (ideal clip length) |
+| `content_type` | 0.16 | Content type favorability: tutorial > tip > walkthrough > exploration |
+| `specificity_density` | 0.16 | Regex-based counting of specific units, ratios, and named parameters in summary text |
+| `plugin_richness` | 0.08 | Number of plugins/VSTs referenced (more = more actionable) |
+| `transcript_energy` | 0.08 | Teaching-phrase detection in transcript text (e.g., "the trick is", "key thing") |
+| `source_quality` | 0.08 | Source quality rating: high=1.0, medium=0.6, low=0.3 |
+| `video_type` | 0.02 | Video type favorability mapping |
+| `speech_rate_variance` | ~0.07 | Coefficient of variation of words-per-second in 5s sliding windows |
+| `pause_density` | ~0.08 | Count and weight of inter-word gaps (>0.5s short, >1.0s long) |
+| `speaking_pace` | ~0.07 | Bell-curve fitness around optimal 3–5 WPS teaching pace |
+
+### Audio Proxy Dimensions (M022/S05)
+
+The three new dimensions (speech_rate_variance, pause_density, speaking_pace) are derived from **word-level transcript timing data** — not raw audio. This provides meaningful speech-pattern signals without requiring librosa or audio processing dependencies.
+
+**Neutral fallback:** When `word_timings` are unavailable (no word-level data in transcript), all three audio proxy dimensions default to **0.5** (neutral score). This preserves backward compatibility — existing scoring paths are unaffected. The weights of the original 7 dimensions were reduced proportionally to accommodate the new 0.22 total weight for audio dimensions (D041).

 ### Duration Fitness Curve

@ -36,12 +45,14 @@ Uses piecewise linear (not Gaussian) for predictability:
 | Field | Type | Notes |
 |-------|------|-------|
 | id | UUID PK | |
-| key_moment_id | FK → KeyMoment | Unique constraint (`uq_highlight_candidate_moment`) |
+| key_moment_id | FK → KeyMoment | Unique constraint (`highlight_candidates_key_moment_id_key`) |
 | source_video_id | FK → SourceVideo | Indexed |
 | score | Float | Composite score 0.0–1.0 |
-| score_breakdown | JSONB | Per-dimension scores (7 fields) |
+| score_breakdown | JSONB | Per-dimension scores (10 fields) |
 | duration_secs | Float | Cached from KeyMoment for display |
 | status | Enum(HighlightStatus) | candidate / approved / rejected |
+| trim_start | Float | Nullable — trim start offset in seconds (M022/S01) |
+| trim_end | Float | Nullable — trim end offset in seconds (M022/S01) |
 | created_at | Timestamp | |
 | updated_at | Timestamp | |

@ -59,12 +70,15 @@ Uses piecewise linear (not Gaussian) for predictability:
 - `score` DESC — rank ordering
 - `status` — filter by review state

-### Migration
+### Migrations

-Alembic migration `019_add_highlight_candidates.py` creates the table with all indexes and the named unique constraint.
+- `019_add_highlight_candidates.py` — Creates table with indexes and unique constraint
+- `021_add_highlight_trim_columns.py` — Adds trim_start and trim_end columns (M022/S01)

 ## API Endpoints

+### Admin Endpoints
+
 All under `/api/v1/admin/highlights/`. Admin access.

 | Method | Path | Purpose |
@ -74,37 +88,31 @@ All under `/api/v1/admin/highlights/`. Admin access.
 | GET | `/admin/highlights/candidates` | Paginated candidate list, sorted by score DESC |
 | GET | `/admin/highlights/candidates/{id}` | Single candidate with full `score_breakdown` |

-### Detect Response
+### Creator Endpoints (M022/S01)
+
+Creator-scoped highlight review. Requires JWT auth with creator ownership verification.
+
+| Method | Path | Purpose |
+|--------|------|---------|
+| GET | `/api/v1/creator/highlights` | List highlights for authenticated creator (status/shorts_only filters, score DESC) |
+| GET | `/api/v1/creator/highlights/{id}` | Detail with score_breakdown and key_moment |
+| PATCH | `/api/v1/creator/highlights/{id}/status` | Update status (approve/reject) with ownership verification |
+| PATCH | `/api/v1/creator/highlights/{id}/trim` | Update trim_start/trim_end (validation: non-negative, start < end) |
+
+### Score Breakdown Response

 ```json
 {
-  "video_id": "uuid",
-  "candidates_created": 12,
-  "candidates_updated": 0
-}
-```
-
-### Candidate Response
-
-```json
-{
-  "id": "uuid",
-  "key_moment_id": "uuid",
-  "source_video_id": "uuid",
-  "score": 0.847,
-  "score_breakdown": {
-    "duration_fitness": 0.95,
-    "content_type_weight": 0.80,
-    "specificity_density": 0.72,
-    "plugin_richness": 0.60,
-    "transcript_energy": 0.85,
-    "source_quality_weight": 1.00,
-    "video_type_weight": 0.50
-  },
-  "duration_secs": 45.0,
-  "status": "candidate",
-  "created_at": "...",
-  "updated_at": "..."
+  "duration_fitness": 0.95,
+  "content_type_weight": 0.80,
+  "specificity_density": 0.72,
+  "plugin_richness": 0.60,
+  "transcript_energy": 0.85,
+  "source_quality_weight": 1.00,
+  "video_type_weight": 0.50,
+  "speech_rate_variance_score": 0.057,
+  "pause_density_score": 0.0,
+  "speaking_pace_score": 1.0
 }
 ```

@ -114,30 +122,55 @@ All under `/api/v1/admin/highlights/`. Admin access.

 - **Binding:** `bind=True, max_retries=3`
 - **Session:** Uses `_get_sync_session` (sync SQLAlchemy, per D004)
- **Flow:** Load KeyMoments for video → score each via `score_moment()` → bulk upsert via `INSERT ON CONFLICT` on named constraint `uq_highlight_candidate_moment`
+- **Flow:** Load KeyMoments for video → load transcript JSON → extract word timings per moment → score each via `score_moment()` → bulk upsert via `INSERT ON CONFLICT` on constraint `highlight_candidates_key_moment_id_key`
+- **Transcript handling:** Loads transcript JSON once per video via `SourceVideo.transcript_path`. Accepts both `{segments: [...]}` and bare `[...]` JSON formats.
+- **Fallback:** If transcript is missing or malformed, `word_timings=None` and scorer uses neutral values for audio dimensions
 - **Events:** Emits `pipeline_events` rows for start/complete/error with candidate count in payload

 ### Scoring Function

-`score_moment()` in `backend/pipeline/highlight_scorer.py` is a **pure function** — no DB access, no side effects. Takes a KeyMoment-like dict, returns `(score, breakdown_dict)`. This separation enables easy unit testing (28 tests, runs in 0.03s).
+`score_moment()` in `backend/pipeline/highlight_scorer.py` is a **pure function** — no DB access, no side effects. Takes a KeyMoment-like dict and optional `word_timings` list, returns `(score, breakdown_dict)`. This separation enables easy unit testing (62 tests, runs in 0.09s).
+
+### Word Timing Extraction
+
+`extract_word_timings()` filters word-level timing dicts from transcript JSON by time window. Used by the Celery task to extract timings per KeyMoment before scoring.
+
+## Frontend: Highlight Review Queue (M022/S01)
+
+Route: `/creator/highlights` (JWT-protected, lazy-loaded)
+
+### Components
+
+- **Filter tabs** — All / Shorts / Approved / Rejected
+- **Candidate cards** — Key moment title, duration, composite score, status badge
+- **Score breakdown bars** — Visual bars for each of the 10 scoring dimensions (fetched lazily on expand)
+- **Action buttons** — Approve / Discard with ownership verification
+- **Inline trim panel** — Validated number inputs for trim_start / trim_end
+- **Sidebar link** — Star icon in creator dashboard SidebarNav

 ## Design Decisions

 - **Pure function scoring** — No DB or side effects in `score_moment()`, enabling fast unit tests
 - **Piecewise linear duration** — Predictable behavior vs. Gaussian bell curve
- **Named unique constraint** — `uq_highlight_candidate_moment` enables idempotent upserts via `ON CONFLICT`
- **Lazy import** — `score_moment` imported inside Celery task to avoid circular imports at module load
+- **Neutral fallback at 0.5** — New audio dimensions don't penalize moments without word-level timing data (D041)
+- **Proportional weight reduction** — Original 7 dimensions reduced proportionally to make room for 0.22 audio weight
+- **Lazy detail fetch** — Score breakdown fetched on expand, not on list load (avoids N+1)
+- **Creator-scoped router** — Ownership verification pattern reusable for future creator endpoints

 ## Key Files

- `backend/pipeline/highlight_scorer.py` — Pure scoring function with 7 dimensions
- `backend/pipeline/highlight_schemas.py` — Pydantic schemas (HighlightScoreBreakdown, HighlightCandidateResponse, HighlightBatchResult)
+- `backend/pipeline/highlight_scorer.py` — Pure scoring function with 10 dimensions, word timing extraction
+- `backend/pipeline/highlight_schemas.py` — Pydantic schemas (HighlightScoreBreakdown with 10 fields)
 - `backend/pipeline/stages.py` — `stage_highlight_detection` Celery task
 - `backend/routers/highlights.py` — 4 admin API endpoints
- `backend/models.py` — HighlightCandidate model, HighlightStatus enum
- `alembic/versions/019_add_highlight_candidates.py` — Migration
- `backend/pipeline/test_highlight_scorer.py` — 28 unit tests
+- `backend/routers/creator_highlights.py` — 4 creator-scoped endpoints (M022/S01)
+- `backend/models.py` — HighlightCandidate model with trim columns
+- `alembic/versions/019_add_highlight_candidates.py` — Initial migration
+- `alembic/versions/021_add_highlight_trim_columns.py` — Trim columns migration
+- `backend/pipeline/test_highlight_scorer.py` — 62 unit tests
+- `frontend/src/pages/HighlightQueue.tsx` — Creator review queue page
+- `frontend/src/api/highlights.ts` — Highlight API client

 ---

-*See also: [[Pipeline]], [[Data-Model]], [[API-Surface]]*
+*See also: [[Pipeline]], [[Data-Model]], [[API-Surface]], [[Frontend]]*
--- a/Home.md
+++ b/Home.md
@ -8,12 +8,38 @@ Producers can search for specific techniques and find timestamped key moments, s

 - [[Architecture]] — System architecture, Docker services, network topology
 - [[Data-Model]] — SQLAlchemy models, relationships, enums
- [[API-Surface]] â€” All 41 API endpoints grouped by domain
+- [[API-Surface]] — All 60+ API endpoints grouped by domain
 - [[Frontend]] — Routes, components, hooks, CSS architecture
 - [[Pipeline]] — 6-stage LLM extraction pipeline, prompt system
+- [[Chat-Engine]] — Streaming Q&A with multi-turn memory
+- [[Highlights]] — 10-dimension highlight detection and review queue
+- [[Personality-Profiles]] — LLM-extracted creator teaching personality
+- [[Search-Retrieval]] — LightRAG + Qdrant retrieval cascade
 - [[Deployment]] — Docker Compose setup, rebuild commands
 - [[Development-Guide]] — Local dev setup, common gotchas
- [[Decisions]] â€” Architectural decisions register (D001â€“D035)
+- [[Decisions]] — Architectural decisions register (D001–D041)
+
+## Features
+
+### Core
+- **Technique Pages** — LLM-synthesized study guides with v2 body sections, signal chains, citations
+- **Search** — LightRAG primary + Qdrant fallback with 4-tier creator-scoped cascade
+- **Pipeline** — 6-stage LLM extraction (transcripts → key moments → classification → synthesis → embedding)
+- **Player** — Audio player with chapter markers
+
+### Creator Tools
+- **Follow System** — User-to-creator follows with follower counts (M022)
+- **Personality Profiles** — LLM-extracted teaching style, vocabulary, and tone analysis (M022)
+- **Creator Tiers** — Free/Pro/Premium tier configuration with Coming Soon placeholders (M022)
+- **Highlight Detection v2** — 10-dimension scoring with audio proxy signals, creator review queue (M022)
+- **Chat Widget** — Floating creator-scoped chat bubble with streaming SSE and citations (M022)
+- **Multi-Turn Chat Memory** — Redis-backed conversation history with conversation_id threading (M022)
+- **Creator Dashboard** — Video management, chapter editing, consent controls
+
+### Platform
+- **Authentication** — JWT with invite codes, admin/creator roles
+- **Consent System** — Per-video granular consent with audit trail
+- **Impersonation** — Admin-to-creator context switching with audit log

 ## Current Scale

@ -31,16 +57,11 @@ Producers can search for specific techniques and find timestamped key moments, s
 | Database | PostgreSQL 16 |
 | Cache/Broker | Redis 7 |
 | Vector Store | Qdrant 1.13.2 |
+| RAG Framework | LightRAG + NetworkX |
 | Embeddings | Ollama (nomic-embed-text) |
 | LLM | OpenAI-compatible API (DGX Sparks Qwen primary, local Ollama fallback) |
 | Deployment | Docker Compose on ub01, nginx reverse proxy on nuc01 |

 ---

-*Last updated: 2026-04-04 â€” M021 chat engine, retrieval cascade, highlights, audio mode, chapters, impersonation write mode*
-inx reverse proxy on nuc01 |
-
---
-
-*Last updated: 2026-04-03 â€” M018/S02 initial bootstrap*
-” M018/S02 initial bootstrap*
+*Last updated: 2026-04-04 — M022 follow system, personality profiles, highlight v2, chat widget, multi-turn memory, creator tiers*
--- a/Personality-Profiles.md
+++ b/Personality-Profiles.md
@ -0,0 +1,132 @@
+# Personality Profiles
+
+LLM-powered extraction of creator teaching personality from transcript analysis. Added in M022/S06.
+
+## Overview
+
+Personality profiles capture each creator's distinctive teaching style — vocabulary patterns, tonal qualities, and stylistic markers — by analyzing their transcript corpus with a structured LLM extraction pipeline. Profiles are stored as JSONB on the Creator model and displayed on creator detail pages.
+
+## Extraction Pipeline
+
+### Transcript Sampling
+
+Three-tier sampling strategy based on total transcript size:
+
+| Tier | Condition | Strategy |
+|------|-----------|----------|
+| Small | < 20K chars | Use all transcript text |
+| Medium | 20K–60K chars | 300-character excerpts per key moment |
+| Large | > 60K chars | Topic-diverse random sampling via Redis classification data |
+
+Large-tier sampling uses deterministic seeding and pulls from across topic categories to ensure the profile reflects the creator's full range, not just their most common topic.
+
+### LLM Extraction
+
+The prompt template at `prompts/personality_extraction.txt` instructs the LLM to analyze transcript excerpts and produce structured JSON. The LLM response is parsed and validated with a Pydantic model before storage.
+
+**Celery task:** `extract_personality_profile` in `backend/pipeline/stages.py`
+- Joins KeyMoment → SourceVideo to load transcripts
+- Samples transcripts per the tier strategy
+- Calls LLM with `response_model=object` for JSON mode
+- Validates response with `PersonalityProfile` Pydantic model
+- Stores result as JSONB on Creator row
+- Emits pipeline_events for observability
+
+### Error Handling
+
+- Zero-transcript creators: early return, no profile
+- Invalid JSON from LLM: retry
+- Pydantic validation failure: retry
+- Pipeline events track start/complete/error
+
+## PersonalityProfile Schema
+
+Stored as `Creator.personality_profile` JSONB column. Nested structure:
+
+### VocabularyProfile
+
+| Field | Type | Description |
+|-------|------|-------------|
+| signature_phrases | list[str] | Characteristic phrases the creator uses repeatedly |
+| jargon_level | str | How technical their language is (e.g., "high", "moderate") |
+| filler_words | list[str] | Common filler words/phrases |
+| distinctive_terms | list[str] | Unique terminology or coined phrases |
+
+### ToneProfile
+
+| Field | Type | Description |
+|-------|------|-------------|
+| formality | str | Formal to casual spectrum |
+| energy | str | Energy level descriptor |
+| humor | str | Humor style/frequency |
+| teaching_style | str | Overall teaching approach |
+
+### StyleMarkersProfile
+
+| Field | Type | Description |
+|-------|------|-------------|
+| explanation_approach | str | How they explain concepts |
+| analogies | bool | Whether they use analogies frequently |
+| sound_words | bool | Whether they use onomatopoeia / sound words |
+| audience_engagement | str | How they address / engage viewers |
+
+### Metadata
+
+Each profile includes extraction metadata:
+
+| Field | Description |
+|-------|-------------|
+| extracted_at | ISO timestamp of extraction |
+| transcript_sample_size | Number of characters sampled |
+| model_used | LLM model identifier |
+
+## API
+
+### Admin Trigger
+
+| Method | Path | Purpose |
+|--------|------|---------|
+| POST | `/api/v1/admin/creators/{slug}/extract-profile` | Queue personality extraction task |
+
+Returns immediately — extraction runs asynchronously via Celery. Check `pipeline_events` for status.
+
+### Creator Detail
+
+`GET /api/v1/creators/{slug}` includes `personality_profile` field (null if not yet extracted).
+
+## Frontend Component
+
+`PersonalityProfile.tsx` — collapsible section on creator detail pages.
+
+### Layout
+
+- **Collapsible header** with chevron toggle (CSS `grid-template-rows: 0fr/1fr` animation)
+- **Three sub-cards:**
+  - **Teaching Style** — formality, energy, humor, teaching_style, explanation_approach, audience_engagement
+  - **Vocabulary** — jargon_level summary, signature_phrases pills, filler_words pills, distinctive_terms pills
+  - **Style** — analogies (checkmark/cross), sound_words (checkmark/cross), summary paragraph
+- **Metadata footer** — extraction date, sample size
+
+Handles null profiles gracefully (renders nothing).
+
+## Key Files
+
+- `prompts/personality_extraction.txt` — LLM prompt template
+- `backend/pipeline/stages.py` — `extract_personality_profile` Celery task, `_sample_creator_transcripts()` helper
+- `backend/schemas.py` — PersonalityProfile, VocabularyProfile, ToneProfile, StyleMarkersProfile Pydantic models
+- `backend/models.py` — Creator.personality_profile JSONB column
+- `backend/routers/admin.py` — POST /admin/creators/{slug}/extract-profile endpoint
+- `backend/routers/creators.py` — Passthrough in GET /creators/{slug}
+- `alembic/versions/023_add_personality_profile.py` — Migration
+- `frontend/src/components/PersonalityProfile.tsx` — Collapsible profile component
+- `frontend/src/api/creators.ts` — TypeScript interfaces for profile sub-objects
+
+## Design Decisions
+
+- **3-tier transcript sampling** — Balances coverage vs. token cost. Topic-diverse random sampling for large creators prevents profile skew toward dominant topic.
+- **Admin trigger endpoint** — On-demand extraction rather than automatic on ingest. Profiles are expensive (large LLM call) and only needed once per creator.
+- **JSONB storage** — Profile schema may evolve; JSONB avoids migration for every field change.
+
+---
+
+*See also: [[Data-Model]], [[API-Surface]], [[Frontend]], [[Pipeline]]*
--- a/_Sidebar.md
+++ b/_Sidebar.md
@ -14,6 +14,7 @@
 - [[Chat-Engine]]
 - [[Search-Retrieval]]
 - [[Highlights]]
+- [[Personality-Profiles]]

 **Reference**
 - [[API-Surface]]