docs: M022 wiki update — follow system, personality profiles, highlight v2, chat widget, multi-turn memory, creator tiers

New page: Personality-Profiles (extraction pipeline, JSONB schema, frontend component)
Updated: Home (M022 features), Highlights (10 dimensions, creator endpoints, trim),
  Chat-Engine (multi-turn memory, ChatWidget), Data-Model (CreatorFollow, personality_profile, trim columns),
  API-Surface (follow, creator highlight, personality endpoints), Frontend (new components/pages),
  Decisions (D036-D041), _Sidebar (Personality-Profiles link)
jlightner 2026-04-04 03:44:23 -05:00
parent eec99b6c7d
commit 49ab6d029a
9 changed files with 494 additions and 193 deletions

@ -1,6 +1,6 @@
# API Surface
61 API endpoints grouped by domain. All served by FastAPI under `/api/v1/`.
## Public Endpoints (10)
@ -26,11 +26,19 @@ title, slug, topic_category, topic_tags, summary, body_sections, body_sections_f
| Method | Path | Response Shape | Notes |
|--------|------|---------------|-------|
| GET | `/api/v1/creators?sort=&genre=` | `{items, total, offset, limit}` | sort: random\|alpha\|views |
| GET | `/api/v1/creators/{slug}` | 16-field object | Includes genre_breakdown, techniques, social_links, follower_count, personality_profile |
| GET | `/api/v1/topics` | `[{name, description, sub_topics}]` | ⚠️ Bare list (not paginated) |
| GET | `/api/v1/topics/{cat}/{sub}` | `{items, total, offset, limit}` | Subtopic techniques |
| GET | `/api/v1/topics/{cat}` | `{items, total, offset, limit}` | Category techniques |
## Chat Endpoint (1)
| Method | Path | Auth | Purpose |
|--------|------|------|---------|
| POST | `/api/v1/chat` | None | Streaming Q&A — SSE response with sources, tokens, done event. See [[Chat-Engine]] |
**Request fields:** `query` (required, 1-1000 chars), `creator` (optional slug/UUID), `conversation_id` (optional UUID for multi-turn threading)
## Auth Endpoints (4)
All under prefix `/api/v1/auth/`. JWT-protected except registration and login.
@ -42,6 +50,28 @@ All under prefix `/api/v1/auth/`. JWT-protected except registration and login.
| GET | `/auth/me` | Bearer JWT | Current user profile. Returns UserResponse. |
| PUT | `/auth/me` | Bearer JWT | Update display_name and/or password (requires current_password for password changes). Returns UserResponse. |
## Follow Endpoints (4) — M022/S02
All require Bearer JWT.
| Method | Path | Purpose |
|--------|------|---------|
| POST | `/api/v1/follows/{creator_id}` | Follow a creator (idempotent via INSERT ON CONFLICT DO NOTHING) |
| DELETE | `/api/v1/follows/{creator_id}` | Unfollow a creator |
| GET | `/api/v1/follows/{creator_id}/status` | Check if current user follows this creator |
| GET | `/api/v1/follows/me` | List creators the current user follows |
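Here is a minimal sketch of how a client might build requests for the four follow endpoints. The helper `buildFollowRequest` and the `FollowAction` and `RequestSpec` types are hypothetical; the real helpers live in `frontend/src/api/follows.ts`, and their signatures are not documented here. Only the paths, methods, and the Bearer JWT requirement come from the table above.

```typescript
// Hypothetical request builder for the follow endpoints (not the real follows.ts API).
type FollowAction = "follow" | "unfollow" | "status" | "list";

interface RequestSpec {
  method: "GET" | "POST" | "DELETE";
  url: string;
  headers: Record<string, string>;
}

const API_BASE = "/api/v1"; // relative base URL; nginx proxies it to the API container

function buildFollowRequest(
  action: FollowAction,
  token: string,
  creatorId?: string,
): RequestSpec {
  // Every follow endpoint requires a Bearer JWT.
  const headers = { Authorization: `Bearer ${token}` };
  if (action === "list") {
    return { method: "GET", url: `${API_BASE}/follows/me`, headers };
  }
  if (!creatorId) {
    throw new Error(`creatorId is required for "${action}"`);
  }
  const base = `${API_BASE}/follows/${encodeURIComponent(creatorId)}`;
  switch (action) {
    case "follow":
      return { method: "POST", url: base, headers }; // safe to repeat: idempotent server-side
    case "unfollow":
      return { method: "DELETE", url: base, headers };
    case "status":
      return { method: "GET", url: `${base}/status`, headers };
    default:
      throw new Error(`unknown action: ${action}`);
  }
}
```

A caller would pass the returned `method`, `url`, and `headers` to `fetch()`. Because POST is idempotent on the server, a double-clicked Follow button cannot create a duplicate row.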
## Creator Highlight Endpoints (4) — M022/S01
Creator-scoped highlight review. Requires Bearer JWT with creator ownership.
| Method | Path | Purpose |
|--------|------|---------|
| GET | `/api/v1/creator/highlights` | List highlights for authenticated creator (status/shorts_only filters) |
| GET | `/api/v1/creator/highlights/{id}` | Detail with score_breakdown and key_moment |
| PATCH | `/api/v1/creator/highlights/{id}/status` | Update status (approve/reject) |
| PATCH | `/api/v1/creator/highlights/{id}/trim` | Update trim_start/trim_end |
## Consent Endpoints (5)
All under prefix `/api/v1/consent/`. All require Bearer JWT.
@ -54,16 +84,6 @@ All under prefix `/api/v1/consent/`. All require Bearer JWT.
| GET | `/consent/videos/{video_id}/history` | Creator (owner) or Admin | Versioned audit trail of consent changes for a video. |
| GET | `/consent/admin/summary` | Admin only | Aggregate consent flag counts across all videos. |
### Consent Fields
Three boolean consent flags per video, each independently toggleable:
| Field | Default | Meaning |
|-------|---------|---------|
| `kb_inclusion` | false | Allow indexing into knowledge base |
| `training_usage` | false | Allow use for model training |
| `public_display` | true | Allow public display on site |
## Report Endpoints (3)
| Method | Path | Purpose |
@ -72,7 +92,9 @@ Three boolean consent flags per video, each independently toggleable:
| GET | `/api/v1/admin/reports` | List all reports |
| PATCH | `/api/v1/admin/reports/{id}` | Update report status |
## Admin Endpoints
### Pipeline Admin (20+)
All under prefix `/api/v1/admin/pipeline/`.
@ -100,52 +122,20 @@ All under prefix `/api/v1/admin/pipeline/`.
| POST | `/admin/pipeline/creator-profile/{creator_id}` | Update creator profile |
| POST | `/admin/pipeline/avatar-fetch/{creator_id}` | Fetch creator avatar |
### Highlight Admin (4)
| Method | Path | Purpose |
|--------|------|---------|
| POST | `/admin/highlights/detect/{video_id}` | Score all KeyMoments for a video |
| POST | `/admin/highlights/detect-all` | Score all videos |
| GET | `/admin/highlights/candidates` | Paginated candidate list |
| GET | `/admin/highlights/candidates/{id}` | Single candidate with score_breakdown |
### Personality Extraction (1) — M022/S06
| Method | Path | Purpose |
|--------|------|---------|
| POST | `/api/v1/admin/creators/{slug}/extract-profile` | Queue personality profile extraction task |
## Other Endpoints (2)
| Method | Path | Notes |
|--------|------|-------|
| POST | `/api/v1/ingest` | Transcript upload |
| GET | `/api/v1/videos` | ⚠️ Bare list (not paginated) |
## Response Conventions
**Standard paginated response:**
```json
{
  "items": [...],
  "total": 83,
  "offset": 0,
  "limit": 20
}
```
**Known inconsistencies:**
- `GET /topics` returns bare list instead of paginated dict
- `GET /videos` returns bare list instead of paginated dict
- Search uses `items` key (not `results`)
- `/techniques/random` returns JSON `{slug}` (not HTTP redirect)
**New endpoints should follow the `{items, total, offset, limit}` paginated pattern.**
## Authentication
@ -178,9 +168,11 @@ utput` | Delete all pipeline output |
JWT-based authentication added in M019. See [[Authentication]] for full details.
- **Public endpoints** (search, browse, techniques, chat) require no auth
- **Auth endpoints** (`/auth/register`, `/auth/login`) are open; `/auth/me` requires Bearer JWT
- **Follow endpoints** require Bearer JWT
- **Creator endpoints** (`/creator/*`) require Bearer JWT with creator ownership verification
- **Consent endpoints** require Bearer JWT with ownership verification (creator must own the video, or be admin)
- **Admin endpoints** (`/admin/*`) are accessible to anyone with network access (auth planned for future milestone)
---
*See also: [[Architecture]], [[Data-Model]], [[Frontend]], [[Authentication]]*

@ -1,29 +1,33 @@
# Chat Engine
Streaming question-answering interface backed by LightRAG retrieval and LLM completion. Added in M021/S03; the chat widget followed in M022/S03 and multi-turn memory in M022/S04.
## Architecture
```
User types question in ChatPage or ChatWidget
  ▼
POST /api/v1/chat { query, creator?, conversation_id? }
  ▼
ChatService.stream(query, creator?, conversation_id?)
  ├─ 1. Load history: Redis chrysopedia:chat:{conversation_id}
  ├─ 2. Retrieve: SearchService.search(query, creator)
  │     └─ Uses 4-tier cascade if creator provided (see [[Search-Retrieval]])
  ├─ 3. Prompt: System prompt + history + numbered context + user message
  │     └─ Sources formatted as [1] Title — Summary for citation mapping
  ├─ 4. Stream: openai.AsyncOpenAI with stream=True
  │     └─ Tokens streamed as SSE events in real-time
  └─ 5. Save history: Append user message + assistant response to Redis
  ▼
SSE response → ChatPage/ChatWidget renders tokens + citation links
```
## SSE Protocol
@ -34,10 +38,28 @@ The chat endpoint returns a `text/event-stream` response with four event types i
|-------|---------|------|
| `sources` | `[{title, slug, creator_name, summary}]` | First — citation metadata for link rendering |
| `token` | `string` (text chunk) | Repeated — streamed LLM completion tokens |
| `done` | `{cascade_tier: "creator"\|"domain"\|"global"\|"none"\|"", conversation_id: string}` | Once — signals completion, includes retrieval tier and conversation ID |
| `error` | `{message: string}` | On failure — emitted if LLM errors mid-stream |
The `cascade_tier` in the `done` event reveals which tier of the retrieval cascade served the context (see [[Search-Retrieval]]). The `conversation_id` lets the frontend thread follow-up messages.
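The four event types map naturally onto a discriminated union, and a reducer can fold that union into per-message state. This is a sketch only. The names `ChatEvent`, `ChatMessageState`, and `applyChatEvent` are hypothetical; the event names and payload fields come from the table above.

```typescript
interface Source {
  title: string;
  slug: string;
  creator_name: string;
  summary: string;
}

type ChatEvent =
  | { type: "sources"; data: Source[] }
  | { type: "token"; data: string }
  | { type: "done"; data: { cascade_tier: string; conversation_id: string } }
  | { type: "error"; data: { message: string } };

interface ChatMessageState {
  text: string;
  sources: Source[];
  streaming: boolean;
  cascadeTier?: string;
  conversationId?: string;
  error?: string;
}

const initialState: ChatMessageState = { text: "", sources: [], streaming: true };

function applyChatEvent(state: ChatMessageState, event: ChatEvent): ChatMessageState {
  switch (event.type) {
    case "sources":
      return { ...state, sources: event.data };
    case "token":
      return { ...state, text: state.text + event.data };
    case "done":
      return {
        ...state,
        streaming: false,
        cascadeTier: event.data.cascade_tier,
        conversationId: event.data.conversation_id, // send this with the next message
      };
    case "error":
      // Keep the tokens already received and mark the message as failed.
      return { ...state, streaming: false, error: event.data.message };
    default: {
      const unreachable: never = event;
      return unreachable;
    }
  }
}
```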
## Multi-Turn Conversation Memory (M022/S04)
### Redis Storage
- **Key pattern:** `chrysopedia:chat:{conversation_id}`
- **Format:** Single JSON string containing a list of `{role, content}` message dicts
- **TTL:** 1 hour, refreshed on each interaction
- **Cap:** 10 turn pairs (20 messages) — oldest pairs trimmed when exceeded
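Together, these storage rules (one JSON string per key, capped at 10 turn pairs) amount to a small append-and-trim step. The backend does this in Python inside `chat_service.py`. The TypeScript below only illustrates the same logic, and the names `chatKey` and `appendTurn` are hypothetical.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

const MAX_TURN_PAIRS = 10; // 10 user/assistant pairs = 20 messages

function chatKey(conversationId: string): string {
  return `chrysopedia:chat:${conversationId}`;
}

// Appends one turn pair to a serialized history and drops the oldest pairs past the cap.
function appendTurn(
  storedJson: string | null,
  userContent: string,
  assistantContent: string,
  maxPairs: number = MAX_TURN_PAIRS,
): string {
  const history: ChatMessage[] = storedJson ? JSON.parse(storedJson) : [];
  history.push({ role: "user", content: userContent });
  history.push({ role: "assistant", content: assistantContent });
  // History always grows in pairs, so keeping the last 2 * maxPairs messages keeps pairs intact.
  return JSON.stringify(history.slice(-maxPairs * 2));
}
```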
### Conversation Flow
1. Client sends `conversation_id` in POST body (or omits for new conversation)
2. Server auto-generates UUID when `conversation_id` is omitted
3. History loaded from Redis and injected between system prompt and user message
4. Assistant response accumulated during streaming
5. User message + assistant response appended to history in Redis
6. `conversation_id` returned in SSE `done` event for threading
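To illustrate the flow above, the sketch below uses an in-memory stand-in for Redis that mimics a one-hour TTL refreshed on every write. `TtlStore`, `handleTurn`, and the injected `newId` and `answer` callbacks are hypothetical. `answer` stands in for retrieval plus the streamed LLM reply. History capping is left out so the focus stays on ID generation and TTL refresh.

```typescript
type Msg = { role: "user" | "assistant"; content: string };

const TTL_SECONDS = 3600; // 1 hour

// Minimal stand-in for Redis GET / SET with an expiry (times in milliseconds).
class TtlStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();

  get(key: string, now: number): string | null {
    const entry = this.entries.get(key);
    if (!entry || entry.expiresAt <= now) {
      this.entries.delete(key);
      return null;
    }
    return entry.value;
  }

  set(key: string, value: string, ttlSeconds: number, now: number): void {
    this.entries.set(key, { value, expiresAt: now + ttlSeconds * 1000 });
  }
}

function handleTurn(
  store: TtlStore,
  query: string,
  conversationId: string | undefined,
  now: number,
  newId: () => string,
  answer: (history: Msg[], query: string) => string,
): { conversationId: string; reply: string } {
  const id = conversationId ?? newId(); // step 2: auto-generate when omitted
  const key = `chrysopedia:chat:${id}`;
  const raw = store.get(key, now);
  const history: Msg[] = raw ? JSON.parse(raw) : []; // step 3: load history
  const reply = answer(history, query); // step 4: accumulated assistant response
  history.push({ role: "user", content: query }, { role: "assistant", content: reply });
  store.set(key, JSON.stringify(history), TTL_SECONDS, now); // step 5: save and refresh TTL
  return { conversationId: id, reply }; // step 6: id goes back in the done event
}
```

Because the TTL resets on every write, a conversation expires only after an hour of inactivity, not an hour after it started.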
## Citation Format
@ -45,7 +67,7 @@ The LLM is instructed to reference sources using numbered citations `[N]` in its
- `[1]` → links to `/techniques/:slug` for the corresponding source
- Multiple citations supported: `[1][3]` or `[1,3]`
- Citation regex: `/\[(\d+)\]/g` parsed locally in both ChatPage and ChatWidget
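The sketch below splits a response into text and link segments using the documented regex. The `Segment` type and `splitCitations` are hypothetical names, not the ChatPage/ChatWidget code. Note that `/\[(\d+)\]/g` matches one number per bracket. It therefore handles `[1][3]`, but a comma list such as `[1,3]` would need a wider pattern, which this sketch does not attempt.

```typescript
interface CitedSource {
  slug: string;
}

type Segment =
  | { kind: "text"; text: string }
  | { kind: "cite"; n: number; href: string };

function splitCitations(text: string, sources: CitedSource[]): Segment[] {
  const segments: Segment[] = [];
  const re = /\[(\d+)\]/g;
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    const n = Number(m[1]);
    const source = sources[n - 1]; // citation numbers are 1-based
    if (!source) continue; // out-of-range markers stay in the surrounding plain text
    if (m.index > last) segments.push({ kind: "text", text: text.slice(last, m.index) });
    segments.push({ kind: "cite", n, href: `/techniques/${source.slug}` });
    last = re.lastIndex;
  }
  if (last < text.length) segments.push({ kind: "text", text: text.slice(last) });
  return segments;
}
```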
## API Endpoint
@ -55,6 +77,7 @@ The LLM is instructed to reference sources using numbered citations `[N]` in its
|-------|------|----------|------------|
| `query` | string | Yes | 1–1000 characters |
| `creator` | string | No | Creator UUID or slug for scoped retrieval |
| `conversation_id` | string | No | UUID for multi-turn threading. Auto-generated if omitted. |
**Response:** `text/event-stream` (SSE)
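A client-side pre-check that mirrors the request constraints might look like the following. It is a sketch: `validateChatRequest` is hypothetical, the UUID pattern (any 8-4-4-4-12 hex string) is an assumption, and the server remains the authority on validation.

```typescript
interface ChatRequest {
  query: string;
  creator?: string;
  conversation_id?: string;
}

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Returns a list of problems; an empty list means the body looks valid.
function validateChatRequest(body: ChatRequest): string[] {
  const errors: string[] = [];
  // JavaScript counts UTF-16 code units, which may differ from a server-side character count.
  if (body.query.length < 1 || body.query.length > 1000) {
    errors.push("query must be 1-1000 characters");
  }
  if (body.conversation_id !== undefined && !UUID_RE.test(body.conversation_id)) {
    errors.push("conversation_id must be a UUID");
  }
  // creator may be a slug or a UUID, so it is not pattern-checked here.
  return errors;
}
```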
@ -65,9 +88,11 @@ The LLM is instructed to reference sources using numbered citations `[N]` in its
Located in `backend/chat_service.py`. The retrieve-prompt-stream pipeline:
1. **Load History** — `_load_history()` reads from Redis key `chrysopedia:chat:{conversation_id}`. Returns empty list if key absent.
2. **Retrieve** — Calls `SearchService.search()` with the query and optional creator parameter. Gets back ranked technique page results with the cascade_tier.
3. **Prompt** — Builds message array: system prompt → conversation history → numbered context block → user message. System prompt instructs the LLM to act as a music production encyclopedia, cite sources with `[N]` notation, and stay grounded in the provided context.
4. **Stream** — Opens an async streaming completion via `openai.AsyncOpenAI`. Yields SSE events as tokens arrive.
5. **Save History** — `_save_history()` appends the user message and accumulated assistant response to Redis. Trims to 10 turn pairs if exceeded. Refreshes TTL to 1 hour.
Error handling: If the LLM fails mid-stream (after some tokens have been sent), an `error` event is emitted so the frontend can display a failure message rather than leaving the response hanging.
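The ordering in step 3 can be expressed as a pure function. The backend builds this array in Python. Here, `buildMessages` and `PromptSource` are hypothetical names. Placing the numbered context block in its own system message, and the `: ` separator between title and summary, are assumptions; the docs only fix the order (system prompt, history, context, user message).

```typescript
interface PromptMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface PromptSource {
  title: string;
  summary: string;
}

function buildMessages(
  systemPrompt: string,
  history: PromptMessage[],
  sources: PromptSource[],
  query: string,
): PromptMessage[] {
  // Number sources from 1 so the model's [N] citations map back to sources[N - 1].
  const context = sources.map((s, i) => `[${i + 1}] ${s.title}: ${s.summary}`).join("\n");
  return [
    { role: "system", content: systemPrompt },
    ...history,
    { role: "system", content: `Sources:\n${context}` },
    { role: "user", content: query },
  ];
}
```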
@ -77,38 +102,63 @@ Route: `/chat` (lazy-loaded, code-split)
### Components
- **Text input + submit button** — Query entry with Enter-to-submit
- **Multi-message conversation UI** — Messages array with conversation bubble layout
- **Conversation threading**`conversationId` state, "New conversation" button to reset
- **Typing indicator** — Three-dot animation while streaming
- **Citation markers** — `[N]` parsed to superscript links targeting `/techniques/:slug` (per-message)
- **Source list** — Numbered sources with creator attribution displayed below each response
- **Auto-scroll** — Scrolls to bottom as new tokens arrive
### SSE Client
Located in `frontend/src/api/chat.ts`. Uses `fetch()` + `ReadableStream` with typed callbacks:
```typescript
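streamChat(query, {
  onSources: (sources) => { /* citation metadata for link rendering */ },
  onToken: (token) => { /* append the streamed text chunk */ },
  onDone: (data: ChatDoneMeta) => { /* read data.cascade_tier and data.conversation_id */ },
  onError: (error) => { /* show a failure message */ },
}, creatorName, conversationId); // creatorName and conversationId are optional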
```
`ChatDoneMeta` type includes `cascade_tier` and `conversation_id` fields.
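Because `fetch()` delivers the body in arbitrary chunks, the client must buffer text until a complete SSE frame (terminated by a blank line) arrives. The sketch below makes three assumptions: each frame carries an `event:` line plus one or more `data:` lines, payloads stay raw strings for the caller to parse, and chunks are already decoded to text (for example with `TextDecoder`). `SseFrameBuffer` is a hypothetical name, not the code in `chat.ts`.

```typescript
interface SseFrame {
  event: string; // "message" when the frame has no event: line, as in the SSE spec
  data: string; // raw payload; multiple data: lines are joined with "\n"
}

class SseFrameBuffer {
  private buffer = "";

  // Feed one decoded chunk; returns every frame that the chunk completes.
  push(chunk: string): SseFrame[] {
    // Normalize after concatenating so a \r\n split across chunks is still handled.
    this.buffer = (this.buffer + chunk).replace(/\r\n/g, "\n");
    const frames: SseFrame[] = [];
    let end: number;
    while ((end = this.buffer.indexOf("\n\n")) !== -1) {
      const block = this.buffer.slice(0, end);
      this.buffer = this.buffer.slice(end + 2);
      let event = "message";
      const data: string[] = [];
      for (const line of block.split("\n")) {
        if (line.startsWith("event:")) event = line.slice(6).trim();
        else if (line.startsWith("data:")) data.push(line.slice(5).replace(/^ /, ""));
      }
      if (data.length > 0) frames.push({ event, data: data.join("\n") });
    }
    return frames;
  }
}
```

In a read loop, each decoded chunk goes to `push()`, and each returned frame is dispatched by its `event` name to the matching callback (`onSources`, `onToken`, `onDone`, `onError`).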
## Frontend: ChatWidget (M022/S03)
Floating chat bubble on creator detail pages. Fixed-position bottom-right.
### Behavior
- **Bubble** → click → **slide-up panel** with conversation UI
- Creator-scoped: passes `creatorName` to `streamChat()` for retrieval cascade
- **Suggested questions** generated client-side from technique titles and categories
- **Typing indicator** — three-dot animation during streaming
- **Citation links** — parsed from response, linked to technique pages
- **Responsive** — full-width below 640px, 400px panel on desktop
- **Conversation threading**`conversationId` generated via `crypto.randomUUID()` on first send, threaded through `streamChat()`, updated from done event
- **Reset on close** — messages and conversationId cleared when panel closes
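Generating suggestions client-side requires only the technique list already loaded on the creator page. The question templates, the `TechniqueSummary` shape, the `suggestQuestions` name, and the default of three suggestions are assumptions for illustration, not the widget's actual wording.

```typescript
interface TechniqueSummary {
  title: string;
  topic_category: string;
}

function suggestQuestions(
  creatorName: string,
  techniques: TechniqueSummary[],
  max: number = 3,
): string[] {
  const questions: string[] = [];
  const seen = new Set<string>();
  // One question per distinct category first ...
  for (const t of techniques) {
    const category = t.topic_category.toLowerCase(); // category casing varies across records
    if (questions.length < max && !seen.has(category)) {
      seen.add(category);
      questions.push(`How does ${creatorName} approach ${category}?`);
    }
  }
  // ... then fill any remaining slots from technique titles.
  for (const t of techniques) {
    if (questions.length >= max) break;
    questions.push(`Can you explain ${t.title}?`);
  }
  return questions;
}
```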
## Key Files
- `backend/chat_service.py` — ChatService with history load/save, retrieve-prompt-stream pipeline
- `backend/routers/chat.py` — POST /api/v1/chat endpoint with conversation_id support
- `backend/tests/test_chat.py` — 13 tests (6 streaming + 7 conversation memory)
- `frontend/src/api/chat.ts` — SSE client with conversationId param and ChatDoneMeta type
- `frontend/src/pages/ChatPage.tsx` — Multi-message conversation UI
- `frontend/src/pages/ChatPage.module.css` — Conversation bubble layout styles
- `frontend/src/components/ChatWidget.tsx` — Floating chat widget component
- `frontend/src/components/ChatWidget.module.css` — Widget styles (38 custom property refs)
## Design Decisions
- **Patch `openai.AsyncOpenAI` constructor** rather than instance attribute for reliable test mocking
- **Local citation regex** in ChatPage rather than importing from utils — link targets differ from technique page citations
- **Redis JSON string** — Conversation history stored as single JSON value (atomic read/write) rather than Redis list type
- **Auto-generate conversation_id** — Server creates UUID when client omits it, ensuring consistent `done` event shape
- **Widget resets on close** — Clean slate UX; no persistence across open/close cycles
- **Client-side suggested questions** — Generated from technique titles/categories without API call
- **Citation parsing duplicated** — ChatPage and ChatWidget each parse citations independently (extracted utility deferred)
- **Standalone ASGI test client** — Tests use mocked DB to avoid PostgreSQL dependency, enabling fast CI runs
---

@ -1,6 +1,6 @@
# Data Model
20 SQLAlchemy models in `backend/models.py`.
## Entity Relationship Overview
@ -17,6 +17,8 @@ Creator (1) ──→ (N) SourceVideo (1) ──→ (N) TranscriptSegment
│ ├──→ (N) RelatedTechniqueLink
│ └──→ (M:N) SourceVideo (via TechniquePageVideo)
├──→ (N) CreatorFollow ←── User
└──→ (0..1) User ──→ (N) InviteCode (created_by)
```
@ -34,6 +36,7 @@ Creator (1) ──→ (N) SourceVideo (1) ──→ (N) TranscriptSegment
| bio | Text | Admin-editable |
| social_links | JSONB | Platform → URL mapping |
| featured | Boolean | For homepage spotlight |
| personality_profile | JSONB | LLM-extracted personality data (M022/S06). See [[Personality-Profiles]] |
### SourceVideo
@ -101,6 +104,33 @@ Creator (1) ──→ (N) SourceVideo (1) ──→ (N) TranscriptSegment
| content_snapshot | JSONB | Full page state at version time |
| pipeline_metadata | JSONB | Prompt SHA-256 hashes, model config |
### HighlightCandidate
| Field | Type | Notes |
|-------|------|-------|
| id | UUID PK | |
| key_moment_id | FK → KeyMoment | Unique constraint |
| source_video_id | FK → SourceVideo | Indexed |
| score | Float | Composite score 0.0–1.0 |
| score_breakdown | JSONB | Per-dimension scores (10 fields, see [[Highlights]]) |
| duration_secs | Float | Cached from KeyMoment |
| status | Enum(HighlightStatus) | candidate / approved / rejected |
| trim_start | Float | Nullable — trim offset in seconds (M022/S01) |
| trim_end | Float | Nullable — trim offset in seconds (M022/S01) |
| created_at | Timestamp | |
| updated_at | Timestamp | |
### CreatorFollow (M022/S02)
| Field | Type | Notes |
|-------|------|-------|
| id | UUID PK | |
| user_id | FK → User | Part of unique constraint |
| creator_id | FK → Creator | Part of unique constraint |
| created_at | Timestamp | |
Unique constraint on `(user_id, creator_id)`. Idempotent follow via `INSERT ON CONFLICT DO NOTHING`.
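Combining the `(user_id, creator_id)` unique constraint with `INSERT ... ON CONFLICT DO NOTHING` turns a repeated follow into a no-op instead of an error. The in-memory model below illustrates only those semantics. `FollowTable` is hypothetical and stands in for the PostgreSQL table.

```typescript
class FollowTable {
  // The composite key mirrors the unique constraint on (user_id, creator_id).
  // UUIDs contain no ":", so the joined key is unambiguous.
  private rows = new Set<string>();

  private key(userId: string, creatorId: string): string {
    return `${userId}:${creatorId}`;
  }

  // Like INSERT ... ON CONFLICT DO NOTHING: returns true only if a row was inserted.
  follow(userId: string, creatorId: string): boolean {
    const k = this.key(userId, creatorId);
    if (this.rows.has(k)) return false;
    this.rows.add(k);
    return true;
  }

  unfollow(userId: string, creatorId: string): boolean {
    return this.rows.delete(this.key(userId, creatorId));
  }

  followerCount(creatorId: string): number {
    let count = 0;
    this.rows.forEach((k) => {
      if (k.endsWith(`:${creatorId}`)) count++;
    });
    return count;
  }
}
```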
## Authentication & User Models
### User
@ -192,20 +222,17 @@ Append-only versioned record of per-field consent changes.
| **HighlightStatus** | candidate, approved, rejected (M021/S04) |
| **ChapterStatus** | draft, approved, hidden (M021/S06) |
## Migrations
| Migration | Description |
|-----------|-------------|
| 019 | Add highlight_candidates table |
| 021 | Add trim_start/trim_end to highlight_candidates (M022/S01) |
| 022 | Add creator_follows table (M022/S02) |
| 023 | Add personality_profile JSONB to creators (M022/S06) |
## Schema Notes
- **No Alembic migrations** — schema changes currently require manual DDL
- **body_sections_format** discriminator enables v1/v2 format coexistence (D024)
- **topic_category casing** is inconsistent across records (e.g., "Sound design" vs "Sound Design") — known data quality issue
- **Stage 4 classification data** (per-moment topic_tags) stored in Redis with 24h TTL, not DB columns
- **Timestamp convention:** `datetime.now(timezone.utc).replace(tzinfo=None)` — asyncpg rejects timezone-aware datetimes for TIMESTAMP WITHOUT TIME ZONE columns (D002)
- **User passwords** are stored as bcrypt hashes via `bcrypt.hashpw()`
- **Consent audit** uses version numbers assigned in application code (`max(version) + 1` per video_consent_id)
---
*See also: [[Architecture]], [[API-Surface]], [[Pipeline]], [[Authentication]]*

@ -31,12 +31,26 @@ Architectural and pattern decisions made during Chrysopedia development. Append-
| D034 | Documentation strategy | Forgejo wiki, KB slice at end of every milestone | Incremental docs stay current; final pass in M025 |
| D035 | File/object storage | MinIO (S3-compatible) self-hosted | Docker-native, signed URLs, fits existing infrastructure |
## Authentication & Infrastructure Decisions
| # | When | Decision | Choice | Rationale |
|---|------|----------|--------|-----------|
| D036 | M019/S02 | JWT auth configuration | HS256 with existing app_secret_key, 24h expiry, OAuth2PasswordBearer | Reuses existing secret; integrates with FastAPI dependency injection |
| D037 | — | Search impressions query | Exact case-insensitive title match via EXISTS subquery against SearchLog | MVP approach; expandable to ILIKE later |
| D038 | — | Primary git remote | git.xpltd.co (Forgejo) instead of github.com | Consolidating on self-hosted Forgejo; wiki already there |
## Search & Retrieval Decisions
| # | When | Decision | Choice | Rationale |
|---|------|----------|--------|-----------|
| D039 | M021/S01 | LightRAG scoring strategy | Position-based (1.0 → 0.5 descending), sequential Qdrant fallback | `/query/data` has no numeric relevance score; retrieval order is the only signal |
| D040 | M021/S02 | Creator-scoped retrieval | 4-tier cascade: creator → domain → global → none | Progressive widening ensures results while preferring creator context; `ll_keywords` for soft scoping; 3x oversampling for post-filter survival |
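D039's position-based scoring can be sketched as a ramp from 1.0 for the first result down to 0.5 for the last. Linear spacing is an assumption, since the decision records only the 1.0 → 0.5 range, and `positionScores` is a hypothetical name.

```typescript
// Assumed linear interpolation: rank 0 scores 1.0 and the last rank scores 0.5.
function positionScores(resultCount: number): number[] {
  if (resultCount <= 0) return [];
  if (resultCount === 1) return [1.0];
  const step = 0.5 / (resultCount - 1);
  return Array.from({ length: resultCount }, (_, i) => 1.0 - i * step);
}
```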
## M022 Decisions
| # | When | Decision | Choice | Rationale |
|---|------|----------|--------|-----------|
| D041 | M022/S05 | Highlight scorer weight distribution | 10 dimensions: original 7 reduced proportionally, 3 audio proxy dims get 0.22 total weight. Neutral fallback (0.5) when word_timings unavailable. | Audio proxy signals from word-level timing data; neutral fallback preserves backward compatibility |
## UI/UX Decisions

@ -10,10 +10,13 @@ React 18 + TypeScript + Vite SPA. No UI library, no state management library, no
| `/search` | SearchResults | Public | Sort, highlights, partial matches |
| `/techniques/:slug` | TechniquePage | Public | v2 body sections, ToC sidebar, citations |
| `/creators` | CreatorsBrowse | Public | Random default sort, genre filters |
| `/creators/:slug` | CreatorDetail | Public | Avatar, stats, technique list, follow button, personality profile, chat widget |
| `/topics` | TopicsBrowse | Public | 7 category cards, expandable sub-topics |
| `/topics/:category/:subtopic` | SubTopicPage | Public | Creator-grouped techniques |
| `/chat` | ChatPage | Public | Multi-message conversation UI with threading |
| `/about` | About | Public | Static project info |
| `/creator/highlights` | HighlightQueue | Creator JWT | Highlight review queue with filter tabs (M022/S01) |
| `/creator/tiers` | CreatorTiers | Creator JWT | Free/Pro/Premium tier cards with Coming Soon modals (M022/S02) |
| `/admin/reports` | AdminReports | Admin* | Content reports |
| `/admin/pipeline` | AdminPipeline | Admin* | Pipeline management |
| `/admin/techniques` | AdminTechniquePages | Admin* | Technique page admin |
@ -38,6 +41,51 @@ React 18 + TypeScript + Vite SPA. No UI library, no state management library, no
| CopyLinkButton | Clipboard copy with tooltip |
| SocialIcons | Social media link icons (9 platforms) |
| ReportIssueModal | Content report submission |
| ChatWidget | Floating chat bubble on creator pages — SSE streaming, citations, suggested questions (M022/S03) |
| PersonalityProfile | Collapsible creator personality display — 3 sub-cards (Teaching Style, Vocabulary, Style) (M022/S06) |
## Feature Pages (M022)
### HighlightQueue (M022/S01)
Creator-scoped highlight review page at `/creator/highlights`.
- **Filter tabs** — All / Shorts / Approved / Rejected
- **Candidate cards** — Title, duration, composite score, status badge
- **Score breakdown bars** — 10-dimension visual bars (fetched lazily on expand)
- **Action buttons** — Approve / Discard with ownership verification
- **Inline trim panel** — Validated trim_start / trim_end inputs
- **Files:** `HighlightQueue.tsx`, `HighlightQueue.module.css`, `highlights.ts` (API)
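The trim panel's checks are not spelled out here, so the rules below are assumptions: both values optional (matching the nullable columns), non-negative, `trim_start` before `trim_end`, and neither past the clip's `duration_secs`. `validateTrim` is a hypothetical helper, not the code in `HighlightQueue.tsx`.

```typescript
interface TrimInput {
  trim_start: number | null;
  trim_end: number | null;
}

// Assumed rules; returns an error message, or null when the input is acceptable.
function validateTrim(input: TrimInput, durationSecs: number): string | null {
  const { trim_start: start, trim_end: end } = input;
  for (const value of [start, end]) {
    if (value === null) continue; // null means "no trim" on that side
    if (!Number.isFinite(value) || value < 0) return "Trim values must be non-negative numbers";
    if (value > durationSecs) return `Trim values cannot exceed the clip length (${durationSecs}s)`;
  }
  if (start !== null && end !== null && start >= end) {
    return "trim_start must be less than trim_end";
  }
  return null;
}
```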
### CreatorTiers (M022/S02)
Tier configuration at `/creator/tiers`.
- **Three cards** — Free (active), Pro, Premium
- **Coming Soon modals** — Styled placeholders per D033 (Stripe deferred to Phase 3)
- **Files:** `CreatorTiers.tsx`, `CreatorTiers.module.css`
### ChatWidget (M022/S03)
Floating chat on creator detail pages.
- **Fixed-position bubble** (bottom-right) → slide-up conversation panel
- **Creator-scoped** — passes creatorName to streamChat() for retrieval cascade
- **Suggested questions** — client-side from technique titles/categories
- **Streaming SSE** — tokens, citations, typing indicator
- **Responsive** — full-width below 640px, 400px panel on desktop
- **Conversation threading** — conversationId via crypto.randomUUID(), resets on close
- **Files:** `ChatWidget.tsx`, `ChatWidget.module.css`
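The widget's threading lifecycle (ID created on first send, replaced by the server's value from the `done` event, cleared on close) fits a small reducer-style function. `WidgetState`, `WidgetAction`, and `widgetReducer` are hypothetical names. The ID generator is injected, so the sketch does not depend on `crypto.randomUUID()` being available.

```typescript
interface WidgetState {
  open: boolean;
  conversationId: string | null;
  messages: string[];
}

type WidgetAction =
  | { type: "open" }
  | { type: "close" }
  | { type: "send"; text: string }
  | { type: "done"; conversationId: string };

const closedState: WidgetState = { open: false, conversationId: null, messages: [] };

function widgetReducer(state: WidgetState, action: WidgetAction, newId: () => string): WidgetState {
  switch (action.type) {
    case "open":
      return { ...state, open: true };
    case "close":
      return closedState; // reset on close: nothing persists across open/close cycles
    case "send":
      return {
        ...state,
        // Generate an ID on the first send only; later sends reuse it.
        conversationId: state.conversationId ?? newId(),
        messages: [...state.messages, action.text],
      };
    case "done":
      // Adopt the ID the server echoed back in the done event.
      return { ...state, conversationId: action.conversationId };
    default: {
      const unreachable: never = action;
      return unreachable;
    }
  }
}
```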
### PersonalityProfile (M022/S06)
Collapsible personality display on creator detail pages.
- **Grid-template-rows animation** — 0fr → 1fr for smooth expand/collapse
- **Three sub-cards:** Teaching Style, Vocabulary, Style
- **Pill badges** for phrases/terms, checkmark/cross for boolean markers
- **Gracefully hidden** when profile is null
- **Files:** `PersonalityProfile.tsx`, styles in `App.css`
## Hooks
@ -45,19 +93,22 @@ React 18 + TypeScript + Vite SPA. No UI library, no state management library, no
|------|---------|
| useCountUp | Animated counter for homepage stats |
| useSortPreference | Persists sort preference in localStorage |
| useDocumentTitle | Sets `<title>` per page (all pages instrumented) |
## State Management
Local component state only (`useState`/`useEffect`), with one exception: an `AuthProvider` React Context holds JWT auth state. No Redux, Zustand, or other external state management library.
## API Client
API modules:
- `public-client.ts` (~600 lines) — typed `request<T>` helper for REST endpoints
- `chat.ts` — SSE streaming client for POST /api/v1/chat using `fetch()` + `ReadableStream`, `ChatDoneMeta` type
- `videos.ts` — chapter management functions (fetchChapters, fetchCreatorChapters, updateChapter, reorderChapters, approveChapters)
- `auth.ts` — authentication + impersonation functions including `fetchImpersonationLog()`
- `highlights.ts` — creator highlight review functions (M022/S01)
- `follows.ts` — follow/unfollow/status/list functions (M022/S02)
- `creators.ts` — creator detail with personality_profile and follower_count types (M022/S02, S06)
Relative `/api/v1` base URL (nginx proxies to API container).
@ -66,26 +117,13 @@ Relative `/api/v1` base URL (nginx proxies to API container).
| Property | Value |
|----------|-------|
| File | `frontend/src/App.css` |
| Lines | ~6,500 |
| Naming | BEM (`block__element--modifier`) |
| Theme | Dark-only (no light mode) |
| Custom properties | 77 in `:root` (D017) |
| Accent color | Cyan `#22d3ee` |
| Font stack | System fonts |
| Preprocessor | None |
| CSS Modules | Used for new components (HighlightQueue, CreatorTiers, ChatWidget, ChatPage) |
### Custom Property Categories (77 total)
- **Surface colors:** page background, card backgrounds, nav, footer, input
- **Text colors:** primary, secondary, muted, inverse, link, heading
- **Accent colors:** primary cyan, hover/active, focus rings
- **Badge colors:** Per-category pairs (bg + text) for 7 topic categories
- **Status colors:** Success/warning/error/info
- **Border colors:** Default, hover, focus, divider
- **Shadow colors:** Elevation, glow effects
- **Overlay colors:** Modal/dropdown overlays
### Breakpoints
@ -93,7 +131,7 @@ Relative `/api/v1` base URL (nginx proxies to API container).
|-----------|-------|
| 480px | Narrow mobile — compact cards |
| 600px | Wider mobile — grid adjustments |
| 640px | Small tablet / chat widget responsive break |
| 768px | Desktop ↔ mobile transition — sidebar collapse |
### Layout Patterns
@ -114,10 +152,3 @@ Relative `/api/v1` base URL (nginx proxies to API container).
---
*See also: [[Architecture]], [[API-Surface]], [[Development-Guide]]*

@ -1,10 +1,10 @@
# Highlight Detection
Heuristic scoring engine that ranks KeyMoment records into highlight candidates using 10 weighted dimensions. Originally added in M021/S04 with 7 dimensions, expanded to 10 in M022/S05.
## Overview
Highlight detection scores every KeyMoment in a video to identify the most "highlightable" segments — moments that would work well as standalone clips or featured content. The scoring is a pure function (no ML model, no external API) based on 10 dimensions derived from existing KeyMoment metadata and word-level transcript timing data.
## Scoring Dimensions
@ -12,13 +12,22 @@ Total weight sums to 1.0. Each dimension produces a 0.0–1.0 score.
| Dimension | Weight | What It Measures |
|-----------|--------|-----------------|
| `duration_fitness` | 0.20 | Piecewise linear curve peaking at 30–60 seconds (ideal clip length) |
| `content_type` | 0.16 | Content type favorability: tutorial > tip > walkthrough > exploration |
| `specificity_density` | 0.16 | Regex-based counting of specific units, ratios, and named parameters in summary text |
| `plugin_richness` | 0.08 | Number of plugins/VSTs referenced (more = more actionable) |
| `transcript_energy` | 0.08 | Teaching-phrase detection in transcript text (e.g., "the trick is", "key thing") |
| `source_quality` | 0.08 | Source quality rating: high=1.0, medium=0.6, low=0.3 |
| `video_type` | 0.02 | Video type favorability mapping |
| `speech_rate_variance` | ~0.07 | Coefficient of variation of words-per-second in 5s sliding windows |
| `pause_density` | ~0.08 | Count and weight of inter-word gaps (>0.5s short, >1.0s long) |
| `speaking_pace` | ~0.07 | Bell-curve fitness around an optimal teaching pace of 3–5 words per second (WPS) |
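The composite score is a weighted sum over the dimensions in this table. The sketch below assumes each dimension score is already in the 0.0–1.0 range. It treats the approximate (~) audio weights as exact values. `WEIGHTS` and `composite_score` are hypothetical names and are not the contents of `highlight_scorer.py`.

```python
# Weights copied from the table; the three audio weights are listed as
# approximate (~) in the docs, so treating them as exact is an assumption.
WEIGHTS: dict[str, float] = {
    "duration_fitness": 0.20,
    "content_type": 0.16,
    "specificity_density": 0.16,
    "plugin_richness": 0.08,
    "transcript_energy": 0.08,
    "source_quality": 0.08,
    "video_type": 0.02,
    "speech_rate_variance": 0.07,
    "pause_density": 0.08,
    "speaking_pace": 0.07,
}


def composite_score(dimension_scores: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores, each expected in [0.0, 1.0]."""
    missing = WEIGHTS.keys() - dimension_scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = dimension_scores[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} out of range: {value}")
        total += weight * value
    return total


# The original 7 weights sum to 0.78 and the audio weights to 0.22,
# so the total should be 1.0 (within float tolerance).
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
```

Because the weights sum to 1.0, a moment scoring 0.5 on every dimension gets a composite of 0.5. This is why the neutral audio fallback neither rewards nor penalizes a moment.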
### Audio Proxy Dimensions (M022/S05)
The three new dimensions (speech_rate_variance, pause_density, speaking_pace) are derived from **word-level transcript timing data** — not raw audio. This provides meaningful speech-pattern signals without requiring librosa or audio processing dependencies.
**Neutral fallback:** When `word_timings` are unavailable (the transcript has no word-level data), all three audio proxy dimensions default to **0.5** (neutral score). This preserves backward compatibility: existing scoring paths are unaffected. The weights of the original 7 dimensions were reduced roughly proportionally to make room for the 0.22 total weight of the audio dimensions (D041).
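The three audio proxies can be sketched from word timings alone. The sketch assumes each timing is a dict with `start` and `end` in seconds. Only these values come from this page:

- the 0.5 neutral fallback
- the 5 s windows
- the 0.5 s and 1.0 s gap thresholds
- the 3–5 WPS target

These choices are hypothetical and not taken from `highlight_scorer.py`:

- the 1 s window step
- clipping the raw coefficient of variation to 1.0
- the pause weights and per-minute normalization
- the Gaussian falloff (sigma 1.0) outside 3–5 WPS

```python
import math
import statistics
from typing import Optional

NEUTRAL = 0.5  # documented fallback when word-level timings are unavailable
AUDIO_DIMENSIONS = ("speech_rate_variance", "pause_density", "speaking_pace")


def _wps_windows(timings: list[dict], window: float = 5.0, step: float = 1.0) -> list[float]:
    """Words per second in 5 s sliding windows (the 1 s step is an assumption)."""
    start, end = timings[0]["start"], timings[-1]["end"]
    rates = []
    t = start
    while t + window <= end:
        count = sum(1 for w in timings if t <= w["start"] < t + window)
        rates.append(count / window)
        t += step
    return rates


def speech_rate_variance_score(timings: list[dict]) -> float:
    rates = _wps_windows(timings)
    if len(rates) < 2 or statistics.fmean(rates) == 0:
        return NEUTRAL
    cv = statistics.pstdev(rates) / statistics.fmean(rates)  # coefficient of variation
    return min(cv, 1.0)  # hypothetical mapping: raw CV clipped to [0, 1]


def pause_density_score(timings: list[dict], short_gap: float = 0.5, long_gap: float = 1.0) -> float:
    weighted = 0.0
    for prev, nxt in zip(timings, timings[1:]):
        gap = nxt["start"] - prev["end"]
        if gap > long_gap:
            weighted += 2.0  # hypothetical weight for a long pause
        elif gap > short_gap:
            weighted += 1.0  # hypothetical weight for a short pause
    minutes = (timings[-1]["end"] - timings[0]["start"]) / 60.0
    if minutes <= 0:
        return NEUTRAL
    # Hypothetical normalization: 10 weighted pauses per minute saturates at 1.0.
    return min(weighted / minutes / 10.0, 1.0)


def speaking_pace_score(timings: list[dict], low: float = 3.0, high: float = 5.0, sigma: float = 1.0) -> float:
    duration = timings[-1]["end"] - timings[0]["start"]
    if duration <= 0:
        return NEUTRAL
    wps = len(timings) / duration
    if low <= wps <= high:
        return 1.0  # inside the documented 3-5 WPS sweet spot
    distance = low - wps if wps < low else wps - high
    return math.exp(-(distance ** 2) / (2 * sigma ** 2))


def audio_proxy_scores(word_timings: Optional[list[dict]]) -> dict[str, float]:
    """Score all three proxies, or return neutral values when timings are missing."""
    if not word_timings or len(word_timings) < 2:
        return {name: NEUTRAL for name in AUDIO_DIMENSIONS}
    timings = sorted(word_timings, key=lambda w: w["start"])
    return {
        "speech_rate_variance": speech_rate_variance_score(timings),
        "pause_density": pause_density_score(timings),
        "speaking_pace": speaking_pace_score(timings),
    }
```

Take a steady speaker at 4 WPS with no gaps above 0.5 s. This sketch scores them 0.0 for rate variance, 0.0 for pauses and 1.0 for pace. Passing `None` returns 0.5 for all three.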
### Duration Fitness Curve
@ -36,12 +45,14 @@ Uses piecewise linear (not Gaussian) for predictability:
| Field | Type | Notes |
|-------|------|-------|
| id | UUID PK | |
| key_moment_id | FK → KeyMoment | Unique constraint (`highlight_candidates_key_moment_id_key`) |
| source_video_id | FK → SourceVideo | Indexed |
| score | Float | Composite score 0.0–1.0 |
| score_breakdown | JSONB | Per-dimension scores (10 fields) |
| duration_secs | Float | Cached from KeyMoment for display |
| status | Enum(HighlightStatus) | candidate / approved / rejected |
| trim_start | Float | Nullable — trim start offset in seconds (M022/S01) |
| trim_end | Float | Nullable — trim end offset in seconds (M022/S01) |
| created_at | Timestamp | |
| updated_at | Timestamp | |
@ -59,12 +70,15 @@ Uses piecewise linear (not Gaussian) for predictability:
- `score` DESC — rank ordering
- `status` — filter by review state
### Migrations
- `019_add_highlight_candidates.py` — Creates table with indexes and unique constraint
- `021_add_highlight_trim_columns.py` — Adds trim_start and trim_end columns (M022/S01)
## API Endpoints
### Admin Endpoints
All under `/api/v1/admin/highlights/`. Admin access.
| Method | Path | Purpose |
@ -74,37 +88,31 @@ All under `/api/v1/admin/highlights/`. Admin access.
| GET | `/admin/highlights/candidates` | Paginated candidate list, sorted by score DESC |
| GET | `/admin/highlights/candidates/{id}` | Single candidate with full `score_breakdown` |
### Creator Endpoints (M022/S01)
Creator-scoped highlight review. Requires JWT auth with creator ownership verification.
| Method | Path | Purpose |
|--------|------|---------|
| GET | `/api/v1/creator/highlights` | List highlights for authenticated creator (status/shorts_only filters, score DESC) |
| GET | `/api/v1/creator/highlights/{id}` | Detail with score_breakdown and key_moment |
| PATCH | `/api/v1/creator/highlights/{id}/status` | Update status (approve/reject) with ownership verification |
| PATCH | `/api/v1/creator/highlights/{id}/trim` | Update trim_start/trim_end (validation: non-negative, start < end) |
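The trim rules for the `/trim` endpoint are non-negative offsets and start before end. They reduce to a small check. Below is a plain-Python sketch. It assumes a `None` value means that side of the trim is not being set, since both columns are nullable. `validate_trim` and `TrimValidationError` are hypothetical names, and the real router may validate through Pydantic instead.

```python
from typing import Optional


class TrimValidationError(ValueError):
    """Raised when a trim request violates the documented rules."""


def validate_trim(trim_start: Optional[float], trim_end: Optional[float]) -> None:
    # Rule 1 (documented): offsets must be non-negative.
    for name, value in (("trim_start", trim_start), ("trim_end", trim_end)):
        if value is not None and value < 0:
            raise TrimValidationError(f"{name} must be non-negative, got {value}")
    # Rule 2 (documented): start must come strictly before end.
    # Skipping it when either side is None is an assumption.
    if trim_start is not None and trim_end is not None and trim_start >= trim_end:
        raise TrimValidationError(
            f"trim_start ({trim_start}) must be less than trim_end ({trim_end})"
        )
```

A request with `trim_start=1.0, trim_end=5.0` passes. `5.0, 5.0` and `-1.0, 5.0` both raise.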
### Score Breakdown Response
```json
{
  "duration_fitness": 0.95,
  "content_type_weight": 0.80,
  "specificity_density": 0.72,
  "plugin_richness": 0.60,
  "transcript_energy": 0.85,
  "source_quality_weight": 1.00,
  "video_type_weight": 0.50,
  "speech_rate_variance_score": 0.057,
  "pause_density_score": 0.0,
  "speaking_pace_score": 1.0
}
```
@ -114,30 +122,55 @@ All under `/api/v1/admin/highlights/`. Admin access.
- **Binding:** `bind=True, max_retries=3`
- **Session:** Uses `_get_sync_session` (sync SQLAlchemy, per D004)
- **Flow:** Load KeyMoments for video → load transcript JSON → extract word timings per moment → score each via `score_moment()` → bulk upsert via `INSERT ON CONFLICT` on constraint `highlight_candidates_key_moment_id_key`
- **Transcript handling:** Loads transcript JSON once per video via `SourceVideo.transcript_path`. Accepts both `{segments: [...]}` and bare `[...]` JSON formats.
- **Fallback:** If transcript is missing or malformed, `word_timings=None` and scorer uses neutral values for audio dimensions
- **Events:** Emits `pipeline_events` rows for start/complete/error with candidate count in payload
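The idempotent upsert can be demonstrated with the standard-library `sqlite3` module as a stand-in for PostgreSQL. SQLite accepts the column-target form `ON CONFLICT (key_moment_id)`. PostgreSQL accepts that form too, as well as `ON CONFLICT ON CONSTRAINT highlight_candidates_key_moment_id_key`. Two things here are assumptions for illustration:

- the trimmed schema
- refreshing only `score` on conflict, which keeps the review `status`

```python
import sqlite3
import uuid

# Trimmed, hypothetical schema: only the columns needed to show the upsert.
SCHEMA = """
CREATE TABLE highlight_candidates (
    id TEXT PRIMARY KEY,
    key_moment_id TEXT NOT NULL UNIQUE,  -- one candidate per KeyMoment
    source_video_id TEXT NOT NULL,
    score REAL NOT NULL,
    status TEXT NOT NULL DEFAULT 'candidate'
)
"""

# Re-running detection refreshes the score but leaves the review status
# untouched (an assumption about the desired behavior).
UPSERT = """
INSERT INTO highlight_candidates (id, key_moment_id, source_video_id, score)
VALUES (?, ?, ?, ?)
ON CONFLICT (key_moment_id) DO UPDATE SET score = excluded.score
"""


def upsert_candidates(conn: sqlite3.Connection, video_id: str, scored) -> None:
    """scored: iterable of (key_moment_id, score) pairs for one video."""
    rows = [(str(uuid.uuid4()), km_id, video_id, score) for km_id, score in scored]
    conn.executemany(UPSERT, rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
upsert_candidates(conn, "video-1", [("km-1", 0.40), ("km-2", 0.70)])
conn.execute("UPDATE highlight_candidates SET status = 'approved' WHERE key_moment_id = 'km-1'")
upsert_candidates(conn, "video-1", [("km-1", 0.90)])  # re-run: update, no duplicate row
```

After the second run the table still holds two rows. `km-1` has the new score and keeps its `approved` status.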
### Scoring Function
`score_moment()` in `backend/pipeline/highlight_scorer.py` is a **pure function** — no DB access, no side effects. Takes a KeyMoment-like dict and optional `word_timings` list, returns `(score, breakdown_dict)`. This separation enables easy unit testing (62 tests, runs in 0.09s).
### Word Timing Extraction
`extract_word_timings()` filters word-level timing dicts from transcript JSON by time window. Used by the Celery task to extract timings per KeyMoment before scoring.
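Below is a sketch of the transcript normalization and per-moment filtering. It assumes Whisper-style segments, where each segment carries a `words` list of `{word, start, end}` dicts. It also assumes a word belongs to a moment when its `start` falls inside the moment's window. The signature of `extract_word_timings` shown here is a guess. `load_segments` and `safe_word_timings` are hypothetical helpers. Returning `None` lets the scorer fall back to neutral audio scores.

```python
import json
from typing import Any, Optional


def load_segments(raw: str) -> list[dict[str, Any]]:
    """Normalize both transcript shapes: {"segments": [...]} and a bare [...]."""
    data = json.loads(raw)
    if isinstance(data, dict):
        data = data.get("segments")
    if not isinstance(data, list):
        raise ValueError("unrecognized transcript format")
    return data


def extract_word_timings(
    segments: list[dict[str, Any]], start_time: float, end_time: float
) -> Optional[list[dict[str, Any]]]:
    """Return words whose start falls in [start_time, end_time), or None if none do."""
    words = [
        word
        for segment in segments
        for word in segment.get("words", [])
        if start_time <= word.get("start", float("-inf")) < end_time
    ]
    return words or None


def safe_word_timings(raw: str, start_time: float, end_time: float) -> Optional[list[dict[str, Any]]]:
    """Missing or malformed transcripts degrade to None (neutral audio scores)."""
    try:
        return extract_word_timings(load_segments(raw), start_time, end_time)
    except (ValueError, AttributeError, TypeError):  # JSONDecodeError is a ValueError
        return None
```

Loading the JSON once per video and then filtering per moment mirrors the "load once, extract per KeyMoment" flow described above.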
## Frontend: Highlight Review Queue (M022/S01)
Route: `/creator/highlights` (JWT-protected, lazy-loaded)
### Components
- **Filter tabs** — All / Shorts / Approved / Rejected
- **Candidate cards** — Key moment title, duration, composite score, status badge
- **Score breakdown bars** — Visual bars for each of the 10 scoring dimensions (fetched lazily on expand)
- **Action buttons** — Approve / Discard with ownership verification
- **Inline trim panel** — Validated number inputs for trim_start / trim_end
- **Sidebar link** — Star icon in creator dashboard SidebarNav
## Design Decisions
- **Pure function scoring** — No DB or side effects in `score_moment()`, enabling fast unit tests
- **Piecewise linear duration** — Predictable behavior vs. Gaussian bell curve
- **Unique constraint on `key_moment_id`** — `highlight_candidates_key_moment_id_key` enables idempotent upserts via `ON CONFLICT`
- **Lazy import** — `score_moment` imported inside Celery task to avoid circular imports at module load
- **Neutral fallback at 0.5** — New audio dimensions don't penalize moments without word-level timing data (D041)
- **Proportional weight reduction** — Original 7 dimensions reduced proportionally to make room for 0.22 audio weight
- **Lazy detail fetch** — Score breakdown fetched on expand, not on list load (avoids N+1)
- **Creator-scoped router** — Ownership verification pattern reusable for future creator endpoints
## Key Files
- `backend/pipeline/highlight_scorer.py` — Pure scoring function with 10 dimensions, word timing extraction
- `backend/pipeline/highlight_schemas.py` — Pydantic schemas (HighlightScoreBreakdown with 10 fields)
- `backend/pipeline/stages.py` — `stage_highlight_detection` Celery task
- `backend/routers/highlights.py` — 4 admin API endpoints
- `backend/routers/creator_highlights.py` — 4 creator-scoped endpoints (M022/S01)
- `backend/models.py` — HighlightCandidate model with trim columns
- `alembic/versions/019_add_highlight_candidates.py` — Initial migration
- `alembic/versions/021_add_highlight_trim_columns.py` — Trim columns migration
- `backend/pipeline/test_highlight_scorer.py` — 62 unit tests
- `frontend/src/pages/HighlightQueue.tsx` — Creator review queue page
- `frontend/src/api/highlights.ts` — Highlight API client
---
*See also: [[Pipeline]], [[Data-Model]], [[API-Surface]], [[Frontend]]*

Home.md

@ -8,12 +8,38 @@ Producers can search for specific techniques and find timestamped key moments, s
- [[Architecture]] — System architecture, Docker services, network topology
- [[Data-Model]] — SQLAlchemy models, relationships, enums
- [[API-Surface]] — All 60+ API endpoints grouped by domain
- [[Frontend]] — Routes, components, hooks, CSS architecture
- [[Pipeline]] — 6-stage LLM extraction pipeline, prompt system
- [[Chat-Engine]] — Streaming Q&A with multi-turn memory
- [[Highlights]] — 10-dimension highlight detection and review queue
- [[Personality-Profiles]] — LLM-extracted creator teaching personality
- [[Search-Retrieval]] — LightRAG + Qdrant retrieval cascade
- [[Deployment]] — Docker Compose setup, rebuild commands
- [[Development-Guide]] — Local dev setup, common gotchas
- [[Decisions]] — Architectural decisions register (D001–D041)
## Features
### Core
- **Technique Pages** — LLM-synthesized study guides with v2 body sections, signal chains, citations
- **Search** — LightRAG primary + Qdrant fallback with 4-tier creator-scoped cascade
- **Pipeline** — 6-stage LLM extraction (transcripts → key moments → classification → synthesis → embedding)
- **Player** — Audio player with chapter markers
### Creator Tools
- **Follow System** — User-to-creator follows with follower counts (M022)
- **Personality Profiles** — LLM-extracted teaching style, vocabulary, and tone analysis (M022)
- **Creator Tiers** — Free/Pro/Premium tier configuration with Coming Soon placeholders (M022)
- **Highlight Detection v2** — 10-dimension scoring with audio proxy signals, creator review queue (M022)
- **Chat Widget** — Floating creator-scoped chat bubble with streaming SSE and citations (M022)
- **Multi-Turn Chat Memory** — Redis-backed conversation history with conversation_id threading (M022)
- **Creator Dashboard** — Video management, chapter editing, consent controls
### Platform
- **Authentication** — JWT with invite codes, admin/creator roles
- **Consent System** — Per-video granular consent with audit trail
- **Impersonation** — Admin-to-creator context switching with audit log
## Current Scale
@ -31,16 +57,11 @@ Producers can search for specific techniques and find timestamped key moments, s
| Database | PostgreSQL 16 |
| Cache/Broker | Redis 7 |
| Vector Store | Qdrant 1.13.2 |
| RAG Framework | LightRAG + NetworkX |
| Embeddings | Ollama (nomic-embed-text) |
| LLM | OpenAI-compatible API (DGX Sparks Qwen primary, local Ollama fallback) |
| Deployment | Docker Compose on ub01, nginx reverse proxy on nuc01 |
---
*Last updated: 2026-04-04 — M022 follow system, personality profiles, highlight v2, chat widget, multi-turn memory, creator tiers*

Personality-Profiles.md

@ -0,0 +1,132 @@
# Personality Profiles
LLM-powered extraction of creator teaching personality from transcript analysis. Added in M022/S06.
## Overview
Personality profiles capture each creator's distinctive teaching style — vocabulary patterns, tonal qualities, and stylistic markers — by analyzing their transcript corpus with a structured LLM extraction pipeline. Profiles are stored as JSONB on the Creator model and displayed on creator detail pages.
## Extraction Pipeline
### Transcript Sampling
Three-tier sampling strategy based on total transcript size:
| Tier | Condition | Strategy |
|------|-----------|----------|
| Small | < 20K chars | Use all transcript text |
| Medium | 20K–60K chars | 300-character excerpts per key moment |
| Large | > 60K chars | Topic-diverse random sampling via Redis classification data |
Large-tier sampling uses deterministic seeding and pulls from across topic categories to ensure the profile reflects the creator's full range, not just their most common topic.
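The tier selection and the seeded, topic-diverse sampling can be sketched as below. Only the tier boundaries and the 300-character medium excerpts come from the table above. These parts are assumptions:

- the moment dict shape
- passing topics in directly instead of reading the Redis classification data
- `per_topic=3`
- truncating large-tier picks to 300 characters

`choose_tier` and `sample_transcripts` are hypothetical names, not `_sample_creator_transcripts()` itself.

```python
import random

SMALL_LIMIT = 20_000  # < 20K chars: use all transcript text
LARGE_LIMIT = 60_000  # > 60K chars: topic-diverse random sampling
EXCERPT_CHARS = 300


def choose_tier(total_chars: int) -> str:
    if total_chars < SMALL_LIMIT:
        return "small"
    if total_chars <= LARGE_LIMIT:
        return "medium"
    return "large"


def sample_transcripts(moments: list[dict], seed: str, per_topic: int = 3) -> list[str]:
    """moments: [{"text": ..., "topic": ...}] (hypothetical shape)."""
    total = sum(len(m["text"]) for m in moments)
    tier = choose_tier(total)
    if tier == "small":
        return [m["text"] for m in moments]
    if tier == "medium":
        return [m["text"][:EXCERPT_CHARS] for m in moments]
    rng = random.Random(seed)  # deterministic: same seed, same sample
    by_topic: dict[str, list[dict]] = {}
    for m in moments:
        by_topic.setdefault(m["topic"], []).append(m)
    sample = []
    for topic in sorted(by_topic):  # fixed iteration order keeps the seed meaningful
        group = by_topic[topic]
        for m in rng.sample(group, min(per_topic, len(group))):
            sample.append(m["text"][:EXCERPT_CHARS])
    return sample
```

Drawing a fixed number of excerpts from every topic means a dominant topic cannot crowd out the rest. Seeding with a stable value such as the creator ID makes re-extraction reproducible.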
### LLM Extraction
The prompt template at `prompts/personality_extraction.txt` instructs the LLM to analyze transcript excerpts and produce structured JSON. The LLM response is parsed and validated with a Pydantic model before storage.
**Celery task:** `extract_personality_profile` in `backend/pipeline/stages.py`
- Joins KeyMoment → SourceVideo to load transcripts
- Samples transcripts per the tier strategy
- Calls LLM with `response_model=object` for JSON mode
- Validates response with `PersonalityProfile` Pydantic model
- Stores result as JSONB on Creator row
- Emits pipeline_events for observability
### Error Handling
- Zero-transcript creators: early return, no profile
- Invalid JSON from LLM: retry
- Pydantic validation failure: retry
- Pipeline events track start/complete/error
## PersonalityProfile Schema
Stored as `Creator.personality_profile` JSONB column. Nested structure:
### VocabularyProfile
| Field | Type | Description |
|-------|------|-------------|
| signature_phrases | list[str] | Characteristic phrases the creator uses repeatedly |
| jargon_level | str | How technical their language is (e.g., "high", "moderate") |
| filler_words | list[str] | Common filler words/phrases |
| distinctive_terms | list[str] | Unique terminology or coined phrases |
### ToneProfile
| Field | Type | Description |
|-------|------|-------------|
| formality | str | Formal to casual spectrum |
| energy | str | Energy level descriptor |
| humor | str | Humor style/frequency |
| teaching_style | str | Overall teaching approach |
### StyleMarkersProfile
| Field | Type | Description |
|-------|------|-------------|
| explanation_approach | str | How they explain concepts |
| analogies | bool | Whether they use analogies frequently |
| sound_words | bool | Whether they use onomatopoeia / sound words |
| audience_engagement | str | How they address / engage viewers |
### Metadata
Each profile includes extraction metadata:
| Field | Description |
|-------|-------------|
| extracted_at | ISO timestamp of extraction |
| transcript_sample_size | Number of characters sampled |
| model_used | LLM model identifier |
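The sketch below covers the nested schema and the parse-then-validate step. It uses only the standard library, with dataclasses standing in for the Pydantic models. The top-level keys `vocabulary`, `tone` and `style_markers` are assumed names. Extraction metadata is left out of the sketch. Any `ValueError`, including `json.JSONDecodeError`, corresponds to the retry cases listed under Error Handling.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class VocabularyProfile:
    signature_phrases: list[str]
    jargon_level: str
    filler_words: list[str]
    distinctive_terms: list[str]


@dataclass
class ToneProfile:
    formality: str
    energy: str
    humor: str
    teaching_style: str


@dataclass
class StyleMarkersProfile:
    explanation_approach: str
    analogies: bool
    sound_words: bool
    audience_engagement: str


@dataclass
class PersonalityProfile:
    vocabulary: VocabularyProfile
    tone: ToneProfile
    style_markers: StyleMarkersProfile


def _check(owner: str, name: str, expected, value) -> None:
    """Minimal type check for the three field kinds used above."""
    if expected is bool or expected is str:
        ok = isinstance(value, expected)
    else:  # list[str]
        ok = isinstance(value, list) and all(isinstance(v, str) for v in value)
    if not ok:
        raise ValueError(f"{owner}.{name}: unexpected value {value!r}")


def _build(cls, data):
    if not isinstance(data, dict):
        raise ValueError(f"{cls.__name__}: expected an object")
    kwargs = {}
    for f in fields(cls):
        if f.name not in data:
            raise ValueError(f"{cls.__name__}: missing {f.name}")
        _check(cls.__name__, f.name, f.type, data[f.name])
        kwargs[f.name] = data[f.name]
    return cls(**kwargs)


def parse_profile(raw: str) -> PersonalityProfile:
    """Parse LLM output; a ValueError means the task should retry."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return PersonalityProfile(
        vocabulary=_build(VocabularyProfile, data.get("vocabulary")),
        tone=_build(ToneProfile, data.get("tone")),
        style_markers=_build(StyleMarkersProfile, data.get("style_markers")),
    )
```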
## API
### Admin Trigger
| Method | Path | Purpose |
|--------|------|---------|
| POST | `/api/v1/admin/creators/{slug}/extract-profile` | Queue personality extraction task |
Returns immediately — extraction runs asynchronously via Celery. Check `pipeline_events` for status.
### Creator Detail
`GET /api/v1/creators/{slug}` includes `personality_profile` field (null if not yet extracted).
## Frontend Component
`PersonalityProfile.tsx` — collapsible section on creator detail pages.
### Layout
- **Collapsible header** with chevron toggle (CSS `grid-template-rows: 0fr/1fr` animation)
- **Three sub-cards:**
- **Teaching Style** — formality, energy, humor, teaching_style, explanation_approach, audience_engagement
- **Vocabulary** — jargon_level summary, signature_phrases pills, filler_words pills, distinctive_terms pills
- **Style** — analogies (checkmark/cross), sound_words (checkmark/cross), summary paragraph
- **Metadata footer** — extraction date, sample size
Handles null profiles gracefully (renders nothing).
## Key Files
- `prompts/personality_extraction.txt` — LLM prompt template
- `backend/pipeline/stages.py` — `extract_personality_profile` Celery task, `_sample_creator_transcripts()` helper
- `backend/schemas.py` — PersonalityProfile, VocabularyProfile, ToneProfile, StyleMarkersProfile Pydantic models
- `backend/models.py` — Creator.personality_profile JSONB column
- `backend/routers/admin.py` — POST /admin/creators/{slug}/extract-profile endpoint
- `backend/routers/creators.py` — Passthrough in GET /creators/{slug}
- `alembic/versions/023_add_personality_profile.py` — Migration
- `frontend/src/components/PersonalityProfile.tsx` — Collapsible profile component
- `frontend/src/api/creators.ts` — TypeScript interfaces for profile sub-objects
## Design Decisions
- **3-tier transcript sampling** — Balances coverage vs. token cost. Topic-diverse random sampling for large creators prevents profile skew toward dominant topic.
- **Admin trigger endpoint** — On-demand extraction rather than automatic on ingest. Profiles are expensive (large LLM call) and only needed once per creator.
- **JSONB storage** — Profile schema may evolve; JSONB avoids migration for every field change.
---
*See also: [[Data-Model]], [[API-Surface]], [[Frontend]], [[Pipeline]]*

@ -14,6 +14,7 @@
- [[Chat-Engine]]
- [[Search-Retrieval]]
- [[Highlights]]
- [[Personality-Profiles]]
**Reference**
- [[API-Surface]]