docs: M022 wiki update — follow system, personality profiles, highlight v2, chat widget, multi-turn memory, creator tiers
New page: Personality-Profiles (extraction pipeline, JSONB schema, frontend component) Updated: Home (M022 features), Highlights (10 dimensions, creator endpoints, trim), Chat-Engine (multi-turn memory, ChatWidget), Data-Model (CreatorFollow, personality_profile, trim columns), API-Surface (follow, creator highlight, personality endpoints), Frontend (new components/pages), Decisions (D036-D041), _Sidebar (Personality-Profiles link)
parent
eec99b6c7d
commit
49ab6d029a
9 changed files with 494 additions and 193 deletions
108
API-Surface.md
108
API-Surface.md
|
|
@ -1,6 +1,6 @@
|
|||
# API Surface
|
||||
|
||||
50 API endpoints grouped by domain. All served by FastAPI under `/api/v1/`.
|
||||
61 API endpoints grouped by domain. All served by FastAPI under `/api/v1/`.
|
||||
|
||||
## Public Endpoints (10)
|
||||
|
||||
|
|
@ -26,11 +26,19 @@ title, slug, topic_category, topic_tags, summary, body_sections, body_sections_f
|
|||
| Method | Path | Response Shape | Notes |
|
||||
|--------|------|---------------|-------|
|
||||
| GET | `/api/v1/creators?sort=&genre=` | `{items, total, offset, limit}` | sort: random\|alpha\|views |
|
||||
| GET | `/api/v1/creators/{slug}` | 16-field object | Includes genre_breakdown, techniques, social_links |
|
||||
| GET | `/api/v1/creators/{slug}` | 16-field object | Includes genre_breakdown, techniques, social_links, follower_count, personality_profile |
|
||||
| GET | `/api/v1/topics` | `[{name, description, sub_topics}]` | ⚠️ Bare list (not paginated) |
|
||||
| GET | `/api/v1/topics/{cat}/{sub}` | `{items, total, offset, limit}` | Subtopic techniques |
|
||||
| GET | `/api/v1/topics/{cat}` | `{items, total, offset, limit}` | Category techniques |
|
||||
|
||||
## Chat Endpoint (1)
|
||||
|
||||
| Method | Path | Auth | Purpose |
|
||||
|--------|------|------|---------|
|
||||
| POST | `/api/v1/chat` | None | Streaming Q&A — SSE response with sources, tokens, done event. See [[Chat-Engine]] |
|
||||
|
||||
**Request fields:** `query` (required, 1-1000 chars), `creator` (optional slug/UUID), `conversation_id` (optional UUID for multi-turn threading)
|
||||
|
||||
## Auth Endpoints (4)
|
||||
|
||||
All under prefix `/api/v1/auth/`. JWT-protected except registration and login.
|
||||
|
|
@ -42,6 +50,28 @@ All under prefix `/api/v1/auth/`. JWT-protected except registration and login.
|
|||
| GET | `/auth/me` | Bearer JWT | Current user profile. Returns UserResponse. |
|
||||
| PUT | `/auth/me` | Bearer JWT | Update display_name and/or password (requires current_password for password changes). Returns UserResponse. |
|
||||
|
||||
## Follow Endpoints (4) — M022/S02
|
||||
|
||||
All require Bearer JWT.
|
||||
|
||||
| Method | Path | Purpose |
|
||||
|--------|------|---------|
|
||||
| POST | `/api/v1/follows/{creator_id}` | Follow a creator (idempotent via INSERT ON CONFLICT DO NOTHING) |
|
||||
| DELETE | `/api/v1/follows/{creator_id}` | Unfollow a creator |
|
||||
| GET | `/api/v1/follows/{creator_id}/status` | Check if current user follows this creator |
|
||||
| GET | `/api/v1/follows/me` | List creators the current user follows |
|
||||
|
||||
## Creator Highlight Endpoints (4) — M022/S01
|
||||
|
||||
Creator-scoped highlight review. Requires Bearer JWT with creator ownership.
|
||||
|
||||
| Method | Path | Purpose |
|
||||
|--------|------|---------|
|
||||
| GET | `/api/v1/creator/highlights` | List highlights for authenticated creator (status/shorts_only filters) |
|
||||
| GET | `/api/v1/creator/highlights/{id}` | Detail with score_breakdown and key_moment |
|
||||
| PATCH | `/api/v1/creator/highlights/{id}/status` | Update status (approve/reject) |
|
||||
| PATCH | `/api/v1/creator/highlights/{id}/trim` | Update trim_start/trim_end |
|
||||
|
||||
## Consent Endpoints (5)
|
||||
|
||||
All under prefix `/api/v1/consent/`. All require Bearer JWT.
|
||||
|
|
@ -54,16 +84,6 @@ All under prefix `/api/v1/consent/`. All require Bearer JWT.
|
|||
| GET | `/consent/videos/{video_id}/history` | Creator (owner) or Admin | Versioned audit trail of consent changes for a video. |
|
||||
| GET | `/consent/admin/summary` | Admin only | Aggregate consent flag counts across all videos. |
|
||||
|
||||
### Consent Fields
|
||||
|
||||
Three boolean consent flags per video, each independently toggleable:
|
||||
|
||||
| Field | Default | Meaning |
|
||||
|-------|---------|---------|
|
||||
| `kb_inclusion` | false | Allow indexing into knowledge base |
|
||||
| `training_usage` | false | Allow use for model training |
|
||||
| `public_display` | true | Allow public display on site |
|
||||
|
||||
## Report Endpoints (3)
|
||||
|
||||
| Method | Path | Purpose |
|
||||
|
|
@ -72,7 +92,9 @@ Three boolean consent flags per video, each independently toggleable:
|
|||
| GET | `/api/v1/admin/reports` | List all reports |
|
||||
| PATCH | `/api/v1/admin/reports/{id}` | Update report status |
|
||||
|
||||
## Pipeline Admin Endpoints (20+)
|
||||
## Admin Endpoints
|
||||
|
||||
### Pipeline Admin (20+)
|
||||
|
||||
All under prefix `/api/v1/admin/pipeline/`.
|
||||
|
||||
|
|
@ -100,52 +122,20 @@ All under prefix `/api/v1/admin/pipeline/`.
|
|||
| POST | `/admin/pipeline/creator-profile/{creator_id}` | Update creator profile |
|
||||
| POST | `/admin/pipeline/avatar-fetch/{creator_id}` | Fetch creator avatar |
|
||||
|
||||
## Other Endpoints (2)
|
||||
### Highlight Admin (4)
|
||||
|
||||
| Method | Path | Notes |
|
||||
|--------|------|-------|
|
||||
| POST | `/api/v1/ingest` | Transcript upload |
|
||||
| GET | `/api/v1/videos` | ⚠️ Bare list (not paginated) |
|
||||
| Method | Path | Purpose |
|
||||
|--------|------|---------|
|
||||
| POST | `/admin/highlights/detect/{video_id}` | Score all KeyMoments for a video |
|
||||
| POST | `/admin/highlights/detect-all` | Score all videos |
|
||||
| GET | `/admin/highlights/candidates` | Paginated candidate list |
|
||||
| GET | `/admin/highlights/candidates/{id}` | Single candidate with score_breakdown |
|
||||
|
||||
## Response Conventions
|
||||
### Personality Extraction (1) — M022/S06
|
||||
|
||||
**Standard paginated response:**
|
||||
```json
|
||||
{
|
||||
"items": [...],
|
||||
"total": 83,
|
||||
"offset": 0,
|
||||
"limit": 20
|
||||
}
|
||||
```
|
||||
|
||||
**Known inconsistencies:**
|
||||
- `GET /topics` returns bare list instead of paginated dict
|
||||
- `GET /videos` returns bare list instead of paginated dict
|
||||
- Search uses `items` key (not `results`)
|
||||
- `/techniques/random` returns JSON `{slug}` (not HTTP redirect)
|
||||
|
||||
**New endpoints should follow the `{items, total, offset, limit}` paginated pattern.**
|
||||
|
||||
## Authentication
|
||||
|
||||
JWT-based authentication added in M019. See [[Authentication]] for full details.
|
||||
|
||||
- **Public endpoints** (search, browse, techniques) require no auth
|
||||
- **Auth endpoints** (`/auth/register`, `/auth/login`) are open; `/auth/me` requires Bearer JWT
|
||||
- **Consent endpoints** require Bearer JWT with ownership verification (creator must own the video, or be admin)
|
||||
- **Admin endpoints** (`/admin/*`) are accessible to anyone with network access (auth planned for future milestone)
|
||||
|
||||
---
|
||||
|
||||
*See also: [[Architecture]], [[Data-Model]], [[Frontend]], [[Authentication]]*
|
||||
utput` | Delete all pipeline output |
|
||||
| POST | `/admin/pipeline/optimize-prompt` | Trigger prompt optimization |
|
||||
| POST | `/admin/pipeline/reindex-all` | Rebuild Qdrant index |
|
||||
| GET | `/admin/pipeline/worker-status` | Celery worker health |
|
||||
| GET | `/admin/pipeline/recent-activity` | Recent pipeline events |
|
||||
| POST | `/admin/pipeline/creator-profile/{creator_id}` | Update creator profile |
|
||||
| POST | `/admin/pipeline/avatar-fetch/{creator_id}` | Fetch creator avatar |
|
||||
| Method | Path | Purpose |
|
||||
|--------|------|---------|
|
||||
| POST | `/api/v1/admin/creators/{slug}/extract-profile` | Queue personality profile extraction task |
|
||||
|
||||
## Other Endpoints (2)
|
||||
|
||||
|
|
@ -178,9 +168,11 @@ utput` | Delete all pipeline output |
|
|||
|
||||
JWT-based authentication added in M019. See [[Authentication]] for full details.
|
||||
|
||||
- **Public endpoints** (search, browse, techniques) require no auth
|
||||
- **Public endpoints** (search, browse, techniques, chat) require no auth
|
||||
- **Auth endpoints** (`/auth/register`, `/auth/login`) are open; `/auth/me` requires Bearer JWT
|
||||
- **Consent endpoints** require Bearer JWT with ownership verification (creator must own the video, or be admin)
|
||||
- **Follow endpoints** require Bearer JWT
|
||||
- **Creator endpoints** (`/creator/*`) require Bearer JWT with creator ownership verification
|
||||
- **Consent endpoints** require Bearer JWT with ownership verification
|
||||
- **Admin endpoints** (`/admin/*`) are accessible to anyone with network access (auth planned for future milestone)
|
||||
|
||||
---
|
||||
|
|
|
|||
108
Chat-Engine.md
108
Chat-Engine.md
|
|
@ -1,29 +1,33 @@
|
|||
# Chat Engine
|
||||
|
||||
Streaming question-answering interface backed by LightRAG retrieval and LLM completion. Added in M021/S03.
|
||||
Streaming question-answering interface backed by LightRAG retrieval and LLM completion. Added in M021/S03, expanded with multi-turn memory in M022/S04 and chat widget in M022/S03.
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
User types question in ChatPage
|
||||
User types question in ChatPage or ChatWidget
|
||||
│
|
||||
▼
|
||||
POST /api/v1/chat { query: "...", creator?: "..." }
|
||||
POST /api/v1/chat { query, creator?, conversation_id? }
|
||||
│
|
||||
▼
|
||||
ChatService.stream(query, creator?)
|
||||
ChatService.stream(query, creator?, conversation_id?)
|
||||
│
|
||||
├─ 1. Retrieve: SearchService.search(query, creator)
|
||||
├─ 1. Load history: Redis chrysopedia:chat:{conversation_id}
|
||||
│
|
||||
├─ 2. Retrieve: SearchService.search(query, creator)
|
||||
│ └─ Uses 4-tier cascade if creator provided (see [[Search-Retrieval]])
|
||||
│
|
||||
├─ 2. Prompt: Assemble numbered context block into encyclopedic system prompt
|
||||
├─ 3. Prompt: System prompt + history + numbered context + user message
|
||||
│ └─ Sources formatted as [1] Title — Summary for citation mapping
|
||||
│
|
||||
├─ 3. Stream: openai.AsyncOpenAI with stream=True
|
||||
├─ 4. Stream: openai.AsyncOpenAI with stream=True
|
||||
│ └─ Tokens streamed as SSE events in real-time
|
||||
│
|
||||
├─ 5. Save history: Append user message + assistant response to Redis
|
||||
│
|
||||
▼
|
||||
SSE response → ChatPage renders tokens + citation links
|
||||
SSE response → ChatPage/ChatWidget renders tokens + citation links
|
||||
```
|
||||
|
||||
## SSE Protocol
|
||||
|
|
@ -34,10 +38,28 @@ The chat endpoint returns a `text/event-stream` response with four event types i
|
|||
|-------|---------|------|
|
||||
| `sources` | `[{title, slug, creator_name, summary}]` | First — citation metadata for link rendering |
|
||||
| `token` | `string` (text chunk) | Repeated — streamed LLM completion tokens |
|
||||
| `done` | `{cascade_tier: "creator"\|"domain"\|"global"\|"none"\|""}` | Once — signals completion, includes which retrieval tier answered |
|
||||
| `done` | `{cascade_tier, conversation_id}` | Once — signals completion, includes retrieval tier and conversation ID |
|
||||
| `error` | `{message: string}` | On failure — emitted if LLM errors mid-stream |
|
||||
|
||||
The `cascade_tier` in the `done` event reveals which tier of the retrieval cascade served the context (see [[Search-Retrieval]]).
|
||||
The `cascade_tier` in the `done` event reveals which tier of the retrieval cascade served the context. The `conversation_id` enables the frontend to thread follow-up messages.
|
||||
|
||||
## Multi-Turn Conversation Memory (M022/S04)
|
||||
|
||||
### Redis Storage
|
||||
|
||||
- **Key pattern:** `chrysopedia:chat:{conversation_id}`
|
||||
- **Format:** Single JSON string containing a list of `{role, content}` message dicts
|
||||
- **TTL:** 1 hour, refreshed on each interaction
|
||||
- **Cap:** 10 turn pairs (20 messages) — oldest pairs trimmed when exceeded
|
||||
|
||||
### Conversation Flow
|
||||
|
||||
1. Client sends `conversation_id` in POST body (or omits for new conversation)
|
||||
2. Server auto-generates UUID when `conversation_id` is omitted
|
||||
3. History loaded from Redis and injected between system prompt and user message
|
||||
4. Assistant response accumulated during streaming
|
||||
5. User message + assistant response appended to history in Redis
|
||||
6. `conversation_id` returned in SSE `done` event for threading
|
||||
|
||||
## Citation Format
|
||||
|
||||
|
|
@ -45,7 +67,7 @@ The LLM is instructed to reference sources using numbered citations `[N]` in its
|
|||
|
||||
- `[1]` → links to `/techniques/:slug` for the corresponding source
|
||||
- Multiple citations supported: `[1][3]` or `[1,3]`
|
||||
- Citation regex: `/\[(\d+)\]/g` parsed locally in ChatPage
|
||||
- Citation regex: `/\[(\d+)\]/g` parsed locally in both ChatPage and ChatWidget
|
||||
|
||||
## API Endpoint
|
||||
|
||||
|
|
@ -55,6 +77,7 @@ The LLM is instructed to reference sources using numbered citations `[N]` in its
|
|||
|-------|------|----------|------------|
|
||||
| `query` | string | Yes | 1–1000 characters |
|
||||
| `creator` | string | No | Creator UUID or slug for scoped retrieval |
|
||||
| `conversation_id` | string | No | UUID for multi-turn threading. Auto-generated if omitted. |
|
||||
|
||||
**Response:** `text/event-stream` (SSE)
|
||||
|
||||
|
|
@ -65,9 +88,11 @@ The LLM is instructed to reference sources using numbered citations `[N]` in its
|
|||
|
||||
Located in `backend/chat_service.py`. The retrieve-prompt-stream pipeline:
|
||||
|
||||
1. **Retrieve** — Calls `SearchService.search()` with the query and optional creator parameter. Gets back ranked technique page results with the cascade_tier.
|
||||
2. **Prompt** — Builds a numbered context block from search results. System prompt instructs the LLM to act as a music production encyclopedia, cite sources with `[N]` notation, and stay grounded in the provided context.
|
||||
3. **Stream** — Opens an async streaming completion via `openai.AsyncOpenAI` (configured to point at DGX Sparks Qwen or local Ollama). Yields SSE events as tokens arrive.
|
||||
1. **Load History** — `_load_history()` reads from Redis key `chrysopedia:chat:{conversation_id}`. Returns empty list if key absent.
|
||||
2. **Retrieve** — Calls `SearchService.search()` with the query and optional creator parameter. Gets back ranked technique page results with the cascade_tier.
|
||||
3. **Prompt** — Builds message array: system prompt → conversation history → numbered context block → user message. System prompt instructs the LLM to act as a music production encyclopedia, cite sources with `[N]` notation, and stay grounded in the provided context.
|
||||
4. **Stream** — Opens an async streaming completion via `openai.AsyncOpenAI`. Yields SSE events as tokens arrive.
|
||||
5. **Save History** — `_save_history()` appends the user message and accumulated assistant response to Redis. Trims to 10 turn pairs if exceeded. Refreshes TTL to 1 hour.
|
||||
|
||||
Error handling: If the LLM fails mid-stream (after some tokens have been sent), an `error` event is emitted so the frontend can display a failure message rather than leaving the response hanging.
|
||||
|
||||
|
|
@ -77,38 +102,63 @@ Route: `/chat` (lazy-loaded, code-split)
|
|||
|
||||
### Components
|
||||
|
||||
- **Text input + submit button** — Query entry with Enter-to-submit
|
||||
- **Multi-message conversation UI** — Messages array with conversation bubble layout
|
||||
- **Conversation threading** — `conversationId` state, "New conversation" button to reset
|
||||
- **Streaming message display** — Accumulates tokens with blinking cursor animation during streaming
|
||||
- **Citation markers** — `[N]` parsed to superscript links targeting `/techniques/:slug`
|
||||
- **Source list** — Numbered sources with creator attribution displayed below the response
|
||||
- **States:** Loading (streaming indicator), error (message display), empty (placeholder prompt)
|
||||
- **Typing indicator** — Three-dot animation while streaming
|
||||
- **Citation markers** — `[N]` parsed to superscript links targeting `/techniques/:slug` (per-message)
|
||||
- **Source list** — Numbered sources with creator attribution displayed below each response
|
||||
- **Auto-scroll** — Scrolls to bottom as new tokens arrive
|
||||
|
||||
### SSE Client
|
||||
|
||||
Located in `frontend/src/api/chat.ts`. Uses `fetch()` + `ReadableStream` with typed callbacks:
|
||||
|
||||
```typescript
|
||||
streamChat(query, creator?, {
|
||||
streamChat(query, {
|
||||
onSources: (sources) => void,
|
||||
onToken: (token) => void,
|
||||
onDone: (data) => void,
|
||||
onDone: (data: ChatDoneMeta) => void,
|
||||
onError: (error) => void,
|
||||
})
|
||||
}, creatorName?, conversationId?)
|
||||
```
|
||||
|
||||
`ChatDoneMeta` type includes `cascade_tier` and `conversation_id` fields.
|
||||
|
||||
## Frontend: ChatWidget (M022/S03)
|
||||
|
||||
Floating chat bubble on creator detail pages. Fixed-position bottom-right.
|
||||
|
||||
### Behavior
|
||||
|
||||
- **Bubble** → click → **slide-up panel** with conversation UI
|
||||
- Creator-scoped: passes `creatorName` to `streamChat()` for retrieval cascade
|
||||
- **Suggested questions** generated client-side from technique titles and categories
|
||||
- **Typing indicator** — three-dot animation during streaming
|
||||
- **Citation links** — parsed from response, linked to technique pages
|
||||
- **Responsive** — full-width below 640px, 400px panel on desktop
|
||||
- **Conversation threading** — `conversationId` generated via `crypto.randomUUID()` on first send, threaded through `streamChat()`, updated from done event
|
||||
- **Reset on close** — messages and conversationId cleared when panel closes
|
||||
|
||||
## Key Files
|
||||
|
||||
- `backend/chat_service.py` — ChatService retrieve-prompt-stream pipeline
|
||||
- `backend/routers/chat.py` — POST /api/v1/chat endpoint
|
||||
- `frontend/src/api/chat.ts` — SSE client utility
|
||||
- `frontend/src/pages/ChatPage.tsx` — Chat UI page component
|
||||
- `frontend/src/pages/ChatPage.module.css` — Chat page styles
|
||||
- `backend/chat_service.py` — ChatService with history load/save, retrieve-prompt-stream pipeline
|
||||
- `backend/routers/chat.py` — POST /api/v1/chat endpoint with conversation_id support
|
||||
- `backend/tests/test_chat.py` — 13 tests (6 streaming + 7 conversation memory)
|
||||
- `frontend/src/api/chat.ts` — SSE client with conversationId param and ChatDoneMeta type
|
||||
- `frontend/src/pages/ChatPage.tsx` — Multi-message conversation UI
|
||||
- `frontend/src/pages/ChatPage.module.css` — Conversation bubble layout styles
|
||||
- `frontend/src/components/ChatWidget.tsx` — Floating chat widget component
|
||||
- `frontend/src/components/ChatWidget.module.css` — Widget styles (38 custom property refs)
|
||||
|
||||
## Design Decisions
|
||||
|
||||
- **Standalone ASGI test client pattern** — Tests use mocked DB to avoid PostgreSQL dependency, enabling fast CI runs
|
||||
- **Patch `openai.AsyncOpenAI` constructor** rather than instance attribute for reliable test mocking
|
||||
- **Local citation regex** in ChatPage rather than importing from utils — link targets differ from technique page citations
|
||||
- **Redis JSON string** — Conversation history stored as single JSON value (atomic read/write) rather than Redis list type
|
||||
- **Auto-generate conversation_id** — Server creates UUID when client omits it, ensuring consistent `done` event shape
|
||||
- **Widget resets on close** — Clean slate UX; no persistence across open/close cycles
|
||||
- **Client-side suggested questions** — Generated from technique titles/categories without API call
|
||||
- **Citation parsing duplicated** — ChatPage and ChatWidget each parse citations independently (extracted utility deferred)
|
||||
- **Standalone ASGI test client** — Tests use mocked DB to avoid PostgreSQL dependency
|
||||
|
||||
---
|
||||
|
||||
|
|
|
|||
|
|
@ -1,6 +1,6 @@
|
|||
# Data Model
|
||||
|
||||
18 SQLAlchemy models in `backend/models.py`.
|
||||
20 SQLAlchemy models in `backend/models.py`.
|
||||
|
||||
## Entity Relationship Overview
|
||||
|
||||
|
|
@ -17,6 +17,8 @@ Creator (1) ──→ (N) SourceVideo (1) ──→ (N) TranscriptSegment
|
|||
│ ├──→ (N) RelatedTechniqueLink
|
||||
│ └──→ (M:N) SourceVideo (via TechniquePageVideo)
|
||||
│
|
||||
├──→ (N) CreatorFollow ←── User
|
||||
│
|
||||
└──→ (0..1) User ──→ (N) InviteCode (created_by)
|
||||
```
|
||||
|
||||
|
|
@ -34,6 +36,7 @@ Creator (1) ──→ (N) SourceVideo (1) ──→ (N) TranscriptSegment
|
|||
| bio | Text | Admin-editable |
|
||||
| social_links | JSONB | Platform → URL mapping |
|
||||
| featured | Boolean | For homepage spotlight |
|
||||
| personality_profile | JSONB | LLM-extracted personality data (M022/S06). See [[Personality-Profiles]] |
|
||||
|
||||
### SourceVideo
|
||||
|
||||
|
|
@ -101,6 +104,33 @@ Creator (1) ──→ (N) SourceVideo (1) ──→ (N) TranscriptSegment
|
|||
| content_snapshot | JSONB | Full page state at version time |
|
||||
| pipeline_metadata | JSONB | Prompt SHA-256 hashes, model config |
|
||||
|
||||
### HighlightCandidate
|
||||
|
||||
| Field | Type | Notes |
|
||||
|-------|------|-------|
|
||||
| id | UUID PK | |
|
||||
| key_moment_id | FK → KeyMoment | Unique constraint |
|
||||
| source_video_id | FK → SourceVideo | Indexed |
|
||||
| score | Float | Composite score 0.0–1.0 |
|
||||
| score_breakdown | JSONB | Per-dimension scores (10 fields, see [[Highlights]]) |
|
||||
| duration_secs | Float | Cached from KeyMoment |
|
||||
| status | Enum(HighlightStatus) | candidate / approved / rejected |
|
||||
| trim_start | Float | Nullable — trim offset in seconds (M022/S01) |
|
||||
| trim_end | Float | Nullable — trim offset in seconds (M022/S01) |
|
||||
| created_at | Timestamp | |
|
||||
| updated_at | Timestamp | |
|
||||
|
||||
### CreatorFollow (M022/S02)
|
||||
|
||||
| Field | Type | Notes |
|
||||
|-------|------|-------|
|
||||
| id | UUID PK | |
|
||||
| user_id | FK → User | Part of unique constraint |
|
||||
| creator_id | FK → Creator | Part of unique constraint |
|
||||
| created_at | Timestamp | |
|
||||
|
||||
Unique constraint on `(user_id, creator_id)`. Idempotent follow via `INSERT ON CONFLICT DO NOTHING`.
|
||||
|
||||
## Authentication & User Models
|
||||
|
||||
### User
|
||||
|
|
@ -192,20 +222,17 @@ Append-only versioned record of per-field consent changes.
|
|||
| **HighlightStatus** | candidate, approved, rejected (M021/S04) |
|
||||
| **ChapterStatus** | draft, approved, hidden (M021/S06) |
|
||||
|
||||
## Migrations
|
||||
|
||||
| Migration | Description |
|
||||
|-----------|-------------|
|
||||
| 019 | Add highlight_candidates table |
|
||||
| 021 | Add trim_start/trim_end to highlight_candidates (M022/S01) |
|
||||
| 022 | Add creator_follows table (M022/S02) |
|
||||
| 023 | Add personality_profile JSONB to creators (M022/S06) |
|
||||
|
||||
## Schema Notes
|
||||
|
||||
- **No Alembic migrations** — schema changes currently require manual DDL
|
||||
- **body_sections_format** discriminator enables v1/v2 format coexistence (D024)
|
||||
- **topic_category casing** is inconsistent across records (e.g., "Sound design" vs "Sound Design") — known data quality issue
|
||||
- **Stage 4 classification data** (per-moment topic_tags) stored in Redis with 24h TTL, not DB columns
|
||||
- **Timestamp convention:** `datetime.now(timezone.utc).replace(tzinfo=None)` — asyncpg rejects timezone-aware datetimes for TIMESTAMP WITHOUT TIME ZONE columns (D002)
|
||||
- **User passwords** are stored as bcrypt hashes via `bcrypt.hashpw()`
|
||||
- **Consent audit** uses version numbers assigned in application code (`max(version) + 1` per video_consent_id)
|
||||
|
||||
---
|
||||
|
||||
*See also: [[Architecture]], [[API-Surface]], [[Pipeline]], [[Authentication]]*
|
||||
changes currently require manual DDL
|
||||
- **body_sections_format** discriminator enables v1/v2 format coexistence (D024)
|
||||
- **topic_category casing** is inconsistent across records (e.g., "Sound design" vs "Sound Design") — known data quality issue
|
||||
- **Stage 4 classification data** (per-moment topic_tags) stored in Redis with 24h TTL, not DB columns
|
||||
|
|
|
|||
20
Decisions.md
20
Decisions.md
|
|
@ -31,12 +31,26 @@ Architectural and pattern decisions made during Chrysopedia development. Append-
|
|||
| D034 | Documentation strategy | Forgejo wiki, KB slice at end of every milestone | Incremental docs stay current; final pass in M025 |
|
||||
| D035 | File/object storage | MinIO (S3-compatible) self-hosted | Docker-native, signed URLs, fits existing infrastructure |
|
||||
|
||||
## M021 Decisions
|
||||
## Authentication & Infrastructure Decisions
|
||||
|
||||
| # | When | Decision | Choice | Rationale |
|
||||
|---|------|----------|--------|-----------|
|
||||
| D039 | M021/S01 | LightRAG scoring strategy | Position-based (1.0 → 0.5 descending), sequential Qdrant fallback | `/query/data` has no numeric relevance score; retrieval order is the only signal |
|
||||
| D040 | M021/S02 | Creator-scoped retrieval strategy | 4-tier cascade: creator → domain → global → none | Progressive widening ensures results while preferring creator context; `ll_keywords` for soft scoping; 3x oversampling for post-filter survival |
|
||||
| D036 | M019/S02 | JWT auth configuration | HS256 with existing app_secret_key, 24h expiry, OAuth2PasswordBearer | Reuses existing secret; integrates with FastAPI dependency injection |
|
||||
| D037 | — | Search impressions query | Exact case-insensitive title match via EXISTS subquery against SearchLog | MVP approach; expandable to ILIKE later |
|
||||
| D038 | — | Primary git remote | git.xpltd.co (Forgejo) instead of github.com | Consolidating on self-hosted Forgejo; wiki already there |
|
||||
|
||||
## Search & Retrieval Decisions
|
||||
|
||||
| # | When | Decision | Choice | Rationale |
|
||||
|---|------|----------|--------|-----------|
|
||||
| D039 | M021/S01 | LightRAG scoring strategy | Position-based (1.0 → 0.5 descending), sequential Qdrant fallback | `/query/data` has no numeric relevance score |
|
||||
| D040 | M021/S02 | Creator-scoped retrieval | 4-tier cascade: creator → domain → global → none | Progressive widening; `ll_keywords` for soft scoping; 3x oversampling for post-filter survival |
|
||||
|
||||
## M022 Decisions
|
||||
|
||||
| # | When | Decision | Choice | Rationale |
|
||||
|---|------|----------|--------|-----------|
|
||||
| D041 | M022/S05 | Highlight scorer weight distribution | 10 dimensions: original 7 reduced proportionally, 3 audio proxy dims get 0.22 total weight. Neutral fallback (0.5) when word_timings unavailable. | Audio proxy signals from word-level timing data; neutral fallback preserves backward compatibility |
|
||||
|
||||
## UI/UX Decisions
|
||||
|
||||
|
|
|
|||
91
Frontend.md
91
Frontend.md
|
|
@ -10,10 +10,13 @@ React 18 + TypeScript + Vite SPA. No UI library, no state management library, no
|
|||
| `/search` | SearchResults | Public | Sort, highlights, partial matches |
|
||||
| `/techniques/:slug` | TechniquePage | Public | v2 body sections, ToC sidebar, citations |
|
||||
| `/creators` | CreatorsBrowse | Public | Random default sort, genre filters |
|
||||
| `/creators/:slug` | CreatorDetail | Public | Avatar, stats, technique list |
|
||||
| `/creators/:slug` | CreatorDetail | Public | Avatar, stats, technique list, follow button, personality profile, chat widget |
|
||||
| `/topics` | TopicsBrowse | Public | 7 category cards, expandable sub-topics |
|
||||
| `/topics/:category/:subtopic` | SubTopicPage | Public | Creator-grouped techniques |
|
||||
| `/chat` | ChatPage | Public | Multi-message conversation UI with threading |
|
||||
| `/about` | About | Public | Static project info |
|
||||
| `/creator/highlights` | HighlightQueue | Creator JWT | Highlight review queue with filter tabs (M022/S01) |
|
||||
| `/creator/tiers` | CreatorTiers | Creator JWT | Free/Pro/Premium tier cards with Coming Soon modals (M022/S02) |
|
||||
| `/admin/reports` | AdminReports | Admin* | Content reports |
|
||||
| `/admin/pipeline` | AdminPipeline | Admin* | Pipeline management |
|
||||
| `/admin/techniques` | AdminTechniquePages | Admin* | Technique page admin |
|
||||
|
|
@ -38,6 +41,51 @@ React 18 + TypeScript + Vite SPA. No UI library, no state management library, no
|
|||
| CopyLinkButton | Clipboard copy with tooltip |
|
||||
| SocialIcons | Social media link icons (9 platforms) |
|
||||
| ReportIssueModal | Content report submission |
|
||||
| ChatWidget | Floating chat bubble on creator pages — SSE streaming, citations, suggested questions (M022/S03) |
|
||||
| PersonalityProfile | Collapsible creator personality display — 3 sub-cards (Teaching Style, Vocabulary, Style) (M022/S06) |
|
||||
|
||||
## Feature Pages (M022)
|
||||
|
||||
### HighlightQueue (M022/S01)
|
||||
|
||||
Creator-scoped highlight review page at `/creator/highlights`.
|
||||
|
||||
- **Filter tabs** — All / Shorts / Approved / Rejected
|
||||
- **Candidate cards** — Title, duration, composite score, status badge
|
||||
- **Score breakdown bars** — 10-dimension visual bars (fetched lazily on expand)
|
||||
- **Action buttons** — Approve / Discard with ownership verification
|
||||
- **Inline trim panel** — Validated trim_start / trim_end inputs
|
||||
- **Files:** `HighlightQueue.tsx`, `HighlightQueue.module.css`, `highlights.ts` (API)
|
||||
|
||||
### CreatorTiers (M022/S02)
|
||||
|
||||
Tier configuration at `/creator/tiers`.
|
||||
|
||||
- **Three cards** — Free (active), Pro, Premium
|
||||
- **Coming Soon modals** — Styled placeholders per D033 (Stripe deferred to Phase 3)
|
||||
- **Files:** `CreatorTiers.tsx`, `CreatorTiers.module.css`
|
||||
|
||||
### ChatWidget (M022/S03)
|
||||
|
||||
Floating chat on creator detail pages.
|
||||
|
||||
- **Fixed-position bubble** (bottom-right) → slide-up conversation panel
|
||||
- **Creator-scoped** — passes creatorName to streamChat() for retrieval cascade
|
||||
- **Suggested questions** — client-side from technique titles/categories
|
||||
- **Streaming SSE** — tokens, citations, typing indicator
|
||||
- **Responsive** — full-width below 640px, 400px panel on desktop
|
||||
- **Conversation threading** — conversationId via crypto.randomUUID(), resets on close
|
||||
- **Files:** `ChatWidget.tsx`, `ChatWidget.module.css`
|
||||
|
||||
### PersonalityProfile (M022/S06)
|
||||
|
||||
Collapsible personality display on creator detail pages.
|
||||
|
||||
- **Grid-template-rows animation** — 0fr → 1fr for smooth expand/collapse
|
||||
- **Three sub-cards:** Teaching Style, Vocabulary, Style
|
||||
- **Pill badges** for phrases/terms, checkmark/cross for boolean markers
|
||||
- **Gracefully hidden** when profile is null
|
||||
- **Files:** `PersonalityProfile.tsx`, styles in `App.css`
|
||||
|
||||
## Hooks
|
||||
|
||||
|
|
@ -45,19 +93,22 @@ React 18 + TypeScript + Vite SPA. No UI library, no state management library, no
|
|||
|------|---------|
|
||||
| useCountUp | Animated counter for homepage stats |
|
||||
| useSortPreference | Persists sort preference in localStorage |
|
||||
| useDocumentTitle | Sets `<title>` per page (all 10 pages instrumented) |
|
||||
| useDocumentTitle | Sets `<title>` per page (all pages instrumented) |
|
||||
|
||||
## State Management
|
||||
|
||||
Local component state only (`useState`/`useEffect`). No Redux, Zustand, Context providers, or external state management library.
|
||||
Local component state only (`useState`/`useEffect`). No Redux, Zustand, Context providers, or external state management library. AuthProvider context for JWT auth state.
|
||||
|
||||
## API Client
|
||||
|
||||
Two API modules:
|
||||
API modules:
|
||||
- `public-client.ts` (~600 lines) — typed `request<T>` helper for REST endpoints
|
||||
- `chat.ts` — SSE streaming client for POST /api/v1/chat using `fetch()` + `ReadableStream`
|
||||
- `videos.ts` — chapter management functions (fetchChapters, fetchCreatorChapters, updateChapter, reorderChapters, approveChapters)
|
||||
- `auth.ts` — authentication + impersonation functions including `fetchImpersonationLog()`
|
||||
- `chat.ts` — SSE streaming client for POST /api/v1/chat using `fetch()` + `ReadableStream`, `ChatDoneMeta` type
|
||||
- `videos.ts` — chapter management functions
|
||||
- `auth.ts` — authentication + impersonation functions
|
||||
- `highlights.ts` — creator highlight review functions (M022/S01)
|
||||
- `follows.ts` — follow/unfollow/status/list functions (M022/S02)
|
||||
- `creators.ts` — creator detail with personality_profile and follower_count types (M022/S02, S06)
|
||||
|
||||
Relative `/api/v1` base URL (nginx proxies to API container).
|
||||
|
||||
|
|
@ -66,26 +117,13 @@ Relative `/api/v1` base URL (nginx proxies to API container).
|
|||
| Property | Value |
|
||||
|----------|-------|
|
||||
| File | `frontend/src/App.css` |
|
||||
| Lines | 5,820 |
|
||||
| Unique classes | ~589 |
|
||||
| Lines | ~6,500+ |
|
||||
| Naming | BEM (`block__element--modifier`) |
|
||||
| Theme | Dark-only (no light mode) |
|
||||
| Custom properties | 77 in `:root` (D017) |
|
||||
| Accent color | Cyan `#22d3ee` |
|
||||
| Font stack | System fonts |
|
||||
| Preprocessor | None |
|
||||
| CSS Modules | None |
|
||||
|
||||
### Custom Property Categories (77 total)
|
||||
|
||||
- **Surface colors:** page background, card backgrounds, nav, footer, input
|
||||
- **Text colors:** primary, secondary, muted, inverse, link, heading
|
||||
- **Accent colors:** primary cyan, hover/active, focus rings
|
||||
- **Badge colors:** Per-category pairs (bg + text) for 7 topic categories
|
||||
- **Status colors:** Success/warning/error/info
|
||||
- **Border colors:** Default, hover, focus, divider
|
||||
- **Shadow colors:** Elevation, glow effects
|
||||
- **Overlay colors:** Modal/dropdown overlays
|
||||
| CSS Modules | Used for new components (HighlightQueue, CreatorTiers, ChatWidget, ChatPage) |
|
||||
|
||||
### Breakpoints
|
||||
|
||||
|
|
@ -93,7 +131,7 @@ Relative `/api/v1` base URL (nginx proxies to API container).
|
|||
|-----------|-------|
|
||||
| 480px | Narrow mobile — compact cards |
|
||||
| 600px | Wider mobile — grid adjustments |
|
||||
| 640px | Small tablet — content width |
|
||||
| 640px | Small tablet / chat widget responsive break |
|
||||
| 768px | Desktop ↔ mobile transition — sidebar collapse |
|
||||
|
||||
### Layout Patterns
|
||||
|
|
@ -114,10 +152,3 @@ Relative `/api/v1` base URL (nginx proxies to API container).
|
|||
---
|
||||
|
||||
*See also: [[Architecture]], [[API-Surface]], [[Development-Guide]]*
|
||||
*See also: [[Architecture]], [[API-Surface]], [[Development-Guide]]*
|
||||
ocalhost:8001`
|
||||
- **Production:** nginx serves static `dist/` bundle, proxies `/api` to FastAPI container
|
||||
|
||||
---
|
||||
|
||||
*See also: [[Architecture]], [[API-Surface]], [[Development-Guide]]*
|
||||
|
|
|
|||
135
Highlights.md
135
Highlights.md
|
|
@ -1,10 +1,10 @@
|
|||
# Highlight Detection
|
||||
|
||||
Heuristic scoring engine that ranks KeyMoment records into highlight candidates using 7 weighted dimensions. Added in M021/S04.
|
||||
Heuristic scoring engine that ranks KeyMoment records into highlight candidates using 10 weighted dimensions. Originally added in M021/S04 with 7 dimensions, expanded to 10 in M022/S05.
|
||||
|
||||
## Overview
|
||||
|
||||
Highlight detection scores every KeyMoment in a video to identify the most "highlightable" segments — moments that would work well as standalone clips or featured content. The scoring is a pure function (no ML model, no external API) based on 7 dimensions derived from existing KeyMoment metadata.
|
||||
Highlight detection scores every KeyMoment in a video to identify the most "highlightable" segments — moments that would work well as standalone clips or featured content. The scoring is a pure function (no ML model, no external API) based on 10 dimensions derived from existing KeyMoment metadata and word-level transcript timing data.
|
||||
|
||||
## Scoring Dimensions
|
||||
|
||||
|
|
@ -12,13 +12,22 @@ Total weight sums to 1.0. Each dimension produces a 0.0–1.0 score.
|
|||
|
||||
| Dimension | Weight | What It Measures |
|
||||
|-----------|--------|-----------------|
|
||||
| `duration_fitness` | 0.25 | Piecewise linear curve peaking at 30–60 seconds (ideal clip length) |
|
||||
| `content_type` | 0.20 | Content type favorability: tutorial > tip > walkthrough > exploration |
|
||||
| `specificity_density` | 0.20 | Regex-based counting of specific units, ratios, and named parameters in summary text |
|
||||
| `plugin_richness` | 0.10 | Number of plugins/VSTs referenced (more = more actionable) |
|
||||
| `transcript_energy` | 0.10 | Teaching-phrase detection in transcript text (e.g., "the trick is", "key thing") |
|
||||
| `source_quality` | 0.10 | Source quality rating: high=1.0, medium=0.6, low=0.3 |
|
||||
| `video_type` | 0.05 | Video type favorability mapping |
|
||||
| `duration_fitness` | 0.20 | Piecewise linear curve peaking at 30–60 seconds (ideal clip length) |
|
||||
| `content_type` | 0.16 | Content type favorability: tutorial > tip > walkthrough > exploration |
|
||||
| `specificity_density` | 0.16 | Regex-based counting of specific units, ratios, and named parameters in summary text |
|
||||
| `plugin_richness` | 0.08 | Number of plugins/VSTs referenced (more = more actionable) |
|
||||
| `transcript_energy` | 0.08 | Teaching-phrase detection in transcript text (e.g., "the trick is", "key thing") |
|
||||
| `source_quality` | 0.08 | Source quality rating: high=1.0, medium=0.6, low=0.3 |
|
||||
| `video_type` | 0.02 | Video type favorability mapping |
|
||||
| `speech_rate_variance` | ~0.07 | Coefficient of variation of words-per-second in 5s sliding windows |
|
||||
| `pause_density` | ~0.08 | Count and weight of inter-word gaps (>0.5s short, >1.0s long) |
|
||||
| `speaking_pace` | ~0.07 | Bell-curve fitness around optimal 3–5 WPS teaching pace |
|
||||
|
||||
### Audio Proxy Dimensions (M022/S05)
|
||||
|
||||
The three new dimensions (speech_rate_variance, pause_density, speaking_pace) are derived from **word-level transcript timing data** — not raw audio. This provides meaningful speech-pattern signals without requiring librosa or audio processing dependencies.
|
||||
|
||||
**Neutral fallback:** When `word_timings` are unavailable (no word-level data in transcript), all three audio proxy dimensions default to **0.5** (neutral score). This preserves backward compatibility — existing scoring paths are unaffected. The weights of the original 7 dimensions were reduced proportionally to accommodate the new 0.22 total weight for audio dimensions (D041).
|
||||
|
||||
### Duration Fitness Curve
|
||||
|
||||
|
|
@ -36,12 +45,14 @@ Uses piecewise linear (not Gaussian) for predictability:
|
|||
| Field | Type | Notes |
|
||||
|-------|------|-------|
|
||||
| id | UUID PK | |
|
||||
| key_moment_id | FK → KeyMoment | Unique constraint (`uq_highlight_candidate_moment`) |
|
||||
| key_moment_id | FK → KeyMoment | Unique constraint (`highlight_candidates_key_moment_id_key`) |
|
||||
| source_video_id | FK → SourceVideo | Indexed |
|
||||
| score | Float | Composite score 0.0–1.0 |
|
||||
| score_breakdown | JSONB | Per-dimension scores (7 fields) |
|
||||
| score_breakdown | JSONB | Per-dimension scores (10 fields) |
|
||||
| duration_secs | Float | Cached from KeyMoment for display |
|
||||
| status | Enum(HighlightStatus) | candidate / approved / rejected |
|
||||
| trim_start | Float | Nullable — trim start offset in seconds (M022/S01) |
|
||||
| trim_end | Float | Nullable — trim end offset in seconds (M022/S01) |
|
||||
| created_at | Timestamp | |
|
||||
| updated_at | Timestamp | |
|
||||
|
||||
|
|
@ -59,12 +70,15 @@ Uses piecewise linear (not Gaussian) for predictability:
|
|||
- `score` DESC — rank ordering
|
||||
- `status` — filter by review state
|
||||
|
||||
### Migration
|
||||
### Migrations
|
||||
|
||||
Alembic migration `019_add_highlight_candidates.py` creates the table with all indexes and the named unique constraint.
|
||||
- `019_add_highlight_candidates.py` — Creates table with indexes and unique constraint
|
||||
- `021_add_highlight_trim_columns.py` — Adds trim_start and trim_end columns (M022/S01)
|
||||
|
||||
## API Endpoints
|
||||
|
||||
### Admin Endpoints
|
||||
|
||||
All under `/api/v1/admin/highlights/`. Admin access.
|
||||
|
||||
| Method | Path | Purpose |
|
||||
|
|
@ -74,37 +88,31 @@ All under `/api/v1/admin/highlights/`. Admin access.
|
|||
| GET | `/admin/highlights/candidates` | Paginated candidate list, sorted by score DESC |
|
||||
| GET | `/admin/highlights/candidates/{id}` | Single candidate with full `score_breakdown` |
|
||||
|
||||
### Detect Response
|
||||
### Creator Endpoints (M022/S01)
|
||||
|
||||
Creator-scoped highlight review. Requires JWT auth with creator ownership verification.
|
||||
|
||||
| Method | Path | Purpose |
|
||||
|--------|------|---------|
|
||||
| GET | `/api/v1/creator/highlights` | List highlights for authenticated creator (status/shorts_only filters, score DESC) |
|
||||
| GET | `/api/v1/creator/highlights/{id}` | Detail with score_breakdown and key_moment |
|
||||
| PATCH | `/api/v1/creator/highlights/{id}/status` | Update status (approve/reject) with ownership verification |
|
||||
| PATCH | `/api/v1/creator/highlights/{id}/trim` | Update trim_start/trim_end (validation: non-negative, start < end) |
|
||||
|
||||
### Score Breakdown Response
|
||||
|
||||
```json
|
||||
{
|
||||
"video_id": "uuid",
|
||||
"candidates_created": 12,
|
||||
"candidates_updated": 0
|
||||
}
|
||||
```
|
||||
|
||||
### Candidate Response
|
||||
|
||||
```json
|
||||
{
|
||||
"id": "uuid",
|
||||
"key_moment_id": "uuid",
|
||||
"source_video_id": "uuid",
|
||||
"score": 0.847,
|
||||
"score_breakdown": {
|
||||
"duration_fitness": 0.95,
|
||||
"content_type_weight": 0.80,
|
||||
"specificity_density": 0.72,
|
||||
"plugin_richness": 0.60,
|
||||
"transcript_energy": 0.85,
|
||||
"source_quality_weight": 1.00,
|
||||
"video_type_weight": 0.50
|
||||
},
|
||||
"duration_secs": 45.0,
|
||||
"status": "candidate",
|
||||
"created_at": "...",
|
||||
"updated_at": "..."
|
||||
"duration_fitness": 0.95,
|
||||
"content_type_weight": 0.80,
|
||||
"specificity_density": 0.72,
|
||||
"plugin_richness": 0.60,
|
||||
"transcript_energy": 0.85,
|
||||
"source_quality_weight": 1.00,
|
||||
"video_type_weight": 0.50,
|
||||
"speech_rate_variance_score": 0.057,
|
||||
"pause_density_score": 0.0,
|
||||
"speaking_pace_score": 1.0
|
||||
}
|
||||
```
|
||||
|
||||
|
|
@ -114,30 +122,55 @@ All under `/api/v1/admin/highlights/`. Admin access.
|
|||
|
||||
- **Binding:** `bind=True, max_retries=3`
|
||||
- **Session:** Uses `_get_sync_session` (sync SQLAlchemy, per D004)
|
||||
- **Flow:** Load KeyMoments for video → score each via `score_moment()` → bulk upsert via `INSERT ON CONFLICT` on named constraint `uq_highlight_candidate_moment`
|
||||
- **Flow:** Load KeyMoments for video → load transcript JSON → extract word timings per moment → score each via `score_moment()` → bulk upsert via `INSERT ON CONFLICT` on constraint `highlight_candidates_key_moment_id_key`
|
||||
- **Transcript handling:** Loads transcript JSON once per video via `SourceVideo.transcript_path`. Accepts both `{segments: [...]}` and bare `[...]` JSON formats.
|
||||
- **Fallback:** If transcript is missing or malformed, `word_timings=None` and scorer uses neutral values for audio dimensions
|
||||
- **Events:** Emits `pipeline_events` rows for start/complete/error with candidate count in payload
|
||||
|
||||
### Scoring Function
|
||||
|
||||
`score_moment()` in `backend/pipeline/highlight_scorer.py` is a **pure function** — no DB access, no side effects. Takes a KeyMoment-like dict, returns `(score, breakdown_dict)`. This separation enables easy unit testing (28 tests, runs in 0.03s).
|
||||
`score_moment()` in `backend/pipeline/highlight_scorer.py` is a **pure function** — no DB access, no side effects. Takes a KeyMoment-like dict and optional `word_timings` list, returns `(score, breakdown_dict)`. This separation enables easy unit testing (62 tests, runs in 0.09s).
|
||||
|
||||
### Word Timing Extraction
|
||||
|
||||
`extract_word_timings()` filters word-level timing dicts from transcript JSON by time window. Used by the Celery task to extract timings per KeyMoment before scoring.
|
||||
|
||||
## Frontend: Highlight Review Queue (M022/S01)
|
||||
|
||||
Route: `/creator/highlights` (JWT-protected, lazy-loaded)
|
||||
|
||||
### Components
|
||||
|
||||
- **Filter tabs** — All / Shorts / Approved / Rejected
|
||||
- **Candidate cards** — Key moment title, duration, composite score, status badge
|
||||
- **Score breakdown bars** — Visual bars for each of the 10 scoring dimensions (fetched lazily on expand)
|
||||
- **Action buttons** — Approve / Discard with ownership verification
|
||||
- **Inline trim panel** — Validated number inputs for trim_start / trim_end
|
||||
- **Sidebar link** — Star icon in creator dashboard SidebarNav
|
||||
|
||||
## Design Decisions
|
||||
|
||||
- **Pure function scoring** — No DB or side effects in `score_moment()`, enabling fast unit tests
|
||||
- **Piecewise linear duration** — Predictable behavior vs. Gaussian bell curve
|
||||
- **Named unique constraint** — `uq_highlight_candidate_moment` enables idempotent upserts via `ON CONFLICT`
|
||||
- **Lazy import** — `score_moment` imported inside Celery task to avoid circular imports at module load
|
||||
- **Neutral fallback at 0.5** — New audio dimensions don't penalize moments without word-level timing data (D041)
|
||||
- **Proportional weight reduction** — Original 7 dimensions reduced proportionally to make room for 0.22 audio weight
|
||||
- **Lazy detail fetch** — Score breakdown fetched on expand, not on list load (avoids N+1)
|
||||
- **Creator-scoped router** — Ownership verification pattern reusable for future creator endpoints
|
||||
|
||||
## Key Files
|
||||
|
||||
- `backend/pipeline/highlight_scorer.py` — Pure scoring function with 7 dimensions
|
||||
- `backend/pipeline/highlight_schemas.py` — Pydantic schemas (HighlightScoreBreakdown, HighlightCandidateResponse, HighlightBatchResult)
|
||||
- `backend/pipeline/highlight_scorer.py` — Pure scoring function with 10 dimensions, word timing extraction
|
||||
- `backend/pipeline/highlight_schemas.py` — Pydantic schemas (HighlightScoreBreakdown with 10 fields)
|
||||
- `backend/pipeline/stages.py` — `stage_highlight_detection` Celery task
|
||||
- `backend/routers/highlights.py` — 4 admin API endpoints
|
||||
- `backend/models.py` — HighlightCandidate model, HighlightStatus enum
|
||||
- `alembic/versions/019_add_highlight_candidates.py` — Migration
|
||||
- `backend/pipeline/test_highlight_scorer.py` — 28 unit tests
|
||||
- `backend/routers/creator_highlights.py` — 4 creator-scoped endpoints (M022/S01)
|
||||
- `backend/models.py` — HighlightCandidate model with trim columns
|
||||
- `alembic/versions/019_add_highlight_candidates.py` — Initial migration
|
||||
- `alembic/versions/021_add_highlight_trim_columns.py` — Trim columns migration
|
||||
- `backend/pipeline/test_highlight_scorer.py` — 62 unit tests
|
||||
- `frontend/src/pages/HighlightQueue.tsx` — Creator review queue page
|
||||
- `frontend/src/api/highlights.ts` — Highlight API client
|
||||
|
||||
---
|
||||
|
||||
*See also: [[Pipeline]], [[Data-Model]], [[API-Surface]]*
|
||||
*See also: [[Pipeline]], [[Data-Model]], [[API-Surface]], [[Frontend]]*
|
||||
|
|
|
|||
39
Home.md
39
Home.md
|
|
@ -8,12 +8,38 @@ Producers can search for specific techniques and find timestamped key moments, s
|
|||
|
||||
- [[Architecture]] — System architecture, Docker services, network topology
|
||||
- [[Data-Model]] — SQLAlchemy models, relationships, enums
|
||||
- [[API-Surface]] — All 41 API endpoints grouped by domain
|
||||
- [[API-Surface]] — All 60+ API endpoints grouped by domain
|
||||
- [[Frontend]] — Routes, components, hooks, CSS architecture
|
||||
- [[Pipeline]] — 6-stage LLM extraction pipeline, prompt system
|
||||
- [[Chat-Engine]] — Streaming Q&A with multi-turn memory
|
||||
- [[Highlights]] — 10-dimension highlight detection and review queue
|
||||
- [[Personality-Profiles]] — LLM-extracted creator teaching personality
|
||||
- [[Search-Retrieval]] — LightRAG + Qdrant retrieval cascade
|
||||
- [[Deployment]] — Docker Compose setup, rebuild commands
|
||||
- [[Development-Guide]] — Local dev setup, common gotchas
|
||||
- [[Decisions]] — Architectural decisions register (D001–D035)
|
||||
- [[Decisions]] — Architectural decisions register (D001–D041)
|
||||
|
||||
## Features
|
||||
|
||||
### Core
|
||||
- **Technique Pages** — LLM-synthesized study guides with v2 body sections, signal chains, citations
|
||||
- **Search** — LightRAG primary + Qdrant fallback with 4-tier creator-scoped cascade
|
||||
- **Pipeline** — 6-stage LLM extraction (transcripts → key moments → classification → synthesis → embedding)
|
||||
- **Player** — Audio player with chapter markers
|
||||
|
||||
### Creator Tools
|
||||
- **Follow System** — User-to-creator follows with follower counts (M022)
|
||||
- **Personality Profiles** — LLM-extracted teaching style, vocabulary, and tone analysis (M022)
|
||||
- **Creator Tiers** — Free/Pro/Premium tier configuration with Coming Soon placeholders (M022)
|
||||
- **Highlight Detection v2** — 10-dimension scoring with audio proxy signals, creator review queue (M022)
|
||||
- **Chat Widget** — Floating creator-scoped chat bubble with streaming SSE and citations (M022)
|
||||
- **Multi-Turn Chat Memory** — Redis-backed conversation history with conversation_id threading (M022)
|
||||
- **Creator Dashboard** — Video management, chapter editing, consent controls
|
||||
|
||||
### Platform
|
||||
- **Authentication** — JWT with invite codes, admin/creator roles
|
||||
- **Consent System** — Per-video granular consent with audit trail
|
||||
- **Impersonation** — Admin-to-creator context switching with audit log
|
||||
|
||||
## Current Scale
|
||||
|
||||
|
|
@ -31,16 +57,11 @@ Producers can search for specific techniques and find timestamped key moments, s
|
|||
| Database | PostgreSQL 16 |
|
||||
| Cache/Broker | Redis 7 |
|
||||
| Vector Store | Qdrant 1.13.2 |
|
||||
| RAG Framework | LightRAG + NetworkX |
|
||||
| Embeddings | Ollama (nomic-embed-text) |
|
||||
| LLM | OpenAI-compatible API (DGX Sparks Qwen primary, local Ollama fallback) |
|
||||
| Deployment | Docker Compose on ub01, nginx reverse proxy on nuc01 |
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2026-04-04 — M021 chat engine, retrieval cascade, highlights, audio mode, chapters, impersonation write mode*
|
||||
inx reverse proxy on nuc01 |
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2026-04-03 — M018/S02 initial bootstrap*
|
||||
” M018/S02 initial bootstrap*
|
||||
*Last updated: 2026-04-04 — M022 follow system, personality profiles, highlight v2, chat widget, multi-turn memory, creator tiers*
|
||||
|
|
|
|||
132
Personality-Profiles.md
Normal file
132
Personality-Profiles.md
Normal file
|
|
@ -0,0 +1,132 @@
|
|||
# Personality Profiles
|
||||
|
||||
LLM-powered extraction of creator teaching personality from transcript analysis. Added in M022/S06.
|
||||
|
||||
## Overview
|
||||
|
||||
Personality profiles capture each creator's distinctive teaching style — vocabulary patterns, tonal qualities, and stylistic markers — by analyzing their transcript corpus with a structured LLM extraction pipeline. Profiles are stored as JSONB on the Creator model and displayed on creator detail pages.
|
||||
|
||||
## Extraction Pipeline
|
||||
|
||||
### Transcript Sampling
|
||||
|
||||
Three-tier sampling strategy based on total transcript size:
|
||||
|
||||
| Tier | Condition | Strategy |
|
||||
|------|-----------|----------|
|
||||
| Small | < 20K chars | Use all transcript text |
|
||||
| Medium | 20K–60K chars | 300-character excerpts per key moment |
|
||||
| Large | > 60K chars | Topic-diverse random sampling via Redis classification data |
|
||||
|
||||
Large-tier sampling uses deterministic seeding and pulls from across topic categories to ensure the profile reflects the creator's full range, not just their most common topic.
|
||||
|
||||
### LLM Extraction
|
||||
|
||||
The prompt template at `prompts/personality_extraction.txt` instructs the LLM to analyze transcript excerpts and produce structured JSON. The LLM response is parsed and validated with a Pydantic model before storage.
|
||||
|
||||
**Celery task:** `extract_personality_profile` in `backend/pipeline/stages.py`
|
||||
- Joins KeyMoment → SourceVideo to load transcripts
|
||||
- Samples transcripts per the tier strategy
|
||||
- Calls LLM with `response_model=object` for JSON mode
|
||||
- Validates response with `PersonalityProfile` Pydantic model
|
||||
- Stores result as JSONB on Creator row
|
||||
- Emits pipeline_events for observability
|
||||
|
||||
### Error Handling
|
||||
|
||||
- Zero-transcript creators: early return, no profile
|
||||
- Invalid JSON from LLM: retry
|
||||
- Pydantic validation failure: retry
|
||||
- Pipeline events track start/complete/error
|
||||
|
||||
## PersonalityProfile Schema
|
||||
|
||||
Stored as `Creator.personality_profile` JSONB column. Nested structure:
|
||||
|
||||
### VocabularyProfile
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| signature_phrases | list[str] | Characteristic phrases the creator uses repeatedly |
|
||||
| jargon_level | str | How technical their language is (e.g., "high", "moderate") |
|
||||
| filler_words | list[str] | Common filler words/phrases |
|
||||
| distinctive_terms | list[str] | Unique terminology or coined phrases |
|
||||
|
||||
### ToneProfile
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| formality | str | Formal to casual spectrum |
|
||||
| energy | str | Energy level descriptor |
|
||||
| humor | str | Humor style/frequency |
|
||||
| teaching_style | str | Overall teaching approach |
|
||||
|
||||
### StyleMarkersProfile
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| explanation_approach | str | How they explain concepts |
|
||||
| analogies | bool | Whether they use analogies frequently |
|
||||
| sound_words | bool | Whether they use onomatopoeia / sound words |
|
||||
| audience_engagement | str | How they address / engage viewers |
|
||||
|
||||
### Metadata
|
||||
|
||||
Each profile includes extraction metadata:
|
||||
|
||||
| Field | Description |
|
||||
|-------|-------------|
|
||||
| extracted_at | ISO timestamp of extraction |
|
||||
| transcript_sample_size | Number of characters sampled |
|
||||
| model_used | LLM model identifier |
|
||||
|
||||
## API
|
||||
|
||||
### Admin Trigger
|
||||
|
||||
| Method | Path | Purpose |
|
||||
|--------|------|---------|
|
||||
| POST | `/api/v1/admin/creators/{slug}/extract-profile` | Queue personality extraction task |
|
||||
|
||||
Returns immediately — extraction runs asynchronously via Celery. Check `pipeline_events` for status.
|
||||
|
||||
### Creator Detail
|
||||
|
||||
`GET /api/v1/creators/{slug}` includes `personality_profile` field (null if not yet extracted).
|
||||
|
||||
## Frontend Component
|
||||
|
||||
`PersonalityProfile.tsx` — collapsible section on creator detail pages.
|
||||
|
||||
### Layout
|
||||
|
||||
- **Collapsible header** with chevron toggle (CSS `grid-template-rows: 0fr/1fr` animation)
|
||||
- **Three sub-cards:**
|
||||
- **Teaching Style** — formality, energy, humor, teaching_style, explanation_approach, audience_engagement
|
||||
- **Vocabulary** — jargon_level summary, signature_phrases pills, filler_words pills, distinctive_terms pills
|
||||
- **Style** — analogies (checkmark/cross), sound_words (checkmark/cross), summary paragraph
|
||||
- **Metadata footer** — extraction date, sample size
|
||||
|
||||
Handles null profiles gracefully (renders nothing).
|
||||
|
||||
## Key Files
|
||||
|
||||
- `prompts/personality_extraction.txt` — LLM prompt template
|
||||
- `backend/pipeline/stages.py` — `extract_personality_profile` Celery task, `_sample_creator_transcripts()` helper
|
||||
- `backend/schemas.py` — PersonalityProfile, VocabularyProfile, ToneProfile, StyleMarkersProfile Pydantic models
|
||||
- `backend/models.py` — Creator.personality_profile JSONB column
|
||||
- `backend/routers/admin.py` — POST /admin/creators/{slug}/extract-profile endpoint
|
||||
- `backend/routers/creators.py` — Passthrough in GET /creators/{slug}
|
||||
- `alembic/versions/023_add_personality_profile.py` — Migration
|
||||
- `frontend/src/components/PersonalityProfile.tsx` — Collapsible profile component
|
||||
- `frontend/src/api/creators.ts` — TypeScript interfaces for profile sub-objects
|
||||
|
||||
## Design Decisions
|
||||
|
||||
- **3-tier transcript sampling** — Balances coverage vs. token cost. Topic-diverse random sampling for large creators prevents profile skew toward dominant topic.
|
||||
- **Admin trigger endpoint** — On-demand extraction rather than automatic on ingest. Profiles are expensive (large LLM call) and only needed once per creator.
|
||||
- **JSONB storage** — Profile schema may evolve; JSONB avoids migration for every field change.
|
||||
|
||||
---
|
||||
|
||||
*See also: [[Data-Model]], [[API-Surface]], [[Frontend]], [[Pipeline]]*
|
||||
|
|
@ -14,6 +14,7 @@
|
|||
- [[Chat-Engine]]
|
||||
- [[Search-Retrieval]]
|
||||
- [[Highlights]]
|
||||
- [[Personality-Profiles]]
|
||||
|
||||
**Reference**
|
||||
- [[API-Surface]]
|
||||
|
|
|
|||
Loading…
Add table
Reference in a new issue