# Personality Profiles

LLM-powered extraction of creator teaching personality from transcript analysis. Added in M022/S06.
## Overview
Personality profiles capture each creator's distinctive teaching style — vocabulary patterns, tonal qualities, and stylistic markers — by analyzing their transcript corpus with a structured LLM extraction pipeline. Profiles are stored as JSONB on the Creator model and displayed on creator detail pages.
## Extraction Pipeline

### Transcript Sampling
Three-tier sampling strategy based on total transcript size:
| Tier | Condition | Strategy |
|---|---|---|
| Small | < 20K chars | Use all transcript text |
| Medium | 20K–60K chars | 300-character excerpts per key moment |
| Large | > 60K chars | Topic-diverse random sampling via Redis classification data |
Large-tier sampling uses deterministic seeding and pulls from across topic categories to ensure the profile reflects the creator's full range, not just their most common topic.
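The tier selection and deterministic large-tier sampling described above could be sketched as follows. All function and parameter names here are illustrative, not the actual helpers in `backend/pipeline/stages.py`:

```python
import random


def choose_sampling_tier(total_chars: int) -> str:
    """Map total transcript size to a sampling tier (thresholds from the table above)."""
    if total_chars < 20_000:
        return "small"
    if total_chars <= 60_000:
        return "medium"
    return "large"


def sample_large_tier(
    excerpts_by_topic: dict[str, list[str]],
    creator_id: int,
    per_topic: int = 2,
) -> list[str]:
    """Topic-diverse sampling with a deterministic seed so re-runs are reproducible.

    Iterating topics in sorted order plus seeding the RNG with the creator ID
    makes the sample stable across runs for the same creator.
    """
    rng = random.Random(creator_id)  # deterministic seed
    sampled: list[str] = []
    for topic in sorted(excerpts_by_topic):
        pool = excerpts_by_topic[topic]
        sampled.extend(rng.sample(pool, min(per_topic, len(pool))))
    return sampled
```

Seeding with the creator ID (rather than the wall clock) is what makes repeated extractions comparable; the real classification data comes from Redis.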
### LLM Extraction

The prompt template at `prompts/personality_extraction.txt` instructs the LLM to analyze transcript excerpts and produce structured JSON. The LLM response is parsed and validated with a Pydantic model before storage.
Celery task: `extract_personality_profile` in `backend/pipeline/stages.py`:

- Joins KeyMoment → SourceVideo to load transcripts
- Samples transcripts per the tier strategy
- Calls the LLM with `response_model=object` for JSON mode
- Validates the response with the `PersonalityProfile` Pydantic model
- Stores the result as JSONB on the Creator row
- Emits pipeline_events for observability
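The steps above can be sketched as a plain function with its collaborators injected. Every name here is illustrative (the real task is a Celery task in `backend/pipeline/stages.py`), but the control flow mirrors the list above:

```python
def extract_personality_profile_flow(
    creator_id,
    load_transcripts,  # KeyMoment → SourceVideo join
    sample,            # tier-based sampling
    call_llm,          # JSON-mode LLM call
    validate,          # Pydantic validation in the real code
    store,             # JSONB write on the Creator row
    emit,              # pipeline_events for observability
):
    """Sketch of the extraction task's control flow with injected collaborators."""
    emit(creator_id, "start")
    transcripts = load_transcripts(creator_id)
    if not transcripts:
        emit(creator_id, "complete")  # zero-transcript early return, no profile
        return None
    excerpt = sample(transcripts)
    raw = call_llm(excerpt)
    profile = validate(raw)
    store(creator_id, profile)
    emit(creator_id, "complete")
    return profile
```

Injecting the collaborators keeps the flow unit-testable without a broker, database, or LLM call.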
### Error Handling
- Zero-transcript creators: early return, no profile
- Invalid JSON from LLM: retry
- Pydantic validation failure: retry
- Pipeline events track start/complete/error
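A minimal stand-in for the parse-and-validate step, using stdlib `json` plus a key check in place of the real Pydantic model. The top-level key names are an assumption for illustration; raising `ValueError` models the "retry" paths above, where the caller re-queues the task:

```python
import json

# Hypothetical top-level keys, standing in for the PersonalityProfile schema.
REQUIRED_KEYS = {"vocabulary", "tone", "style_markers"}


def validate_profile(raw: str) -> dict:
    """Parse the LLM response; raise ValueError on bad JSON or a bad shape
    so the surrounding task can retry."""
    try:
        profile = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON from LLM: {exc}") from exc
    if not isinstance(profile, dict):
        raise ValueError("LLM response is not a JSON object")
    missing = REQUIRED_KEYS - profile.keys()
    if missing:
        raise ValueError(f"profile missing keys: {sorted(missing)}")
    return profile
```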
## PersonalityProfile Schema

Stored in the `Creator.personality_profile` JSONB column. Nested structure:
### VocabularyProfile
| Field | Type | Description |
|---|---|---|
| signature_phrases | list[str] | Characteristic phrases the creator uses repeatedly |
| jargon_level | str | How technical their language is (e.g., "high", "moderate") |
| filler_words | list[str] | Common filler words/phrases |
| distinctive_terms | list[str] | Unique terminology or coined phrases |
### ToneProfile
| Field | Type | Description |
|---|---|---|
| formality | str | Formal to casual spectrum |
| energy | str | Energy level descriptor |
| humor | str | Humor style/frequency |
| teaching_style | str | Overall teaching approach |
### StyleMarkersProfile
| Field | Type | Description |
|---|---|---|
| explanation_approach | str | How they explain concepts |
| analogies | bool | Whether they use analogies frequently |
| sound_words | bool | Whether they use onomatopoeia / sound words |
| audience_engagement | str | How they address / engage viewers |
### Metadata
Each profile includes extraction metadata:
| Field | Description |
|---|---|
| extracted_at | ISO timestamp of extraction |
| transcript_sample_size | Number of characters sampled |
| model_used | LLM model identifier |
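The field tables above map directly onto typed models. A dataclass sketch, mirroring the tables (the real code uses Pydantic models in `backend/schemas.py`, and the defaults here are invented for illustration):

```python
from dataclasses import dataclass, field


@dataclass
class VocabularyProfile:
    signature_phrases: list[str] = field(default_factory=list)
    jargon_level: str = "moderate"  # e.g. "high", "moderate"
    filler_words: list[str] = field(default_factory=list)
    distinctive_terms: list[str] = field(default_factory=list)


@dataclass
class ToneProfile:
    formality: str = ""       # formal-to-casual spectrum
    energy: str = ""          # energy level descriptor
    humor: str = ""           # humor style/frequency
    teaching_style: str = ""  # overall teaching approach


@dataclass
class StyleMarkersProfile:
    explanation_approach: str = ""
    analogies: bool = False    # frequent analogy use
    sound_words: bool = False  # onomatopoeia / sound words
    audience_engagement: str = ""
```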
## API

### Admin Trigger
| Method | Path | Purpose |
|---|---|---|
| POST | /api/v1/admin/creators/{slug}/extract-profile | Queue personality extraction task |
Returns immediately; extraction runs asynchronously via Celery. Check `pipeline_events` for status.
### Creator Detail

`GET /api/v1/creators/{slug}` includes a `personality_profile` field (`null` if not yet extracted).
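An illustrative (not verbatim) response shape for an extracted profile; all values, and the nested key names themselves, are invented for the example:

```json
{
  "slug": "example-creator",
  "personality_profile": {
    "vocabulary": {
      "signature_phrases": ["let's dig in"],
      "jargon_level": "high",
      "filler_words": ["you know"],
      "distinctive_terms": []
    },
    "tone": {
      "formality": "casual",
      "energy": "high",
      "humor": "frequent",
      "teaching_style": "hands-on"
    },
    "style_markers": {
      "explanation_approach": "example-first",
      "analogies": true,
      "sound_words": false,
      "audience_engagement": "direct address"
    },
    "extracted_at": "2025-01-01T00:00:00Z",
    "transcript_sample_size": 42000,
    "model_used": "example-model"
  }
}
```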
## Frontend Component

`PersonalityProfile.tsx` renders a collapsible section on creator detail pages.
### Layout

- Collapsible header with chevron toggle (CSS `grid-template-rows: 0fr`/`1fr` animation)
- Three sub-cards:
  - Teaching Style — formality, energy, humor, teaching_style, explanation_approach, audience_engagement
  - Vocabulary — jargon_level summary, signature_phrases pills, filler_words pills, distinctive_terms pills
  - Style — analogies (checkmark/cross), sound_words (checkmark/cross), summary paragraph
- Metadata footer — extraction date, sample size
Handles null profiles gracefully (renders nothing).
## Key Files

- `prompts/personality_extraction.txt` — LLM prompt template
- `backend/pipeline/stages.py` — `extract_personality_profile` Celery task, `_sample_creator_transcripts()` helper
- `backend/schemas.py` — `PersonalityProfile`, `VocabularyProfile`, `ToneProfile`, `StyleMarkersProfile` Pydantic models
- `backend/models.py` — `Creator.personality_profile` JSONB column
- `backend/routers/admin.py` — POST /admin/creators/{slug}/extract-profile endpoint
- `backend/routers/creators.py` — passthrough in GET /creators/{slug}
- `alembic/versions/023_add_personality_profile.py` — migration
- `frontend/src/components/PersonalityProfile.tsx` — collapsible profile component
- `frontend/src/api/creators.ts` — TypeScript interfaces for profile sub-objects
## Design Decisions
- 3-tier transcript sampling — Balances coverage vs. token cost. Topic-diverse random sampling for large creators prevents profile skew toward dominant topic.
- Admin trigger endpoint — On-demand extraction rather than automatic on ingest. Profiles are expensive (large LLM call) and only needed once per creator.
- JSONB storage — Profile schema may evolve; JSONB avoids migration for every field change.
See also: Data-Model, API-Surface, Frontend, Pipeline