jlightner edited this page 2026-04-04 03:44:23 -05:00

Personality Profiles

LLM-powered extraction of creator teaching personality from transcript analysis. Added in M022/S06.

Overview

Personality profiles capture each creator's distinctive teaching style — vocabulary patterns, tonal qualities, and stylistic markers — by analyzing their transcript corpus with a structured LLM extraction pipeline. Profiles are stored as JSONB on the Creator model and displayed on creator detail pages.

Extraction Pipeline

Transcript Sampling

Three-tier sampling strategy based on total transcript size:

Tier Condition Strategy
Small < 20K chars Use all transcript text
Medium 20K–60K chars 300-character excerpts per key moment
Large > 60K chars Topic-diverse random sampling via Redis classification data

Large-tier sampling uses deterministic seeding and pulls from across topic categories to ensure the profile reflects the creator's full range, not just their most common topic.
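The tier logic above can be sketched as follows. The size thresholds come from the table; the function name, excerpt count, and the choice of creator ID as the deterministic seed are illustrative assumptions — the real helper is _sample_creator_transcripts() in backend/pipeline/stages.py.

```python
import random

SMALL_MAX = 20_000    # below this: use everything
LARGE_MIN = 60_000    # above this: topic-diverse random sampling
EXCERPT_LEN = 300     # per-moment excerpt length (medium/large tiers)

def sample_transcripts(transcripts: list[str], creator_id: int) -> str:
    """Pick transcript text for LLM extraction based on corpus size."""
    total = sum(len(t) for t in transcripts)
    if total < SMALL_MAX:
        # Small tier: the whole corpus fits the token budget.
        return "\n\n".join(transcripts)
    if total <= LARGE_MIN:
        # Medium tier: a fixed-length excerpt per key moment.
        return "\n\n".join(t[:EXCERPT_LEN] for t in transcripts)
    # Large tier: deterministic seed so repeated runs sample the same
    # excerpts; the real task additionally spreads picks across topic
    # categories using Redis classification data.
    rng = random.Random(creator_id)
    picked = rng.sample(transcripts, k=min(len(transcripts), 20))
    return "\n\n".join(t[:EXCERPT_LEN] for t in picked)
```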

LLM Extraction

The prompt template at prompts/personality_extraction.txt instructs the LLM to analyze transcript excerpts and produce structured JSON. The LLM response is parsed and validated with a Pydantic model before storage.

Celery task: extract_personality_profile in backend/pipeline/stages.py

  • Joins KeyMoment → SourceVideo to load transcripts
  • Samples transcripts per the tier strategy
  • Calls LLM with response_model=object for JSON mode
  • Validates response with PersonalityProfile Pydantic model
  • Stores result as JSONB on Creator row
  • Emits pipeline_events for observability

Error Handling

  • Zero-transcript creators: early return, no profile
  • Invalid JSON from LLM: retry
  • Pydantic validation failure: retry
  • Pipeline events track start/complete/error

PersonalityProfile Schema

Stored as Creator.personality_profile JSONB column. Nested structure:

VocabularyProfile

Field Type Description
signature_phrases list[str] Characteristic phrases the creator uses repeatedly
jargon_level str How technical their language is (e.g., "high", "moderate")
filler_words list[str] Common filler words/phrases
distinctive_terms list[str] Unique terminology or coined phrases

ToneProfile

Field Type Description
formality str Formal to casual spectrum
energy str Energy level descriptor
humor str Humor style/frequency
teaching_style str Overall teaching approach

StyleMarkersProfile

Field Type Description
explanation_approach str How they explain concepts
analogies bool Whether they use analogies frequently
sound_words bool Whether they use onomatopoeia / sound words
audience_engagement str How they address / engage viewers

Metadata

Each profile includes extraction metadata:

Field Description
extracted_at ISO timestamp of extraction
transcript_sample_size Number of characters sampled
model_used LLM model identifier
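Putting the tables above together, the nested schema might look like the following Pydantic sketch. Field names follow the tables; exact types, optionality, and the nesting key names in backend/schemas.py may differ.

```python
from datetime import datetime

from pydantic import BaseModel, ConfigDict

class VocabularyProfile(BaseModel):
    signature_phrases: list[str]
    jargon_level: str
    filler_words: list[str]
    distinctive_terms: list[str]

class ToneProfile(BaseModel):
    formality: str
    energy: str
    humor: str
    teaching_style: str

class StyleMarkersProfile(BaseModel):
    explanation_approach: str
    analogies: bool
    sound_words: bool
    audience_engagement: str

class PersonalityProfile(BaseModel):
    # Allow the "model_used" field despite Pydantic's "model_" namespace.
    model_config = ConfigDict(protected_namespaces=())

    vocabulary: VocabularyProfile
    tone: ToneProfile
    style_markers: StyleMarkersProfile
    extracted_at: datetime
    transcript_sample_size: int
    model_used: str
```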

API

Admin Trigger

Method Path Purpose
POST /api/v1/admin/creators/{slug}/extract-profile Queue personality extraction task

Returns immediately — extraction runs asynchronously via Celery. Check pipeline_events for status.
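A minimal client sketch for queuing extraction; the endpoint path matches the table above, but the base URL and bearer-token auth scheme are placeholder assumptions about your deployment:

```python
def queue_extraction(base_url: str, slug: str, token: str,
                     session=None) -> int:
    """POST the admin trigger and return the HTTP status code.

    A 2xx response means "task queued", not "profile ready" —
    poll pipeline_events for extraction progress.
    """
    if session is None:
        import requests  # third-party; any HTTP client works
        session = requests.Session()
    resp = session.post(
        f"{base_url}/api/v1/admin/creators/{slug}/extract-profile",
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,
    )
    return resp.status_code
```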

Creator Detail

GET /api/v1/creators/{slug} includes personality_profile field (null if not yet extracted).

Frontend Component

PersonalityProfile.tsx — collapsible section on creator detail pages.

Layout

  • Collapsible header with chevron toggle (CSS grid-template-rows: 0fr/1fr animation)
  • Three sub-cards:
    • Teaching Style — formality, energy, humor, teaching_style, explanation_approach, audience_engagement
    • Vocabulary — jargon_level summary, signature_phrases pills, filler_words pills, distinctive_terms pills
    • Style — analogies (checkmark/cross), sound_words (checkmark/cross), summary paragraph
  • Metadata footer — extraction date, sample size

Handles null profiles gracefully (renders nothing).

Key Files

  • prompts/personality_extraction.txt — LLM prompt template
  • backend/pipeline/stages.py — extract_personality_profile Celery task, _sample_creator_transcripts() helper
  • backend/schemas.py — PersonalityProfile, VocabularyProfile, ToneProfile, StyleMarkersProfile Pydantic models
  • backend/models.py — Creator.personality_profile JSONB column
  • backend/routers/admin.py — POST /admin/creators/{slug}/extract-profile endpoint
  • backend/routers/creators.py — Passthrough in GET /creators/{slug}
  • alembic/versions/023_add_personality_profile.py — Migration
  • frontend/src/components/PersonalityProfile.tsx — Collapsible profile component
  • frontend/src/api/creators.ts — TypeScript interfaces for profile sub-objects

Design Decisions

  • 3-tier transcript sampling — Balances coverage vs. token cost. Topic-diverse random sampling for large creators prevents profile skew toward dominant topic.
  • Admin trigger endpoint — On-demand extraction rather than automatic on ingest. Profiles are expensive (large LLM call) and only needed once per creator.
  • JSONB storage — Profile schema may evolve; JSONB avoids migration for every field change.

See also: Data-Model, API-Surface, Frontend, Pipeline