Highlight Detection
Heuristic scoring engine that ranks KeyMoment records into highlight candidates using 10 weighted dimensions. Originally added in M021/S04 with 7 dimensions, expanded to 10 in M022/S05.
Overview
Highlight detection scores every KeyMoment in a video to identify the most "highlightable" segments — moments that would work well as standalone clips or featured content. The scoring is a pure function (no ML model, no external API) based on 10 dimensions derived from existing KeyMoment metadata and word-level transcript timing data.
Scoring Dimensions
Total weight sums to 1.0. Each dimension produces a 0.0–1.0 score.
| Dimension | Weight | What It Measures |
|---|---|---|
| duration_fitness | 0.20 | Piecewise linear curve peaking at 30–60 seconds (ideal clip length) |
| content_type | 0.16 | Content type favorability: tutorial > tip > walkthrough > exploration |
| specificity_density | 0.16 | Regex-based counting of specific units, ratios, and named parameters in summary text |
| plugin_richness | 0.08 | Number of plugins/VSTs referenced (more = more actionable) |
| transcript_energy | 0.08 | Teaching-phrase detection in transcript text (e.g., "the trick is", "key thing") |
| source_quality | 0.08 | Source quality rating: high=1.0, medium=0.6, low=0.3 |
| video_type | 0.02 | Video type favorability mapping |
| speech_rate_variance | ~0.07 | Coefficient of variation of words-per-second in 5s sliding windows |
| pause_density | ~0.08 | Count and weight of inter-word gaps (>0.5s short, >1.0s long) |
| speaking_pace | ~0.07 | Bell-curve fitness around optimal 3–5 WPS teaching pace |
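The composite score can be read as a weighted sum of the ten dimension scores. The following is a minimal sketch of that idea, not the code in highlight_scorer.py: the name `composite_score` is hypothetical, and the three audio weights are given only approximately in the table, so the exact values below are assumptions chosen to make the total 1.0.

```python
import math

# Weights from the table above. The three audio weights are listed only
# approximately (~0.07, ~0.08, ~0.07); the exact values here are assumptions
# chosen so that the total is 1.0.
WEIGHTS = {
    "duration_fitness": 0.20,
    "content_type": 0.16,
    "specificity_density": 0.16,
    "plugin_richness": 0.08,
    "transcript_energy": 0.08,
    "source_quality": 0.08,
    "video_type": 0.02,
    "speech_rate_variance": 0.07,
    "pause_density": 0.08,
    "speaking_pace": 0.07,
}

# Sanity check: the weights should sum to 1.0 (within float tolerance).
assert math.isclose(sum(WEIGHTS.values()), 1.0)


def composite_score(dimension_scores: dict) -> float:
    """Weighted sum of per-dimension scores, each clamped to [0.0, 1.0]."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = dimension_scores[name]  # every dimension must be present
        total += weight * min(1.0, max(0.0, value))
    return total
```

Because the weights sum to 1.0 and each dimension is bounded by 1.0, the composite stays in the 0.0–1.0 range without any further normalization.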
Audio Proxy Dimensions (M022/S05)
The three new dimensions (speech_rate_variance, pause_density, speaking_pace) are derived from word-level transcript timing data — not raw audio. This provides meaningful speech-pattern signals without requiring librosa or audio processing dependencies.
Neutral fallback: When word_timings are unavailable (no word-level data in transcript), all three audio proxy dimensions default to 0.5 (neutral score). This preserves backward compatibility — existing scoring paths are unaffected. The weights of the original 7 dimensions were reduced proportionally to accommodate the new 0.22 total weight for audio dimensions (D041).
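As an illustration of how a timing-derived dimension and its neutral fallback might fit together, here is a rough sketch of a speech-rate-variance score. It assumes each word timing is a dict with a "start" key in seconds; the function name, the 1-second window step, and the clamp of the coefficient of variation to 1.0 are all illustrative choices, not details taken from the scorer.

```python
import statistics

NEUTRAL_SCORE = 0.5  # returned when word-level timing data is unavailable


def speech_rate_variance_sketch(word_timings, window_secs=5.0, step_secs=1.0):
    """Coefficient of variation of words-per-second over sliding windows.

    Assumes each item is a dict with a "start" time in seconds. The step
    size and the clamp to [0, 1] are illustrative assumptions.
    """
    if not word_timings:
        return NEUTRAL_SCORE

    starts = sorted(w["start"] for w in word_timings)
    end = starts[-1]

    # Count words in each full window [t, t + window_secs).
    rates = []
    t = starts[0]
    while t + window_secs <= end:
        count = sum(1 for s in starts if t <= s < t + window_secs)
        rates.append(count / window_secs)
        t += step_secs

    # Too little data to measure variation: fall back to neutral.
    if len(rates) < 2:
        return NEUTRAL_SCORE
    mean = statistics.fmean(rates)
    if mean == 0:
        return NEUTRAL_SCORE

    cv = statistics.pstdev(rates) / mean
    return min(1.0, cv)
```

Steady speech yields a value near 0, bursty speech a higher one, and a missing or empty timing list yields the neutral 0.5.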
Duration Fitness Curve
Uses piecewise linear (not Gaussian) for predictability:
- 0–10s → low score (too short)
- 10–30s → ramp up
- 30–60s → peak score (1.0)
- 60–120s → gradual decline
- 120s+ → low score (too long for a highlight)
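The curve above can be sketched as linear interpolation between breakpoints. The breakpoint durations and the 1.0 plateau come from the list; the floor value of 0.1 for "low score" and the name `duration_fitness_sketch` are assumptions made for illustration.

```python
# (duration_secs, score) breakpoints. The 30-60s plateau at 1.0 comes from the
# list above; the floor value of 0.1 is an assumed placeholder for "low score".
BREAKPOINTS = [(0.0, 0.1), (10.0, 0.1), (30.0, 1.0), (60.0, 1.0), (120.0, 0.1)]


def duration_fitness_sketch(duration_secs: float) -> float:
    """Piecewise linear interpolation between the breakpoints."""
    if duration_secs <= BREAKPOINTS[0][0]:
        return BREAKPOINTS[0][1]
    for (x0, y0), (x1, y1) in zip(BREAKPOINTS, BREAKPOINTS[1:]):
        if duration_secs <= x1:
            frac = (duration_secs - x0) / (x1 - x0)
            return y0 + frac * (y1 - y0)
    # Past the last breakpoint (120s+): stay at the floor.
    return BREAKPOINTS[-1][1]
```

Keeping the curve as a table of breakpoints makes each segment easy to test and to retune independently, which is the predictability argument for piecewise linear over a Gaussian.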
Data Model
HighlightCandidate
| Field | Type | Notes |
|---|---|---|
| id | UUID PK | |
| key_moment_id | FK → KeyMoment | Unique constraint (highlight_candidates_key_moment_id_key) |
| source_video_id | FK → SourceVideo | Indexed |
| score | Float | Composite score 0.0–1.0 |
| score_breakdown | JSONB | Per-dimension scores (10 fields) |
| duration_secs | Float | Cached from KeyMoment for display |
| status | Enum(HighlightStatus) | candidate / approved / rejected |
| trim_start | Float | Nullable — trim start offset in seconds (M022/S01) |
| trim_end | Float | Nullable — trim end offset in seconds (M022/S01) |
| created_at | Timestamp | |
| updated_at | Timestamp | |
HighlightStatus Enum
| Value | Meaning |
|---|---|
| candidate | Scored but not reviewed |
| approved | Admin-approved as a highlight |
| rejected | Admin-rejected |
Database Indexes
- source_video_id: filter by video
- score DESC: rank ordering
- status: filter by review state
Migrations
- 019_add_highlight_candidates.py: Creates table with indexes and unique constraint
- 021_add_highlight_trim_columns.py: Adds trim_start and trim_end columns (M022/S01)
API Endpoints
Admin Endpoints
All under /api/v1/admin/highlights/ (the paths below omit the /api/v1 prefix). Requires admin access.
| Method | Path | Purpose |
|---|---|---|
| POST | /admin/highlights/detect/{video_id} | Score all KeyMoments for a video, upsert candidates |
| POST | /admin/highlights/detect-all | Score all videos (triggers Celery tasks) |
| GET | /admin/highlights/candidates | Paginated candidate list, sorted by score DESC |
| GET | /admin/highlights/candidates/{id} | Single candidate with full score_breakdown |
Creator Endpoints (M022/S01)
Creator-scoped highlight review. Requires JWT auth with creator ownership verification.
| Method | Path | Purpose |
|---|---|---|
| GET | /api/v1/creator/highlights | List highlights for authenticated creator (status/shorts_only filters, score DESC) |
| GET | /api/v1/creator/highlights/{id} | Detail with score_breakdown and key_moment |
| PATCH | /api/v1/creator/highlights/{id}/status | Update status (approve/reject) with ownership verification |
| PATCH | /api/v1/creator/highlights/{id}/trim | Update trim_start/trim_end (validation: non-negative, start < end) |
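The trim validation rule (non-negative, start < end) might look like the sketch below. The function name and the choice to allow None are assumptions, the latter based on both columns being nullable; the actual endpoint may well enforce this in a Pydantic schema instead.

```python
from typing import Optional


def validate_trim(trim_start: Optional[float], trim_end: Optional[float]) -> None:
    """Raise ValueError unless the offsets are non-negative and start < end.

    None is accepted for either value because both columns are nullable
    (an assumption about how the endpoint treats partial updates).
    """
    for name, value in (("trim_start", trim_start), ("trim_end", trim_end)):
        if value is not None and value < 0:
            raise ValueError(f"{name} must be non-negative")
    if trim_start is not None and trim_end is not None and trim_start >= trim_end:
        raise ValueError("trim_start must be less than trim_end")
```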
Score Breakdown Response
{
"duration_fitness": 0.95,
"content_type_weight": 0.80,
"specificity_density": 0.72,
"plugin_richness": 0.60,
"transcript_energy": 0.85,
"source_quality_weight": 1.00,
"video_type_weight": 0.50,
"speech_rate_variance_score": 0.057,
"pause_density_score": 0.0,
"speaking_pace_score": 1.0
}
Pipeline Integration
Celery Task: stage_highlight_detection
- Binding: bind=True, max_retries=3
- Session: Uses _get_sync_session (sync SQLAlchemy, per D004)
- Flow: Load KeyMoments for video → load transcript JSON → extract word timings per moment → score each via score_moment() → bulk upsert via INSERT ON CONFLICT on constraint highlight_candidates_key_moment_id_key
- Transcript handling: Loads transcript JSON once per video via SourceVideo.transcript_path. Accepts both {segments: [...]} and bare [...] JSON formats.
- Fallback: If transcript is missing or malformed, word_timings=None and scorer uses neutral values for audio dimensions
- Events: Emits pipeline_events rows for start/complete/error with candidate count in payload
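The transcript handling and fallback described above can be sketched as a small normalizer that accepts either JSON shape and returns None for anything malformed. The name `normalize_transcript_sketch` is hypothetical, and it takes the raw JSON text rather than a file path to keep the example self-contained.

```python
import json
from typing import Optional


def normalize_transcript_sketch(raw_json: str) -> Optional[list]:
    """Return the segment list from either supported shape, else None.

    Supported shapes: {"segments": [...]} or a bare [...] list. A None result
    signals the caller to score with word_timings=None (neutral audio scores).
    """
    try:
        data = json.loads(raw_json)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if isinstance(data, dict):
        data = data.get("segments")
    return data if isinstance(data, list) else None
```

Returning None instead of raising keeps a bad transcript from failing the whole task; the moment is still scored on the seven metadata dimensions.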
Scoring Function
score_moment() in backend/pipeline/highlight_scorer.py is a pure function — no DB access, no side effects. Takes a KeyMoment-like dict and optional word_timings list, returns (score, breakdown_dict). This separation enables easy unit testing (62 tests, runs in 0.09s).
Word Timing Extraction
extract_word_timings() filters word-level timing dicts from transcript JSON by time window. Used by the Celery task to extract timings per KeyMoment before scoring.
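A time-window filter of this kind could be sketched as follows. The transcript schema is an assumption: each segment is taken to be a dict with an optional "words" list of dicts carrying "start" and "end" times in seconds, and the real extract_word_timings() signature and boundary handling may differ.

```python
def extract_word_timings_sketch(segments, start_secs, end_secs):
    """Collect word dicts whose start time falls in [start_secs, end_secs).

    Assumes each segment is a dict with an optional "words" list of
    {"word", "start", "end"} dicts; the real transcript schema may differ.
    """
    words = []
    for segment in segments or []:
        for word in segment.get("words") or []:
            start = word.get("start")
            if start is not None and start_secs <= start < end_secs:
                words.append(word)
    return words
```

An empty result can then be passed to the scorer as missing timing data, so moments without word-level coverage fall back to the neutral audio scores.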
Frontend: Highlight Review Queue (M022/S01)
Route: /creator/highlights (JWT-protected, lazy-loaded)
Components
- Filter tabs — All / Shorts / Approved / Rejected
- Candidate cards — Key moment title, duration, composite score, status badge
- Score breakdown bars — Visual bars for each of the 10 scoring dimensions (fetched lazily on expand)
- Action buttons — Approve / Discard with ownership verification
- Inline trim panel — Validated number inputs for trim_start / trim_end
- Sidebar link — Star icon in creator dashboard SidebarNav
Design Decisions
- Pure function scoring: No DB or side effects in score_moment(), enabling fast unit tests
- Piecewise linear duration: Predictable behavior vs. Gaussian bell curve
- Neutral fallback at 0.5: New audio dimensions don't penalize moments without word-level timing data (D041)
- Proportional weight reduction: Original 7 dimensions reduced proportionally to make room for 0.22 audio weight
- Lazy detail fetch: Score breakdown fetched on expand, not on list load (avoids N+1)
- Creator-scoped router: Ownership verification pattern reusable for future creator endpoints
Key Files
- backend/pipeline/highlight_scorer.py: Pure scoring function with 10 dimensions, word timing extraction
- backend/pipeline/highlight_schemas.py: Pydantic schemas (HighlightScoreBreakdown with 10 fields)
- backend/pipeline/stages.py: stage_highlight_detection Celery task
- backend/routers/highlights.py: 4 admin API endpoints
- backend/routers/creator_highlights.py: 4 creator-scoped endpoints (M022/S01)
- backend/models.py: HighlightCandidate model with trim columns
- alembic/versions/019_add_highlight_candidates.py: Initial migration
- alembic/versions/021_add_highlight_trim_columns.py: Trim columns migration
- backend/pipeline/test_highlight_scorer.py: 62 unit tests
- frontend/src/pages/HighlightQueue.tsx: Creator review queue page
- frontend/src/api/highlights.ts: Highlight API client
See also: Pipeline, Data-Model, API-Surface, Frontend