7 Data Model
jlightner edited this page 2026-04-04 10:31:50 -05:00
This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

Data Model

23 SQLAlchemy models in backend/models.py.

Entity Relationship Overview

Creator (1) ──→ (N) SourceVideo (1) ──→ (N) TranscriptSegment
    │                    │
    │                    ├──→ (N) KeyMoment
    │                    │
    │                    └──→ (0..1) VideoConsent ──→ (N) ConsentAuditLog
    │
    ├──→ (N) TechniquePage (M) ←──→ (N) Tag
    │           │
    │           ├──→ (N) TechniquePageVersion
    │           ├──→ (N) RelatedTechniqueLink
    │           └──→ (M:N) SourceVideo  (via TechniquePageVideo)
    │
    ├──→ (N) CreatorFollow ←── User
    │
    ├──→ (N) Post (1) ──→ (N) PostAttachment
    │
    └──→ (0..1) User ──→ (N) InviteCode (created_by)

HighlightCandidate (1) ──→ (N) GeneratedShort

Core Content Models

Creator

Field Type Notes
id UUID PK
name String Unique, from folder name
slug String URL-safe, unique
genres ARRAY(String) e.g. ["dubstep", "sound design"]
avatar_url String Optional
bio Text Admin-editable
social_links JSONB Platform → URL mapping
featured Boolean For homepage spotlight
personality_profile JSONB LLM-extracted personality data (M022/S06). See Personality-Profiles
shorts_template JSONB Nullable — intro/outro card config with parse_template_config normalizer (M024/S04)

SourceVideo

Field Type Notes
id UUID PK
creator_id FK → Creator
filename String Original video filename
youtube_url String Optional
folder_name String Filesystem folder name
processing_status Enum queued / in_progress / complete / errored / revoked
pipeline_stage Integer Current/last completed stage (1-6)

TranscriptSegment

Field Type Notes
id UUID PK
source_video_id FK → SourceVideo
start_time Float Seconds
end_time Float Seconds
text Text Segment transcript text

KeyMoment

Field Type Notes
id UUID PK
source_video_id FK → SourceVideo
title String
summary Text
start_time Float Seconds
end_time Float Seconds
topic_category String e.g. "Sound Design"
topic_tags ARRAY(String)
content_type Enum tutorial / tip / exploration / walkthrough
review_status String pending / approved / rejected
sort_order Integer Display ordering within video (M021/S06)

TechniquePage

Field Type Notes
id UUID PK
creator_id FK → Creator
title String
slug String Unique, URL-safe
summary Text
body_sections JSONB v1: dict, v2: list-of-objects with nesting (D024)
body_sections_format String "v1" or "v2" — format discriminator
signal_chains JSONB Signal flow descriptions
plugins ARRAY(String) Referenced plugins/VSTs
topic_category String
topic_tags ARRAY(String)
source_quality Enum high / medium / low
view_count Integer

TechniquePageVersion

Field Type Notes
id UUID PK
technique_page_id FK → TechniquePage
version_number Integer Sequential
content_snapshot JSONB Full page state at version time
pipeline_metadata JSONB Prompt SHA-256 hashes, model config

HighlightCandidate

Field Type Notes
id UUID PK
key_moment_id FK → KeyMoment Unique constraint
source_video_id FK → SourceVideo Indexed
score Float Composite score 0.01.0
score_breakdown JSONB Per-dimension scores (10 fields, see Highlights)
duration_secs Float Cached from KeyMoment
status Enum(HighlightStatus) candidate / approved / rejected
trim_start Float Nullable — trim offset in seconds (M022/S01)
trim_end Float Nullable — trim offset in seconds (M022/S01)
created_at Timestamp
updated_at Timestamp

CreatorFollow (M022/S02)

Field Type Notes
id UUID PK
user_id FK → User Part of unique constraint
creator_id FK → Creator Part of unique constraint
created_at Timestamp

Unique constraint on (user_id, creator_id). Idempotent follow via INSERT ON CONFLICT DO NOTHING.

Authentication & User Models

User

Field Type Notes
id UUID PK
email String(255) Unique
hashed_password String(255) bcrypt hash
display_name String(255)
role Enum(UserRole) admin / creator (default: creator)
creator_id FK → Creator Optional — links user to a creator profile
is_active Boolean Default true
created_at Timestamp
updated_at Timestamp

InviteCode

Field Type Notes
id UUID PK
code String(100) Unique
uses_remaining Integer Default 1 — decremented on each registration
created_by FK → User Optional — admin who created the code
expires_at Timestamp Optional — null means no expiry
created_at Timestamp

VideoConsent

Per-video consent state. One row per video, mutable. Full change history in ConsentAuditLog.

Field Type Notes
id UUID PK
source_video_id FK → SourceVideo Unique constraint
creator_id FK → Creator
kb_inclusion Boolean Default false — allow KB indexing
training_usage Boolean Default false — allow training use
public_display Boolean Default true — allow public display
updated_by FK → User Last user to modify
created_at Timestamp
updated_at Timestamp

ConsentAuditLog

Append-only versioned record of per-field consent changes.

Field Type Notes
id UUID PK
video_consent_id FK → VideoConsent Indexed
version Integer Sequential per video_consent_id
field_name String(50) ConsentField enum value
old_value Boolean Nullable (null on first set)
new_value Boolean
changed_by FK → User
ip_address String(45) Client IP at time of change
created_at Timestamp

Supporting Models

Model Purpose
RelatedTechniqueLink Directed link between technique pages (source → target with label)
Tag Normalized tag with M:N join to TechniquePage via technique_page_tags
TechniquePageVideo Join table: TechniquePage ↔ SourceVideo (multi-source pages)
ContentReport User-submitted content reports with status workflow (open/acknowledged/resolved/dismissed)
SearchLog Query logging for popular searches feature (D025)
PipelineRun Pipeline execution tracking per video with status and trigger type
PipelineEvent Granular pipeline stage events with token counts and JSONB payload

Enums

Enum Values
ContentType tutorial, tip, exploration, walkthrough
ProcessingStatus queued, in_progress, complete, errored, revoked
KeyMomentContentType technique, concept, workflow, reference
SourceQuality high, medium, low
RelationshipType related, prerequisite, builds_on
ReportType inaccuracy, missing_info, offensive, other
ReportStatus open, acknowledged, resolved, dismissed
PipelineRunStatus pending, running, completed, failed, revoked
PipelineRunTrigger auto, manual, retrigger, clean_retrigger
UserRole admin, creator
ConsentField kb_inclusion, training_usage, public_display
HighlightStatus candidate, approved, rejected (M021/S04)
ChapterStatus draft, approved, hidden (M021/S06)
FormatPreset vertical, square, horizontal (M023/S03)
ShortStatus pending, processing, complete, failed (M023/S03)

Migrations

Migration Description
019 Add highlight_candidates table
021 Add trim_start/trim_end to highlight_candidates (M022/S01)
022 Add creator_follows table (M022/S02)
023 Add personality_profile JSONB to creators (M022/S06)
024 Add posts and post_attachments tables (M023/S01)
025 Add generated_shorts table with FormatPreset and ShortStatus enums (M023/S03)
026 Add share_token to generated_shorts + backfill existing complete shorts + unique index (M024/S01)
027 Add captions_enabled boolean to generated_shorts (M024/S04)
028 Add shorts_template JSONB to creators (M024/S04)

Schema Notes

  • body_sections_format discriminator enables v1/v2 format coexistence (D024)
  • topic_category casing is inconsistent across records (e.g., "Sound design" vs "Sound Design") — known data quality issue
  • Stage 4 classification data (per-moment topic_tags) stored in Redis with 24h TTL, not DB columns
  • Timestamp convention: datetime.now(timezone.utc).replace(tzinfo=None) — asyncpg rejects timezone-aware datetimes for TIMESTAMP WITHOUT TIME ZONE columns (D002)
  • User passwords are stored as bcrypt hashes via bcrypt.hashpw()
  • Consent audit uses version numbers assigned in application code (max(version) + 1 per video_consent_id)

M025 Models

EmailDigestLog (M025/S01)

Field Type Notes
id UUID PK
user_id FK -> User
content_summary JSONB Digest content grouped by creator
sent_at Timestamp

Tracks every successful email send for deduplication. The digest task checks MAX(sent_at) per user to find new content since last send.

ChatUsageLog (M025/S04)

Field Type Notes
id UUID PK
user_id FK -> User Nullable (anonymous users)
client_ip String For IP-based rate limiting
creator_slug String Nullable
query Text User's chat query
prompt_tokens Integer
completion_tokens Integer
total_tokens Integer
cascade_tier String creator / domain / global / none
model String Actual model used (primary or fallback)
latency_ms Integer End-to-end response time
created_at Timestamp Indexed for time-range aggregation

Non-blocking INSERT -- errors logged but never block the SSE response.

User Model Changes (M025/S01, S03)

Field Type Notes
notification_preferences JSONB `{"digest_enabled": bool, "digest_frequency": "daily"
onboarding_completed Boolean Default false; set true via POST /auth/onboarding-complete

See also: Architecture, API-Surface, Pipeline, Authentication

Post (M023/S01)

Field Type Notes
id UUID PK
title Text Post title
body JSONB Tiptap rich text JSON (canonical format)
is_published Boolean Default false — draft until published
creator_id FK → Creator
created_at Timestamp
updated_at Timestamp

PostAttachment (M023/S01)

Field Type Notes
id UUID PK
post_id FK → Post Cascade delete
filename String Original upload filename
content_type String MIME type
object_key String MinIO object key
file_size Integer Bytes
created_at Timestamp

GeneratedShort (M023/S03)

Field Type Notes
id UUID PK
highlight_candidate_id FK → HighlightCandidate
preset Enum(FormatPreset) vertical / square / horizontal
status Enum(ShortStatus) pending / processing / complete / failed
object_key String MinIO key (shorts/{highlight_id}/{preset}.mp4)
file_size Integer Nullable — bytes
duration_secs Float Nullable
error_message Text Nullable — on failure
share_token String(16) Nullable, unique-indexed — generated via secrets.token_urlsafe(8) at pipeline completion (M024/S01)
captions_enabled Boolean Default false — indicates successful ASS subtitle generation (M024/S04)
created_at Timestamp
updated_at Timestamp