jlightner edited this page 2026-04-04 10:34:35 -05:00

Newcomer Guide

Welcome to Chrysopedia — this guide takes you from zero to productive with the entire platform.

What is Chrysopedia

Chrysopedia is an AI-powered knowledge base for music production techniques, built from video content by creators. It extracts, structures, and makes searchable the knowledge embedded in production tutorials and livestreams. The result is a browsable, searchable library of technique pages, key moments, and creator-attributed study guides — all derived automatically from video transcripts.

How Content Flows

Content enters Chrysopedia as video transcripts and exits as structured, searchable knowledge:

  1. Video transcripts (Whisper large-v3 on RTX 4090) land as JSON files in a watched folder
  2. The 6-stage pipeline processes them automatically:
    • Stage 1 — Transcript Segmentation: Splits raw transcripts into coherent segments
    • Stage 2 — Key Moment Extraction: Identifies technique demonstrations, tips, and notable passages
    • Stage 3 — Classification & Tagging: Assigns topic categories and tags to each key moment (7 top-level categories: Sound Design, Mixing, Arrangement, Sampling, Music Theory, Workflow, Sound Selection)
    • Stage 4 — Technique Page Synthesis: Generates study-guide prose with v2 structured body sections, signal chains, and citations referencing source key moments
    • Stage 5 — Embedding & Indexing: Embeds technique pages and key moments into Qdrant vector store and LightRAG knowledge graph for semantic search
    • Stage 6 — Highlight Detection: (Optional) Scores key moments across 10 dimensions for editorial curation
  3. Results appear as structured technique pages, searchable key moments, and cross-references between creators

Each stage is a Celery task. The pipeline orchestrator chains them and tracks status per video. See Pipeline for stage-level detail and prompt templates.
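
The chaining and per-video status tracking described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's orchestrator: the two stage functions, the `STAGES` list, the `run_pipeline` helper, and the shape of the status dictionary are all hypothetical, and the real system runs each stage as a Celery task.

```python
from typing import Callable

# Hypothetical stage functions: each takes the running payload and returns it
# with its own output added. The real stages are Celery tasks that call an LLM.
def segment(payload: dict) -> dict:
    payload["segments"] = payload["transcript"].split(". ")
    return payload

def extract_key_moments(payload: dict) -> dict:
    payload["key_moments"] = [s for s in payload["segments"] if "tip" in s.lower()]
    return payload

# Only two of the six stages are shown; the rest would follow the same shape.
STAGES: list[tuple[str, Callable[[dict], dict]]] = [
    ("segmentation", segment),
    ("key_moment_extraction", extract_key_moments),
]

def run_pipeline(video_id: str, transcript: str, status: dict) -> dict:
    """Run the stages in order, recording the last completed stage per video."""
    payload = {"video_id": video_id, "transcript": transcript}
    status[video_id] = {"state": "processing", "stage": None}
    for name, stage in STAGES:
        try:
            payload = stage(payload)
        except Exception as exc:
            # Record which stage failed so an admin view could surface it.
            status[video_id] = {"state": "failed", "stage": name, "error": str(exc)}
            raise
        status[video_id]["stage"] = name
    status[video_id]["state"] = "complete"
    return payload
```

Passing each stage's output to the next mirrors what a task chain does; keeping the status in a separate mapping keyed by video is one simple way to support the per-video tracking mentioned above.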

Using the Web UI

Search

  • Homepage search bar or Cmd+K from any page opens the search interface
  • Combines semantic search (LightRAG knowledge graph + Qdrant vectors) with keyword search (PostgreSQL full-text)
  • Results show technique pages and key moments with creator attribution and relevance scores
  • Multi-token queries use AND logic with partial-match fallback — searching "keota snare" finds content where "keota" matches the creator and "snare" matches technique content
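
The AND-with-fallback behaviour can be illustrated with a small in-memory sketch. It is an assumption-laden stand-in: the real keyword search runs on PostgreSQL full-text, while this example uses plain substring matching, and the `search` function and the `creator`/`text` document fields are hypothetical names chosen for illustration.

```python
def search(query: str, docs: list[dict]) -> list[dict]:
    """Return docs matching every token; if none do, fall back to partial matches."""
    tokens = query.lower().split()
    if not tokens:
        return []

    def hits(doc: dict) -> int:
        # A token counts as matched if it appears in the creator name or the text.
        haystack = f"{doc['creator']} {doc['text']}".lower()
        return sum(1 for t in tokens if t in haystack)

    scored = [(hits(d), d) for d in docs]

    # AND logic: every token must match somewhere in the document.
    full = [d for n, d in scored if n == len(tokens)]
    if full:
        return full

    # Fallback: any doc matching at least one token, best matches first.
    partial = [(n, d) for n, d in scored if n > 0]
    partial.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in partial]
```

With this sketch, "keota snare" returns only documents where both tokens match across the creator and content fields, and a query with one unmatched token still returns the documents that match the other.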

Browse Topics

  • 7 top-level categories → sub-topics → technique pages grouped by creator
  • Categories expand/collapse with CSS grid animation
  • Each technique page card shows creator, topic tags, and key moment count

Browse Creators

  • Filterable list of all creators with genre tags
  • Randomized default sort (no alphabetical bias)
  • Click through to creator detail pages showing all their technique pages and key moments

Technique Pages

  • Study guide prose — LLM-synthesized content with structured sections (v2 body format)
  • Key moments index — timestamped references back to source videos with citation markers
  • Related techniques — cross-references to similar content from other creators
  • Table of contents — auto-generated from section headings, displayed in sidebar
  • Reading header — sticky section indicator bar that appears when scrolling past the page title
  • Inline player — audio/video player with chapter markers and key moment timeline pins

Chat

  • Creator-scoped AI chat — ask questions about any creator's techniques
  • Citation support — responses include numbered source references
  • Cascade retrieval — queries search creator-specific context first, then domain (topic category), then global knowledge
  • Multi-turn memory — conversation context persists within a session
  • Streaming responses — SSE-based token streaming with source metadata sent first
  • Quality toolkit — refined system prompt with baseline quality metrics
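
Cascade retrieval can be sketched as an ordered list of retrievers that is walked until one returns enough context. The ordering (creator, then domain, then global) comes from the description above; the fall-through condition (`min_results`), the `Retriever` type, and the `cascade_retrieve` function are assumptions made for illustration.

```python
from typing import Callable

# Hypothetical retriever signature: query in, list of context snippets out.
Retriever = Callable[[str], list[str]]

def cascade_retrieve(
    query: str,
    tiers: list[tuple[str, Retriever]],
    min_results: int = 1,
) -> tuple[str, list[str]]:
    """Try each tier in order and return the first one with enough results."""
    for name, retrieve in tiers:
        results = retrieve(query)
        if len(results) >= min_results:
            return name, results
    # No tier produced enough context.
    return "none", []
```

A caller would pass the tiers in priority order, for example `[("creator", creator_search), ("domain", domain_search), ("global", global_search)]`, where the three search functions are placeholders for whatever backs each scope.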

For Creators

Registration & Onboarding

  • Standard registration with email/password
  • 3-step onboarding wizard (shown once after first login):
    1. Welcome message explaining the platform
    2. Content consent selection (choose which content types to publish)
    3. Quick tour of available features
  • Onboarding completion tracked via onboarding_completed flag on user profile

Content Consent

  • Control which content types are published (technique pages, key moments, chat availability)
  • Granular per-content-type toggles
  • Changes take effect on next pipeline run

Creator Dashboard

  • Overview stats: total technique pages, key moments, video count
  • Recent posts and activity feed
  • Quick links to technique pages derived from your content

Transparency Page

  • View all entities, relationships, and technique pages derived from your content
  • Expandable/collapsible category sections (CSS grid animation)
  • Full audit trail of what the system extracted from your videos

Data Export

  • GDPR-style ZIP download of all derived content via GET /creator/export
  • Includes technique pages, key moments, classifications, and metadata
  • One-click download from creator dashboard
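
As a rough sketch of how such an export could be assembled, the following builds a ZIP archive in memory with one JSON file per content type. The `build_export_zip` function, the file naming, and the input structure are assumptions; the actual layout of the archive returned by GET /creator/export is not specified here.

```python
import io
import json
import zipfile

def build_export_zip(derived: dict[str, list[dict]]) -> bytes:
    """Pack each content type into its own JSON file inside an in-memory ZIP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for content_type, records in derived.items():
            # Hypothetical naming: one "<content_type>.json" entry per type.
            archive.writestr(f"{content_type}.json", json.dumps(records, indent=2))
    # The archive is finalized when the context manager exits.
    return buffer.getvalue()
```

Building the archive in memory keeps the sketch short; a large export would more likely be streamed or written to a temporary file.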

Notifications

  • Email digest of platform activity (new technique pages, key moments from your content)
  • Configurable frequency: daily, weekly, or disabled
  • Signed unsubscribe links (PyJWT tokens) — one-click unsubscribe without login
  • Managed via notification preferences in creator settings
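
The idea behind a signed unsubscribe link is that the token itself proves which user it was issued for, so no login is needed. The platform uses PyJWT tokens; the sketch below deliberately swaps in a plain HMAC signature from the standard library to stay dependency-free, so it shows the principle rather than the actual token format. `SECRET`, `sign_unsubscribe`, and `verify_unsubscribe` are hypothetical names.

```python
import base64
import hashlib
import hmac
from typing import Optional

SECRET = b"change-me"  # hypothetical signing key; load from configuration in practice

def sign_unsubscribe(user_id: str) -> str:
    """Return a URL-safe token of the form <user_id>.<signature>."""
    mac = hmac.new(SECRET, user_id.encode(), hashlib.sha256).digest()
    return f"{user_id}.{base64.urlsafe_b64encode(mac).decode()}"

def verify_unsubscribe(token: str) -> Optional[str]:
    """Return the user id if the signature is valid, otherwise None."""
    user_id, _, signature = token.rpartition(".")
    # Recompute the signature for the claimed user and compare in constant time.
    expected = sign_unsubscribe(user_id).rpartition(".")[2]
    if user_id and hmac.compare_digest(signature, expected):
        return user_id
    return None
```

A JWT adds a standard envelope (header, claims such as expiry, signature) on top of the same signing idea, which is one reason to prefer it in production.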

Personality Profiles

  • 5-tier system for chat persona customization based on creator teaching style
  • LLM-extracted from creator's content patterns
  • Influences how the chat engine responds to questions about that creator's techniques
  • See Personality-Profiles for tier definitions

For Admins

Review Queue

  • Approve, reject, or edit key moments organized by source video
  • Bulk actions for efficient moderation
  • Filter by creator, video, or processing status

Pipeline Admin

  • Monitor processing stages for all videos in the system
  • Filter by creator, status (pending, processing, complete, failed)
  • View per-stage timing and error details

Usage Dashboard

  • Token consumption tracking — LLM API usage over time
  • Top creators and users — ranked by content volume and platform usage
  • Daily statistics — requests, chat sessions, search queries
  • Rate limiting visibility — sliding-window rate limiter status (Redis-backed, per-user)
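
For readers unfamiliar with sliding-window rate limiting, here is a minimal in-memory sketch of the technique. The production limiter is Redis-backed; this version keeps timestamps in a per-user deque instead, and the `SlidingWindowLimiter` class and its parameters are hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Optional

class SlidingWindowLimiter:
    """Allow at most `limit` requests per user within the last `window` seconds."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, user_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[user_id]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Unlike a fixed-window counter, the window here moves with each request, so a burst at the boundary between two intervals cannot exceed the limit.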

Audit Log

  • All administrative actions tracked with timestamp, actor, and details
  • Searchable and filterable

User Management

  • Role assignment (admin, creator, user)
  • Account status management (active, suspended)
  • Impersonation support for debugging user-specific issues (see Impersonation)

Adding New Content

  1. Prepare transcripts: Run Whisper large-v3 on video/audio files (RTX 4090 recommended for speed). Output format: JSON with timestamps.
  2. Place in watched folder: Drop transcript JSON files into the configured watch directory. The folder watcher (PollingObserver, which works on ZFS/NFS) detects new files automatically.
  3. File stability check: Before processing, the watcher waits for the file size to stabilize (2-second check), so partial SCP/rsync writes are handled safely.
  4. Pipeline processes automatically: All 6 stages run in sequence via a Celery task chain. Monitor progress in the Pipeline Admin panel.
  5. Quality control: New key moments appear in the Review Queue for admin approval before publishing.
  6. Re-processing: To update existing content, drop the transcript into the watched folder again. The pipeline upserts existing records, though Qdrant point deduplication is a known improvement area.
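
The file stability check in the list above can be approximated by polling the file size until two consecutive readings agree. This is a sketch under assumptions: the `wait_until_stable` function and its `timeout` parameter are hypothetical, and only the 2-second interval is taken from the description.

```python
import os
import time

def wait_until_stable(path: str, interval: float = 2.0, timeout: float = 60.0) -> bool:
    """Return True once the file size is unchanged across one polling interval."""
    deadline = time.monotonic() + timeout
    last_size = -1
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            # The file vanished, for example because the upload was aborted.
            return False
        if size == last_size:
            return True
        last_size = size
        time.sleep(interval)
    # The file kept changing (or was checked only once) until the timeout.
    return False
```

Comparing sizes is a cheap heuristic for "the writer has finished"; it works for SCP/rsync-style uploads as long as the writer does not pause for longer than the polling interval.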

Infrastructure & Deployment

Chrysopedia runs as a Docker Compose stack on ub01:

Service               Purpose
chrysopedia-api       FastAPI application server
chrysopedia-worker    Celery worker + Beat scheduler (email digests, periodic tasks)
chrysopedia-web-8096  nginx reverse proxy serving frontend + API routing
chrysopedia-db        PostgreSQL 16 (port 5433 externally)
chrysopedia-redis     Redis: caching, rate limiting, Celery broker, classification data
chrysopedia-qdrant    Qdrant vector database for semantic search
chrysopedia-ollama    Ollama: local LLM fallback when primary DGX endpoint is unavailable
chrysopedia-lightrag  LightRAG knowledge graph for entity-aware retrieval

  • Primary LLM: DGX endpoint with automatic Ollama fallback (fail-open, configurable via LLM_FALLBACK_URL / LLM_FALLBACK_MODEL)
  • Web UI: http://ub01:8096
  • External: https://chrysopedia.com via nuc01 nginx reverse proxy
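
The fail-open fallback between the primary LLM endpoint and Ollama can be sketched as a wrapper that tries one callable and falls back to another on error. The `complete_with_fallback` function and the callable-based interface are assumptions for illustration; in the real stack the fallback client would presumably be built from `LLM_FALLBACK_URL` and `LLM_FALLBACK_MODEL`.

```python
from typing import Callable

# Hypothetical client signature: prompt in, completion text out.
LLMClient = Callable[[str], str]

def complete_with_fallback(
    prompt: str,
    primary: LLMClient,
    fallback: LLMClient,
) -> tuple[str, str]:
    """Try the primary endpoint; on any error, use the fallback instead."""
    try:
        return "primary", primary(prompt)
    except Exception:
        # Fail-open: a primary outage degrades to the local model
        # rather than failing the request outright.
        return "fallback", fallback(prompt)
```

Returning which backend answered alongside the text makes it easy to log or count fallback events, which helps spot a primary endpoint that is quietly unavailable.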

For full deployment instructions and rebuild commands, see Deployment. For local development setup and common gotchas, see Development-Guide.

Where to Learn More

Every aspect of Chrysopedia is documented in the wiki:

Topic                                                   Wiki Page
System architecture, Docker services, network topology  Architecture
All 80+ API endpoints grouped by domain                 API-Surface
SQLAlchemy models, relationships, enums                 Data-Model
Semantic + keyword search, LightRAG cascade             Search-Retrieval
Streaming Q&A, multi-turn memory, fallback              Chat-Engine
6-stage LLM extraction pipeline, prompt system          Pipeline
Audio/video player, chapter markers, timeline pins      Player
10-dimension highlight scoring and review               Highlights
LLM-extracted creator teaching personality              Personality-Profiles
JWT authentication, roles, permissions                  Authentication
Admin impersonation for debugging                       Impersonation
Environment variables and feature flags                 Configuration
Docker Compose setup, rebuild commands                  Deployment
Prometheus metrics, health checks, logging              Monitoring
Local dev setup, common gotchas                         Development-Guide
Architectural decisions register (D001–D048)            Decisions
LLM agent context injection system                      Agent-Context