Newcomer Guide
Welcome to Chrysopedia — this guide takes you from zero to productive with the entire platform.
What is Chrysopedia
Chrysopedia is an AI-powered knowledge base for music production techniques, built from video content by creators. It extracts, structures, and makes searchable the knowledge embedded in production tutorials and livestreams. The result is a browsable, searchable library of technique pages, key moments, and creator-attributed study guides — all derived automatically from video transcripts.
How Content Flows
Content enters Chrysopedia as video transcripts and exits as structured, searchable knowledge:
- Video transcripts (Whisper large-v3 on RTX 4090) land as JSON files in a watched folder
- The 6-stage pipeline processes them automatically:
  - Stage 1 (Transcript Segmentation): splits raw transcripts into coherent segments
  - Stage 2 (Key Moment Extraction): identifies technique demonstrations, tips, and notable passages
  - Stage 3 (Classification & Tagging): assigns topic categories and tags to each key moment (7 top-level categories: Sound Design, Mixing, Arrangement, Sampling, Music Theory, Workflow, Sound Selection)
  - Stage 4 (Technique Page Synthesis): generates study-guide prose with v2 structured body sections, signal chains, and citations referencing source key moments
  - Stage 5 (Embedding & Indexing): embeds technique pages and key moments into the Qdrant vector store and the LightRAG knowledge graph for semantic search
  - Stage 6 (Highlight Detection, optional): scores key moments across 10 dimensions for editorial curation
- Results appear as structured technique pages, searchable key moments, and cross-references between creators
Each stage is a Celery task. The pipeline orchestrator chains them and tracks status per video. See Pipeline for stage-level detail and prompt templates.
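The chaining and per-video status tracking described above can be sketched in plain Python. This is only an illustration under stated assumptions: the stage names, `run_pipeline`, and the in-memory `status` dictionary are hypothetical stand-ins, and the real orchestrator runs the stages as Celery tasks (with Celery, a chain is typically built with `celery.chain` and task signatures). The status values are the ones the Pipeline Admin panel filters on.

```python
from typing import Callable, Dict, List

# Hypothetical stage names mirroring the six stages listed above.
STAGES: List[str] = [
    "segmentation",
    "key_moment_extraction",
    "classification",
    "synthesis",
    "embedding",
    "highlight_detection",
]

Handler = Callable[[dict], dict]


def run_pipeline(video_id: str,
                 handlers: Dict[str, Handler],
                 status: Dict[str, Dict[str, str]]) -> dict:
    """Run the stages in order, feeding each stage's output to the next.

    `status` is an assumed in-memory stand-in for the per-video status
    the orchestrator tracks.
    """
    payload: dict = {"video_id": video_id}
    status[video_id] = {name: "pending" for name in STAGES}
    for name in STAGES:
        status[video_id][name] = "processing"
        try:
            payload = handlers[name](payload)
        except Exception:
            # Stop the chain; later stages stay "pending".
            status[video_id][name] = "failed"
            break
        status[video_id][name] = "complete"
    return payload
```

Stopping at the first failed stage keeps the status map readable: everything before the failure is `complete`, the failing stage is `failed`, and the rest remain `pending`.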
Using the Web UI
Search
- Homepage search bar or Cmd+K from any page opens the search interface
- Combines semantic search (LightRAG knowledge graph + Qdrant vectors) with keyword search (PostgreSQL full-text)
- Results show technique pages and key moments with creator attribution and relevance scores
- Multi-token queries use AND logic with partial-match fallback — searching "keota snare" finds content where "keota" matches the creator and "snare" matches technique content
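The AND-with-fallback behaviour of multi-token queries can be illustrated with a small in-memory sketch. It assumes that "partial-match fallback" means returning documents that match at least one token when none match all of them; `token_search` and the `creator`/`title` fields are hypothetical, and the production search runs against PostgreSQL full-text and the vector stores rather than Python lists.

```python
from typing import Dict, List


def token_search(query: str, docs: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return docs matching every token; otherwise fall back to partial matches."""
    tokens = query.lower().split()
    if not tokens:
        return []

    def matched(doc: Dict[str, str]) -> int:
        # A token may match either the creator name or the technique content.
        haystack = f"{doc['creator']} {doc['title']}".lower()
        return sum(1 for token in tokens if token in haystack)

    scored = [(matched(doc), doc) for doc in docs]

    # AND logic: every token must appear somewhere in the document.
    full = [doc for count, doc in scored if count == len(tokens)]
    if full:
        return full

    # Assumed fallback: documents matching at least one token, best first.
    partial = [(count, doc) for count, doc in scored if count > 0]
    partial.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in partial]
```

With this sketch, "keota snare" returns only documents where both tokens match, while a query with one unmatched token still returns the documents that match the other.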
Browse Topics
- 7 top-level categories → sub-topics → technique pages grouped by creator
- Categories expand/collapse with CSS grid animation
- Each technique page card shows creator, topic tags, and key moment count
Browse Creators
- Filterable list of all creators with genre tags
- Randomized default sort (no alphabetical bias)
- Click through to creator detail pages showing all their technique pages and key moments
Technique Pages
- Study guide prose — LLM-synthesized content with structured sections (v2 body format)
- Key moments index — timestamped references back to source videos with citation markers
- Related techniques — cross-references to similar content from other creators
- Table of contents — auto-generated from section headings, displayed in sidebar
- Reading header — sticky section indicator bar that appears when scrolling past the page title
- Inline player — audio/video player with chapter markers and key moment timeline pins
Chat
- Creator-scoped AI chat — ask questions about any creator's techniques
- Citation support — responses include numbered source references
- Cascade retrieval — queries search creator-specific context first, then domain (topic category), then global knowledge
- Multi-turn memory — conversation context persists within a session
- Streaming responses — SSE-based token streaming with source metadata sent first
- Quality toolkit — refined system prompt with baseline quality metrics
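Cascade retrieval can be pictured as trying progressively wider scopes until one returns enough context. The sketch below is an assumption about the control flow, not the chat engine's code: `cascade_retrieve`, the tier names, and the `min_hits` threshold are hypothetical, and the real system may merge tiers rather than stop at the first one that answers.

```python
from typing import Callable, List, Optional, Tuple

Retriever = Callable[[str], List[str]]


def cascade_retrieve(question: str,
                     tiers: List[Tuple[str, Retriever]],
                     min_hits: int = 1) -> Tuple[Optional[str], List[str]]:
    """Query each tier in order and return the first one with enough hits."""
    for name, retrieve in tiers:
        hits = retrieve(question)
        if len(hits) >= min_hits:
            return name, hits
    # No tier produced usable context.
    return None, []


# Hypothetical retrievers: creator scope is empty, so the domain tier answers.
tiers = [
    ("creator", lambda q: []),
    ("domain", lambda q: ["Mixing: parallel compression basics"]),
    ("global", lambda q: ["General knowledge passage"]),
]
tier_used, context = cascade_retrieve("How do I glue drums?", tiers)
```

Returning the tier name alongside the hits makes it easy to report which scope a chat answer was grounded in.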
For Creators
Registration & Onboarding
- Standard registration with email/password
- 3-step onboarding wizard (shown once after first login):
  - Welcome message explaining the platform
  - Content consent selection (choose which content types to publish)
  - Quick tour of available features
- Onboarding completion tracked via the `onboarding_completed` flag on the user profile
Consent Dashboard
- Control which content types are published (technique pages, key moments, chat availability)
- Granular per-content-type toggles
- Changes take effect on next pipeline run
Creator Dashboard
- Overview stats: total technique pages, key moments, video count
- Recent posts and activity feed
- Quick links to technique pages derived from your content
Transparency Page
- View all entities, relationships, and technique pages derived from your content
- Expandable/collapsible category sections (CSS grid animation)
- Full audit trail of what the system extracted from your videos
Data Export
- GDPR-style ZIP download of all derived content via `GET /creator/export`
- Includes technique pages, key moments, classifications, and metadata
- One-click download from creator dashboard
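As a rough picture of what a ZIP export might contain, the sketch below packs each category of derived content into its own JSON file inside an in-memory archive. The file layout and `build_export_zip` are assumptions for illustration; the actual structure returned by `GET /creator/export` is not specified here.

```python
import io
import json
import zipfile
from typing import Any, Dict


def build_export_zip(sections: Dict[str, Any]) -> bytes:
    """Write each section as '<name>.json' into a ZIP and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, payload in sections.items():
            archive.writestr(f"{name}.json", json.dumps(payload, indent=2))
    return buffer.getvalue()


# Hypothetical export payload for one creator.
export_bytes = build_export_zip({
    "technique_pages": [{"title": "Snare layering"}],
    "key_moments": [],
})
```

Building the archive in memory avoids temporary files, which suits a one-click download that streams the bytes straight back to the browser.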
Notifications
- Email digest of platform activity (new technique pages, key moments from your content)
- Configurable frequency: daily, weekly, or disabled
- Signed unsubscribe links (PyJWT tokens) — one-click unsubscribe without login
- Managed via notification preferences in creator settings
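A signed unsubscribe link works because the token itself proves which user it was issued for, so no login is needed. The platform uses PyJWT for this; the sketch below builds an equivalent HS256 token with only the standard library so that it runs without dependencies (with PyJWT, `jwt.encode` and `jwt.decode` would replace the manual steps). The claim names, the `purpose` field, and the 30-day lifetime are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def make_unsubscribe_token(user_id: int, secret: str,
                           ttl_seconds: int = 30 * 24 * 3600,
                           now: Optional[float] = None) -> str:
    """Create an HS256-signed token identifying the user to unsubscribe."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    # 'purpose' is a hypothetical claim that stops reuse of other tokens.
    claims = {"sub": str(user_id), "purpose": "unsubscribe", "exp": issued + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"),
                         hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_unsubscribe_token(token: str, secret: str,
                             now: Optional[float] = None) -> Optional[int]:
    """Return the user id if the token is valid and unexpired, else None."""
    try:
        header_b64, claims_b64, sig_b64 = token.split(".")
        expected = hmac.new(secret.encode("utf-8"),
                            f"{header_b64}.{claims_b64}".encode("ascii"),
                            hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        claims = json.loads(_b64url_decode(claims_b64))
    except ValueError:
        # Covers malformed structure, bad base64, and bad JSON.
        return None
    current = time.time() if now is None else now
    if claims.get("purpose") != "unsubscribe" or claims.get("exp", 0) < current:
        return None
    return int(claims["sub"])
```

The signature is checked before the claims are parsed, and `hmac.compare_digest` is used so the comparison does not leak timing information.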
Personality Profiles
- 5-tier system for chat persona customization based on creator teaching style
- LLM-extracted from creator's content patterns
- Influences how the chat engine responds to questions about that creator's techniques
- See Personality-Profiles for tier definitions
For Admins
Review Queue
- Approve, reject, or edit key moments organized by source video
- Bulk actions for efficient moderation
- Filter by creator, video, or processing status
Pipeline Admin
- Monitor processing stages for all videos in the system
- Filter by creator, status (pending, processing, complete, failed)
- View per-stage timing and error details
Usage Dashboard
- Token consumption tracking — LLM API usage over time
- Top creators and users — ranked by content volume and platform usage
- Daily statistics — requests, chat sessions, search queries
- Rate limiting visibility — sliding-window rate limiter status (Redis-backed, per-user)
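The sliding-window idea behind the rate limiter can be shown with an in-memory version. This is a sketch under assumptions: the platform's limiter is Redis-backed, whereas `SlidingWindowLimiter` below keeps timestamps in Python deques, and the limits used in the example are made up. A Redis implementation commonly keeps the same timestamps in a per-user sorted set.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `max_requests` per user within the last `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str, now: float) -> bool:
        hits = self._hits[user_id]
        cutoff = now - self.window_seconds
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= cutoff:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Hypothetical limit: 2 requests per 10 seconds, tracked per user.
limiter = SlidingWindowLimiter(max_requests=2, window_seconds=10)
```

Passing `now` explicitly keeps the class deterministic and easy to test; a production caller would pass the current time.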
Audit Log
- All administrative actions tracked with timestamp, actor, and details
- Searchable and filterable
User Management
- Role assignment (admin, creator, user)
- Account status management (active, suspended)
- Impersonation support for debugging user-specific issues (see Impersonation)
Adding New Content
- Prepare transcripts — Run Whisper large-v3 on video/audio files (RTX 4090 recommended for speed). Output format: JSON with timestamps.
- Place in watched folder — Drop transcript JSON files into the configured watch directory. The folder watcher (PollingObserver, works on ZFS/NFS) detects new files automatically.
- Pipeline processes automatically — All 6 stages run in sequence via Celery task chain. Monitor progress in the Pipeline Admin panel.
- File stability check — The watcher waits for file size to stabilize (2-second check) before processing, handling partial SCP/rsync writes safely.
- Quality control — New key moments appear in the Review Queue for admin approval before publishing.
- Re-processing — To update existing content, re-drop the transcript. The pipeline handles upserts (though Qdrant point deduplication is a known improvement area).
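The file stability check mentioned above can be sketched as polling the file size until two consecutive readings agree. The function name, the timeout, and the exact comparison are assumptions; only the 2-second interval comes from the description above.

```python
import os
import time


def wait_until_stable(path: str, interval: float = 2.0, timeout: float = 60.0) -> bool:
    """Return True once the file size is unchanged across two consecutive checks."""
    deadline = time.monotonic() + timeout
    last_size = -1
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            # File missing or not yet readable; treat as unstable.
            size = -1
        if size >= 0 and size == last_size:
            return True
        last_size = size
        time.sleep(interval)
    return False
```

A watcher would call this before handing a transcript to the pipeline, so a file that is still being copied over SCP or rsync is not processed half-written.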
Infrastructure & Deployment
Chrysopedia runs as a Docker Compose stack on ub01:
| Service | Purpose |
|---|---|
| `chrysopedia-api` | FastAPI application server |
| `chrysopedia-worker` | Celery worker + Beat scheduler (email digests, periodic tasks) |
| `chrysopedia-web-8096` | nginx reverse proxy serving frontend + API routing |
| `chrysopedia-db` | PostgreSQL 16 (port 5433 externally) |
| `chrysopedia-redis` | Redis: caching, rate limiting, Celery broker, classification data |
| `chrysopedia-qdrant` | Qdrant vector database for semantic search |
| `chrysopedia-ollama` | Ollama: local LLM fallback when the primary DGX endpoint is unavailable |
| `chrysopedia-lightrag` | LightRAG knowledge graph for entity-aware retrieval |
- Primary LLM: DGX endpoint with automatic Ollama fallback (fail-open, configurable via `LLM_FALLBACK_URL` / `LLM_FALLBACK_MODEL`)
- Web UI: http://ub01:8096
- External: https://chrysopedia.com via nuc01 nginx reverse proxy
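The primary-then-fallback behaviour of the LLM client can be sketched as a small wrapper. Everything here other than the two environment variable names is an assumption: `complete_with_fallback`, `fallback_settings`, and the rule that the fallback is enabled only when both variables are set are illustrative, not the platform's actual client code.

```python
from typing import Callable, Dict, Mapping, Optional

Completer = Callable[[str], str]


def fallback_settings(env: Mapping[str, str]) -> Optional[Dict[str, str]]:
    """Read fallback config; assumed to be enabled only when both values are set."""
    url = env.get("LLM_FALLBACK_URL")
    model = env.get("LLM_FALLBACK_MODEL")
    if not url or not model:
        return None
    return {"url": url, "model": model}


def complete_with_fallback(prompt: str,
                           primary: Completer,
                           fallback: Optional[Completer] = None) -> str:
    """Try the primary endpoint; on any error, use the fallback if one exists."""
    try:
        return primary(prompt)
    except Exception:
        if fallback is None:
            raise
        return fallback(prompt)
```

In practice `primary` and `fallback` would wrap HTTP calls to the DGX endpoint and to Ollama; here they are plain callables so the control flow can be exercised in isolation.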
For full deployment instructions and rebuild commands, see Deployment. For local development setup and common gotchas, see Development-Guide.
Where to Learn More
Every aspect of Chrysopedia is documented in the wiki:
| Topic | Wiki Page |
|---|---|
| System architecture, Docker services, network topology | Architecture |
| All 80+ API endpoints grouped by domain | API-Surface |
| SQLAlchemy models, relationships, enums | Data-Model |
| Semantic + keyword search, LightRAG cascade | Search-Retrieval |
| Streaming Q&A, multi-turn memory, fallback | Chat-Engine |
| 6-stage LLM extraction pipeline, prompt system | Pipeline |
| Audio/video player, chapter markers, timeline pins | Player |
| 10-dimension highlight scoring and review | Highlights |
| LLM-extracted creator teaching personality | Personality-Profiles |
| JWT authentication, roles, permissions | Authentication |
| Admin impersonation for debugging | Impersonation |
| Environment variables and feature flags | Configuration |
| Docker Compose setup, rebuild commands | Deployment |
| Prometheus metrics, health checks, logging | Monitoring |
| Local dev setup, common gotchas | Development-Guide |
| Architectural decisions register (D001–D048) | Decisions |
| LLM agent context injection system | Agent-Context |