xpltdco/chrysopedia

Newcomer Guide

Welcome to Chrysopedia — this guide takes you from zero to productive with the entire platform.

What is Chrysopedia

Chrysopedia is an AI-powered knowledge base for music production techniques, built from video content by creators. It extracts, structures, and makes searchable the knowledge embedded in production tutorials and livestreams. The result is a browsable, searchable library of technique pages, key moments, and creator-attributed study guides — all derived automatically from video transcripts.

How Content Flows

Content enters Chrysopedia as video transcripts and exits as structured, searchable knowledge:

1. Video transcripts (Whisper large-v3 on RTX 4090) land as JSON files in a watched folder
2. The 6-stage pipeline processes them automatically:
   - Stage 1, Transcript Segmentation: Splits raw transcripts into coherent segments
   - Stage 2, Key Moment Extraction: Identifies technique demonstrations, tips, and notable passages
   - Stage 3, Classification & Tagging: Assigns topic categories and tags to each key moment (7 top-level categories: Sound Design, Mixing, Arrangement, Sampling, Music Theory, Workflow, Sound Selection)
   - Stage 4, Technique Page Synthesis: Generates study-guide prose with v2 structured body sections, signal chains, and citations referencing source key moments
   - Stage 5, Embedding & Indexing: Embeds technique pages and key moments into the Qdrant vector store and the LightRAG knowledge graph for semantic search
   - Stage 6, Highlight Detection (optional): Scores key moments across 10 dimensions for editorial curation
3. Results appear as structured technique pages, searchable key moments, and cross-references between creators

Each stage is a Celery task. The pipeline orchestrator chains them and tracks status per video. See Pipeline for stage-level detail and prompt templates.
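
As a rough illustration of the orchestration described above, the following sketch chains six stage handlers and records per-video status. It uses plain Python functions in place of Celery tasks so that it runs without a broker. The stage identifiers, `VideoStatus`, and `run_pipeline` are hypothetical names chosen for this example and are not taken from the Chrysopedia codebase.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical identifiers for the six stages listed above.
STAGES = [
    "segmentation",
    "key_moment_extraction",
    "classification",
    "synthesis",
    "embedding",
    "highlight_detection",
]


@dataclass
class VideoStatus:
    """Per-video progress record (illustrative, not the real model)."""
    video_id: str
    completed: List[str] = field(default_factory=list)
    failed_stage: Optional[str] = None
    error: Optional[str] = None

    @property
    def state(self) -> str:
        if self.failed_stage:
            return "failed"
        if len(self.completed) == len(STAGES):
            return "complete"
        return "processing" if self.completed else "pending"


def run_pipeline(video_id: str, payload, handlers: Dict[str, Callable]):
    """Run each stage in order, feeding one stage's output to the next."""
    status = VideoStatus(video_id)
    for name in STAGES:
        try:
            payload = handlers[name](payload)
        except Exception as exc:  # stop the chain at the first failing stage
            status.failed_stage = name
            status.error = str(exc)
            break
        status.completed.append(name)
    return payload, status
```

In a Celery deployment the loop would presumably be replaced by a task chain, with the status record persisted per video rather than returned.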

Using the Web UI

Search

Homepage search bar or Cmd+K from any page opens the search interface
Combines semantic search (LightRAG knowledge graph + Qdrant vectors) with keyword search (PostgreSQL full-text)
Results show technique pages and key moments with creator attribution and relevance scores
Multi-token queries use AND logic with partial-match fallback — searching "keota snare" finds content where "keota" matches the creator and "snare" matches technique content
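
One possible reading of the AND logic with partial-match fallback is sketched below: return documents that match every token, and only if none exist, return documents that match at least one token, ranked by how many tokens they match. The `creator` and `title` fields and the `search` function are assumptions made for this example. The real implementation uses PostgreSQL full-text search and may behave differently.

```python
from typing import Dict, List


def search(query: str, docs: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """AND across tokens first; fall back to partial matches if nothing hits."""
    tokens = query.lower().split()
    if not tokens:
        return []

    def hit_count(doc: Dict[str, str]) -> int:
        # A token counts as matched if it appears in any searchable field.
        haystack = f"{doc['creator']} {doc['title']}".lower()
        return sum(1 for token in tokens if token in haystack)

    scored = [(hit_count(doc), doc) for doc in docs]

    # Strict pass: every token must match somewhere in the document.
    full = [doc for count, doc in scored if count == len(tokens)]
    if full:
        return full

    # Fallback pass: keep documents matching at least one token, best first.
    partial = [(count, doc) for count, doc in scored if count > 0]
    partial.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in partial]
```

With this sketch, a query such as "keota snare" returns only documents where both tokens match, while a query with one unmatched token still returns the documents that match the other.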

Browse Topics

7 top-level categories → sub-topics → technique pages grouped by creator
Categories expand/collapse with CSS grid animation
Each technique page card shows creator, topic tags, and key moment count

Browse Creators

Filterable list of all creators with genre tags
Randomized default sort (no alphabetical bias)
Click through to creator detail pages showing all their technique pages and key moments

Technique Pages

Study guide prose — LLM-synthesized content with structured sections (v2 body format)
Key moments index — timestamped references back to source videos with citation markers
Related techniques — cross-references to similar content from other creators
Table of contents — auto-generated from section headings, displayed in sidebar
Reading header — sticky section indicator bar that appears when scrolling past the page title
Inline player — audio/video player with chapter markers and key moment timeline pins

Chat

Creator-scoped AI chat — ask questions about any creator's techniques
Citation support — responses include numbered source references
Cascade retrieval — queries search creator-specific context first, then domain (topic category), then global knowledge
Multi-turn memory — conversation context persists within a session
Streaming responses — SSE-based token streaming with source metadata sent first
Quality toolkit — refined system prompt with baseline quality metrics
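
The cascade retrieval order described above (creator, then domain, then global) could be sketched as follows. The stopping rule (`min_results`), the `search_fn` callback, and the result shape are assumptions for illustration. The source does not say when the cascade widens its scope.

```python
from typing import Callable, Dict, List


def cascade_retrieve(
    query: str,
    creator: str,
    category: str,
    search_fn: Callable[[str, Dict[str, str]], List[dict]],
    min_results: int = 3,
) -> List[dict]:
    """Widen the search scope tier by tier until enough context is found."""
    scopes = [
        ("creator", {"creator": creator}),    # creator-specific context first
        ("domain", {"category": category}),   # then the topic category
        ("global", {}),                       # finally everything
    ]
    results: List[dict] = []
    seen = set()
    for tier, filters in scopes:
        for item in search_fn(query, filters):
            if item["id"] not in seen:        # skip duplicates from narrower tiers
                seen.add(item["id"])
                results.append({**item, "tier": tier})
        if len(results) >= min_results:
            break
    return results
```

Tagging each result with its tier keeps the provenance available for the numbered source references that accompany chat responses.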

For Creators

Registration & Onboarding

Standard registration with email/password
3-step onboarding wizard (shown once after first login):
1. Welcome message explaining the platform
2. Content consent selection (choose which content types to publish)
3. Quick tour of available features
Onboarding completion tracked via `onboarding_completed` flag on user profile

Consent Dashboard

Control which content types are published (technique pages, key moments, chat availability)
Granular per-content-type toggles
Changes take effect on next pipeline run
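
A minimal sketch of how per-content-type toggles might be applied at publish time is shown below. The `ConsentSettings` fields mirror the three content types named above, but the class, the `publishable` helper, the item shape, and the default of `False` are all assumptions for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ConsentSettings:
    """One toggle per content type (defaults are an assumption)."""
    technique_pages: bool = False
    key_moments: bool = False
    chat_availability: bool = False


def publishable(items: List[Dict], consent: ConsentSettings) -> List[Dict]:
    """Keep only items whose content type the creator has opted into."""
    # Unknown content types are treated as not consented.
    return [item for item in items if getattr(consent, item["type"], False)]
```

Running a filter like this inside the pipeline, rather than at toggle time, would be consistent with changes taking effect on the next pipeline run.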

Creator Dashboard

Overview stats: total technique pages, key moments, video count
Recent posts and activity feed
Quick links to technique pages derived from your content

Transparency Page

View all entities, relationships, and technique pages derived from your content
Expandable/collapsible category sections (CSS grid animation)
Full audit trail of what the system extracted from your videos

Data Export

GDPR-style ZIP download of all derived content via `GET /creator/export`
Includes technique pages, key moments, classifications, and metadata
One-click download from creator dashboard
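
To illustrate what assembling such an export might look like, the sketch below builds a ZIP archive in memory with one JSON file per content type plus a manifest. The function name, file layout, and manifest format are hypothetical. The actual archive structure behind the export endpoint is not documented here.

```python
import io
import json
import zipfile
from typing import Dict, List


def build_export_zip(creator_id: int, derived: Dict[str, List[dict]]) -> bytes:
    """Pack each category of derived content into its own JSON file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, records in derived.items():
            archive.writestr(f"{name}.json", json.dumps(records, indent=2))
        # A manifest makes the archive self-describing (an assumption).
        manifest = {
            "creator_id": creator_id,
            "files": sorted(f"{name}.json" for name in derived),
        }
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
    return buffer.getvalue()
```

The returned bytes could then be sent as the response body with a ZIP content type.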

Notifications

Email digest of platform activity (new technique pages, key moments from your content)
Configurable frequency: daily, weekly, or disabled
Signed unsubscribe links (PyJWT tokens) — one-click unsubscribe without login
Managed via notification preferences in creator settings
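
The idea behind signed unsubscribe links can be sketched with the standard library alone. Chrysopedia uses PyJWT for this. The example below swaps in a hand-rolled HMAC-SHA256 token so that it runs without third-party packages, and it is not a JWT. The secret, the claim names, and both function names are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"change-me"  # hypothetical signing key; load from configuration in practice


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(payload: str) -> str:
    return _b64(hmac.new(SECRET, payload.encode(), hashlib.sha256).digest())


def make_unsubscribe_token(user_id, ttl_seconds=30 * 24 * 3600, now=None) -> str:
    """Return 'payload.signature' carrying the user id and an expiry time."""
    issued = int(time.time() if now is None else now)
    claims = {"sub": user_id, "exp": issued + ttl_seconds}
    payload = _b64(json.dumps(claims).encode())
    return f"{payload}.{_sign(payload)}"


def verify_unsubscribe_token(token: str, now=None):
    """Return the user id if the token is authentic and unexpired, else None."""
    try:
        payload, signature = token.split(".")
        if not hmac.compare_digest(signature, _sign(payload)):
            return None
        claims = json.loads(_unb64(payload))
    except (ValueError, TypeError):
        return None  # malformed token
    current = time.time() if now is None else now
    if claims["exp"] < current:
        return None  # expired
    return claims["sub"]
```

Because the token itself proves who it was issued for, the unsubscribe endpoint can act on it without requiring a login.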

Personality Profiles

5-tier system for chat persona customization based on creator teaching style
LLM-extracted from creator's content patterns
Influences how the chat engine responds to questions about that creator's techniques
See Personality-Profiles for tier definitions

For Admins

Review Queue

Approve, reject, or edit key moments organized by source video
Bulk actions for efficient moderation
Filter by creator, video, or processing status

Pipeline Admin

Monitor processing stages for all videos in the system
Filter by creator, status (pending, processing, complete, failed)
View per-stage timing and error details

Usage Dashboard

Token consumption tracking — LLM API usage over time
Top creators and users — ranked by content volume and platform usage
Daily statistics — requests, chat sessions, search queries
Rate limiting visibility — sliding-window rate limiter status (Redis-backed, per-user)
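
For readers unfamiliar with sliding-window rate limiting, here is a small in-memory sketch of the technique. The production limiter is Redis-backed. This stand-in keeps timestamps in a per-user deque instead, and the class name and parameters are hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per user in any `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # user id -> timestamps of recent requests

    def allow(self, user_id, now=None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[user_id]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Unlike a fixed-window counter, this approach never lets a burst straddle a window boundary, since the window is measured back from each request.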

Audit Log

All administrative actions tracked with timestamp, actor, and details
Searchable and filterable

User Management

Role assignment (admin, creator, user)
Account status management (active, suspended)
Impersonation support for debugging user-specific issues (see Impersonation)

Adding New Content

Prepare transcripts — Run Whisper large-v3 on video/audio files (RTX 4090 recommended for speed). Output format: JSON with timestamps.
Place in watched folder — Drop transcript JSON files into the configured watch directory. The folder watcher (PollingObserver, works on ZFS/NFS) detects new files automatically.
Pipeline processes automatically — All 6 stages run in sequence via Celery task chain. Monitor progress in the Pipeline Admin panel.
File stability check — The watcher waits for file size to stabilize (2-second check) before processing, handling partial SCP/rsync writes safely.
Quality control — New key moments appear in the Review Queue for admin approval before publishing.
Re-processing — To update existing content, re-drop the transcript. The pipeline handles upserts (though Qdrant point deduplication is a known improvement area).
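
The file stability check mentioned above can be approximated as follows: poll the file size at a fixed interval and treat the file as ready once two consecutive readings agree. The function name, the `max_checks` cap, and the rule that an empty file is never considered ready are assumptions. Only the 2-second interval comes from the description above.

```python
import os
import time


def wait_until_stable(path, interval=2.0, max_checks=30, sleep=time.sleep) -> bool:
    """Return True once the file size is unchanged between two polls."""
    previous = -1
    for _ in range(max_checks):
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1  # file missing or not yet visible
        # Treat empty files as still being written (an assumption).
        if size > 0 and size == previous:
            return True
        previous = size
        sleep(interval)
    return False
```

The `sleep` parameter is injectable so that the check can be exercised in tests without real delays.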

Infrastructure & Deployment

Chrysopedia runs as a Docker Compose stack on ub01:

| Service | Purpose |
| --- | --- |
| `chrysopedia-api` | FastAPI application server |
| `chrysopedia-worker` | Celery worker + Beat scheduler (email digests, periodic tasks) |
| `chrysopedia-web-8096` | nginx reverse proxy serving frontend + API routing |
| `chrysopedia-db` | PostgreSQL 16 (port 5433 externally) |
| `chrysopedia-redis` | Redis: caching, rate limiting, Celery broker, classification data |
| `chrysopedia-qdrant` | Qdrant vector database for semantic search |
| `chrysopedia-ollama` | Ollama: local LLM fallback when primary DGX endpoint is unavailable |
| `chrysopedia-lightrag` | LightRAG knowledge graph for entity-aware retrieval |

Primary LLM: DGX endpoint with automatic Ollama fallback (fail-open, configurable via `LLM_FALLBACK_URL` / `LLM_FALLBACK_MODEL`)
Web UI: http://ub01:8096
External: https://chrysopedia.com via nuc01 nginx reverse proxy
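
The primary-with-fallback behaviour could look roughly like the sketch below. Only the `LLM_FALLBACK_URL` and `LLM_FALLBACK_MODEL` variable names come from this page. The `call_llm` transport callback, the function name, and the choice of which exceptions trigger the fallback are assumptions.

```python
import os
from typing import Callable, Mapping


def complete_with_fallback(
    prompt: str,
    call_llm: Callable[[str, str, str], str],  # hypothetical: (url, model, prompt) -> text
    primary_url: str,
    primary_model: str,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Try the primary endpoint; on connection failure, retry on the fallback."""
    try:
        return call_llm(primary_url, primary_model, prompt)
    except (ConnectionError, TimeoutError):
        fallback_url = env.get("LLM_FALLBACK_URL")
        fallback_model = env.get("LLM_FALLBACK_MODEL")
        if not fallback_url or not fallback_model:
            raise  # no fallback configured, surface the original error
        return call_llm(fallback_url, fallback_model, prompt)
```

Passing the transport in as a callback keeps the sketch independent of any particular HTTP client or LLM SDK.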

For full deployment instructions and rebuild commands, see Deployment. For local development setup and common gotchas, see Development-Guide.

Where to Learn More

Every aspect of Chrysopedia is documented in the wiki:

| Topic | Wiki Page |
| --- | --- |
| System architecture, Docker services, network topology | Architecture |
| All 80+ API endpoints grouped by domain | API-Surface |
| SQLAlchemy models, relationships, enums | Data-Model |
| Semantic + keyword search, LightRAG cascade | Search-Retrieval |
| Streaming Q&A, multi-turn memory, fallback | Chat-Engine |
| 6-stage LLM extraction pipeline, prompt system | Pipeline |
| Audio/video player, chapter markers, timeline pins | Player |
| 10-dimension highlight scoring and review | Highlights |
| LLM-extracted creator teaching personality | Personality-Profiles |
| JWT authentication, roles, permissions | Authentication |
| Admin impersonation for debugging | Impersonation |
| Environment variables and feature flags | Configuration |
| Docker Compose setup, rebuild commands | Deployment |
| Prometheus metrics, health checks, logging | Monitoring |
| Local dev setup, common gotchas | Development-Guide |
| Architectural decisions register (D001–D048) | Decisions |
| LLM agent context injection system | Agent-Context |
