docs: add Newcomer Guide — full platform walkthrough with 8 sections

jlightner 2026-04-04 10:34:35 -05:00
parent ba72230478
commit 0cde553f75

Newcomer-Guide.md Normal file

@@ -0,0 +1,198 @@
# Newcomer Guide
Welcome to Chrysopedia — this guide takes you from zero to productive with the entire platform.
## What is Chrysopedia
Chrysopedia is an AI-powered knowledge base for music production techniques, built from video content by creators. It extracts, structures, and makes searchable the knowledge embedded in production tutorials and livestreams. The result is a browsable, searchable library of technique pages, key moments, and creator-attributed study guides — all derived automatically from video transcripts.
## How Content Flows
Content enters Chrysopedia as video transcripts and exits as structured, searchable knowledge:
1. **Video transcripts** (Whisper large-v3 on RTX 4090) land as JSON files in a watched folder
2. The **6-stage pipeline** processes them automatically:
   - **Stage 1 — Transcript Segmentation:** Splits raw transcripts into coherent segments
   - **Stage 2 — Key Moment Extraction:** Identifies technique demonstrations, tips, and notable passages
   - **Stage 3 — Classification & Tagging:** Assigns topic categories and tags to each key moment (7 top-level categories: Sound Design, Mixing, Arrangement, Sampling, Music Theory, Workflow, Sound Selection)
   - **Stage 4 — Technique Page Synthesis:** Generates study-guide prose with v2 structured body sections, signal chains, and citations referencing source key moments
   - **Stage 5 — Embedding & Indexing:** Embeds technique pages and key moments into Qdrant vector store and LightRAG knowledge graph for semantic search
   - **Stage 6 — Highlight Detection:** (Optional) Scores key moments across 10 dimensions for editorial curation
3. **Results** appear as structured technique pages, searchable key moments, and cross-references between creators
Each stage is a Celery task. The pipeline orchestrator chains them and tracks status per video. See [[Pipeline]] for stage-level detail and prompt templates.
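
To make the chaining idea concrete, here is a minimal pure-Python sketch of running stages in order while recording a per-video status. It deliberately does not use Celery. The stage functions, `STAGES`, `run_pipeline`, and the exact status strings are hypothetical names chosen for illustration, and only two toy stages are shown.

```python
from typing import Callable

# Hypothetical stand-ins for the real Celery tasks; each takes and returns a dict.
def segment(data: dict) -> dict:
    # Toy segmentation: split the transcript on sentence boundaries.
    data["segments"] = data["transcript"].split(". ")
    return data

def extract_moments(data: dict) -> dict:
    # Toy extraction: keep segments that mention a tip.
    data["moments"] = [s for s in data["segments"] if "tip" in s.lower()]
    return data

STAGES: list[tuple[str, Callable[[dict], dict]]] = [
    ("stage1_segmentation", segment),
    ("stage2_key_moments", extract_moments),
]

def run_pipeline(video_id: str, transcript: str, status: dict) -> dict:
    """Run each stage in order and record progress for this video in `status`."""
    data = {"video_id": video_id, "transcript": transcript}
    for name, stage in STAGES:
        status[video_id] = f"processing:{name}"
        try:
            data = stage(data)
        except Exception as exc:
            # Record which stage failed so an admin view could display it.
            status[video_id] = f"failed:{name}:{exc}"
            raise
    status[video_id] = "complete"
    return data
```

Calling `run_pipeline("v1", "Intro. Tip: layer snares. Outro", {})` passes the output of each stage to the next, which is the same shape a task chain gives you, only without a broker or workers.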
## Using the Web UI
### Search
- **Homepage search bar** or **Cmd+K** from any page opens the search interface
- Combines **semantic search** (LightRAG knowledge graph + Qdrant vectors) with **keyword search** (PostgreSQL full-text)
- Results show technique pages and key moments with creator attribution and relevance scores
- Multi-token queries use AND logic with partial-match fallback — searching "keota snare" finds content where "keota" matches the creator and "snare" matches technique content
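
One plausible reading of "AND logic with partial-match fallback" is sketched below. This is an assumption about the behaviour, not the project's search code: the `search` function, the `creator` and `title` fields, and the ranking of partial matches by token count are all hypothetical.

```python
def search(query: str, docs: list[dict]) -> list[dict]:
    """Return docs matching every token; if none do, fall back to partial matches."""
    tokens = query.lower().split()
    if not tokens:
        return []

    def matched_count(doc: dict) -> int:
        # A token may match either the creator name or the technique text.
        haystack = f"{doc['creator']} {doc['title']}".lower()
        return sum(1 for token in tokens if token in haystack)

    scored = [(matched_count(doc), doc) for doc in docs]

    # AND logic: every token must match somewhere in the document.
    full = [doc for count, doc in scored if count == len(tokens)]
    if full:
        return full

    # Fallback: documents matching at least one token, best matches first.
    partial = [(count, doc) for count, doc in scored if count > 0]
    partial.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in partial]
```

With this sketch, "keota snare" returns only documents where both tokens match, while a query such as "keota reverb" falls back to anything that matches "keota" when nothing matches both.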
### Browse Topics
- 7 top-level categories → sub-topics → technique pages grouped by creator
- Categories expand/collapse with CSS grid animation
- Each technique page card shows creator, topic tags, and key moment count
### Browse Creators
- Filterable list of all creators with genre tags
- Randomized default sort (no alphabetical bias)
- Click through to creator detail pages showing all their technique pages and key moments
### Technique Pages
- **Study guide prose** — LLM-synthesized content with structured sections (v2 body format)
- **Key moments index** — timestamped references back to source videos with citation markers
- **Related techniques** — cross-references to similar content from other creators
- **Table of contents** — auto-generated from section headings, displayed in sidebar
- **Reading header** — sticky section indicator bar that appears when scrolling past the page title
- **Inline player** — audio/video player with chapter markers and key moment timeline pins
### Chat
- **Creator-scoped AI chat** — ask questions about any creator's techniques
- **Citation support** — responses include numbered source references
- **Cascade retrieval** — queries search creator-specific context first, then domain (topic category), then global knowledge
- **Multi-turn memory** — conversation context persists within a session
- **Streaming responses** — SSE-based token streaming with source metadata sent first
- **Quality toolkit** — refined system prompt with baseline quality metrics
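
The cascade retrieval order described above (creator, then domain, then global) can be sketched as follows. The function name, the injected retriever callables, and the `min_results` threshold are hypothetical, and the sketch assumes the cascade stops at the first tier with enough hits; the real engine may merge tiers instead.

```python
from typing import Callable

Retriever = Callable[[str], list[str]]

def cascade_retrieve(
    query: str,
    creator_search: Retriever,
    domain_search: Retriever,
    global_search: Retriever,
    min_results: int = 1,
) -> tuple[str, list[str]]:
    """Try the narrowest scope first and widen only when it returns too little."""
    tiers = (
        ("creator", creator_search),
        ("domain", domain_search),
        ("global", global_search),
    )
    for scope, search in tiers:
        hits = search(query)
        if len(hits) >= min_results:
            return scope, hits
    return "none", []
```

Returning the scope alongside the hits lets the caller tell the user whether an answer came from the creator's own material or from broader knowledge.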
## For Creators
### Registration & Onboarding
- Standard registration with email/password
- **3-step onboarding wizard** (shown once after first login):
  1. Welcome message explaining the platform
  2. Content consent selection (choose which content types to publish)
  3. Quick tour of available features
- Onboarding completion tracked via `onboarding_completed` flag on user profile
### Consent Dashboard
- Control which content types are published (technique pages, key moments, chat availability)
- Granular per-content-type toggles
- Changes take effect on next pipeline run
### Creator Dashboard
- Overview stats: total technique pages, key moments, video count
- Recent posts and activity feed
- Quick links to technique pages derived from your content
### Transparency Page
- View all entities, relationships, and technique pages derived from your content
- Expandable/collapsible category sections (CSS grid animation)
- Full audit trail of what the system extracted from your videos
### Data Export
- **GDPR-style ZIP download** of all derived content via `GET /creator/export`
- Includes technique pages, key moments, classifications, and metadata
- One-click download from creator dashboard
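
For a sense of what a ZIP export of derived content might involve, here is a small standard-library sketch that packs sections into an in-memory archive. The function name, the one-JSON-file-per-section layout, and the `creator_slug` folder are assumptions for illustration; the actual `GET /creator/export` response format is not specified here.

```python
import io
import json
import zipfile

def build_export_zip(creator_slug: str, sections: dict[str, list[dict]]) -> bytes:
    """Pack each section (e.g. technique pages, key moments) into one JSON file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, rows in sections.items():
            # Hypothetical layout: <creator>/<section>.json
            archive.writestr(f"{creator_slug}/{name}.json", json.dumps(rows, indent=2))
    return buffer.getvalue()
```

Building the archive in memory keeps the sketch simple; a large export would more likely be streamed or written to a temporary file.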
### Notifications
- **Email digest** of platform activity (new technique pages, key moments from your content)
- Configurable frequency: daily, weekly, or disabled
- **Signed unsubscribe links** (PyJWT tokens) — one-click unsubscribe without login
- Managed via notification preferences in creator settings
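
The idea behind a signed unsubscribe link is that the token itself proves which user it belongs to, so no login is needed. The platform uses PyJWT for this; the sketch below swaps in a plain HMAC signature from the standard library so it runs without third-party packages, and its token format is not a JWT. `SECRET`, the function names, and the 30-day lifetime are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"change-me"  # hypothetical; a real deployment would load this from configuration

def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode().rstrip("=")

def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def make_unsubscribe_token(user_id: int, ttl_seconds: int = 30 * 24 * 3600,
                           now: Optional[float] = None) -> str:
    """Sign the user id and an expiry time so the link works without login."""
    issued = time.time() if now is None else now
    payload = json.dumps({"sub": user_id, "exp": int(issued) + ttl_seconds}).encode()
    signature = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return f"{_b64(payload)}.{_b64(signature)}"

def verify_unsubscribe_token(token: str, now: Optional[float] = None) -> Optional[int]:
    """Return the user id if the token is authentic and unexpired, else None."""
    try:
        payload_part, signature_part = token.split(".")
        payload = _unb64(payload_part)
        signature = _unb64(signature_part)
    except ValueError:  # wrong shape or invalid base64
        return None
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None
    claims = json.loads(payload)
    current = time.time() if now is None else now
    if claims["exp"] < current:
        return None
    return claims["sub"]
```

The payload is only parsed after the signature check passes, and `hmac.compare_digest` is used so the comparison does not leak timing information.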
### Personality Profiles
- 5-tier system for chat persona customization based on creator teaching style
- LLM-extracted from creator's content patterns
- Influences how the chat engine responds to questions about that creator's techniques
- See [[Personality-Profiles]] for tier definitions
## For Admins
### Review Queue
- Approve, reject, or edit key moments organized by source video
- Bulk actions for efficient moderation
- Filter by creator, video, or processing status
### Pipeline Admin
- Monitor processing stages for all videos in the system
- Filter by creator, status (pending, processing, complete, failed)
- View per-stage timing and error details
### Usage Dashboard
- **Token consumption tracking** — LLM API usage over time
- **Top creators and users** — ranked by content volume and platform usage
- **Daily statistics** — requests, chat sessions, search queries
- **Rate limiting visibility** — sliding-window rate limiter status (Redis-backed, per-user)
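
A sliding-window limiter counts only the requests that fall inside the most recent window, per user. The production limiter is Redis-backed; the sketch below keeps timestamps in process memory purely to show the algorithm. The class name, its parameters, and the injectable `now` argument are hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Optional

class SlidingWindowLimiter:
    """Allow at most `limit` requests per user within the last `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self._hits: dict = defaultdict(deque)  # user id -> request timestamps

    def allow(self, user_id: str, now: Optional[float] = None) -> bool:
        current = time.monotonic() if now is None else now
        hits = self._hits[user_id]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= current - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(current)
        return True
```

Because each user has an independent deque, one busy user cannot exhaust another user's allowance; a shared store such as Redis is what makes the same guarantee hold across multiple API processes.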
### Audit Log
- All administrative actions tracked with timestamp, actor, and details
- Searchable and filterable
### User Management
- Role assignment (admin, creator, user)
- Account status management (active, suspended)
- Impersonation support for debugging user-specific issues (see [[Impersonation]])
## Adding New Content
1. **Prepare transcripts** — Run Whisper large-v3 on video/audio files (RTX 4090 recommended for speed). Output format: JSON with timestamps.
2. **Place in watched folder** — Drop transcript JSON files into the configured watch directory. The folder watcher (PollingObserver, works on ZFS/NFS) detects new files automatically.
3. **File stability check** — Before processing, the watcher waits for the file size to stabilize (2-second check), so partial SCP/rsync writes are handled safely.
4. **Pipeline processes automatically** — All 6 stages run in sequence via Celery task chain. Monitor progress in the Pipeline Admin panel.
5. **Quality control** — New key moments appear in the Review Queue for admin approval before publishing.
6. **Re-processing** — To update existing content, re-drop the transcript. The pipeline upserts existing records, but Qdrant point deduplication is a known improvement area, so re-processing may leave duplicate vectors.
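
The file stability check can be sketched with the standard library alone: poll the file size and treat the file as complete once two consecutive readings agree. This is an illustration under stated assumptions, not the watcher's actual code; `wait_until_stable`, its parameters, and the 60-second timeout are hypothetical, and only the 2-second interval comes from the description above.

```python
import os
import time

def wait_until_stable(path: str, interval: float = 2.0, timeout: float = 60.0) -> bool:
    """Return True once the file size is unchanged across one polling interval."""
    deadline = time.monotonic() + timeout
    last_size = -1
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1  # file is missing or not readable yet
        if size >= 0 and size == last_size:
            return True
        last_size = size
        time.sleep(interval)
    return False
```

A size-based check is a heuristic: a transfer that stalls for longer than the interval would look complete, which is why a caller might also validate that the JSON parses before handing the file to the pipeline.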
## Infrastructure & Deployment
Chrysopedia runs as a **Docker Compose stack on ub01**:
| Service | Purpose |
|---------|---------|
| `chrysopedia-api` | FastAPI application server |
| `chrysopedia-worker` | Celery worker + Beat scheduler (email digests, periodic tasks) |
| `chrysopedia-web-8096` | nginx reverse proxy serving frontend + API routing |
| `chrysopedia-db` | PostgreSQL 16 (port 5433 externally) |
| `chrysopedia-redis` | Redis — caching, rate limiting, Celery broker, classification data |
| `chrysopedia-qdrant` | Qdrant vector database for semantic search |
| `chrysopedia-ollama` | Ollama — local LLM fallback when primary DGX endpoint is unavailable |
| `chrysopedia-lightrag` | LightRAG knowledge graph for entity-aware retrieval |
- **Primary LLM:** DGX endpoint with **automatic Ollama fallback** (fail-open, configurable via `LLM_FALLBACK_URL` / `LLM_FALLBACK_MODEL`)
- **Web UI:** `http://ub01:8096`
- **External:** `https://chrysopedia.com` via nuc01 nginx reverse proxy
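
The fallback behaviour can be pictured as "try the primary endpoint, and on a connection failure use the fallback instead". The sketch below assumes the two clients are passed in as plain callables; `complete_with_fallback` and `fallback_settings` are hypothetical helpers, and only the `LLM_FALLBACK_URL` and `LLM_FALLBACK_MODEL` variable names come from this guide.

```python
import os
from typing import Callable, Mapping, Optional

def fallback_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the fallback endpoint settings from the environment."""
    return {
        "url": env.get("LLM_FALLBACK_URL"),
        "model": env.get("LLM_FALLBACK_MODEL"),
    }

def complete_with_fallback(
    prompt: str,
    primary: Callable[[str], str],
    fallback: Optional[Callable[[str], str]] = None,
) -> tuple[str, str]:
    """Return (backend_used, completion), switching backends on connection errors."""
    try:
        return "primary", primary(prompt)
    except (ConnectionError, TimeoutError):
        if fallback is None:
            raise  # no fallback configured, surface the original error
        return "fallback", fallback(prompt)
```

Reporting which backend answered makes it easy to log or count fallback events, which is useful when judging how often the primary endpoint is unavailable.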
For full deployment instructions and rebuild commands, see [[Deployment]].
For local development setup and common gotchas, see [[Development-Guide]].
## Where to Learn More
Every aspect of Chrysopedia is documented in the wiki:
| Topic | Wiki Page |
|-------|-----------|
| System architecture, Docker services, network topology | [[Architecture]] |
| All 80+ API endpoints grouped by domain | [[API-Surface]] |
| SQLAlchemy models, relationships, enums | [[Data-Model]] |
| Semantic + keyword search, LightRAG cascade | [[Search-Retrieval]] |
| Streaming Q&A, multi-turn memory, fallback | [[Chat-Engine]] |
| 6-stage LLM extraction pipeline, prompt system | [[Pipeline]] |
| Audio/video player, chapter markers, timeline pins | [[Player]] |
| 10-dimension highlight scoring and review | [[Highlights]] |
| LLM-extracted creator teaching personality | [[Personality-Profiles]] |
| JWT authentication, roles, permissions | [[Authentication]] |
| Admin impersonation for debugging | [[Impersonation]] |
| Environment variables and feature flags | [[Configuration]] |
| Docker Compose setup, rebuild commands | [[Deployment]] |
| Prometheus metrics, health checks, logging | [[Monitoring]] |
| Local dev setup, common gotchas | [[Development-Guide]] |
| Architectural decisions register (D001–D048) | [[Decisions]] |
| LLM agent context injection system | [[Agent-Context]] |