docs: add Newcomer Guide — full platform walkthrough with 8 sections

2026-04-04 10:34:35 -05:00 · 0cde553f75
commit 0cde553f75
parent ba72230478
1 changed file with 198 additions and 0 deletions
--- a/Newcomer-Guide.md
+++ b/Newcomer-Guide.md
@@ -0,0 +1,198 @@
 # Newcomer Guide
 Welcome to Chrysopedia. This guide takes you from zero to productive across the entire platform.
 ## What is Chrysopedia
 Chrysopedia is an AI-powered knowledge base for music production techniques, built from creators' video content. It extracts the knowledge embedded in production tutorials and livestreams, structures it, and makes it searchable. The result is a browsable, searchable library of technique pages, key moments, and creator-attributed study guides, all derived automatically from video transcripts.
 ## How Content Flows
 Content enters Chrysopedia as video transcripts and exits as structured, searchable knowledge:
 1. **Video transcripts** (produced by Whisper large-v3 running on an RTX 4090) land as JSON files in a watched folder
 2. The **6-stage pipeline** processes them automatically:
    - **Stage 1 (Transcript Segmentation):** Splits raw transcripts into coherent segments
    - **Stage 2 (Key Moment Extraction):** Identifies technique demonstrations, tips, and notable passages
    - **Stage 3 (Classification & Tagging):** Assigns topic categories and tags to each key moment, using 7 top-level categories: Sound Design, Mixing, Arrangement, Sampling, Music Theory, Workflow, Sound Selection
    - **Stage 4 (Technique Page Synthesis):** Generates study-guide prose with v2 structured body sections, signal chains, and citations that reference the source key moments
    - **Stage 5 (Embedding & Indexing):** Embeds technique pages and key moments into the Qdrant vector store and the LightRAG knowledge graph for semantic search
    - **Stage 6 (Highlight Detection, optional):** Scores key moments across 10 dimensions to support editorial curation
 3. **Results** appear as structured technique pages, searchable key moments, and cross-references between creators
 Each stage is a Celery task. The pipeline orchestrator chains them and tracks status per video. See [[Pipeline]] for stage-level detail and prompt templates.
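
To make the chaining concrete, here is a dependency-free sketch that runs the stages in order and tracks a per-video status record. It is illustrative only. The stage functions, `run_pipeline`, `VideoStatus`, and the status strings are hypothetical stand-ins rather than the project's code. The real stages are Celery tasks that would be linked with Celery's `chain`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A stage takes the payload produced by the previous stage and returns a new one.
Stage = Callable[[dict], dict]

STAGE_NAMES = [
    "segmentation", "key_moments", "classification",
    "synthesis", "embedding", "highlights",  # the last stage is optional
]


def make_stage(name: str) -> Stage:
    """Hypothetical placeholder stage that only records that it ran."""
    def run(payload: dict) -> dict:
        return {**payload, "completed": payload.get("completed", []) + [name]}
    run.__name__ = name
    return run


STAGES: List[Stage] = [make_stage(n) for n in STAGE_NAMES]


@dataclass
class VideoStatus:
    video_id: str
    status: str = "pending"  # pending -> processing -> complete | failed
    current_stage: Optional[str] = None
    error: Optional[str] = None


def run_pipeline(video_id: str, transcript: dict,
                 stages: Optional[List[Stage]] = None,
                 include_highlights: bool = True) -> Tuple[dict, VideoStatus]:
    if stages is None:
        stages = STAGES if include_highlights else STAGES[:-1]
    status = VideoStatus(video_id, status="processing")
    payload = transcript
    for stage in stages:
        status.current_stage = stage.__name__
        try:
            payload = stage(payload)
        except Exception as exc:
            # Stop at the first failure, as a failed link halts a Celery chain.
            status.status = "failed"
            status.error = f"{stage.__name__}: {exc}"
            return payload, status
    status.status = "complete"
    status.current_stage = None
    return payload, status
```

In the Celery version, each task would write its own status update to the database. The failure details captured here correspond to what the Pipeline Admin panel displays per stage.
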
 ## Using the Web UI
 ### Search
 - **Homepage search bar** or **Cmd+K** from any page opens the search interface
 - Combines **semantic search** (LightRAG knowledge graph + Qdrant vectors) with **keyword search** (PostgreSQL full-text)
 - Results show technique pages and key moments with creator attribution and relevance scores
 - Multi-token queries use AND logic with a partial-match fallback. For example, searching "keota snare" finds content where "keota" matches the creator and "snare" matches the technique content
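
This in-memory sketch shows the multi-token behavior. A strict AND pass lets each token match a different field (creator, title, or body). If no document matches every token, it falls back to partial matches ranked by how many tokens hit. It is a simplification: the real keyword path uses PostgreSQL full-text search, and case-insensitive substring matching stands in for it here. `Doc` and `keyword_search` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Doc:
    creator: str
    title: str
    body: str


def _matches(doc: Doc, token: str) -> bool:
    # A token matches if it occurs in any searchable field (case-insensitive).
    fields = (doc.creator, doc.title, doc.body)
    return any(token in field.lower() for field in fields)


def keyword_search(query: str, docs: List[Doc]) -> List[Doc]:
    tokens = query.lower().split()
    if not tokens:
        return []
    # Strict pass: AND across tokens, but each token may hit a different field.
    strict = [d for d in docs if all(_matches(d, t) for t in tokens)]
    if strict:
        return strict
    # Fallback: partial matches, ranked by how many tokens each doc matches.
    scored = [(sum(_matches(d, t) for t in tokens), d) for d in docs]
    partial = [(n, d) for n, d in scored if n > 0]
    partial.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [d for _, d in partial]
```
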
 ### Browse Topics
 - 7 top-level categories → sub-topics → technique pages grouped by creator
 - Categories expand/collapse with CSS grid animation
 - Each technique page card shows creator, topic tags, and key moment count
 ### Browse Creators
 - Filterable list of all creators with genre tags
 - Randomized default sort (no alphabetical bias)
 - Click through to creator detail pages showing all their technique pages and key moments
 ### Technique Pages
 - **Study guide prose** — LLM-synthesized content with structured sections (v2 body format)
 - **Key moments index** — timestamped references back to source videos with citation markers
 - **Related techniques** — cross-references to similar content from other creators
 - **Table of contents** — auto-generated from section headings, displayed in sidebar
 - **Reading header** — sticky section indicator bar that appears when scrolling past the page title
 - **Inline player** — audio/video player with chapter markers and key moment timeline pins
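
The table of contents and the reading header both depend on section headings. The sketch below builds unique anchors from headings and uses a binary search over heading offsets to find the section currently being read. It is an assumption-laden illustration, not the frontend's code, which presumably runs in the browser. The `Section` type, slug rules, heading levels, and pixel offsets are all hypothetical.

```python
import bisect
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Section:
    heading: str
    level: int  # assumed: 2 = top-level section, 3 = subsection


def slugify(text: str) -> str:
    # Lowercase, keep ASCII letters and digits, join the words with hyphens.
    return "-".join(re.findall(r"[a-z0-9]+", text.lower())) or "section"


def build_toc(sections: List[Section]) -> List[Dict[str, object]]:
    toc: List[Dict[str, object]] = []
    seen: Dict[str, int] = {}
    for s in sections:
        base = slugify(s.heading)
        count = seen.get(base, 0)
        seen[base] = count + 1
        # Suffix repeated headings so every anchor stays unique.
        anchor = base if count == 0 else f"{base}-{count}"
        toc.append({"anchor": anchor, "title": s.heading, "level": s.level})
    return toc


def active_section(heading_offsets: List[int], scroll_y: int) -> Optional[int]:
    """Index of the last heading scrolled past, or None while above the first one."""
    i = bisect.bisect_right(heading_offsets, scroll_y) - 1
    return i if i >= 0 else None
```

A `None` result corresponds to the reading header staying hidden. Any index selects the TOC entry to highlight and the label to show in the sticky bar.
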
 ### Chat
 - **Creator-scoped AI chat** — ask questions about any creator's techniques
 - **Citation support** — responses include numbered source references
 - **Cascade retrieval** — queries search creator-specific context first, then domain (topic category), then global knowledge
 - **Multi-turn memory** — conversation context persists within a session
 - **Streaming responses** — SSE-based token streaming with source metadata sent first
 - **Quality toolkit** — refined system prompt with baseline quality metrics
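
Two of the chat behaviors above fit naturally into a short sketch. Cascade retrieval widens the search scope only when the narrower tier returns too few hits. SSE streaming then emits source metadata before any answer tokens. Several parts are assumptions rather than the project's code: the tier retrievers are stand-ins for creator-, topic-, and globally-scoped Qdrant/LightRAG queries, and the `min_hits` threshold and the `sources`/`token`/`done` event names are made up.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# A retriever maps a query to hits; each hit is a dict with at least an "id".
Retriever = Callable[[str], List[Dict]]


def cascade_retrieve(query: str, tiers: List[Tuple[str, Retriever]],
                     min_hits: int = 3) -> List[Dict]:
    """Query tiers in order (e.g. creator -> domain -> global) until enough hits."""
    collected: List[Dict] = []
    seen = set()
    for tier_name, retrieve in tiers:
        for hit in retrieve(query):
            if hit["id"] not in seen:  # skip duplicates already found in a narrower tier
                seen.add(hit["id"])
                collected.append({**hit, "tier": tier_name})
        if len(collected) >= min_hits:
            break
    return collected


def sse_event(event: str, data: object) -> str:
    # One SSE message: an "event:" line, a "data:" line, then a blank line.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def stream_answer(sources: List[Dict], tokens: Iterable[str]) -> Iterator[str]:
    # Send source metadata first so citation markers can be rendered as tokens arrive.
    yield sse_event("sources", [
        {"n": i + 1, "id": s["id"], "tier": s["tier"]} for i, s in enumerate(sources)
    ])
    for token in tokens:
        yield sse_event("token", token)
    yield sse_event("done", {})
```

In FastAPI, a generator like `stream_answer` could be returned through `StreamingResponse(..., media_type="text/event-stream")`. The numbering in the `sources` event is what the answer's numbered citations would refer to.
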
 ## For Creators
 ### Registration & Onboarding
 - Standard registration with email/password
 - **3-step onboarding wizard** (shown once after first login):
   1. Welcome message explaining the platform
   2. Content consent selection (choose which content types to publish)
   3. Quick tour of available features
 - Onboarding completion tracked via `onboarding_completed` flag on user profile
 ### Consent Dashboard
 - Control which content types are published (technique pages, key moments, chat availability)
 - Granular per-content-type toggles
 - Changes take effect on the next pipeline run
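
Consent changes apply on the next pipeline run. That behavior is consistent with filtering at publish time, which is how this minimal sketch does it. `ConsentSettings`, `publishable`, and the content-type strings are hypothetical, and chat availability would be a separate check on the `chat` flag.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ConsentSettings:
    # Hypothetical per-content-type toggles mirroring the Consent Dashboard.
    technique_pages: bool = True
    key_moments: bool = True
    chat: bool = True


def publishable(items: List[Dict], consent: ConsentSettings) -> List[Dict]:
    """Keep only items whose content type the creator has consented to publish."""
    allowed = {
        "technique_page": consent.technique_pages,
        "key_moment": consent.key_moments,
    }
    # Unknown content types are withheld (fail closed) rather than published.
    return [item for item in items if allowed.get(item["type"], False)]
```

Failing closed on unrecognized types is a deliberate choice in this sketch: a new content type stays unpublished until it has its own toggle.
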
 ### Creator Dashboard
 - Overview stats: total technique pages, key moments, video count
 - Recent posts and activity feed
 - Quick links to technique pages derived from your content
 ### Transparency Page
 - View all entities, relationships, and technique pages derived from your content
 - Expandable/collapsible category sections (CSS grid animation)
 - Full audit trail of what the system extracted from your videos
 ### Data Export
 - **GDPR-style ZIP download** of all derived content via `GET /creator/export`
 - Includes technique pages, key moments, classifications, and metadata
 - One-click download from creator dashboard
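
An export like this can be assembled entirely in memory with the standard `zipfile` module. The sketch shows one possible layout, writing one JSON file per content kind plus a metadata file. The file names, the `build_export_zip` helper, and the metadata fields are assumptions, not the actual archive format served by `GET /creator/export`.

```python
import io
import json
import zipfile
from datetime import datetime, timezone
from typing import Dict, List


def build_export_zip(creator_id: str, technique_pages: List[Dict],
                     key_moments: List[Dict], classifications: List[Dict]) -> bytes:
    """Bundle a creator's derived content into an in-memory ZIP archive."""
    sections = {
        "technique_pages.json": technique_pages,
        "key_moments.json": key_moments,
        "classifications.json": classifications,
    }
    metadata = {
        "creator_id": creator_id,
        "exported_at": datetime.now(timezone.utc).isoformat(),
        "counts": {name: len(rows) for name, rows in sections.items()},
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, rows in sections.items():
            zf.writestr(name, json.dumps(rows, indent=2))
        zf.writestr("metadata.json", json.dumps(metadata, indent=2))
    return buf.getvalue()
```

A FastAPI endpoint could return the bytes with `Response(content=data, media_type="application/zip", headers={"Content-Disposition": 'attachment; filename="export.zip"'})`. The filename is illustrative.
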
 ### Notifications
 - **Email digest** of platform activity (new technique pages, key moments from your content)
 - Configurable frequency: daily, weekly, or disabled
 - **Signed unsubscribe links** (PyJWT tokens) — one-click unsubscribe without login
 - Managed via notification preferences in creator settings
 ### Personality Profiles
 - 5-tier system for chat persona customization based on creator teaching style
 - LLM-extracted from creator's content patterns
 - Influences how the chat engine responds to questions about that creator's techniques
 - See [[Personality-Profiles]] for tier definitions
 ## For Admins
 ### Review Queue
 - Approve, reject, or edit key moments organized by source video
 - Bulk actions for efficient moderation
 - Filter by creator, video, or processing status
 ### Pipeline Admin
 - Monitor processing stages for all videos in the system
 - Filter by creator or status (pending, processing, complete, failed)
 - View per-stage timing and error details
 ### Usage Dashboard
 - **Token consumption tracking** — LLM API usage over time
 - **Top creators and users** — ranked by content volume and platform usage
 - **Daily statistics** — requests, chat sessions, search queries
 - **Rate limiting visibility** — sliding-window rate limiter status (Redis-backed, per-user)
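
A common way to build a Redis-backed, per-user sliding window is a sorted set of request timestamps per user. `ZREMRANGEBYSCORE` evicts old entries, `ZCARD` counts the rest, and `ZADD` records the new request. This guide does not describe Chrysopedia's exact implementation. The in-memory sketch below mirrors that pattern, with the corresponding Redis command noted on each step. The class name and parameters are hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Per-user sliding-window log: at most `limit` requests in any `window_s` span."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def _evict(self, user_id: str, now: float) -> Deque[float]:
        hits = self._hits[user_id]
        # Drop timestamps older than the window (ZREMRANGEBYSCORE in the Redis version).
        while hits and hits[0] <= now - self.window_s:
            hits.popleft()
        return hits

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        hits = self._evict(user_id, now)
        if len(hits) >= self.limit:  # ZCARD
            return False
        hits.append(now)  # ZADD
        return True

    def remaining(self, user_id: str) -> int:
        return max(0, self.limit - len(self._evict(user_id, self.clock())))
```

A value like `remaining()` is the kind of per-user figure a usage dashboard could surface.
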
 ### Audit Log
 - All administrative actions tracked with timestamp, actor, and details
 - Searchable and filterable
 ### User Management
 - Role assignment (admin, creator, user)
 - Account status management (active, suspended)
 - Impersonation support for debugging user-specific issues (see [[Impersonation]])
 ## Adding New Content
 1. **Prepare transcripts** — Run Whisper large-v3 on video/audio files (RTX 4090 recommended for speed). Output format: JSON with timestamps.
 2. **Place in watched folder** — Drop transcript JSON files into the configured watch directory. The folder watcher (PollingObserver, works on ZFS/NFS) detects new files automatically.
 3. **File stability check**: The watcher waits for the file size to stabilize (a 2-second check) before processing, so partial SCP/rsync writes are handled safely.
 4. **Pipeline processes automatically**: All 6 stages run in sequence via a Celery task chain. Monitor progress in the Pipeline Admin panel.
 5. **Quality control** — New key moments appear in the Review Queue for admin approval before publishing.
 6. **Re-processing** — To update existing content, re-drop the transcript. The pipeline handles upserts (though Qdrant point deduplication is a known improvement area).
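
The file stability check can be sketched with the standard library alone. The function polls the file size and treats the file as complete once the size is unchanged across one interval. The 2-second default matches the text above. `wait_until_stable`, `max_checks`, and the injectable `sleep` are hypothetical. In the real watcher, a check like this would run from the `PollingObserver` event handler before the pipeline is enqueued.

```python
import os
import time
from typing import Callable


def wait_until_stable(path: str, interval_s: float = 2.0, max_checks: int = 150,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True once the file size is unchanged across one interval.

    Returns False if the file disappears or keeps changing for `max_checks` intervals.
    """
    try:
        last_size = os.path.getsize(path)
        for _ in range(max_checks):
            sleep(interval_s)
            size = os.path.getsize(path)
            if size == last_size:
                return True  # no growth during the interval: assume the writer is done
            last_size = size
    except FileNotFoundError:
        # e.g. a temporary upload file that was renamed or deleted mid-check
        return False
    return False
```

Capping the number of checks keeps a stalled or endless upload from blocking the watcher indefinitely.
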
 ## Infrastructure & Deployment
 Chrysopedia runs as a **Docker Compose stack on ub01**:
 | Service | Purpose |
 |---------|---------|
 | `chrysopedia-api` | FastAPI application server |
 | `chrysopedia-worker` | Celery worker + Beat scheduler (email digests, periodic tasks) |
 | `chrysopedia-web-8096` | nginx reverse proxy serving frontend + API routing |
 | `chrysopedia-db` | PostgreSQL 16 (port 5433 externally) |
 | `chrysopedia-redis` | Redis — caching, rate limiting, Celery broker, classification data |
 | `chrysopedia-qdrant` | Qdrant vector database for semantic search |
 | `chrysopedia-ollama` | Ollama — local LLM fallback when primary DGX endpoint is unavailable |
 | `chrysopedia-lightrag` | LightRAG knowledge graph for entity-aware retrieval |
 - **Primary LLM:** DGX endpoint with **automatic Ollama fallback** (fail-open, configurable via `LLM_FALLBACK_URL` / `LLM_FALLBACK_MODEL`)
 - **Web UI:** `http://ub01:8096`
 - **External:** `https://chrysopedia.com` via nuc01 nginx reverse proxy
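
The fail-open LLM fallback amounts to a try/except around the primary call that reads the documented `LLM_FALLBACK_URL` and `LLM_FALLBACK_MODEL` settings. The sketch makes several assumptions. The client callables and their signatures are hypothetical. It treats the built-in `ConnectionError`/`TimeoutError` as "endpoint unavailable", whereas a real HTTP client raises its own exception types, which would need to be mapped.

```python
import os
from typing import Callable, Mapping, Tuple

# Hypothetical client signatures; the real service's LLM client is not shown here.
Primary = Callable[[str], str]
Fallback = Callable[[str, str, str], str]  # (prompt, url, model) -> completion


def complete_with_fallback(prompt: str, primary: Primary, fallback: Fallback,
                           env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Try the primary (DGX) endpoint; on connection failure, fail open to Ollama."""
    try:
        return primary(prompt), "primary"
    except (ConnectionError, TimeoutError):
        url = env.get("LLM_FALLBACK_URL")
        model = env.get("LLM_FALLBACK_MODEL")
        if not url or not model:
            raise  # no fallback configured: surface the original error
        return fallback(prompt, url, model), "fallback"
```

Returning which backend answered makes it easy to log or count fallback usage, for example alongside token tracking in the Usage Dashboard.
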
 For full deployment instructions and rebuild commands, see [[Deployment]].
 For local development setup and common gotchas, see [[Development-Guide]].
 ## Where to Learn More
 Every aspect of Chrysopedia is documented in the wiki:
 | Topic | Wiki Page |
 |-------|-----------|
 | System architecture, Docker services, network topology | [[Architecture]] |
 | All 80+ API endpoints grouped by domain | [[API-Surface]] |
 | SQLAlchemy models, relationships, enums | [[Data-Model]] |
 | Semantic + keyword search, LightRAG cascade | [[Search-Retrieval]] |
 | Streaming Q&A, multi-turn memory, fallback | [[Chat-Engine]] |
 | 6-stage LLM extraction pipeline, prompt system | [[Pipeline]] |
 | Audio/video player, chapter markers, timeline pins | [[Player]] |
 | 10-dimension highlight scoring and review | [[Highlights]] |
 | LLM-extracted creator teaching personality | [[Personality-Profiles]] |
 | JWT authentication, roles, permissions | [[Authentication]] |
 | Admin impersonation for debugging | [[Impersonation]] |
 | Environment variables and feature flags | [[Configuration]] |
 | Docker Compose setup, rebuild commands | [[Deployment]] |
 | Prometheus metrics, health checks, logging | [[Monitoring]] |
 | Local dev setup, common gotchas | [[Development-Guide]] |
 | Architectural decisions register (D001–D048) | [[Decisions]] |
 | LLM agent context injection system | [[Agent-Context]] |