docs: add Newcomer Guide — full platform walkthrough with 8 sections
parent ba72230478
commit 0cde553f75
1 changed file with 198 additions and 0 deletions

Newcomer-Guide.md (normal file, 198 additions)

@@ -0,0 +1,198 @@

# Newcomer Guide

Welcome to Chrysopedia. This guide takes you from zero to productive across the entire platform.

## What is Chrysopedia

Chrysopedia is an AI-powered knowledge base for music production techniques, built from creators' video content. It extracts, structures, and makes searchable the knowledge embedded in production tutorials and livestreams. The result is a browsable, searchable library of technique pages, key moments, and creator-attributed study guides, all derived automatically from video transcripts.

## How Content Flows

Content enters Chrysopedia as video transcripts and exits as structured, searchable knowledge:

1. **Video transcripts** (Whisper large-v3 on RTX 4090) land as JSON files in a watched folder
2. The **6-stage pipeline** processes them automatically:
   - **Stage 1: Transcript Segmentation.** Splits raw transcripts into coherent segments
   - **Stage 2: Key Moment Extraction.** Identifies technique demonstrations, tips, and notable passages
   - **Stage 3: Classification & Tagging.** Assigns topic categories and tags to each key moment (7 top-level categories: Sound Design, Mixing, Arrangement, Sampling, Music Theory, Workflow, Sound Selection)
   - **Stage 4: Technique Page Synthesis.** Generates study-guide prose with v2 structured body sections, signal chains, and citations referencing source key moments
   - **Stage 5: Embedding & Indexing.** Embeds technique pages and key moments into the Qdrant vector store and the LightRAG knowledge graph for semantic search
   - **Stage 6: Highlight Detection (optional).** Scores key moments across 10 dimensions for editorial curation
3. **Results** appear as structured technique pages, searchable key moments, and cross-references between creators

Each stage is a Celery task. The pipeline orchestrator chains them and tracks status per video. See [[Pipeline]] for stage-level detail and prompt templates.
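
To make the chaining and per-video status tracking concrete, here is a minimal plain-Python stand-in. It is not Celery and not the project's orchestrator: `run_pipeline`, the `status` dict, and the stub stage functions are all hypothetical, and the optional Stage 6 is left out for brevity.

```python
from typing import Callable

# Stage names loosely follow the guide; the functions are placeholder stubs (assumptions).
STAGES: list[tuple[str, Callable[[dict], dict]]] = [
    ("segmentation", lambda doc: {**doc, "segments": doc["transcript"].split(". ")}),
    ("key_moments", lambda doc: {**doc, "moments": doc["segments"][:2]}),
    ("classification", lambda doc: {**doc, "tags": ["Mixing"]}),
    ("synthesis", lambda doc: {**doc, "page": " / ".join(doc["moments"])}),
    ("indexing", lambda doc: {**doc, "indexed": True}),
]


def run_pipeline(video_id: str, transcript: str, status: dict[str, str]) -> dict:
    """Run every stage in order, recording progress per video (hypothetical helper)."""
    doc = {"video_id": video_id, "transcript": transcript}
    for name, stage in STAGES:
        status[video_id] = f"processing:{name}"
        try:
            doc = stage(doc)
        except Exception:
            # A failed stage stops the chain because later stages need its output.
            status[video_id] = f"failed:{name}"
            raise
    status[video_id] = "complete"
    return doc


status: dict[str, str] = {}
result = run_pipeline("vid-1", "Layer the snare. Add saturation. Bounce it.", status)
```

In the real system each stage would be a Celery task and the status would live in the database rather than in a dict; the sketch only shows the control flow.
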

## Using the Web UI

### Search

- **Homepage search bar** or **Cmd+K** from any page opens the search interface
- Combines **semantic search** (LightRAG knowledge graph + Qdrant vectors) with **keyword search** (PostgreSQL full-text)
- Results show technique pages and key moments with creator attribution and relevance scores
- Multi-token queries use AND logic with partial-match fallback. For example, searching "keota snare" finds content where "keota" matches the creator and "snare" matches technique content
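
The AND-with-fallback behaviour can be sketched in memory as follows. This is only an illustration under assumptions: the real keyword search runs on PostgreSQL full-text, and the `search` function, the `creator`/`title` fields, and the ranking of partial matches are hypothetical.

```python
def search(query: str, items: list[dict]) -> list[dict]:
    """Return items matching all tokens, else items matching some tokens."""
    tokens = query.lower().split()
    if not tokens:
        return []

    def hits(item: dict) -> int:
        # A token may match either the creator name or the technique title.
        haystack = f"{item['creator']} {item['title']}".lower()
        return sum(token in haystack for token in tokens)

    strict = [item for item in items if hits(item) == len(tokens)]
    if strict:
        return strict
    # Fallback: keep partial matches, best-matching first (stable sort).
    partial = [item for item in items if hits(item) > 0]
    return sorted(partial, key=hits, reverse=True)


library = [
    {"creator": "Keota", "title": "Layered snare design"},
    {"creator": "Keota", "title": "Reese bass"},
    {"creator": "Other", "title": "Snare compression"},
]
strict_titles = [item["title"] for item in search("keota snare", library)]
fallback_titles = [item["title"] for item in search("keota vocals", library)]
```

Here "keota snare" matches only the first entry, while "keota vocals" has no full match and falls back to everything by Keota.
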

### Browse Topics

- 7 top-level categories → sub-topics → technique pages grouped by creator
- Categories expand/collapse with CSS grid animation
- Each technique page card shows creator, topic tags, and key moment count

### Browse Creators

- Filterable list of all creators with genre tags
- Randomized default sort (no alphabetical bias)
- Click through to creator detail pages showing all their technique pages and key moments

### Technique Pages

- **Study guide prose**: LLM-synthesized content with structured sections (v2 body format)
- **Key moments index**: timestamped references back to source videos with citation markers
- **Related techniques**: cross-references to similar content from other creators
- **Table of contents**: auto-generated from section headings, displayed in sidebar
- **Reading header**: sticky section indicator bar that appears when scrolling past the page title
- **Inline player**: audio/video player with chapter markers and key moment timeline pins

### Chat

- **Creator-scoped AI chat**: ask questions about any creator's techniques
- **Citation support**: responses include numbered source references
- **Cascade retrieval**: queries search creator-specific context first, then domain (topic category), then global knowledge
- **Multi-turn memory**: conversation context persists within a session
- **Streaming responses**: SSE-based token streaming with source metadata sent first
- **Quality toolkit**: refined system prompt with baseline quality metrics
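
A possible shape for cascade retrieval is sketched below. The guide only states the tier order (creator, then domain, then global); that the cascade stops at the first tier with enough results, the `min_results` threshold, and the `cascade_retrieve` name are assumptions made for illustration.

```python
from typing import Callable

Retriever = Callable[[str], list[str]]


def cascade_retrieve(
    query: str, tiers: list[tuple[str, Retriever]], min_results: int = 1
) -> tuple[str, list[str]]:
    """Try each scope in order and return the first one with enough results."""
    for scope, retrieve in tiers:
        results = retrieve(query)
        if len(results) >= min_results:
            return scope, results
    return "none", []


# Stub retrievers standing in for creator, domain, and global lookups.
tiers = [
    ("creator", lambda q: []),
    ("domain", lambda q: [f"Mixing note about {q}"]),
    ("global", lambda q: [f"General note about {q}"]),
]
scope, passages = cascade_retrieve("parallel compression", tiers)
```

With these stubs the creator tier is empty, so the answer is drawn from the domain tier and the global tier is never queried.
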

## For Creators

### Registration & Onboarding

- Standard registration with email/password
- **3-step onboarding wizard** (shown once after first login):
  1. Welcome message explaining the platform
  2. Content consent selection (choose which content types to publish)
  3. Quick tour of available features
- Onboarding completion tracked via `onboarding_completed` flag on user profile

### Consent Dashboard

- Control which content types are published (technique pages, key moments, chat availability)
- Granular per-content-type toggles
- Changes take effect on next pipeline run
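
One way to picture the per-content-type toggles is a small consent record that a publishing step consults. The `Consent` dataclass, its field names, the opt-in defaults, and the `publishable` helper are hypothetical; the actual schema is described in [[Data-Model]].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consent:
    # Opt-in defaults are an assumption, not documented behaviour.
    technique_pages: bool = False
    key_moments: bool = False
    chat: bool = False


def publishable(items: list[dict], consent: Consent) -> list[dict]:
    """Keep only the items whose content type the creator has enabled."""
    allowed = {
        "technique_page": consent.technique_pages,
        "key_moment": consent.key_moments,
    }
    return [item for item in items if allowed.get(item["type"], False)]


derived = [
    {"type": "technique_page", "title": "Layered snare design"},
    {"type": "key_moment", "title": "Transient shaping tip"},
]
visible = publishable(derived, Consent(technique_pages=True))
```
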

### Creator Dashboard

- Overview stats: total technique pages, key moments, video count
- Recent posts and activity feed
- Quick links to technique pages derived from your content

### Transparency Page

- View all entities, relationships, and technique pages derived from your content
- Expandable/collapsible category sections (CSS grid animation)
- Full audit trail of what the system extracted from your videos

### Data Export

- **GDPR-style ZIP download** of all derived content via `GET /creator/export`
- Includes technique pages, key moments, classifications, and metadata
- One-click download from creator dashboard
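
As a rough sketch of how such an archive could be assembled, the standard-library `zipfile` module can build the ZIP in memory. The `build_export_zip` function and the one-JSON-file-per-section layout are assumptions; the real layout behind `GET /creator/export` is not specified here.

```python
import io
import json
import zipfile


def build_export_zip(sections: dict[str, list[dict]]) -> bytes:
    """Write each section as '<name>.json' into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, records in sections.items():
            archive.writestr(f"{name}.json", json.dumps(records, indent=2))
    return buffer.getvalue()


data = build_export_zip(
    {
        "technique_pages": [{"id": 10, "title": "Layered snare design"}],
        "key_moments": [{"id": 1}],
    }
)
```

The returned bytes could then be sent as the body of a download response.
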

### Notifications

- **Email digest** of platform activity (new technique pages, key moments from your content)
- Configurable frequency: daily, weekly, or disabled
- **Signed unsubscribe links** (PyJWT tokens) for one-click unsubscribe without login
- Managed via notification preferences in creator settings
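
The idea behind a signed unsubscribe link is that the token itself proves who may unsubscribe, so no login is needed. The platform uses PyJWT for this; the sketch below deliberately swaps in a simpler standard-library HMAC scheme to show the principle, so the token format, `sign_token`, and `verify_token` are illustrative assumptions rather than the actual implementation.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def sign_token(payload: dict, secret: bytes) -> str:
    """Encode the payload and append an HMAC-SHA256 signature."""
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode())
    signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{signature}"


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> Optional[dict]:
    """Return the payload if the signature is valid and unexpired, else None."""
    try:
        body, signature = token.rsplit(".", 1)
    except ValueError:
        return None
    expected = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    payload = json.loads(base64.urlsafe_b64decode(body))
    if payload.get("exp", 0) < (time.time() if now is None else now):
        return None
    return payload


claims = {"user_id": 7, "action": "unsubscribe", "exp": 2000}
token = sign_token(claims, b"demo-secret")
```

A JWT carries the same three ingredients (claims, expiry, signature) in a standardized format, which is why a library such as PyJWT is the safer choice in production.
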

### Personality Profiles

- 5-tier system for chat persona customization based on creator teaching style
- LLM-extracted from creator's content patterns
- Influences how the chat engine responds to questions about that creator's techniques
- See [[Personality-Profiles]] for tier definitions

## For Admins

### Review Queue

- Approve, reject, or edit key moments organized by source video
- Bulk actions for efficient moderation
- Filter by creator, video, or processing status

### Pipeline Admin

- Monitor processing stages for all videos in the system
- Filter by creator, status (pending, processing, complete, failed)
- View per-stage timing and error details

### Usage Dashboard

- **Token consumption tracking**: LLM API usage over time
- **Top creators and users**: ranked by content volume and platform usage
- **Daily statistics**: requests, chat sessions, search queries
- **Rate limiting visibility**: sliding-window rate limiter status (Redis-backed, per-user)
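
For readers unfamiliar with sliding-window rate limiting, the following in-memory sketch shows the core logic. The production limiter is Redis-backed; the `SlidingWindowLimiter` class, its parameters, and the explicit `now` argument are assumptions chosen to keep the example self-contained and deterministic.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per user within the last `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, user_id: str, now: float) -> bool:
        hits = self._hits[user_id]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(limit=2, window_seconds=60)
decisions = [
    limiter.allow("alice", 0),
    limiter.allow("alice", 10),
    limiter.allow("alice", 20),  # third request inside the window is rejected
    limiter.allow("bob", 20),    # limits are tracked per user
    limiter.allow("alice", 61),  # the request at t=0 has expired by now
]
```
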

### Audit Log

- All administrative actions tracked with timestamp, actor, and details
- Searchable and filterable

### User Management

- Role assignment (admin, creator, user)
- Account status management (active, suspended)
- Impersonation support for debugging user-specific issues (see [[Impersonation]])

## Adding New Content

1. **Prepare transcripts**: Run Whisper large-v3 on video/audio files (RTX 4090 recommended for speed). Output format: JSON with timestamps.
2. **Place in watched folder**: Drop transcript JSON files into the configured watch directory. The folder watcher (PollingObserver, works on ZFS/NFS) detects new files automatically.
3. **File stability check**: Before processing starts, the watcher waits for the file size to stabilize (2-second check), so partial SCP/rsync writes are handled safely.
4. **Pipeline processes automatically**: All 6 stages run in sequence via a Celery task chain. Monitor progress in the Pipeline Admin panel.
5. **Quality control**: New key moments appear in the Review Queue for admin approval before publishing.
6. **Re-processing**: To update existing content, re-drop the transcript. The pipeline handles upserts (though Qdrant point deduplication is a known improvement area).
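
The file stability check mentioned in the steps above can be approximated by polling the file size until two consecutive readings agree. This is a sketch under assumptions: `wait_until_stable`, its `timeout`, and the injectable `sleep` are hypothetical, and only the 2-second interval comes from the guide.

```python
import os
import time
from typing import Callable


def wait_until_stable(
    path: str,
    interval: float = 2.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the file size is unchanged across two consecutive checks."""
    deadline = time.monotonic() + timeout
    last_size = -1
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1  # file missing or unreadable; keep waiting
        if size >= 0 and size == last_size:
            return True
        last_size = size
        sleep(interval)
    return False
```

A watcher callback could call `wait_until_stable(path)` before queuing the transcript, and skip or retry the file if it returns `False`.
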

## Infrastructure & Deployment

Chrysopedia runs as a **Docker Compose stack on ub01**:

| Service | Purpose |
|---------|---------|
| `chrysopedia-api` | FastAPI application server |
| `chrysopedia-worker` | Celery worker + Beat scheduler (email digests, periodic tasks) |
| `chrysopedia-web-8096` | nginx reverse proxy serving frontend + API routing |
| `chrysopedia-db` | PostgreSQL 16 (port 5433 externally) |
| `chrysopedia-redis` | Redis: caching, rate limiting, Celery broker, classification data |
| `chrysopedia-qdrant` | Qdrant vector database for semantic search |
| `chrysopedia-ollama` | Ollama: local LLM fallback when the primary DGX endpoint is unavailable |
| `chrysopedia-lightrag` | LightRAG knowledge graph for entity-aware retrieval |

- **Primary LLM:** DGX endpoint with **automatic Ollama fallback** (fail-open, configurable via `LLM_FALLBACK_URL` / `LLM_FALLBACK_MODEL`)
- **Web UI:** `http://ub01:8096`
- **External:** `https://chrysopedia.com` via nuc01 nginx reverse proxy
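
The fallback behaviour can be pictured as a thin wrapper that tries the primary endpoint and switches to the fallback on a connection problem. Only the two environment variable names come from this guide; `complete_with_fallback`, `fallback_settings`, and the choice of which exceptions trigger the fallback are assumptions.

```python
import os
from typing import Callable, Mapping, Optional

Completer = Callable[[str], str]


def fallback_settings(env: Mapping[str, str] = os.environ) -> dict[str, Optional[str]]:
    """Read the fallback configuration; values are None when unset."""
    return {
        "url": env.get("LLM_FALLBACK_URL"),
        "model": env.get("LLM_FALLBACK_MODEL"),
    }


def complete_with_fallback(prompt: str, primary: Completer, fallback: Completer) -> str:
    """Use the primary LLM, switching to the fallback if it is unreachable."""
    try:
        return primary(prompt)
    except (ConnectionError, TimeoutError):
        # Assumed trigger: only network-level failures cause a fallback.
        return fallback(prompt)


def unreachable_primary(prompt: str) -> str:
    raise ConnectionError("primary endpoint is down")


answer = complete_with_fallback(
    "Summarize this segment", unreachable_primary, lambda p: f"[fallback] {p}"
)
```
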

For full deployment instructions and rebuild commands, see [[Deployment]].
For local development setup and common gotchas, see [[Development-Guide]].

## Where to Learn More

Every aspect of Chrysopedia is documented in the wiki:

| Topic | Wiki Page |
|-------|-----------|
| System architecture, Docker services, network topology | [[Architecture]] |
| All 80+ API endpoints grouped by domain | [[API-Surface]] |
| SQLAlchemy models, relationships, enums | [[Data-Model]] |
| Semantic + keyword search, LightRAG cascade | [[Search-Retrieval]] |
| Streaming Q&A, multi-turn memory, fallback | [[Chat-Engine]] |
| 6-stage LLM extraction pipeline, prompt system | [[Pipeline]] |
| Audio/video player, chapter markers, timeline pins | [[Player]] |
| 10-dimension highlight scoring and review | [[Highlights]] |
| LLM-extracted creator teaching personality | [[Personality-Profiles]] |
| JWT authentication, roles, permissions | [[Authentication]] |
| Admin impersonation for debugging | [[Impersonation]] |
| Environment variables and feature flags | [[Configuration]] |
| Docker Compose setup, rebuild commands | [[Deployment]] |
| Prometheus metrics, health checks, logging | [[Monitoring]] |
| Local dev setup, common gotchas | [[Development-Guide]] |
| Architectural decisions register (D001–D048) | [[Decisions]] |
| LLM agent context injection system | [[Agent-Context]] |