docs: add Newcomer Guide — full platform walkthrough with 8 sections
parent
ba72230478
commit
0cde553f75
1 changed files with 198 additions and 0 deletions
198
Newcomer-Guide.md
Normal file
# Newcomer Guide

Welcome to Chrysopedia. This guide takes you from zero to productive across the entire platform.

## What is Chrysopedia

Chrysopedia is an AI-powered knowledge base for music production techniques, built from creators' video content. It extracts the knowledge embedded in production tutorials and livestreams, structures it, and makes it searchable. The result is a browsable, searchable library of technique pages, key moments, and creator-attributed study guides, all derived automatically from video transcripts.

## How Content Flows

Content enters Chrysopedia as video transcripts and exits as structured, searchable knowledge:

1. **Video transcripts** (produced with Whisper large-v3 on an RTX 4090) land as JSON files in a watched folder
2. The **6-stage pipeline** processes them automatically:
   - **Stage 1 (Transcript Segmentation):** Splits raw transcripts into coherent segments
   - **Stage 2 (Key Moment Extraction):** Identifies technique demonstrations, tips, and notable passages
   - **Stage 3 (Classification & Tagging):** Assigns topic categories and tags to each key moment, using 7 top-level categories: Sound Design, Mixing, Arrangement, Sampling, Music Theory, Workflow, and Sound Selection
   - **Stage 4 (Technique Page Synthesis):** Generates study-guide prose with v2 structured body sections, signal chains, and citations referencing the source key moments
   - **Stage 5 (Embedding & Indexing):** Embeds technique pages and key moments into the Qdrant vector store and the LightRAG knowledge graph for semantic search
   - **Stage 6 (Highlight Detection):** Optionally scores key moments across 10 dimensions for editorial curation
3. **Results** appear as structured technique pages, searchable key moments, and cross-references between creators

Each stage is a Celery task. The pipeline orchestrator chains them and tracks status per video. See [[Pipeline]] for stage-level detail and prompt templates.

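The flow above can be pictured with a framework-free sketch. It is a hypothetical illustration, not the project's orchestrator: `run_pipeline`, the toy `segment` and `extract_moments` stages, and the status dictionary stand in for the Celery task chain and its per-video status tracking. It shows the two properties the paragraph describes: each stage's output feeds the next, and a failure stops the chain while leaving a per-stage record.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Payload = Dict[str, Any]
Stage = Callable[[Payload], Payload]


def run_pipeline(video_id: str, payload: Payload,
                 stages: List[Tuple[str, Stage]],
                 status: Dict[str, Dict[str, str]]) -> Optional[Payload]:
    """Run stages in order, feeding each stage's output into the next.

    status[video_id] records "pending", "complete", or "failed" per stage.
    Returns the final payload, or None if any stage raises.
    """
    status[video_id] = {name: "pending" for name, _ in stages}
    for name, stage in stages:
        try:
            payload = stage(payload)
        except Exception:
            status[video_id][name] = "failed"
            return None  # like a broken chain: later stages never run
        status[video_id][name] = "complete"
    return payload


# Toy stand-ins for the first two stages (hypothetical; the real ones call an LLM).
def segment(p: Payload) -> Payload:
    parts = [s.strip() for s in p["transcript"].split(".")]
    return {**p, "segments": [s for s in parts if s]}


def extract_moments(p: Payload) -> Payload:
    return {**p, "moments": [s for s in p["segments"] if "snare" in s.lower()]}


if __name__ == "__main__":
    demo_status: Dict[str, Dict[str, str]] = {}
    result = run_pipeline("vid-1", {"transcript": "Layer the snare. Add reverb."},
                          [("segment", segment), ("extract", extract_moments)],
                          demo_status)
    print(result["moments"], demo_status["vid-1"])
```

In Celery itself, the equivalent composition is a `chain` of task signatures, where each task receives the previous task's return value.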
## Using the Web UI

### Search

- The **homepage search bar**, or **Cmd+K** from any page, opens the search interface
- Search combines **semantic search** (LightRAG knowledge graph + Qdrant vectors) with **keyword search** (PostgreSQL full-text)
- Results show technique pages and key moments with creator attribution and relevance scores
- Multi-token queries use AND logic with a partial-match fallback, and each token may match a different field: searching "keota snare" finds content where "keota" matches the creator and "snare" matches technique content

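The AND-with-fallback behavior described above can be sketched in plain Python. This is an illustration only, not the PostgreSQL full-text query the platform runs: the document shape (a dict of field names to text), case-insensitive substring matching, and the fallback rule (return documents matching at least one token, most matches first) are all assumptions made for the example.

```python
from typing import Dict, List

Doc = Dict[str, str]  # hypothetical shape: field name -> text, e.g. {"creator": ..., "title": ...}


def token_hits(doc: Doc, tokens: List[str]) -> int:
    """Count how many query tokens appear (as substrings) in any field of doc."""
    text = " ".join(doc.values()).lower()
    return sum(1 for t in tokens if t in text)


def keyword_search(docs: List[Doc], query: str) -> List[Doc]:
    tokens = query.lower().split()
    if not tokens:
        return []
    # Strict pass: every token must match somewhere in the doc (AND logic).
    strict = [d for d in docs if token_hits(d, tokens) == len(tokens)]
    if strict:
        return strict
    # Fallback: any doc matching at least one token, best matches first.
    partial = [d for d in docs if token_hits(d, tokens) > 0]
    return sorted(partial, key=lambda d: token_hits(d, tokens), reverse=True)
```

Because tokens are checked against all fields together, "keota" can match the creator field while "snare" matches the title, which is the cross-field behavior the "keota snare" example describes.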
### Browse Topics

- Drill down from the 7 top-level categories to sub-topics, then to technique pages grouped by creator
- Categories expand and collapse with a CSS grid animation
- Each technique page card shows the creator, topic tags, and key moment count

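The Browse Topics hierarchy is essentially a three-level grouping. The sketch below uses hypothetical field names (`category`, `sub_topic`, `creator`, `title`); the real schema is covered in [[Data-Model]] and may be shaped differently.

```python
from collections import defaultdict
from typing import Dict, List

Tree = Dict[str, Dict[str, Dict[str, List[str]]]]


def group_pages(pages: List[Dict[str, str]]) -> Tree:
    """Nest page titles as category -> sub-topic -> creator -> [titles]."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for page in pages:
        tree[page["category"]][page["sub_topic"]][page["creator"]].append(page["title"])
    # Convert the nested defaultdicts to plain dicts for rendering or JSON.
    return {cat: {sub: dict(creators) for sub, creators in subs.items()}
            for cat, subs in tree.items()}
```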
### Browse Creators

- Filterable list of all creators with genre tags
- Randomized default sort, so no creator is favored by alphabetical order
- Click through to a creator's detail page to see all of their technique pages and key moments

### Technique Pages

- **Study guide prose:** LLM-synthesized content with structured sections (v2 body format)
- **Key moments index:** timestamped references back to source videos, with citation markers
- **Related techniques:** cross-references to similar content from other creators
- **Table of contents:** auto-generated from section headings and displayed in the sidebar
- **Reading header:** sticky section-indicator bar that appears once you scroll past the page title
- **Inline player:** audio/video player with chapter markers and key moment timeline pins

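Generating a sidebar table of contents mostly means deriving a stable anchor for each heading. The sketch below assumes the v2 body is available as a list of sections with a `heading` field (a hypothetical shape) and uses a simple slug rule (lowercase, runs of non-alphanumeric characters replaced by one hyphen) with a numeric suffix for repeated headings; the platform's real slug rules may differ.

```python
import re
from typing import Dict, List


def slugify(text: str) -> str:
    """Lowercase and collapse each run of non-alphanumerics into a single hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_toc(sections: List[Dict[str, str]]) -> List[Dict[str, str]]:
    seen: Dict[str, int] = {}
    toc: List[Dict[str, str]] = []
    for section in sections:
        base = slugify(section["heading"])
        count = seen.get(base, 0)
        seen[base] = count + 1
        # Repeated headings get -2, -3, ... so every anchor stays unique.
        anchor = base if count == 0 else f"{base}-{count + 1}"
        toc.append({"title": section["heading"], "anchor": anchor})
    return toc
```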
### Chat

- **Creator-scoped AI chat:** ask questions about any creator's techniques
- **Citation support:** responses include numbered source references
- **Cascade retrieval:** queries search creator-specific context first, then the domain (topic category), then global knowledge
- **Multi-turn memory:** conversation context persists within a session
- **Streaming responses:** SSE-based token streaming, with source metadata sent before the tokens
- **Quality toolkit:** refined system prompt with baseline quality metrics

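Cascade retrieval can be read as a priority-ordered fallback across scopes. The sketch below is one interpretation, with hypothetical retriever callables: it queries the creator tier first and only moves on to the domain and then the global tier while fewer than `k` distinct chunks have been collected. The real engine's thresholds and merging rules are documented in [[Chat-Engine]] and [[Search-Retrieval]], not here.

```python
from typing import Callable, List

Retriever = Callable[[str], List[str]]  # hypothetical: query -> ranked chunk ids


def cascade_retrieve(query: str, tiers: List[Retriever], k: int = 5) -> List[str]:
    """Query tiers in priority order (creator, domain, global) until k unique chunks."""
    if k <= 0:
        return []
    results: List[str] = []
    for retrieve in tiers:
        for chunk in retrieve(query):
            if chunk not in results:  # a chunk found by a narrower tier wins
                results.append(chunk)
            if len(results) >= k:
                return results  # broader tiers are never queried
    return results
```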
## For Creators

### Registration & Onboarding

- Standard email/password registration
- **3-step onboarding wizard**, shown once after first login:
  1. Welcome message explaining the platform
  2. Content consent selection (choose which content types to publish)
  3. Quick tour of available features
- Onboarding completion is tracked via the `onboarding_completed` flag on the user profile

### Consent Dashboard

- Control which content types are published (technique pages, key moments, chat availability)
- Granular per-content-type toggles
- Changes take effect on the next pipeline run

### Creator Dashboard

- Overview stats: total technique pages, key moments, video count
- Recent posts and activity feed
- Quick links to technique pages derived from your content

### Transparency Page

- View all entities, relationships, and technique pages derived from your content
- Expandable/collapsible category sections (CSS grid animation)
- Full audit trail of what the system extracted from your videos

### Data Export

- **GDPR-style ZIP download** of all derived content via `GET /creator/export`
- Includes technique pages, key moments, classifications, and metadata
- One-click download from the creator dashboard

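A GDPR-style export boils down to serializing each content type and zipping the results. The sketch below builds the archive in memory with the standard library; the one-JSON-file-per-content-type layout and the `build_export_zip` name are hypothetical, since the guide does not describe the archive's internal structure.

```python
import io
import json
import zipfile
from typing import Any, Dict, List


def build_export_zip(export: Dict[str, List[Dict[str, Any]]]) -> bytes:
    """Write one JSON file per content type (e.g. "technique_pages") into a ZIP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, records in export.items():
            zf.writestr(f"{name}.json", json.dumps(records, indent=2, ensure_ascii=False))
    return buffer.getvalue()
```

A FastAPI handler could then return these bytes as `Response(content=data, media_type="application/zip")` with a `Content-Disposition: attachment` header so the browser downloads the file.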
### Notifications

- **Email digest** of platform activity (new technique pages and key moments from your content)
- Configurable frequency: daily, weekly, or disabled
- **Signed unsubscribe links** (PyJWT tokens) allow one-click unsubscribe without logging in
- Managed via notification preferences in creator settings

### Personality Profiles

- 5-tier system for customizing the chat persona to match a creator's teaching style
- Profiles are LLM-extracted from the creator's content patterns
- The profile influences how the chat engine answers questions about that creator's techniques
- See [[Personality-Profiles]] for tier definitions

## For Admins

### Review Queue

- Approve, reject, or edit key moments organized by source video
- Bulk actions for efficient moderation
- Filter by creator, video, or processing status

### Pipeline Admin

- Monitor processing stages for all videos in the system
- Filter by creator or status (pending, processing, complete, failed)
- View per-stage timing and error details

### Usage Dashboard

- **Token consumption tracking:** LLM API usage over time
- **Top creators and users:** ranked by content volume and platform usage
- **Daily statistics:** requests, chat sessions, search queries
- **Rate limiting visibility:** status of the sliding-window rate limiter (Redis-backed, per-user)

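A sliding-window limiter keeps, per user, the timestamps of recent requests and rejects a request when the window is full. The in-memory sketch below illustrates the technique; `SlidingWindowLimiter`, its parameters, and the injectable clock are hypothetical. In a Redis-backed setup, a common way to hold the same per-user log is a sorted set scored by timestamp, trimmed with `ZREMRANGEBYSCORE` and counted with `ZCARD`; the guide does not say whether Chrysopedia does exactly this.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per user in any `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def _evict(self, user_id: str, now: float) -> Deque[float]:
        q = self.hits[user_id]
        while q and q[0] <= now - self.window:
            q.popleft()  # drop timestamps that slid out of the window
        return q

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        q = self._evict(user_id, now)
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True

    def remaining(self, user_id: str) -> int:
        """Requests left in the current window (what a dashboard could display)."""
        return max(0, self.limit - len(self._evict(user_id, self.clock())))
```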
### Audit Log

- All administrative actions tracked with timestamp, actor, and details
- Searchable and filterable

### User Management

- Role assignment (admin, creator, user)
- Account status management (active, suspended)
- Impersonation support for debugging user-specific issues (see [[Impersonation]])

## Adding New Content

1. **Prepare transcripts:** Run Whisper large-v3 on the video/audio files (an RTX 4090 is recommended for speed). Output format: JSON with timestamps.
2. **Place in the watched folder:** Drop transcript JSON files into the configured watch directory. The folder watcher uses a `PollingObserver`, so it works on ZFS/NFS mounts, and detects new files automatically.
3. **File stability check:** Before processing, the watcher waits for the file size to stabilize (2-second check), so partially written SCP/rsync transfers are handled safely.
4. **Automatic processing:** All 6 stages run in sequence via a Celery task chain. Monitor progress in the Pipeline Admin panel.
5. **Quality control:** New key moments appear in the Review Queue for admin approval before publishing.
6. **Re-processing:** To update existing content, drop the transcript in again. The pipeline handles upserts, though Qdrant point deduplication is a known improvement area.

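The file stability check can be approximated with a polling loop like the one below, a minimal sketch rather than the watcher's actual code. `wait_until_stable` and its `interval`/`timeout` parameters are hypothetical; the 2-second default mirrors the check described above, and treating an empty or missing file as unfinished is an added assumption.

```python
import os
import tempfile
import time


def wait_until_stable(path: str, interval: float = 2.0, timeout: float = 300.0) -> bool:
    """Return True once the file's size is unchanged across one `interval`.

    Returns False if `timeout` seconds pass first. An empty or missing file
    is treated as not yet written (an assumption of this sketch).
    """
    deadline = time.monotonic() + timeout
    last_size = -1
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except FileNotFoundError:
            size = -1  # not there yet, or renamed mid-transfer
        if size > 0 and size == last_size:
            return True
        last_size = size
        time.sleep(interval)
    return False


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        f.write('{"segments": []}')
    print(wait_until_stable(f.name, interval=0.1, timeout=5))
    os.remove(f.name)
```

Comparing two consecutive size readings is what makes a transfer still in progress wait: while SCP or rsync is appending, the size keeps changing between polls.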
## Infrastructure & Deployment

Chrysopedia runs as a **Docker Compose stack on ub01**:

| Service | Purpose |
|---------|---------|
| `chrysopedia-api` | FastAPI application server |
| `chrysopedia-worker` | Celery worker + Beat scheduler (email digests, periodic tasks) |
| `chrysopedia-web-8096` | nginx reverse proxy serving the frontend and routing API requests |
| `chrysopedia-db` | PostgreSQL 16 (exposed externally on port 5433) |
| `chrysopedia-redis` | Redis: caching, rate limiting, Celery broker, classification data |
| `chrysopedia-qdrant` | Qdrant vector database for semantic search |
| `chrysopedia-ollama` | Ollama: local LLM fallback used when the primary DGX endpoint is unavailable |
| `chrysopedia-lightrag` | LightRAG knowledge graph for entity-aware retrieval |

- **Primary LLM:** DGX endpoint with **automatic Ollama fallback** (fail-open, configurable via `LLM_FALLBACK_URL` / `LLM_FALLBACK_MODEL`)
- **Web UI:** `http://ub01:8096`
- **External:** `https://chrysopedia.com` via the nginx reverse proxy on nuc01

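"Fail-open" here means a request is served by the local model instead of erroring when the primary endpoint is unreachable. The sketch below shows that control flow with hypothetical `primary`/`fallback` callables; only the environment variable names come from this guide, and the URL and model values in any example are placeholders. A real client would catch whichever transport exceptions its HTTP library raises rather than the built-in `ConnectionError`/`TimeoutError` used here.

```python
import os
from typing import Callable, Optional, Tuple

Completion = Callable[[str], str]  # hypothetical: prompt -> generated text


def complete_with_fallback(prompt: str, primary: Completion,
                           fallback: Optional[Completion]) -> str:
    """Try the primary endpoint; on a transport failure, fail open to the fallback."""
    try:
        return primary(prompt)
    except (ConnectionError, TimeoutError):
        if fallback is None:
            raise  # no fallback configured: surface the original error
        return fallback(prompt)


def fallback_settings() -> Optional[Tuple[str, str]]:
    """Read the fallback endpoint and model named in this guide, if both are set."""
    url = os.environ.get("LLM_FALLBACK_URL")
    model = os.environ.get("LLM_FALLBACK_MODEL")
    return (url, model) if url and model else None
```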
For full deployment instructions and rebuild commands, see [[Deployment]].

For local development setup and common gotchas, see [[Development-Guide]].

## Where to Learn More

Each area of Chrysopedia is documented on its own wiki page:

| Topic | Wiki Page |
|-------|-----------|
| System architecture, Docker services, network topology | [[Architecture]] |
| All 80+ API endpoints grouped by domain | [[API-Surface]] |
| SQLAlchemy models, relationships, enums | [[Data-Model]] |
| Semantic + keyword search, LightRAG cascade | [[Search-Retrieval]] |
| Streaming Q&A, multi-turn memory, fallback | [[Chat-Engine]] |
| 6-stage LLM extraction pipeline, prompt system | [[Pipeline]] |
| Audio/video player, chapter markers, timeline pins | [[Player]] |
| 10-dimension highlight scoring and review | [[Highlights]] |
| LLM-extracted creator teaching personality | [[Personality-Profiles]] |
| JWT authentication, roles, permissions | [[Authentication]] |
| Admin impersonation for debugging | [[Impersonation]] |
| Environment variables and feature flags | [[Configuration]] |
| Docker Compose setup, rebuild commands | [[Deployment]] |
| Prometheus metrics, health checks, logging | [[Monitoring]] |
| Local dev setup, common gotchas | [[Development-Guide]] |
| Architectural decisions register (D001–D048) | [[Decisions]] |
| LLM agent context injection system | [[Agent-Context]] |