Chrysopedia
From chrysopoeia (alchemical transmutation of base material into gold) + encyclopedia.
Chrysopedia transmutes raw video content into refined, searchable production knowledge.
A self-hosted knowledge extraction and retrieval system for electronic music production content. It transcribes video libraries with Whisper, extracts key moments and techniques with LLM analysis, and serves a search-first web UI for mid-session retrieval.
Architecture
┌────────────────────────────────────────────────────────────────────┐
│  Desktop (GPU workstation)                                         │
│  ┌──────────────┐                                                  │
│  │ whisper/     │  Transcribes video → JSON (Whisper large-v3)     │
│  │ transcribe.py│  Runs locally with CUDA, outputs to /data        │
│  └──────┬───────┘                                                  │
│         │ JSON transcripts                                         │
└─────────┼──────────────────────────────────────────────────────────┘
          │
          ▼
┌────────────────────────────────────────────────────────────────────┐
│  Docker Compose (xpltd_chrysopedia) - Server (e.g. ub01)           │
│                                                                    │
│  ┌────────────────┐  ┌───────────────────┐  ┌───────────────────┐  │
│  │ chrysopedia-db │  │ chrysopedia-redis │  │ chrysopedia-api   │  │
│  │ PostgreSQL 16  │  │ Redis 7           │  │ FastAPI + Uvicorn │  │
│  │ :5433→5432     │  │                   │  │ :8000             │  │
│  └────────────────┘  └───────────────────┘  └─────────┬─────────┘  │
│                                                       │            │
│  ┌──────────────────┐  ┌──────────────────────┐       │            │
│  │ chrysopedia-web  │  │ chrysopedia-worker   │       │            │
│  │ React + nginx    │  │ Celery (LLM pipeline)│       │            │
│  │ :3000→80         │  │                      │       │            │
│  └──────────────────┘  └──────────────────────┘       │            │
│                                                       │            │
│  Network: chrysopedia (172.24.0.0/24)                 │            │
└────────────────────────────────────────────────────────────────────┘
Services
| Service | Image / Build | Port | Purpose |
|---|---|---|---|
| chrysopedia-db | postgres:16-alpine | 5433 → 5432 | Primary data store (7-entity schema) |
| chrysopedia-redis | redis:7-alpine | - | Celery broker / cache |
| chrysopedia-api | docker/Dockerfile.api | 8000 | FastAPI REST API |
| chrysopedia-worker | docker/Dockerfile.api | - | Celery worker for LLM pipeline stages 2-5 |
| chrysopedia-web | docker/Dockerfile.web | 3000 → 80 | React frontend (nginx) |
Data Model (7 entities)
- Creator — artists/producers whose content is indexed
- SourceVideo — original video files processed by the pipeline
- TranscriptSegment — timestamped text segments from Whisper
- KeyMoment — discrete insights extracted by LLM analysis
- TechniquePage — synthesized knowledge pages (primary output)
- RelatedTechniqueLink — cross-references between technique pages
- Tag — hierarchical topic/genre taxonomy
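The relationships implied by this list can be sketched with plain dataclasses. This is an illustration only: every field name here (slug, start_s, parent, and so on) is an assumption and is not taken from backend/models.py, which uses SQLAlchemy ORM models rather than dataclasses.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tag:
    name: str
    parent: Optional["Tag"] = None  # hypothetical: a parent link models the hierarchy

    def path(self) -> str:
        """Return the taxonomy path from the root tag down to this tag."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


@dataclass
class Creator:
    name: str
    slug: str  # the API looks creators up by slug


@dataclass
class SourceVideo:
    creator: Creator
    filename: str


@dataclass
class TranscriptSegment:
    video: SourceVideo
    start_s: float  # hypothetical: timestamps in seconds
    end_s: float
    text: str


@dataclass
class KeyMoment:
    video: SourceVideo
    start_s: float
    end_s: float
    summary: str
    tags: list[Tag] = field(default_factory=list)


@dataclass
class TechniquePage:
    title: str
    creator: Creator
    moments: list[KeyMoment] = field(default_factory=list)


@dataclass
class RelatedTechniqueLink:
    source: TechniquePage
    target: TechniquePage
```

In this sketch a tag's position in the taxonomy is recovered by walking parent links, so Tag("Bass", parent=Tag("Sound Design")).path() yields "Sound Design/Bass".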
Prerequisites
- Docker ≥ 24.0 and Docker Compose ≥ 2.20
- Python 3.10+ (for the Whisper transcription script)
- ffmpeg (for audio extraction)
- NVIDIA GPU + CUDA (recommended for Whisper; CPU fallback available)
Quick Start
1. Clone and configure
git clone <repository-url>
cd content-to-kb-automator
# Create environment file from template
cp .env.example .env
# Edit .env with your actual values (see Environment Variables below)
2. Start the Docker Compose stack
docker compose up -d
This starts PostgreSQL, Redis, the API server, the Celery worker, and the web UI.
3. Run database migrations
# From inside the API container:
docker compose exec chrysopedia-api alembic upgrade head
# Or locally (requires Python venv with backend deps):
alembic upgrade head
4. Verify the stack
# Health check (with DB connectivity)
curl http://localhost:8000/health
# API health (lightweight, no DB)
curl http://localhost:8000/api/v1/health
# Docker Compose status
docker compose ps
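The two curl checks can also be scripted. The following is a minimal sketch using only the standard library; the function names are made up for illustration, and it assumes both endpoints answer HTTP 200 when healthy, which this README does not state explicitly.

```python
import urllib.error
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8000"  # API port from the Services table
HEALTH_PATHS = ("/health", "/api/v1/health")


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # server answered, but with an error status
    except OSError:
        return 0  # connection refused, DNS failure, timeout, ...


def check_stack(fetch: Callable[[str], int] = http_status) -> dict[str, bool]:
    """Map each health path to True when it answers with HTTP 200 (assumed)."""
    return {path: fetch(BASE_URL + path) == 200 for path in HEALTH_PATHS}
```

Calling check_stack() with no arguments queries the running stack. Passing a different fetch callable makes the check testable without a live server.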
5. Transcribe videos (desktop)
cd whisper
pip install -r requirements.txt
# Single file
python transcribe.py --input "path/to/video.mp4" --output-dir ./transcripts
# Batch (all videos in a directory)
python transcribe.py --input ./videos/ --output-dir ./transcripts
See whisper/README.md for full transcription documentation.
Environment Variables
Create .env from .env.example. All variables have sensible defaults for local development.
Database
| Variable | Default | Description |
|---|---|---|
| POSTGRES_USER | chrysopedia | PostgreSQL username |
| POSTGRES_PASSWORD | changeme | PostgreSQL password |
| POSTGRES_DB | chrysopedia | Database name |
| DATABASE_URL | (composed) | Full async connection string |
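As an illustration of how a composed DATABASE_URL could be derived from the other three variables, here is a hedged sketch. The postgresql+asyncpg:// scheme is an assumption (the README only says the string is async), the host and port defaults are the in-network values from the Services table, and compose_database_url is a hypothetical helper, not a function from backend/config.py.

```python
import os
from typing import Mapping
from urllib.parse import quote


def compose_database_url(
    env: Mapping[str, str] = os.environ,
    host: str = "chrysopedia-db",  # service name inside the Compose network
    port: int = 5432,              # container port; the host maps it to 5433
) -> str:
    """Build an async SQLAlchemy URL from the POSTGRES_* variables."""
    # An explicit DATABASE_URL wins over the composed value.
    if env.get("DATABASE_URL"):
        return env["DATABASE_URL"]
    user = quote(env.get("POSTGRES_USER", "chrysopedia"), safe="")
    # Percent-encode the password so characters like '@' or '/' do not break the URL.
    password = quote(env.get("POSTGRES_PASSWORD", "changeme"), safe="")
    db = env.get("POSTGRES_DB", "chrysopedia")
    # Assumption: the asyncpg driver is used for the async engine.
    return f"postgresql+asyncpg://{user}:{password}@{host}:{port}/{db}"
```

When connecting from the host rather than from another container, the same helper would be called with host="localhost" and port=5433.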
Services
| Variable | Default | Description |
|---|---|---|
| REDIS_URL | redis://chrysopedia-redis:6379/0 | Redis connection string |
LLM Configuration
| Variable | Default | Description |
|---|---|---|
| LLM_API_URL | https://friend-openwebui.example.com/api | Primary LLM endpoint (OpenAI-compatible) |
| LLM_API_KEY | sk-changeme | API key for primary LLM |
| LLM_MODEL | qwen2.5-72b | Primary model name |
| LLM_FALLBACK_URL | http://localhost:11434/v1 | Fallback LLM endpoint (Ollama) |
| LLM_FALLBACK_MODEL | qwen2.5:14b-q8_0 | Fallback model name |
Embedding / Vector
| Variable | Default | Description |
|---|---|---|
| EMBEDDING_API_URL | http://localhost:11434/v1 | Embedding endpoint |
| EMBEDDING_MODEL | nomic-embed-text | Embedding model name |
| QDRANT_URL | http://qdrant:6333 | Qdrant vector DB URL |
| QDRANT_COLLECTION | chrysopedia | Qdrant collection name |
Application
| Variable | Default | Description |
|---|---|---|
| APP_ENV | production | Environment (development / production) |
| APP_LOG_LEVEL | info | Log level |
| APP_SECRET_KEY | changeme-generate-a-real-secret | Application secret key |
| TRANSCRIPT_STORAGE_PATH | /data/transcripts | Transcript JSON storage path |
| VIDEO_METADATA_PATH | /data/video_meta | Video metadata storage path |
| REVIEW_MODE | true | Enable human review workflow |
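To show how these variables and their defaults might be read at startup, here is a small standard-library sketch. The project itself uses pydantic-settings (see backend/config.py in the project structure), so this is a stand-in for illustration rather than the real loader; AppSettings, load_app_settings, and the set of accepted truthy strings are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Assumption: these spellings count as "true" for REVIEW_MODE.
_TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class AppSettings:
    env: str
    log_level: str
    transcript_storage_path: str
    video_metadata_path: str
    review_mode: bool


def load_app_settings(environ: Mapping[str, str] = os.environ) -> AppSettings:
    """Read the Application variables, falling back to the documented defaults."""
    return AppSettings(
        env=environ.get("APP_ENV", "production"),
        log_level=environ.get("APP_LOG_LEVEL", "info"),
        transcript_storage_path=environ.get("TRANSCRIPT_STORAGE_PATH", "/data/transcripts"),
        video_metadata_path=environ.get("VIDEO_METADATA_PATH", "/data/video_meta"),
        review_mode=environ.get("REVIEW_MODE", "true").strip().lower() in _TRUTHY,
    )
```

APP_SECRET_KEY is deliberately left out of the sketch, since a secret should not fall back to a built-in default.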
Development Workflow
Local development (without Docker)
# Create virtual environment
python -m venv .venv
source .venv/bin/activate
# Install backend dependencies
pip install -r backend/requirements.txt
# Start PostgreSQL and Redis (via Docker)
docker compose up -d chrysopedia-db chrysopedia-redis
# Run migrations
alembic upgrade head
# Start the API server with hot-reload
cd backend && uvicorn main:app --reload --host 0.0.0.0 --port 8000
Database migrations
# Create a new migration after model changes
alembic revision --autogenerate -m "describe_change"
# Apply all pending migrations
alembic upgrade head
# Rollback one migration
alembic downgrade -1
Project structure
content-to-kb-automator/
├── backend/ # FastAPI application
│ ├── main.py # App entry point, middleware, routers
│ ├── config.py # pydantic-settings configuration
│ ├── database.py # SQLAlchemy async engine + session
│ ├── models.py # 7-entity ORM models
│ ├── schemas.py # Pydantic request/response schemas
│ ├── routers/ # API route handlers
│ │ ├── health.py # /health (DB check)
│ │ ├── creators.py # /api/v1/creators
│ │ └── videos.py # /api/v1/videos
│ └── requirements.txt # Python dependencies
├── whisper/ # Desktop transcription script
│ ├── transcribe.py # Whisper CLI tool
│ ├── requirements.txt # Whisper + ffmpeg deps
│ └── README.md # Transcription documentation
├── docker/ # Dockerfiles
│ ├── Dockerfile.api # FastAPI + Celery image
│ ├── Dockerfile.web # React + nginx image
│ └── nginx.conf # nginx reverse proxy config
├── alembic/ # Database migrations
│ ├── env.py # Migration environment
│ └── versions/ # Migration scripts
├── config/ # Configuration files
│ └── canonical_tags.yaml # 6 topic categories + genre taxonomy
├── prompts/ # LLM prompt templates (editable)
├── frontend/ # React web UI (placeholder)
├── tests/ # Test fixtures and test suites
│ └── fixtures/ # Sample data for testing
├── docker-compose.yml # Full stack definition
├── alembic.ini # Alembic configuration
├── .env.example # Environment variable template
└── chrysopedia-spec.md # Full project specification
API Endpoints
| Method | Path | Description |
|---|---|---|
| GET | /health | Health check with DB connectivity |
| GET | /api/v1/health | Lightweight health (no DB) |
| GET | /api/v1/creators | List all creators |
| GET | /api/v1/creators/{slug} | Get creator by slug |
| GET | /api/v1/videos | List all source videos |
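A minimal client sketch for the creator endpoints follows, using only the standard library. The helper names are invented for illustration. The response bodies are assumed to be JSON, but their exact shape is defined in backend/schemas.py and is not shown here, so the sketch returns whatever the server sends without interpreting it.

```python
import json
import urllib.request
from urllib.parse import quote

DEFAULT_BASE_URL = "http://localhost:8000"  # API port from the Services table


def creator_url(slug: str, base_url: str = DEFAULT_BASE_URL) -> str:
    """Build the URL for GET /api/v1/creators/{slug}, percent-encoding the slug."""
    return f"{base_url.rstrip('/')}/api/v1/creators/{quote(slug, safe='')}"


def get_json(url: str, timeout: float = 5.0):
    """Fetch a URL and decode the body as JSON (assumed response format)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

With the stack running, get_json(creator_url("some-slug")) would fetch a single creator, where "some-slug" is a placeholder for a slug that exists in the database.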
XPLTD Conventions
This project follows XPLTD infrastructure conventions:
- Docker project name: xpltd_chrysopedia
- Bind mounts: persistent data stored under /vmPool/r/services/
- Network: dedicated bridge chrysopedia (172.24.0.0/24)
- PostgreSQL host port: 5433 (avoids conflict with system PostgreSQL on 5432)
Deployment (ub01)
The production stack runs on ub01.a.xpltd.co:
# Clone (first time only — requires SSH agent forwarding)
ssh -A ub01
cd /vmPool/r/repos/xpltdco/chrysopedia
git clone git@github.com:xpltdco/chrysopedia.git .
# Create .env from template
cp .env.example .env
# Edit .env with production secrets
# Build and start
docker compose build
docker compose up -d
# Run migrations
docker exec chrysopedia-api alembic upgrade head
# Pull embedding model (first time only)
docker exec chrysopedia-ollama ollama pull nomic-embed-text
Service URLs
Update Workflow
ssh -A ub01
cd /vmPool/r/repos/xpltdco/chrysopedia
git pull
docker compose build && docker compose up -d