# Chrysopedia

> From *chrysopoeia* (alchemical transmutation of base material into gold) + *encyclopedia*.
> Chrysopedia transmutes raw video content into refined, searchable production knowledge.

A self-hosted knowledge extraction and retrieval system for electronic music production content. Transcribes video libraries with Whisper, extracts key moments and techniques with LLM analysis, and serves a search-first web UI for mid-session retrieval.

---

## Architecture

```
┌──────────────────────────────────────────────────────────────────┐
│  Desktop (GPU workstation)                                       │
│  ┌───────────────┐                                               │
│  │ whisper/      │  Transcribes video → JSON (Whisper large-v3)  │
│  │ transcribe.py │  Runs locally with CUDA, outputs to /data     │
│  └───────┬───────┘                                               │
│          │ JSON transcripts                                      │
└──────────┼───────────────────────────────────────────────────────┘
           │
           ▼
┌──────────────────────────────────────────────────────────────────┐
│  Docker Compose (xpltd_chrysopedia) — Server (e.g. ub01)         │
│                                                                  │
│  ┌────────────────┐  ┌───────────────────┐  ┌──────────────────┐ │
│  │ chrysopedia-db │  │ chrysopedia-redis │  │ chrysopedia-api  │ │
│  │ PostgreSQL 16  │  │ Redis 7           │  │ FastAPI + Uvicorn│ │
│  │ :5433→5432     │  │                   │  │ :8000            │ │
│  └────────────────┘  └───────────────────┘  └────────┬─────────┘ │
│                                                      │           │
│  ┌──────────────────┐  ┌──────────────────────┐      │           │
│  │ chrysopedia-web  │  │ chrysopedia-worker   │      │           │
│  │ React + nginx    │  │ Celery (LLM pipeline)│      │           │
│  │ :3000→80         │  │                      │      │           │
│  └──────────────────┘  └──────────────────────┘      │           │
│                                                                  │
│  Network: chrysopedia (172.24.0.0/24)                            │
└──────────────────────────────────────────────────────────────────┘
```

### Services

| Service              | Image / Build           | Port          | Purpose                                   |
|----------------------|-------------------------|---------------|-------------------------------------------|
| `chrysopedia-db`     | `postgres:16-alpine`    | `5433 → 5432` | Primary data store (7-entity schema)      |
| `chrysopedia-redis`  | `redis:7-alpine`        | —             | Celery broker / cache                     |
| `chrysopedia-api`    | `docker/Dockerfile.api` | `8000`        | FastAPI REST API                          |
| `chrysopedia-worker` | `docker/Dockerfile.api` | —             | Celery worker for LLM pipeline stages 2-5 |
| `chrysopedia-web`    | `docker/Dockerfile.web` | `3000 → 80`   | React frontend (nginx)                    |

### Data Model (7 entities)

- **Creator** — artists/producers whose content is indexed
- **SourceVideo** — original video files processed by the pipeline
- **TranscriptSegment** — timestamped text segments from Whisper
- **KeyMoment** — discrete insights extracted by LLM analysis
- **TechniquePage** — synthesized knowledge pages (primary output)
- **RelatedTechniqueLink** — cross-references between technique pages
- **Tag** — hierarchical topic/genre taxonomy

---

## Prerequisites

- **Docker** ≥ 24.0 and **Docker Compose** ≥ 2.20
- **Python 3.10+** (for the Whisper transcription script)
- **ffmpeg** (for audio extraction)
- **NVIDIA GPU + CUDA** (recommended for Whisper; CPU fallback available)

---

## Quick Start

### 1. Clone and configure

```bash
git clone <repo-url>
cd content-to-kb-automator

# Create environment file from template
cp .env.example .env
# Edit .env with your actual values (see Environment Variables below)
```

### 2. Start the Docker Compose stack

```bash
docker compose up -d
```

This starts PostgreSQL, Redis, the API server, the Celery worker, and the web UI.

### 3. Run database migrations

```bash
# From inside the API container:
docker compose exec chrysopedia-api alembic upgrade head

# Or locally (requires Python venv with backend deps):
alembic upgrade head
```

### 4. Verify the stack

```bash
# Health check (with DB connectivity)
curl http://localhost:8000/health

# API health (lightweight, no DB)
curl http://localhost:8000/api/v1/health

# Docker Compose status
docker compose ps
```

### 5. Transcribe videos (desktop)

```bash
cd whisper
pip install -r requirements.txt

# Single file
python transcribe.py --input "path/to/video.mp4" --output-dir ./transcripts

# Batch (all videos in a directory)
python transcribe.py --input ./videos/ --output-dir ./transcripts
```

See [`whisper/README.md`](whisper/README.md) for full transcription documentation.

---

## Environment Variables

Create `.env` from `.env.example`. All variables have sensible defaults for local development.

### Database

| Variable            | Default       | Description                  |
|---------------------|---------------|------------------------------|
| `POSTGRES_USER`     | `chrysopedia` | PostgreSQL username          |
| `POSTGRES_PASSWORD` | `changeme`    | PostgreSQL password          |
| `POSTGRES_DB`       | `chrysopedia` | Database name                |
| `DATABASE_URL`      | *(composed)*  | Full async connection string |

### Services

| Variable    | Default                            | Description             |
|-------------|------------------------------------|-------------------------|
| `REDIS_URL` | `redis://chrysopedia-redis:6379/0` | Redis connection string |

### LLM Configuration

| Variable             | Default                                    | Description                              |
|----------------------|--------------------------------------------|------------------------------------------|
| `LLM_API_URL`        | `https://friend-openwebui.example.com/api` | Primary LLM endpoint (OpenAI-compatible) |
| `LLM_API_KEY`        | `sk-changeme`                              | API key for primary LLM                  |
| `LLM_MODEL`          | `qwen2.5-72b`                              | Primary model name                       |
| `LLM_FALLBACK_URL`   | `http://localhost:11434/v1`                | Fallback LLM endpoint (Ollama)           |
| `LLM_FALLBACK_MODEL` | `qwen2.5:14b-q8_0`                         | Fallback model name                      |

### Embedding / Vector

| Variable            | Default                     | Description            |
|---------------------|-----------------------------|------------------------|
| `EMBEDDING_API_URL` | `http://localhost:11434/v1` | Embedding endpoint     |
| `EMBEDDING_MODEL`   | `nomic-embed-text`          | Embedding model name   |
| `QDRANT_URL`        | `http://qdrant:6333`        | Qdrant vector DB URL   |
| `QDRANT_COLLECTION` | `chrysopedia`               | Qdrant collection name |
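
The `DATABASE_URL` default is listed only as *(composed)*, and the LLM table names a primary and a fallback endpoint. The sketch below shows one way those values could be derived from the environment. The `postgresql+asyncpg` scheme, the in-network host `chrysopedia-db` with container port `5432`, and both function names are assumptions for illustration; they are not taken from `backend/config.py`.

```python
import os
from urllib.parse import quote


def compose_database_url(env=None, host="chrysopedia-db", port=5432):
    """Build an async SQLAlchemy-style URL from the POSTGRES_* variables.

    Assumptions (not confirmed by the project): the asyncpg driver, and the
    Compose service name plus container port as host and port.
    """
    env = os.environ if env is None else env
    # An explicitly set DATABASE_URL takes precedence over the composed value.
    if env.get("DATABASE_URL"):
        return env["DATABASE_URL"]
    user = env.get("POSTGRES_USER", "chrysopedia")
    password = env.get("POSTGRES_PASSWORD", "changeme")
    database = env.get("POSTGRES_DB", "chrysopedia")
    # Percent-encode credentials so characters such as "@" or "/" stay valid.
    return (
        f"postgresql+asyncpg://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


def llm_endpoints(env=None):
    """Return (url, model) pairs in the order they would be tried."""
    env = os.environ if env is None else env
    primary = (
        env.get("LLM_API_URL", "https://friend-openwebui.example.com/api"),
        env.get("LLM_MODEL", "qwen2.5-72b"),
    )
    fallback = (
        env.get("LLM_FALLBACK_URL", "http://localhost:11434/v1"),
        env.get("LLM_FALLBACK_MODEL", "qwen2.5:14b-q8_0"),
    )
    return [primary, fallback]
```

With an empty mapping, `compose_database_url({})` yields `postgresql+asyncpg://chrysopedia:changeme@chrysopedia-db:5432/chrysopedia`, and `llm_endpoints({})` lists the primary endpoint before the Ollama fallback.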

### Application

| Variable                  | Default                           | Description                                |
|---------------------------|-----------------------------------|--------------------------------------------|
| `APP_ENV`                 | `production`                      | Environment (`development` / `production`) |
| `APP_LOG_LEVEL`           | `info`                            | Log level                                  |
| `APP_SECRET_KEY`          | `changeme-generate-a-real-secret` | Application secret key                     |
| `TRANSCRIPT_STORAGE_PATH` | `/data/transcripts`               | Transcript JSON storage path               |
| `VIDEO_METADATA_PATH`     | `/data/video_meta`                | Video metadata storage path                |
| `REVIEW_MODE`             | `true`                            | Enable human review workflow               |

---

## Development Workflow

### Local development (without Docker)

```bash
# Create virtual environment
python -m venv .venv
source .venv/bin/activate

# Install backend dependencies
pip install -r backend/requirements.txt

# Start PostgreSQL and Redis (via Docker)
docker compose up -d chrysopedia-db chrysopedia-redis

# Run migrations
alembic upgrade head

# Start the API server with hot-reload
cd backend && uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

### Database migrations

```bash
# Create a new migration after model changes
alembic revision --autogenerate -m "describe_change"

# Apply all pending migrations
alembic upgrade head

# Rollback one migration
alembic downgrade -1
```

### Project structure

```
content-to-kb-automator/
├── backend/                # FastAPI application
│   ├── main.py             # App entry point, middleware, routers
│   ├── config.py           # pydantic-settings configuration
│   ├── database.py         # SQLAlchemy async engine + session
│   ├── models.py           # 7-entity ORM models
│   ├── schemas.py          # Pydantic request/response schemas
│   ├── routers/            # API route handlers
│   │   ├── health.py       # /health (DB check)
│   │   ├── creators.py     # /api/v1/creators
│   │   └── videos.py       # /api/v1/videos
│   └── requirements.txt    # Python dependencies
├── whisper/                # Desktop transcription script
│   ├── transcribe.py       # Whisper CLI tool
│   ├── requirements.txt    # Whisper + ffmpeg deps
│   └── README.md           # Transcription documentation
├── docker/                 # Dockerfiles
│   ├── Dockerfile.api      # FastAPI + Celery image
│   ├── Dockerfile.web      # React + nginx image
│   └── nginx.conf          # nginx reverse proxy config
├── alembic/                # Database migrations
│   ├── env.py              # Migration environment
│   └── versions/           # Migration scripts
├── config/                 # Configuration files
│   └── canonical_tags.yaml # 6 topic categories + genre taxonomy
├── prompts/                # LLM prompt templates (editable)
├── frontend/               # React web UI (placeholder)
├── tests/                  # Test fixtures and test suites
│   └── fixtures/           # Sample data for testing
├── docker-compose.yml      # Full stack definition
├── alembic.ini             # Alembic configuration
├── .env.example            # Environment variable template
└── chrysopedia-spec.md     # Full project specification
```

---

## API Endpoints

| Method | Path                      | Description                       |
|--------|---------------------------|-----------------------------------|
| GET    | `/health`                 | Health check with DB connectivity |
| GET    | `/api/v1/health`          | Lightweight health (no DB)        |
| GET    | `/api/v1/creators`        | List all creators                 |
| GET    | `/api/v1/creators/{slug}` | Get creator by slug               |
| GET    | `/api/v1/videos`          | List all source videos            |

---

## XPLTD Conventions

This project follows XPLTD infrastructure conventions:

- **Docker project name:** `xpltd_chrysopedia`
- **Bind mounts:** persistent data stored under `/vmPool/r/services/`
- **Network:** dedicated bridge `chrysopedia` (`172.24.0.0/24`)
- **PostgreSQL host port:** `5433` (avoids conflict with system PostgreSQL on `5432`)

---

## Deployment (ub01)

The production stack runs on **ub01.a.xpltd.co**:

```bash
# Clone (first time only — requires SSH agent forwarding)
ssh -A ub01
cd /vmPool/r/repos/xpltdco/chrysopedia
git clone git@github.com:xpltdco/chrysopedia.git .

# Create .env from template
cp .env.example .env
# Edit .env with production secrets

# Build and start
docker compose build
docker compose up -d

# Run migrations
docker exec chrysopedia-api alembic upgrade head

# Pull embedding model (first time only)
docker exec chrysopedia-ollama ollama pull nomic-embed-text
```

### Service URLs

| Service        | URL                                                      |
|----------------|----------------------------------------------------------|
| Web UI         | http://ub01:8096                                         |
| API Health     | http://ub01:8096/health                                  |
| PostgreSQL     | ub01:5433                                                |
| Compose config | `/vmPool/r/compose/xpltd_chrysopedia/docker-compose.yml` |

### Update Workflow

```bash
ssh -A ub01
cd /vmPool/r/repos/xpltdco/chrysopedia
git pull
docker compose build && docker compose up -d
```
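
After an update, the API may need a few seconds before it answers. The sketch below polls the health URL listed under Service URLs until it returns HTTP 200, using only the standard library. The function names, retry count, and delay are assumptions chosen for illustration, and the `fetch` and `sleep` parameters exist only so the loop can be exercised without a network.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code of a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, timeout, ...


def wait_for_health(url, attempts=10, delay=3.0, fetch=http_status, sleep=time.sleep):
    """Poll `url` until it returns 200; give up after `attempts` tries."""
    for attempt in range(1, attempts + 1):
        if fetch(url) == 200:
            return True
        # Do not sleep after the final failed attempt.
        if attempt < attempts:
            sleep(delay)
    return False
```

A call such as `wait_for_health("http://ub01:8096/health")` returns `True` once the endpoint responds with 200 and `False` if it never does within the configured attempts, so it can gate a follow-up step in a deploy script.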