Knowledge base for music production techniques — extracted from video content via LLM pipeline

Chrysopedia

From chrysopoeia (alchemical transmutation of base material into gold) + encyclopedia. Chrysopedia transmutes raw video content into refined, searchable production knowledge.

Chrysopedia is a self-hosted knowledge extraction and retrieval system for electronic music production content. It transcribes video libraries with Whisper, extracts key moments and techniques through LLM analysis, and serves a search-first web UI meant for quick lookups in the middle of a production session.


Architecture

┌──────────────────────────────────────────────────────────────────┐
│  Desktop (GPU workstation)                                       │
│  ┌──────────────┐                                                │
│  │ whisper/      │  Transcribes video → JSON (Whisper large-v3)  │
│  │ transcribe.py │  Runs locally with CUDA, outputs to /data     │
│  └──────┬───────┘                                                │
│         │ JSON transcripts                                       │
└─────────┼────────────────────────────────────────────────────────┘
          │
          ▼
┌──────────────────────────────────────────────────────────────────┐
│  Docker Compose (xpltd_chrysopedia) — Server (e.g. ub01)         │
│                                                                  │
│  ┌────────────────┐  ┌────────────────┐  ┌──────────────────┐   │
│  │ chrysopedia-db │  │chrysopedia-redis│  │ chrysopedia-api  │   │
│  │ PostgreSQL 16  │  │  Redis 7       │  │ FastAPI + Uvicorn│   │
│  │ :5433→5432     │  │                │  │ :8000            │   │
│  └────────────────┘  └────────────────┘  └────────┬─────────┘   │
│                                                    │             │
│  ┌──────────────────┐  ┌──────────────────────┐    │             │
│  │ chrysopedia-web  │  │ chrysopedia-worker   │    │             │
│  │ React + nginx    │  │ Celery (LLM pipeline)│    │             │
│  │ :3000→80         │  │                      │    │             │
│  └──────────────────┘  └──────────────────────┘    │             │
│                                                    │             │
│  Network: chrysopedia (172.24.0.0/24)              │             │
└──────────────────────────────────────────────────────────────────┘

Services

| Service | Image / Build | Port | Purpose |
| --- | --- | --- | --- |
| chrysopedia-db | postgres:16-alpine | 5433 → 5432 | Primary data store (7 entity schema) |
| chrysopedia-redis | redis:7-alpine | | Celery broker / cache |
| chrysopedia-api | docker/Dockerfile.api | 8000 | FastAPI REST API |
| chrysopedia-worker | docker/Dockerfile.api | | Celery worker for LLM pipeline stages 2-5 |
| chrysopedia-web | docker/Dockerfile.web | 3000 → 80 | React frontend (nginx) |

Data Model (7 entities)

  • Creator — artists/producers whose content is indexed
  • SourceVideo — original video files processed by the pipeline
  • TranscriptSegment — timestamped text segments from Whisper
  • KeyMoment — discrete insights extracted by LLM analysis
  • TechniquePage — synthesized knowledge pages (primary output)
  • RelatedTechniqueLink — cross-references between technique pages
  • Tag — hierarchical topic/genre taxonomy
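
One way to picture how these entities relate is the dataclass sketch below. It is an illustration, not the schema in backend/models.py: every field name and relationship here (for example `parent` on `Tag`, or `moments` on `TechniquePage`) is an assumption chosen to mirror the descriptions above.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Illustrative only: field names are assumptions, not the real ORM columns.


@dataclass
class Creator:
    slug: str
    name: str


@dataclass
class SourceVideo:
    creator: Creator
    filename: str


@dataclass
class TranscriptSegment:
    video: SourceVideo
    start: float  # seconds from the start of the video
    end: float
    text: str


@dataclass
class KeyMoment:
    video: SourceVideo
    start: float
    end: float
    summary: str


@dataclass
class Tag:
    name: str
    parent: Tag | None = None  # a parent link makes the taxonomy hierarchical

    def path(self) -> str:
        """Return the tag's full path from the root of the taxonomy."""
        if self.parent is None:
            return self.name
        return f"{self.parent.path()}/{self.name}"


@dataclass
class TechniquePage:
    title: str
    moments: list[KeyMoment] = field(default_factory=list)
    tags: list[Tag] = field(default_factory=list)


@dataclass
class RelatedTechniqueLink:
    source: TechniquePage
    target: TechniquePage
```

In this sketch a technique page aggregates key moments, which in turn point back to their source video and creator; the real models may wire these relationships differently.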

Prerequisites

  • Docker ≥ 24.0 and Docker Compose ≥ 2.20
  • Python 3.10+ (for the Whisper transcription script)
  • ffmpeg (for audio extraction)
  • NVIDIA GPU + CUDA (recommended for Whisper; CPU fallback available)

Quick Start

1. Clone and configure

git clone <repository-url>
cd content-to-kb-automator

# Create environment file from template
cp .env.example .env
# Edit .env with your actual values (see Environment Variables below)

2. Start the Docker Compose stack

docker compose up -d

This starts PostgreSQL, Redis, the API server, the Celery worker, and the web UI.

3. Run database migrations

# Run the migrations inside the API container (command is issued from the host):
docker compose exec chrysopedia-api alembic upgrade head

# Or locally (requires Python venv with backend deps):
alembic upgrade head

4. Verify the stack

# Health check (with DB connectivity)
curl http://localhost:8000/health

# API health (lightweight, no DB)
curl http://localhost:8000/api/v1/health

# Docker Compose status
docker compose ps
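
The same checks can be scripted, for instance as a post-deploy smoke test. The sketch below assumes the default port mapping and that a healthy endpoint answers with HTTP 200; the helper names `http_status` and `check_stack` are hypothetical and not part of the project.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

# Paths taken from the curl commands above.
HEALTH_PATHS = ("/health", "/api/v1/health")


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 503).
        return exc.code
    except OSError:
        # Connection refused, DNS failure, timeout, and similar.
        return 0


def check_stack(
    base_url: str = "http://localhost:8000",
    fetch: Callable[[str], int] = http_status,
) -> Dict[str, bool]:
    """Probe each health path and report whether it returned 200 (assumed healthy)."""
    base = base_url.rstrip("/")
    return {path: fetch(base + path) == 200 for path in HEALTH_PATHS}
```

Calling `check_stack()` with no arguments probes http://localhost:8000. The `fetch` parameter exists only so the function can be exercised without a running stack.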

5. Transcribe videos (desktop)

cd whisper
pip install -r requirements.txt

# Single file
python transcribe.py --input "path/to/video.mp4" --output-dir ./transcripts

# Batch (all videos in a directory)
python transcribe.py --input ./videos/ --output-dir ./transcripts

See whisper/README.md for full transcription documentation.


Environment Variables

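Create .env from .env.example. Every variable has a default, but several are placeholders (changeme, sk-changeme, changeme-generate-a-real-secret, and the example.com LLM endpoint) that must be replaced with real values before the stack is useful.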

Database

| Variable | Default | Description |
| --- | --- | --- |
| POSTGRES_USER | chrysopedia | PostgreSQL username |
| POSTGRES_PASSWORD | changeme | PostgreSQL password |
| POSTGRES_DB | chrysopedia | Database name |
| DATABASE_URL | (composed) | Full async connection string |
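
As a rough illustration of what "composed" could mean, the sketch below builds an async SQLAlchemy URL from the three POSTGRES_* variables. The `asyncpg` driver, the `chrysopedia-db` host name, and the function name `compose_database_url` are assumptions; check .env.example and backend/config.py for the actual composition.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote


def compose_database_url(
    env: Optional[Mapping[str, str]] = None,
    host: str = "chrysopedia-db",  # assumed: service name on the Compose network
    port: int = 5432,              # container port; use 5433 from the host
) -> str:
    """Build an async connection string from POSTGRES_* variables."""
    env = os.environ if env is None else env
    user = env.get("POSTGRES_USER", "chrysopedia")
    password = env.get("POSTGRES_PASSWORD", "changeme")
    database = env.get("POSTGRES_DB", "chrysopedia")
    # Percent-encode credentials so characters like "@" or "/" cannot break the URL.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    # Assumed driver: asyncpg, a common choice for SQLAlchemy's async engine.
    return f"postgresql+asyncpg://{credentials}@{host}:{port}/{database}"
```

From the host machine, the same database would be reached with `host="localhost", port=5433`, matching the port mapping in the Services table.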

Services

| Variable | Default | Description |
| --- | --- | --- |
| REDIS_URL | redis://chrysopedia-redis:6379/0 | Redis connection string |

LLM Configuration

| Variable | Default | Description |
| --- | --- | --- |
| LLM_API_URL | https://friend-openwebui.example.com/api | Primary LLM endpoint (OpenAI-compatible) |
| LLM_API_KEY | sk-changeme | API key for primary LLM |
| LLM_MODEL | qwen2.5-72b | Primary model name |
| LLM_FALLBACK_URL | http://localhost:11434/v1 | Fallback LLM endpoint (Ollama) |
| LLM_FALLBACK_MODEL | qwen2.5:14b-q8_0 | Fallback model name |
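
The primary/fallback pair suggests a try-in-order strategy. The sketch below shows one way that could work; it is not the worker's actual logic. `LLMEndpoint`, `endpoints_from_env`, and `complete_with_fallback` are hypothetical names, and the `call` argument stands in for whatever HTTP client sends the request.

```python
from dataclasses import dataclass
from typing import Callable, List, Mapping, Optional


@dataclass(frozen=True)
class LLMEndpoint:
    url: str
    model: str
    api_key: Optional[str] = None


def endpoints_from_env(env: Mapping[str, str]) -> List[LLMEndpoint]:
    """Return the primary endpoint followed by the fallback endpoint."""
    return [
        LLMEndpoint(env["LLM_API_URL"], env["LLM_MODEL"], env.get("LLM_API_KEY")),
        LLMEndpoint(env["LLM_FALLBACK_URL"], env["LLM_FALLBACK_MODEL"]),
    ]


def complete_with_fallback(
    prompt: str,
    endpoints: List[LLMEndpoint],
    call: Callable[[LLMEndpoint, str], str],
) -> str:
    """Try each endpoint in order and return the first successful reply."""
    errors = []
    for endpoint in endpoints:
        try:
            return call(endpoint, prompt)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(f"{endpoint.url}: {exc!r}")
    raise RuntimeError("all LLM endpoints failed: " + "; ".join(errors))
```

Keeping the transport behind `call` means the ordering logic can be tested without network access, and the same function works for any OpenAI-compatible client.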

Embedding / Vector

| Variable | Default | Description |
| --- | --- | --- |
| EMBEDDING_API_URL | http://localhost:11434/v1 | Embedding endpoint |
| EMBEDDING_MODEL | nomic-embed-text | Embedding model name |
| QDRANT_URL | http://qdrant:6333 | Qdrant vector DB URL |
| QDRANT_COLLECTION | chrysopedia | Qdrant collection name |

Application

| Variable | Default | Description |
| --- | --- | --- |
| APP_ENV | production | Environment (development / production) |
| APP_LOG_LEVEL | info | Log level |
| APP_SECRET_KEY | changeme-generate-a-real-secret | Application secret key |
| TRANSCRIPT_STORAGE_PATH | /data/transcripts | Transcript JSON storage path |
| VIDEO_METADATA_PATH | /data/video_meta | Video metadata storage path |
| REVIEW_MODE | true | Enable human review workflow |
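
To show how these string values might be turned into typed settings, here is a standard-library sketch. The project itself uses pydantic-settings in backend/config.py, so this is a simplified stand-in; `AppSettings`, `load_app_settings`, and the set of accepted truthy strings are assumptions.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional

# Assumed truthy spellings for boolean flags such as REVIEW_MODE.
_TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class AppSettings:
    app_env: str
    log_level: str
    transcript_storage_path: Path
    video_metadata_path: Path
    review_mode: bool


def load_app_settings(env: Optional[Mapping[str, str]] = None) -> AppSettings:
    """Read application settings, falling back to the defaults in the table."""
    env = os.environ if env is None else env
    return AppSettings(
        app_env=env.get("APP_ENV", "production"),
        log_level=env.get("APP_LOG_LEVEL", "info"),
        transcript_storage_path=Path(env.get("TRANSCRIPT_STORAGE_PATH", "/data/transcripts")),
        video_metadata_path=Path(env.get("VIDEO_METADATA_PATH", "/data/video_meta")),
        review_mode=env.get("REVIEW_MODE", "true").strip().lower() in _TRUTHY,
    )
```

APP_SECRET_KEY is left out on purpose: a loader for it would more sensibly refuse to start on the placeholder value than fall back to a default.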

Development Workflow

Local development (without Docker)

# Create virtual environment
python -m venv .venv
source .venv/bin/activate

# Install backend dependencies
pip install -r backend/requirements.txt

# Start PostgreSQL and Redis (via Docker)
docker compose up -d chrysopedia-db chrysopedia-redis

# Run migrations
alembic upgrade head

# Start the API server with hot-reload
cd backend && uvicorn main:app --reload --host 0.0.0.0 --port 8000

Database migrations

# Create a new migration after model changes
alembic revision --autogenerate -m "describe_change"

# Apply all pending migrations
alembic upgrade head

# Roll back one migration
alembic downgrade -1

Project structure

content-to-kb-automator/
├── backend/               # FastAPI application
│   ├── main.py            # App entry point, middleware, routers
│   ├── config.py          # pydantic-settings configuration
│   ├── database.py        # SQLAlchemy async engine + session
│   ├── models.py          # 7-entity ORM models
│   ├── schemas.py         # Pydantic request/response schemas
│   ├── routers/           # API route handlers
│   │   ├── health.py      # /health (DB check)
│   │   ├── creators.py    # /api/v1/creators
│   │   └── videos.py      # /api/v1/videos
│   └── requirements.txt   # Python dependencies
├── whisper/               # Desktop transcription script
│   ├── transcribe.py      # Whisper CLI tool
│   ├── requirements.txt   # Whisper + ffmpeg deps
│   └── README.md          # Transcription documentation
├── docker/                # Dockerfiles
│   ├── Dockerfile.api     # FastAPI + Celery image
│   ├── Dockerfile.web     # React + nginx image
│   └── nginx.conf         # nginx reverse proxy config
├── alembic/               # Database migrations
│   ├── env.py             # Migration environment
│   └── versions/          # Migration scripts
├── config/                # Configuration files
│   └── canonical_tags.yaml # 6 topic categories + genre taxonomy
├── prompts/               # LLM prompt templates (editable)
├── frontend/              # React web UI (placeholder)
├── tests/                 # Test fixtures and test suites
│   └── fixtures/          # Sample data for testing
├── docker-compose.yml     # Full stack definition
├── alembic.ini            # Alembic configuration
├── .env.example           # Environment variable template
└── chrysopedia-spec.md    # Full project specification

API Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | /health | Health check with DB connectivity |
| GET | /api/v1/health | Lightweight health (no DB) |
| GET | /api/v1/creators | List all creators |
| GET | /api/v1/creators/{slug} | Get creator by slug |
| GET | /api/v1/videos | List all source videos |
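
A minimal client for these endpoints might look like the sketch below. It assumes the API is reachable at http://localhost:8000 and returns JSON; the response schema is not documented here, so the code returns the parsed body as-is. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any
from urllib.parse import quote

API_BASE = "http://localhost:8000"  # assumed: default port mapping


def endpoint_url(path: str, base: str = API_BASE) -> str:
    """Join the API base URL and an endpoint path."""
    return base.rstrip("/") + path


def creator_url(slug: str, base: str = API_BASE) -> str:
    """Build the URL for a single creator, percent-encoding the slug."""
    return endpoint_url("/api/v1/creators/" + quote(slug, safe=""), base)


def get_json(url: str, timeout: float = 10.0) -> Any:
    """Send a GET request and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

With the stack running, `get_json(endpoint_url("/api/v1/creators"))` would fetch the creator list and `get_json(creator_url("some-slug"))` a single creator, where `some-slug` is a placeholder.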

XPLTD Conventions

This project follows XPLTD infrastructure conventions:

  • Docker project name: xpltd_chrysopedia
  • Bind mounts: persistent data stored under /vmPool/r/services/
  • Network: dedicated bridge chrysopedia (172.24.0.0/24)
  • PostgreSQL host port: 5433 (avoids conflict with system PostgreSQL on 5432)