jlightner 9fdef3b720 docs: Added CLAUDE.md redirect to ub01 canonical path, updated README with deployment section

2026-03-30 01:28:26 +00:00

Chrysopedia

The name combines chrysopoeia (the alchemical transmutation of base material into gold) with encyclopedia: Chrysopedia transmutes raw video content into refined, searchable production knowledge.

A self-hosted knowledge extraction and retrieval system for electronic music production content. Transcribes video libraries with Whisper, extracts key moments and techniques with LLM analysis, and serves a search-first web UI for mid-session retrieval.

Architecture

┌──────────────────────────────────────────────────────────────────┐
│  Desktop (GPU workstation)                                       │
│  ┌──────────────┐                                                │
│  │ whisper/      │  Transcribes video → JSON (Whisper large-v3)  │
│  │ transcribe.py │  Runs locally with CUDA, outputs to /data     │
│  └──────┬───────┘                                                │
│         │ JSON transcripts                                       │
└─────────┼────────────────────────────────────────────────────────┘
          │
          ▼
┌──────────────────────────────────────────────────────────────────┐
│  Docker Compose (xpltd_chrysopedia) — Server (e.g. ub01)         │
│                                                                  │
│  ┌────────────────┐  ┌────────────────┐  ┌──────────────────┐   │
│  │ chrysopedia-db │  │chrysopedia-redis│  │ chrysopedia-api  │   │
│  │ PostgreSQL 16  │  │  Redis 7       │  │ FastAPI + Uvicorn│   │
│  │ :5433→5432     │  │                │  │ :8000            │   │
│  └────────────────┘  └────────────────┘  └────────┬─────────┘   │
│                                                    │             │
│  ┌──────────────────┐  ┌──────────────────────┐    │             │
│  │ chrysopedia-web  │  │ chrysopedia-worker   │    │             │
│  │ React + nginx    │  │ Celery (LLM pipeline)│    │             │
│  │ :3000→80         │  │                      │    │             │
│  └──────────────────┘  └──────────────────────┘    │             │
│                                                    │             │
│  Network: chrysopedia (172.24.0.0/24)              │             │
└──────────────────────────────────────────────────────────────────┘

Services

Service	Image / Build	Port	Purpose
`chrysopedia-db`	`postgres:16-alpine`	`5433 → 5432`	Primary data store (7-entity schema)
`chrysopedia-redis`	`redis:7-alpine`	—	Celery broker / cache
`chrysopedia-api`	`docker/Dockerfile.api`	`8000`	FastAPI REST API
`chrysopedia-worker`	`docker/Dockerfile.api`	—	Celery worker for LLM pipeline stages 2-5
`chrysopedia-web`	`docker/Dockerfile.web`	`3000 → 80`	React frontend (nginx)

Data Model (7 entities)

Creator — artists/producers whose content is indexed
SourceVideo — original video files processed by the pipeline
TranscriptSegment — timestamped text segments from Whisper
KeyMoment — discrete insights extracted by LLM analysis
TechniquePage — synthesized knowledge pages (primary output)
RelatedTechniqueLink — cross-references between technique pages
Tag — hierarchical topic/genre taxonomy
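
As a rough illustration of how these seven entities might relate, the sketch below models them as plain Python dataclasses. Every field name is an assumption made for illustration only; the real ORM models live in backend/models.py and may look quite different.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Creator:
    slug: str  # assumed field; the API does look creators up by slug
    name: str


@dataclass
class SourceVideo:
    creator_slug: str  # assumed link back to the Creator
    filename: str


@dataclass
class TranscriptSegment:
    video_filename: str
    start_seconds: float  # assumed timestamp representation
    end_seconds: float
    text: str


@dataclass
class KeyMoment:
    video_filename: str
    start_seconds: float
    end_seconds: float
    summary: str


@dataclass
class Tag:
    name: str
    parent: Optional["Tag"] = None  # a parent pointer gives the hierarchy

    def path(self) -> str:
        """Return the full path from the root tag, e.g. 'parent/child'."""
        if self.parent is None:
            return self.name
        return f"{self.parent.path()}/{self.name}"


@dataclass
class TechniquePage:
    slug: str
    title: str
    key_moments: List[KeyMoment] = field(default_factory=list)
    tags: List[Tag] = field(default_factory=list)


@dataclass
class RelatedTechniqueLink:
    source_slug: str  # technique page the link starts from
    target_slug: str  # technique page the link points to
```

A child tag created as `Tag("fm", parent=Tag("synthesis"))` reports its position in the taxonomy through `path()`.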

Prerequisites

Docker ≥ 24.0 and Docker Compose ≥ 2.20
Python 3.10+ (for the Whisper transcription script)
ffmpeg (for audio extraction)
NVIDIA GPU + CUDA (recommended for Whisper; CPU fallback available)

Quick Start

1. Clone and configure

git clone <repository-url>
cd content-to-kb-automator

# Create environment file from template
cp .env.example .env
# Edit .env with your actual values (see Environment Variables below)

2. Start the Docker Compose stack

docker compose up -d

This starts PostgreSQL, Redis, the API server, the Celery worker, and the web UI.

3. Run database migrations

# From inside the API container:
docker compose exec chrysopedia-api alembic upgrade head

# Or locally (requires Python venv with backend deps):
alembic upgrade head

4. Verify the stack

# Health check (with DB connectivity)
curl http://localhost:8000/health

# API health (lightweight, no DB)
curl http://localhost:8000/api/v1/health

# Docker Compose status
docker compose ps
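
The same two health checks can be scripted, for example in a deployment smoke test. This is a minimal sketch using only the standard library; the endpoint paths come from the curl commands above, while the helper names and the default base URL are assumptions.

```python
import urllib.error
import urllib.request
from typing import Dict, List

# Paths taken from the curl commands above.
HEALTH_PATHS = ("/health", "/api/v1/health")


def health_urls(base_url: str = "http://localhost:8000") -> List[str]:
    """Build the full URL for each health endpoint."""
    return [base_url.rstrip("/") + path for path in HEALTH_PATHS]


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers connection errors, timeouts, and non-2xx responses.
        return False


def check_stack(base_url: str = "http://localhost:8000") -> Dict[str, bool]:
    """Map each health URL to whether it responded successfully."""
    return {url: is_healthy(url) for url in health_urls(base_url)}
```

Calling `check_stack()` once the containers are up returns a dictionary keyed by URL, which makes it easy to see which of the two checks failed.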

5. Transcribe videos (desktop)

cd whisper
pip install -r requirements.txt

# Single file
python transcribe.py --input "path/to/video.mp4" --output-dir ./transcripts

# Batch (all videos in a directory)
python transcribe.py --input ./videos/ --output-dir ./transcripts

See whisper/README.md for full transcription documentation.

Environment Variables

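Create .env from .env.example. The defaults below are enough to bring the stack up locally; replace the placeholder secrets (`POSTGRES_PASSWORD`, `LLM_API_KEY`, `APP_SECRET_KEY`) before deploying.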

Database

Variable	Default	Description
`POSTGRES_USER`	`chrysopedia`	PostgreSQL username
`POSTGRES_PASSWORD`	`changeme`	PostgreSQL password
`POSTGRES_DB`	`chrysopedia`	Database name
`DATABASE_URL`	(composed)	Full async connection string
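
One way `DATABASE_URL` could be composed from the other three variables is sketched below. The `postgresql+asyncpg` driver prefix, the host, and the port are assumptions for illustration; check backend/config.py for the scheme the project actually uses.

```python
import os
from urllib.parse import quote


def compose_database_url(env=None, host="localhost", port=5433):
    """Build an async SQLAlchemy-style URL from the POSTGRES_* variables.

    An explicit DATABASE_URL wins; otherwise the URL is assembled from
    the defaults in the table above. Driver, host and port are assumed.
    """
    env = os.environ if env is None else env
    explicit = env.get("DATABASE_URL")
    if explicit:
        return explicit

    user = env.get("POSTGRES_USER", "chrysopedia")
    password = env.get("POSTGRES_PASSWORD", "changeme")
    database = env.get("POSTGRES_DB", "chrysopedia")
    # Percent-encode credentials so characters such as '@' or '/' stay valid.
    return (
        f"postgresql+asyncpg://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )
```

Inside the Compose network the host would be the service name and the port the container port, for example `compose_database_url(host="chrysopedia-db", port=5432)`.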

Services

Variable	Default	Description
`REDIS_URL`	`redis://chrysopedia-redis:6379/0`	Redis connection string

LLM Configuration

Variable	Default	Description
`LLM_API_URL`	`https://friend-openwebui.example.com/api`	Primary LLM endpoint (OpenAI-compatible)
`LLM_API_KEY`	`sk-changeme`	API key for primary LLM
`LLM_MODEL`	`qwen2.5-72b`	Primary model name
`LLM_FALLBACK_URL`	`http://localhost:11434/v1`	Fallback LLM endpoint (Ollama)
`LLM_FALLBACK_MODEL`	`qwen2.5:14b-q8_0`	Fallback model name
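
The primary/fallback pairing suggests a try-in-order strategy. The sketch below shows one way that could work; the `LLMEndpoint` type, the `send` callback, and the rule that any exception triggers the fallback are all assumptions, not a description of the worker's actual behavior.

```python
import os
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class LLMEndpoint:
    url: str
    model: str
    api_key: str = ""


def load_endpoints(env=None) -> List[LLMEndpoint]:
    """Return [primary, fallback] built from the variables in the table above."""
    env = os.environ if env is None else env
    primary = LLMEndpoint(
        url=env.get("LLM_API_URL", "https://friend-openwebui.example.com/api"),
        model=env.get("LLM_MODEL", "qwen2.5-72b"),
        api_key=env.get("LLM_API_KEY", "sk-changeme"),
    )
    fallback = LLMEndpoint(
        url=env.get("LLM_FALLBACK_URL", "http://localhost:11434/v1"),
        model=env.get("LLM_FALLBACK_MODEL", "qwen2.5:14b-q8_0"),
    )
    return [primary, fallback]


def complete_with_fallback(
    prompt: str,
    endpoints: List[LLMEndpoint],
    send: Callable[[LLMEndpoint, str], str],
) -> str:
    """Try each endpoint in order and return the first successful reply.

    `send(endpoint, prompt)` is a caller-supplied (hypothetical) function
    that performs the HTTP request and raises an exception on failure.
    """
    errors = []
    for endpoint in endpoints:
        try:
            return send(endpoint, prompt)
        except Exception as exc:  # broad on purpose: any failure moves on
            errors.append(f"{endpoint.url}: {exc}")
    raise RuntimeError("all LLM endpoints failed: " + "; ".join(errors))
```

Passing the transport in as `send` keeps the ordering logic testable without a running LLM server.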

Embedding / Vector

Variable	Default	Description
`EMBEDDING_API_URL`	`http://localhost:11434/v1`	Embedding endpoint
`EMBEDDING_MODEL`	`nomic-embed-text`	Embedding model name
`QDRANT_URL`	`http://qdrant:6333`	Qdrant vector DB URL
`QDRANT_COLLECTION`	`chrysopedia`	Qdrant collection name
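
If the embedding endpoint follows the OpenAI-compatible `/embeddings` convention (an assumption based on the `/v1` default URL, not something stated here), a request could be built as in the sketch below. The function names are hypothetical.

```python
import json
import os
import urllib.request
from typing import Iterable, List


def build_embedding_request(texts: Iterable[str], env=None) -> urllib.request.Request:
    """Build a POST request for an OpenAI-style embeddings endpoint."""
    env = os.environ if env is None else env
    base_url = env.get("EMBEDDING_API_URL", "http://localhost:11434/v1")
    model = env.get("EMBEDDING_MODEL", "nomic-embed-text")
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embedding_response(payload: dict) -> List[List[float]]:
    """Extract vectors from an OpenAI-style {'data': [{'embedding': [...]}]} payload."""
    return [item["embedding"] for item in payload["data"]]
```

The request object can be passed to `urllib.request.urlopen`, and the decoded JSON body to `parse_embedding_response`, before the vectors are written to the Qdrant collection.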

Application

Variable	Default	Description
`APP_ENV`	`production`	Environment (`development` / `production`)
`APP_LOG_LEVEL`	`info`	Log level
`APP_SECRET_KEY`	`changeme-generate-a-real-secret`	Application secret key
`TRANSCRIPT_STORAGE_PATH`	`/data/transcripts`	Transcript JSON storage path
`VIDEO_METADATA_PATH`	`/data/video_meta`	Video metadata storage path
`REVIEW_MODE`	`true`	Enable human review workflow
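
The project reads these values through pydantic-settings (backend/config.py). Purely to illustrate the parsing involved, the sketch below does the same with the standard library instead; the accepted boolean spellings and the dictionary keys are assumptions.

```python
import os

TRUE_VALUES = {"1", "true", "yes", "on"}
FALSE_VALUES = {"0", "false", "no", "off"}


def env_bool(name, default, env=None):
    """Parse a boolean-like environment variable, falling back to `default`."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"{name} must be boolean-like, got {raw!r}")


def load_app_settings(env=None):
    """Collect the application variables, using the defaults from the table above."""
    env = os.environ if env is None else env
    app_env = env.get("APP_ENV", "production")
    if app_env not in ("development", "production"):
        raise ValueError(f"APP_ENV must be 'development' or 'production', got {app_env!r}")
    return {
        "app_env": app_env,
        "log_level": env.get("APP_LOG_LEVEL", "info"),
        "transcript_storage_path": env.get("TRANSCRIPT_STORAGE_PATH", "/data/transcripts"),
        "video_metadata_path": env.get("VIDEO_METADATA_PATH", "/data/video_meta"),
        "review_mode": env_bool("REVIEW_MODE", True, env),
    }
```

Rejecting unknown `APP_ENV` values early surfaces a typo in .env at startup rather than later in the pipeline.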

Development Workflow

Local development (without Docker)

# Create virtual environment
python -m venv .venv
source .venv/bin/activate

# Install backend dependencies
pip install -r backend/requirements.txt

# Start PostgreSQL and Redis (via Docker)
docker compose up -d chrysopedia-db chrysopedia-redis

# Run migrations
alembic upgrade head

# Start the API server with hot-reload
cd backend && uvicorn main:app --reload --host 0.0.0.0 --port 8000

Database migrations

# Create a new migration after model changes
alembic revision --autogenerate -m "describe_change"

# Apply all pending migrations
alembic upgrade head

# Rollback one migration
alembic downgrade -1

Project structure

content-to-kb-automator/
├── backend/               # FastAPI application
│   ├── main.py            # App entry point, middleware, routers
│   ├── config.py          # pydantic-settings configuration
│   ├── database.py        # SQLAlchemy async engine + session
│   ├── models.py          # 7-entity ORM models
│   ├── schemas.py         # Pydantic request/response schemas
│   ├── routers/           # API route handlers
│   │   ├── health.py      # /health (DB check)
│   │   ├── creators.py    # /api/v1/creators
│   │   └── videos.py      # /api/v1/videos
│   └── requirements.txt   # Python dependencies
├── whisper/               # Desktop transcription script
│   ├── transcribe.py      # Whisper CLI tool
│   ├── requirements.txt   # Whisper + ffmpeg deps
│   └── README.md          # Transcription documentation
├── docker/                # Dockerfiles
│   ├── Dockerfile.api     # FastAPI + Celery image
│   ├── Dockerfile.web     # React + nginx image
│   └── nginx.conf         # nginx reverse proxy config
├── alembic/               # Database migrations
│   ├── env.py             # Migration environment
│   └── versions/          # Migration scripts
├── config/                # Configuration files
│   └── canonical_tags.yaml # 6 topic categories + genre taxonomy
├── prompts/               # LLM prompt templates (editable)
├── frontend/              # React web UI (placeholder)
├── tests/                 # Test fixtures and test suites
│   └── fixtures/          # Sample data for testing
├── docker-compose.yml     # Full stack definition
├── alembic.ini            # Alembic configuration
├── .env.example           # Environment variable template
└── chrysopedia-spec.md    # Full project specification

API Endpoints

Method	Path	Description
GET	`/health`	Health check with DB connectivity
GET	`/api/v1/health`	Lightweight health (no DB)
GET	`/api/v1/creators`	List all creators
GET	`/api/v1/creators/{slug}`	Get creator by slug
GET	`/api/v1/videos`	List all source videos
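
For scripted access, a small read-only client over these endpoints might look like the sketch below. The class and method names are hypothetical, and the code assumes each endpoint returns a JSON body.

```python
import json
import urllib.request
from urllib.parse import quote


class ChrysopediaClient:
    """Tiny read-only client for the endpoints listed above (illustrative only)."""

    def __init__(self, base_url="http://localhost:8000"):
        self.base_url = base_url.rstrip("/")

    def url_for(self, path):
        """Join the base URL with an absolute API path."""
        return f"{self.base_url}{path}"

    def creator_url(self, slug):
        """Build the creator detail URL, percent-encoding the slug."""
        return self.url_for("/api/v1/creators/" + quote(slug, safe=""))

    def get_json(self, url, timeout=10.0):
        """Fetch a URL and decode its body as JSON (assumed response format)."""
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))

    def list_creators(self):
        return self.get_json(self.url_for("/api/v1/creators"))

    def get_creator(self, slug):
        return self.get_json(self.creator_url(slug))

    def list_videos(self):
        return self.get_json(self.url_for("/api/v1/videos"))
```

With the stack running, `ChrysopediaClient().list_creators()` issues the same request as `curl http://localhost:8000/api/v1/creators`.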

XPLTD Conventions

This project follows XPLTD infrastructure conventions:

Docker project name: xpltd_chrysopedia
Bind mounts: persistent data stored under /vmPool/r/services/
Network: dedicated bridge chrysopedia (172.32.0.0/24)
PostgreSQL host port: 5433 (avoids conflict with system PostgreSQL on 5432)

Deployment (ub01)

The production stack runs on ub01.a.xpltd.co:

# Clone (first time only — requires SSH agent forwarding)
ssh -A ub01
cd /vmPool/r/repos/xpltdco/chrysopedia
git clone git@github.com:xpltdco/chrysopedia.git .

# Create .env from template
cp .env.example .env
# Edit .env with production secrets

# Build and start
docker compose build
docker compose up -d

# Run migrations
docker exec chrysopedia-api alembic upgrade head

# Pull embedding model (first time only)
docker exec chrysopedia-ollama ollama pull nomic-embed-text

Service URLs

Service	URL
Web UI	http://ub01:8096
API Health	http://ub01:8096/health
PostgreSQL	ub01:5433
Compose config	`/vmPool/r/compose/xpltd_chrysopedia/docker-compose.yml`

Update Workflow

ssh -A ub01
cd /vmPool/r/repos/xpltdco/chrysopedia
git pull
docker compose build && docker compose up -d
