# Chrysopedia

> From *chrysopoeia* (alchemical transmutation of base material into gold) + *encyclopedia*.
> Chrysopedia transmutes raw video content into refined, searchable production knowledge.

A self-hosted knowledge extraction and retrieval system for electronic music production content. Transcribes video libraries with Whisper, extracts key moments and techniques with LLM analysis, and serves a search-first web UI for mid-session retrieval.

---

## Architecture

```
┌──────────────────────────────────────────────────────────────────┐
│  Desktop (GPU workstation)                                       │
│  ┌───────────────┐                                               │
│  │ whisper/      │  Transcribes video → JSON (Whisper large-v3)  │
│  │ transcribe.py │  Runs locally with CUDA, outputs to /data     │
│  └──────┬────────┘                                               │
│         │ JSON transcripts                                       │
└─────────┼────────────────────────────────────────────────────────┘
          │
          ▼
┌──────────────────────────────────────────────────────────────────┐
│  Docker Compose (xpltd_chrysopedia) — Server (e.g. ub01)         │
│                                                                  │
│  ┌────────────────┐  ┌─────────────────┐  ┌──────────────────┐   │
│  │ chrysopedia-db │  │chrysopedia-redis│  │ chrysopedia-api  │   │
│  │ PostgreSQL 16  │  │ Redis 7         │  │ FastAPI + Uvicorn│   │
│  │ :5433→5432     │  │                 │  │ :8000            │   │
│  └────────────────┘  └─────────────────┘  └────────┬─────────┘   │
│                                                    │             │
│  ┌──────────────────┐  ┌──────────────────────┐    │             │
│  │ chrysopedia-web  │  │ chrysopedia-worker   │    │             │
│  │ React + nginx    │  │ Celery (LLM pipeline)│    │             │
│  │ :3000→80         │  │                      │    │             │
│  └──────────────────┘  └──────────────────────┘    │             │
│                                                    │             │
│  Network: chrysopedia (172.24.0.0/24)              │             │
└──────────────────────────────────────────────────────────────────┘
```

### Services

| Service              | Image / Build           | Port          | Purpose                                   |
|----------------------|-------------------------|---------------|-------------------------------------------|
| `chrysopedia-db`     | `postgres:16-alpine`    | `5433 → 5432` | Primary data store (7-entity schema)      |
| `chrysopedia-redis`  | `redis:7-alpine`        | —             | Celery broker / cache                     |
| `chrysopedia-api`    | `docker/Dockerfile.api` | `8000`        | FastAPI REST API                          |
| `chrysopedia-worker` | `docker/Dockerfile.api` | —             | Celery worker for LLM pipeline stages 2-5 |
| `chrysopedia-web`    | `docker/Dockerfile.web` | `3000 → 80`   | React frontend (nginx)                    |

### Data Model (7 entities)

- **Creator** — artists/producers whose content is indexed
- **SourceVideo** — original video files processed by the pipeline
- **TranscriptSegment** — timestamped text segments from Whisper
- **KeyMoment** — discrete insights extracted by LLM analysis
- **TechniquePage** — synthesized knowledge pages (primary output)
- **RelatedTechniqueLink** — cross-references between technique pages
- **Tag** — hierarchical topic/genre taxonomy

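
The entities above form a chain from raw video to synthesized page. The sketch below models part of that chain with plain dataclasses to show how the pieces could relate. Every field name (`slug`, `start`, `end`, `segments`, `moments`, `tags`) and the `creators()` helper are illustrative assumptions, not the columns or methods defined in `backend/models.py`.

```python
from dataclasses import dataclass, field


@dataclass
class Creator:
    name: str
    slug: str  # hypothetical URL-safe identifier


@dataclass
class SourceVideo:
    creator: Creator
    filename: str


@dataclass
class TranscriptSegment:
    video: SourceVideo
    start: float  # assumed unit: seconds from the start of the video
    end: float
    text: str


@dataclass
class KeyMoment:
    video: SourceVideo
    title: str
    segments: list[TranscriptSegment] = field(default_factory=list)


@dataclass
class TechniquePage:
    title: str
    moments: list[KeyMoment] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)

    def creators(self) -> set[str]:
        """Slugs of every creator whose videos contributed a key moment."""
        return {moment.video.creator.slug for moment in self.moments}
```
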
---

## Prerequisites

- **Docker** ≥ 24.0 and **Docker Compose** ≥ 2.20
- **Python 3.10+** (for the Whisper transcription script)
- **ffmpeg** (for audio extraction)
- **NVIDIA GPU + CUDA** (recommended for Whisper; CPU fallback available)

---

## Quick Start

### 1. Clone and configure

```bash
git clone <repository-url>
cd content-to-kb-automator

# Create environment file from template
cp .env.example .env
# Edit .env with your actual values (see Environment Variables below)
```

### 2. Start the Docker Compose stack

```bash
docker compose up -d
```

This starts PostgreSQL, Redis, the API server, the Celery worker, and the web UI.

### 3. Run database migrations

```bash
# From inside the API container:
docker compose exec chrysopedia-api alembic upgrade head

# Or locally (requires Python venv with backend deps):
alembic upgrade head
```

### 4. Verify the stack

```bash
# Health check (with DB connectivity)
curl http://localhost:8000/health

# API health (lightweight, no DB)
curl http://localhost:8000/api/v1/health

# Docker Compose status
docker compose ps
```

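
The two `curl` checks can also be scripted, for example as part of a deploy smoke test. The following is a minimal sketch using only the standard library. The helper names `http_status` and `check_stack` are hypothetical, and treating HTTP 200 as "healthy" is an assumption, since the response format of these endpoints is not documented here.

```python
import urllib.error
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"
HEALTH_PATHS = ("/health", "/api/v1/health")


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, ...
        return None


def check_stack(base_url: str = BASE_URL, fetch=http_status) -> dict:
    """Map each health path to True when it answered with HTTP 200 (assumed)."""
    return {path: fetch(base_url + path) == 200 for path in HEALTH_PATHS}
```

Calling `check_stack()` after `docker compose up -d` returns one boolean per endpoint. The `fetch` parameter exists so the function can be exercised without a running stack.
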
### 5. Transcribe videos (desktop)

```bash
cd whisper
pip install -r requirements.txt

# Single file
python transcribe.py --input "path/to/video.mp4" --output-dir ./transcripts

# Batch (all videos in a directory)
python transcribe.py --input ./videos/ --output-dir ./transcripts
```

See [`whisper/README.md`](whisper/README.md) for full transcription documentation.

---

## Environment Variables

Create `.env` from `.env.example`. Every variable has a default suitable for local development; replace the `changeme` placeholder values (`POSTGRES_PASSWORD`, `LLM_API_KEY`, `APP_SECRET_KEY`) before deploying.

### Database

| Variable            | Default       | Description                  |
|---------------------|---------------|------------------------------|
| `POSTGRES_USER`     | `chrysopedia` | PostgreSQL username          |
| `POSTGRES_PASSWORD` | `changeme`    | PostgreSQL password          |
| `POSTGRES_DB`       | `chrysopedia` | Database name                |
| `DATABASE_URL`      | *(composed)*  | Full async connection string |

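
`DATABASE_URL` is listed as composed from the other variables. The sketch below shows one way such a composition could work. The `postgresql+asyncpg` scheme, the `chrysopedia-db` host, and port `5432` are assumptions inferred from the async SQLAlchemy setup and the service table, not values read from `.env.example`; `compose_database_url` is a hypothetical helper.

```python
import os
from urllib.parse import quote


def compose_database_url(env=os.environ, host="chrysopedia-db", port=5432):
    """Build an async connection string from the POSTGRES_* variables."""
    explicit = env.get("DATABASE_URL")
    if explicit:
        # An explicitly set DATABASE_URL wins over the composed value.
        return explicit

    user = env.get("POSTGRES_USER", "chrysopedia")
    password = env.get("POSTGRES_PASSWORD", "changeme")
    database = env.get("POSTGRES_DB", "chrysopedia")

    # Percent-encode credentials so characters like "@" or "/" stay valid in a URL.
    return (
        f"postgresql+asyncpg://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )
```

When running the API outside Docker against the published port, the same helper could be called with `host="localhost", port=5433`.
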
### Services

| Variable    | Default                            | Description             |
|-------------|------------------------------------|-------------------------|
| `REDIS_URL` | `redis://chrysopedia-redis:6379/0` | Redis connection string |

### LLM Configuration

| Variable             | Default                                    | Description                              |
|----------------------|--------------------------------------------|------------------------------------------|
| `LLM_API_URL`        | `https://friend-openwebui.example.com/api` | Primary LLM endpoint (OpenAI-compatible) |
| `LLM_API_KEY`        | `sk-changeme`                              | API key for primary LLM                  |
| `LLM_MODEL`          | `qwen2.5-72b`                              | Primary model name                       |
| `LLM_FALLBACK_URL`   | `http://localhost:11434/v1`                | Fallback LLM endpoint (Ollama)           |
| `LLM_FALLBACK_MODEL` | `qwen2.5:14b-q8_0`                         | Fallback model name                      |

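
The table defines a primary and a fallback endpoint but does not say when the fallback is used. As an illustration only, the sketch below tries each endpoint in order and moves on after any exception; that trigger is an assumption. `LLMEndpoint`, `complete_with_fallback`, and the injected `call` function (a stand-in for whatever HTTP client the worker uses) are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LLMEndpoint:
    url: str
    model: str


# Defaults copied from the table above.
PRIMARY = LLMEndpoint("https://friend-openwebui.example.com/api", "qwen2.5-72b")
FALLBACK = LLMEndpoint("http://localhost:11434/v1", "qwen2.5:14b-q8_0")


def complete_with_fallback(
    prompt: str,
    call: Callable[[LLMEndpoint, str], str],
    endpoints: tuple = (PRIMARY, FALLBACK),
):
    """Try each endpoint in order; return (reply, endpoint_that_answered)."""
    errors = []
    for endpoint in endpoints:
        try:
            return call(endpoint, prompt), endpoint
        except Exception as exc:  # assumed trigger: any failure moves on
            errors.append(f"{endpoint.url}: {exc}")
    raise RuntimeError("all LLM endpoints failed: " + "; ".join(errors))
```

Returning the endpoint alongside the reply makes it possible to log which model produced a given result.
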
### Embedding / Vector

| Variable            | Default                     | Description            |
|---------------------|-----------------------------|------------------------|
| `EMBEDDING_API_URL` | `http://localhost:11434/v1` | Embedding endpoint     |
| `EMBEDDING_MODEL`   | `nomic-embed-text`          | Embedding model name   |
| `QDRANT_URL`        | `http://qdrant:6333`        | Qdrant vector DB URL   |
| `QDRANT_COLLECTION` | `chrysopedia`               | Qdrant collection name |

### Application

| Variable                  | Default                           | Description                                |
|---------------------------|-----------------------------------|--------------------------------------------|
| `APP_ENV`                 | `production`                      | Environment (`development` / `production`) |
| `APP_LOG_LEVEL`           | `info`                            | Log level                                  |
| `APP_SECRET_KEY`          | `changeme-generate-a-real-secret` | Application secret key                     |
| `TRANSCRIPT_STORAGE_PATH` | `/data/transcripts`               | Transcript JSON storage path               |
| `VIDEO_METADATA_PATH`     | `/data/video_meta`                | Video metadata storage path                |
| `REVIEW_MODE`             | `true`                            | Enable human review workflow               |

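
The project itself loads configuration through pydantic-settings in `backend/config.py`. The standard-library sketch below is not that implementation; it only illustrates how the documented defaults and the boolean `REVIEW_MODE` flag could be read from the environment. `AppSettings`, `load_settings`, and the set of accepted truthy spellings are assumptions.

```python
import os
from dataclasses import dataclass

_TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings for a boolean flag


@dataclass(frozen=True)
class AppSettings:
    app_env: str
    log_level: str
    transcript_storage_path: str
    video_metadata_path: str
    review_mode: bool


def load_settings(env=os.environ) -> AppSettings:
    """Read application settings, falling back to the documented defaults."""
    return AppSettings(
        app_env=env.get("APP_ENV", "production"),
        log_level=env.get("APP_LOG_LEVEL", "info"),
        transcript_storage_path=env.get("TRANSCRIPT_STORAGE_PATH", "/data/transcripts"),
        video_metadata_path=env.get("VIDEO_METADATA_PATH", "/data/video_meta"),
        review_mode=env.get("REVIEW_MODE", "true").strip().lower() in _TRUTHY,
    )
```
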
---

## Development Workflow

### Local development (without Docker)

```bash
# Create virtual environment
python -m venv .venv
source .venv/bin/activate

# Install backend dependencies
pip install -r backend/requirements.txt

# Start PostgreSQL and Redis (via Docker)
docker compose up -d chrysopedia-db chrysopedia-redis

# Run migrations
alembic upgrade head

# Start the API server with hot-reload
cd backend && uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

### Database migrations

```bash
# Create a new migration after model changes
alembic revision --autogenerate -m "describe_change"

# Apply all pending migrations
alembic upgrade head

# Roll back one migration
alembic downgrade -1
```

### Project structure

```
content-to-kb-automator/
├── backend/                  # FastAPI application
│   ├── main.py               # App entry point, middleware, routers
│   ├── config.py             # pydantic-settings configuration
│   ├── database.py           # SQLAlchemy async engine + session
│   ├── models.py             # 7-entity ORM models
│   ├── schemas.py            # Pydantic request/response schemas
│   ├── routers/              # API route handlers
│   │   ├── health.py         # /health (DB check)
│   │   ├── creators.py       # /api/v1/creators
│   │   └── videos.py         # /api/v1/videos
│   └── requirements.txt      # Python dependencies
├── whisper/                  # Desktop transcription script
│   ├── transcribe.py         # Whisper CLI tool
│   ├── requirements.txt      # Whisper + ffmpeg deps
│   └── README.md             # Transcription documentation
├── docker/                   # Dockerfiles
│   ├── Dockerfile.api        # FastAPI + Celery image
│   ├── Dockerfile.web        # React + nginx image
│   └── nginx.conf            # nginx reverse proxy config
├── alembic/                  # Database migrations
│   ├── env.py                # Migration environment
│   └── versions/             # Migration scripts
├── config/                   # Configuration files
│   └── canonical_tags.yaml   # 6 topic categories + genre taxonomy
├── prompts/                  # LLM prompt templates (editable)
├── frontend/                 # React web UI (placeholder)
├── tests/                    # Test fixtures and test suites
│   └── fixtures/             # Sample data for testing
├── docker-compose.yml        # Full stack definition
├── alembic.ini               # Alembic configuration
├── .env.example              # Environment variable template
└── chrysopedia-spec.md       # Full project specification
```

---

## API Endpoints

| Method | Path                      | Description                       |
|--------|---------------------------|-----------------------------------|
| GET    | `/health`                 | Health check with DB connectivity |
| GET    | `/api/v1/health`          | Lightweight health (no DB)        |
| GET    | `/api/v1/creators`        | List all creators                 |
| GET    | `/api/v1/creators/{slug}` | Get creator by slug               |
| GET    | `/api/v1/videos`          | List all source videos            |

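
For scripted access to these endpoints, a small client along the following lines may be enough. It is a sketch under stated assumptions: the helper names `creator_url` and `fetch_json` are hypothetical, the base URL matches the local Quick Start setup, and the endpoints are assumed to return JSON, which this README does not spell out.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "http://localhost:8000"


def creator_url(slug: str, base: str = API_BASE) -> str:
    """Build the URL for GET /api/v1/creators/{slug}."""
    # Percent-encode the slug so unexpected characters cannot alter the path.
    return f"{base}/api/v1/creators/{quote(slug, safe='')}"


def fetch_json(url: str, timeout: float = 5.0):
    """GET a URL and decode the body as JSON (assumes a JSON response)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```
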
---

## XPLTD Conventions

This project follows XPLTD infrastructure conventions:

- **Docker project name:** `xpltd_chrysopedia`
- **Bind mounts:** persistent data stored under `/vmPool/r/services/`
- **Network:** dedicated bridge `chrysopedia` (`172.32.0.0/24`)
- **PostgreSQL host port:** `5433` (avoids conflict with system PostgreSQL on `5432`)

---

## Deployment (ub01)

The production stack runs on **ub01.a.xpltd.co**:

```bash
# Clone (first time only — requires SSH agent forwarding)
ssh -A ub01
cd /vmPool/r/repos/xpltdco/chrysopedia
git clone git@github.com:xpltdco/chrysopedia.git .

# Create .env from template
cp .env.example .env
# Edit .env with production secrets

# Build and start
docker compose build
docker compose up -d

# Run migrations
docker exec chrysopedia-api alembic upgrade head

# Pull embedding model (first time only)
docker exec chrysopedia-ollama ollama pull nomic-embed-text
```

### Service URLs

| Service        | URL                                                      |
|----------------|----------------------------------------------------------|
| Web UI         | http://ub01:8096                                         |
| API Health     | http://ub01:8096/health                                  |
| PostgreSQL     | ub01:5433                                                |
| Compose config | `/vmPool/r/compose/xpltd_chrysopedia/docker-compose.yml` |

### Update Workflow

```bash
ssh -A ub01
cd /vmPool/r/repos/xpltdco/chrysopedia
git pull
docker compose build && docker compose up -d
```