docs: Rewrite README with information flow stages and updated architecture

Replaces outdated README with streamlined version covering all 8 services, complete API endpoints, 6-stage information flow diagram, and current project structure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

parent `e8bc3fd9a2` · commit `64ff263da2` · 1 changed file with 256 additions and 258 deletions: `README.md`

---
> From *chrysopoeia* (alchemical transmutation of base material into gold) + *encyclopedia*.
> Chrysopedia transmutes raw video content into refined, searchable production knowledge.

A self-hosted knowledge extraction system for electronic music production content. Video libraries are transcribed with Whisper, analyzed through a multi-stage LLM pipeline, curated via an admin review workflow, and served through a search-first web UI designed for mid-session retrieval.

---

## Information Flow

Content moves through six stages from raw video to searchable knowledge:

```
┌─────────────────────────────────────────────────────────────────────────┐
│  STAGE 1 · Transcription                               [Desktop / GPU]  │
│                                                                         │
│  Video files → Whisper large-v3 (CUDA) → JSON transcripts               │
│  Output: timestamped segments with speaker text                         │
└────────────────────────────────┬────────────────────────────────────────┘
                                 │  JSON files (manual or folder watcher)
                                 ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  STAGE 2 · Ingestion                                   [API + Watcher]  │
│                                                                         │
│  POST /api/v1/ingest  ←  watcher auto-submits from /watch folder        │
│  • Validate JSON structure                                              │
│  • Compute content hash (SHA-256) for deduplication                     │
│  • Find-or-create Creator from folder name                              │
│  • Upsert SourceVideo (exact filename → content hash → fuzzy match)     │
│  • Bulk-insert TranscriptSegment rows                                   │
│  • Dispatch pipeline to Celery worker                                   │
└────────────────────────────────┬────────────────────────────────────────┘
                                 │  Celery task: run_pipeline(video_id)
                                 ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  STAGE 3 · LLM Extraction Pipeline                     [Celery Worker]  │
│                                                                         │
│  Four sequential LLM stages, each with its own prompt template:         │
│                                                                         │
│  3a. Segmentation — Split transcript into semantic topic boundaries     │
│      Model: chat (fast)        Prompt: stage2_segmentation.txt          │
│                                                                         │
│  3b. Extraction — Identify key moments (title, summary, timestamps)     │
│      Model: reasoning (think)  Prompt: stage3_extraction.txt            │
│                                                                         │
│  3c. Classification — Assign content types + extract plugin names       │
│      Model: chat (fast)        Prompt: stage4_classification.txt        │
│                                                                         │
│  3d. Synthesis — Compose technique pages from approved moments          │
│      Model: reasoning (think)  Prompt: stage5_synthesis.txt             │
│                                                                         │
│  Each stage emits PipelineEvent rows (tokens, duration, model, errors)  │
└────────────────────────────────┬────────────────────────────────────────┘
                                 │  KeyMoment rows (review_status: pending)
                                 ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  STAGE 4 · Review & Curation                               [Admin UI]   │
│                                                                         │
│  Admin reviews extracted KeyMoments before they become technique pages: │
│  • Approve — moment proceeds to synthesis                               │
│  • Edit — correct title, summary, content type, plugins, then approve   │
│  • Reject — moment is excluded from knowledge base                      │
│  (When REVIEW_MODE=false, moments auto-approve and skip this stage)     │
└────────────────────────────────┬────────────────────────────────────────┘
                                 │  Approved moments → Stage 3d synthesis
                                 ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  STAGE 5 · Knowledge Base                                    [Web UI]   │
│                                                                         │
│  TechniquePages — the primary output:                                   │
│  • Structured body sections, signal chains, plugin lists                │
│  • Linked to source KeyMoments with video timestamps                    │
│  • Cross-referenced via RelatedTechniqueLinks                           │
│  • Versioned (snapshots before each re-synthesis)                       │
│  • Organized by topic taxonomy (6 categories from canonical_tags.yaml)  │
└────────────────────────────────┬────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  STAGE 6 · Search & Retrieval                                [Web UI]   │
│                                                                         │
│  • Semantic search: query → embedding → Qdrant vector similarity        │
│  • Keyword fallback: ILIKE search on title/summary (300ms timeout)      │
│  • Browse by topic hierarchy, creator, or content type                  │
│  • Typeahead search from home page (debounced, top 5 results)           │
└─────────────────────────────────────────────────────────────────────────┘
```
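Stage 2's dedup step hashes transcript content rather than file names, so a renamed re-upload of the same transcription is still caught. A minimal sketch of the idea (the function name and canonicalization choice are assumptions, not the actual ingest code):

```python
import hashlib
import json

def content_hash(transcript: dict) -> str:
    """SHA-256 over the transcript's segments only, so the hash is stable
    across renames and metadata edits."""
    # Canonical serialization: sorted keys, fixed separators, no whitespace drift.
    canonical = json.dumps(transcript.get("segments", []),
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Same segments, different filenames: identical hashes, so Stage 2 dedupes.
a = {"file": "video_v1.mp4",    "segments": [{"start": 0.0, "end": 2.5, "text": "hi"}]}
b = {"file": "video_final.mp4", "segments": [{"start": 0.0, "end": 2.5, "text": "hi"}]}
assert content_hash(a) == content_hash(b)
```

The SourceVideo upsert then tries exact filename first, falls back to this content hash, and finally to fuzzy matching, as shown in the Stage 2 box above.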

---

## Architecture

```
┌──────────────────────────────────────────────────────────────────────────┐
│  Desktop (GPU workstation — hal0022)                                     │
│  whisper/transcribe.py → JSON transcripts → copy to /watch folder        │
└────────────────────────────┬─────────────────────────────────────────────┘
                             │
                             ▼
┌──────────────────────────────────────────────────────────────────────────┐
│  Docker Compose: xpltd_chrysopedia (ub01)                                │
│  Network: chrysopedia (172.32.0.0/24)                                    │
│                                                                          │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐          │
│  │ PostgreSQL │  │ Redis      │  │ Qdrant     │  │ Ollama     │          │
│  │ :5433      │  │ broker +   │  │ vector DB  │  │ embeddings │          │
│  │ 7 entities │  │ cache      │  │ semantic   │  │ nomic-embed│          │
│  └─────┬──────┘  └─────┬──────┘  └─────┬──────┘  └─────┬──────┘          │
│        │               │               │               │                 │
│  ┌─────┴───────────────┴───────────────┴───────────────┴─────────────┐   │
│  │ FastAPI (API)                                                     │   │
│  │ Ingest · Pipeline control · Review · Search · CRUD · Reports      │   │
│  └──────────────────────────┬────────────────────────────────────────┘   │
│                             │                                            │
│  ┌──────────────┐  ┌────────┴───────┐  ┌──────────────────────────┐      │
│  │ Watcher      │  │ Celery Worker  │  │ Web UI (React)           │      │
│  │ /watch →     │  │ LLM pipeline   │  │ nginx → :8096            │      │
│  │ auto-ingest  │  │ stages 2-5     │  │ search-first interface   │      │
│  └──────────────┘  └────────────────┘  └──────────────────────────┘      │
└──────────────────────────────────────────────────────────────────────────┘
```

### Services

| Service | Image | Port | Purpose |
|---------|-------|------|---------|
| `chrysopedia-db` | `postgres:16-alpine` | `5433 → 5432` | Primary data store |
| `chrysopedia-redis` | `redis:7-alpine` | — | Celery broker + feature flag cache |
| `chrysopedia-qdrant` | `qdrant/qdrant:v1.13.2` | — | Vector DB for semantic search |
| `chrysopedia-ollama` | `ollama/ollama` | — | Embedding model server (nomic-embed-text) |
| `chrysopedia-api` | `Dockerfile.api` | `8000` | FastAPI REST API |
| `chrysopedia-worker` | `Dockerfile.api` | — | Celery worker (LLM pipeline) |
| `chrysopedia-watcher` | `Dockerfile.api` | — | Folder monitor → auto-ingest |
| `chrysopedia-web` | `Dockerfile.web` | `8096 → 80` | React frontend (nginx) |
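The watcher's job is simple: notice new JSON files under `/watch` and submit each to `POST /api/v1/ingest`, using the parent folder name as the Creator. A minimal polling sketch (the real service is `backend/watcher.py`; the function name and seen-set bookkeeping here are illustrative):

```python
from pathlib import Path

def find_new_transcripts(watch_dir: Path, seen: set[str]) -> list[Path]:
    """Return JSON files under watch_dir that have not been submitted yet,
    and record them so the next poll skips them."""
    fresh = [p for p in sorted(watch_dir.glob("**/*.json")) if str(p) not in seen]
    seen.update(str(p) for p in fresh)
    return fresh

# Each poll cycle, the watcher would POST every returned file to
# /api/v1/ingest, deriving the Creator from p.parent.name.
```

A production watcher would also want to wait for files to finish copying (e.g. stable size across two polls) before submitting them.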

### Data Model

| Entity | Purpose |
|--------|---------|
| **Creator** | Artists/producers whose content is indexed |
| **SourceVideo** | Video files processed by the pipeline (with content hash dedup) |
| **TranscriptSegment** | Timestamped text segments from Whisper |
| **KeyMoment** | Discrete insights extracted by LLM analysis |
| **TechniquePage** | Synthesized knowledge pages — the primary output |
| **TechniquePageVersion** | Snapshots before re-synthesis overwrites |
| **RelatedTechniqueLink** | Cross-references between technique pages |
| **Tag** | Hierarchical topic taxonomy |
| **ContentReport** | User-submitted content issues |
| **PipelineEvent** | Structured pipeline execution logs (tokens, timing, errors) |
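The core relationships can be sketched with plain dataclasses. This is illustrative only: the real SQLAlchemy models live in `backend/models.py`, and the exact field names there may differ.

```python
from dataclasses import dataclass, field

@dataclass
class Creator:
    slug: str
    name: str

@dataclass
class KeyMoment:
    title: str
    summary: str
    start_s: float
    end_s: float
    review_status: str = "pending"   # pending, then approved or rejected

@dataclass
class SourceVideo:
    filename: str
    content_hash: str                # SHA-256, used for dedup on ingest
    creator: Creator
    key_moments: list[KeyMoment] = field(default_factory=list)

video = SourceVideo("kick_design.mp4", "0" * 64, Creator("some-creator", "Some Creator"))
video.key_moments.append(KeyMoment("Layering kicks", "How to layer kicks", 12.0, 95.5))
assert video.key_moments[0].review_status == "pending"
```

Approved KeyMoments feed Stage 3d synthesis into TechniquePages, which in turn carry versions, tags, and cross-links per the table above.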

---

## Quick Start

### Prerequisites

- Docker ≥ 24.0 and Docker Compose ≥ 2.20
- Python 3.10+ with NVIDIA GPU + CUDA (for Whisper transcription)

### Setup

```bash
# Clone and configure
git clone git@github.com:xpltdco/chrysopedia.git
cd chrysopedia
cp .env.example .env  # edit with real values

# Start the stack
docker compose up -d

# Run database migrations
docker exec chrysopedia-api alembic upgrade head

# Pull the embedding model (first time only)
docker exec chrysopedia-ollama ollama pull nomic-embed-text

# Verify
curl http://localhost:8096/health
```
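The API can take a few seconds to come up after `docker compose up -d`. If you script the verification step, a small retry loop helps; this sketch takes the probe as a callable so it can be pointed at the health endpoint (names and timeouts are assumptions, not part of the project):

```python
import time

def wait_until_healthy(probe, timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Poll probe() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if probe():
                return True
        except OSError:
            pass  # e.g. connection refused while containers boot
        time.sleep(interval_s)
    return False

# In practice, something like:
#   from urllib.request import urlopen
#   wait_until_healthy(lambda: urlopen("http://localhost:8096/health").status == 200)
attempts = iter([False, False, True])
assert wait_until_healthy(lambda: next(attempts), timeout_s=5.0, interval_s=0.01)
```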

### Transcribe videos

```bash
cd whisper && pip install -r requirements.txt

# Single file
python transcribe.py --input "path/to/video.mp4" --output-dir ./transcripts

# Batch
python transcribe.py --input ./videos/ --output-dir ./transcripts
```

See [`whisper/README.md`](whisper/README.md) for full transcription docs.

---

## Environment Variables

Copy `.env.example` to `.env`. Key groups:

| Group | Variables | Notes |
|-------|-----------|-------|
| **Database** | `POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_DB` | Default user: `chrysopedia` |
| **LLM** | `LLM_API_URL`, `LLM_API_KEY`, `LLM_MODEL` | OpenAI-compatible endpoint |
| **LLM Fallback** | `LLM_FALLBACK_URL`, `LLM_FALLBACK_MODEL` | Automatic failover |
| **Per-Stage Models** | `LLM_STAGE{2-5}_MODEL`, `LLM_STAGE{2-5}_MODALITY` | `chat` for fast stages, `thinking` for reasoning |
| **Embedding** | `EMBEDDING_API_URL`, `EMBEDDING_MODEL` | Ollama nomic-embed-text |
| **Vector DB** | `QDRANT_URL`, `QDRANT_COLLECTION` | Container-internal |
| **Features** | `REVIEW_MODE`, `DEBUG_MODE` | Review gate + LLM I/O capture |
| **Storage** | `TRANSCRIPT_STORAGE_PATH`, `VIDEO_METADATA_PATH` | Container bind mounts |
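Per-stage model overrides resolve against the global `LLM_MODEL` when unset. The real configuration is the pydantic-settings class in `backend/config.py`; this stdlib sketch only illustrates the fallback idea, and the exact precedence is an assumption:

```python
import os

def stage_model(stage: int, env=os.environ) -> tuple[str, str]:
    """Resolve (model, modality) for pipeline stage 2-5, falling back to the
    global LLM_MODEL and 'chat' modality when no per-stage override is set."""
    model = env.get(f"LLM_STAGE{stage}_MODEL", env.get("LLM_MODEL", ""))
    modality = env.get(f"LLM_STAGE{stage}_MODALITY", "chat")
    return model, modality

env = {"LLM_MODEL": "qwen2.5-72b",
       "LLM_STAGE3_MODEL": "some-reasoning-model",
       "LLM_STAGE3_MODALITY": "thinking"}
assert stage_model(3, env) == ("some-reasoning-model", "thinking")
assert stage_model(2, env) == ("qwen2.5-72b", "chat")
```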

---

## API Endpoints

### Public

| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Health check (DB connectivity) |
| GET | `/api/v1/search?q=&scope=&limit=` | Semantic + keyword search |
| GET | `/api/v1/techniques` | List technique pages |
| GET | `/api/v1/techniques/{slug}` | Technique detail + key moments |
| GET | `/api/v1/techniques/{slug}/versions` | Version history |
| GET | `/api/v1/creators` | List creators (sort, genre filter) |
| GET | `/api/v1/creators/{slug}` | Creator detail |
| GET | `/api/v1/topics` | Topic hierarchy with counts |
| GET | `/api/v1/videos` | List source videos |
| POST | `/api/v1/reports` | Submit content report |
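Calling search from a script amounts to building the query string and issuing a GET. A sketch (the base URL and helper name are assumptions; the parameter names come from the table above):

```python
from urllib.parse import urlencode

def search_url(base: str, q: str, scope: str = "techniques", limit: int = 5) -> str:
    """Build a /api/v1/search URL with properly encoded query parameters."""
    return f"{base}/api/v1/search?" + urlencode({"q": q, "scope": scope, "limit": limit})

url = search_url("http://ub01:8096", "sidechain compression")
# A real call would then be:
#   import json; from urllib.request import urlopen
#   results = json.load(urlopen(url))
assert url == "http://ub01:8096/api/v1/search?q=sidechain+compression&scope=techniques&limit=5"
```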

### Admin

| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/review/queue` | Review queue (status filter) |
| POST | `/api/v1/review/moments/{id}/approve` | Approve key moment |
| POST | `/api/v1/review/moments/{id}/reject` | Reject key moment |
| PUT | `/api/v1/review/moments/{id}` | Edit key moment |
| POST | `/api/v1/admin/pipeline/trigger/{video_id}` | Trigger/retrigger pipeline |
| GET | `/api/v1/admin/pipeline/events/{video_id}` | Pipeline event log |
| GET | `/api/v1/admin/pipeline/token-summary/{video_id}` | Token usage by stage |
| GET | `/api/v1/admin/pipeline/worker-status` | Celery worker status |
| PUT | `/api/v1/admin/pipeline/debug-mode` | Toggle debug mode |

### Ingest

| Method | Path | Description |
|--------|------|-------------|
| POST | `/api/v1/ingest` | Upload Whisper JSON transcript |
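The ingest endpoint expects Whisper-style JSON with timestamped segments. A minimal structural check mirroring Stage 2's "Validate JSON structure" step (field names follow Whisper's output format; the API's actual schema lives in `backend/schemas.py` and may be stricter):

```python
def valid_transcript(doc: dict) -> bool:
    """True if doc looks like a Whisper transcript: a non-empty list of
    segments, each with numeric start/end times and non-blank text."""
    segments = doc.get("segments")
    if not isinstance(segments, list) or not segments:
        return False
    return all(
        isinstance(s, dict)
        and isinstance(s.get("start"), (int, float))
        and isinstance(s.get("end"), (int, float))
        and isinstance(s.get("text"), str) and bool(s["text"].strip())
        for s in segments
    )

assert valid_transcript({"segments": [{"start": 0.0, "end": 4.2, "text": "Set the attack"}]})
assert not valid_transcript({"segments": []})
```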

---

## Development

```bash
# Local backend (with Docker services)
python -m venv .venv && source .venv/bin/activate
pip install -r backend/requirements.txt
docker compose up -d chrysopedia-db chrysopedia-redis
alembic upgrade head
cd backend && uvicorn main:app --reload --host 0.0.0.0 --port 8000

# Database migrations
alembic revision --autogenerate -m "describe_change"
alembic upgrade head
```

### Project Structure

```
chrysopedia/
├── backend/                  # FastAPI application
│   ├── main.py               # Entry point, middleware, router mounting
│   ├── config.py             # Pydantic Settings (all env vars)
│   ├── models.py             # SQLAlchemy ORM models
│   ├── schemas.py            # Pydantic request/response schemas
│   ├── worker.py             # Celery app configuration
│   ├── watcher.py            # Transcript folder watcher service
│   ├── search_service.py     # Semantic search + keyword fallback
│   ├── routers/              # API endpoint handlers
│   ├── pipeline/             # LLM pipeline stages + clients
│   │   ├── stages.py         # Stages 2-5 (Celery tasks)
│   │   ├── llm_client.py     # OpenAI-compatible LLM client
│   │   ├── embedding_client.py
│   │   └── qdrant_client.py
│   └── tests/
├── frontend/                 # React + TypeScript + Vite
│   └── src/
│       ├── pages/            # Home, Search, Technique, Creator, Topic, Admin
│       ├── components/       # Shared UI components
│       └── api/              # Typed API clients
├── whisper/                  # Desktop transcription (Whisper large-v3)
├── docker/                   # Dockerfiles + nginx config
├── alembic/                  # Database migrations
├── config/                   # canonical_tags.yaml (topic taxonomy)
├── prompts/                  # LLM prompt templates (editable at runtime)
├── docker-compose.yml
└── .env.example
```

---

## Deployment (ub01)

The production stack runs on **ub01.a.xpltd.co**:

```bash
ssh ub01
cd /vmPool/r/repos/xpltdco/chrysopedia
git pull && docker compose build && docker compose up -d
```

| Resource | Location |
|----------|----------|
| Web UI | `http://ub01:8096` |
| API health | `http://ub01:8096/health` |
| PostgreSQL | `ub01:5433` |
| Compose config | `/vmPool/r/compose/xpltd_chrysopedia/docker-compose.yml` |
| Persistent data | `/vmPool/r/services/chrysopedia_*` |

XPLTD conventions: `xpltd_chrysopedia` project name, dedicated bridge network (`172.32.0.0/24`), bind mounts under `/vmPool/r/services/`, PostgreSQL on port `5433`.