docs: bootstrap wiki with architecture documentation (M018/S02)

Pages: Architecture, Data Model, API Surface, Frontend, Pipeline,
Deployment, Development Guide, Decisions, plus sidebar navigation.

Content derived from Site Audit Report (M018/S01), PROJECT.md,
DECISIONS.md, KNOWLEDGE.md, and source code analysis.
GSD Agent 2026-04-03 21:08:31 +00:00
parent f6b00a0c53
commit 081b39f767
9 changed files with 866 additions and 0 deletions

103
API-Surface.md Normal file

@ -0,0 +1,103 @@
# API Surface
41 API endpoints grouped by domain. All are served by FastAPI; every route except `/health` lives under `/api/v1/`.
## Public Endpoints (10)
| Method | Path | Response Shape | Notes |
|--------|------|---------------|-------|
| GET | `/health` | `{status, service, version, database}` | Health check |
| GET | `/api/v1/stats` | `{technique_count, creator_count}` | Homepage stats |
| GET | `/api/v1/search?q=` | `{items, partial_matches, total, query, fallback_used}` | Semantic + keyword fallback (D009) |
| GET | `/api/v1/search/suggestions?q=` | `{suggestions: [{text, type}]}` | Typeahead autocomplete |
| GET | `/api/v1/search/popular` | `{items: [{query, count}]}` | Popular searches (D025) |
| GET | `/api/v1/techniques?limit=&offset=` | `{items, total, offset, limit}` | Paginated technique list |
| GET | `/api/v1/techniques/random` | `{slug}` | Returns JSON slug (not redirect) |
| GET | `/api/v1/techniques/{slug}` | 22-field object | Full technique detail with relations |
| GET | `/api/v1/techniques/{slug}/versions` | `{items, total}` | Version history |
| GET | `/api/v1/techniques/{slug}/versions/{n}` | Version detail | Single version |
### Technique Detail Fields (22)
title, slug, topic_category, topic_tags, summary, body_sections, body_sections_format, signal_chains, plugins, id, creator_id, creator_name, creator_slug, source_quality, view_count, key_moment_count, created_at, updated_at, key_moments, creator_info, related_links, version_count, source_videos.
## Browse Endpoints (5)
| Method | Path | Response Shape | Notes |
|--------|------|---------------|-------|
| GET | `/api/v1/creators?sort=&genre=` | `{items, total, offset, limit}` | sort: random\|alpha\|views |
| GET | `/api/v1/creators/{slug}` | 16-field object | Includes genre_breakdown, techniques, social_links |
| GET | `/api/v1/topics` | `[{name, description, sub_topics}]` | ⚠️ Bare list (not paginated) |
| GET | `/api/v1/topics/{cat}/{sub}` | `{items, total, offset, limit}` | Subtopic techniques |
| GET | `/api/v1/topics/{cat}` | `{items, total, offset, limit}` | Category techniques |
## Report Endpoints (3)
| Method | Path | Purpose |
|--------|------|---------|
| POST | `/api/v1/reports` | Submit content report |
| GET | `/api/v1/admin/reports` | List all reports |
| PATCH | `/api/v1/admin/reports/{id}` | Update report status |
## Pipeline Admin Endpoints (20+)
All under prefix `/api/v1/admin/pipeline/`. Paths in the table below omit the leading `/api/v1`.
| Method | Path | Purpose |
|--------|------|---------|
| GET | `/admin/pipeline/videos` | Paginated video list with pipeline status |
| POST | `/admin/pipeline/trigger/{video_id}` | Trigger pipeline for video |
| POST | `/admin/pipeline/clean-retrigger/{video_id}` | Wipe output + reprocess |
| POST | `/admin/pipeline/revoke/{video_id}` | Revoke active pipeline task |
| POST | `/admin/pipeline/rerun-stage/{video_id}` | Re-run specific stage |
| GET | `/admin/pipeline/events` | Pipeline event log |
| GET | `/admin/pipeline/runs` | Pipeline run history |
| GET | `/admin/pipeline/chunking-inspector/{video_id}` | Inspect chunking results |
| GET | `/admin/pipeline/embed-status` | Embedding/Qdrant health |
| GET | `/admin/pipeline/debug-mode` | Get debug mode state |
| POST | `/admin/pipeline/debug-mode` | Set debug mode state |
| GET | `/admin/pipeline/token-summary` | Token usage summary |
| GET | `/admin/pipeline/stale-pages` | Pages needing regeneration |
| POST | `/admin/pipeline/bulk-resynthesize` | Regenerate all technique pages |
| POST | `/admin/pipeline/wipe-all-output` | Delete all pipeline output |
| POST | `/admin/pipeline/optimize-prompt` | Trigger prompt optimization |
| POST | `/admin/pipeline/reindex-all` | Rebuild Qdrant index |
| GET | `/admin/pipeline/worker-status` | Celery worker health |
| GET | `/admin/pipeline/recent-activity` | Recent pipeline events |
| POST | `/admin/pipeline/creator-profile/{creator_id}` | Update creator profile |
| POST | `/admin/pipeline/avatar-fetch/{creator_id}` | Fetch creator avatar |
## Other Endpoints (2)
| Method | Path | Notes |
|--------|------|-------|
| POST | `/api/v1/ingest` | Transcript upload |
| GET | `/api/v1/videos` | ⚠️ Bare list (not paginated) |
## Response Conventions
**Standard paginated response:**
```json
{
"items": [...],
"total": 83,
"offset": 0,
"limit": 20
}
```
**Known inconsistencies:**
- `GET /topics` returns bare list instead of paginated dict
- `GET /videos` returns bare list instead of paginated dict
- Search uses `items` key (not `results`)
- `/techniques/random` returns JSON `{slug}` (not HTTP redirect)
**New endpoints should follow the `{items, total, offset, limit}` paginated pattern.**
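A minimal sketch of the paginated envelope, assuming an in-memory sequence in place of a database query. `Page` and `paginate` are hypothetical names for illustration, not helpers from the codebase; a real handler would push `offset`/`limit` into SQL instead of slicing a list.

```python
from dataclasses import dataclass, asdict
from typing import Any, Sequence


@dataclass
class Page:
    """Envelope matching the documented {items, total, offset, limit} shape."""
    items: list
    total: int
    offset: int
    limit: int


def paginate(rows: Sequence[Any], offset: int = 0, limit: int = 20) -> dict:
    """Slice a sequence and wrap the window in the paginated envelope."""
    if offset < 0 or limit < 1:
        raise ValueError("offset must be >= 0 and limit must be >= 1")
    window = list(rows[offset:offset + limit])
    # total is the size of the full result set, not of the returned window
    return asdict(Page(items=window, total=len(rows), offset=offset, limit=limit))
```

Returning `total` for the full result set, not the window, lets the frontend compute page counts without a second request.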
## Authentication
No authentication on any endpoint. Admin routes (`/admin/*`) are accessible to anyone with network access. Phase 2 will add auth middleware (see [[Decisions]] D033).
---
*See also: [[Architecture]], [[Data-Model]], [[Frontend]]*

84
Architecture.md Normal file

@ -0,0 +1,84 @@
# Architecture
## System Overview
Chrysopedia is a self-hosted music production knowledge base that synthesizes technique articles from video transcripts using a 6-stage LLM pipeline. It runs as a Docker Compose stack on `ub01` with 8 containers.
```
┌─────────────────────────────────────────────────────────────────┐
│ ub01 (10.0.0.10) │
│ Docker Compose: xpltd_chrysopedia Subnet: 172.32.0.0/24 │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐ │
│ │ nginx │ │ FastAPI │ │ Celery │ │ Watcher │ │
│ │ :8096 │─▶│ :8000 │ │ Worker │ │ (PollingObs) │ │
│ └──────────┘ └────┬─────┘ └────┬─────┘ └──────┬───────┘ │
│ │ │ │ │
│ ┌────────────┼─────────────┼────────────────┘ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐ │
│ │ Postgres │ │ Redis │ │ Qdrant │ │ Ollama │ │
│ │ :5433 │ │ :6379 │ │ :6333 │ │ :11434 │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│ nginx reverse proxy
┌────────┴────────┐
│ nuc01 (10.0.0.9)│
│ chrysopedia.com │
│ :443 → :8096 │
└─────────────────┘
```
## Key Architectural Characteristics
- **Zero external frontend dependencies** beyond React, react-router-dom, and Vite
- **Monolithic CSS** — 5,820 lines, single file, BEM naming, 77 custom properties
- **No authentication** — admin routes are network-access-controlled only
- **Dual SQLAlchemy strategy** — async engine for FastAPI request handlers, sync engine for Celery pipeline tasks (D004)
- **Non-blocking pipeline side effects** — embedding/Qdrant failures don't block page synthesis (D005)
## Docker Services
| Service | Image | Container | Port | Volume |
|---------|-------|-----------|------|--------|
| PostgreSQL 16 | postgres:16-alpine | chrysopedia-db | 5433:5432 | chrysopedia_postgres_data |
| Redis 7 | redis:7-alpine | chrysopedia-redis | 6379 (internal) | — |
| Qdrant 1.13.2 | qdrant/qdrant:v1.13.2 | chrysopedia-qdrant | 6333 (internal) | chrysopedia_qdrant_data |
| Ollama | ollama/ollama:latest | chrysopedia-ollama | 11434 (internal) | chrysopedia_ollama_data |
| API (FastAPI) | Dockerfile.api | chrysopedia-api | 8000 (internal) | Bind: backend/, prompts/ |
| Worker (Celery) | Dockerfile.api | chrysopedia-worker | — | Bind: backend/, prompts/ |
| Watcher | Dockerfile.api | chrysopedia-watcher | — | Bind: watch dir |
| Web (nginx) | Dockerfile.web | chrysopedia-web-8096 | 8096:80 | — |
## Network Topology
- **Compose subnet:** 172.32.0.0/24 (D015)
- **External access:** nginx on nuc01 (10.0.0.9) reverse-proxies to ub01:8096
- **DNS:** AdGuard Home rewrites chrysopedia.com → 10.0.0.9
- **Internal services** (Redis, Qdrant, Ollama) are not exposed outside the Docker network
## Tech Stack
| Layer | Technology |
|-------|-----------|
| Frontend | React 18 + TypeScript + Vite |
| Backend | FastAPI + Celery + SQLAlchemy (async engine for FastAPI, sync engine for Celery) |
| Database | PostgreSQL 16 |
| Cache/Broker | Redis 7 (Celery broker + review mode toggle + classification cache) |
| Vector Store | Qdrant 1.13.2 |
| Embeddings | Ollama (nomic-embed-text) via OpenAI-compatible /v1/embeddings |
| LLM | OpenAI-compatible API — DGX Sparks Qwen primary, local Ollama fallback |
| Deployment | Docker Compose on ub01, nginx reverse proxy on nuc01 |
## Data Flow
1. **Ingestion:** Video files → Whisper transcription (desktop, RTX 4090) → JSON transcript
2. **Upload:** Transcript JSON dropped into watch folder or POSTed to `/api/v1/ingest`
3. **Pipeline:** 6 Celery stages process each video (see [[Pipeline]])
4. **Storage:** Technique pages + key moments → PostgreSQL, embeddings → Qdrant
5. **Serving:** React SPA fetches from FastAPI, search queries hit Qdrant then PostgreSQL fallback
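
The serving step's "Qdrant then PostgreSQL fallback" can be sketched as below, using the 300ms timeout from D009. This is an illustration under assumptions: `semantic_search` and `keyword_search` are hypothetical injected coroutines standing in for the Qdrant and PostgreSQL paths, the response carries only a subset of the documented search keys, and falling back on an empty semantic result is a guess, not confirmed behavior.

```python
import asyncio


async def search_with_fallback(query, semantic_search, keyword_search, timeout=0.3):
    """Try semantic search first; use keyword search on timeout, error, or no hits."""
    try:
        items = await asyncio.wait_for(semantic_search(query), timeout=timeout)
        if items:
            return {"items": items, "query": query, "fallback_used": False}
    except (asyncio.TimeoutError, ConnectionError):
        pass  # vector store slow or unreachable: fall through to the keyword path
    items = await keyword_search(query)
    return {"items": items, "query": query, "fallback_used": True}
```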
---
*See also: [[Deployment]], [[Pipeline]], [[Data-Model]]*

135
Data-Model.md Normal file

@ -0,0 +1,135 @@
# Data Model
13 SQLAlchemy models in `backend/models.py`.
## Entity Relationship Overview
```
Creator (1) ──→ (N) SourceVideo (1) ──→ (N) TranscriptSegment
│ │
│ └──→ (N) KeyMoment
└──→ (N) TechniquePage (M) ←──→ (N) Tag
├──→ (N) TechniquePageVersion
├──→ (N) RelatedTechniqueLink
└──→ (M:N) SourceVideo (via TechniquePageVideo)
```
## Core Content Models
### Creator
| Field | Type | Notes |
|-------|------|-------|
| id | Integer PK | |
| name | String | Unique, from folder name |
| slug | String | URL-safe, unique |
| genres | ARRAY(String) | e.g. ["dubstep", "sound design"] |
| avatar_url | String | Optional |
| bio | Text | Admin-editable |
| social_links | JSONB | Platform → URL mapping |
| featured | Boolean | For homepage spotlight |
### SourceVideo
| Field | Type | Notes |
|-------|------|-------|
| id | Integer PK | |
| creator_id | FK → Creator | |
| filename | String | Original video filename |
| youtube_url | String | Optional |
| folder_name | String | Filesystem folder name |
| processing_status | Enum | queued / in_progress / complete / errored / revoked |
| pipeline_stage | Integer | Current/last completed stage (1-6) |
### TranscriptSegment
| Field | Type | Notes |
|-------|------|-------|
| id | Integer PK | |
| source_video_id | FK → SourceVideo | |
| start_time | Float | Seconds |
| end_time | Float | Seconds |
| text | Text | Segment transcript text |
### KeyMoment
| Field | Type | Notes |
|-------|------|-------|
| id | Integer PK | |
| source_video_id | FK → SourceVideo | |
| title | String | |
| summary | Text | |
| start_time | Float | Seconds |
| end_time | Float | Seconds |
| topic_category | String | e.g. "Sound Design" |
| topic_tags | ARRAY(String) | |
| content_type | Enum | tutorial / tip / exploration / walkthrough |
| review_status | String | pending / approved / rejected |
### TechniquePage
| Field | Type | Notes |
|-------|------|-------|
| id | Integer PK | |
| creator_id | FK → Creator | |
| title | String | |
| slug | String | Unique, URL-safe |
| summary | Text | |
| body_sections | JSONB | v1: dict, v2: list-of-objects with nesting (D024) |
| body_sections_format | String | "v1" or "v2" — format discriminator |
| signal_chains | JSONB | Signal flow descriptions |
| plugins | ARRAY(String) | Referenced plugins/VSTs |
| topic_category | String | |
| topic_tags | ARRAY(String) | |
| source_quality | Enum | high / medium / low |
| view_count | Integer | |
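
One way a reader of `body_sections` might branch on the `body_sections_format` discriminator is sketched below. The v1 shape (heading mapped to content string) and the v2 keys `heading`, `content`, and `subsections` are assumptions for illustration; the page only states that v1 is a dict and v2 is a nested list of objects.

```python
def normalize_body_sections(body_sections, body_sections_format):
    """Return sections as a v2-style list regardless of the stored format."""
    if body_sections_format == "v2":
        return list(body_sections)
    if body_sections_format == "v1":
        # Assumed v1 shape: {"Heading": "content text", ...}
        return [
            {"heading": heading, "content": content, "subsections": []}
            for heading, content in body_sections.items()
        ]
    raise ValueError(f"unknown body_sections_format: {body_sections_format!r}")
```

Raising on an unknown format keeps a future v3 from being silently rendered as one of the older shapes.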
### TechniquePageVersion
| Field | Type | Notes |
|-------|------|-------|
| id | Integer PK | |
| technique_page_id | FK → TechniquePage | |
| version_number | Integer | Sequential |
| content_snapshot | JSONB | Full page state at version time |
| pipeline_metadata | JSONB | Prompt SHA-256 hashes, model config |
## Supporting Models
| Model | Purpose |
|-------|---------|
| **RelatedTechniqueLink** | Directed link between technique pages (source → target with label) |
| **Tag** | Normalized tag with M:N join to TechniquePage via `technique_page_tags` |
| **TechniquePageVideo** | Join table: TechniquePage ↔ SourceVideo (multi-source pages) |
| **ContentReport** | User-submitted content reports with status workflow (open/acknowledged/resolved/dismissed) |
| **SearchLog** | Query logging for popular searches feature (D025) |
| **PipelineRun** | Pipeline execution tracking per video with status and trigger type |
| **PipelineEvent** | Granular pipeline stage events with token counts and JSONB payload |
## Enums
| Enum | Values |
|------|--------|
| ContentType | tutorial, tip, exploration, walkthrough |
| ProcessingStatus | queued, in_progress, complete, errored, revoked |
| KeyMomentContentType | technique, concept, workflow, reference |
| SourceQuality | high, medium, low |
| RelationshipType | related, prerequisite, builds_on |
| ReportType | inaccuracy, missing_info, offensive, other |
| ReportStatus | open, acknowledged, resolved, dismissed |
| PipelineRunStatus | pending, running, completed, failed, revoked |
| PipelineRunTrigger | auto, manual, retrigger, clean_retrigger |
## Schema Notes
- **No Alembic migrations** — schema changes currently require manual DDL
- **body_sections_format** discriminator enables v1/v2 format coexistence (D024)
- **topic_category casing** is inconsistent across records (e.g., "Sound design" vs "Sound Design") — known data quality issue
- **Stage 4 classification data** (per-moment topic_tags) stored in Redis with 24h TTL, not DB columns
- **Timestamp convention:** `datetime.now(timezone.utc).replace(tzinfo=None)` — asyncpg rejects timezone-aware datetimes for TIMESTAMP WITHOUT TIME ZONE columns (D002)
---
*See also: [[Architecture]], [[API-Surface]], [[Pipeline]]*

45
Decisions.md Normal file

@ -0,0 +1,45 @@
# Decisions
Architectural and pattern decisions made during Chrysopedia development. Append-only — to reverse a decision, add a new entry that supersedes it.
## Architecture Decisions
| # | When | Decision | Choice | Rationale |
|---|------|----------|--------|-----------|
| D001 | — | Storage layer selection | PostgreSQL + Qdrant + local filesystem | PostgreSQL for JSONB, Qdrant already running on hypervisor, filesystem for transcript JSON |
| D002 | — | Timestamp handling (asyncpg) | `datetime.now(timezone.utc).replace(tzinfo=None)` | asyncpg rejects timezone-aware datetimes for TIMESTAMP WITHOUT TIME ZONE columns |
| D004 | — | Sync vs async in Celery tasks | Sync openai, QdrantClient, SQLAlchemy in Celery | Avoids nested event loop errors with gevent/eventlet workers |
| D005 | — | Embedding failure handling | Non-blocking — log errors, don't fail pipeline | Qdrant may be unreachable; core output (PostgreSQL) is preserved |
| D007 | M001/S04 | Review mode toggle persistence | Redis key `chrysopedia:review_mode` | Redis already in stack; simpler than DB table for single boolean |
| D009 | M001/S05 | Search service pattern | Separate async SearchService for FastAPI | Keeps sync pipeline clients untouched; 300ms timeout + keyword fallback |
| D015 | M002/S01 | Docker network subnet | 172.32.0.0/24 | 172.24.0.0/24 was taken by xpltd_docs_default |
| D016 | M002/S01 | Embedding service | Ollama container (nomic-embed-text) | OpenWebUI doesn't serve /v1/embeddings |
| D017 | — | CSS theming | 77 semantic custom properties, cyan accent | Full variable-based palette for consistency and future theme switching |
| D018 | M004/S04 | Version snapshot failure handling | Best-effort — failure doesn't block page update | Follows D005 pattern for non-critical side effects |
| D019 | M005/S02 | Technique page layout | CSS grid 2-column (1fr + 22rem sidebar), 64rem max-width | Collapses at 768px; accommodates prose + sidebar |
| D023 | M012/S01 | Qdrant embedding text enrichment | Prepend creator_name, join topic_tags | Enables creator-name and tag-specific semantic search |
| D024 | M014/S01 | Sections with subsections content model | Empty-string content for parent sections | Avoids duplication; substance lives in subsection content fields |
| D025 | M015 | Search query storage | PostgreSQL search_log + Redis cache (5-min TTL) | Full history for analytics; Redis prevents DB hit on every homepage load |
## Phase 2 Decisions
| # | Decision | Choice | Rationale |
|---|----------|--------|-----------|
| D031 | Phase 2 milestone structure | 8 milestones (M018-M025) with parallel frontend/backend slices | Maps to Sprint 0-8 plan; deploy gate per milestone |
| D032 | RAG framework | LightRAG + Qdrant + NetworkX (MVP) | Graph-enhanced retrieval; supports existing Qdrant; incremental updates |
| D033 | Monetization | Demo build with "Coming Soon" placeholders | Recruit creators first; Stripe Connect deferred to Phase 3 |
| D034 | Documentation strategy | Forgejo wiki, KB slice at end of every milestone | Incremental docs stay current; final pass in M025 |
| D035 | File/object storage | MinIO (S3-compatible) self-hosted | Docker-native, signed URLs, fits existing infrastructure |
## UI/UX Decisions
| # | Decision | Choice |
|---|----------|--------|
| D014 | Creator equity | Random default sort; no creator privileged |
| D020 | Topics card differentiation | 3px colored left border + dot |
| D021 | M011 findings triage | 12/16 approved; denied beginner paths, YouTube links, hide admin, CTA label |
| D030 | ToC scroll-spy rootMargin | `0px 0px -70% 0px` — active when in top 30% of viewport |
---
*See also: [[Architecture]], [[Development-Guide]]*

130
Deployment.md Normal file

@ -0,0 +1,130 @@
# Deployment
## Quick Reference
```bash
# SSH to ub01
ssh ub01
cd /vmPool/r/repos/xpltdco/chrysopedia
# Standard deploy
git pull
docker compose build && docker compose up -d
# Run migrations (if Alembic is configured)
docker exec chrysopedia-api alembic upgrade head
# View logs
docker logs -f chrysopedia-api
docker logs -f chrysopedia-worker
docker logs -f chrysopedia-watcher
# Check status
docker ps --filter name=chrysopedia
```
## File Layout on ub01
```
/vmPool/r/
├── repos/xpltdco/chrysopedia/ # Git repo (source code)
├── compose/xpltd_chrysopedia/ # Symlink to repo's docker-compose.yml
├── services/
│ ├── chrysopedia_postgres_data/ # PostgreSQL data
│ ├── chrysopedia_qdrant_data/ # Qdrant vector data
│ ├── chrysopedia_ollama_data/ # Ollama model cache
│ └── chrysopedia_watch/ # Watcher input directory
│ ├── processed/ # Successfully ingested transcripts
│ └── failed/ # Failed transcripts + .error sidecars
```
## Docker Compose Configuration
- **Project name:** `xpltd_chrysopedia`
- **Network:** `chrysopedia-net` (172.32.0.0/24)
- **Compose file:** `/vmPool/r/repos/xpltdco/chrysopedia/docker-compose.yml`
### Build Args / Environment
Frontend build-time constants are injected via Docker build args:
```yaml
build:
args:
VITE_APP_VERSION: ${APP_VERSION:-0.1.0}
VITE_GIT_COMMIT: ${GIT_COMMIT:-unknown}
```
**Important:** `ARG` → `ENV` → `RUN npm run build` ordering matters in the Dockerfile. The `ENV` line must appear before the build step.
### Service Dependencies
```
chrysopedia-web-8096 → chrysopedia-api → chrysopedia-db, chrysopedia-redis
chrysopedia-worker → chrysopedia-db, chrysopedia-redis, chrysopedia-qdrant, chrysopedia-ollama
chrysopedia-watcher → chrysopedia-api
```
## Healthchecks
| Service | Healthcheck | Notes |
|---------|------------|-------|
| PostgreSQL | `pg_isready` | Built-in |
| Redis | `redis-cli ping` | Built-in |
| Qdrant | `bash -c 'echo > /dev/tcp/localhost/6333'` | No curl available |
| Ollama | `ollama list` | Built-in CLI |
| API | `curl -f http://localhost:8000/health` | |
| Worker | `celery -A worker inspect ping` | Not HTTP |
| Watcher | `python -c "import os; os.kill(1, 0)"` | Slim image, no pgrep |
## nginx Reverse Proxy
On nuc01 (10.0.0.9):
- Server block proxies chrysopedia.com → ub01:8096
- SSL via Certbot (Let's Encrypt)
- SPA fallback: all paths return index.html
**Stale DNS after rebuild:** If the API container is rebuilt, restart the web (nginx) container on ub01 so it picks up the API's new internal IP:
```bash
docker compose restart chrysopedia-web-8096
```
## Rebuilding After Code Changes
```bash
# Full rebuild (backend + frontend)
cd /vmPool/r/repos/xpltdco/chrysopedia
git pull
docker compose build && docker compose up -d
# Frontend only
docker compose build chrysopedia-web-8096 && docker compose up -d chrysopedia-web-8096
# Backend only (API + Worker share same image)
docker compose build chrysopedia-api && docker compose up -d chrysopedia-api chrysopedia-worker
# Restart without rebuild
docker compose restart chrysopedia-api chrysopedia-worker
```
## Port Mapping
| Service | Container Port | Host Port | Binding |
|---------|---------------|-----------|---------|
| PostgreSQL | 5432 | 5433 | 0.0.0.0 |
| Web (nginx) | 80 | 8096 | 0.0.0.0 |
| SSH (Forgejo) | 22 | 2222 | 0.0.0.0 |
All other services (Redis, Qdrant, Ollama, API, Worker) are internal-only.
## Monitoring
- **Web UI:** http://ub01:8096
- **API Health:** http://ub01:8096/health
- **Pipeline Admin:** http://ub01:8096/admin/pipeline
- **Worker Status:** http://ub01:8096/admin/pipeline (shows Celery worker count)
- **PostgreSQL:** Connect via `psql -h ub01 -p 5433 -U chrysopedia`
---
*See also: [[Architecture]], [[Development-Guide]]*

134
Development-Guide.md Normal file

@ -0,0 +1,134 @@
# Development Guide
## Getting Started
### Prerequisites
- Docker + Docker Compose
- Node.js 18+ (for frontend dev)
- Python 3.11+ (for backend dev)
- SSH access to ub01
### Local Development
The simplest approach is working directly on ub01:
```bash
ssh ub01
cd /vmPool/r/repos/xpltdco/chrysopedia
```
For frontend-only work, you can run Vite locally and proxy to the remote API:
```bash
cd frontend
npm install
npm run dev # Vite dev server with /api proxy to localhost:8001
```
## Project Structure
```
chrysopedia/
├── backend/
│ ├── config.py # Settings (env vars, LRU cached)
│ ├── database.py # Async SQLAlchemy engine + session
│ ├── main.py # FastAPI app, router registration
│ ├── models.py # All 13 SQLAlchemy models
│ ├── schemas.py # Pydantic request/response schemas
│ ├── search_service.py # Async search (Qdrant + keyword fallback)
│ ├── redis_client.py # Async Redis client
│ ├── watcher.py # Transcript folder watcher
│ ├── routers/ # FastAPI route handlers
│ ├── pipeline/ # Celery pipeline stages
│ │ ├── stages.py # Stage implementations
│ │ └── quality/ # Prompt quality toolkit
│ ├── services/ # Business logic services
│ └── tests/ # pytest test suite
├── frontend/
│ └── src/
│ ├── App.tsx # Routes, layout
│ ├── App.css # All styles (5,820 lines)
│ ├── main.tsx # React entry point
│ ├── api/ # API client (public-client.ts)
│ ├── pages/ # Page components (11)
│ ├── components/ # Shared components (11+)
│ ├── hooks/ # Custom hooks (3)
│ └── utils/ # Utilities (citations, slugs)
├── prompts/ # LLM prompt templates
├── alembic/ # DB migrations (if configured)
├── docker-compose.yml
├── Dockerfile.api
├── Dockerfile.web
└── CLAUDE.md # AI agent development reference
```
## Common Gotchas
### asyncpg Timestamp Errors
Use `datetime.now(timezone.utc).replace(tzinfo=None)` for all timestamp defaults. asyncpg rejects timezone-aware datetimes for TIMESTAMP WITHOUT TIME ZONE columns.
### SQLAlchemy Column Name Conflicts
Never name a column `relationship`, `query`, or `metadata` — these shadow ORM functions. Use `from sqlalchemy.orm import relationship as sa_relationship` if the schema requires it.
### Vite Build Constants
Always wrap with `JSON.stringify()`: `define: { __APP_VERSION__: JSON.stringify(version) }`. Without it, the built code gets unquoted values (syntax error).
### Docker ARG/ENV Ordering
`ARG VITE_FOO=default` → `ENV VITE_FOO=$VITE_FOO` → `RUN npm run build`. The ENV line must appear before the build step.
### Slim Docker Images
`python:3.x-slim` doesn't include `procps` (no `pgrep`, `ps`). Use `python -c "import os; os.kill(1, 0)"` for healthchecks.
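
Why the one-liner works, as a small sketch for POSIX systems: signal `0` delivers nothing and only performs the existence and permission check, so `os.kill(1, 0)` raises (and the healthcheck exits non-zero) when PID 1 is gone. `process_alive` is a hypothetical helper name used for illustration.

```python
import os


def process_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (signal 0 sends nothing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True   # process exists but belongs to another user
    return True
```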
### Host Port 8000 Conflict
Port 8000 on ub01 may be used by kerf-engine. Use 8001 for local testing, or ensure kerf-engine is stopped.
### Nginx Stale DNS
After rebuilding the API container, restart the web container: `docker compose restart chrysopedia-web-8096`.
### ZFS Filesystem Watchers
Use `watchdog.observers.polling.PollingObserver` instead of the default inotify observer — inotify doesn't reliably detect changes on ZFS/NFS.
### File Stability for SCP Uploads
Wait for file size stability (check twice with 2-second gap) before processing files received via SCP/rsync.
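
The double check can be sketched as follows. `is_size_stable` is a hypothetical name, and treating an empty or missing file as "not ready" is an assumption, not documented watcher behavior.

```python
import os
import time


def is_size_stable(path: str, gap_seconds: float = 2.0) -> bool:
    """Return True if the file size is non-zero and unchanged across two checks."""
    try:
        first = os.path.getsize(path)
        time.sleep(gap_seconds)
        second = os.path.getsize(path)
    except OSError:
        return False  # file vanished or is unreadable: treat as not ready
    return first == second and second > 0
```

A caller would typically loop until this returns `True` before parsing the file, since a single unchanged pair only shows the writer paused for at least `gap_seconds`.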
## Testing
```bash
cd backend
python -m pytest tests/ -v
```
Tests use:
- `NullPool` for async engine (prevents connection pool contention)
- Module-level patching for Celery stage globals (`_engine`, `_SessionLocal`)
- `patch('pipeline.stages.run_pipeline')` for lazy import mocking (not at the router level)
## Adding New Features
### New API Endpoint
1. Create router in `backend/routers/foo.py` with `APIRouter(prefix="/foo", tags=["foo"])`
2. Register in `backend/main.py`: `app.include_router(foo.router, prefix="/api/v1")`
3. Define schemas in `backend/schemas.py`
4. Use paginated response: `{items, total, offset, limit}`
### New Frontend Route
1. Add `<Route>` to `App.tsx`
2. Create page component in `frontend/src/pages/`
3. Call `useDocumentTitle()` in the component
4. Add API functions to `public-client.ts`
### New Database Model
1. Add to `backend/models.py`
2. Add schemas to `backend/schemas.py`
3. Apply DDL manually or via Alembic migration
4. Use `_now()` helper for timestamp defaults
### New CSS
1. Append to `App.css` using BEM naming
2. Use CSS custom properties for all colors
3. Prefer 768px breakpoint for mobile/desktop split
4. Namespace Phase 2 selectors: `.p2-feature__element`
---
*See also: [[Architecture]], [[Frontend]], [[Deployment]]*

110
Frontend.md Normal file

@ -0,0 +1,110 @@
# Frontend
React 18 + TypeScript + Vite SPA. No UI library, no state management library, no CSS framework.
## Route Map
| Route | Page Component | Auth | Notes |
|-------|---------------|------|-------|
| `/` | Home | Public | Hero search, stats counters, popular topics, nav cards |
| `/search` | SearchResults | Public | Sort, highlights, partial matches |
| `/techniques/:slug` | TechniquePage | Public | v2 body sections, ToC sidebar, citations |
| `/creators` | CreatorsBrowse | Public | Random default sort, genre filters |
| `/creators/:slug` | CreatorDetail | Public | Avatar, stats, technique list |
| `/topics` | TopicsBrowse | Public | 7 category cards, expandable sub-topics |
| `/topics/:category/:subtopic` | SubTopicPage | Public | Creator-grouped techniques |
| `/about` | About | Public | Static project info |
| `/admin/reports` | AdminReports | Admin* | Content reports |
| `/admin/pipeline` | AdminPipeline | Admin* | Pipeline management |
| `/admin/techniques` | AdminTechniquePages | Admin* | Technique page admin |
| `*` | → Redirect `/` | — | SPA fallback |
*Admin routes have no authentication gate.
**Routing:** All routes in a single `<Routes>` block in `App.tsx`. nginx returns the SPA shell for all paths; react-router-dom v6 handles client-side routing.
## Shared Components
| Component | Purpose |
|-----------|---------|
| SearchAutocomplete | Global search with Ctrl+Shift+F shortcut (nav + mobile instances) |
| AdminDropdown | Hover-open at desktop, tap-toggle on mobile |
| AppFooter | Version, build date, GitHub link |
| TableOfContents | Sticky sidebar ToC with IntersectionObserver scroll-spy |
| SortDropdown | Reusable sort selector |
| TagList | Tag/badge pills with +N overflow |
| CategoryIcons | SVG icons per topic category |
| CreatorAvatar | Avatar with fallback |
| CopyLinkButton | Clipboard copy with tooltip |
| SocialIcons | Social media link icons (9 platforms) |
| ReportIssueModal | Content report submission |
## Hooks
| Hook | Purpose |
|------|---------|
| useCountUp | Animated counter for homepage stats |
| useSortPreference | Persists sort preference in localStorage |
| useDocumentTitle | Sets `<title>` per page (all 10 pages instrumented) |
## State Management
Local component state only (`useState`/`useEffect`). No Redux, Zustand, Context providers, or external state management library.
## API Client
Single module `public-client.ts` (~600 lines) with typed `request<T>` helper. Relative `/api/v1` base URL (nginx proxies to API container). All response TypeScript interfaces defined in the same file.
## CSS Architecture
| Property | Value |
|----------|-------|
| File | `frontend/src/App.css` |
| Lines | 5,820 |
| Unique classes | ~589 |
| Naming | BEM (`block__element--modifier`) |
| Theme | Dark-only (no light mode) |
| Custom properties | 77 in `:root` (D017) |
| Accent color | Cyan `#22d3ee` |
| Font stack | System fonts |
| Preprocessor | None |
| CSS Modules | None |
### Custom Property Categories (77 total)
- **Surface colors:** page background, card backgrounds, nav, footer, input
- **Text colors:** primary, secondary, muted, inverse, link, heading
- **Accent colors:** primary cyan, hover/active, focus rings
- **Badge colors:** Per-category pairs (bg + text) for 7 topic categories
- **Status colors:** Success/warning/error/info
- **Border colors:** Default, hover, focus, divider
- **Shadow colors:** Elevation, glow effects
- **Overlay colors:** Modal/dropdown overlays
### Breakpoints
| Breakpoint | Usage |
|-----------|-------|
| 480px | Narrow mobile — compact cards |
| 600px | Wider mobile — grid adjustments |
| 640px | Small tablet — content width |
| 768px | Desktop ↔ mobile transition — sidebar collapse |
### Layout Patterns
- **Page max-width:** 64rem (D019)
- **Technique page:** CSS grid 2-column (1fr + 22rem sidebar), collapses at 768px
- **Card layouts:** CSS grid with `auto-fill, minmax(...)` for responsive grids
- **Collapsible sections:** `grid-template-rows: 0fr/1fr` animation
- **Sticky elements:** ToC sidebar, reading header
## Build
- **Bundler:** Vite
- **Build-time constants:** `__APP_VERSION__`, `__BUILD_DATE__`, `__GIT_COMMIT__` via `define` (must use `JSON.stringify`)
- **Dev proxy:** `/api` → `localhost:8001`
- **Production:** nginx serves static `dist/` bundle, proxies `/api` to FastAPI container
---
*See also: [[Architecture]], [[API-Surface]], [[Development-Guide]]*

108
Pipeline.md Normal file

@ -0,0 +1,108 @@
# Pipeline
6-stage LLM-powered extraction pipeline that transforms video transcripts into structured technique articles.
## Pipeline Stages
```
Video File
[Desktop] Whisper large-v3 (RTX 4090) → transcript JSON
[Watcher/API] Ingest → SourceVideo + TranscriptSegments in PostgreSQL
Stage 1: Transcript Segmentation — chunk transcript into logical segments
Stage 2: Key Moment Extraction — identify teachable moments with timestamps
Stage 3: (reserved)
Stage 4: Classification & Tagging — assign topic_category + topic_tags per moment
Stage 5: Technique Page Synthesis — compose study guide articles from moments
Stage 6: Embed & Index — generate embeddings, upsert to Qdrant (non-blocking)
```
## Stage Details
### Stage 1: Transcript Segmentation
- Chunks raw transcript into logical segments
- Input: TranscriptSegments from DB
- Output: Segmented data for stage 2
### Stage 2: Key Moment Extraction
- Identifies teachable moments with titles, summaries, timestamps
- Uses LLM with prompt template from `prompts/` directory
- Output: KeyMoment records in PostgreSQL
### Stage 4: Classification & Tagging
- Assigns topic_category and topic_tags to each key moment
- References canonical tag list (`canonical_tags.yaml`) with aliases
- Output: Classification data stored in Redis (`chrysopedia:classification:{video_id}`, 24h TTL)
### Stage 5: Technique Page Synthesis
- Composes study guide articles from classified key moments
- Handles multi-source merging: new video moments merge into existing technique pages
- Uses offset-based citation indexing (existing [0]-[N-1], new [N]-[N+M-1])
- Creates pre-overwrite version snapshot before mutating existing pages (D018)
- Output: TechniquePage records with body_sections (v2 format), signal_chains, plugins
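
The offset-based citation indexing above can be sketched as below, assuming citations appear in text as `[k]` markers that index into the page's moment list. The marker syntax and the names `shift_citations` and `merge_moments` are assumptions for illustration.

```python
import re

_CITATION = re.compile(r"\[(\d+)\]")


def shift_citations(text: str, offset: int) -> str:
    """Rewrite every [k] marker in text as [k + offset]."""
    return _CITATION.sub(lambda m: f"[{int(m.group(1)) + offset}]", text)


def merge_moments(existing: list, new: list, new_text: str):
    """Append new moments after existing ones and re-index citations in new_text."""
    offset = len(existing)   # N existing moments keep indices [0]..[N-1]
    merged = existing + new  # M new moments land at [N]..[N+M-1]
    return merged, shift_citations(new_text, offset)
```

Because existing indices never move, text already on the page needs no rewriting; only the newly synthesized text is shifted.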
### Stage 6: Embed & Index
- Generates embeddings via Ollama (nomic-embed-text)
- Embedding text enriched with creator_name and topic_tags (D023)
- Upserts to Qdrant with deterministic UUIDs based on content
- **Non-blocking:** Failures log WARNING but don't fail the pipeline (D005)
- Can be re-triggered independently via `/admin/pipeline/reindex-all`
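
A rough sketch of the Stage 6 behaviors listed above: enriched embedding text, content-derived point IDs, and non-blocking failure handling. `embed` and `upsert` are hypothetical injected callables standing in for the Ollama and Qdrant clients; the namespace value and the exact text layout are assumptions, not the pipeline's actual format.

```python
import logging
import uuid

logger = logging.getLogger("pipeline.stage6")

# Hypothetical namespace: any fixed UUID works as long as it never changes.
_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "chrysopedia/technique-pages")


def build_embedding_text(creator_name, title, summary, topic_tags):
    """Prepend the creator name and join tags so both are searchable (D023)."""
    return f"{creator_name} | {title} | {', '.join(topic_tags)}\n{summary}"


def point_id(text: str) -> str:
    """Same content always maps to the same UUID, so re-runs overwrite a point."""
    return str(uuid.uuid5(_NAMESPACE, text))


def embed_and_index(page: dict, embed, upsert) -> bool:
    """Run the side effect; log a warning and return False instead of raising."""
    try:
        text = build_embedding_text(
            page["creator_name"], page["title"], page["summary"], page["topic_tags"]
        )
        upsert(point_id(text), embed(text), page)
        return True
    except Exception as exc:  # broad on purpose: this step must not fail the run
        logger.warning("stage 6 skipped for %r: %s", page.get("title"), exc)
        return False
```

Returning a boolean rather than raising lets the caller record the miss and continue, which matches the D005 intent that core PostgreSQL output is preserved.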
## LLM Configuration
| Setting | Value |
|---------|-------|
| Primary LLM | DGX Sparks Qwen (OpenAI-compatible API) |
| Fallback LLM | Local Ollama |
| Embedding model | nomic-embed-text (Ollama) |
| Model routing | Per-stage configuration (chat vs thinking models) |
## Prompt Template System
- Prompt files stored in `prompts/` directory (D013)
- Templates use XML-style content fencing
- Editable without code changes — pipeline reads from disk at runtime
- SHA-256 hashes tracked in TechniquePageVersion.pipeline_metadata for reproducibility
- Re-process after prompt edits via `POST /admin/pipeline/trigger/{video_id}`
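
Prompt hash tracking could look roughly like this. `prompt_hashes` is a hypothetical helper; how the pipeline actually names the keys inside `pipeline_metadata` is not specified here.

```python
import hashlib
from pathlib import Path


def prompt_hashes(prompts_dir: str) -> dict:
    """Map each prompt file name to the SHA-256 hex digest of its bytes."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(Path(prompts_dir).iterdir())
        if path.is_file()
    }
```

Hashing raw bytes means even a whitespace-only edit to a prompt produces a different digest, so any version snapshot can be tied to the exact prompt text that produced it.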
## Pipeline Admin Features
- **Debug mode:** Redis-backed toggle captures full LLM I/O (system prompt, user prompt, response) in pipeline_events
- **Token tracking:** Per-event and per-video token usage visible in admin UI
- **Stale page detection:** Identifies pages needing regeneration
- **Bulk operations:** Bulk resynthesize, wipe all output, reindex all
- **Worker status:** Real-time Celery worker health check
## Prompt Quality Toolkit
CLI tool (`python -m pipeline.quality`) with:
- **LLM fitness suite** — 9 tests (Mandelbrot reasoning, JSON compliance, instruction following)
- **5-dimension quality scorer** with voice preservation dial
- **Automated prompt A/B optimization loop** — LLM-powered variant generation, iterative scoring, leaderboard
- **Multi-stage support** for pipeline stages 2-5 with per-stage rubrics and fixtures
## Key Design Decisions
- **Sync clients in Celery** (D004): openai.OpenAI, QdrantClient, sync SQLAlchemy. Avoids nested event loop errors.
- **Non-blocking embedding** (D005): Stage 6 failures don't block core pipeline output.
- **Redis for stage 4 data**: Classification results in Redis with 24h TTL, not DB columns.
- **Best-effort versioning** (D018): Version snapshot failure doesn't block page update.
## Transcript Watcher
Standalone service (`watcher.py`) monitors `/vmPool/r/services/chrysopedia_watch/` for new transcript JSON files:
- Uses `watchdog.observers.polling.PollingObserver` for ZFS reliability
- Validates file structure, waits for size stability (handles partial SCP writes)
- POSTs to ingest API on file detection
- Moves processed files to `processed/`, failures to `failed/` with `.error` sidecar
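
The final move step can be sketched as below, assuming `processed/` and `failed/` sit next to the incoming file as in the Deployment file layout. `archive_transcript` and the `<name>.error` sidecar naming are assumptions for illustration.

```python
import shutil
from pathlib import Path


def archive_transcript(path: Path, error=None) -> Path:
    """Move a transcript into processed/ or failed/ beside it.

    When error is given, also write a sidecar file holding the error text.
    """
    failed = error is not None
    target_dir = path.parent / ("failed" if failed else "processed")
    target_dir.mkdir(exist_ok=True)
    target = target_dir / path.name
    shutil.move(str(path), str(target))
    if failed:
        (target_dir / (path.name + ".error")).write_text(str(error))
    return target
```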
---
*See also: [[Architecture]], [[Data-Model]], [[Deployment]]*

17
_Sidebar.md Normal file

@ -0,0 +1,17 @@
### Chrysopedia Wiki
- [[Home]]
**Architecture**
- [[Architecture]]
- [[Data-Model]]
- [[Pipeline]]
**Reference**
- [[API-Surface]]
- [[Frontend]]
- [[Decisions]]
**Operations**
- [[Deployment]]
- [[Development-Guide]]