diff --git a/API-Surface.md b/API-Surface.md new file mode 100644 index 0000000..e50b5cf --- /dev/null +++ b/API-Surface.md @@ -0,0 +1,103 @@ +# API Surface + +41 API endpoints grouped by domain. All served by FastAPI under `/api/v1/`. + +## Public Endpoints (10) + +| Method | Path | Response Shape | Notes | +|--------|------|---------------|-------| +| GET | `/health` | `{status, service, version, database}` | Health check | +| GET | `/api/v1/stats` | `{technique_count, creator_count}` | Homepage stats | +| GET | `/api/v1/search?q=` | `{items, partial_matches, total, query, fallback_used}` | Semantic + keyword fallback (D009) | +| GET | `/api/v1/search/suggestions?q=` | `{suggestions: [{text, type}]}` | Typeahead autocomplete | +| GET | `/api/v1/search/popular` | `{items: [{query, count}]}` | Popular searches (D025) | +| GET | `/api/v1/techniques?limit=&offset=` | `{items, total, offset, limit}` | Paginated technique list | +| GET | `/api/v1/techniques/random` | `{slug}` | Returns JSON slug (not redirect) | +| GET | `/api/v1/techniques/{slug}` | 22-field object | Full technique detail with relations | +| GET | `/api/v1/techniques/{slug}/versions` | `{items, total}` | Version history | +| GET | `/api/v1/techniques/{slug}/versions/{n}` | Version detail | Single version | + +### Technique Detail Fields (22) + +title, slug, topic_category, topic_tags, summary, body_sections, body_sections_format, signal_chains, plugins, id, creator_id, creator_name, creator_slug, source_quality, view_count, key_moment_count, created_at, updated_at, key_moments, creator_info, related_links, version_count, source_videos. 
+ +## Browse Endpoints (5) + +| Method | Path | Response Shape | Notes | +|--------|------|---------------|-------| +| GET | `/api/v1/creators?sort=&genre=` | `{items, total, offset, limit}` | sort: random\|alpha\|views | +| GET | `/api/v1/creators/{slug}` | 16-field object | Includes genre_breakdown, techniques, social_links | +| GET | `/api/v1/topics` | `[{name, description, sub_topics}]` | ⚠️ Bare list (not paginated) | +| GET | `/api/v1/topics/{cat}/{sub}` | `{items, total, offset, limit}` | Subtopic techniques | +| GET | `/api/v1/topics/{cat}` | `{items, total, offset, limit}` | Category techniques | + +## Report Endpoints (3) + +| Method | Path | Purpose | +|--------|------|---------| +| POST | `/api/v1/reports` | Submit content report | +| GET | `/api/v1/admin/reports` | List all reports | +| PATCH | `/api/v1/admin/reports/{id}` | Update report status | + +## Pipeline Admin Endpoints (20+) + +All under prefix `/api/v1/admin/pipeline/`. + +| Method | Path | Purpose | +|--------|------|---------| +| GET | `/admin/pipeline/videos` | Paginated video list with pipeline status | +| POST | `/admin/pipeline/trigger/{video_id}` | Trigger pipeline for video | +| POST | `/admin/pipeline/clean-retrigger/{video_id}` | Wipe output + reprocess | +| POST | `/admin/pipeline/revoke/{video_id}` | Revoke active pipeline task | +| POST | `/admin/pipeline/rerun-stage/{video_id}` | Re-run specific stage | +| GET | `/admin/pipeline/events` | Pipeline event log | +| GET | `/admin/pipeline/runs` | Pipeline run history | +| GET | `/admin/pipeline/chunking-inspector/{video_id}` | Inspect chunking results | +| GET | `/admin/pipeline/embed-status` | Embedding/Qdrant health | +| GET | `/admin/pipeline/debug-mode` | Get debug mode state | +| POST | `/admin/pipeline/debug-mode` | Set debug mode state | +| GET | `/admin/pipeline/token-summary` | Token usage summary | +| GET | `/admin/pipeline/stale-pages` | Pages needing regeneration | +| POST | `/admin/pipeline/bulk-resynthesize` | 
Regenerate all technique pages | +| POST | `/admin/pipeline/wipe-all-output` | Delete all pipeline output | +| POST | `/admin/pipeline/optimize-prompt` | Trigger prompt optimization | +| POST | `/admin/pipeline/reindex-all` | Rebuild Qdrant index | +| GET | `/admin/pipeline/worker-status` | Celery worker health | +| GET | `/admin/pipeline/recent-activity` | Recent pipeline events | +| POST | `/admin/pipeline/creator-profile/{creator_id}` | Update creator profile | +| POST | `/admin/pipeline/avatar-fetch/{creator_id}` | Fetch creator avatar | + +## Other Endpoints (2) + +| Method | Path | Notes | +|--------|------|-------| +| POST | `/api/v1/ingest` | Transcript upload | +| GET | `/api/v1/videos` | ⚠️ Bare list (not paginated) | + +## Response Conventions + +**Standard paginated response:** +```json +{ + "items": [...], + "total": 83, + "offset": 0, + "limit": 20 +} +``` + +**Known inconsistencies:** +- `GET /topics` returns bare list instead of paginated dict +- `GET /videos` returns bare list instead of paginated dict +- Search uses `items` key (not `results`) +- `/techniques/random` returns JSON `{slug}` (not HTTP redirect) + +**New endpoints should follow the `{items, total, offset, limit}` paginated pattern.** + +## Authentication + +No authentication on any endpoint. Admin routes (`/admin/*`) are accessible to anyone with network access. Phase 2 will add auth middleware (see [[Decisions]] D033). + +--- + +*See also: [[Architecture]], [[Data-Model]], [[Frontend]]* diff --git a/Architecture.md b/Architecture.md new file mode 100644 index 0000000..e6d3d98 --- /dev/null +++ b/Architecture.md @@ -0,0 +1,84 @@ +# Architecture + +## System Overview + +Chrysopedia is a self-hosted music production knowledge base that synthesizes technique articles from video transcripts using a 6-stage LLM pipeline. It runs as a Docker Compose stack on `ub01` with 8 containers. 
+ +``` +┌─────────────────────────────────────────────────────────────────┐ +│ ub01 (10.0.0.10) │ +│ Docker Compose: xpltd_chrysopedia Subnet: 172.32.0.0/24 │ +│ │ +│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐ │ +│ │ nginx │ │ FastAPI │ │ Celery │ │ Watcher │ │ +│ │ :8096 │─▶│ :8000 │ │ Worker │ │ (PollingObs) │ │ +│ └──────────┘ └────┬─────┘ └────┬─────┘ └──────┬───────┘ │ +│ │ │ │ │ +│ ┌────────────┼─────────────┼────────────────┘ │ +│ ▼ ▼ ▼ │ +│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐ │ +│ │ Postgres │ │ Redis │ │ Qdrant │ │ Ollama │ │ +│ │ :5433 │ │ :6379 │ │ :6333 │ │ :11434 │ │ +│ └──────────┘ └──────────┘ └──────────┘ └──────────────┘ │ +└─────────────────────────────────────────────────────────────────┘ + ▲ + │ nginx reverse proxy +┌────────┴────────┐ +│ nuc01 (10.0.0.9)│ +│ chrysopedia.com │ +│ :443 → :8096 │ +└─────────────────┘ +``` + +## Key Architectural Characteristics + +- **Zero external frontend dependencies** beyond React, react-router-dom, and Vite +- **Monolithic CSS** — 5,820 lines, single file, BEM naming, 77 custom properties +- **No authentication** — admin routes are network-access-controlled only +- **Dual SQLAlchemy strategy** — async engine for FastAPI request handlers, sync engine for Celery pipeline tasks (D004) +- **Non-blocking pipeline side effects** — embedding/Qdrant failures don't block page synthesis (D005) + +## Docker Services + +| Service | Image | Container | Port | Volume | +|---------|-------|-----------|------|--------| +| PostgreSQL 16 | postgres:16-alpine | chrysopedia-db | 5433:5432 | chrysopedia_postgres_data | +| Redis 7 | redis:7-alpine | chrysopedia-redis | 6379 (internal) | — | +| Qdrant 1.13.2 | qdrant/qdrant:v1.13.2 | chrysopedia-qdrant | 6333 (internal) | chrysopedia_qdrant_data | +| Ollama | ollama/ollama:latest | chrysopedia-ollama | 11434 (internal) | chrysopedia_ollama_data | +| API (FastAPI) | Dockerfile.api | chrysopedia-api | 8000 (internal) | Bind: backend/, prompts/ | +| 
Worker (Celery) | Dockerfile.api | chrysopedia-worker | — | Bind: backend/, prompts/ | +| Watcher | Dockerfile.api | chrysopedia-watcher | — | Bind: watch dir | +| Web (nginx) | Dockerfile.web | chrysopedia-web-8096 | 8096:80 | — | + +## Network Topology + +- **Compose subnet:** 172.32.0.0/24 (D015) +- **External access:** nginx on nuc01 (10.0.0.9) reverse-proxies to ub01:8096 +- **DNS:** AdGuard Home rewrites chrysopedia.com → 10.0.0.9 +- **Internal services** (Redis, Qdrant, Ollama) are not exposed outside the Docker network + +## Tech Stack + +| Layer | Technology | +|-------|-----------| +| Frontend | React 18 + TypeScript + Vite | +| Backend | FastAPI + Celery + SQLAlchemy (async) | +| Database | PostgreSQL 16 | +| Cache/Broker | Redis 7 (Celery broker + review mode toggle + classification cache) | +| Vector Store | Qdrant 1.13.2 | +| Embeddings | Ollama (nomic-embed-text) via OpenAI-compatible /v1/embeddings | +| LLM | OpenAI-compatible API — DGX Sparks Qwen primary, local Ollama fallback | +| Deployment | Docker Compose on ub01, nginx reverse proxy on nuc01 | + +## Data Flow + +1. **Ingestion:** Video files → Whisper transcription (desktop, RTX 4090) → JSON transcript +2. **Upload:** Transcript JSON dropped into watch folder or POSTed to `/api/v1/ingest` +3. **Pipeline:** 6 Celery stages process each video (see [[Pipeline]]) +4. **Storage:** Technique pages + key moments → PostgreSQL, embeddings → Qdrant +5. **Serving:** React SPA fetches from FastAPI, search queries hit Qdrant then PostgreSQL fallback + +--- + +*See also: [[Deployment]], [[Pipeline]], [[Data-Model]]* diff --git a/Data-Model.md b/Data-Model.md new file mode 100644 index 0000000..c152800 --- /dev/null +++ b/Data-Model.md @@ -0,0 +1,135 @@ +# Data Model + +13 SQLAlchemy models in `backend/models.py`. 
+ +## Entity Relationship Overview + +``` +Creator (1) ──→ (N) SourceVideo (1) ──→ (N) TranscriptSegment + │ │ + │ └──→ (N) KeyMoment + │ + └──→ (N) TechniquePage (M) ←──→ (N) Tag + │ + ├──→ (N) TechniquePageVersion + ├──→ (N) RelatedTechniqueLink + └──→ (M:N) SourceVideo (via TechniquePageVideo) +``` + +## Core Content Models + +### Creator + +| Field | Type | Notes | +|-------|------|-------| +| id | Integer PK | | +| name | String | Unique, from folder name | +| slug | String | URL-safe, unique | +| genres | ARRAY(String) | e.g. ["dubstep", "sound design"] | +| avatar_url | String | Optional | +| bio | Text | Admin-editable | +| social_links | JSONB | Platform → URL mapping | +| featured | Boolean | For homepage spotlight | + +### SourceVideo + +| Field | Type | Notes | +|-------|------|-------| +| id | Integer PK | | +| creator_id | FK → Creator | | +| filename | String | Original video filename | +| youtube_url | String | Optional | +| folder_name | String | Filesystem folder name | +| processing_status | Enum | queued / in_progress / complete / errored / revoked | +| pipeline_stage | Integer | Current/last completed stage (1-6) | + +### TranscriptSegment + +| Field | Type | Notes | +|-------|------|-------| +| id | Integer PK | | +| source_video_id | FK → SourceVideo | | +| start_time | Float | Seconds | +| end_time | Float | Seconds | +| text | Text | Segment transcript text | + +### KeyMoment + +| Field | Type | Notes | +|-------|------|-------| +| id | Integer PK | | +| source_video_id | FK → SourceVideo | | +| title | String | | +| summary | Text | | +| start_time | Float | Seconds | +| end_time | Float | Seconds | +| topic_category | String | e.g. 
"Sound Design" | +| topic_tags | ARRAY(String) | | +| content_type | Enum | tutorial / tip / exploration / walkthrough | +| review_status | String | pending / approved / rejected | + +### TechniquePage + +| Field | Type | Notes | +|-------|------|-------| +| id | Integer PK | | +| creator_id | FK → Creator | | +| title | String | | +| slug | String | Unique, URL-safe | +| summary | Text | | +| body_sections | JSONB | v1: dict, v2: list-of-objects with nesting (D024) | +| body_sections_format | String | "v1" or "v2" — format discriminator | +| signal_chains | JSONB | Signal flow descriptions | +| plugins | ARRAY(String) | Referenced plugins/VSTs | +| topic_category | String | | +| topic_tags | ARRAY(String) | | +| source_quality | Enum | high / medium / low | +| view_count | Integer | | + +### TechniquePageVersion + +| Field | Type | Notes | +|-------|------|-------| +| id | Integer PK | | +| technique_page_id | FK → TechniquePage | | +| version_number | Integer | Sequential | +| content_snapshot | JSONB | Full page state at version time | +| pipeline_metadata | JSONB | Prompt SHA-256 hashes, model config | + +## Supporting Models + +| Model | Purpose | +|-------|---------| +| **RelatedTechniqueLink** | Directed link between technique pages (source → target with label) | +| **Tag** | Normalized tag with M:N join to TechniquePage via `technique_page_tags` | +| **TechniquePageVideo** | Join table: TechniquePage ↔ SourceVideo (multi-source pages) | +| **ContentReport** | User-submitted content reports with status workflow (open/acknowledged/resolved/dismissed) | +| **SearchLog** | Query logging for popular searches feature (D025) | +| **PipelineRun** | Pipeline execution tracking per video with status and trigger type | +| **PipelineEvent** | Granular pipeline stage events with token counts and JSONB payload | + +## Enums + +| Enum | Values | +|------|--------| +| ContentType | tutorial, tip, exploration, walkthrough | +| ProcessingStatus | queued, in_progress, 
complete, errored, revoked | +| KeyMomentContentType | technique, concept, workflow, reference | +| SourceQuality | high, medium, low | +| RelationshipType | related, prerequisite, builds_on | +| ReportType | inaccuracy, missing_info, offensive, other | +| ReportStatus | open, acknowledged, resolved, dismissed | +| PipelineRunStatus | pending, running, completed, failed, revoked | +| PipelineRunTrigger | auto, manual, retrigger, clean_retrigger | + +## Schema Notes + +- **No Alembic migrations** — schema changes currently require manual DDL +- **body_sections_format** discriminator enables v1/v2 format coexistence (D024) +- **topic_category casing** is inconsistent across records (e.g., "Sound design" vs "Sound Design") — known data quality issue +- **Stage 4 classification data** (per-moment topic_tags) stored in Redis with 24h TTL, not DB columns +- **Timestamp convention:** `datetime.now(timezone.utc).replace(tzinfo=None)` — asyncpg rejects timezone-aware datetimes for TIMESTAMP WITHOUT TIME ZONE columns (D002) + +--- + +*See also: [[Architecture]], [[API-Surface]], [[Pipeline]]* diff --git a/Decisions.md b/Decisions.md new file mode 100644 index 0000000..0eee8d5 --- /dev/null +++ b/Decisions.md @@ -0,0 +1,45 @@ +# Decisions + +Architectural and pattern decisions made during Chrysopedia development. Append-only — to reverse a decision, add a new entry that supersedes it. 
+ +## Architecture Decisions + +| # | When | Decision | Choice | Rationale | +|---|------|----------|--------|-----------| +| D001 | — | Storage layer selection | PostgreSQL + Qdrant + local filesystem | PostgreSQL for JSONB, Qdrant already running on hypervisor, filesystem for transcript JSON | +| D002 | — | Timestamp handling (asyncpg) | `datetime.now(timezone.utc).replace(tzinfo=None)` | asyncpg rejects timezone-aware datetimes for TIMESTAMP WITHOUT TIME ZONE columns | +| D004 | — | Sync vs async in Celery tasks | Sync openai, QdrantClient, SQLAlchemy in Celery | Avoids nested event loop errors with gevent/eventlet workers | +| D005 | — | Embedding failure handling | Non-blocking — log errors, don't fail pipeline | Qdrant may be unreachable; core output (PostgreSQL) is preserved | +| D007 | M001/S04 | Review mode toggle persistence | Redis key `chrysopedia:review_mode` | Redis already in stack; simpler than DB table for single boolean | +| D009 | M001/S05 | Search service pattern | Separate async SearchService for FastAPI | Keeps sync pipeline clients untouched; 300ms timeout + keyword fallback | +| D015 | M002/S01 | Docker network subnet | 172.32.0.0/24 | 172.24.0.0/24 was taken by xpltd_docs_default | +| D016 | M002/S01 | Embedding service | Ollama container (nomic-embed-text) | OpenWebUI doesn't serve /v1/embeddings | +| D017 | — | CSS theming | 77 semantic custom properties, cyan accent | Full variable-based palette for consistency and future theme switching | +| D018 | M004/S04 | Version snapshot failure handling | Best-effort — failure doesn't block page update | Follows D005 pattern for non-critical side effects | +| D019 | M005/S02 | Technique page layout | CSS grid 2-column (1fr + 22rem sidebar), 64rem max-width | Collapses at 768px; accommodates prose + sidebar | +| D023 | M012/S01 | Qdrant embedding text enrichment | Prepend creator_name, join topic_tags | Enables creator-name and tag-specific semantic search | +| D024 | M014/S01 | Sections with 
subsections content model | Empty-string content for parent sections | Avoids duplication; substance lives in subsection content fields | +| D025 | M015 | Search query storage | PostgreSQL search_log + Redis cache (5-min TTL) | Full history for analytics; Redis prevents DB hit on every homepage load | + +## Phase 2 Decisions + +| # | Decision | Choice | Rationale | +|---|----------|--------|-----------| +| D031 | Phase 2 milestone structure | 8 milestones (M018–M025) with parallel frontend/backend slices | Maps to Sprint 0-8 plan; deploy gate per milestone | +| D032 | RAG framework | LightRAG + Qdrant + NetworkX (MVP) | Graph-enhanced retrieval; supports existing Qdrant; incremental updates | +| D033 | Monetization | Demo build with "Coming Soon" placeholders | Recruit creators first; Stripe Connect deferred to Phase 3 | +| D034 | Documentation strategy | Forgejo wiki, KB slice at end of every milestone | Incremental docs stay current; final pass in M025 | +| D035 | File/object storage | MinIO (S3-compatible) self-hosted | Docker-native, signed URLs, fits existing infrastructure | + +## UI/UX Decisions + +| # | Decision | Choice | +|---|----------|--------| +| D014 | Creator equity | Random default sort; no creator privileged | +| D020 | Topics card differentiation | 3px colored left border + dot | +| D021 | M011 findings triage | 12/16 approved; denied beginner paths, YouTube links, hide admin, CTA label | +| D030 | ToC scroll-spy rootMargin | `0px 0px -70% 0px` — active when in top 30% of viewport | + +--- + +*See also: [[Architecture]], [[Development-Guide]]* diff --git a/Deployment.md b/Deployment.md new file mode 100644 index 0000000..6715e4f --- /dev/null +++ b/Deployment.md @@ -0,0 +1,130 @@ +# Deployment + +## Quick Reference + +```bash +# SSH to ub01 +ssh ub01 +cd /vmPool/r/repos/xpltdco/chrysopedia + +# Standard deploy +git pull +docker compose build && docker compose up -d + +# Run migrations (if Alembic is configured) +docker exec chrysopedia-api 
alembic upgrade head + +# View logs +docker logs -f chrysopedia-api +docker logs -f chrysopedia-worker +docker logs -f chrysopedia-watcher + +# Check status +docker ps --filter name=chrysopedia +``` + +## File Layout on ub01 + +``` +/vmPool/r/ +├── repos/xpltdco/chrysopedia/ # Git repo (source code) +├── compose/xpltd_chrysopedia/ # Symlink to repo's docker-compose.yml +├── services/ +│ ├── chrysopedia_postgres_data/ # PostgreSQL data +│ ├── chrysopedia_qdrant_data/ # Qdrant vector data +│ ├── chrysopedia_ollama_data/ # Ollama model cache +│ └── chrysopedia_watch/ # Watcher input directory +│ ├── processed/ # Successfully ingested transcripts +│ └── failed/ # Failed transcripts + .error sidecars +``` + +## Docker Compose Configuration + +- **Project name:** `xpltd_chrysopedia` +- **Network:** `chrysopedia-net` (172.32.0.0/24) +- **Compose file:** `/vmPool/r/repos/xpltdco/chrysopedia/docker-compose.yml` + +### Build Args / Environment + +Frontend build-time constants are injected via Docker build args: + +```yaml +build: + args: + VITE_APP_VERSION: ${APP_VERSION:-0.1.0} + VITE_GIT_COMMIT: ${GIT_COMMIT:-unknown} +``` + +**Important:** `ARG` → `ENV` → `RUN npm run build` ordering matters in the Dockerfile. The `ENV` line must appear before the build step. 
+ +### Service Dependencies + +``` +chrysopedia-web-8096 → chrysopedia-api → chrysopedia-db, chrysopedia-redis +chrysopedia-worker → chrysopedia-db, chrysopedia-redis, chrysopedia-qdrant, chrysopedia-ollama +chrysopedia-watcher → chrysopedia-api +``` + +## Healthchecks + +| Service | Healthcheck | Notes | +|---------|------------|-------| +| PostgreSQL | `pg_isready` | Built-in | +| Redis | `redis-cli ping` | Built-in | +| Qdrant | `bash -c 'echo > /dev/tcp/localhost/6333'` | No curl available | +| Ollama | `ollama list` | Built-in CLI | +| API | `curl -f http://localhost:8000/health` | | +| Worker | `celery -A worker inspect ping` | Not HTTP | +| Watcher | `python -c "import os; os.kill(1, 0)"` | Slim image, no pgrep | + +## nginx Reverse Proxy + +On nuc01 (10.0.0.9): +- Server block proxies chrysopedia.com → ub01:8096 +- SSL via Certbot (Let's Encrypt) +- SPA fallback: all paths return index.html + +**Stale DNS after rebuild:** If API container is rebuilt, restart nginx container to pick up new internal IP: +```bash +docker compose restart chrysopedia-web-8096 +``` + +## Rebuilding After Code Changes + +```bash +# Full rebuild (backend + frontend) +cd /vmPool/r/repos/xpltdco/chrysopedia +git pull +docker compose build && docker compose up -d + +# Frontend only +docker compose build chrysopedia-web-8096 && docker compose up -d chrysopedia-web-8096 + +# Backend only (API + Worker share same image) +docker compose build chrysopedia-api && docker compose up -d chrysopedia-api chrysopedia-worker + +# Restart without rebuild +docker compose restart chrysopedia-api chrysopedia-worker +``` + +## Port Mapping + +| Service | Container Port | Host Port | Binding | +|---------|---------------|-----------|---------| +| PostgreSQL | 5432 | 5433 | 0.0.0.0 | +| Web (nginx) | 80 | 8096 | 0.0.0.0 | +| SSH (Forgejo) | 22 | 2222 | 0.0.0.0 | + +All other services (Redis, Qdrant, Ollama, API, Worker) are internal-only. 
+ +## Monitoring + +- **Web UI:** http://ub01:8096 +- **API Health:** http://ub01:8096/health +- **Pipeline Admin:** http://ub01:8096/admin/pipeline +- **Worker Status:** http://ub01:8096/admin/pipeline (shows Celery worker count) +- **PostgreSQL:** Connect via `psql -h ub01 -p 5433 -U chrysopedia` + +--- + +*See also: [[Architecture]], [[Development-Guide]]* diff --git a/Development-Guide.md b/Development-Guide.md new file mode 100644 index 0000000..579f271 --- /dev/null +++ b/Development-Guide.md @@ -0,0 +1,134 @@ +# Development Guide + +## Getting Started + +### Prerequisites +- Docker + Docker Compose +- Node.js 18+ (for frontend dev) +- Python 3.11+ (for backend dev) +- SSH access to ub01 + +### Local Development + +The simplest approach is working directly on ub01: + +```bash +ssh ub01 +cd /vmPool/r/repos/xpltdco/chrysopedia +``` + +For frontend-only work, you can run Vite locally and proxy to the remote API: + +```bash +cd frontend +npm install +npm run dev # Vite dev server with /api proxy to localhost:8001 +``` + +## Project Structure + +``` +chrysopedia/ +├── backend/ +│ ├── config.py # Settings (env vars, LRU cached) +│ ├── database.py # Async SQLAlchemy engine + session +│ ├── main.py # FastAPI app, router registration +│ ├── models.py # All 13 SQLAlchemy models +│ ├── schemas.py # Pydantic request/response schemas +│ ├── search_service.py # Async search (Qdrant + keyword fallback) +│ ├── redis_client.py # Async Redis client +│ ├── watcher.py # Transcript folder watcher +│ ├── routers/ # FastAPI route handlers +│ ├── pipeline/ # Celery pipeline stages +│ │ ├── stages.py # Stage implementations +│ │ └── quality/ # Prompt quality toolkit +│ ├── services/ # Business logic services +│ └── tests/ # pytest test suite +├── frontend/ +│ └── src/ +│ ├── App.tsx # Routes, layout +│ ├── App.css # All styles (5,820 lines) +│ ├── main.tsx # React entry point +│ ├── api/ # API client (public-client.ts) +│ ├── pages/ # Page components (11) +│ ├── components/ # Shared 
components (11+) +│ ├── hooks/ # Custom hooks (3) +│ └── utils/ # Utilities (citations, slugs) +├── prompts/ # LLM prompt templates +├── alembic/ # DB migrations (if configured) +├── docker-compose.yml +├── Dockerfile.api +├── Dockerfile.web +└── CLAUDE.md # AI agent development reference +``` + +## Common Gotchas + +### asyncpg Timestamp Errors +Use `datetime.now(timezone.utc).replace(tzinfo=None)` for all timestamp defaults. asyncpg rejects timezone-aware datetimes for TIMESTAMP WITHOUT TIME ZONE columns. + +### SQLAlchemy Column Name Conflicts +Never name a column `relationship`, `query`, or `metadata` — these shadow ORM functions. Use `from sqlalchemy.orm import relationship as sa_relationship` if the schema requires it. + +### Vite Build Constants +Always wrap with `JSON.stringify()`: `define: { __APP_VERSION__: JSON.stringify(version) }`. Without it, the built code gets unquoted values (syntax error). + +### Docker ARG/ENV Ordering +`ARG VITE_FOO=default` → `ENV VITE_FOO=$VITE_FOO` → `RUN npm run build`. The ENV line must appear before the build step. + +### Slim Docker Images +`python:3.x-slim` doesn't include `procps` (no `pgrep`, `ps`). Use `python -c "import os; os.kill(1, 0)"` for healthchecks. + +### Host Port 8000 Conflict +Port 8000 on ub01 may be used by kerf-engine. Use 8001 for local testing, or ensure kerf-engine is stopped. + +### Nginx Stale DNS +After rebuilding API container, restart the web container: `docker compose restart chrysopedia-web-8096`. + +### ZFS Filesystem Watchers +Use `watchdog.observers.polling.PollingObserver` instead of the default inotify observer — inotify doesn't reliably detect changes on ZFS/NFS. + +### File Stability for SCP Uploads +Wait for file size stability (check twice with 2-second gap) before processing files received via SCP/rsync. 
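The file-stability check described above can be sketched roughly as follows. This is a minimal illustration, not the code in `watcher.py`: the function name `wait_for_stable_size` and its parameters (`checks`, `interval`, `timeout`, `sleep`) are hypothetical, and the defaults simply mirror the "check twice with 2-second gap" rule.

```python
import os
import time


def wait_for_stable_size(path, checks=2, interval=2.0, timeout=60.0, sleep=time.sleep):
    """Return True once the file size is identical on `checks` consecutive reads.

    Hypothetical helper: names and defaults are assumptions for illustration.
    Returns False if the size never settles before `timeout` seconds elapse.
    """
    deadline = time.monotonic() + timeout
    last_size = -1      # size seen on the previous read (-1 = nothing seen yet)
    stable_reads = 0    # how many consecutive reads returned the same size
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            # File is missing or not readable yet: start counting again.
            last_size, stable_reads = -1, 0
        else:
            if size == last_size:
                stable_reads += 1
            else:
                last_size, stable_reads = size, 1
            if stable_reads >= checks:
                return True
        sleep(interval)
    return False
```

A watcher callback would call this before handing the file to ingestion and skip (or retry later) when it returns `False`. Passing `sleep` as a parameter keeps the helper easy to exercise in tests without real delays.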
+ +## Testing + +```bash +cd backend +python -m pytest tests/ -v +``` + +Tests use: +- `NullPool` for async engine (prevents connection pool contention) +- Module-level patching for Celery stage globals (`_engine`, `_SessionLocal`) +- `patch('pipeline.stages.run_pipeline')` for lazy import mocking (not at the router level) + +## Adding New Features + +### New API Endpoint +1. Create router in `backend/routers/foo.py` with `APIRouter(prefix="/foo", tags=["foo"])` +2. Register in `backend/main.py`: `app.include_router(foo.router, prefix="/api/v1")` +3. Define schemas in `backend/schemas.py` +4. Use paginated response: `{items, total, offset, limit}` + +### New Frontend Route +1. Add a `<Route>` to `App.tsx` +2. Create page component in `frontend/src/pages/` +3. Call `useDocumentTitle()` in the component +4. Add API functions to `public-client.ts` + +### New Database Model +1. Add to `backend/models.py` +2. Add schemas to `backend/schemas.py` +3. Apply DDL manually or via Alembic migration +4. Use `_now()` helper for timestamp defaults + +### New CSS +1. Append to `App.css` using BEM naming +2. Use CSS custom properties for all colors +3. Prefer 768px breakpoint for mobile/desktop split +4. Namespace Phase 2 selectors: `.p2-feature__element` + +--- + +*See also: [[Architecture]], [[Frontend]], [[Deployment]]* diff --git a/Frontend.md b/Frontend.md new file mode 100644 index 0000000..80d9627 --- /dev/null +++ b/Frontend.md @@ -0,0 +1,110 @@ +# Frontend + +React 18 + TypeScript + Vite SPA. No UI library, no state management library, no CSS framework.
+ +## Route Map + +| Route | Page Component | Auth | Notes | +|-------|---------------|------|-------| +| `/` | Home | Public | Hero search, stats counters, popular topics, nav cards | +| `/search` | SearchResults | Public | Sort, highlights, partial matches | +| `/techniques/:slug` | TechniquePage | Public | v2 body sections, ToC sidebar, citations | +| `/creators` | CreatorsBrowse | Public | Random default sort, genre filters | +| `/creators/:slug` | CreatorDetail | Public | Avatar, stats, technique list | +| `/topics` | TopicsBrowse | Public | 7 category cards, expandable sub-topics | +| `/topics/:category/:subtopic` | SubTopicPage | Public | Creator-grouped techniques | +| `/about` | About | Public | Static project info | +| `/admin/reports` | AdminReports | Admin* | Content reports | +| `/admin/pipeline` | AdminPipeline | Admin* | Pipeline management | +| `/admin/techniques` | AdminTechniquePages | Admin* | Technique page admin | +| `*` | → Redirect `/` | — | SPA fallback | + +*Admin routes have no authentication gate. + +**Routing:** All routes are declared in a single `<Routes>` block in `App.tsx`. nginx returns the SPA shell for all paths; react-router-dom v6 handles client-side routing.
+ +## Shared Components + +| Component | Purpose | +|-----------|---------| +| SearchAutocomplete | Global search with Ctrl+Shift+F shortcut (nav + mobile instances) | +| AdminDropdown | Hover-open at desktop, tap-toggle on mobile | +| AppFooter | Version, build date, GitHub link | +| TableOfContents | Sticky sidebar ToC with IntersectionObserver scroll-spy | +| SortDropdown | Reusable sort selector | +| TagList | Tag/badge pills with +N overflow | +| CategoryIcons | SVG icons per topic category | +| CreatorAvatar | Avatar with fallback | +| CopyLinkButton | Clipboard copy with tooltip | +| SocialIcons | Social media link icons (9 platforms) | +| ReportIssueModal | Content report submission | + +## Hooks + +| Hook | Purpose | +|------|---------| +| useCountUp | Animated counter for homepage stats | +| useSortPreference | Persists sort preference in localStorage | +| useDocumentTitle | Sets `<title>` per page (all 10 pages instrumented) | + +## State Management + +Local component state only (`useState`/`useEffect`). No Redux, Zustand, Context providers, or external state management library. + +## API Client + +Single module `public-client.ts` (~600 lines) with typed `request<T>` helper. Relative `/api/v1` base URL (nginx proxies to API container). All response TypeScript interfaces defined in the same file.
+ +## CSS Architecture + +| Property | Value | +|----------|-------| +| File | `frontend/src/App.css` | +| Lines | 5,820 | +| Unique classes | ~589 | +| Naming | BEM (`block__element--modifier`) | +| Theme | Dark-only (no light mode) | +| Custom properties | 77 in `:root` (D017) | +| Accent color | Cyan `#22d3ee` | +| Font stack | System fonts | +| Preprocessor | None | +| CSS Modules | None | + +### Custom Property Categories (77 total) + +- **Surface colors:** page background, card backgrounds, nav, footer, input +- **Text colors:** primary, secondary, muted, inverse, link, heading +- **Accent colors:** primary cyan, hover/active, focus rings +- **Badge colors:** Per-category pairs (bg + text) for 7 topic categories +- **Status colors:** Success/warning/error/info +- **Border colors:** Default, hover, focus, divider +- **Shadow colors:** Elevation, glow effects +- **Overlay colors:** Modal/dropdown overlays + +### Breakpoints + +| Breakpoint | Usage | +|-----------|-------| +| 480px | Narrow mobile — compact cards | +| 600px | Wider mobile — grid adjustments | +| 640px | Small tablet — content width | +| 768px | Desktop ↔ mobile transition — sidebar collapse | + +### Layout Patterns + +- **Page max-width:** 64rem (D019) +- **Technique page:** CSS grid 2-column (1fr + 22rem sidebar), collapses at 768px +- **Card layouts:** CSS grid with `auto-fill, minmax(...)` for responsive grids +- **Collapsible sections:** `grid-template-rows: 0fr/1fr` animation +- **Sticky elements:** ToC sidebar, reading header + +## Build + +- **Bundler:** Vite +- **Build-time constants:** `__APP_VERSION__`, `__BUILD_DATE__`, `__GIT_COMMIT__` via `define` (must use `JSON.stringify`) +- **Dev proxy:** `/api` → `localhost:8001` +- **Production:** nginx serves static `dist/` bundle, proxies `/api` to FastAPI container + +--- + +*See also: [[Architecture]], [[API-Surface]], [[Development-Guide]]* diff --git a/Pipeline.md b/Pipeline.md new file mode 100644 index 0000000..3500e97 --- /dev/null 
+++ b/Pipeline.md @@ -0,0 +1,108 @@ +# Pipeline + +6-stage LLM-powered extraction pipeline that transforms video transcripts into structured technique articles. + +## Pipeline Stages + +``` +Video File + ↓ +[Desktop] Whisper large-v3 (RTX 4090) → transcript JSON + ↓ +[Watcher/API] Ingest → SourceVideo + TranscriptSegments in PostgreSQL + ↓ +Stage 1: Transcript Segmentation — chunk transcript into logical segments + ↓ +Stage 2: Key Moment Extraction — identify teachable moments with timestamps + ↓ +Stage 3: (reserved) + ↓ +Stage 4: Classification & Tagging — assign topic_category + topic_tags per moment + ↓ +Stage 5: Technique Page Synthesis — compose study guide articles from moments + ↓ +Stage 6: Embed & Index — generate embeddings, upsert to Qdrant (non-blocking) +``` + +## Stage Details + +### Stage 1: Transcript Segmentation +- Chunks raw transcript into logical segments +- Input: TranscriptSegments from DB +- Output: Segmented data for stage 2 + +### Stage 2: Key Moment Extraction +- Identifies teachable moments with titles, summaries, timestamps +- Uses LLM with prompt template from `prompts/` directory +- Output: KeyMoment records in PostgreSQL + +### Stage 4: Classification & Tagging +- Assigns topic_category and topic_tags to each key moment +- References canonical tag list (`canonical_tags.yaml`) with aliases +- Output: Classification data stored in Redis (`chrysopedia:classification:{video_id}`, 24h TTL) + +### Stage 5: Technique Page Synthesis +- Composes study guide articles from classified key moments +- Handles multi-source merging: new video moments merge into existing technique pages +- Uses offset-based citation indexing (existing [0]-[N-1], new [N]-[N+M-1]) +- Creates pre-overwrite version snapshot before mutating existing pages (D018) +- Output: TechniquePage records with body_sections (v2 format), signal_chains, plugins + +### Stage 6: Embed & Index +- Generates embeddings via Ollama (nomic-embed-text) +- Embedding text enriched with 
creator_name and topic_tags (D023)
+- Upserts to Qdrant with deterministic UUIDs based on content
+- **Non-blocking:** Failures log WARNING but don't fail the pipeline (D005)
+- Can be re-triggered independently via `/admin/pipeline/reindex-all`
+
+## LLM Configuration
+
+| Setting | Value |
+|---------|-------|
+| Primary LLM | DGX Sparks Qwen (OpenAI-compatible API) |
+| Fallback LLM | Local Ollama |
+| Embedding model | nomic-embed-text (Ollama) |
+| Model routing | Per-stage configuration (chat vs thinking models) |
+
+## Prompt Template System
+
+- Prompt files stored in `prompts/` directory (D013)
+- Templates use XML-style content fencing
+- Editable without code changes — pipeline reads from disk at runtime
+- SHA-256 hashes tracked in TechniquePageVersion.pipeline_metadata for reproducibility
+- Re-process after prompt edits via `POST /admin/pipeline/trigger/{video_id}`
+
+## Pipeline Admin Features
+
+- **Debug mode:** Redis-backed toggle captures full LLM I/O (system prompt, user prompt, response) in pipeline_events
+- **Token tracking:** Per-event and per-video token usage visible in admin UI
+- **Stale page detection:** Identifies pages needing regeneration
+- **Bulk operations:** Bulk resynthesize, wipe all output, reindex all
+- **Worker status:** Real-time Celery worker health check
+
+## Prompt Quality Toolkit
+
+CLI tool (`python -m pipeline.quality`) with:
+- **LLM fitness suite** — 9 tests (Mandelbrot reasoning, JSON compliance, instruction following)
+- **5-dimension quality scorer** with voice preservation dial
+- **Automated prompt A/B optimization loop** — LLM-powered variant generation, iterative scoring, leaderboard
+- **Multi-stage support** for pipeline stages 2-5 with per-stage rubrics and fixtures
+
+## Key Design Decisions
+
+- **Sync clients in Celery** (D004): openai.OpenAI, QdrantClient, sync SQLAlchemy. Avoids nested event loop errors.
+- **Non-blocking embedding** (D005): Stage 6 failures don't block core pipeline output.
+- **Redis for stage 4 data**: Classification results in Redis with 24h TTL, not DB columns.
+- **Best-effort versioning** (D018): Version snapshot failure doesn't block page update.
+
+## Transcript Watcher
+
+Standalone service (`watcher.py`) monitors `/vmPool/r/services/chrysopedia_watch/` for new transcript JSON files:
+- Uses `watchdog.observers.polling.PollingObserver` for ZFS reliability
+- Validates file structure, waits for size stability (handles partial SCP writes)
+- POSTs to ingest API on file detection
+- Moves processed files to `processed/`, failures to `failed/` with `.error` sidecar
+
+---
+
+*See also: [[Architecture]], [[Data-Model]], [[Deployment]]*
diff --git a/_Sidebar.md b/_Sidebar.md
new file mode 100644
index 0000000..6f5f9d6
--- /dev/null
+++ b/_Sidebar.md
@@ -0,0 +1,17 @@
+### Chrysopedia Wiki
+
+- [[Home]]
+
+**Architecture**
+- [[Architecture]]
+- [[Data-Model]]
+- [[Pipeline]]
+
+**Reference**
+- [[API-Surface]]
+- [[Frontend]]
+- [[Decisions]]
+
+**Operations**
+- [[Deployment]]
+- [[Development-Guide]]
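Note on the Transcript Watcher section of `Pipeline.md`: the "waits for size stability (handles partial SCP writes)" step is described but not shown. A minimal sketch of how such a wait might work is given here, purely as an illustration. The function name `wait_for_stable_size`, its parameters, and its defaults are hypothetical and are not taken from `watcher.py`; the actual watcher may use a different strategy.

```python
import os
import time


def wait_for_stable_size(path, interval=1.0, required_checks=3, timeout=60.0):
    """Return True once the file size is unchanged for `required_checks`
    consecutive polls, or False if `timeout` seconds elapse first.

    Hypothetical helper: name, parameters, and defaults are illustrative
    assumptions, not the real watcher.py implementation.
    """
    deadline = time.monotonic() + timeout
    last_size = -1
    stable = 0
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            # File is missing or not readable yet; restart the count.
            last_size, stable = -1, 0
        else:
            if size == last_size and size > 0:
                stable += 1
                if stable >= required_checks:
                    return True
            else:
                # Size changed (upload still in progress); restart the count.
                last_size, stable = size, 0
        time.sleep(interval)
    return False
```

Under these assumptions, a watcher callback would call `wait_for_stable_size(path)` before validating the JSON and POSTing to the ingest API, and would treat a `False` result as a failure (for example, moving the file to `failed/` with an `.error` sidecar, as the section describes).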