docs: bootstrap wiki with architecture documentation (M018/S02)
Pages: Architecture, Data Model, API Surface, Frontend, Pipeline, Deployment, Development Guide, Decisions, plus sidebar navigation. Content derived from Site Audit Report (M018/S01), PROJECT.md, DECISIONS.md, KNOWLEDGE.md, and source code analysis.
parent `f6b00a0c53` · commit `081b39f767`
9 changed files with 866 additions and 0 deletions

API-Surface.md (new file, +103 lines)

# API Surface

41 API endpoints, grouped by domain and all served by FastAPI. Every path except `/health` lives under `/api/v1/`.

## Public Endpoints (10)

| Method | Path | Response Shape | Notes |
|--------|------|----------------|-------|
| GET | `/health` | `{status, service, version, database}` | Health check |
| GET | `/api/v1/stats` | `{technique_count, creator_count}` | Homepage stats |
| GET | `/api/v1/search?q=` | `{items, partial_matches, total, query, fallback_used}` | Semantic search with keyword fallback (D009) |
| GET | `/api/v1/search/suggestions?q=` | `{suggestions: [{text, type}]}` | Typeahead autocomplete |
| GET | `/api/v1/search/popular` | `{items: [{query, count}]}` | Popular searches (D025) |
| GET | `/api/v1/techniques?limit=&offset=` | `{items, total, offset, limit}` | Paginated technique list |
| GET | `/api/v1/techniques/random` | `{slug}` | Returns a JSON slug, not a redirect |
| GET | `/api/v1/techniques/{slug}` | 23-field object | Full technique detail with relations |
| GET | `/api/v1/techniques/{slug}/versions` | `{items, total}` | Version history |
| GET | `/api/v1/techniques/{slug}/versions/{n}` | Version detail | Single version |

### Technique Detail Fields (23)

title, slug, topic_category, topic_tags, summary, body_sections, body_sections_format, signal_chains, plugins, id, creator_id, creator_name, creator_slug, source_quality, view_count, key_moment_count, created_at, updated_at, key_moments, creator_info, related_links, version_count, source_videos.

## Browse Endpoints (5)

| Method | Path | Response Shape | Notes |
|--------|------|----------------|-------|
| GET | `/api/v1/creators?sort=&genre=` | `{items, total, offset, limit}` | sort: random\|alpha\|views |
| GET | `/api/v1/creators/{slug}` | 16-field object | Includes genre_breakdown, techniques, social_links |
| GET | `/api/v1/topics` | `[{name, description, sub_topics}]` | ⚠️ Bare list (not paginated) |
| GET | `/api/v1/topics/{cat}/{sub}` | `{items, total, offset, limit}` | Subtopic techniques |
| GET | `/api/v1/topics/{cat}` | `{items, total, offset, limit}` | Category techniques |

## Report Endpoints (3)

| Method | Path | Purpose |
|--------|------|---------|
| POST | `/api/v1/reports` | Submit content report |
| GET | `/api/v1/admin/reports` | List all reports |
| PATCH | `/api/v1/admin/reports/{id}` | Update report status |

## Pipeline Admin Endpoints (20+)

All live under the prefix `/api/v1/admin/pipeline/`; the paths below omit the leading `/api/v1`.

| Method | Path | Purpose |
|--------|------|---------|
| GET | `/admin/pipeline/videos` | Paginated video list with pipeline status |
| POST | `/admin/pipeline/trigger/{video_id}` | Trigger pipeline for video |
| POST | `/admin/pipeline/clean-retrigger/{video_id}` | Wipe output + reprocess |
| POST | `/admin/pipeline/revoke/{video_id}` | Revoke active pipeline task |
| POST | `/admin/pipeline/rerun-stage/{video_id}` | Re-run specific stage |
| GET | `/admin/pipeline/events` | Pipeline event log |
| GET | `/admin/pipeline/runs` | Pipeline run history |
| GET | `/admin/pipeline/chunking-inspector/{video_id}` | Inspect chunking results |
| GET | `/admin/pipeline/embed-status` | Embedding/Qdrant health |
| GET | `/admin/pipeline/debug-mode` | Get debug mode state |
| POST | `/admin/pipeline/debug-mode` | Set debug mode state |
| GET | `/admin/pipeline/token-summary` | Token usage summary |
| GET | `/admin/pipeline/stale-pages` | Pages needing regeneration |
| POST | `/admin/pipeline/bulk-resynthesize` | Regenerate all technique pages |
| POST | `/admin/pipeline/wipe-all-output` | Delete all pipeline output |
| POST | `/admin/pipeline/optimize-prompt` | Trigger prompt optimization |
| POST | `/admin/pipeline/reindex-all` | Rebuild Qdrant index |
| GET | `/admin/pipeline/worker-status` | Celery worker health |
| GET | `/admin/pipeline/recent-activity` | Recent pipeline events |
| POST | `/admin/pipeline/creator-profile/{creator_id}` | Update creator profile |
| POST | `/admin/pipeline/avatar-fetch/{creator_id}` | Fetch creator avatar |

## Other Endpoints (2)

| Method | Path | Notes |
|--------|------|-------|
| POST | `/api/v1/ingest` | Transcript upload |
| GET | `/api/v1/videos` | ⚠️ Bare list (not paginated) |

## Response Conventions

**Standard paginated response:**

```json
{
  "items": [...],
  "total": 83,
  "offset": 0,
  "limit": 20
}
```
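
Because every paginated endpoint shares this envelope, a client can walk all pages with one generic loop. A minimal sketch, assuming a caller-supplied `fetch_page(offset, limit)` callable (hypothetical; it stands in for whatever HTTP client you use) that returns the decoded JSON body:

```python
from typing import Any, Callable, Dict, Iterator, List

# {"items": [...], "total": int, "offset": int, "limit": int}
Page = Dict[str, Any]


def iter_all_items(fetch_page: Callable[[int, int], Page], limit: int = 20) -> Iterator[Any]:
    """Yield every item from an {items, total, offset, limit} endpoint.

    `fetch_page` is a hypothetical callable that performs e.g.
    GET /api/v1/techniques?offset=...&limit=... and returns the JSON dict.
    """
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        items: List[Any] = page["items"]
        yield from items
        offset += len(items)
        # Stop once we've seen `total` items, or on an empty page
        # (guards against looping forever if `total` is stale).
        if not items or offset >= page["total"]:
            break
```

Advancing by `len(items)` rather than by `limit` keeps the loop correct even if the server caps `limit` below what was requested.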

**Known inconsistencies:**

- `GET /topics` returns a bare list instead of a paginated dict
- `GET /videos` returns a bare list instead of a paginated dict
- Search uses the `items` key (not `results`)
- `/techniques/random` returns JSON `{slug}` (not an HTTP redirect)

**New endpoints should follow the `{items, total, offset, limit}` paginated pattern.**
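
Until `/topics` and `/videos` are migrated, client code has to accept both shapes. A small sketch of a normalizer that wraps a bare list in the standard envelope; `to_paginated` is a hypothetical helper, not something the codebase is known to contain:

```python
from typing import Any, Dict, List, Union

PAGINATED_KEYS = {"items", "total", "offset", "limit"}


def to_paginated(body: Union[List[Any], Dict[str, Any]]) -> Dict[str, Any]:
    """Coerce a response body into the {items, total, offset, limit} shape.

    Bare lists (currently GET /topics and GET /videos) are treated as a
    single page that contains everything.
    """
    if isinstance(body, list):
        return {"items": body, "total": len(body), "offset": 0, "limit": len(body)}
    # Already a dict: make sure it really is the paginated envelope.
    missing = PAGINATED_KEYS - set(body)
    if missing:
        raise ValueError(f"not a paginated response, missing keys: {sorted(missing)}")
    return body
```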

## Authentication

No endpoint requires authentication. Admin routes (`/api/v1/admin/*`) are reachable by anyone with network access. Phase 2 will add auth middleware.

---

*See also: [[Architecture]], [[Data-Model]], [[Frontend]]*

Architecture.md (new file, +84 lines)

# Architecture

## System Overview

Chrysopedia is a self-hosted music production knowledge base that synthesizes technique articles from video transcripts using a 6-stage LLM pipeline. It runs as a Docker Compose stack of 8 containers on `ub01`.

```
┌─────────────────────────────────────────────────────────────────┐
│ ub01 (10.0.0.10)                                                │
│ Docker Compose: xpltd_chrysopedia    Subnet: 172.32.0.0/24      │
│                                                                 │
│ ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────┐      │
│ │  nginx   │  │ FastAPI  │  │  Celery  │  │   Watcher    │      │
│ │  :8096   │─▶│  :8000   │  │  Worker  │  │ (PollingObs) │      │
│ └──────────┘  └────┬─────┘  └────┬─────┘  └──────┬───────┘      │
│                    │             │               │              │
│      ┌─────────────┼─────────────┼───────────────┘              │
│      ▼             ▼             ▼                              │
│ ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────┐      │
│ │ Postgres │  │  Redis   │  │  Qdrant  │  │    Ollama    │      │
│ │  :5433   │  │  :6379   │  │  :6333   │  │    :11434    │      │
│ └──────────┘  └──────────┘  └──────────┘  └──────────────┘      │
└─────────────────────────────────────────────────────────────────┘
                         ▲
                         │ nginx reverse proxy
                ┌────────┴────────┐
                │ nuc01 (10.0.0.9)│
                │ chrysopedia.com │
                │  :443 → :8096   │
                └─────────────────┘
```

## Key Architectural Characteristics

- **Minimal frontend dependencies:** nothing beyond React, react-router-dom, and Vite
- **Monolithic CSS:** a single 5,820-line file with BEM naming and 77 custom properties
- **No authentication:** admin routes are protected only by network access control
- **Dual SQLAlchemy strategy:** async engine for FastAPI request handlers, sync engine for Celery pipeline tasks (D004)
- **Non-blocking pipeline side effects:** embedding/Qdrant failures don't block page synthesis (D005)
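
The non-blocking rule (D005, and D018 for version snapshots) amounts to running each side effect inside a guard that logs and swallows failures. A minimal sketch of that pattern; `best_effort` and the example step name are hypothetical, not taken from the pipeline code:

```python
import logging
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger("pipeline")


def best_effort(step: str, fn: Callable[[], T]) -> Optional[T]:
    """Run a non-critical side effect; log and swallow any exception.

    The core output (e.g. the technique page row in PostgreSQL) is assumed
    to be committed before this is called, so a failure here never fails
    the pipeline run.
    """
    try:
        return fn()
    except Exception:  # deliberately broad: any failure is non-fatal
        log.exception("non-blocking step %r failed; continuing", step)
        return None
```

For example, `best_effort("qdrant-upsert", upsert_embeddings)` returns `None` and logs a traceback if Qdrant is unreachable, and the caller simply moves on.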

## Docker Services

| Service | Image | Container | Port | Volume |
|---------|-------|-----------|------|--------|
| PostgreSQL 16 | postgres:16-alpine | chrysopedia-db | 5433:5432 | chrysopedia_postgres_data |
| Redis 7 | redis:7-alpine | chrysopedia-redis | 6379 (internal) | — |
| Qdrant 1.13.2 | qdrant/qdrant:v1.13.2 | chrysopedia-qdrant | 6333 (internal) | chrysopedia_qdrant_data |
| Ollama | ollama/ollama:latest | chrysopedia-ollama | 11434 (internal) | chrysopedia_ollama_data |
| API (FastAPI) | Dockerfile.api | chrysopedia-api | 8000 (internal) | Bind: backend/, prompts/ |
| Worker (Celery) | Dockerfile.api | chrysopedia-worker | — | Bind: backend/, prompts/ |
| Watcher | Dockerfile.api | chrysopedia-watcher | — | Bind: watch dir |
| Web (nginx) | Dockerfile.web | chrysopedia-web-8096 | 8096:80 | — |

## Network Topology

- **Compose subnet:** 172.32.0.0/24 (D015)
- **External access:** nginx on nuc01 (10.0.0.9) reverse-proxies to ub01:8096
- **DNS:** AdGuard Home rewrites chrysopedia.com → 10.0.0.9
- **Internal services** (Redis, Qdrant, Ollama) are not exposed outside the Docker network

## Tech Stack

| Layer | Technology |
|-------|------------|
| Frontend | React 18 + TypeScript + Vite |
| Backend | FastAPI + Celery + SQLAlchemy (async in FastAPI, sync in Celery; D004) |
| Database | PostgreSQL 16 |
| Cache/Broker | Redis 7 (Celery broker + review mode toggle + classification cache) |
| Vector Store | Qdrant 1.13.2 |
| Embeddings | Ollama (nomic-embed-text) via OpenAI-compatible /v1/embeddings |
| LLM | OpenAI-compatible API: DGX Sparks Qwen primary, local Ollama fallback |
| Deployment | Docker Compose on ub01, nginx reverse proxy on nuc01 |
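
Embeddings are requested through Ollama's OpenAI-compatible endpoint. A stdlib-only sketch of building that request and reading the vector back out; the base URL `http://chrysopedia-ollama:11434` assumes the container name resolves on the Compose network, and `build_embedding_request` / `parse_embedding` are hypothetical helpers:

```python
import json
import urllib.request
from typing import Any, Dict, List

OLLAMA_BASE = "http://chrysopedia-ollama:11434"  # assumption: container-name DNS


def build_embedding_request(text: str, model: str = "nomic-embed-text") -> urllib.request.Request:
    """Build a POST to the OpenAI-compatible /v1/embeddings endpoint."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_BASE}/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embedding(response_body: Dict[str, Any]) -> List[float]:
    """Extract the first vector from an OpenAI-style embeddings response."""
    return response_body["data"][0]["embedding"]


# Sending it requires network access to the Ollama container:
#   with urllib.request.urlopen(build_embedding_request("sidechain compression")) as resp:
#       vector = parse_embedding(json.load(resp))
```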

## Data Flow

1. **Ingestion:** Video files → Whisper transcription (desktop, RTX 4090) → JSON transcript
2. **Upload:** Transcript JSON is dropped into the watch folder or POSTed to `/api/v1/ingest`
3. **Pipeline:** 6 Celery stages process each video (see [[Pipeline]])
4. **Storage:** Technique pages + key moments → PostgreSQL, embeddings → Qdrant
5. **Serving:** The React SPA fetches from FastAPI; search queries hit Qdrant first and fall back to PostgreSQL keyword search (D009)
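
The search step (D009) gives the semantic query a 300 ms budget and falls back to keyword search on timeout or error. A minimal asyncio sketch; `semantic_search` and `keyword_search` are hypothetical stand-ins for the Qdrant and PostgreSQL queries in `search_service.py`, and the returned dict only mirrors the `items` / `query` / `fallback_used` keys of `GET /api/v1/search`:

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Search = Callable[[str], Awaitable[List[Dict[str, Any]]]]


async def search_with_fallback(
    query: str,
    semantic_search: Search,
    keyword_search: Search,
    timeout_s: float = 0.3,  # 300 ms budget from D009
) -> Dict[str, Any]:
    try:
        # wait_for cancels the semantic query if it exceeds the budget
        items = await asyncio.wait_for(semantic_search(query), timeout=timeout_s)
        fallback_used = False
    except Exception:
        # Timeout, connection error, etc.: degrade to keyword search
        items = await keyword_search(query)
        fallback_used = True
    return {"items": items, "query": query, "fallback_used": fallback_used}
```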

---

*See also: [[Deployment]], [[Pipeline]], [[Data-Model]]*

Data-Model.md (new file, +135 lines)

# Data Model

13 SQLAlchemy models, all defined in `backend/models.py`.

## Entity Relationship Overview

```
Creator (1) ──→ (N) SourceVideo (1) ──→ (N) TranscriptSegment
   │                │
   │                └──→ (N) KeyMoment
   │
   └──→ (N) TechniquePage (M) ←──→ (N) Tag
              │
              ├──→ (N) TechniquePageVersion
              ├──→ (N) RelatedTechniqueLink
              └──→ (M:N) SourceVideo (via TechniquePageVideo)
```

## Core Content Models

### Creator

| Field | Type | Notes |
|-------|------|-------|
| id | Integer PK | |
| name | String | Unique, from folder name |
| slug | String | URL-safe, unique |
| genres | ARRAY(String) | e.g. ["dubstep", "sound design"] |
| avatar_url | String | Optional |
| bio | Text | Admin-editable |
| social_links | JSONB | Platform → URL mapping |
| featured | Boolean | For homepage spotlight |

### SourceVideo

| Field | Type | Notes |
|-------|------|-------|
| id | Integer PK | |
| creator_id | FK → Creator | |
| filename | String | Original video filename |
| youtube_url | String | Optional |
| folder_name | String | Filesystem folder name |
| processing_status | Enum | queued / in_progress / complete / errored / revoked |
| pipeline_stage | Integer | Current/last completed stage (1-6) |

### TranscriptSegment

| Field | Type | Notes |
|-------|------|-------|
| id | Integer PK | |
| source_video_id | FK → SourceVideo | |
| start_time | Float | Seconds |
| end_time | Float | Seconds |
| text | Text | Segment transcript text |

### KeyMoment

| Field | Type | Notes |
|-------|------|-------|
| id | Integer PK | |
| source_video_id | FK → SourceVideo | |
| title | String | |
| summary | Text | |
| start_time | Float | Seconds |
| end_time | Float | Seconds |
| topic_category | String | e.g. "Sound Design" |
| topic_tags | ARRAY(String) | |
| content_type | Enum | tutorial / tip / exploration / walkthrough |
| review_status | String | pending / approved / rejected |

### TechniquePage

| Field | Type | Notes |
|-------|------|-------|
| id | Integer PK | |
| creator_id | FK → Creator | |
| title | String | |
| slug | String | Unique, URL-safe |
| summary | Text | |
| body_sections | JSONB | v1: dict, v2: list-of-objects with nesting (D024) |
| body_sections_format | String | "v1" or "v2" (format discriminator) |
| signal_chains | JSONB | Signal flow descriptions |
| plugins | ARRAY(String) | Referenced plugins/VSTs |
| topic_category | String | |
| topic_tags | ARRAY(String) | |
| source_quality | Enum | high / medium / low |
| view_count | Integer | |
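
Because both formats coexist (D024), code that reads `body_sections` should branch on `body_sections_format`. A hedged sketch that presents a v1 page in a v2-style list; the shapes are assumptions (v1 as a `{heading: text}` dict, v2 as a list of `{heading, content, subsections}` objects), since this page does not spell out the exact keys:

```python
from typing import Any, Dict, List


def to_v2_sections(body_sections: Any, fmt: str) -> List[Dict[str, Any]]:
    """Return body sections as a v2-style list, whatever the stored format.

    Assumed shapes (hypothetical; check backend/schemas.py):
      v1: {"Overview": "text...", "Signal Chain": "text..."}
      v2: [{"heading": ..., "content": ..., "subsections": [...]}]
    """
    if fmt == "v2":
        return body_sections
    if fmt == "v1":
        # Note: PostgreSQL JSONB does not preserve object key order, so the
        # v1 section order here is whatever order the driver returns.
        return [
            {"heading": heading, "content": content, "subsections": []}
            for heading, content in body_sections.items()
        ]
    raise ValueError(f"unknown body_sections_format: {fmt!r}")
```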

### TechniquePageVersion

| Field | Type | Notes |
|-------|------|-------|
| id | Integer PK | |
| technique_page_id | FK → TechniquePage | |
| version_number | Integer | Sequential |
| content_snapshot | JSONB | Full page state at version time |
| pipeline_metadata | JSONB | Prompt SHA-256 hashes, model config |

## Supporting Models

| Model | Purpose |
|-------|---------|
| **RelatedTechniqueLink** | Directed link between technique pages (source → target with label) |
| **Tag** | Normalized tag with M:N join to TechniquePage via `technique_page_tags` |
| **TechniquePageVideo** | Join table: TechniquePage ↔ SourceVideo (multi-source pages) |
| **ContentReport** | User-submitted content reports with status workflow (open/acknowledged/resolved/dismissed) |
| **SearchLog** | Query logging for popular searches feature (D025) |
| **PipelineRun** | Pipeline execution tracking per video with status and trigger type |
| **PipelineEvent** | Granular pipeline stage events with token counts and JSONB payload |
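
The popular-searches read path built on `SearchLog` (D025) is cache-aside: serve the cached aggregate while it is fresh, otherwise recompute from `search_log` and cache it for 5 minutes. A self-contained sketch in which a small in-memory TTL cache stands in for Redis; the `popular_searches` key name and the `load_from_db` callable are hypothetical:

```python
import time
from typing import Any, Callable, Dict, Tuple

TTL_SECONDS = 300  # 5-minute TTL from D025


class TTLCache:
    """Tiny stand-in for Redis `SET key value EX ttl` / `GET key`."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[float, Any]] = {}
        self._clock = clock

    def get(self, key: str) -> Any:
        entry = self._data.get(key)
        if entry is None or entry[0] <= self._clock():
            return None  # missing or expired
        return entry[1]

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (self._clock() + ttl, value)


def popular_searches(cache: TTLCache, load_from_db: Callable[[], Any]) -> Any:
    cached = cache.get("popular_searches")
    if cached is not None:
        return cached  # cache hit: no database query
    result = load_from_db()  # e.g. a GROUP BY over search_log
    cache.set("popular_searches", result, TTL_SECONDS)
    return result
```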

## Enums

| Enum | Values |
|------|--------|
| ContentType | tutorial, tip, exploration, walkthrough |
| ProcessingStatus | queued, in_progress, complete, errored, revoked |
| KeyMomentContentType | technique, concept, workflow, reference |
| SourceQuality | high, medium, low |
| RelationshipType | related, prerequisite, builds_on |
| ReportType | inaccuracy, missing_info, offensive, other |
| ReportStatus | open, acknowledged, resolved, dismissed |
| PipelineRunStatus | pending, running, completed, failed, revoked |
| PipelineRunTrigger | auto, manual, retrigger, clean_retrigger |
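
In application code these value sets map naturally onto Python `str` enums. A sketch mirroring three of the rows above; whether `backend/models.py` declares them exactly this way (as `str, enum.Enum` subclasses) is an assumption:

```python
import enum


class ProcessingStatus(str, enum.Enum):
    queued = "queued"
    in_progress = "in_progress"
    complete = "complete"
    errored = "errored"
    revoked = "revoked"


class PipelineRunStatus(str, enum.Enum):
    pending = "pending"
    running = "running"
    completed = "completed"
    failed = "failed"
    revoked = "revoked"


class ReportStatus(str, enum.Enum):
    open = "open"
    acknowledged = "acknowledged"
    resolved = "resolved"
    dismissed = "dismissed"
```

The `str` mixin makes members compare equal to their raw values (`ReportStatus.open == "open"`), which keeps JSON and DB round-trips simple. Note that the two status vocabularies differ (`complete`/`errored` versus `completed`/`failed`), so values cannot be copied between `SourceVideo` and `PipelineRun` unchanged.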

## Schema Notes

- **No Alembic migrations:** schema changes currently require manual DDL
- **body_sections_format** discriminator enables v1/v2 format coexistence (D024)
- **topic_category casing** is inconsistent across records (e.g., "Sound design" vs "Sound Design"), a known data quality issue
- **Stage 4 classification data** (per-moment topic_tags) is stored in Redis with a 24h TTL, not in DB columns
- **Timestamp convention:** `datetime.now(timezone.utc).replace(tzinfo=None)`; asyncpg rejects timezone-aware datetimes for TIMESTAMP WITHOUT TIME ZONE columns (D002)

---

*See also: [[Architecture]], [[API-Surface]], [[Pipeline]]*

Decisions.md (new file, +45 lines)

# Decisions

Architectural and pattern decisions made during Chrysopedia development. The log is append-only: to reverse a decision, add a new entry that supersedes it.

## Architecture Decisions

| # | When | Decision | Choice | Rationale |
|---|------|----------|--------|-----------|
| D001 | — | Storage layer selection | PostgreSQL + Qdrant + local filesystem | PostgreSQL for JSONB, Qdrant already running on hypervisor, filesystem for transcript JSON |
| D002 | — | Timestamp handling (asyncpg) | `datetime.now(timezone.utc).replace(tzinfo=None)` | asyncpg rejects timezone-aware datetimes for TIMESTAMP WITHOUT TIME ZONE columns |
| D004 | — | Sync vs async in Celery tasks | Sync openai, QdrantClient, SQLAlchemy in Celery | Avoids nested event loop errors with gevent/eventlet workers |
| D005 | — | Embedding failure handling | Non-blocking: log errors, don't fail pipeline | Qdrant may be unreachable; core output (PostgreSQL) is preserved |
| D007 | M001/S04 | Review mode toggle persistence | Redis key `chrysopedia:review_mode` | Redis already in stack; simpler than DB table for single boolean |
| D009 | M001/S05 | Search service pattern | Separate async SearchService for FastAPI | Keeps sync pipeline clients untouched; 300ms timeout + keyword fallback |
| D015 | M002/S01 | Docker network subnet | 172.32.0.0/24 | 172.24.0.0/24 was taken by xpltd_docs_default |
| D016 | M002/S01 | Embedding service | Ollama container (nomic-embed-text) | OpenWebUI doesn't serve /v1/embeddings |
| D017 | — | CSS theming | 77 semantic custom properties, cyan accent | Full variable-based palette for consistency and future theme switching |
| D018 | M004/S04 | Version snapshot failure handling | Best-effort; failure doesn't block page update | Follows D005 pattern for non-critical side effects |
| D019 | M005/S02 | Technique page layout | CSS grid 2-column (1fr + 22rem sidebar), 64rem max-width | Collapses at 768px; accommodates prose + sidebar |
| D023 | M012/S01 | Qdrant embedding text enrichment | Prepend creator_name, join topic_tags | Enables creator-name and tag-specific semantic search |
| D024 | M014/S01 | Sections with subsections content model | Empty-string content for parent sections | Avoids duplication; substance lives in subsection content fields |
| D025 | M015 | Search query storage | PostgreSQL search_log + Redis cache (5-min TTL) | Full history for analytics; Redis prevents DB hit on every homepage load |
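
D023 in practice: the text sent for embedding is enriched so that creator names and tags become semantically searchable. A sketch of one way to assemble it; the field order after the creator name, the newline and `Tags:` separators, and the `build_embedding_text` name are all assumptions:

```python
from typing import Iterable


def build_embedding_text(creator_name: str, title: str, summary: str,
                         topic_tags: Iterable[str]) -> str:
    """Prepend the creator name and append the joined topic tags (D023)."""
    parts = [creator_name, title, summary]
    tags = ", ".join(t for t in topic_tags if t)
    if tags:
        parts.append(f"Tags: {tags}")
    # Skip empty fields so they don't leave blank lines in the embedded text
    return "\n".join(p for p in parts if p)
```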

## Phase 2 Decisions

| # | Decision | Choice | Rationale |
|---|----------|--------|-----------|
| D031 | Phase 2 milestone structure | 8 milestones (M018–M025) with parallel frontend/backend slices | Maps to Sprint 0-8 plan; deploy gate per milestone |
| D032 | RAG framework | LightRAG + Qdrant + NetworkX (MVP) | Graph-enhanced retrieval; supports existing Qdrant; incremental updates |
| D033 | Monetization | Demo build with "Coming Soon" placeholders | Recruit creators first; Stripe Connect deferred to Phase 3 |
| D034 | Documentation strategy | Forgejo wiki, KB slice at end of every milestone | Incremental docs stay current; final pass in M025 |
| D035 | File/object storage | MinIO (S3-compatible) self-hosted | Docker-native, signed URLs, fits existing infrastructure |

## UI/UX Decisions

| # | Decision | Choice |
|---|----------|--------|
| D014 | Creator equity | Random default sort; no creator privileged |
| D020 | Topics card differentiation | 3px colored left border + dot |
| D021 | M011 findings triage | 12/16 approved; denied beginner paths, YouTube links, hide admin, CTA label |
| D030 | ToC scroll-spy rootMargin | `0px 0px -70% 0px`: active when in top 30% of viewport |

---

*See also: [[Architecture]], [[Development-Guide]]*

Deployment.md (new file, +130 lines)

# Deployment

## Quick Reference

```bash
# SSH to ub01
ssh ub01
cd /vmPool/r/repos/xpltdco/chrysopedia

# Standard deploy
git pull
docker compose build && docker compose up -d

# Run migrations (if Alembic is configured)
docker exec chrysopedia-api alembic upgrade head

# View logs
docker logs -f chrysopedia-api
docker logs -f chrysopedia-worker
docker logs -f chrysopedia-watcher

# Check status
docker ps --filter name=chrysopedia
```

## File Layout on ub01

```
/vmPool/r/
├── repos/xpltdco/chrysopedia/        # Git repo (source code)
├── compose/xpltd_chrysopedia/        # Symlink to repo's docker-compose.yml
└── services/
    ├── chrysopedia_postgres_data/    # PostgreSQL data
    ├── chrysopedia_qdrant_data/      # Qdrant vector data
    ├── chrysopedia_ollama_data/      # Ollama model cache
    └── chrysopedia_watch/            # Watcher input directory
        ├── processed/                # Successfully ingested transcripts
        └── failed/                   # Failed transcripts + .error sidecars
```

## Docker Compose Configuration

- **Project name:** `xpltd_chrysopedia`
- **Network:** `chrysopedia-net` (172.32.0.0/24)
- **Compose file:** `/vmPool/r/repos/xpltdco/chrysopedia/docker-compose.yml`

### Build Args / Environment

Frontend build-time constants are injected via Docker build args:

```yaml
build:
  args:
    VITE_APP_VERSION: ${APP_VERSION:-0.1.0}
    VITE_GIT_COMMIT: ${GIT_COMMIT:-unknown}
```

**Important:** In the Dockerfile, the `ARG` → `ENV` → `RUN npm run build` ordering matters: the `ENV` line must appear before the build step.

### Service Dependencies

```
chrysopedia-web-8096 → chrysopedia-api → chrysopedia-db, chrysopedia-redis
chrysopedia-worker   → chrysopedia-db, chrysopedia-redis, chrysopedia-qdrant, chrysopedia-ollama
chrysopedia-watcher  → chrysopedia-api
```
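
Reading each arrow as "depends on", these edges fix a valid startup order. As an illustration only (this is not how Compose resolves `depends_on` internally), Python's standard `graphlib` module (Python 3.9+) derives such an order from the list above:

```python
from graphlib import TopologicalSorter

# service -> services it depends on, transcribed from the dependency list above
DEPENDS_ON = {
    "chrysopedia-web-8096": {"chrysopedia-api"},
    "chrysopedia-api": {"chrysopedia-db", "chrysopedia-redis"},
    "chrysopedia-worker": {
        "chrysopedia-db", "chrysopedia-redis", "chrysopedia-qdrant", "chrysopedia-ollama",
    },
    "chrysopedia-watcher": {"chrysopedia-api"},
}

# static_order() yields every service only after all of its dependencies
startup_order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(startup_order)
```

The exact order among independent services (for example Redis versus Qdrant) is not fixed; only the dependency constraints are guaranteed.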
|
||||||
|
|
||||||
|
## Healthchecks
|
||||||
|
|
||||||
|
| Service | Healthcheck | Notes |
|
||||||
|
|---------|------------|-------|
|
||||||
|
| PostgreSQL | `pg_isready` | Built-in |
|
||||||
|
| Redis | `redis-cli ping` | Built-in |
|
||||||
|
| Qdrant | `bash -c 'echo > /dev/tcp/localhost/6333'` | No curl available |
|
||||||
|
| Ollama | `ollama list` | Built-in CLI |
|
||||||
|
| API | `curl -f http://localhost:8000/health` | |
|
||||||
|
| Worker | `celery -A worker inspect ping` | Not HTTP |
|
||||||
|
| Watcher | `python -c "import os; os.kill(1, 0)"` | Slim image, no pgrep |
|
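
The Qdrant check only proves that something accepts TCP connections on port 6333. The same probe in Python, for Python-based images such as the API, worker, or watcher; a sketch in which `port_open` is a hypothetical helper:

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (like bash's /dev/tcp)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

Docker treats exit status 0 as healthy, so a healthcheck script would end with something like `sys.exit(0 if port_open("localhost", 6333) else 1)`.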

## nginx Reverse Proxy

On nuc01 (10.0.0.9):

- Server block proxies chrysopedia.com → ub01:8096
- SSL via Certbot (Let's Encrypt)
- SPA fallback: all paths return index.html

**Stale DNS after rebuild:** The web container's nginx on ub01 resolves the API container's internal IP when it starts. After rebuilding the API container, restart the web container so it picks up the new address:

```bash
docker compose restart chrysopedia-web-8096
```

## Rebuilding After Code Changes

```bash
# Full rebuild (backend + frontend)
cd /vmPool/r/repos/xpltdco/chrysopedia
git pull
docker compose build && docker compose up -d

# Frontend only
docker compose build chrysopedia-web-8096 && docker compose up -d chrysopedia-web-8096

# Backend only (API + Worker share same image)
docker compose build chrysopedia-api && docker compose up -d chrysopedia-api chrysopedia-worker

# Restart without rebuild
docker compose restart chrysopedia-api chrysopedia-worker
```

## Port Mapping

| Service | Container Port | Host Port | Binding |
|---------|----------------|-----------|---------|
| PostgreSQL | 5432 | 5433 | 0.0.0.0 |
| Web (nginx) | 80 | 8096 | 0.0.0.0 |
| SSH (Forgejo) | 22 | 2222 | 0.0.0.0 |

All other services (Redis, Qdrant, Ollama, API, Worker, Watcher) are internal-only.

## Monitoring

- **Web UI:** http://ub01:8096
- **API Health:** http://ub01:8096/health
- **Pipeline Admin:** http://ub01:8096/admin/pipeline
- **Worker Status:** http://ub01:8096/admin/pipeline (shows Celery worker count)
- **PostgreSQL:** Connect via `psql -h ub01 -p 5433 -U chrysopedia`

---

*See also: [[Architecture]], [[Development-Guide]]*

Development-Guide.md (new file, +134 lines)

# Development Guide

## Getting Started

### Prerequisites

- Docker + Docker Compose
- Node.js 18+ (for frontend dev)
- Python 3.11+ (for backend dev)
- SSH access to ub01

### Local Development

The simplest approach is to work directly on ub01:

```bash
ssh ub01
cd /vmPool/r/repos/xpltdco/chrysopedia
```

For frontend-only work, you can run the Vite dev server locally. It proxies `/api` requests to `localhost:8001`, so an API instance must be listening on that port (see *Host Port 8000 Conflict* below):

```bash
cd frontend
npm install
npm run dev   # Vite dev server with /api proxy to localhost:8001
```

## Project Structure

```
chrysopedia/
├── backend/
│   ├── config.py            # Settings (env vars, LRU cached)
│   ├── database.py          # Async SQLAlchemy engine + session
│   ├── main.py              # FastAPI app, router registration
│   ├── models.py            # All 13 SQLAlchemy models
│   ├── schemas.py           # Pydantic request/response schemas
│   ├── search_service.py    # Async search (Qdrant + keyword fallback)
│   ├── redis_client.py      # Async Redis client
│   ├── watcher.py           # Transcript folder watcher
│   ├── routers/             # FastAPI route handlers
│   ├── pipeline/            # Celery pipeline stages
│   │   ├── stages.py        # Stage implementations
│   │   └── quality/         # Prompt quality toolkit
│   ├── services/            # Business logic services
│   └── tests/               # pytest test suite
├── frontend/
│   └── src/
│       ├── App.tsx          # Routes, layout
│       ├── App.css          # All styles (5,820 lines)
│       ├── main.tsx         # React entry point
│       ├── api/             # API client (public-client.ts)
│       ├── pages/           # Page components (11)
│       ├── components/      # Shared components (11+)
│       ├── hooks/           # Custom hooks (3)
│       └── utils/           # Utilities (citations, slugs)
├── prompts/                 # LLM prompt templates
├── alembic/                 # DB migrations (if configured)
├── docker-compose.yml
├── Dockerfile.api
├── Dockerfile.web
└── CLAUDE.md                # AI agent development reference
```

## Common Gotchas

### asyncpg Timestamp Errors

Use `datetime.now(timezone.utc).replace(tzinfo=None)` for all timestamp defaults. asyncpg rejects timezone-aware datetimes for TIMESTAMP WITHOUT TIME ZONE columns.

### SQLAlchemy Column Name Conflicts

Never name a column `relationship`, `query`, or `metadata`: these names clash with SQLAlchemy ORM functions and attributes. If the schema requires such a name, import the function under an alias, e.g. `from sqlalchemy.orm import relationship as sa_relationship`.

### Vite Build Constants

Always wrap values with `JSON.stringify()`: `define: { __APP_VERSION__: JSON.stringify(version) }`. Vite substitutes `define` values into the code verbatim, so without it the built code contains unquoted values (a syntax error).

### Docker ARG/ENV Ordering

`ARG VITE_FOO=default` → `ENV VITE_FOO=$VITE_FOO` → `RUN npm run build`. The `ENV` line must appear before the build step.

### Slim Docker Images

`python:3.x-slim` doesn't include `procps` (no `pgrep` or `ps`). Use `python -c "import os; os.kill(1, 0)"` for healthchecks.

### Host Port 8000 Conflict

Port 8000 on ub01 may already be in use by kerf-engine. Use 8001 for local testing, or make sure kerf-engine is stopped.

### Nginx Stale DNS

After rebuilding the API container, restart the web container: `docker compose restart chrysopedia-web-8096`.

### ZFS Filesystem Watchers

Use `watchdog.observers.polling.PollingObserver` instead of the default inotify observer; inotify doesn't reliably detect changes on ZFS/NFS.

### File Stability for SCP Uploads

Wait for the file size to stabilize (check twice with a 2-second gap) before processing files received via SCP/rsync.
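
A sketch of that stability check; the function name and the `interval`, `checks`, and `max_wait` parameters are hypothetical, with defaults matching the two-samples, 2-second rule above:

```python
import os
import time


def wait_until_stable(path: str, interval: float = 2.0, checks: int = 2,
                      max_wait: float = 60.0) -> bool:
    """Return True once `path` reports the same size `checks` times in a row,
    sampled `interval` seconds apart; False if it is still changing after
    `max_wait` seconds. Raises FileNotFoundError if the file disappears."""
    deadline = time.monotonic() + max_wait
    last_size = -1
    stable = 0
    while time.monotonic() < deadline:
        size = os.path.getsize(path)
        # Count consecutive samples with an unchanged size
        stable = stable + 1 if size == last_size else 1
        if stable >= checks:
            return True
        last_size = size
        time.sleep(interval)
    return False
```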

## Testing

```bash
cd backend
python -m pytest tests/ -v
```

Tests use:

- `NullPool` for the async engine (prevents connection pool contention)
- Module-level patching for Celery stage globals (`_engine`, `_SessionLocal`)
- `patch('pipeline.stages.run_pipeline')` for mocking lazy imports (patch the source module, not the router)
## Adding New Features
|
||||||
|
|
||||||
|
### New API Endpoint
|
||||||
|
1. Create router in `backend/routers/foo.py` with `APIRouter(prefix="/foo", tags=["foo"])`
|
||||||
|
2. Register in `backend/main.py`: `app.include_router(foo.router, prefix="/api/v1")`
|
||||||
|
3. Define schemas in `backend/schemas.py`
|
||||||
|
4. Use paginated response: `{items, total, offset, limit}`
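A framework-agnostic sketch of that envelope, using a hypothetical `paginate` helper over an in-memory sequence (a real endpoint would apply `offset`/`limit` in the SQL query and count matching rows separately; the default `limit=20` is also an assumption). `total` is the size of the full result set, not of the returned page, so clients can tell how many items remain.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from typing import Any, Sequence


@dataclass
class Page:
    items: list[Any]
    total: int  # number of matching rows overall, not len(items)
    offset: int
    limit: int


def paginate(rows: Sequence[Any], offset: int = 0, limit: int = 20) -> dict:
    """Slice `rows` and wrap the slice in the `{items, total, offset, limit}` envelope."""
    if offset < 0 or limit < 1:
        raise ValueError("offset must be >= 0 and limit >= 1")
    page = Page(items=list(rows[offset:offset + limit]), total=len(rows), offset=offset, limit=limit)
    return asdict(page)
```

For example, `paginate(list(range(45)), offset=40, limit=20)` returns the last five items (`40` through `44`) with `total` still equal to 45.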

### New Frontend Route
1. Add `<Route>` to `App.tsx`
2. Create page component in `frontend/src/pages/`
3. Call `useDocumentTitle()` in the component
4. Add API functions to `public-client.ts`

### New Database Model
1. Add to `backend/models.py`
2. Add schemas to `backend/schemas.py`
3. Apply DDL manually or via Alembic migration
4. Use `_now()` helper for timestamp defaults

### New CSS
1. Append to `App.css` using BEM naming
2. Use CSS custom properties for all colors
3. Prefer 768px breakpoint for mobile/desktop split
4. Namespace Phase 2 selectors: `.p2-feature__element`

---

*See also: [[Architecture]], [[Frontend]], [[Deployment]]*
|
||||||
110
Frontend.md
Normal file
110
Frontend.md
Normal file
|
|
@ -0,0 +1,110 @@
|
||||||
|
# Frontend

React 18 + TypeScript + Vite SPA. No UI library, no state management library, no CSS framework.

## Route Map

| Route | Page Component | Auth | Notes |
|-------|---------------|------|-------|
| `/` | Home | Public | Hero search, stats counters, popular topics, nav cards |
| `/search` | SearchResults | Public | Sort, highlights, partial matches |
| `/techniques/:slug` | TechniquePage | Public | v2 body sections, ToC sidebar, citations |
| `/creators` | CreatorsBrowse | Public | Random default sort, genre filters |
| `/creators/:slug` | CreatorDetail | Public | Avatar, stats, technique list |
| `/topics` | TopicsBrowse | Public | 7 category cards, expandable sub-topics |
| `/topics/:category/:subtopic` | SubTopicPage | Public | Creator-grouped techniques |
| `/about` | About | Public | Static project info |
| `/admin/reports` | AdminReports | Admin* | Content reports |
| `/admin/pipeline` | AdminPipeline | Admin* | Pipeline management |
| `/admin/techniques` | AdminTechniquePages | Admin* | Technique page admin |
| `*` | → Redirect `/` | — | SPA fallback |

\*Admin routes have no authentication gate.

**Routing:** All routes in a single `<Routes>` block in `App.tsx`. nginx returns the SPA shell for all paths; react-router-dom v6 handles client-side routing.

## Shared Components

| Component | Purpose |
|-----------|---------|
| SearchAutocomplete | Global search with Ctrl+Shift+F shortcut (nav + mobile instances) |
| AdminDropdown | Hover-open at desktop, tap-toggle on mobile |
| AppFooter | Version, build date, GitHub link |
| TableOfContents | Sticky sidebar ToC with IntersectionObserver scroll-spy |
| SortDropdown | Reusable sort selector |
| TagList | Tag/badge pills with +N overflow |
| CategoryIcons | SVG icons per topic category |
| CreatorAvatar | Avatar with fallback |
| CopyLinkButton | Clipboard copy with tooltip |
| SocialIcons | Social media link icons (9 platforms) |
| ReportIssueModal | Content report submission |

## Hooks

| Hook | Purpose |
|------|---------|
| useCountUp | Animated counter for homepage stats |
| useSortPreference | Persists sort preference in localStorage |
| useDocumentTitle | Sets `<title>` per page (all 10 pages instrumented) |

## State Management

Local component state only (`useState`/`useEffect`). No Redux, Zustand, Context providers, or external state management library.

## API Client

Single module `public-client.ts` (~600 lines) with typed `request<T>` helper. Relative `/api/v1` base URL (nginx proxies to API container). All response TypeScript interfaces defined in the same file.

## CSS Architecture

| Property | Value |
|----------|-------|
| File | `frontend/src/App.css` |
| Lines | 5,820 |
| Unique classes | ~589 |
| Naming | BEM (`block__element--modifier`) |
| Theme | Dark-only (no light mode) |
| Custom properties | 77 in `:root` (D017) |
| Accent color | Cyan `#22d3ee` |
| Font stack | System fonts |
| Preprocessor | None |
| CSS Modules | None |

### Custom Property Categories (77 total)

- **Surface colors:** page background, card backgrounds, nav, footer, input
- **Text colors:** primary, secondary, muted, inverse, link, heading
- **Accent colors:** primary cyan, hover/active, focus rings
- **Badge colors:** Per-category pairs (bg + text) for 7 topic categories
- **Status colors:** Success/warning/error/info
- **Border colors:** Default, hover, focus, divider
- **Shadow colors:** Elevation, glow effects
- **Overlay colors:** Modal/dropdown overlays

### Breakpoints

| Breakpoint | Usage |
|-----------|-------|
| 480px | Narrow mobile — compact cards |
| 600px | Wider mobile — grid adjustments |
| 640px | Small tablet — content width |
| 768px | Desktop ↔ mobile transition — sidebar collapse |

### Layout Patterns

- **Page max-width:** 64rem (D019)
- **Technique page:** CSS grid 2-column (1fr + 22rem sidebar), collapses at 768px
- **Card layouts:** CSS grid with `auto-fill, minmax(...)` for responsive grids
- **Collapsible sections:** `grid-template-rows: 0fr/1fr` animation
- **Sticky elements:** ToC sidebar, reading header

## Build

- **Bundler:** Vite
- **Build-time constants:** `__APP_VERSION__`, `__BUILD_DATE__`, `__GIT_COMMIT__` via `define` (must use `JSON.stringify`)
- **Dev proxy:** `/api` → `localhost:8001`
- **Production:** nginx serves static `dist/` bundle, proxies `/api` to FastAPI container

---

*See also: [[Architecture]], [[API-Surface]], [[Development-Guide]]*

108
Pipeline.md
Normal file
# Pipeline

6-stage LLM-powered extraction pipeline (stage 3 is reserved) that transforms video transcripts into structured technique articles.

## Pipeline Stages

```
Video File
↓
[Desktop] Whisper large-v3 (RTX 4090) → transcript JSON
↓
[Watcher/API] Ingest → SourceVideo + TranscriptSegments in PostgreSQL
↓
Stage 1: Transcript Segmentation — chunk transcript into logical segments
↓
Stage 2: Key Moment Extraction — identify teachable moments with timestamps
↓
Stage 3: (reserved)
↓
Stage 4: Classification & Tagging — assign topic_category + topic_tags per moment
↓
Stage 5: Technique Page Synthesis — compose study guide articles from moments
↓
Stage 6: Embed & Index — generate embeddings, upsert to Qdrant (non-blocking)
```

## Stage Details

### Stage 1: Transcript Segmentation
- Chunks raw transcript into logical segments
- Input: TranscriptSegments from DB
- Output: Segmented data for stage 2

### Stage 2: Key Moment Extraction
- Identifies teachable moments with titles, summaries, timestamps
- Uses LLM with prompt template from `prompts/` directory
- Output: KeyMoment records in PostgreSQL

### Stage 4: Classification & Tagging
- Assigns topic_category and topic_tags to each key moment
- References canonical tag list (`canonical_tags.yaml`) with aliases
- Output: Classification data stored in Redis (`chrysopedia:classification:{video_id}`, 24h TTL)
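A sketch of that storage pattern, written against any client that exposes redis-py's `set(name, value, ex=...)` and `get(name)` methods (such as `redis.Redis`). The key format and 24-hour TTL come from the text above; the JSON payload shape and the helper names are assumptions.

```python
from __future__ import annotations

import json
from typing import Any, Protocol

CLASSIFICATION_TTL_SECONDS = 24 * 60 * 60  # 24h; Redis deletes the key after this


class KeyValueStore(Protocol):
    """The subset of redis-py's client API this sketch relies on."""

    def set(self, name: str, value: str, ex: int | None = None) -> Any: ...

    def get(self, name: str) -> Any: ...


def classification_key(video_id: str) -> str:
    return f"chrysopedia:classification:{video_id}"


def store_classification(client: KeyValueStore, video_id: str, moments: list[dict]) -> None:
    # One JSON blob per video; `ex` sets the expiry in the same command as the write.
    client.set(classification_key(video_id), json.dumps(moments), ex=CLASSIFICATION_TTL_SECONDS)


def load_classification(client: KeyValueStore, video_id: str) -> list[dict] | None:
    raw = client.get(classification_key(video_id))
    # redis-py returns bytes by default; json.loads accepts bytes or str.
    return None if raw is None else json.loads(raw)  # None once the TTL has expired
```

Because the data expires on its own, a stage 5 run that starts more than 24 hours after stage 4 would find nothing under the key, which is the trade-off of keeping it out of DB columns.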

### Stage 5: Technique Page Synthesis
- Composes study guide articles from classified key moments
- Handles multi-source merging: new video moments merge into existing technique pages
- Uses offset-based citation indexing: a page's N existing citations keep indices `[0]` through `[N-1]`, and the M citations from the new video are numbered `[N]` through `[N+M-1]`, so existing citations never need renumbering
- Creates pre-overwrite version snapshot before mutating existing pages (D018)
- Output: TechniquePage records with body_sections (v2 format), signal_chains, plugins
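The offset rule can be sketched as a renumbering pass over newly synthesized text, assuming citations appear as bare `[k]` markers (the marker syntax and the `offset_citations` name are assumptions, not the stage's actual code). With N existing citations, every index k in the new text becomes k + N.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")  # assumed marker syntax: [0], [1], ...


def offset_citations(new_text: str, existing_count: int) -> str:
    """Shift citation indices in new text past the existing ones.

    Existing citations keep [0]..[N-1]; the new text's [0]..[M-1] become [N]..[N+M-1].
    """
    return CITATION.sub(lambda m: f"[{int(m.group(1)) + existing_count}]", new_text)
```

For example, `offset_citations("Sidechain the bass [0][1].", 4)` returns `"Sidechain the bass [4][5]."`, leaving the existing page's `[0]` through `[3]` untouched.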

### Stage 6: Embed & Index
- Generates embeddings via Ollama (nomic-embed-text)
- Embedding text enriched with creator_name and topic_tags (D023)
- Upserts to Qdrant with deterministic UUIDs based on content
- **Non-blocking:** Failures log WARNING but don't fail the pipeline (D005)
- Can be re-triggered independently via `/admin/pipeline/reindex-all`
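A standard-library sketch of the two ideas above. `uuid.uuid5` always derives the same UUID from the same input, so re-indexing an item upserts (overwrites) its Qdrant point instead of adding a duplicate. The seed string, the use of `uuid.NAMESPACE_URL` as namespace, and the embedding text layout are illustrative assumptions; exactly which content the project hashes is not specified here.

```python
from __future__ import annotations

import uuid


def point_id(seed: str) -> str:
    """Deterministic Qdrant point ID: the same seed always yields the same UUID."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"chrysopedia:{seed}"))


def embedding_text(title: str, summary: str, creator_name: str, topic_tags: list[str]) -> str:
    # D023: fold the creator and tags into the embedded text so a query that
    # names a creator or a tag can match semantically, not just the title/summary.
    tags = ", ".join(topic_tags)
    return f"{title}\nCreator: {creator_name}\nTags: {tags}\n{summary}"
```

With qdrant-client, the resulting ID and vector would go into a `PointStruct(id=..., vector=..., payload=...)` passed to `upsert`.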

## LLM Configuration

| Setting | Value |
|---------|-------|
| Primary LLM | DGX Sparks Qwen (OpenAI-compatible API) |
| Fallback LLM | Local Ollama |
| Embedding model | nomic-embed-text (Ollama) |
| Model routing | Per-stage configuration (chat vs thinking models) |

## Prompt Template System

- Prompt files stored in `prompts/` directory (D013)
- Templates use XML-style content fencing
- Editable without code changes — pipeline reads from disk at runtime
- SHA-256 hashes tracked in TechniquePageVersion.pipeline_metadata for reproducibility
- Re-process after prompt edits via `POST /admin/pipeline/trigger/{video_id}`
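A sketch of how per-template SHA-256 hashes could be computed for `pipeline_metadata`, assuming each prompt is a regular file directly inside `prompts/`. The `prompt_hashes` name and the filename-to-digest mapping are assumptions about layout, not the project's actual schema.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def prompt_hashes(prompt_dir: str | Path) -> dict[str, str]:
    """Map each prompt file name to the SHA-256 hex digest of its exact bytes."""
    hashes: dict[str, str] = {}
    for path in sorted(Path(prompt_dir).iterdir()):
        if path.is_file():  # skip subdirectories
            hashes[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return hashes
```

Comparing digests stored with a page version against a fresh `prompt_hashes("prompts")` result is one way to tell which pages were synthesized with an older version of a prompt.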

## Pipeline Admin Features

- **Debug mode:** Redis-backed toggle captures full LLM I/O (system prompt, user prompt, response) in pipeline_events
- **Token tracking:** Per-event and per-video token usage visible in admin UI
- **Stale page detection:** Identifies pages needing regeneration
- **Bulk operations:** Bulk resynthesize, wipe all output, reindex all
- **Worker status:** Real-time Celery worker health check

## Prompt Quality Toolkit

CLI tool (`python -m pipeline.quality`) with:
- **LLM fitness suite** — 9 tests (Mandelbrot reasoning, JSON compliance, instruction following)
- **5-dimension quality scorer** with voice preservation dial
- **Automated prompt A/B optimization loop** — LLM-powered variant generation, iterative scoring, leaderboard
- **Multi-stage support** for pipeline stages 2-5 with per-stage rubrics and fixtures

## Key Design Decisions

- **Sync clients in Celery** (D004): openai.OpenAI, QdrantClient, sync SQLAlchemy. Avoids nested event loop errors.
- **Non-blocking embedding** (D005): Stage 6 failures don't block core pipeline output.
- **Redis for stage 4 data**: Classification results in Redis with 24h TTL, not DB columns.
- **Best-effort versioning** (D018): Version snapshot failure doesn't block page update.
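D005 and D018 share one best-effort pattern, sketched here with hypothetical helpers: the optional step runs inside a `try`, any exception is logged at WARNING with its traceback, and the main work proceeds regardless.

```python
import logging
from typing import Callable

logger = logging.getLogger("pipeline")


def run_best_effort(step_name: str, step: Callable[[], object]) -> bool:
    """Run a non-critical step; log and swallow failures instead of raising."""
    try:
        step()
        return True
    except Exception:
        # exc_info=True keeps the traceback in the log without failing the task.
        logger.warning("%s failed; continuing without it", step_name, exc_info=True)
        return False


def update_page(page: dict, new_body: str, snapshot: Callable[[dict], None]) -> dict:
    # D018: try to snapshot the current page first, but never block the update on it.
    run_best_effort("version snapshot", lambda: snapshot(dict(page)))
    page["body"] = new_body
    return page
```

Catching `Exception` rather than `BaseException` keeps `KeyboardInterrupt` and `SystemExit` propagating normally.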

## Transcript Watcher

Standalone service (`watcher.py`) monitors `/vmPool/r/services/chrysopedia_watch/` for new transcript JSON files:
- Uses `watchdog.observers.polling.PollingObserver` for ZFS reliability
- Validates file structure, waits for size stability (handles partial SCP writes)
- POSTs to ingest API on file detection
- Moves processed files to `processed/`, failures to `failed/` with `.error` sidecar
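The filing step can be sketched with the standard library; `process` stands in for the validate-and-POST logic. The helper name, creating `processed/` and `failed/` next to the watched file, and writing the traceback into a `<name>.error` sidecar are assumptions beyond what the list above states.

```python
import shutil
import traceback
from pathlib import Path
from typing import Callable


def handle_transcript(path: Path, process: Callable[[Path], None]) -> Path:
    """Run `process` on a transcript, then move it to processed/ or failed/.

    On failure, an `.error` sidecar holding the traceback is written beside the moved file.
    """
    watch_dir = path.parent
    try:
        process(path)
    except Exception:
        failed_dir = watch_dir / "failed"
        failed_dir.mkdir(exist_ok=True)
        dest = failed_dir / path.name
        shutil.move(str(path), dest)
        # format_exc() must run inside the except block to capture the active exception.
        dest.with_name(dest.name + ".error").write_text(traceback.format_exc())
        return dest
    processed_dir = watch_dir / "processed"
    processed_dir.mkdir(exist_ok=True)
    dest = processed_dir / path.name
    shutil.move(str(path), dest)
    return dest
```

Moving the file out of the watched directory in both cases means the same transcript is never picked up twice.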

---

*See also: [[Architecture]], [[Data-Model]], [[Deployment]]*

17
_Sidebar.md
Normal file
### Chrysopedia Wiki

- [[Home]]

**Architecture**
- [[Architecture]]
- [[Data-Model]]
- [[Pipeline]]

**Reference**
- [[API-Surface]]
- [[Frontend]]
- [[Decisions]]

**Operations**
- [[Deployment]]
- [[Development-Guide]]