Chrysopedia — Project Context Document
Auto-generated: 2026-04-01 | Assessed Stage: Integration/Stabilization | Root: /home/aux/projects/content-to-kb-automator
Overview
Chrysopedia is a self-hosted knowledge extraction and retrieval system for electronic music production content. It takes raw video files (tutorials, livestreams, track breakdowns) from 50+ electronic music producers, transcribes them via Whisper, runs them through a multi-stage LLM pipeline to extract structured knowledge, and serves the results through a search-first web UI designed for mid-session retrieval — a producer Alt+Tabs from their DAW, searches for a technique, absorbs the answer, and gets back to work in under 30 seconds.
Audience: Electronic music producers, primarily one power user (the project owner) with a personal library of 100-500 video files. Single-admin tool, not multi-tenant.
Project type: Full-stack web application with an LLM-powered data pipeline. Monorepo with backend (Python/FastAPI), frontend (React/TypeScript), Whisper transcription script, Docker Compose deployment, and prompt engineering toolkit.
Evidence for purpose: Extensive 37-page spec (chrysopedia-spec.md), README with architecture diagrams, detailed PROJECT.md in GSD artifacts, 23 decisions logged, 32 requirements tracked (28 validated, 1 active, 4 out-of-scope). Etymology: chrysopoeia (alchemical transmutation) + encyclopedia.
Canonical development directory: This is not the active development location. Per CLAUDE.md, all future development happens on ub01 at /vmPool/r/repos/xpltdco/chrysopedia. This directory was the initial workspace. GitHub: github.com/xpltdco/chrysopedia (private, xpltdco org).
Architecture & Stack
Technology Stack
| Layer | Technology | Version/Notes |
|---|---|---|
| Backend | Python 3.12, FastAPI, SQLAlchemy (async), Pydantic Settings | API + business logic |
| Task Queue | Celery + Redis (broker + result backend) | Sync tasks, concurrency=1 |
| Database | PostgreSQL 16 (asyncpg driver) | Primary data store |
| Vector DB | Qdrant v1.13.2 | Semantic search embeddings |
| Embeddings | Ollama (nomic-embed-text, 768-dim) | Local CPU inference |
| LLM | OpenAI-compatible API (DGX Sparks Qwen primary, Ollama fallback) | Per-stage model routing (chat vs thinking) |
| Frontend | React 18.3, TypeScript 5.6, Vite 6, React Router 6.28 | Zero UI libraries — all custom CSS |
| Web Server | nginx 1.27 (Alpine) | SPA routing + API proxy |
| Containerization | Docker Compose | 8 services, dedicated bridge network |
| Deployment | ub01 (on-premises server) | Bind mounts to /vmPool/r/services/chrysopedia_* |
| Reverse Proxy | nginx on nuc01 (separate machine) | Routes chrysopedia.xpltd.co → ub01:8096 |
System Architecture
Desktop (GPU workstation — hal0022)
└── whisper/transcribe.py → JSON transcripts → SCP/rsync to /watch folder
Docker Compose on ub01 (8 services on 172.32.0.0/24):
┌──────────────┐   ┌────────┐   ┌────────┐   ┌────────┐
│  PostgreSQL  │   │ Redis  │   │ Qdrant │   │ Ollama │
│  :5433→5432  │   │ broker │   │ vector │   │ embed  │
└──────┬───────┘   └───┬────┘   └───┬────┘   └───┬────┘
       └───────────────┴────────────┴────────────┘
                       │
┌──────────────┬───────┴─────────┬───────────────┐
│ FastAPI API  │  Celery Worker  │    Watcher    │
│ REST + admin │  LLM pipeline   │  /watch→POST  │
└──────────────┴────────┬────────┴───────────────┘
                        │
               ┌────────┴─────────┐
               │ nginx (React SPA)│
               │     :8096→80     │
               └──────────────────┘
Data flow: Video → Whisper transcript JSON → Watcher POSTs to /api/v1/ingest → Celery pipeline (4 LLM stages: segment → extract → classify → synthesize) → KeyMoments + TechniquePages in PostgreSQL → Embeddings in Qdrant → Search-first web UI.
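A minimal sketch of that staged hand-off, with trivial stand-ins for the real LLM-backed stages in `pipeline/stages.py` (the stage bodies here are illustrative placeholders, not the actual prompts or parsing logic):

```python
from typing import Callable

# Hypothetical stand-ins for the four LLM stages.
def segment(transcript: str) -> list[str]:
    return [s.strip() for s in transcript.split(".") if s.strip()]

def extract(segments: list[str]) -> list[dict]:
    return [{"text": s, "moment": True} for s in segments]

def classify(moments: list[dict]) -> list[dict]:
    return [{**m, "category": "sound-design"} for m in moments]

def synthesize(moments: list[dict]) -> dict:
    return {"title": "Technique", "sections": [m["text"] for m in moments]}

def run_pipeline(transcript: str) -> dict:
    """Run the stages inline, each consuming the previous stage's output."""
    stages: list[Callable] = [segment, extract, classify, synthesize]
    data = transcript
    for stage in stages:
        data = stage(data)
    return data

page = run_pipeline("Use a notch filter. Automate the cutoff.")
```

The real orchestrator adds JSON parsing, error recovery, and DB writes around each stage, but the data flow is this same linear chain.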
External integrations:
- OpenWebUI at `chat.forgetyour.name` (DGX Sparks Qwen models for LLM inference)
- AdGuard DNS on ub01 for internal domain resolution
- nginx on nuc01 for external HTTPS termination (via Certbot)
Data Model
11 entities across 11 tables:
| Entity | Purpose | Key Fields |
|---|---|---|
| Creator | Artists/producers | name, slug, genres[], folder_name, hidden |
| SourceVideo | Processed video files | filename, content_hash (dedup), processing_status, classification_data (JSONB) |
| TranscriptSegment | Whisper output rows | start_time, end_time, text, segment_index, topic_label |
| KeyMoment | LLM-extracted insights | title, summary, start_time, end_time, content_type, plugins[] |
| TechniquePage | Synthesized knowledge (primary output) | title, slug, topic_category, topic_tags[], body_sections (JSONB), signal_chains (JSONB), plugins[] |
| TechniquePageVersion | Pre-overwrite snapshots | content_snapshot (JSONB), pipeline_metadata (JSONB), version_number |
| RelatedTechniqueLink | Cross-references | source→target, relationship type |
| Tag | Topic taxonomy | name, category, aliases[] |
| ContentReport | User-reported issues | report_type, status, admin_notes |
| PipelineRun | Pipeline execution record | video_id, run_number, trigger, status, total_tokens |
| PipelineEvent | Per-stage execution log | stage, event_type, token counts, payload (JSONB), debug I/O columns |
Relationships: Creator → SourceVideo → TranscriptSegment, KeyMoment; Creator → TechniquePage → KeyMoment, TechniquePageVersion, RelatedTechniqueLink; SourceVideo → PipelineRun → PipelineEvent.
Migrations: 11 Alembic migrations (001 through 011), covering initial schema through pipeline runs and classification cache additions.
Project Structure
chrysopedia/
├── backend/ # FastAPI application (10,209 LOC Python)
│ ├── main.py # App entry, middleware, router mounting
│ ├── config.py # Pydantic Settings (all env vars)
│ ├── database.py # Async engine + session factory
│ ├── models.py # 11 SQLAlchemy ORM models
│ ├── schemas.py # Pydantic request/response schemas (422 lines)
│ ├── worker.py # Celery app config
│ ├── watcher.py # Folder monitor → auto-ingest service
│ ├── search_service.py # Async semantic + keyword search (603 lines)
│ ├── redis_client.py # Redis client for feature flags
│ ├── routers/ # 9 API router modules
│ │ ├── health.py, ingest.py, search.py, techniques.py
│ │ ├── creators.py, topics.py, videos.py
│ │ ├── pipeline.py (admin), reports.py
│ ├── pipeline/ # LLM pipeline core (2,908 LOC)
│ │ ├── stages.py # 4 LLM stages + orchestrator (2,102 lines — largest file)
│ │ ├── llm_client.py # OpenAI-compatible sync client with fallback
│ │ ├── embedding_client.py # Sync embedding client for Celery
│ │ ├── qdrant_client.py # Qdrant upsert + collection management
│ │ ├── schemas.py # Pipeline data schemas
│ │ └── quality/ # Prompt optimization toolkit (2,507 LOC)
│ │ ├── fitness.py # LLM fitness test suite (9 tests)
│ │ ├── scorer.py # 5-dimension LLM-as-judge scoring
│ │ ├── optimizer.py # Automated prompt A/B optimization
│ │ ├── variant_generator.py # LLM-powered prompt mutation
│ │ └── voice_dial.py # Voice preservation dial
│ └── tests/ # Integration tests (2,754 LOC, 65 tests)
├── frontend/ # React SPA (9,975 LOC TypeScript + CSS)
│ └── src/
│ ├── pages/ # 10 page components
│ ├── components/ # 9 shared components
│ ├── hooks/ # 2 custom hooks
│ ├── api/ # Typed API client
│ └── App.css # 4,871 lines — all styles (no CSS framework)
├── whisper/ # Desktop transcription scripts
├── prompts/ # 3 active prompt templates + 100 stage5 variants
├── alembic/ # 11 database migrations
├── config/ # canonical_tags.yaml (7-category topic taxonomy)
├── docker/ # Dockerfile.api, Dockerfile.web, nginx.conf
├── docker-compose.yml # 8-service stack definition
├── generate_stage5_variants.py # Stage 5 prompt variant generator (874 lines — one-off tool)
├── .gsd/ # GSD project management artifacts
│ ├── PROJECT.md, REQUIREMENTS.md, DECISIONS.md, KNOWLEDGE.md
│ └── milestones/ # 13 completed milestone artifacts
└── .env.example # Environment variable template
Entry points:
- `backend/main.py` → FastAPI app (`uvicorn main:app`)
- `backend/worker.py` → Celery worker (`celery -A worker worker`)
- `backend/watcher.py` → Folder watcher service (`python watcher.py`)
- `frontend/src/main.tsx` → React app (Vite dev server or nginx-served build)
- `whisper/transcribe.py` → Desktop transcription CLI
- `backend/pipeline/quality/__main__.py` → Prompt quality toolkit CLI
Configuration & Environment
Environment Variables
| Variable | Purpose | Default |
|---|---|---|
| `POSTGRES_USER` | Database user | `chrysopedia` |
| `POSTGRES_PASSWORD` | Database password | `changeme` |
| `POSTGRES_DB` | Database name | `chrysopedia` |
| `DATABASE_URL` | Full async connection string | Composed from above |
| `REDIS_URL` | Redis broker URL | `redis://chrysopedia-redis:6379/0` |
| `LLM_API_URL` | Primary LLM endpoint | OpenWebUI on DGX |
| `LLM_API_KEY` | LLM authentication | Required |
| `LLM_MODEL` | Default LLM model name | `fyn-llm-agent-chat` |
| `LLM_FALLBACK_URL` / `_MODEL` | Fallback LLM endpoint | Same as primary |
| `LLM_STAGE{2-5}_MODEL` | Per-stage model override | chat for 2/4, think for 3/5 |
| `LLM_STAGE{2-5}_MODALITY` | `chat` or `thinking` per stage | See above |
| `LLM_MAX_TOKENS` | LLM response token limit | 32768 |
| `LLM_TEMPERATURE` | LLM temperature | 0.0 (deterministic) |
| `SYNTHESIS_CHUNK_SIZE` | Max moments per synthesis call | 30 |
| `EMBEDDING_API_URL` | Ollama embedding endpoint | Container-internal |
| `EMBEDDING_MODEL` | Embedding model name | `nomic-embed-text` |
| `EMBEDDING_DIMENSIONS` | Vector dimensionality | 768 |
| `QDRANT_URL` | Qdrant endpoint | Container-internal |
| `QDRANT_COLLECTION` | Qdrant collection name | `chrysopedia` |
| `APP_ENV` | Environment name | `development` |
| `APP_LOG_LEVEL` | Log level | `info` |
| `APP_SECRET_KEY` | Application secret | `changeme-generate-a-real-secret` |
| `CORS_ORIGINS` | Allowed CORS origins | `["*"]` |
| `REVIEW_MODE` | Require admin review of moments | `true` |
| `DEBUG_MODE` | Capture full LLM I/O in events | `false` |
| `TRANSCRIPT_STORAGE_PATH` | Transcript file storage | `/data/transcripts` |
| `VIDEO_METADATA_PATH` | Video metadata storage | `/data/video_meta` |
| `PROMPTS_PATH` | Prompt template directory | `./prompts` |
| `GIT_COMMIT_SHA` | Build-time commit hash | `unknown` |
| `WATCH_FOLDER` | Watcher monitored directory | `/watch` |
| `WATCHER_API_URL` | Ingest endpoint for watcher | Container-internal |
| `WATCHER_STABILITY_SECONDS` | File stability wait time | 2 |
| `WATCHER_POLL_INTERVAL` | Filesystem poll interval | 5 |
| `GIT_COMMIT_SHA` (build arg) | Passed at Docker build time for footer | `dev` |
| `VITE_GIT_COMMIT` (build arg) | Frontend build-time constant | `dev` |
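`backend/config.py` reads these variables via Pydantic Settings; a dependency-free sketch of the same env-with-default pattern using only the standard library (field names are illustrative, not the real `config.py` attributes):

```python
import os
from dataclasses import dataclass, field


@dataclass
class Settings:
    """Each field is read from the environment at instantiation,
    falling back to the documented default."""
    llm_max_tokens: int = field(
        default_factory=lambda: int(os.environ.get("LLM_MAX_TOKENS", "32768")))
    llm_temperature: float = field(
        default_factory=lambda: float(os.environ.get("LLM_TEMPERATURE", "0.0")))
    app_env: str = field(
        default_factory=lambda: os.environ.get("APP_ENV", "development"))


settings = Settings()
```

Pydantic Settings adds validation and `.env` file loading on top of this, but the override order is the same: environment variable wins, default otherwise.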
Environments
- Production: Docker Compose on ub01, `.env` file with real credentials
- Local dev: Backend runs locally with `docker compose up -d chrysopedia-db chrysopedia-redis`, `.env` in backend/
- Test: Uses real PostgreSQL (test database), configured in `backend/tests/conftest.py`
- No staging environment exists.
Secrets Management
Environment variables via .env file (gitignored). No vault, KMS, or sealed secrets. The .env.example contains placeholders. backend/.env exists locally (not tracked in git) and contains a real API key — this is expected for local dev but the key should be rotated if this directory is ever shared.
Development Workflow
Getting Started
# 1. Clone the repo
git clone git@github.com:xpltdco/chrysopedia.git
cd chrysopedia
# 2. Configure environment
cp .env.example .env
# Edit .env with real LLM_API_KEY and POSTGRES_PASSWORD
# 3. Start infrastructure
docker compose up -d
# 4. Run migrations
docker exec chrysopedia-api alembic upgrade head
# 5. Pull embedding model (first time)
docker exec chrysopedia-ollama ollama pull nomic-embed-text
# 6. Verify
curl http://localhost:8096/health
For local backend development (outside Docker):
python -m venv .venv && source .venv/bin/activate
pip install -r backend/requirements.txt
docker compose up -d chrysopedia-db chrysopedia-redis # just infra
alembic upgrade head
cd backend && uvicorn main:app --reload --host 0.0.0.0 --port 8001 # 8001 to avoid kerf-engine conflict on 8000
For frontend development:
cd frontend && npm ci && npm run dev
Key Commands
| Task | Command |
|---|---|
| Start full stack | docker compose up -d |
| Rebuild after code changes | docker compose build && docker compose up -d |
| Run migrations | docker exec chrysopedia-api alembic upgrade head |
| Create migration | alembic revision --autogenerate -m "description" |
| View API logs | docker logs -f chrysopedia-api |
| View worker logs | docker logs -f chrysopedia-worker |
| Run tests | cd backend && pytest |
| Frontend dev server | cd frontend && npm run dev |
| Frontend build | cd frontend && npm run build |
| Prompt quality CLI | cd backend && python -m pipeline.quality |
| Deploy to ub01 | ssh ub01; cd /vmPool/r/repos/xpltdco/chrysopedia; git pull && docker compose build && docker compose up -d |
CI/CD Pipeline
None. No .github/workflows/, no CI config files. Deployment is manual: git pull && docker compose build && docker compose up -d on ub01. [inferred — high confidence based on absence of any CI configuration]
Code Conventions
- Python: No linter config (no ruff, black, flake8 config files found). Code follows PEP 8 by convention. Type hints used throughout (Python 3.12 features like `X | None`).
- TypeScript: No ESLint config. TypeScript strict mode via tsconfig. Zero-dependency UI (no UI libraries, no Tailwind).
- CSS: Single monolithic `App.css` (4,871 lines). 77 CSS custom properties for theming. Dark theme with cyan accent (#22d3ee).
- Naming: Slugified URLs, snake_case Python, camelCase TypeScript. SQLAlchemy models use `Mapped` annotations. Pydantic schemas use `model_config = {"from_attributes": True}`.
- No pre-commit hooks, no `.editorconfig`, no formatter configs.
Current State Assessment
Stage: Integration/Stabilization — All 13 milestones complete. 28 of 32 requirements validated. 171 commits over 3 days (March 29–April 1, 2026) by a single contributor. The system is deployed and running. However, it was built rapidly by AI agents (GSD workflow), the pipeline is running inline (not via Celery chain as originally designed per recent commit 29f6e74), and there are no CI/CD guardrails. The codebase is functional but hasn't been through the hardening that comes from sustained multi-user operation.
Recent Activity
- 171 commits from 2026-03-29 to 2026-04-01 (3 days of intense development)
- Single contributor: jlightner
- Last commit: `29f6e74` — "pipeline: run stages inline instead of Celery chain dispatch"
- Most recent work: Stage 5 prompt optimization (100 variant prompts generated), inline pipeline execution, prompt quality toolkit (M013)
Active Branches
Only main exists. All development has been on a single branch. No feature branches, no release branches.
What's Working
- Full 6-stage pipeline (transcription → ingestion → LLM extraction → review → synthesis → search)
- Docker Compose deployment with 8 services, healthchecks on all containers
- Search (semantic via Qdrant + keyword fallback with multi-token AND matching)
- Admin review queue with approve/edit/reject workflows
- Pipeline admin dashboard with event logs, token usage, retrigger controls
- 10-page React SPA with responsive design, topic taxonomy, creator browse, technique detail
- Folder watcher for auto-ingestion of new transcripts
- Article versioning with pipeline metadata snapshots
- 65 integration tests covering all major API paths
- Prompt quality toolkit (fitness tests, scoring, automated optimization)
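The keyword fallback's multi-token AND matching can be illustrated with a small stand-in (the real implementation in `search_service.py` builds AND-ed `ILIKE` clauses in SQL; this Python version mirrors the semantics only):

```python
def keyword_match(query: str, text: str) -> bool:
    """Every query token must appear in the text, case-insensitively,
    mirroring a chain of AND-ed ILIKE '%token%' clauses."""
    haystack = text.lower()
    return all(tok in haystack for tok in query.lower().split())


titles = [
    "Sidechain compression on a sub bass",
    "Reverb tails in ambient pads",
]
hits = [t for t in titles if keyword_match("sub sidechain", t)]
```

Token order doesn't matter and partial-word matches are allowed, which is why this works as a recall-oriented fallback behind the semantic search.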
What's In Progress
- Stage 5 prompt optimization: 100 variant prompts generated (`prompts/stage5_variants/`), active A/B testing with the quality toolkit. The most recent commits are all prompt refinement.
- Inline pipeline execution: The latest commit switches from Celery chain dispatch to inline stage execution, suggesting the Celery chaining had issues.
- `generate_stage5_variants.py` (874 lines) is a one-off script at project root — should likely be absorbed into the quality toolkit or removed.
Technical Debt Inventory
Zero TODOs/FIXMEs/HACKs in source code. All annotations found were in node_modules/ (third-party). This is notable — either debt was addressed as it arose, or code annotations weren't used as a practice.
Implicit debt captured in KNOWLEDGE.md:
- QdrantManager uses random UUIDs for point IDs, causing duplicates on re-index (noted as deferred fix — use deterministic UUIDs)
- LLM-generated topic categories have inconsistent casing (deferred)
- Stage 4 classification data stored in Redis with 24h TTL instead of DB columns (expedient but fragile)
Structural debt:
- `frontend/src/App.css` — 4,871-line monolithic stylesheet. No CSS modules, no component-scoped styles.
- `backend/pipeline/stages.py` — 2,102 lines. All 4 LLM stages + orchestrator in one file.
- `generate_stage5_variants.py` — 874-line one-off script at project root.
- `prompts/stage5_variants/.v016.txt.swp` — vim swap file committed (harmless but untidy).
- No authentication on any endpoint (admin or public). Single-admin tool by design, but the admin endpoints are exposed to anyone on the network.
- CORS allows all origins (`"*"`).
Test Coverage
- Framework: pytest + pytest-asyncio
- Test count: 65 tests across 4 files (ingest: 6, pipeline: 11, public API: 26, search: 22)
- Test LOC: 2,754 (27% of backend source LOC)
- Approach: Integration tests against real PostgreSQL with NullPool. Mock LLM responses via fixtures. httpx.AsyncClient with ASGI transport for API tests.
- Missing: No frontend tests. No unit tests for pipeline stages in isolation. No load/performance tests. No test for the watcher service. No test for the quality toolkit.
- No CI: Tests are run manually (`cd backend && pytest`).
Documentation Status
- README.md: Comprehensive (19KB) — architecture diagrams, quick start, full API reference, environment variables, deployment instructions. High quality.
- chrysopedia-spec.md: Detailed 37-page product specification. Thorough and thoughtful.
- CLAUDE.md: Development reference with deployment info and quick commands.
- GSD artifacts: 13 milestone summaries, 23 decisions, 32 requirements, extensive KNOWLEDGE.md with 30+ lessons learned. Unusually thorough project history.
- prompts/README.md: Exists (not inspected in detail).
- whisper/README.md: Exists for transcription docs.
- Missing: No API documentation generation (no OpenAPI spec export, though FastAPI auto-generates one at `/docs`). No architecture decision records beyond GSD decisions. No runbook for operations/debugging.
Red Flags & Observations
Security
- No authentication on any endpoint. Admin endpoints (pipeline control, review queue, debug mode toggle) are accessible to anyone who can reach the server. Acceptable for a single-user tool on a private network, but risky if the port is ever exposed.
- CORS allows all origins (`cors_origins: ["*"]`). No restriction on which domains can call the API.
- `backend/.env` contains a real API key (`sk-dcdd...`). Not tracked in git (correctly gitignored), but present on disk. Standard for local dev.
- `APP_SECRET_KEY` defaults to `changeme-generate-a-real-secret` in config.py. If the .env doesn't override this, it's a predictable secret (though it's unclear if anything actually uses it — no session/JWT middleware found).
Architectural Concerns
- Monolithic CSS file (4,871 lines). Any style change requires searching through a single massive file. No component isolation.
- stages.py god file (2,102 lines). Four LLM stages + orchestrator + helpers all in one module. Each stage is a complex function with JSON parsing, error recovery, and DB writes.
- Pipeline switched from Celery chains to inline execution (latest commit). This suggests Celery task chaining had reliability issues. Inline execution means a single task runs all LLM stages synchronously — one pipeline run can block the worker (concurrency=1) for 10+ minutes.
- Qdrant duplicate points on re-index (documented in KNOWLEDGE.md, unfixed). Random UUIDs mean every re-embed creates duplicates instead of upserts.
- No retry/backoff on LLM API calls beyond the primary→fallback pattern. If both endpoints are down, the pipeline fails immediately.
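A retry-with-exponential-backoff wrapper is the standard mitigation; a minimal sketch (the attempt count, delay values, and exception type are illustrative, and the flaky endpoint is simulated):

```python
import time


def call_with_retry(fn, attempts=3, base_delay=0.01):
    """Retry fn with exponential backoff; re-raise after the last attempt."""
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))


# Simulated endpoint that fails twice before succeeding.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("endpoint down")
    return "ok"


result = call_with_retry(flaky)
```

Layered under the existing primary→fallback logic, this would ride out transient outages instead of failing the whole pipeline run immediately.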
Fragile Areas
- Classification data in Redis with 24h TTL. If Redis restarts between stage 4 and stage 5, classification data is lost and stage 5 fails or produces degraded output.
- Frontend has no schema-derived API layer. `public-client.ts` uses `fetch()` directly with hand-written types; nothing is generated from the backend schema, so API contract drift is possible.
- Single-branch development. All 171 commits on `main`. No protection against broken deploys.
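The Redis TTL failure mode above can be demonstrated with a tiny in-memory stand-in for the cache (the 24h TTL is scaled down to fractions of a second; this dict-backed class only mimics Redis SETEX/GET semantics):

```python
import time


class TTLStore:
    """Dict-backed cache with per-key expiry, mimicking Redis SETEX/GET."""
    def __init__(self):
        self._data = {}

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() >= expires:
            del self._data[key]
            return None  # expired: stage 5 would see no classification data
        return value


store = TTLStore()
store.set("classification:video-42", {"genre": "techno"}, ttl_seconds=0.01)
before = store.get("classification:video-42")
time.sleep(0.02)
after = store.get("classification:video-42")
```

Anything that delays stage 5 past the TTL (or restarts Redis) produces the `None` case, which is exactly the fragility noted above; persisting the classification to DB columns removes it.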
Inconsistencies
- FastAPI version in `app = FastAPI(version="0.1.0")` vs package.json version `"0.8.0"`. No single source of truth for the project version.
Trajectory & Opportunities
Where It's Heading
The most recent work is prompt quality optimization — generating 100 stage 5 variants and building automated A/B testing infrastructure. The project owner is clearly focused on improving the LLM output quality now that the infrastructure is stable.
The inline pipeline execution change suggests the next phase may involve processing real video content at scale and encountering reliability issues with the current architecture.
Partially Built / Stubbed Features
- Content reports — Model and API exist (`ContentReport`, `/api/v1/reports`), admin reports page exists, but unclear if actively used.
- View counts — `view_count` field on Creator and TechniquePage models, but no increment logic found. Fields default to 0.
- Creator hidden flag — `hidden` boolean on Creator model (migration 009), but no admin UI to toggle it.
- Genre filtering on Creators page — Spec mentions it, UI has it, but genre data depends on pipeline classification which may not populate genres consistently.
Capability Gaps
- No authentication/authorization. Adding a simple API key or basic auth for admin endpoints would be a quick security win.
- No WebSocket/SSE for pipeline progress. The admin UI polls for pipeline status. Real-time updates would improve the pipeline monitoring experience.
- No full-text search index. Keyword search uses `ILIKE`, which doesn't scale. A PostgreSQL `tsvector`/GIN index would be significantly faster.
- No backup strategy documented. PostgreSQL data and Qdrant vectors are on bind mounts but no backup cron or strategy is mentioned.
- No content analytics. No view tracking, no search query logging, no usage metrics beyond pipeline token counts.
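The authentication gap could be closed with a single API-key check in front of the admin routes; a framework-agnostic sketch of the core comparison (the `X-Admin-Key` header name is illustrative — in FastAPI this would live in a dependency or middleware):

```python
import hmac


def is_authorized(request_headers: dict, expected_key: str) -> bool:
    """Compare the presented key against the configured one in constant
    time, so the check doesn't leak key prefixes via timing."""
    provided = request_headers.get("X-Admin-Key", "")
    return hmac.compare_digest(provided.encode(), expected_key.encode())


ok = is_authorized({"X-Admin-Key": "s3cret"}, "s3cret")
denied = is_authorized({}, "s3cret")
```

`hmac.compare_digest` rather than `==` is the one non-obvious choice: plain string comparison short-circuits on the first mismatched byte.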
Low-Hanging Fruit
- Fix Qdrant duplicate points — Switch to deterministic UUIDs based on content hash. Small change, big data quality impact.
- Add basic auth to admin endpoints — A single API key middleware for `/admin/*` and `/review/*` routes.
- Split `stages.py` — Extract each stage into its own module. The file is already structured with clear stage boundaries.
- Normalize topic category casing — `.lower()` or `.title()` in stage 4 output. One-line fix for data consistency.
- Delete `generate_stage5_variants.py` from project root (or move it into the quality toolkit).
- Add a `Makefile` with common commands (build, test, deploy, migrate) to replace the manual command documentation.
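The deterministic-UUID fix for Qdrant amounts to deriving each point ID from stable content identifiers with `uuid5`, so re-indexing overwrites instead of duplicating (the namespace string and key format here are illustrative, not the actual QdrantManager scheme):

```python
import uuid

# Any fixed namespace works; what matters is that it never changes.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "chrysopedia/qdrant")


def point_id(technique_slug: str, section_index: int) -> str:
    """Same inputs always yield the same UUID, so Qdrant upserts
    replace the existing point instead of adding a duplicate."""
    return str(uuid.uuid5(NAMESPACE, f"{technique_slug}:{section_index}"))


a = point_id("sidechain-compression", 0)
b = point_id("sidechain-compression", 0)  # identical to a
c = point_id("sidechain-compression", 1)  # distinct
```

Swapping this in for `uuid4()` is the whole fix; existing duplicate points would still need a one-time cleanup or full re-index.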
Logical Next Features
Based on the trajectory and spec:
- Batch processing pipeline — Process the full video library (100-500 files). Will stress-test pipeline reliability.
- Content analytics — View tracking, popular searches, usage patterns.
- Improved search — Full-text search index, search result ranking improvements, faceted filtering.
- Multi-user support — Authentication, user-specific bookmarks/notes on techniques.
- Video timestamp deep links — If videos are accessible on the network, link directly to the timestamp in a player.
Key Files Reference
| File | Purpose |
|---|---|
| `chrysopedia-spec.md` | Full product specification (37 pages) — read first for product understanding |
| `README.md` | Architecture, setup, API reference, deployment guide |
| `CLAUDE.md` | Development context and canonical directory warning |
| `backend/main.py` | FastAPI app entry point, middleware, router mounting |
| `backend/config.py` | All environment variables with defaults (Pydantic Settings) |
| `backend/models.py` | All 11 SQLAlchemy ORM models — the data model source of truth |
| `backend/schemas.py` | Pydantic request/response schemas |
| `backend/pipeline/stages.py` | LLM pipeline — all 4 stages and orchestrator (the most complex file) |
| `backend/pipeline/llm_client.py` | LLM API client with primary/fallback and thinking mode support |
| `backend/search_service.py` | Semantic + keyword search implementation |
| `backend/watcher.py` | Transcript folder watcher service |
| `frontend/src/App.tsx` | React app root with routing |
| `frontend/src/App.css` | All styles (4,871 lines) |
| `frontend/src/api/public-client.ts` | Typed API client |
| `config/canonical_tags.yaml` | 7-category topic taxonomy definition |
| `docker-compose.yml` | Full 8-service stack definition |
| `.env.example` | Environment variable template |
| `.gsd/PROJECT.md` | Living project state document with milestone history |
| `.gsd/KNOWLEDGE.md` | Lessons learned and patterns (30+ entries) — invaluable for newcomers |
| `.gsd/DECISIONS.md` | 23 architectural decisions with rationale |
| `.gsd/REQUIREMENTS.md` | 32 requirements with validation status |
Uncertainties & Open Questions
- Is the pipeline actually processing real content? The system is deployed, but it's unclear how many videos have been processed through the pipeline. The test fixtures use sample data, and the prompt optimization work suggests the pipeline output quality isn't yet satisfactory. [inferred — medium confidence]
- Why did Celery chain dispatch get replaced with inline execution? The latest commit (`29f6e74`) switches to inline, but no commit message explains the issue. Was it a Celery reliability problem, a debugging convenience, or a permanent architectural change? [unknown — needs project owner input]
- Is the domain chrysopedia.xpltd.co actually configured? M003 mentions domain + DNS setup, KNOWLEDGE.md documents the XPLTD domain flow, but the nginx config uses `server_name _` (catch-all). [inferred — likely configured on nuc01's nginx, not in this codebase]
- What's the actual LLM infrastructure? References to "DGX Sparks Qwen" and "FYN" suggest a private GPU cluster. The API endpoint is `chat.forgetyour.name`, which appears to be an OpenWebUI instance. The relationship between these systems and their reliability characteristics would matter for pipeline scaling. [low confidence — outside codebase]
- Are there plans for multi-user access? The spec says "single-admin tool" but the architecture (separate frontend, API, PostgreSQL) could support multiple users. No authentication means this is purely a trust-boundary question. [inferred — currently single-user by design]
- What is `CHRYSOPEDIA-ASSESSMENT.md` (42KB)? Not read in detail — appears to be a UI/UX assessment that fed into M011 decisions. [low confidence on contents]