Chrysopedia — Project Context Document
Auto-generated: 2026-04-01 | Assessed Stage: Integration/Stabilization | Root: /home/aux/projects/content-to-kb-automator
Overview
Chrysopedia is a self-hosted knowledge extraction and retrieval system for electronic music production content. It takes raw video files (tutorials, livestreams, track breakdowns) from 50+ electronic music producers, transcribes them via Whisper, runs them through a multi-stage LLM pipeline to extract structured knowledge, and serves the results through a search-first web UI designed for mid-session retrieval — a producer Alt+Tabs from their DAW, searches for a technique, absorbs the answer, and gets back to work in under 30 seconds.
Audience: Electronic music producers, primarily one power user (the project owner) with a personal library of 100-500 video files. Single-admin tool, not multi-tenant.
Project type: Full-stack web application with an LLM-powered data pipeline. Monorepo with backend (Python/FastAPI), frontend (React/TypeScript), Whisper transcription script, Docker Compose deployment, and prompt engineering toolkit.
Evidence for purpose: Extensive 37-page spec (chrysopedia-spec.md), README with architecture diagrams, detailed PROJECT.md in GSD artifacts, 23 decisions logged, 32 requirements tracked (28 validated, 1 active, 4 out-of-scope). Etymology: chrysopoeia (alchemical transmutation) + encyclopedia.
Canonical development directory: This is not the active development location. Per CLAUDE.md, all future development happens on ub01 at /vmPool/r/repos/xpltdco/chrysopedia. This directory was the initial workspace. GitHub: github.com/xpltdco/chrysopedia (private, xpltdco org).
Architecture & Stack
Technology Stack
| Layer | Technology | Version/Notes |
|---|---|---|
| Backend | Python 3.12, FastAPI, SQLAlchemy (async), Pydantic Settings | API + business logic |
| Task Queue | Celery + Redis (broker + result backend) | Sync tasks, concurrency=1 |
| Database | PostgreSQL 16 (asyncpg driver) | Primary data store |
| Vector DB | Qdrant v1.13.2 | Semantic search embeddings |
| Embeddings | Ollama (nomic-embed-text, 768-dim) | Local CPU inference |
| LLM | OpenAI-compatible API (DGX Sparks Qwen primary, Ollama fallback) | Per-stage model routing (chat vs thinking) |
| Frontend | React 18.3, TypeScript 5.6, Vite 6, React Router 6.28 | Zero UI libraries — all custom CSS |
| Web Server | nginx 1.27 (Alpine) | SPA routing + API proxy |
| Containerization | Docker Compose | 8 services, dedicated bridge network |
| Deployment | ub01 (on-premises server) | Bind mounts to /vmPool/r/services/chrysopedia_* |
| Reverse Proxy | nginx on nuc01 (separate machine) | Routes chrysopedia.xpltd.co → ub01:8096 |
System Architecture
Desktop (GPU workstation — hal0022)
└── whisper/transcribe.py → JSON transcripts → SCP/rsync to /watch folder
Docker Compose on ub01 (8 services on 172.32.0.0/24):
┌──────────────┐   ┌────────┐   ┌────────┐   ┌────────┐
│  PostgreSQL  │   │ Redis  │   │ Qdrant │   │ Ollama │
│  :5433→5432  │   │ broker │   │ vector │   │ embed  │
└──────┬───────┘   └───┬────┘   └───┬────┘   └───┬────┘
       └───────────────┴────────────┴────────────┘
                       │
┌──────────────┬───────┴─────────┬───────────────┐
│ FastAPI API  │  Celery Worker  │    Watcher    │
│ REST + admin │  LLM pipeline   │  /watch→POST  │
└──────────────┴────────┬────────┴───────────────┘
                        │
               ┌────────┴─────────┐
               │ nginx (React SPA)│
               │     :8096→80     │
               └──────────────────┘
Data flow: Video → Whisper transcript JSON → Watcher POSTs to /api/v1/ingest → Celery pipeline (4 LLM stages: segment → extract → classify → synthesize) → KeyMoments + TechniquePages in PostgreSQL → Embeddings in Qdrant → Search-first web UI.
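A minimal sketch of that staged hand-off, with trivial stand-ins for the real LLM-backed stages in `pipeline/stages.py` (the stage bodies here are illustrative placeholders, not the actual prompts or parsing logic):

```python
from typing import Callable

# Hypothetical stand-ins for the four LLM stages.
def segment(transcript: str) -> list[str]:
    return [s.strip() for s in transcript.split(".") if s.strip()]

def extract(segments: list[str]) -> list[dict]:
    return [{"text": s, "moment": True} for s in segments]

def classify(moments: list[dict]) -> list[dict]:
    return [{**m, "category": "sound-design"} for m in moments]

def synthesize(moments: list[dict]) -> dict:
    return {"title": "Technique", "sections": [m["text"] for m in moments]}

def run_pipeline(transcript: str) -> dict:
    """Run the stages inline, each consuming the previous stage's output."""
    stages: list[Callable] = [segment, extract, classify, synthesize]
    data = transcript
    for stage in stages:
        data = stage(data)
    return data

page = run_pipeline("Use a notch filter. Automate the cutoff.")
```

The real orchestrator adds JSON parsing, error recovery, and DB writes around each stage, but the data flow is this same linear chain.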
External integrations:
- OpenWebUI at `chat.forgetyour.name` (DGX Sparks Qwen models for LLM inference)
- AdGuard DNS on ub01 for internal domain resolution
- nginx on nuc01 for external HTTPS termination (via Certbot)
Data Model
11 entities across 11 tables:
| Entity | Purpose | Key Fields |
|---|---|---|
| Creator | Artists/producers | name, slug, genres[], folder_name, hidden |
| SourceVideo | Processed video files | filename, content_hash (dedup), processing_status, classification_data (JSONB) |
| TranscriptSegment | Whisper output rows | start_time, end_time, text, segment_index, topic_label |
| KeyMoment | LLM-extracted insights | title, summary, start_time, end_time, content_type, plugins[] |
| TechniquePage | Synthesized knowledge (primary output) | title, slug, topic_category, topic_tags[], body_sections (JSONB), signal_chains (JSONB), plugins[] |
| TechniquePageVersion | Pre-overwrite snapshots | content_snapshot (JSONB), pipeline_metadata (JSONB), version_number |
| RelatedTechniqueLink | Cross-references | source→target, relationship type |
| Tag | Topic taxonomy | name, category, aliases[] |
| ContentReport | User-reported issues | report_type, status, admin_notes |
| PipelineRun | Pipeline execution record | video_id, run_number, trigger, status, total_tokens |
| PipelineEvent | Per-stage execution log | stage, event_type, token counts, payload (JSONB), debug I/O columns |
Relationships: Creator → SourceVideo → TranscriptSegment, KeyMoment; Creator → TechniquePage → KeyMoment, TechniquePageVersion, RelatedTechniqueLink; SourceVideo → PipelineRun → PipelineEvent.
Migrations: 11 Alembic migrations (001 through 011), covering initial schema through pipeline runs and classification cache additions.
Project Structure
chrysopedia/
├── backend/ # FastAPI application (10,209 LOC Python)
│ ├── main.py # App entry, middleware, router mounting
│ ├── config.py # Pydantic Settings (all env vars)
│ ├── database.py # Async engine + session factory
│ ├── models.py # 11 SQLAlchemy ORM models
│ ├── schemas.py # Pydantic request/response schemas (422 lines)
│ ├── worker.py # Celery app config
│ ├── watcher.py # Folder monitor → auto-ingest service
│ ├── search_service.py # Async semantic + keyword search (603 lines)
│ ├── redis_client.py # Redis client for feature flags
│ ├── routers/ # 9 API router modules
│ │ ├── health.py, ingest.py, search.py, techniques.py
│ │ ├── creators.py, topics.py, videos.py
│ │ ├── pipeline.py (admin), reports.py
│ ├── pipeline/ # LLM pipeline core (2,908 LOC)
│ │ ├── stages.py # 4 LLM stages + orchestrator (2,102 lines — largest file)
│ │ ├── llm_client.py # OpenAI-compatible sync client with fallback
│ │ ├── embedding_client.py # Sync embedding client for Celery
│ │ ├── qdrant_client.py # Qdrant upsert + collection management
│ │ ├── schemas.py # Pipeline data schemas
│ │ └── quality/ # Prompt optimization toolkit (2,507 LOC)
│ │ ├── fitness.py # LLM fitness test suite (9 tests)
│ │ ├── scorer.py # 5-dimension LLM-as-judge scoring
│ │ ├── optimizer.py # Automated prompt A/B optimization
│ │ ├── variant_generator.py # LLM-powered prompt mutation
│ │ └── voice_dial.py # Voice preservation dial
│ └── tests/ # Integration tests (2,754 LOC, 65 tests)
├── frontend/ # React SPA (9,975 LOC TypeScript + CSS)
│ └── src/
│ ├── pages/ # 10 page components
│ ├── components/ # 9 shared components
│ ├── hooks/ # 2 custom hooks
│ ├── api/ # Typed API client
│ └── App.css # 4,871 lines — all styles (no CSS framework)
├── whisper/ # Desktop transcription scripts
├── prompts/ # 3 active prompt templates + 100 stage5 variants
├── alembic/ # 11 database migrations
├── config/ # canonical_tags.yaml (7-category topic taxonomy)
├── docker/ # Dockerfile.api, Dockerfile.web, nginx.conf
├── docker-compose.yml # 8-service stack definition
├── generate_stage5_variants.py # Stage 5 prompt variant generator (874 lines — one-off tool)
├── .gsd/ # GSD project management artifacts
│ ├── PROJECT.md, REQUIREMENTS.md, DECISIONS.md, KNOWLEDGE.md
│ └── milestones/ # 13 completed milestone artifacts
└── .env.example # Environment variable template
Entry points:
- `backend/main.py` → FastAPI app (`uvicorn main:app`)
- `backend/worker.py` → Celery worker (`celery -A worker worker`)
- `backend/watcher.py` → Folder watcher service (`python watcher.py`)
- `frontend/src/main.tsx` → React app (Vite dev server or nginx-served build)
- `whisper/transcribe.py` → Desktop transcription CLI
- `backend/pipeline/quality/__main__.py` → Prompt quality toolkit CLI
Configuration & Environment
Environment Variables
| Variable | Purpose | Default |
|---|---|---|
| `POSTGRES_USER` | Database user | `chrysopedia` |
| `POSTGRES_PASSWORD` | Database password | `changeme` |
| `POSTGRES_DB` | Database name | `chrysopedia` |
| `DATABASE_URL` | Full async connection string | Composed from above |
| `REDIS_URL` | Redis broker URL | `redis://chrysopedia-redis:6379/0` |
| `LLM_API_URL` | Primary LLM endpoint | OpenWebUI on DGX |
| `LLM_API_KEY` | LLM authentication | Required |
| `LLM_MODEL` | Default LLM model name | `fyn-llm-agent-chat` |
| `LLM_FALLBACK_URL` / `_MODEL` | Fallback LLM endpoint | Same as primary |
| `LLM_STAGE{2-5}_MODEL` | Per-stage model override | chat for 2/4, think for 3/5 |
| `LLM_STAGE{2-5}_MODALITY` | `chat` or `thinking` per stage | See above |
| `LLM_MAX_TOKENS` | LLM response token limit | 32768 |
| `LLM_TEMPERATURE` | LLM temperature | 0.0 (deterministic) |
| `SYNTHESIS_CHUNK_SIZE` | Max moments per synthesis call | 30 |
| `EMBEDDING_API_URL` | Ollama embedding endpoint | Container-internal |
| `EMBEDDING_MODEL` | Embedding model name | `nomic-embed-text` |
| `EMBEDDING_DIMENSIONS` | Vector dimensionality | 768 |
| `QDRANT_URL` | Qdrant endpoint | Container-internal |
| `QDRANT_COLLECTION` | Qdrant collection name | `chrysopedia` |
| `APP_ENV` | Environment name | `development` |
| `APP_LOG_LEVEL` | Log level | `info` |
| `APP_SECRET_KEY` | Application secret | `changeme-generate-a-real-secret` |
| `CORS_ORIGINS` | Allowed CORS origins | `["*"]` |
| `REVIEW_MODE` | Require admin review of moments | `true` |
| `DEBUG_MODE` | Capture full LLM I/O in events | `false` |
| `TRANSCRIPT_STORAGE_PATH` | Transcript file storage | `/data/transcripts` |
| `VIDEO_METADATA_PATH` | Video metadata storage | `/data/video_meta` |
| `PROMPTS_PATH` | Prompt template directory | `./prompts` |
| `GIT_COMMIT_SHA` | Build-time commit hash | `unknown` |
| `WATCH_FOLDER` | Watcher monitored directory | `/watch` |
| `WATCHER_API_URL` | Ingest endpoint for watcher | Container-internal |
| `WATCHER_STABILITY_SECONDS` | File stability wait time | 2 |
| `WATCHER_POLL_INTERVAL` | Filesystem poll interval | 5 |
| `GIT_COMMIT_SHA` (build arg) | Passed at Docker build time for footer | `dev` |
| `VITE_GIT_COMMIT` (build arg) | Frontend build-time constant | `dev` |
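`backend/config.py` reads these variables via Pydantic Settings; a dependency-free sketch of the same env-with-default pattern using only the standard library (field names are illustrative, not the real `config.py` attributes):

```python
import os
from dataclasses import dataclass, field


@dataclass
class Settings:
    """Each field is read from the environment at instantiation,
    falling back to the documented default."""
    llm_max_tokens: int = field(
        default_factory=lambda: int(os.environ.get("LLM_MAX_TOKENS", "32768")))
    llm_temperature: float = field(
        default_factory=lambda: float(os.environ.get("LLM_TEMPERATURE", "0.0")))
    app_env: str = field(
        default_factory=lambda: os.environ.get("APP_ENV", "development"))


settings = Settings()
```

Pydantic Settings adds validation and `.env` file loading on top of this, but the override order is the same: environment variable wins, default otherwise.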
Environments
- Production: Docker Compose on ub01, `.env` file with real credentials
- Local dev: Backend runs locally with `docker compose up -d chrysopedia-db chrysopedia-redis`, `.env` in backend/
- Test: Uses real PostgreSQL (test database), configured in `backend/tests/conftest.py`
- No staging environment exists.
Secrets Management
Environment variables via .env file (gitignored). No vault, KMS, or sealed secrets. The .env.example contains placeholders. backend/.env exists locally (not tracked in git) and contains a real API key — this is expected for local dev but the key should be rotated if this directory is ever shared.
Development Workflow
Getting Started
# 1. Clone the repo
git clone git@github.com:xpltdco/chrysopedia.git
cd chrysopedia
# 2. Configure environment
cp .env.example .env
# Edit .env with real LLM_API_KEY and POSTGRES_PASSWORD
# 3. Start infrastructure
docker compose up -d
# 4. Run migrations
docker exec chrysopedia-api alembic upgrade head
# 5. Pull embedding model (first time)
docker exec chrysopedia-ollama ollama pull nomic-embed-text
# 6. Verify
curl http://localhost:8096/health
For local backend development (outside Docker):
python -m venv .venv && source .venv/bin/activate
pip install -r backend/requirements.txt
docker compose up -d chrysopedia-db chrysopedia-redis # just infra
alembic upgrade head
cd backend && uvicorn main:app --reload --host 0.0.0.0 --port 8001 # 8001 to avoid kerf-engine conflict on 8000
For frontend development:
cd frontend && npm ci && npm run dev
Key Commands
| Task | Command |
|---|---|
| Start full stack | docker compose up -d |
| Rebuild after code changes | docker compose build && docker compose up -d |
| Run migrations | docker exec chrysopedia-api alembic upgrade head |
| Create migration | alembic revision --autogenerate -m "description" |
| View API logs | docker logs -f chrysopedia-api |
| View worker logs | docker logs -f chrysopedia-worker |
| Run tests | cd backend && pytest |
| Frontend dev server | cd frontend && npm run dev |
| Frontend build | cd frontend && npm run build |
| Prompt quality CLI | cd backend && python -m pipeline.quality |
| Deploy to ub01 | ssh ub01; cd /vmPool/r/repos/xpltdco/chrysopedia; git pull && docker compose build && docker compose up -d |
CI/CD Pipeline
None. No .github/workflows/, no CI config files. Deployment is manual: git pull && docker compose build && docker compose up -d on ub01. [inferred — high confidence based on absence of any CI configuration]
Code Conventions
- Python: No linter config (no ruff, black, flake8 config files found). Code follows PEP 8 by convention. Type hints used throughout (Python 3.12 features like `X | None`).
- TypeScript: No ESLint config. TypeScript strict mode via tsconfig. Zero-dependency UI (no UI libraries, no Tailwind).
- CSS: Single monolithic `App.css` (4,871 lines). 77 CSS custom properties for theming. Dark theme with cyan accent (#22d3ee).
- Naming: Slugified URLs, snake_case Python, camelCase TypeScript. SQLAlchemy models use `Mapped` annotations. Pydantic schemas use `model_config = {"from_attributes": True}`.
- No pre-commit hooks, no `.editorconfig`, no formatter configs.
Current State Assessment
Stage: Integration/Stabilization — All 13 milestones complete. 28 of 32 requirements validated. 171 commits over 3 days (March 29–April 1, 2026) by a single contributor. The system is deployed and running. However, it was built rapidly by AI agents (GSD workflow), the pipeline is running inline (not via Celery chain as originally designed per recent commit 29f6e74), and there are no CI/CD guardrails. The codebase is functional but hasn't been through the hardening that comes from sustained multi-user operation.
Recent Activity
- 171 commits from 2026-03-29 to 2026-04-01 (3 days of intense development)
- Single contributor: jlightner
- Last commit: `29f6e74` — "pipeline: run stages inline instead of Celery chain dispatch"
- Most recent work: Stage 5 prompt optimization (100 variant prompts generated), inline pipeline execution, prompt quality toolkit (M013)
Active Branches
Only main exists. All development has been on a single branch. No feature branches, no release branches.
What's Working
- Full 6-stage pipeline (transcription → ingestion → LLM extraction → review → synthesis → search)
- Docker Compose deployment with 8 services, healthchecks on all containers
- Search (semantic via Qdrant + keyword fallback with multi-token AND matching)
- Admin review queue with approve/edit/reject workflows
- Pipeline admin dashboard with event logs, token usage, retrigger controls
- 10-page React SPA with responsive design, topic taxonomy, creator browse, technique detail
- Folder watcher for auto-ingestion of new transcripts
- Article versioning with pipeline metadata snapshots
- 65 integration tests covering all major API paths
- Prompt quality toolkit (fitness tests, scoring, automated optimization)
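The keyword fallback's multi-token AND matching can be illustrated with a small stand-in (the real implementation in `search_service.py` builds AND-ed `ILIKE` clauses in SQL; this Python version mirrors the semantics only):

```python
def keyword_match(query: str, text: str) -> bool:
    """Every query token must appear in the text, case-insensitively,
    mirroring a chain of AND-ed ILIKE '%token%' clauses."""
    haystack = text.lower()
    return all(tok in haystack for tok in query.lower().split())


titles = [
    "Sidechain compression on a sub bass",
    "Reverb tails in ambient pads",
]
hits = [t for t in titles if keyword_match("sub sidechain", t)]
```

Token order doesn't matter and partial-word matches are allowed, which is why this works as a recall-oriented fallback behind the semantic search.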
What's In Progress
- Stage 5 prompt optimization: 100 variant prompts generated (`prompts/stage5_variants/`), active A/B testing with the quality toolkit. The most recent commits are all prompt refinement.
- Inline pipeline execution: The latest commit switches from Celery chain dispatch to inline stage execution, suggesting the Celery chaining had issues.
- `generate_stage5_variants.py` (874 lines) is a one-off script at project root — should likely be absorbed into the quality toolkit or removed.
Technical Debt Inventory
Zero TODOs/FIXMEs/HACKs in source code. All annotations found were in node_modules/ (third-party). This is notable — either debt was addressed as it arose, or code annotations weren't used as a practice.
Implicit debt captured in KNOWLEDGE.md:
- QdrantManager uses random UUIDs for point IDs, causing duplicates on re-index (noted as deferred fix — use deterministic UUIDs)
- LLM-generated topic categories have inconsistent casing (deferred)
- Stage 4 classification data stored in Redis with 24h TTL instead of DB columns (expedient but fragile)
Structural debt:
- `frontend/src/App.css` — 4,871-line monolithic stylesheet. No CSS modules, no component-scoped styles.
- `backend/pipeline/stages.py` — 2,102 lines. All 4 LLM stages + orchestrator in one file.
- `generate_stage5_variants.py` — 874-line one-off script at project root.
- `prompts/stage5_variants/.v016.txt.swp` — vim swap file committed (harmless but untidy).
- No authentication on any endpoint (admin or public). Single-admin tool by design, but the admin endpoints are exposed to anyone on the network.
- CORS allows all origins (`"*"`).
Test Coverage
- Framework: pytest + pytest-asyncio
- Test count: 65 tests across 4 files (ingest: 6, pipeline: 11, public API: 26, search: 22)
- Test LOC: 2,754 (27% of backend source LOC)
- Approach: Integration tests against real PostgreSQL with NullPool. Mock LLM responses via fixtures. httpx.AsyncClient with ASGI transport for API tests.
- Missing: No frontend tests. No unit tests for pipeline stages in isolation. No load/performance tests. No test for the watcher service. No test for the quality toolkit.
- No CI: Tests are run manually (`cd backend && pytest`).
Documentation Status
- README.md: Comprehensive (19KB) — architecture diagrams, quick start, full API reference, environment variables, deployment instructions. High quality.
- chrysopedia-spec.md: Detailed 37-page product specification. Thorough and thoughtful.
- CLAUDE.md: Development reference with deployment info and quick commands.
- GSD artifacts: 13 milestone summaries, 23 decisions, 32 requirements, extensive KNOWLEDGE.md with 30+ lessons learned. Unusually thorough project history.
- prompts/README.md: Exists (not inspected in detail).
- whisper/README.md: Exists for transcription docs.
- Missing: No API documentation generation (no OpenAPI spec export, though FastAPI auto-generates one at `/docs`). No architecture decision records beyond GSD decisions. No runbook for operations/debugging.
Red Flags & Observations
Security
- No authentication on any endpoint. Admin endpoints (pipeline control, review queue, debug mode toggle) are accessible to anyone who can reach the server. Acceptable for a single-user tool on a private network, but risky if the port is ever exposed.
- CORS allows all origins (`cors_origins: ["*"]`). No restriction on which domains can call the API.
- `backend/.env` contains a real API key (`sk-dcdd...`). Not tracked in git (correctly gitignored), but present on disk. Standard for local dev.
- `APP_SECRET_KEY` defaults to `changeme-generate-a-real-secret` in config.py. If the .env doesn't override this, it's a predictable secret (though it's unclear if anything actually uses it — no session/JWT middleware found).
Architectural Concerns
- Monolithic CSS file (4,871 lines). Any style change requires searching through a single massive file. No component isolation.
- stages.py god file (2,102 lines). Four LLM stages + orchestrator + helpers all in one module. Each stage is a complex function with JSON parsing, error recovery, and DB writes.
- Pipeline switched from Celery chains to inline execution (latest commit). This suggests Celery task chaining had reliability issues. Inline execution means a single task runs all LLM stages synchronously — one pipeline run can block the worker (concurrency=1) for 10+ minutes.
- Qdrant duplicate points on re-index (documented in KNOWLEDGE.md, unfixed). Random UUIDs mean every re-embed creates duplicates instead of upserts.
- No retry/backoff on LLM API calls beyond the primary→fallback pattern. If both endpoints are down, the pipeline fails immediately.
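A retry-with-exponential-backoff wrapper is the standard mitigation; a minimal sketch (the attempt count, delay values, and exception type are illustrative, and the flaky endpoint is simulated):

```python
import time


def call_with_retry(fn, attempts=3, base_delay=0.01):
    """Retry fn with exponential backoff; re-raise after the last attempt."""
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))


# Simulated endpoint that fails twice before succeeding.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("endpoint down")
    return "ok"


result = call_with_retry(flaky)
```

Layered under the existing primary→fallback logic, this would ride out transient outages instead of failing the whole pipeline run immediately.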
Fragile Areas
- Classification data in Redis with 24h TTL. If Redis restarts between stage 4 and stage 5, classification data is lost and stage 5 fails or produces degraded output.
- Frontend has no schema-derived API layer. `public-client.ts` uses `fetch()` directly with hand-written types; nothing is generated from the backend schema, so API contract drift is possible.
- Single-branch development. All 171 commits on `main`. No protection against broken deploys.
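The Redis TTL failure mode above can be demonstrated with a tiny in-memory stand-in for the cache (the 24h TTL is scaled down to fractions of a second; this dict-backed class only mimics Redis SETEX/GET semantics):

```python
import time


class TTLStore:
    """Dict-backed cache with per-key expiry, mimicking Redis SETEX/GET."""
    def __init__(self):
        self._data = {}

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() >= expires:
            del self._data[key]
            return None  # expired: stage 5 would see no classification data
        return value


store = TTLStore()
store.set("classification:video-42", {"genre": "techno"}, ttl_seconds=0.01)
before = store.get("classification:video-42")
time.sleep(0.02)
after = store.get("classification:video-42")
```

Anything that delays stage 5 past the TTL (or restarts Redis) produces the `None` case, which is exactly the fragility noted above; persisting the classification to DB columns removes it.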
Inconsistencies
- FastAPI version in `app = FastAPI(version="0.1.0")` vs package.json version `"0.8.0"`. No single source of truth for the project version.
Trajectory & Opportunities
Where It's Heading
The most recent work is prompt quality optimization — generating 100 stage 5 variants and building automated A/B testing infrastructure. The project owner is clearly focused on improving the LLM output quality now that the infrastructure is stable.
The inline pipeline execution change suggests the next phase may involve processing real video content at scale and encountering reliability issues with the current architecture.
Partially Built / Stubbed Features
- Content reports — Model and API exist (`ContentReport`, `/api/v1/reports`), admin reports page exists, but unclear if actively used.
- View counts — `view_count` field on Creator and TechniquePage models, but no increment logic found. Fields default to 0.
- Creator hidden flag — `hidden` boolean on Creator model (migration 009), but no admin UI to toggle it.
- Genre filtering on Creators page — Spec mentions it, UI has it, but genre data depends on pipeline classification which may not populate genres consistently.
Capability Gaps
- No authentication/authorization. Adding a simple API key or basic auth for admin endpoints would be a quick security win.
- No WebSocket/SSE for pipeline progress. The admin UI polls for pipeline status. Real-time updates would improve the pipeline monitoring experience.
- No full-text search index. Keyword search uses `ILIKE`, which doesn't scale. A PostgreSQL `tsvector`/GIN index would be significantly faster.
- No backup strategy documented. PostgreSQL data and Qdrant vectors are on bind mounts but no backup cron or strategy is mentioned.
- No content analytics. No view tracking, no search query logging, no usage metrics beyond pipeline token counts.
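The authentication gap could be closed with a single API-key check in front of the admin routes; a framework-agnostic sketch of the core comparison (the `X-Admin-Key` header name is illustrative — in FastAPI this would live in a dependency or middleware):

```python
import hmac


def is_authorized(request_headers: dict, expected_key: str) -> bool:
    """Compare the presented key against the configured one in constant
    time, so the check doesn't leak key prefixes via timing."""
    provided = request_headers.get("X-Admin-Key", "")
    return hmac.compare_digest(provided.encode(), expected_key.encode())


ok = is_authorized({"X-Admin-Key": "s3cret"}, "s3cret")
denied = is_authorized({}, "s3cret")
```

`hmac.compare_digest` rather than `==` is the one non-obvious choice: plain string comparison short-circuits on the first mismatched byte.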
Low-Hanging Fruit
- Fix Qdrant duplicate points — Switch to deterministic UUIDs based on content hash. Small change, big data quality impact.
- Add basic auth to admin endpoints — A single API key middleware for `/admin/*` and `/review/*` routes.
- Split `stages.py` — Extract each stage into its own module. The file is already structured with clear stage boundaries.
- Normalize topic category casing — `.lower()` or `.title()` in stage 4 output. One-line fix for data consistency.
- Delete `generate_stage5_variants.py` from project root (or move it into the quality toolkit).
- Add a `Makefile` with common commands (build, test, deploy, migrate) to replace the manual command documentation.
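The deterministic-UUID fix for Qdrant amounts to deriving each point ID from stable content identifiers with `uuid5`, so re-indexing overwrites instead of duplicating (the namespace string and key format here are illustrative, not the actual QdrantManager scheme):

```python
import uuid

# Any fixed namespace works; what matters is that it never changes.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "chrysopedia/qdrant")


def point_id(technique_slug: str, section_index: int) -> str:
    """Same inputs always yield the same UUID, so Qdrant upserts
    replace the existing point instead of adding a duplicate."""
    return str(uuid.uuid5(NAMESPACE, f"{technique_slug}:{section_index}"))


a = point_id("sidechain-compression", 0)
b = point_id("sidechain-compression", 0)  # identical to a
c = point_id("sidechain-compression", 1)  # distinct
```

Swapping this in for `uuid4()` is the whole fix; existing duplicate points would still need a one-time cleanup or full re-index.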
Logical Next Features
Based on the trajectory and spec:
- Batch processing pipeline — Process the full video library (100-500 files). Will stress-test pipeline reliability.
- Content analytics — View tracking, popular searches, usage patterns.
- Improved search — Full-text search index, search result ranking improvements, faceted filtering.
- Multi-user support — Authentication, user-specific bookmarks/notes on techniques.
- Video timestamp deep links — If videos are accessible on the network, link directly to the timestamp in a player.
Key Files Reference
| File | Purpose |
|---|---|
| `chrysopedia-spec.md` | Full product specification (37 pages) — read first for product understanding |
| `README.md` | Architecture, setup, API reference, deployment guide |
| `CLAUDE.md` | Development context and canonical directory warning |
| `backend/main.py` | FastAPI app entry point, middleware, router mounting |
| `backend/config.py` | All environment variables with defaults (Pydantic Settings) |
| `backend/models.py` | All 11 SQLAlchemy ORM models — the data model source of truth |
| `backend/schemas.py` | Pydantic request/response schemas |
| `backend/pipeline/stages.py` | LLM pipeline — all 4 stages and orchestrator (the most complex file) |
| `backend/pipeline/llm_client.py` | LLM API client with primary/fallback and thinking mode support |
| `backend/search_service.py` | Semantic + keyword search implementation |
| `backend/watcher.py` | Transcript folder watcher service |
| `frontend/src/App.tsx` | React app root with routing |
| `frontend/src/App.css` | All styles (4,871 lines) |
| `frontend/src/api/public-client.ts` | Typed API client |
| `config/canonical_tags.yaml` | 7-category topic taxonomy definition |
| `docker-compose.yml` | Full 8-service stack definition |
| `.env.example` | Environment variable template |
| `.gsd/PROJECT.md` | Living project state document with milestone history |
| `.gsd/KNOWLEDGE.md` | Lessons learned and patterns (30+ entries) — invaluable for newcomers |
| `.gsd/DECISIONS.md` | 23 architectural decisions with rationale |
| `.gsd/REQUIREMENTS.md` | 32 requirements with validation status |
Uncertainties & Open Questions
- Is the pipeline actually processing real content? The system is deployed, but it's unclear how many videos have been processed through the pipeline. The test fixtures use sample data, and the prompt optimization work suggests the pipeline output quality isn't yet satisfactory. [inferred — medium confidence]
- Why did Celery chain dispatch get replaced with inline execution? The latest commit (`29f6e74`) switches to inline, but no commit message explains the issue. Was it a Celery reliability problem, a debugging convenience, or a permanent architectural change? [unknown — needs project owner input]
- Is the domain chrysopedia.xpltd.co actually configured? M003 mentions domain + DNS setup, KNOWLEDGE.md documents the XPLTD domain flow, but the nginx config uses `server_name _` (catch-all). [inferred — likely configured on nuc01's nginx, not in this codebase]
- What's the actual LLM infrastructure? References to "DGX Sparks Qwen" and "FYN" suggest a private GPU cluster. The API endpoint is `chat.forgetyour.name`, which appears to be an OpenWebUI instance. The relationship between these systems and their reliability characteristics would matter for pipeline scaling. [low confidence — outside codebase]
- Are there plans for multi-user access? The spec says "single-admin tool" but the architecture (separate frontend, API, PostgreSQL) could support multiple users. No authentication means this is purely a trust-boundary question. [inferred — currently single-user by design]
- What is `CHRYSOPEDIA-ASSESSMENT.md` (42KB)? Not read in detail — appears to be a UI/UX assessment that fed into M011 decisions. [low confidence on contents]