MAESTRO: Initialize repository with README, .gitignore, and project files

Add README.md with project description, quick-start instructions, and AGPL-3.0 license badge. Add .gitignore for Python, Node, and Docker artifacts. Include existing CLAUDE.md, spec, docker-compose.yml, and env.example.
2026-04-07 01:39:18 -05:00 · 2026-04-07 01:39:18 -05:00 · fc2e4cd7d1
commit fc2e4cd7d1
6 changed files with 1013 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@ -0,0 +1,57 @@
 # Python
 __pycache__/
 *.py[cod]
 *$py.class
 *.egg-info/
 *.egg
 dist/
 build/
 .eggs/
 *.whl
 .venv/
 venv/
 env/
 .env
 *.pyc
 .pytest_cache/
 .mypy_cache/
 .ruff_cache/
 htmlcov/
 .coverage
 .coverage.*
 # Node / Frontend
 node_modules/
 frontend/dist/
 frontend/build/
 .npm
 *.tsbuildinfo
 # Docker
 docker/nginx.conf.bak
 # IDE
 .vscode/
 .idea/
 *.swp
 *.swo
 *~
 .DS_Store
 # OS
 Thumbs.db
 Desktop.ini
 # Data (single-container mode)
 *.db
 /data/
 # Alembic
 alembic/versions/__pycache__/
 # Auto Run Docs (Maestro working files)
 Auto Run Docs/Working/
 # Misc
 *.log
 *.bak
--- a/CLAUDE.md
+++ b/CLAUDE.md
@ -0,0 +1,127 @@
 # CLAUDE.md — PromptLooper
 ## What is this project?
 PromptLooper is a self-hosted LLM pipeline tuning workbench. It runs experiments across prompt × model × parameter combinations, caches every response, scores results, and surfaces optimal configurations through a real-time dashboard. It has an MCP server so AI agents can drive it programmatically.
 ## Repository
 - **Hosted at**: git.xpltd.co/xpltdco/promptlooper
 - **XPLTD project name**: `xpltd_promptlooper`
 - **Sister project**: Chrysopedia (git.xpltd.co/xpltdco/chrysopedia) — a knowledge extraction pipeline that is PromptLooper's first integration target
 ## Tech Stack
 - **Backend**: Python 3.12, FastAPI, Celery, SQLAlchemy, Alembic
 - **Frontend**: React 18, TypeScript, Vite, Tailwind CSS
 - **Database**: PostgreSQL 16 (production) / SQLite (single-container mode)
 - **Cache/Queue**: Redis 7 (production) / in-process (single-container)
 - **Real-time**: WebSocket via FastAPI + Redis pub/sub
 - **MCP**: Python MCP SDK
 - **Container**: Multi-stage Docker build, nginx for frontend
 ## XPLTD Conventions
 These are non-negotiable project conventions shared across all XPLTD projects:
 - Docker Compose project name: `xpltd_promptlooper`
 - Dedicated bridge network: `promptlooper` (`172.33.0.0/24`)
 - Persistent data bind mounts under `/vmPool/r/services/promptlooper_*`
 - PostgreSQL on external port `5434` (internal `5432`)
 - Web UI on port `8400`
 - MCP server on port `8401`
 - Container naming: `promptlooper-{service}` (e.g., `promptlooper-api`, `promptlooper-db`)
 ## Key Architecture Decisions
 1. **No LLM runs inside PromptLooper itself** — it's purely an HTTP client that calls external LLM endpoints. The only exception is the optional "LLM-as-judge" scorer.
 2. **Response caching by config hash** — SHA-256 of (prompt + model + params + input). Cache hits return instantly. This is critical for cost control.
 3. **Single-container mode** — when `DATABASE_URL` is not set, use SQLite + in-process queue. Zero dependencies.
 4. **WebSocket for real-time** — the dashboard connects via WebSocket to receive run progress, score updates, and steering events.
 5. **Pluggable scorers** — all scoring functions implement a base class with `score(input, output, context) → float` signature.
 6. **OpenAI-compatible adapter** — the LLM adapter layer speaks OpenAI's chat completions API. This covers OpenWebUI, vLLM, Ollama, and most providers.
 ## File Organization
 ```
 backend/
  main.py              — FastAPI app, middleware, router mounting
  config.py            — Pydantic Settings from env vars
  models.py            — SQLAlchemy ORM models
  schemas.py           — Pydantic request/response schemas
  auth.py              — JWT + API key authentication
  worker.py            — Celery app configuration
  routers/             — API endpoint handlers
  engine/              — Core experiment execution logic
    runner.py          — Individual run execution
    sweep.py           — Sweep orchestration (grid/random/guided)
    cache.py           — Response cache layer
    adapters/          — LLM endpoint adapters
    scorers/           — Pluggable scoring functions
  mcp/                 — MCP server implementation
  websocket/           — WebSocket connection management
 frontend/src/
  pages/               — Route-level components
  components/          — Shared UI components
  api/                 — Typed API client functions
 ```
 ## Database Migrations
 Use Alembic. Same patterns as Chrysopedia:
 ```bash
 alembic revision --autogenerate -m "describe_change"
 alembic upgrade head
 ```
 ## Running Locally
 ```bash
 docker compose up -d promptlooper-db promptlooper-redis
 cd backend && uvicorn main:app --reload --host 0.0.0.0 --port 8000
 # Frontend in another terminal:
 cd frontend && npm run dev
 ```
 ## Testing
 ```bash
 cd backend && pytest
 cd frontend && npm test
 ```
 ## Important Patterns
 ### Adding a new scorer
 1. Create `backend/engine/scorers/my_scorer.py`
 2. Implement `BaseScorer` with `name`, `score(input, output, context) → float`
 3. Register in `backend/engine/scorers/__init__.py`
 4. Add to frontend scorer picker component
 ### Adding a new LLM adapter
 1. Create `backend/engine/adapters/my_adapter.py`
 2. Implement `BaseAdapter` with `complete(prompt, model, params) → response`
 3. Register in `backend/engine/adapters/__init__.py`
 4. Currently only OpenAI-compatible is implemented; all others should be edge cases
 ### Adding a new MCP tool
 1. Add tool definition in `backend/mcp/tools.py`
 2. Implement handler in `backend/mcp/server.py`
 3. Tools should map 1:1 to API endpoints where possible
 ## Common Gotchas
 - Always hash the FULL config when checking cache — missing a single parameter means cache misses
 - WebSocket connections must be cleaned up on disconnect — use the connection manager
 - SQLite mode doesn't support concurrent writes — the in-process queue must be single-threaded
 - Frontend must handle both WebSocket and polling fallback for environments where WS is blocked
 - MCP server runs on a separate port from the main API
 ## Deployment
 ```bash
 ssh ub01
 cd /vmPool/r/repos/xpltdco/promptlooper
 git pull && docker compose build && docker compose up -d
 ```
--- a/README.md
+++ b/README.md
@ -0,0 +1,65 @@
 # PromptLooper
 [![License: AGPL-3.0](https://img.shields.io/badge/License-AGPL--3.0-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)
 [![Status: Alpha](https://img.shields.io/badge/Status-Alpha-orange.svg)]()
 > The one who loops prompts — a universal LLM pipeline tuning workbench.
 PromptLooper is a self-hosted tool for systematically optimizing LLM prompts, model selection, and inference parameters. It runs experiments across prompt x model x parameter combinations, caches every response, scores results against pluggable evaluation functions, and surfaces the best configurations through a real-time observability dashboard with human-in-the-loop steering.
 It ships as a single Docker container (SQLite mode) for zero-config quickstart, or a Docker Compose stack (Postgres + Redis) for production use. An MCP server enables any AI agent to drive PromptLooper programmatically — creating experiments, running sweeps, and reading results without human intervention.
 ## Quick Start
 ### Single Container (zero dependencies)
 ```bash
 docker run -p 8400:8400 -v promptlooper-data:/data ghcr.io/xpltdco/promptlooper
 ```
 Open `http://localhost:8400` — you'll be prompted to create an admin account on first boot.
 ### Production (Docker Compose)
 ```bash
 git clone git@git.xpltd.co:xpltdco/promptlooper.git
 cd promptlooper
 cp .env.example .env
 # Edit .env — set POSTGRES_PASSWORD and JWT_SECRET at minimum
 docker compose up -d
 ```
 ## Features
 - **Systematic experimentation** — grid, random, and guided sweeps across prompt x model x parameter space
 - **Response caching** — SHA-256 deduplication means re-runs cost zero tokens
 - **Pluggable scoring** — embedding similarity, format compliance, keyword presence, LLM-as-judge, human rating, custom webhooks
 - **Real-time dashboard** — live progress, leaderboard, side-by-side comparison, steering controls
 - **MCP server** — AI agents can create experiments, run sweeps, and export results programmatically
 - **Single-container mode** — SQLite + in-process queue when no external dependencies are configured
 ## Development
 ```bash
 # Start backing services
 docker compose up -d promptlooper-db promptlooper-redis
 # Backend
 cd backend && pip install -r requirements.txt
 alembic upgrade head
 uvicorn main:app --reload --host 0.0.0.0 --port 8000
 # Frontend (separate terminal)
 cd frontend && npm install && npm run dev
 ```
 ## Testing
 ```bash
 cd backend && pytest
 cd frontend && npm test
 ```
 ## License
 [AGPL-3.0](https://www.gnu.org/licenses/agpl-3.0.html)
--- a/docker-compose.yml
+++ b/docker-compose.yml
@ -0,0 +1,106 @@
 name: xpltd_promptlooper
 networks:
  promptlooper:
    driver: bridge
    ipam:
      config:
        - subnet: 172.33.0.0/24
 services:
  promptlooper-db:
    image: postgres:16-alpine
    container_name: promptlooper-db
    restart: unless-stopped
    networks:
      - promptlooper
    ports:
      - "5434:5432"
    environment:
      POSTGRES_USER: ${POSTGRES_USER:-promptlooper}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:?Set POSTGRES_PASSWORD in .env}
      POSTGRES_DB: ${POSTGRES_DB:-promptlooper}
    volumes:
      - /vmPool/r/services/promptlooper_db:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-promptlooper}"]
      interval: 10s
      timeout: 5s
      retries: 5
  promptlooper-redis:
    image: redis:7-alpine
    container_name: promptlooper-redis
    restart: unless-stopped
    networks:
      - promptlooper
    volumes:
      - /vmPool/r/services/promptlooper_redis:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
  promptlooper-api:
    build:
      context: .
      dockerfile: docker/Dockerfile
      target: api
    container_name: promptlooper-api
    restart: unless-stopped
    networks:
      - promptlooper
    ports:
      - "8401:8401"  # MCP server
    environment:
      DATABASE_URL: postgresql+asyncpg://${POSTGRES_USER:-promptlooper}:${POSTGRES_PASSWORD}@promptlooper-db:5432/${POSTGRES_DB:-promptlooper}
      REDIS_URL: redis://promptlooper-redis:6379/0
      JWT_SECRET: ${JWT_SECRET:?Set JWT_SECRET in .env}
      DEFAULT_ENDPOINT_URL: ${DEFAULT_ENDPOINT_URL:-}
      DEFAULT_ENDPOINT_KEY: ${DEFAULT_ENDPOINT_KEY:-}
      MAX_CONCURRENT_RUNS: ${MAX_CONCURRENT_RUNS:-4}
      MAX_TOKENS_PER_SWEEP: ${MAX_TOKENS_PER_SWEEP:-0}
      MCP_ENABLED: ${MCP_ENABLED:-true}
      MCP_PORT: 8401
    depends_on:
      promptlooper-db:
        condition: service_healthy
      promptlooper-redis:
        condition: service_healthy
  promptlooper-worker:
    build:
      context: .
      dockerfile: docker/Dockerfile
      target: api
    container_name: promptlooper-worker
    restart: unless-stopped
    networks:
      - promptlooper
    command: celery -A backend.worker:app worker --loglevel=info --concurrency=${MAX_CONCURRENT_RUNS:-4}
    environment:
      DATABASE_URL: postgresql+asyncpg://${POSTGRES_USER:-promptlooper}:${POSTGRES_PASSWORD}@promptlooper-db:5432/${POSTGRES_DB:-promptlooper}
      REDIS_URL: redis://promptlooper-redis:6379/0
      DEFAULT_ENDPOINT_URL: ${DEFAULT_ENDPOINT_URL:-}
      DEFAULT_ENDPOINT_KEY: ${DEFAULT_ENDPOINT_KEY:-}
      MAX_CONCURRENT_RUNS: ${MAX_CONCURRENT_RUNS:-4}
    depends_on:
      promptlooper-db:
        condition: service_healthy
      promptlooper-redis:
        condition: service_healthy
  promptlooper-web:
    build:
      context: .
      dockerfile: docker/Dockerfile
      target: web
    container_name: promptlooper-web
    restart: unless-stopped
    networks:
      - promptlooper
    ports:
      - "8400:80"
    depends_on:
      - promptlooper-api
--- a/env.example
+++ b/env.example
@ -0,0 +1,23 @@
 # PromptLooper — Environment Configuration
 # Copy to .env and fill in required values
 # ── Database ──────────────────────────────────────────────
 POSTGRES_USER=promptlooper
 POSTGRES_PASSWORD=          # REQUIRED: set a strong password
 POSTGRES_DB=promptlooper
 # ── Auth ──────────────────────────────────────────────────
 JWT_SECRET=                 # REQUIRED: generate with `openssl rand -hex 32`
 # ── Default LLM Endpoint (optional) ──────────────────────
 # Pre-configure an LLM endpoint so users don't have to add one manually
 DEFAULT_ENDPOINT_URL=       # e.g. http://chat.forgetyour.name/api/v1
 DEFAULT_ENDPOINT_KEY=       # API key for the default endpoint
 # ── Limits ────────────────────────────────────────────────
 MAX_CONCURRENT_RUNS=4       # Parallel run limit per sweep
 MAX_TOKENS_PER_SWEEP=0      # 0 = unlimited; set a number to cap token spend
 # ── MCP Server ────────────────────────────────────────────
 MCP_ENABLED=true            # Enable/disable MCP server for agent access
 # MCP_PORT=8401             # MCP server port (set in docker-compose)
--- a/promptlooper-spec.md
+++ b/promptlooper-spec.md
@ -0,0 +1,635 @@
 # PromptLooper
 > The one who loops prompts — a universal LLM pipeline tuning workbench.
 PromptLooper is a self-hosted tool for systematically optimizing LLM prompts, model selection, and inference parameters. It runs experiments across prompt × model × parameter combinations, caches every response, scores results against pluggable evaluation functions, and surfaces the best configurations through a real-time observability dashboard with human-in-the-loop steering.
 It ships as a single Docker container (SQLite mode) for zero-config quickstart, or a Docker Compose stack (Postgres + Redis) for production use. An MCP server enables any AI agent to drive PromptLooper programmatically — creating experiments, running sweeps, and reading results without human intervention.
 ---
 ## Problem Statement
 Anyone building LLM-powered applications faces the same painful loop:
 1. Write a system prompt
 2. Pick a model and parameters (temperature, top_p, max_tokens, etc.)
 3. Run it against sample data
 4. Read the output and decide if it's "good enough"
 5. Tweak something and repeat
 This process is manual, unscientific, and wasteful. There's no way to:
 - Systematically compare configurations side-by-side
 - Know if you've already tested a particular combination
 - Quantify "better" beyond gut feeling
 - Let an agent handle the iteration while you steer from above
 - Share optimized configurations between projects or team members
 PromptLooper makes this process systematic, observable, cached, and agent-drivable.
 ---
 ## Target Users
 | User | Use Case |
 |------|----------|
 | **Solo developer** | Tuning prompts for a side project, wants to try 5 models and find the sweet spot |
 | **Team building RAG pipelines** | Optimizing chunking + embedding + retrieval + synthesis prompts across stages |
 | **AI agent (via MCP)** | Autonomously running optimization sweeps, reporting back to human when done |
 | **Prompt engineer** | A/B testing prompt variants at scale with quantified scoring |
 | **Infrastructure team** | Benchmarking new models against existing baselines before migration |
 ---
 ## Core Concepts
 ### Experiment
 A named configuration that defines:
 - **Sample data**: Input documents, queries, or any text the pipeline will process
 - **Pipeline stages**: 1-N sequential stages, each with its own prompt template and model config
 - **Evaluation criteria**: Scoring functions that grade the output
 - **Parameter space**: What to vary (prompt text, model, temperature, top_p, chunk_size, etc.)
 ### Run
 A single execution of one specific configuration within an experiment. A run captures:
 - Full input configuration (prompt, model, all parameters)
 - Raw LLM response(s)
 - Timing data (latency, tokens in/out)
 - Evaluation scores
 - Configuration hash (for cache deduplication)
 ### Sweep
 A batch of runs that systematically explores a parameter space. Types:
 - **Grid sweep**: Every combination of specified parameter values
 - **Random sweep**: Random sampling from parameter ranges
 - **Guided sweep**: Agent-driven, where results from previous runs inform the next configuration to try
 ### Scoring Function
 A pluggable evaluation that takes (input, output, context) and returns a numeric score. Built-in options:
 - **Embedding similarity**: How semantically close is the output to a reference answer?
 - **Length compliance**: Does the output meet length constraints?
 - **Format compliance**: Does the output match expected structure (JSON, markdown, etc.)?
 - **Keyword presence**: Do required terms appear in the output?
 - **Human rating**: Manual thumbs-up/down or 1-5 star rating from the dashboard
 - **LLM-as-judge**: Use a separate LLM call to evaluate quality (configurable judge prompt)
 - **Custom function**: User-provided Python snippet or HTTP webhook
 ### Project
 A workspace that groups related experiments. Users can return to a project and pick up where they left off. Projects store:
 - All experiments and their runs
 - Saved "best" configurations
 - Notes and annotations
 - Export history
 ---
 ## Architecture
 ```
 ┌──────────────────────────────────────────────────────────────────────────┐
 │  Docker Compose: xpltd_promptlooper (ub01)                               │
 │  Network: promptlooper (172.33.0.0/24)                                   │
 │                                                                          │
 │  ┌────────────┐  ┌─────────────┐  ┌──────────────────────────────────┐  │
 │  │  PostgreSQL │  │    Redis    │  │         FastAPI (API)            │  │
 │  │  :5434      │  │  job queue  │  │  Experiments, Runs, Scoring,     │  │
 │  │  experiments│  │  pub/sub    │  │  Projects, Auth, MCP Server      │  │
 │  │  runs, cache│  │  live state │  │  WebSocket for live dashboard    │  │
 │  └─────┬───────┘  └──────┬──────┘  └──────────────┬───────────────────┘  │
 │        │                 │                        │                      │
 │  ┌─────┴─────────────────┴────────────────────────┴───────────────────┐  │
 │  │                      Celery Worker                                 │  │
 │  │  Executes runs against target LLM endpoints                        │  │
 │  │  Caches responses by config hash                                   │  │
 │  │  Streams progress via Redis pub/sub                                │  │
 │  └────────────────────────────────────────────────────────────────────┘  │
 │                                                                          │
 │  ┌────────────────────────────────────────────────────────────────────┐  │
 │  │                    Web UI (React + Vite)                           │  │
 │  │  nginx → :8400                                                     │  │
 │  │  Dashboard, Experiment Builder, Live Observability, Steering       │  │
 │  └────────────────────────────────────────────────────────────────────┘  │
 └──────────────────────────────────────────────────────────────────────────┘
                              │
                              │  HTTP (OpenAI-compatible)
                              ▼
              ┌───────────────────────────────┐
              │  Target LLM Endpoints          │
              │  OpenWebUI, vLLM, Ollama,      │
              │  OpenAI, Anthropic, any        │
              │  OpenAI-compatible API          │
              └───────────────────────────────┘
 ```
 ### Services (Production Compose)
 | Service | Image | Port | Purpose |
 |---------|-------|------|---------|
 | `promptlooper-db` | `postgres:16-alpine` | `5434 → 5432` | Primary data store |
 | `promptlooper-redis` | `redis:7-alpine` | — | Celery broker + pub/sub for live dashboard |
 | `promptlooper-api` | `Dockerfile` | `8000` | FastAPI REST API + MCP server |
 | `promptlooper-worker` | `Dockerfile` | — | Celery worker (run execution) |
 | `promptlooper-web` | `Dockerfile` | `8400 → 80` | React frontend (nginx) |
 ### Single Container Mode
 When `DATABASE_URL` is not set, PromptLooper runs with:
 - SQLite at `/data/promptlooper.db`
 - In-process task queue (no Celery/Redis dependency)
 - All services in one container on port 8400
 ```bash
 docker run -p 8400:8400 -v promptlooper-data:/data ghcr.io/xpltdco/promptlooper
 ```
 ---
 ## Data Model
 ### User
 | Field | Type | Notes |
 |-------|------|-------|
 | id | UUID | PK |
 | username | string | Unique, "admin" created on first boot |
 | password_hash | string | bcrypt |
 | is_admin | bool | Default true for first user |
 | created_at | timestamp | |
 ### Project
 | Field | Type | Notes |
 |-------|------|-------|
 | id | UUID | PK |
 | name | string | |
 | description | text | Optional |
 | owner_id | UUID | FK → User |
 | created_at | timestamp | |
 | updated_at | timestamp | |
 ### Experiment
 | Field | Type | Notes |
 |-------|------|-------|
 | id | UUID | PK |
 | project_id | UUID | FK → Project |
 | name | string | |
 | description | text | Optional |
 | sample_data | JSONB | Input documents/queries |
 | pipeline_stages | JSONB | Stage definitions with prompt templates |
 | scoring_config | JSONB | Which scoring functions to use and their weights |
 | parameter_space | JSONB | What to vary and ranges/options |
 | status | enum | draft, running, paused, completed |
 | created_at | timestamp | |
 | updated_at | timestamp | |
 ### Run
 | Field | Type | Notes |
 |-------|------|-------|
 | id | UUID | PK |
 | experiment_id | UUID | FK → Experiment |
 | config_hash | string(64) | SHA-256 of full configuration (for cache dedup) |
 | config | JSONB | Complete configuration snapshot |
 | status | enum | pending, running, completed, failed, cached |
 | started_at | timestamp | |
 | completed_at | timestamp | |
 | duration_ms | int | Wall clock time |
 | tokens_in | int | Total input tokens across all stages |
 | tokens_out | int | Total output tokens |
 | cost_estimate | decimal | Estimated cost based on model pricing |
 ### StageResult
 | Field | Type | Notes |
 |-------|------|-------|
 | id | UUID | PK |
 | run_id | UUID | FK → Run |
 | stage_index | int | 0-based stage number |
 | prompt_sent | text | Actual prompt after template rendering |
 | response_raw | text | Raw LLM response |
 | model_used | string | Model identifier |
 | parameters | JSONB | Temperature, top_p, etc. |
 | tokens_in | int | This stage |
 | tokens_out | int | This stage |
 | latency_ms | int | This stage |
 ### Score
 | Field | Type | Notes |
 |-------|------|-------|
 | id | UUID | PK |
 | run_id | UUID | FK → Run |
 | scorer_name | string | e.g. "embedding_similarity", "human_rating" |
 | value | float | Normalized 0.0–1.0 |
 | metadata | JSONB | Scorer-specific details |
 | created_at | timestamp | |
 ### ResponseCache
 | Field | Type | Notes |
 |-------|------|-------|
 | config_hash | string(64) | PK — SHA-256 of (prompt + model + params + input) |
 | response | text | Cached LLM response |
 | model | string | |
 | tokens_in | int | |
 | tokens_out | int | |
 | latency_ms | int | Original latency |
 | created_at | timestamp | |
 ### WebhookConfig
 | Field | Type | Notes |
 |-------|------|-------|
 | id | UUID | PK |
 | event_type | string | experiment.complete, new_best_found, budget.exhausted, human_needed |
 | url | string | Target URL |
 | headers | JSONB | Optional auth headers |
 | is_active | bool | |
 ---
 ## API Endpoints
 ### Auth
 | Method | Path | Description |
 |--------|------|-------------|
 | POST | `/api/v1/auth/setup` | First-boot admin password setup |
 | POST | `/api/v1/auth/login` | Login, returns JWT |
 | GET | `/api/v1/auth/me` | Current user info |
 ### Admin
 | Method | Path | Description |
 |--------|------|-------------|
 | GET | `/api/v1/admin/settings` | System settings (guest access, default model, etc.) |
 | PUT | `/api/v1/admin/settings` | Update settings |
 | GET | `/api/v1/admin/stats` | System-wide stats (total runs, cache hit rate, etc.) |
 ### Projects
 | Method | Path | Description |
 |--------|------|-------------|
 | GET | `/api/v1/projects` | List projects |
 | POST | `/api/v1/projects` | Create project |
 | GET | `/api/v1/projects/{id}` | Project detail with experiment summaries |
 | PUT | `/api/v1/projects/{id}` | Update project |
 | DELETE | `/api/v1/projects/{id}` | Delete project and all experiments |
 ### Experiments
 | Method | Path | Description |
 |--------|------|-------------|
 | GET | `/api/v1/experiments` | List experiments (filter by project) |
 | POST | `/api/v1/experiments` | Create experiment |
 | GET | `/api/v1/experiments/{id}` | Experiment detail with run summaries |
 | PUT | `/api/v1/experiments/{id}` | Update experiment config |
 | DELETE | `/api/v1/experiments/{id}` | Delete experiment |
 | POST | `/api/v1/experiments/{id}/sweep` | Start a sweep (grid, random, or guided) |
 | POST | `/api/v1/experiments/{id}/pause` | Pause running sweep |
 | POST | `/api/v1/experiments/{id}/resume` | Resume paused sweep |
 | POST | `/api/v1/experiments/{id}/stop` | Stop sweep |
 ### Runs
 | Method | Path | Description |
 |--------|------|-------------|
 | GET | `/api/v1/experiments/{id}/runs` | List runs with scores (sortable, filterable) |
 | GET | `/api/v1/runs/{id}` | Run detail with stage results |
 | POST | `/api/v1/runs` | Execute a single run (ad-hoc) |
 | POST | `/api/v1/runs/{id}/score` | Add human rating to a run |
 | GET | `/api/v1/experiments/{id}/leaderboard` | Top runs ranked by weighted score |
 ### Export
 | Method | Path | Description |
 |--------|------|-------------|
 | GET | `/api/v1/experiments/{id}/export/best` | Best config as JSON |
 | GET | `/api/v1/experiments/{id}/export/env` | Best config as .env snippet |
 | GET | `/api/v1/experiments/{id}/export/yaml` | Best config as YAML |
 | GET | `/api/v1/experiments/{id}/export/report` | Full experiment report (markdown) |
 ### LLM Endpoints (Target Management)
 | Method | Path | Description |
 |--------|------|-------------|
 | GET | `/api/v1/endpoints` | List configured LLM endpoints |
 | POST | `/api/v1/endpoints` | Add endpoint (URL, API key, label) |
 | PUT | `/api/v1/endpoints/{id}` | Update endpoint |
 | DELETE | `/api/v1/endpoints/{id}` | Remove endpoint |
 | POST | `/api/v1/endpoints/{id}/test` | Test connectivity and list available models |
 ### Webhooks
 | Method | Path | Description |
 |--------|------|-------------|
 | GET | `/api/v1/webhooks` | List webhook configs |
 | POST | `/api/v1/webhooks` | Create webhook |
 | DELETE | `/api/v1/webhooks/{id}` | Remove webhook |
 ### WebSocket
 | Path | Description |
 |------|-------------|
 | `/ws/experiments/{id}` | Live stream: run progress, scores, stage completions |
 | `/ws/dashboard` | Global activity feed across all experiments |
 ### Health
 | Method | Path | Description |
 |--------|------|-------------|
 | GET | `/health` | Health check (DB + Redis connectivity) |
 ---
 ## MCP Server
 PromptLooper exposes an MCP (Model Context Protocol) server so AI agents can drive it programmatically. The MCP server runs as part of the API service.
 ### MCP Tools
 | Tool | Description |
 |------|-------------|
 | `create_project` | Create a new project workspace |
 | `create_experiment` | Define an experiment with sample data, stages, and scoring |
 | `configure_endpoint` | Add or update an LLM target endpoint |
 | `run_single` | Execute one specific configuration and return results |
 | `run_sweep` | Start a parameter sweep (grid/random/guided) |
 | `get_leaderboard` | Get top N configurations ranked by score |
 | `get_run_detail` | Get full details of a specific run |
 | `export_best_config` | Export the best configuration in JSON/YAML/env format |
 | `pause_sweep` | Pause a running sweep |
 | `resume_sweep` | Resume a paused sweep |
 | `add_human_score` | Rate a run's output |
 | `get_experiment_status` | Check experiment progress |
 | `list_models` | List available models across all configured endpoints |
 ### Example Agent Interaction
 ```
 Agent: "Create a project called 'Chrysopedia Extraction' and an experiment
        that tests the stage3_extraction prompt against Qwen-72B and Qwen-32B,
        sweeping temperature from 0.1 to 0.9 in 0.2 increments.
        Use embedding similarity scoring against these reference outputs.
        Run a grid sweep."
 PromptLooper MCP: [create_project] → [create_experiment] → [run_sweep]
                  → streams progress → [get_leaderboard]
 Agent: "The top config uses Qwen-72B at temperature 0.3. Export it as
        a .env snippet I can drop into Chrysopedia."
 PromptLooper MCP: [export_best_config format=env]
 ```
 ---
 ## Response Caching
 Every LLM call is cached by a SHA-256 hash of:
 - Prompt text (after template rendering)
 - Model identifier
 - All inference parameters (temperature, top_p, max_tokens, etc.)
 - Input data
 If an identical configuration has been run before, the cached response is returned instantly with `status: cached`. This means:
 - Re-running experiments with new scoring functions costs zero tokens
 - Adding a new scorer retroactively evaluates all historical runs
 - Accidentally re-running a sweep wastes nothing
 - Cache can be invalidated per-run or per-experiment if needed
 ---
 ## Authentication Model
 ### First Boot
 - App detects no users exist
 - Presents a setup screen: create admin username + password
 - Admin account is created, user is logged in
 ### Guest Access
 - Admin can toggle `allow_guest_access` in settings
 - Guests can view experiments and results (read-only)
 - Guests cannot create experiments, run sweeps, or modify configs
 - Default: guest access disabled
 ### API Authentication
 - JWT tokens for the web UI
 - API key (generated in admin settings) for programmatic access and MCP
 - API key passed via `Authorization: Bearer <key>` header
 ---
 ## Real-Time Observability Dashboard
 The dashboard is the primary user interface during active experimentation. It provides:
 ### Live Experiment View
 - Progress bar: X of Y runs completed
 - Token usage accumulator (running total)
 - Cost estimate (based on configured model pricing)
 - Cache hit rate for current sweep
 - Estimated time remaining
 ### Side-by-Side Output Comparison
 - Pick any two runs and diff their outputs
 - Highlight differences in prompt, parameters, and response
 - Score comparison overlay
 ### Leaderboard
 - Real-time ranked list of runs by weighted score
 - Sortable by any individual scorer
 - Click to expand full run detail
 ### Steering Controls
 - **Pause**: Stop the sweep after current run completes
 - **Fork**: Create a new experiment branching from current best, with modified parameters
 - **Redirect**: Change remaining sweep parameters mid-flight
 - **Approve**: Mark a configuration as "good enough" and export
 - **Reject**: Exclude a run from leaderboard consideration
 ### Activity Timeline
 - Chronological feed of events: run started, run completed, new best found, cache hit, error
 - Filterable by event type
 ---
 ## Webhook Events
 | Event | Payload | Trigger |
 |-------|---------|---------|
 | `experiment.started` | experiment_id, sweep config | Sweep begins |
 | `experiment.completed` | experiment_id, best config, summary stats | All runs finished |
 | `experiment.paused` | experiment_id, reason | Manual or budget pause |
 | `new_best_found` | experiment_id, run_id, scores, config | New top-scoring run |
 | `budget.exhausted` | experiment_id, token_count, cost | Token/cost budget hit |
 | `human_needed` | experiment_id, reason, context | Agent requests human review |
 | `run.failed` | run_id, error | Individual run error |
 ---
 ## Configuration Export Formats
 ### JSON
 ```json
 {
  "model": "qwen2.5-72b-instruct",
  "endpoint": "http://chat.forgetyour.name/api",
  "temperature": 0.3,
  "top_p": 0.85,
  "max_tokens": 2048,
  "system_prompt": "You are a music production knowledge extractor...",
  "score": 0.87,
  "experiment": "chrysopedia-extraction-v2",
  "exported_at": "2026-04-06T12:00:00Z"
 }
 ```
 ### .env
 ```bash
 LLM_MODEL=qwen2.5-72b-instruct
 LLM_API_URL=http://chat.forgetyour.name/api
 LLM_TEMPERATURE=0.3
 LLM_TOP_P=0.85
 LLM_MAX_TOKENS=2048
 # Score: 0.87 | Experiment: chrysopedia-extraction-v2
 ```
 ### YAML
 ```yaml
 model: qwen2.5-72b-instruct
 endpoint: http://chat.forgetyour.name/api
 parameters:
  temperature: 0.3
  top_p: 0.85
  max_tokens: 2048
 system_prompt: |
  You are a music production knowledge extractor...
 metadata:
  score: 0.87
  experiment: chrysopedia-extraction-v2
  exported_at: 2026-04-06T12:00:00Z
 ```
 ---
 ## Environment Variables
 | Group | Variable | Default | Notes |
 |-------|----------|---------|-------|
 | **Database** | `DATABASE_URL` | (none → SQLite) | PostgreSQL connection string |
 | **Redis** | `REDIS_URL` | (none → in-process) | Redis connection string |
 | **Server** | `HOST` | `0.0.0.0` | Bind address |
 | **Server** | `PORT` | `8400` | HTTP port |
 | **Auth** | `JWT_SECRET` | (auto-generated) | JWT signing key |
 | **Auth** | `API_KEY` | (none) | Static API key for programmatic access |
 | **Defaults** | `DEFAULT_ENDPOINT_URL` | (none) | Pre-configured LLM endpoint |
 | **Defaults** | `DEFAULT_ENDPOINT_KEY` | (none) | API key for default endpoint |
 | **Limits** | `MAX_CONCURRENT_RUNS` | `4` | Parallel run limit |
 | **Limits** | `MAX_TOKENS_PER_SWEEP` | `0` (unlimited) | Token budget per sweep |
 | **Storage** | `DATA_DIR` | `/data` | SQLite DB + file storage location |
 | **MCP** | `MCP_ENABLED` | `true` | Enable MCP server |
 | **MCP** | `MCP_PORT` | `8401` | MCP server port |
 ---
 ## Docker Compose (Production — XPLTD Conventions)
 Project name: `xpltd_promptlooper`
 Network: `promptlooper` (`172.33.0.0/24`)
 Persistent data: `/vmPool/r/services/promptlooper_*`
 PostgreSQL port: `5434` (external)
 Web UI port: `8400` (external)
 ---
 ## Technology Stack
 | Layer | Technology | Rationale |
 |-------|-----------|-----------|
 | **API** | Python 3.12 + FastAPI | Async, OpenAPI auto-gen, matches XPLTD conventions |
 | **Task Queue** | Celery + Redis | Proven for background job execution, matches Chrysopedia |
 | **Database** | PostgreSQL 16 (prod) / SQLite (single-container) | JSONB for flexible experiment configs |
 | **Real-time** | WebSocket via FastAPI + Redis pub/sub | Sub-second dashboard updates |
 | **Frontend** | React 18 + TypeScript + Vite | Real-time dashboard, matches Chrysopedia |
 | **Styling** | Tailwind CSS | Fast iteration, utility-first |
 | **MCP** | Python MCP SDK | Standard protocol for agent integration |
 | **Container** | Multi-stage Docker build | Single image serves both API and frontend |
 ---
 ## Development & Deployment
 ### Local Development
 ```bash
 git clone git@git.xpltd.co:xpltdco/promptlooper.git
 cd promptlooper
 cp .env.example .env
 docker compose up -d promptlooper-db promptlooper-redis
 cd backend && pip install -r requirements.txt
 alembic upgrade head
 uvicorn main:app --reload --host 0.0.0.0 --port 8000
 # In another terminal:
 cd frontend && npm install && npm run dev
 ```
 ### Production Deployment (ub01)
 ```bash
 ssh ub01
 cd /vmPool/r/repos/xpltdco/promptlooper
 git pull && docker compose build && docker compose up -d
 ```
 ### Project Structure
 ```
 promptlooper/
 ├── backend/
 │   ├── main.py                 # FastAPI entry point
 │   ├── config.py               # Pydantic Settings
 │   ├── models.py               # SQLAlchemy ORM
 │   ├── schemas.py              # Pydantic request/response
 │   ├── auth.py                 # JWT + API key auth
 │   ├── worker.py               # Celery app config
 │   ├── routers/
 │   │   ├── auth.py
 │   │   ├── projects.py
 │   │   ├── experiments.py
 │   │   ├── runs.py
 │   │   ├── endpoints.py
 │   │   ├── export.py
 │   │   ├── webhooks.py
 │   │   └── admin.py
 │   ├── engine/
 │   │   ├── runner.py           # Run execution logic
 │   │   ├── sweep.py            # Sweep orchestration
 │   │   ├── cache.py            # Response cache layer
 │   │   ├── adapters/           # LLM endpoint adapters
 │   │   │   ├── openai_compat.py
 │   │   │   └── base.py
 │   │   └── scorers/            # Pluggable scoring functions
 │   │       ├── embedding.py
 │   │       ├── format.py
 │   │       ├── keyword.py
 │   │       ├── llm_judge.py
 │   │       └── base.py
 │   ├── mcp/
 │   │   ├── server.py           # MCP server implementation
 │   │   └── tools.py            # MCP tool definitions
 │   ├── websocket/
 │   │   └── manager.py          # WebSocket connection management
 │   └── tests/
 ├── frontend/
 │   └── src/
 │       ├── pages/
 │       │   ├── Setup.tsx       # First-boot admin setup
 │       │   ├── Login.tsx
 │       │   ├── Dashboard.tsx   # Global activity
 │       │   ├── Projects.tsx
 │       │   ├── Experiment.tsx  # Experiment builder + config
 │       │   ├── Live.tsx        # Real-time observability
 │       │   ├── Compare.tsx     # Side-by-side run comparison
 │       │   └── Admin.tsx       # System settings
 │       ├── components/
 │       │   ├── Leaderboard.tsx
 │       │   ├── SteeringControls.tsx
 │       │   ├── RunCard.tsx
 │       │   ├── ScoreChart.tsx
 │       │   └── Timeline.tsx
 │       └── api/
 ├── docker/
 │   ├── Dockerfile              # Multi-stage: API + frontend
 │   └── nginx.conf
 ├── alembic/
 ├── docker-compose.yml
 ├── .env.example
 ├── CLAUDE.md
 └── README.md
 ```