MAESTRO: Initialize repository with README, .gitignore, and project files

Add README.md with project description, quick-start instructions, and AGPL-3.0 license badge. Add .gitignore for Python, Node, and Docker artifacts. Include existing CLAUDE.md, spec, docker-compose.yml, and env.example.
2026-04-07 01:39:18 -05:00 · 2026-04-07 01:39:18 -05:00 · fc2e4cd7d1
commit fc2e4cd7d1
6 changed files with 1013 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@ -0,0 +1,57 @@
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.egg-info/
+*.egg
+dist/
+build/
+.eggs/
+*.whl
+.venv/
+venv/
+env/
+.env
+*.pyc
+.pytest_cache/
+.mypy_cache/
+.ruff_cache/
+htmlcov/
+.coverage
+.coverage.*
+
+# Node / Frontend
+node_modules/
+frontend/dist/
+frontend/build/
+.npm
+*.tsbuildinfo
+
+# Docker
+docker/nginx.conf.bak
+
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+.DS_Store
+
+# OS
+Thumbs.db
+Desktop.ini
+
+# Data (single-container mode)
+*.db
+/data/
+
+# Alembic
+alembic/versions/__pycache__/
+
+# Auto Run Docs (Maestro working files)
+Auto Run Docs/Working/
+
+# Misc
+*.log
+*.bak
--- a/CLAUDE.md
+++ b/CLAUDE.md
@ -0,0 +1,127 @@
+# CLAUDE.md — PromptLooper
+
+## What is this project?
+
+PromptLooper is a self-hosted LLM pipeline tuning workbench. It runs experiments across prompt × model × parameter combinations, caches every response, scores results, and surfaces optimal configurations through a real-time dashboard. It has an MCP server so AI agents can drive it programmatically.
+
+## Repository
+
+- **Hosted at**: git.xpltd.co/xpltdco/promptlooper
+- **XPLTD project name**: `xpltd_promptlooper`
+- **Sister project**: Chrysopedia (git.xpltd.co/xpltdco/chrysopedia) — a knowledge extraction pipeline that is PromptLooper's first integration target
+
+## Tech Stack
+
+- **Backend**: Python 3.12, FastAPI, Celery, SQLAlchemy, Alembic
+- **Frontend**: React 18, TypeScript, Vite, Tailwind CSS
+- **Database**: PostgreSQL 16 (production) / SQLite (single-container mode)
+- **Cache/Queue**: Redis 7 (production) / in-process (single-container)
+- **Real-time**: WebSocket via FastAPI + Redis pub/sub
+- **MCP**: Python MCP SDK
+- **Container**: Multi-stage Docker build, nginx for frontend
+
+## XPLTD Conventions
+
+These are non-negotiable project conventions shared across all XPLTD projects:
+
+- Docker Compose project name: `xpltd_promptlooper`
+- Dedicated bridge network: `promptlooper` (`172.33.0.0/24`)
+- Persistent data bind mounts under `/vmPool/r/services/promptlooper_*`
+- PostgreSQL on external port `5434` (internal `5432`)
+- Web UI on port `8400`
+- MCP server on port `8401`
+- Container naming: `promptlooper-{service}` (e.g., `promptlooper-api`, `promptlooper-db`)
+
+## Key Architecture Decisions
+
+1. **No LLM runs inside PromptLooper itself** — it's purely an HTTP client that calls external LLM endpoints. The only exception is the optional "LLM-as-judge" scorer.
+2. **Response caching by config hash** — SHA-256 of (prompt + model + params + input). Cache hits return instantly. This is critical for cost control.
+3. **Single-container mode** — when `DATABASE_URL` is not set, use SQLite + in-process queue. Zero dependencies.
+4. **WebSocket for real-time** — the dashboard connects via WebSocket to receive run progress, score updates, and steering events.
+5. **Pluggable scorers** — all scoring functions implement a base class with `score(input, output, context) → float` signature.
+6. **OpenAI-compatible adapter** — the LLM adapter layer speaks OpenAI's chat completions API. This covers OpenWebUI, vLLM, Ollama, and most providers.
+
+## File Organization
+
+```
+backend/
+  main.py              — FastAPI app, middleware, router mounting
+  config.py            — Pydantic Settings from env vars
+  models.py            — SQLAlchemy ORM models
+  schemas.py           — Pydantic request/response schemas
+  auth.py              — JWT + API key authentication
+  worker.py            — Celery app configuration
+  routers/             — API endpoint handlers
+  engine/              — Core experiment execution logic
+    runner.py          — Individual run execution
+    sweep.py           — Sweep orchestration (grid/random/guided)
+    cache.py           — Response cache layer
+    adapters/          — LLM endpoint adapters
+    scorers/           — Pluggable scoring functions
+  mcp/                 — MCP server implementation
+  websocket/           — WebSocket connection management
+
+frontend/src/
+  pages/               — Route-level components
+  components/          — Shared UI components
+  api/                 — Typed API client functions
+```
+
+## Database Migrations
+
+Use Alembic. Same patterns as Chrysopedia:
+```bash
+alembic revision --autogenerate -m "describe_change"
+alembic upgrade head
+```
+
+## Running Locally
+
+```bash
+docker compose up -d promptlooper-db promptlooper-redis
+cd backend && uvicorn main:app --reload --host 0.0.0.0 --port 8000
+# Frontend in another terminal:
+cd frontend && npm run dev
+```
+
+## Testing
+
+```bash
+cd backend && pytest
+cd frontend && npm test
+```
+
+## Important Patterns
+
+### Adding a new scorer
+1. Create `backend/engine/scorers/my_scorer.py`
+2. Implement `BaseScorer` with `name`, `score(input, output, context) → float`
+3. Register in `backend/engine/scorers/__init__.py`
+4. Add to frontend scorer picker component
+
+### Adding a new LLM adapter
+1. Create `backend/engine/adapters/my_adapter.py`
+2. Implement `BaseAdapter` with `complete(prompt, model, params) → response`
+3. Register in `backend/engine/adapters/__init__.py`
+4. Currently only OpenAI-compatible is implemented; all others should be edge cases
+
+### Adding a new MCP tool
+1. Add tool definition in `backend/mcp/tools.py`
+2. Implement handler in `backend/mcp/server.py`
+3. Tools should map 1:1 to API endpoints where possible
+
+## Common Gotchas
+
+- Always hash the FULL config when checking cache — missing a single parameter means cache misses
+- WebSocket connections must be cleaned up on disconnect — use the connection manager
+- SQLite mode doesn't support concurrent writes — the in-process queue must be single-threaded
+- Frontend must handle both WebSocket and polling fallback for environments where WS is blocked
+- MCP server runs on a separate port from the main API
+
+## Deployment
+
+```bash
+ssh ub01
+cd /vmPool/r/repos/xpltdco/promptlooper
+git pull && docker compose build && docker compose up -d
+```
--- a/README.md
+++ b/README.md
@ -0,0 +1,65 @@
+# PromptLooper
+
+[![License: AGPL-3.0](https://img.shields.io/badge/License-AGPL--3.0-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)
+[![Status: Alpha](https://img.shields.io/badge/Status-Alpha-orange.svg)]()
+
+> The one who loops prompts — a universal LLM pipeline tuning workbench.
+
+PromptLooper is a self-hosted tool for systematically optimizing LLM prompts, model selection, and inference parameters. It runs experiments across prompt x model x parameter combinations, caches every response, scores results against pluggable evaluation functions, and surfaces the best configurations through a real-time observability dashboard with human-in-the-loop steering.
+
+It ships as a single Docker container (SQLite mode) for zero-config quickstart, or a Docker Compose stack (Postgres + Redis) for production use. An MCP server enables any AI agent to drive PromptLooper programmatically — creating experiments, running sweeps, and reading results without human intervention.
+
+## Quick Start
+
+### Single Container (zero dependencies)
+
+```bash
+docker run -p 8400:8400 -v promptlooper-data:/data ghcr.io/xpltdco/promptlooper
+```
+
+Open `http://localhost:8400` — you'll be prompted to create an admin account on first boot.
+
+### Production (Docker Compose)
+
+```bash
+git clone git@git.xpltd.co:xpltdco/promptlooper.git
+cd promptlooper
+cp .env.example .env
+# Edit .env — set POSTGRES_PASSWORD and JWT_SECRET at minimum
+docker compose up -d
+```
+
+## Features
+
+- **Systematic experimentation** — grid, random, and guided sweeps across prompt x model x parameter space
+- **Response caching** — SHA-256 deduplication means re-runs cost zero tokens
+- **Pluggable scoring** — embedding similarity, format compliance, keyword presence, LLM-as-judge, human rating, custom webhooks
+- **Real-time dashboard** — live progress, leaderboard, side-by-side comparison, steering controls
+- **MCP server** — AI agents can create experiments, run sweeps, and export results programmatically
+- **Single-container mode** — SQLite + in-process queue when no external dependencies are configured
+
+## Development
+
+```bash
+# Start backing services
+docker compose up -d promptlooper-db promptlooper-redis
+
+# Backend
+cd backend && pip install -r requirements.txt
+alembic upgrade head
+uvicorn main:app --reload --host 0.0.0.0 --port 8000
+
+# Frontend (separate terminal)
+cd frontend && npm install && npm run dev
+```
+
+## Testing
+
+```bash
+cd backend && pytest
+cd frontend && npm test
+```
+
+## License
+
+[AGPL-3.0](https://www.gnu.org/licenses/agpl-3.0.html)
--- a/docker-compose.yml
+++ b/docker-compose.yml
@ -0,0 +1,106 @@
+name: xpltd_promptlooper
+
+networks:
+  promptlooper:
+    driver: bridge
+    ipam:
+      config:
+        - subnet: 172.33.0.0/24
+
+services:
+  promptlooper-db:
+    image: postgres:16-alpine
+    container_name: promptlooper-db
+    restart: unless-stopped
+    networks:
+      - promptlooper
+    ports:
+      - "5434:5432"
+    environment:
+      POSTGRES_USER: ${POSTGRES_USER:-promptlooper}
+      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:?Set POSTGRES_PASSWORD in .env}
+      POSTGRES_DB: ${POSTGRES_DB:-promptlooper}
+    volumes:
+      - /vmPool/r/services/promptlooper_db:/var/lib/postgresql/data
+    healthcheck:
+      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-promptlooper}"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+
+  promptlooper-redis:
+    image: redis:7-alpine
+    container_name: promptlooper-redis
+    restart: unless-stopped
+    networks:
+      - promptlooper
+    volumes:
+      - /vmPool/r/services/promptlooper_redis:/data
+    healthcheck:
+      test: ["CMD", "redis-cli", "ping"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+
+  promptlooper-api:
+    build:
+      context: .
+      dockerfile: docker/Dockerfile
+      target: api
+    container_name: promptlooper-api
+    restart: unless-stopped
+    networks:
+      - promptlooper
+    ports:
+      - "8401:8401"  # MCP server
+    environment:
+      DATABASE_URL: postgresql+asyncpg://${POSTGRES_USER:-promptlooper}:${POSTGRES_PASSWORD}@promptlooper-db:5432/${POSTGRES_DB:-promptlooper}
+      REDIS_URL: redis://promptlooper-redis:6379/0
+      JWT_SECRET: ${JWT_SECRET:?Set JWT_SECRET in .env}
+      DEFAULT_ENDPOINT_URL: ${DEFAULT_ENDPOINT_URL:-}
+      DEFAULT_ENDPOINT_KEY: ${DEFAULT_ENDPOINT_KEY:-}
+      MAX_CONCURRENT_RUNS: ${MAX_CONCURRENT_RUNS:-4}
+      MAX_TOKENS_PER_SWEEP: ${MAX_TOKENS_PER_SWEEP:-0}
+      MCP_ENABLED: ${MCP_ENABLED:-true}
+      MCP_PORT: 8401
+    depends_on:
+      promptlooper-db:
+        condition: service_healthy
+      promptlooper-redis:
+        condition: service_healthy
+
+  promptlooper-worker:
+    build:
+      context: .
+      dockerfile: docker/Dockerfile
+      target: api
+    container_name: promptlooper-worker
+    restart: unless-stopped
+    networks:
+      - promptlooper
+    command: celery -A backend.worker:app worker --loglevel=info --concurrency=${MAX_CONCURRENT_RUNS:-4}
+    environment:
+      DATABASE_URL: postgresql+asyncpg://${POSTGRES_USER:-promptlooper}:${POSTGRES_PASSWORD}@promptlooper-db:5432/${POSTGRES_DB:-promptlooper}
+      REDIS_URL: redis://promptlooper-redis:6379/0
+      DEFAULT_ENDPOINT_URL: ${DEFAULT_ENDPOINT_URL:-}
+      DEFAULT_ENDPOINT_KEY: ${DEFAULT_ENDPOINT_KEY:-}
+      MAX_CONCURRENT_RUNS: ${MAX_CONCURRENT_RUNS:-4}
+    depends_on:
+      promptlooper-db:
+        condition: service_healthy
+      promptlooper-redis:
+        condition: service_healthy
+
+  promptlooper-web:
+    build:
+      context: .
+      dockerfile: docker/Dockerfile
+      target: web
+    container_name: promptlooper-web
+    restart: unless-stopped
+    networks:
+      - promptlooper
+    ports:
+      - "8400:80"
+    depends_on:
+      - promptlooper-api
--- a/env.example
+++ b/env.example
@ -0,0 +1,23 @@
+# PromptLooper — Environment Configuration
+# Copy to .env and fill in required values
+
+# ── Database ──────────────────────────────────────────────
+POSTGRES_USER=promptlooper
+POSTGRES_PASSWORD=          # REQUIRED: set a strong password
+POSTGRES_DB=promptlooper
+
+# ── Auth ──────────────────────────────────────────────────
+JWT_SECRET=                 # REQUIRED: generate with `openssl rand -hex 32`
+
+# ── Default LLM Endpoint (optional) ──────────────────────
+# Pre-configure an LLM endpoint so users don't have to add one manually
+DEFAULT_ENDPOINT_URL=       # e.g. http://chat.forgetyour.name/api/v1
+DEFAULT_ENDPOINT_KEY=       # API key for the default endpoint
+
+# ── Limits ────────────────────────────────────────────────
+MAX_CONCURRENT_RUNS=4       # Parallel run limit per sweep
+MAX_TOKENS_PER_SWEEP=0      # 0 = unlimited; set a number to cap token spend
+
+# ── MCP Server ────────────────────────────────────────────
+MCP_ENABLED=true            # Enable/disable MCP server for agent access
+# MCP_PORT=8401             # MCP server port (set in docker-compose)
--- a/promptlooper-spec.md
+++ b/promptlooper-spec.md
@ -0,0 +1,635 @@
+# PromptLooper
+
+> The one who loops prompts — a universal LLM pipeline tuning workbench.
+
+PromptLooper is a self-hosted tool for systematically optimizing LLM prompts, model selection, and inference parameters. It runs experiments across prompt × model × parameter combinations, caches every response, scores results against pluggable evaluation functions, and surfaces the best configurations through a real-time observability dashboard with human-in-the-loop steering.
+
+It ships as a single Docker container (SQLite mode) for zero-config quickstart, or a Docker Compose stack (Postgres + Redis) for production use. An MCP server enables any AI agent to drive PromptLooper programmatically — creating experiments, running sweeps, and reading results without human intervention.
+
+---
+
+## Problem Statement
+
+Anyone building LLM-powered applications faces the same painful loop:
+
+1. Write a system prompt
+2. Pick a model and parameters (temperature, top_p, max_tokens, etc.)
+3. Run it against sample data
+4. Read the output and decide if it's "good enough"
+5. Tweak something and repeat
+
+This process is manual, unscientific, and wasteful. There's no way to:
+- Systematically compare configurations side-by-side
+- Know if you've already tested a particular combination
+- Quantify "better" beyond gut feeling
+- Let an agent handle the iteration while you steer from above
+- Share optimized configurations between projects or team members
+
+PromptLooper makes this process systematic, observable, cached, and agent-drivable.
+
+---
+
+## Target Users
+
+| User | Use Case |
+|------|----------|
+| **Solo developer** | Tuning prompts for a side project, wants to try 5 models and find the sweet spot |
+| **Team building RAG pipelines** | Optimizing chunking + embedding + retrieval + synthesis prompts across stages |
+| **AI agent (via MCP)** | Autonomously running optimization sweeps, reporting back to human when done |
+| **Prompt engineer** | A/B testing prompt variants at scale with quantified scoring |
+| **Infrastructure team** | Benchmarking new models against existing baselines before migration |
+
+---
+
+## Core Concepts
+
+### Experiment
+
+A named configuration that defines:
+- **Sample data**: Input documents, queries, or any text the pipeline will process
+- **Pipeline stages**: 1-N sequential stages, each with its own prompt template and model config
+- **Evaluation criteria**: Scoring functions that grade the output
+- **Parameter space**: What to vary (prompt text, model, temperature, top_p, chunk_size, etc.)
+
+### Run
+
+A single execution of one specific configuration within an experiment. A run captures:
+- Full input configuration (prompt, model, all parameters)
+- Raw LLM response(s)
+- Timing data (latency, tokens in/out)
+- Evaluation scores
+- Configuration hash (for cache deduplication)
+
+### Sweep
+
+A batch of runs that systematically explores a parameter space. Types:
+- **Grid sweep**: Every combination of specified parameter values
+- **Random sweep**: Random sampling from parameter ranges
+- **Guided sweep**: Agent-driven, where results from previous runs inform the next configuration to try
+
+### Scoring Function
+
+A pluggable evaluation that takes (input, output, context) and returns a numeric score. Built-in options:
+- **Embedding similarity**: How semantically close is the output to a reference answer?
+- **Length compliance**: Does the output meet length constraints?
+- **Format compliance**: Does the output match expected structure (JSON, markdown, etc.)?
+- **Keyword presence**: Do required terms appear in the output?
+- **Human rating**: Manual thumbs-up/down or 1-5 star rating from the dashboard
+- **LLM-as-judge**: Use a separate LLM call to evaluate quality (configurable judge prompt)
+- **Custom function**: User-provided Python snippet or HTTP webhook
+
+### Project
+
+A workspace that groups related experiments. Users can return to a project and pick up where they left off. Projects store:
+- All experiments and their runs
+- Saved "best" configurations
+- Notes and annotations
+- Export history
+
+---
+
+## Architecture
+
+```
+┌──────────────────────────────────────────────────────────────────────────┐
+│  Docker Compose: xpltd_promptlooper (ub01)                               │
+│  Network: promptlooper (172.33.0.0/24)                                   │
+│                                                                          │
+│  ┌────────────┐  ┌─────────────┐  ┌──────────────────────────────────┐  │
+│  │  PostgreSQL │  │    Redis    │  │         FastAPI (API)            │  │
+│  │  :5434      │  │  job queue  │  │  Experiments, Runs, Scoring,     │  │
+│  │  experiments│  │  pub/sub    │  │  Projects, Auth, MCP Server      │  │
+│  │  runs, cache│  │  live state │  │  WebSocket for live dashboard    │  │
+│  └─────┬───────┘  └──────┬──────┘  └──────────────┬───────────────────┘  │
+│        │                 │                        │                      │
+│  ┌─────┴─────────────────┴────────────────────────┴───────────────────┐  │
+│  │                      Celery Worker                                 │  │
+│  │  Executes runs against target LLM endpoints                        │  │
+│  │  Caches responses by config hash                                   │  │
+│  │  Streams progress via Redis pub/sub                                │  │
+│  └────────────────────────────────────────────────────────────────────┘  │
+│                                                                          │
+│  ┌────────────────────────────────────────────────────────────────────┐  │
+│  │                    Web UI (React + Vite)                           │  │
+│  │  nginx → :8400                                                     │  │
+│  │  Dashboard, Experiment Builder, Live Observability, Steering       │  │
+│  └────────────────────────────────────────────────────────────────────┘  │
+└──────────────────────────────────────────────────────────────────────────┘
+                              │
+                              │  HTTP (OpenAI-compatible)
+                              ▼
+              ┌───────────────────────────────┐
+              │  Target LLM Endpoints          │
+              │  OpenWebUI, vLLM, Ollama,      │
+              │  OpenAI, Anthropic, any        │
+              │  OpenAI-compatible API          │
+              └───────────────────────────────┘
+```
+
+### Services (Production Compose)
+
+| Service | Image | Port | Purpose |
+|---------|-------|------|---------|
+| `promptlooper-db` | `postgres:16-alpine` | `5434 → 5432` | Primary data store |
+| `promptlooper-redis` | `redis:7-alpine` | — | Celery broker + pub/sub for live dashboard |
+| `promptlooper-api` | `Dockerfile` | `8000` | FastAPI REST API + MCP server |
+| `promptlooper-worker` | `Dockerfile` | — | Celery worker (run execution) |
+| `promptlooper-web` | `Dockerfile` | `8400 → 80` | React frontend (nginx) |
+
+### Single Container Mode
+
+When `DATABASE_URL` is not set, PromptLooper runs with:
+- SQLite at `/data/promptlooper.db`
+- In-process task queue (no Celery/Redis dependency)
+- All services in one container on port 8400
+
+```bash
+docker run -p 8400:8400 -v promptlooper-data:/data ghcr.io/xpltdco/promptlooper
+```
+
+---
+
+## Data Model
+
+### User
+| Field | Type | Notes |
+|-------|------|-------|
+| id | UUID | PK |
+| username | string | Unique, "admin" created on first boot |
+| password_hash | string | bcrypt |
+| is_admin | bool | Default true for first user |
+| created_at | timestamp | |
+
+### Project
+| Field | Type | Notes |
+|-------|------|-------|
+| id | UUID | PK |
+| name | string | |
+| description | text | Optional |
+| owner_id | UUID | FK → User |
+| created_at | timestamp | |
+| updated_at | timestamp | |
+
+### Experiment
+| Field | Type | Notes |
+|-------|------|-------|
+| id | UUID | PK |
+| project_id | UUID | FK → Project |
+| name | string | |
+| description | text | Optional |
+| sample_data | JSONB | Input documents/queries |
+| pipeline_stages | JSONB | Stage definitions with prompt templates |
+| scoring_config | JSONB | Which scoring functions to use and their weights |
+| parameter_space | JSONB | What to vary and ranges/options |
+| status | enum | draft, running, paused, completed |
+| created_at | timestamp | |
+| updated_at | timestamp | |
+
+### Run
+| Field | Type | Notes |
+|-------|------|-------|
+| id | UUID | PK |
+| experiment_id | UUID | FK → Experiment |
+| config_hash | string(64) | SHA-256 of full configuration (for cache dedup) |
+| config | JSONB | Complete configuration snapshot |
+| status | enum | pending, running, completed, failed, cached |
+| started_at | timestamp | |
+| completed_at | timestamp | |
+| duration_ms | int | Wall clock time |
+| tokens_in | int | Total input tokens across all stages |
+| tokens_out | int | Total output tokens |
+| cost_estimate | decimal | Estimated cost based on model pricing |
+
+### StageResult
+| Field | Type | Notes |
+|-------|------|-------|
+| id | UUID | PK |
+| run_id | UUID | FK → Run |
+| stage_index | int | 0-based stage number |
+| prompt_sent | text | Actual prompt after template rendering |
+| response_raw | text | Raw LLM response |
+| model_used | string | Model identifier |
+| parameters | JSONB | Temperature, top_p, etc. |
+| tokens_in | int | This stage |
+| tokens_out | int | This stage |
+| latency_ms | int | This stage |
+
+### Score
+| Field | Type | Notes |
+|-------|------|-------|
+| id | UUID | PK |
+| run_id | UUID | FK → Run |
+| scorer_name | string | e.g. "embedding_similarity", "human_rating" |
+| value | float | Normalized 0.0–1.0 |
+| metadata | JSONB | Scorer-specific details |
+| created_at | timestamp | |
+
+### ResponseCache
+| Field | Type | Notes |
+|-------|------|-------|
+| config_hash | string(64) | PK — SHA-256 of (prompt + model + params + input) |
+| response | text | Cached LLM response |
+| model | string | |
+| tokens_in | int | |
+| tokens_out | int | |
+| latency_ms | int | Original latency |
+| created_at | timestamp | |
+
+### WebhookConfig
+| Field | Type | Notes |
+|-------|------|-------|
+| id | UUID | PK |
+| event_type | string | experiment.complete, new_best_found, budget.exhausted, human_needed |
+| url | string | Target URL |
+| headers | JSONB | Optional auth headers |
+| is_active | bool | |
+
+---
+
+## API Endpoints
+
+### Auth
+| Method | Path | Description |
+|--------|------|-------------|
+| POST | `/api/v1/auth/setup` | First-boot admin password setup |
+| POST | `/api/v1/auth/login` | Login, returns JWT |
+| GET | `/api/v1/auth/me` | Current user info |
+
+### Admin
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/v1/admin/settings` | System settings (guest access, default model, etc.) |
+| PUT | `/api/v1/admin/settings` | Update settings |
+| GET | `/api/v1/admin/stats` | System-wide stats (total runs, cache hit rate, etc.) |
+
+### Projects
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/v1/projects` | List projects |
+| POST | `/api/v1/projects` | Create project |
+| GET | `/api/v1/projects/{id}` | Project detail with experiment summaries |
+| PUT | `/api/v1/projects/{id}` | Update project |
+| DELETE | `/api/v1/projects/{id}` | Delete project and all experiments |
+
+### Experiments
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/v1/experiments` | List experiments (filter by project) |
+| POST | `/api/v1/experiments` | Create experiment |
+| GET | `/api/v1/experiments/{id}` | Experiment detail with run summaries |
+| PUT | `/api/v1/experiments/{id}` | Update experiment config |
+| DELETE | `/api/v1/experiments/{id}` | Delete experiment |
+| POST | `/api/v1/experiments/{id}/sweep` | Start a sweep (grid, random, or guided) |
+| POST | `/api/v1/experiments/{id}/pause` | Pause running sweep |
+| POST | `/api/v1/experiments/{id}/resume` | Resume paused sweep |
+| POST | `/api/v1/experiments/{id}/stop` | Stop sweep |
+
+### Runs
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/v1/experiments/{id}/runs` | List runs with scores (sortable, filterable) |
+| GET | `/api/v1/runs/{id}` | Run detail with stage results |
+| POST | `/api/v1/runs` | Execute a single run (ad-hoc) |
+| POST | `/api/v1/runs/{id}/score` | Add human rating to a run |
+| GET | `/api/v1/experiments/{id}/leaderboard` | Top runs ranked by weighted score |
+
+### Export
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/v1/experiments/{id}/export/best` | Best config as JSON |
+| GET | `/api/v1/experiments/{id}/export/env` | Best config as .env snippet |
+| GET | `/api/v1/experiments/{id}/export/yaml` | Best config as YAML |
+| GET | `/api/v1/experiments/{id}/export/report` | Full experiment report (markdown) |
+
+### LLM Endpoints (Target Management)
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/v1/endpoints` | List configured LLM endpoints |
+| POST | `/api/v1/endpoints` | Add endpoint (URL, API key, label) |
+| PUT | `/api/v1/endpoints/{id}` | Update endpoint |
+| DELETE | `/api/v1/endpoints/{id}` | Remove endpoint |
+| POST | `/api/v1/endpoints/{id}/test` | Test connectivity and list available models |
+
+### Webhooks
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/v1/webhooks` | List webhook configs |
+| POST | `/api/v1/webhooks` | Create webhook |
+| DELETE | `/api/v1/webhooks/{id}` | Remove webhook |
+
+### WebSocket
+| Path | Description |
+|------|-------------|
+| `/ws/experiments/{id}` | Live stream: run progress, scores, stage completions |
+| `/ws/dashboard` | Global activity feed across all experiments |
+
+### Health
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/health` | Health check (DB + Redis connectivity) |
+
+---
+
+## MCP Server
+
+PromptLooper exposes an MCP (Model Context Protocol) server so AI agents can drive it programmatically. The MCP server runs as part of the API service.
+
+### MCP Tools
+
+| Tool | Description |
+|------|-------------|
+| `create_project` | Create a new project workspace |
+| `create_experiment` | Define an experiment with sample data, stages, and scoring |
+| `configure_endpoint` | Add or update an LLM target endpoint |
+| `run_single` | Execute one specific configuration and return results |
+| `run_sweep` | Start a parameter sweep (grid/random/guided) |
+| `get_leaderboard` | Get top N configurations ranked by score |
+| `get_run_detail` | Get full details of a specific run |
+| `export_best_config` | Export the best configuration in JSON/YAML/env format |
+| `pause_sweep` | Pause a running sweep |
+| `resume_sweep` | Resume a paused sweep |
+| `add_human_score` | Rate a run's output |
+| `get_experiment_status` | Check experiment progress |
+| `list_models` | List available models across all configured endpoints |
+
+### Example Agent Interaction
+
+```
+Agent: "Create a project called 'Chrysopedia Extraction' and an experiment
+        that tests the stage3_extraction prompt against Qwen-72B and Qwen-32B,
+        sweeping temperature from 0.1 to 0.9 in 0.2 increments.
+        Use embedding similarity scoring against these reference outputs.
+        Run a grid sweep."
+
+PromptLooper MCP: [create_project] → [create_experiment] → [run_sweep]
+                  → streams progress → [get_leaderboard]
+
+Agent: "The top config uses Qwen-72B at temperature 0.3. Export it as
+        a .env snippet I can drop into Chrysopedia."
+
+PromptLooper MCP: [export_best_config format=env]
+```
+
+---
+
+## Response Caching
+
+Every LLM call is cached by a SHA-256 hash of:
+- Prompt text (after template rendering)
+- Model identifier
+- All inference parameters (temperature, top_p, max_tokens, etc.)
+- Input data
+
+If an identical configuration has been run before, the cached response is returned instantly with `status: cached`. This means:
+- Re-running experiments with new scoring functions costs zero tokens
+- Adding a new scorer retroactively evaluates all historical runs
+- Accidentally re-running a sweep wastes nothing
+- Cache can be invalidated per-run or per-experiment if needed
+
+---
+
+## Authentication Model
+
+### First Boot
+- App detects no users exist
+- Presents a setup screen: create admin username + password
+- Admin account is created, user is logged in
+
+### Guest Access
+- Admin can toggle `allow_guest_access` in settings
+- Guests can view experiments and results (read-only)
+- Guests cannot create experiments, run sweeps, or modify configs
+- Default: guest access disabled
+
+### API Authentication
+- JWT tokens for the web UI
+- API key (generated in admin settings) for programmatic access and MCP
+- API key passed via `Authorization: Bearer <key>` header
+
+---
+
+## Real-Time Observability Dashboard
+
+The dashboard is the primary user interface during active experimentation. It provides:
+
+### Live Experiment View
+- Progress bar: X of Y runs completed
+- Token usage accumulator (running total)
+- Cost estimate (based on configured model pricing)
+- Cache hit rate for current sweep
+- Estimated time remaining
+
+### Side-by-Side Output Comparison
+- Pick any two runs and diff their outputs
+- Highlight differences in prompt, parameters, and response
+- Score comparison overlay
+
+### Leaderboard
+- Real-time ranked list of runs by weighted score
+- Sortable by any individual scorer
+- Click to expand full run detail
+
+### Steering Controls
+- **Pause**: Stop the sweep after current run completes
+- **Fork**: Create a new experiment branching from current best, with modified parameters
+- **Redirect**: Change remaining sweep parameters mid-flight
+- **Approve**: Mark a configuration as "good enough" and export
+- **Reject**: Exclude a run from leaderboard consideration
+
+### Activity Timeline
+- Chronological feed of events: run started, run completed, new best found, cache hit, error
+- Filterable by event type
+
+---
+
+## Webhook Events
+
+| Event | Payload | Trigger |
+|-------|---------|---------|
+| `experiment.started` | experiment_id, sweep config | Sweep begins |
+| `experiment.completed` | experiment_id, best config, summary stats | All runs finished |
+| `experiment.paused` | experiment_id, reason | Manual or budget pause |
+| `new_best_found` | experiment_id, run_id, scores, config | New top-scoring run |
+| `budget.exhausted` | experiment_id, token_count, cost | Token/cost budget hit |
+| `human_needed` | experiment_id, reason, context | Agent requests human review |
+| `run.failed` | run_id, error | Individual run error |
+
+---
+
+## Configuration Export Formats
+
+### JSON
+```json
+{
+  "model": "qwen2.5-72b-instruct",
+  "endpoint": "http://chat.forgetyour.name/api",
+  "temperature": 0.3,
+  "top_p": 0.85,
+  "max_tokens": 2048,
+  "system_prompt": "You are a music production knowledge extractor...",
+  "score": 0.87,
+  "experiment": "chrysopedia-extraction-v2",
+  "exported_at": "2026-04-06T12:00:00Z"
+}
+```
+
+### .env
+```bash
+LLM_MODEL=qwen2.5-72b-instruct
+LLM_API_URL=http://chat.forgetyour.name/api
+LLM_TEMPERATURE=0.3
+LLM_TOP_P=0.85
+LLM_MAX_TOKENS=2048
+# Score: 0.87 | Experiment: chrysopedia-extraction-v2
+```
+
+### YAML
+```yaml
+model: qwen2.5-72b-instruct
+endpoint: http://chat.forgetyour.name/api
+parameters:
+  temperature: 0.3
+  top_p: 0.85
+  max_tokens: 2048
+system_prompt: |
+  You are a music production knowledge extractor...
+metadata:
+  score: 0.87
+  experiment: chrysopedia-extraction-v2
+  exported_at: 2026-04-06T12:00:00Z
+```
+
+---
+
+## Environment Variables
+
+| Group | Variable | Default | Notes |
+|-------|----------|---------|-------|
+| **Database** | `DATABASE_URL` | (none → SQLite) | PostgreSQL connection string |
+| **Redis** | `REDIS_URL` | (none → in-process) | Redis connection string |
+| **Server** | `HOST` | `0.0.0.0` | Bind address |
+| **Server** | `PORT` | `8400` | HTTP port |
+| **Auth** | `JWT_SECRET` | (auto-generated) | JWT signing key |
+| **Auth** | `API_KEY` | (none) | Static API key for programmatic access |
+| **Defaults** | `DEFAULT_ENDPOINT_URL` | (none) | Pre-configured LLM endpoint |
+| **Defaults** | `DEFAULT_ENDPOINT_KEY` | (none) | API key for default endpoint |
+| **Limits** | `MAX_CONCURRENT_RUNS` | `4` | Parallel run limit |
+| **Limits** | `MAX_TOKENS_PER_SWEEP` | `0` (unlimited) | Token budget per sweep |
+| **Storage** | `DATA_DIR` | `/data` | SQLite DB + file storage location |
+| **MCP** | `MCP_ENABLED` | `true` | Enable MCP server |
+| **MCP** | `MCP_PORT` | `8401` | MCP server port |
+
+---
+
+## Docker Compose (Production — XPLTD Conventions)
+
+Project name: `xpltd_promptlooper`
+Network: `promptlooper` (`172.33.0.0/24`)
+Persistent data: `/vmPool/r/services/promptlooper_*`
+PostgreSQL port: `5434` (external)
+Web UI port: `8400` (external)
+
+---
+
+## Technology Stack
+
+| Layer | Technology | Rationale |
+|-------|-----------|-----------|
+| **API** | Python 3.12 + FastAPI | Async, OpenAPI auto-gen, matches XPLTD conventions |
+| **Task Queue** | Celery + Redis | Proven for background job execution, matches Chrysopedia |
+| **Database** | PostgreSQL 16 (prod) / SQLite (single-container) | JSONB for flexible experiment configs |
+| **Real-time** | WebSocket via FastAPI + Redis pub/sub | Sub-second dashboard updates |
+| **Frontend** | React 18 + TypeScript + Vite | Real-time dashboard, matches Chrysopedia |
+| **Styling** | Tailwind CSS | Fast iteration, utility-first |
+| **MCP** | Python MCP SDK | Standard protocol for agent integration |
+| **Container** | Multi-stage Docker build | Single image serves both API and frontend |
+
+---
+
+## Development & Deployment
+
+### Local Development
+```bash
+git clone git@git.xpltd.co:xpltdco/promptlooper.git
+cd promptlooper
+cp .env.example .env
+docker compose up -d promptlooper-db promptlooper-redis
+cd backend && pip install -r requirements.txt
+alembic upgrade head
+uvicorn main:app --reload --host 0.0.0.0 --port 8000
+# In another terminal:
+cd frontend && npm install && npm run dev
+```
+
+### Production Deployment (ub01)
+```bash
+ssh ub01
+cd /vmPool/r/repos/xpltdco/promptlooper
+git pull && docker compose build && docker compose up -d
+```
+
+### Project Structure
+```
+promptlooper/
+├── backend/
+│   ├── main.py                 # FastAPI entry point
+│   ├── config.py               # Pydantic Settings
+│   ├── models.py               # SQLAlchemy ORM
+│   ├── schemas.py              # Pydantic request/response
+│   ├── auth.py                 # JWT + API key auth
+│   ├── worker.py               # Celery app config
+│   ├── routers/
+│   │   ├── auth.py
+│   │   ├── projects.py
+│   │   ├── experiments.py
+│   │   ├── runs.py
+│   │   ├── endpoints.py
+│   │   ├── export.py
+│   │   ├── webhooks.py
+│   │   └── admin.py
+│   ├── engine/
+│   │   ├── runner.py           # Run execution logic
+│   │   ├── sweep.py            # Sweep orchestration
+│   │   ├── cache.py            # Response cache layer
+│   │   ├── adapters/           # LLM endpoint adapters
+│   │   │   ├── openai_compat.py
+│   │   │   └── base.py
+│   │   └── scorers/            # Pluggable scoring functions
+│   │       ├── embedding.py
+│   │       ├── format.py
+│   │       ├── keyword.py
+│   │       ├── llm_judge.py
+│   │       └── base.py
+│   ├── mcp/
+│   │   ├── server.py           # MCP server implementation
+│   │   └── tools.py            # MCP tool definitions
+│   ├── websocket/
+│   │   └── manager.py          # WebSocket connection management
+│   └── tests/
+├── frontend/
+│   └── src/
+│       ├── pages/
+│       │   ├── Setup.tsx       # First-boot admin setup
+│       │   ├── Login.tsx
+│       │   ├── Dashboard.tsx   # Global activity
+│       │   ├── Projects.tsx
+│       │   ├── Experiment.tsx  # Experiment builder + config
+│       │   ├── Live.tsx        # Real-time observability
+│       │   ├── Compare.tsx     # Side-by-side run comparison
+│       │   └── Admin.tsx       # System settings
+│       ├── components/
+│       │   ├── Leaderboard.tsx
+│       │   ├── SteeringControls.tsx
+│       │   ├── RunCard.tsx
+│       │   ├── ScoreChart.tsx
+│       │   └── Timeline.tsx
+│       └── api/
+├── docker/
+│   ├── Dockerfile              # Multi-stage: API + frontend
+│   └── nginx.conf
+├── alembic/
+├── docker-compose.yml
+├── .env.example
+├── CLAUDE.md
+└── README.md
+```