73 changed files with 9900 additions and 2 deletions
--- a/.env.example
+++ b/.env.example
@@ -0,0 +1,68 @@
+# PromptLooper — Environment Variables
+# Copy to .env and adjust values for your deployment.
+
+# =============================================================================
+# Database
+# =============================================================================
+# PostgreSQL connection string for production mode.
+# When not set, PromptLooper uses SQLite at DATA_DIR/promptlooper.db (single-container mode).
+# DATABASE_URL=postgresql://promptlooper:promptlooper@promptlooper-db:5432/promptlooper
+
+# =============================================================================
+# Redis
+# =============================================================================
+# Redis connection string for Celery task queue and pub/sub (live dashboard).
+# When not set, PromptLooper uses an in-process queue (single-container mode).
+# REDIS_URL=redis://promptlooper-redis:6379/0
+
+# =============================================================================
+# Server
+# =============================================================================
+# Bind address and port for the HTTP server.
+HOST=0.0.0.0
+PORT=8400
+
+# =============================================================================
+# Authentication
+# =============================================================================
+# Secret key used to sign JWT tokens. Auto-generated on first boot if not set.
+# IMPORTANT: Set this to a long random string in production.
+# JWT_SECRET=change-me-to-a-random-secret
+
+# Static API key for programmatic access (MCP, scripts, CI).
+# When not set, API key auth is disabled — only JWT login works.
+# API_KEY=
+
+# =============================================================================
+# Default LLM Endpoint
+# =============================================================================
+# Pre-configured LLM endpoint URL (OpenAI-compatible API).
+# Users can add more endpoints via the UI or API; this is a convenience default.
+# DEFAULT_ENDPOINT_URL=http://localhost:11434/v1
+
+# API key for the default endpoint, if required.
+# DEFAULT_ENDPOINT_KEY=
+
+# =============================================================================
+# Limits
+# =============================================================================
+# Maximum number of runs executing in parallel.
+MAX_CONCURRENT_RUNS=4
+
+# Token budget per sweep. 0 = unlimited.
+MAX_TOKENS_PER_SWEEP=0
+
+# =============================================================================
+# Storage
+# =============================================================================
+# Directory for SQLite database and file storage (single-container mode).
+DATA_DIR=/data
+
+# =============================================================================
+# MCP Server
+# =============================================================================
+# Enable the Model Context Protocol server for agent-driven workflows.
+MCP_ENABLED=true
+
+# Port for the MCP server (separate from the main API).
+MCP_PORT=8401
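The two fallbacks documented in `.env.example` are SQLite when `DATABASE_URL` is unset and an in-process queue when `REDIS_URL` is unset. The following is a minimal sketch of that resolution logic, not the contents of `backend/config.py`. The real module uses Pydantic Settings; this version uses a plain dataclass so it runs on the standard library alone. The property names `effective_database_url` and `use_in_process_queue` come from the scaffold notes. The class name `SettingsSketch`, the `from_env` helper, and the treatment of empty URL strings as "unset" are assumptions.

```python
import os
import secrets
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Mapping, Optional


@dataclass
class SettingsSketch:
    """Hypothetical stand-in for backend/config.py (the real one uses Pydantic Settings)."""

    database_url: Optional[str] = None
    redis_url: Optional[str] = None
    data_dir: str = "/data"
    api_key: Optional[str] = None
    # Random per-process secret when JWT_SECRET is unset; set it explicitly in production.
    jwt_secret: str = field(default_factory=lambda: secrets.token_urlsafe(32))

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "SettingsSketch":
        env = os.environ if env is None else env
        values = {
            # Assumption of this sketch: empty URL strings count as "not set".
            "database_url": env.get("DATABASE_URL") or None,
            "redis_url": env.get("REDIS_URL") or None,
            "data_dir": env.get("DATA_DIR") or "/data",
            # Empty API_KEY normalizes to None, which disables API key auth.
            "api_key": env.get("API_KEY") or None,
        }
        if env.get("JWT_SECRET"):
            values["jwt_secret"] = env["JWT_SECRET"]
        return cls(**values)

    @property
    def effective_database_url(self) -> str:
        if self.database_url:
            return self.database_url
        # "sqlite:///" plus an absolute path gives four slashes: sqlite:////data/promptlooper.db
        return f"sqlite:///{PurePosixPath(self.data_dir) / 'promptlooper.db'}"

    @property
    def use_in_process_queue(self) -> bool:
        return self.redis_url is None
```

With the default `DATA_DIR=/data`, the URL comes out as `sqlite:////data/promptlooper.db`. The fourth slash is how SQLAlchemy distinguishes an absolute SQLite path from a relative one.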
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,57 @@
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.egg-info/
+*.egg
+dist/
+build/
+.eggs/
+*.whl
+.venv/
+venv/
+env/
+.env
+*.pyc
+.pytest_cache/
+.mypy_cache/
+.ruff_cache/
+htmlcov/
+.coverage
+.coverage.*
+
+# Node / Frontend
+node_modules/
+frontend/dist/
+frontend/build/
+.npm
+*.tsbuildinfo
+
+# Docker
+docker/nginx.conf.bak
+
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+.DS_Store
+
+# OS
+Thumbs.db
+Desktop.ini
+
+# Data (single-container mode)
+*.db
+/data/
+
+# Alembic
+alembic/versions/__pycache__/
+
+# Auto Run Docs (Maestro working files)
+Auto Run Docs/Working/
+
+# Misc
+*.log
+*.bak
--- a/Docs/01-scaffold.md
+++ b/Docs/01-scaffold.md
@@ -0,0 +1,48 @@
+# Phase 1 — Project Scaffold
+
+Set up the PromptLooper repository, Docker infrastructure, and basic project skeleton. Read `promptlooper-spec.md` and `CLAUDE.md` before starting any task.
+
+- [x] Initialize the git repository at git.xpltd.co/xpltdco/promptlooper with a README.md that includes the project description from the spec, a quick-start section showing the single-container docker run command, and badges for license (AGPL-3.0) and status. Add .gitignore for Python, Node, and Docker artifacts.
+  > NOTE: Git repo initialized locally with remote set to git@git.xpltd.co:xpltdco/promptlooper.git. Push failed — SSH key not configured for this host or repo not yet created on Gitea. Needs manual setup before pushing.
+
+- [x] Create the full directory structure as defined in the spec's Project Structure section. Every directory should exist with a placeholder __init__.py or .gitkeep as appropriate. Include backend/, frontend/, docker/, alembic/, and all subdirectories.
+  > Created all directories: backend/ (with routers/, engine/adapters/, engine/scorers/, mcp/, websocket/, tests/), frontend/src/ (pages/, components/, api/), docker/, alembic/versions/. Python packages have __init__.py, non-Python dirs have .gitkeep.
+
+- [x] Create .env.example with all environment variables from the spec's Environment Variables table, with sensible defaults and comments explaining each group. Include DATABASE_URL, REDIS_URL, JWT_SECRET, DEFAULT_ENDPOINT_URL, MAX_CONCURRENT_RUNS, and all others.
+  > Created .env.example with all 13 environment variables organized into 8 groups (Database, Redis, Server, Auth, Default LLM Endpoint, Limits, Storage, MCP). Optional and production-only vars (DATABASE_URL, REDIS_URL, JWT_SECRET, API_KEY, DEFAULT_ENDPOINT_*) are commented out with explanatory notes. Single-container defaults work out of the box.
+
+- [x] Create docker-compose.yml following XPLTD conventions: project name xpltd_promptlooper, network promptlooper (172.33.0.0/24), PostgreSQL on port 5434, Redis, API service, worker service, and web service on port 8400. Use bind mounts under /vmPool/r/services/promptlooper_* for persistent data. Model this after Chrysopedia's docker-compose.yml patterns.
+  > Updated existing docker-compose.yml: fixed DATABASE_URL to use standard postgresql:// scheme (not asyncpg), hardcoded DB credentials instead of requiring .env vars, added API_KEY pass-through, added working_dir for worker service, made JWT_SECRET optional with dev default. All 5 services defined: db (:5434), redis, api (MCP :8401), worker (Celery), web (:8400). Bind mounts under /vmPool/r/services/promptlooper_*. Health checks on db and redis with dependency conditions.
+
+- [x] Create the multi-stage Dockerfile in docker/ that builds both backend and frontend into a single image. Stage 1: Node build for frontend (npm ci && npm run build). Stage 2: Python runtime with uvicorn, copying the built frontend assets. Include nginx.conf that serves the frontend and proxies /api and /ws to uvicorn. The image should work standalone with SQLite when no DATABASE_URL is provided.
+  > Created 3-stage Dockerfile: (1) frontend-build with Node 20 Alpine, (2) api stage with Python 3.12-slim + uvicorn + static assets for single-container mode, (3) web stage with nginx 1.27 Alpine for production compose. nginx.conf proxies /api/ and /health to the API, upgrades /ws/ connections for WebSocket. Also created: backend/requirements.txt, frontend scaffolding (package.json, vite.config.ts, tsconfig.json, index.html, App.tsx, Tailwind config), and placeholder alembic.ini/env.py for Dockerfile COPY.
+
+- [x] Create backend/config.py using Pydantic Settings. Define all configuration from the Environment Variables table. Implement the SQLite fallback logic: when DATABASE_URL is not set, construct a SQLite URL pointing to DATA_DIR/promptlooper.db. When REDIS_URL is not set, set a flag for in-process mode.
+  > Created backend/config.py with Pydantic Settings class defining all 13 env vars. SQLite fallback via `effective_database_url` property constructs sqlite:///DATA_DIR/promptlooper.db when DATABASE_URL is unset. `use_in_process_queue` property flags in-process mode when REDIS_URL is absent. JWT_SECRET auto-generates via `secrets.token_urlsafe(32)` when not provided. Empty API_KEY strings normalize to None. 13 tests in tests/test_config.py all passing.
+
+- [x] Create backend/models.py with all SQLAlchemy ORM models from the spec's Data Model section: User, Project, Experiment, Run, StageResult, Score, ResponseCache, and WebhookConfig. Include all fields, types, relationships, and indexes. Use UUID primary keys and JSONB for flexible fields.
+  > Created all 8 ORM models with UUID PKs, JSON columns (using the generic sqlalchemy.JSON type for SQLite compatibility; on PostgreSQL it renders as JSON, not JSONB, so JSONB would require `JSON().with_variant(JSONB(), "postgresql")`), enum types (ExperimentStatus, RunStatus), full relationship definitions with cascade deletes, and indexes on foreign keys and commonly filtered columns. Score.metadata is mapped as the `scorer_metadata` Python attribute (the column name stays "metadata") to avoid a conflict with the reserved `metadata` attribute on SQLAlchemy declarative models. 16 tests in tests/test_models.py all passing.
+
+- [x] Set up Alembic: create alembic.ini and alembic/env.py configured to read DATABASE_URL from the config. Generate and apply the initial migration from the models.
+  > Created alembic.ini with logging config and script_location pointing to alembic/. env.py reads DATABASE_URL from backend.config.settings (with override support for tests). Added script.py.mako template. Generated initial migration (e1909678e89e) with all 8 tables, indexes, foreign keys, and enums. Migration applies cleanly on SQLite (render_as_batch=True for SQLite compatibility). 5 tests in tests/test_alembic.py covering upgrade/downgrade/columns/indexes/FKs. All 34 backend tests pass.
+
+- [x] Create backend/schemas.py with Pydantic request/response schemas for all API endpoints. Include create/update/response schemas for Project, Experiment, Run, Endpoint, and Webhook. Include the Score input schema and export format schemas.
+  > Created backend/schemas.py with all Pydantic v2 schemas using ConfigDict(from_attributes=True) for ORM compatibility. Includes: Project (create/update/response/list), Experiment (create/update/response/list), Run (response/list/detail with nested stages+scores), StageResult (response), Score (input/response), Endpoint (create/update/response/list), Webhook (create/update/response/list), Auth (setup/login/token/user), Export (run row with scores dict, export response), and Health. 30 tests in tests/test_schemas.py all passing. All 64 backend tests pass.
+
+- [x] Create backend/main.py with the FastAPI application. Set up CORS middleware, mount all routers (even if they're stubs), configure the WebSocket endpoint, add the /health endpoint that checks DB and Redis connectivity, and add startup/shutdown lifecycle hooks.
+  > Created backend/main.py with: CORS middleware (allow all origins), /health endpoint checking DB (SELECT 1) and Redis (ping) connectivity, /ws WebSocket endpoint with ConnectionManager for real-time broadcasts, async lifespan hooks for DB engine + Redis init/teardown, get_db dependency yielding sessions, dynamic router mounting (silently skips missing routers). 10 tests in tests/test_main.py covering health, CORS, WebSocket connect/disconnect/echo, OpenAPI schema, 404s, broadcast, get_db, and get_redis. All 74 backend tests pass.
+
+- [x] Create backend/auth.py implementing JWT token generation/verification, API key validation, and the first-boot setup flow. The setup endpoint should check if any users exist — if not, accept username + password to create the admin account. Include a dependency function for route-level auth that supports both JWT and API key.
+  > Created backend/auth.py with: bcrypt password hashing via passlib, JWT token creation/verification (HS256, 24h expiry) using python-jose, first-boot `needs_setup()` + `create_admin()` flow (409 if admin exists), `authenticate_user()` for login, and `get_current_user` FastAPI dependency supporting both JWT Bearer tokens and X-Api-Key header (a valid API key authenticates as the first admin user). UUID string-to-UUID conversion for SQLite compatibility. 21 tests in tests/test_auth.py covering hashing, JWT lifecycle, setup flow, login, and all auth dependency paths. All 95 backend tests pass.
+
+- [x] Scaffold all router files in backend/routers/ as stubs: auth.py, projects.py, experiments.py, runs.py, endpoints.py, export.py, webhooks.py, admin.py. Each should have the correct APIRouter prefix and tags, with placeholder endpoints that return 501 Not Implemented.
+  > Created all 8 router stubs with APIRouter instances, mounted via main.py's _mount_routers(). Endpoints match the spec: auth (3 endpoints), projects (5), experiments (9 incl. sweep/pause/resume/stop), runs (5 incl. leaderboard), endpoints (5 incl. test), export (4 formats), webhooks (3), admin (3). All return 501 Not Implemented. 37 tests in tests/test_routers.py verify every route is mounted and returns 501. All 132 backend tests pass.
+
+- [x] Initialize the frontend: run npm create vite@latest with React + TypeScript template. Install Tailwind CSS and configure it. Install react-router-dom for routing. Create the basic App.tsx with routes for Setup, Login, Dashboard, Projects, Experiment, Live, Compare, and Admin pages (all as placeholder components). Verify it builds cleanly.
+  > Frontend was already scaffolded with Vite + React + TypeScript + Tailwind + react-router-dom from the Dockerfile task. Added 8 placeholder page components (SetupPage, LoginPage, DashboardPage, ProjectsPage, ExperimentPage, LivePage, ComparePage, AdminPage) in frontend/src/pages/. Updated App.tsx with react-router-dom Routes and main.tsx with BrowserRouter. Unknown routes redirect to dashboard. Installed vitest + @testing-library/react for testing. 9 routing tests in App.test.tsx all passing. Build completes cleanly. All 132 backend tests still pass.
+
+- [x] Create frontend/src/api/client.ts with a typed API client using fetch. Include JWT token management (stored in memory, not localStorage), request/response interceptors for auth headers, and typed wrapper functions for each API endpoint group. Include WebSocket connection helper.
+  > Created frontend/src/api/client.ts with: TypeScript interfaces mirroring all backend Pydantic schemas, in-memory JWT token management (setToken/getToken/clearToken — never localStorage), automatic Authorization header injection on all requests, Content-Type header for POST/PUT bodies, ApiError class for non-ok responses, typed wrapper functions for all 8 endpoint groups (auth, projects, experiments, runs, endpoints, export, webhooks, admin) plus health check, and connectWebSocket() helper that derives ws/wss from current protocol and handles JSON message parsing. 39 tests in src/api/client.test.ts covering token management, header injection, all endpoint groups, error handling, and WebSocket lifecycle. All 48 frontend tests pass. All 132 backend tests still pass.
+
+- [x] Verify the full stack runs: docker compose up should start all services. The API should respond to /health. The frontend should load and show the setup screen (since no admin exists). The database migration should have run. Document any manual steps needed in the README.
+  > Created missing backend/worker.py (Celery app config for docker-compose worker service). Created docker/entrypoint.sh that runs `alembic upgrade head` before starting uvicorn, and updated Dockerfile to use it as ENTRYPOINT. Fixed README single-container quick-start (port 8000, not 8400) and added production compose docs (service list, first-boot instructions). Added 24 stack integration tests verifying all Docker/compose/nginx/frontend/alembic files are present and consistent, plus /health endpoint test. 3 worker tests confirm Celery config. All 159 backend + 48 frontend tests pass.
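The auth notes describe HS256 JWTs with a 24-hour expiry, produced with python-jose. The following standard-library sketch shows what that signing and expiry check involve. It is illustrative only: the helper names `create_token` and `verify_token` are hypothetical, it omits claims such as `nbf`, `aud`, and `iss`, and production code should keep using a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional


def _b64url(data: bytes) -> str:
    # JWT segments are base64url without "=" padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def create_token(subject: str, secret: str, ttl_seconds: int = 24 * 3600,
                 now: Optional[float] = None) -> str:
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "iat": issued, "exp": issued + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    sig = hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"


def verify_token(token: str, secret: str, now: Optional[float] = None) -> Dict[str, Any]:
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = hmac.new(secret.encode("utf-8"), f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many signature bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected alg")
    payload = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if current >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

The `now` parameter exists only so that expiry can be tested deterministically. Tokens signed with an auto-generated `JWT_SECRET` fail the signature check once the secret changes.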
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,127 @@
+# CLAUDE.md — PromptLooper
+
+## What is this project?
+
+PromptLooper is a self-hosted LLM pipeline tuning workbench. It runs experiments across prompt × model × parameter combinations, caches every response, scores results, and surfaces optimal configurations through a real-time dashboard. It has an MCP server so AI agents can drive it programmatically.
+
+## Repository
+
+- **Hosted at**: git.xpltd.co/xpltdco/promptlooper
+- **XPLTD project name**: `xpltd_promptlooper`
+- **Sister project**: Chrysopedia (git.xpltd.co/xpltdco/chrysopedia) — a knowledge extraction pipeline that is PromptLooper's first integration target
+
+## Tech Stack
+
+- **Backend**: Python 3.12, FastAPI, Celery, SQLAlchemy, Alembic
+- **Frontend**: React 18, TypeScript, Vite, Tailwind CSS
+- **Database**: PostgreSQL 16 (production) / SQLite (single-container mode)
+- **Cache/Queue**: Redis 7 (production) / in-process (single-container)
+- **Real-time**: WebSocket via FastAPI + Redis pub/sub
+- **MCP**: Python MCP SDK
+- **Container**: Multi-stage Docker build, nginx for frontend
+
+## XPLTD Conventions
+
+These are non-negotiable project conventions shared across all XPLTD projects:
+
+- Docker Compose project name: `xpltd_promptlooper`
+- Dedicated bridge network: `promptlooper` (`172.33.0.0/24`)
+- Persistent data bind mounts under `/vmPool/r/services/promptlooper_*`
+- PostgreSQL on external port `5434` (internal `5432`)
+- Web UI on port `8400`
+- MCP server on port `8401`
+- Container naming: `promptlooper-{service}` (e.g., `promptlooper-api`, `promptlooper-db`)
+
+## Key Architecture Decisions
+
+1. **No LLM runs inside PromptLooper itself** — it's purely an HTTP client that calls external LLM endpoints. The only exception is the optional "LLM-as-judge" scorer.
+2. **Response caching by config hash** — SHA-256 of (prompt + model + params + input). Cache hits return instantly. This is critical for cost control.
+3. **Single-container mode** — when `DATABASE_URL` is not set, use SQLite + in-process queue. Zero dependencies.
+4. **WebSocket for real-time** — the dashboard connects via WebSocket to receive run progress, score updates, and steering events.
+5. **Pluggable scorers** — all scoring functions implement a base class with `score(input, output, context) → float` signature.
+6. **OpenAI-compatible adapter** — the LLM adapter layer speaks OpenAI's chat completions API. This covers OpenWebUI, vLLM, Ollama, and most providers.
+
+## File Organization
+
+```
+backend/
+  main.py              — FastAPI app, middleware, router mounting
+  config.py            — Pydantic Settings from env vars
+  models.py            — SQLAlchemy ORM models
+  schemas.py           — Pydantic request/response schemas
+  auth.py              — JWT + API key authentication
+  worker.py            — Celery app configuration
+  routers/             — API endpoint handlers
+  engine/              — Core experiment execution logic
+    runner.py          — Individual run execution
+    sweep.py           — Sweep orchestration (grid/random/guided)
+    cache.py           — Response cache layer
+    adapters/          — LLM endpoint adapters
+    scorers/           — Pluggable scoring functions
+  mcp/                 — MCP server implementation
+  websocket/           — WebSocket connection management
+
+frontend/src/
+  pages/               — Route-level components
+  components/          — Shared UI components
+  api/                 — Typed API client functions
+```
+
+## Database Migrations
+
+Use Alembic. Same patterns as Chrysopedia:
+```bash
+alembic revision --autogenerate -m "describe_change"
+alembic upgrade head
+```
+
+## Running Locally
+
+```bash
+docker compose up -d promptlooper-db promptlooper-redis
+cd backend && uvicorn main:app --reload --host 0.0.0.0 --port 8000
+# Frontend in another terminal:
+cd frontend && npm run dev
+```
+
+## Testing
+
+```bash
+cd backend && pytest
+cd frontend && npm test
+```
+
+## Important Patterns
+
+### Adding a new scorer
+1. Create `backend/engine/scorers/my_scorer.py`
+2. Implement `BaseScorer` with `name`, `score(input, output, context) → float`
+3. Register in `backend/engine/scorers/__init__.py`
+4. Add to frontend scorer picker component
+
+### Adding a new LLM adapter
+1. Create `backend/engine/adapters/my_adapter.py`
+2. Implement `BaseAdapter` with `complete(prompt, model, params) → response`
+3. Register in `backend/engine/adapters/__init__.py`
+4. Only the OpenAI-compatible adapter exists today; add a new adapter only for providers that cannot be reached through an OpenAI-compatible API
+
+### Adding a new MCP tool
+1. Add tool definition in `backend/mcp/tools.py`
+2. Implement handler in `backend/mcp/server.py`
+3. Tools should map 1:1 to API endpoints where possible
+
+## Common Gotchas
+
+- Always hash the FULL config when checking cache — missing a single parameter means cache misses
+- WebSocket connections must be cleaned up on disconnect — use the connection manager
+- SQLite mode doesn't support concurrent writes — the in-process queue must be single-threaded
+- Frontend must handle both WebSocket and polling fallback for environments where WS is blocked
+- MCP server runs on a separate port from the main API
+
+## Deployment
+
+```bash
+ssh ub01
+cd /vmPool/r/repos/xpltdco/promptlooper
+git pull && docker compose build && docker compose up -d
+```
--- a/README.md
+++ b/README.md
@@ -1,3 +1,79 @@
-# promptlooper
+# PromptLooper

-Universal LLM pipeline tuning workbench — systematically optimize prompts, models, and inference parameters through cached experiments, pluggable scoring, and agent-driven sweeps via MCP.
+[![License: AGPL-3.0](https://img.shields.io/badge/License-AGPL--3.0-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)
+![Status: Alpha](https://img.shields.io/badge/Status-Alpha-orange.svg)
+
+> The one who loops prompts — a universal LLM pipeline tuning workbench.
+
+PromptLooper is a self-hosted tool for systematically optimizing LLM prompts, model selection, and inference parameters. It runs experiments across prompt x model x parameter combinations, caches every response, scores results against pluggable evaluation functions, and surfaces the best configurations through a real-time observability dashboard with human-in-the-loop steering.
+
+It ships as a single Docker container (SQLite mode) for zero-config quickstart, or a Docker Compose stack (Postgres + Redis) for production use. An MCP server enables any AI agent to drive PromptLooper programmatically — creating experiments, running sweeps, and reading results without human intervention.
+
+## Quick Start
+
+### Single Container (zero dependencies)
+
+```bash
+docker run -p 8000:8000 -v promptlooper-data:/data ghcr.io/xpltdco/promptlooper
+```
+
+Open `http://localhost:8000` — you'll be prompted to create an admin account on first boot.
+
+> In single-container mode, the API serves the built frontend as static files at the root.
+> Database migrations run automatically on startup.
+
+### Production (Docker Compose)
+
+```bash
+git clone git@git.xpltd.co:xpltdco/promptlooper.git
+cd promptlooper
+cp .env.example .env
+# Edit .env — set JWT_SECRET at minimum
+docker compose up -d
+```
+
+Open `http://localhost:8400`. The nginx container serves the frontend (container port 80, published on host port 8400) and proxies `/api/`, `/health`, and `/ws/` to the API on port 8000.
+
+**Services started:**
+- `promptlooper-db` — PostgreSQL 16 on port 5434
+- `promptlooper-redis` — Redis 7
+- `promptlooper-api` — FastAPI + Alembic migrations (auto-runs on startup)
+- `promptlooper-worker` — Celery worker for experiment execution
+- `promptlooper-web` — Nginx reverse proxy on port 8400
+
+**First boot:** Navigate to `http://localhost:8400/setup` to create the admin account.
+
+## Features
+
+- **Systematic experimentation** — grid, random, and guided sweeps across prompt x model x parameter space
+- **Response caching** — SHA-256 deduplication means re-runs cost zero tokens
+- **Pluggable scoring** — embedding similarity, format compliance, keyword presence, LLM-as-judge, human rating, custom webhooks
+- **Real-time dashboard** — live progress, leaderboard, side-by-side comparison, steering controls
+- **MCP server** — AI agents can create experiments, run sweeps, and export results programmatically
+- **Single-container mode** — SQLite + in-process queue when no external dependencies are configured
+
+## Development
+
+```bash
+# Start backing services
+docker compose up -d promptlooper-db promptlooper-redis
+
+# Backend (run migrations from the repo root, where alembic.ini lives)
+pip install -r backend/requirements.txt
+alembic upgrade head
+cd backend && uvicorn main:app --reload --host 0.0.0.0 --port 8000
+
+# Frontend (separate terminal)
+cd frontend && npm install && npm run dev
+```
+
+## Testing
+
+```bash
+cd backend && pytest
+cd frontend && npm test
+```
+
+## License
+
+[AGPL-3.0](https://www.gnu.org/licenses/agpl-3.0.html)
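The features list mentions pluggable scorers such as keyword presence. CLAUDE.md gives the scorer contract as a `BaseScorer` base class with `score(input, output, context) → float`. The following is a hedged sketch of that contract plus one keyword-presence scorer. The abstract-class layout, the optional `context` default, and the choice to normalize scores to [0.0, 1.0] are assumptions, not the project's actual code.

```python
from abc import ABC, abstractmethod
from typing import Any, Mapping, Optional, Sequence


class BaseScorer(ABC):
    """Hypothetical shape of the scorer base class: a name plus score(input, output, context)."""

    name: str = "base"

    # `input` shadows the builtin but mirrors the documented signature.
    @abstractmethod
    def score(self, input: str, output: str, context: Optional[Mapping[str, Any]] = None) -> float:
        """Return a score; this sketch assumes the range [0.0, 1.0]."""


class KeywordPresenceScorer(BaseScorer):
    """Fraction of required keywords (case-insensitive substring match) found in the output."""

    name = "keyword_presence"

    def __init__(self, keywords: Sequence[str]) -> None:
        self.keywords = [k.lower() for k in keywords]

    def score(self, input: str, output: str, context: Optional[Mapping[str, Any]] = None) -> float:
        if not self.keywords:
            return 1.0  # nothing required, so nothing can be missing
        text = output.lower()
        hits = sum(1 for k in self.keywords if k in text)
        return hits / len(self.keywords)
```

Returning a fraction rather than a boolean lets a leaderboard rank partially compliant outputs above ones that miss every keyword.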
--- a/alembic.ini
+++ b/alembic.ini
@@ -0,0 +1,39 @@
+[alembic]
+script_location = alembic
+# sqlalchemy.url is set programmatically in env.py from backend.config
+sqlalchemy.url =
+
+[post_write_hooks]
+
+[loggers]
+keys = root,sqlalchemy,alembic
+
+[handlers]
+keys = console
+
+[formatters]
+keys = generic
+
+[logger_root]
+level = WARN
+handlers = console
+
+[logger_sqlalchemy]
+level = WARN
+handlers =
+qualname = sqlalchemy.engine
+
+[logger_alembic]
+level = INFO
+handlers =
+qualname = alembic
+
+[handler_console]
+class = StreamHandler
+args = (sys.stderr,)
+level = NOTSET
+formatter = generic
+
+[formatter_generic]
+format = %(levelname)-5.5s [%(name)s] %(message)s
+datefmt = %H:%M:%S
--- a/alembic/env.py
+++ b/alembic/env.py
@@ -0,0 +1,66 @@
+"""Alembic environment configuration for PromptLooper."""
+
+import sys
+from logging.config import fileConfig
+from pathlib import Path
+
+from alembic import context
+from sqlalchemy import engine_from_config, pool
+
+# Ensure the backend package is importable
+sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
+
+from backend.config import settings
+from backend.models import Base
+
+config = context.config
+
+if config.config_file_name is not None:
+    fileConfig(config.config_file_name)
+
+# Use sqlalchemy.url from alembic config if already set (e.g. by tests), otherwise
+# fall back to application settings; "%" is doubled for ConfigParser interpolation.
+if not config.get_main_option("sqlalchemy.url"):
+    config.set_main_option("sqlalchemy.url", settings.effective_database_url.replace("%", "%%"))
+
+target_metadata = Base.metadata
+
+
+def run_migrations_offline() -> None:
+    """Run migrations in 'offline' mode — emit SQL to stdout."""
+    url = config.get_main_option("sqlalchemy.url")
+    context.configure(
+        url=url,
+        target_metadata=target_metadata,
+        literal_binds=True,
+        dialect_opts={"paramstyle": "named"},
+        render_as_batch=True,
+    )
+
+    with context.begin_transaction():
+        context.run_migrations()
+
+
+def run_migrations_online() -> None:
+    """Run migrations against a live database connection."""
+    connectable = engine_from_config(
+        config.get_section(config.config_ini_section, {}),
+        prefix="sqlalchemy.",
+        poolclass=pool.NullPool,
+    )
+
+    with connectable.connect() as connection:
+        context.configure(
+            connection=connection,
+            target_metadata=target_metadata,
+            render_as_batch=True,
+        )
+
+        with context.begin_transaction():
+            context.run_migrations()
+
+
+if context.is_offline_mode():
+    run_migrations_offline()
+else:
+    run_migrations_online()
--- a/alembic/script.py.mako
+++ b/alembic/script.py.mako
@@ -0,0 +1,26 @@
+"""${message}
+
+Revision ID: ${up_revision}
+Revises: ${down_revision | comma,n}
+Create Date: ${create_date}
+
+"""
+from typing import Sequence, Union
+
+from alembic import op
+import sqlalchemy as sa
+${imports if imports else ""}
+
+# revision identifiers, used by Alembic.
+revision: str = ${repr(up_revision)}
+down_revision: Union[str, None] = ${repr(down_revision)}
+branch_labels: Union[str, Sequence[str], None] = ${repr(branch_labels)}
+depends_on: Union[str, Sequence[str], None] = ${repr(depends_on)}
+
+
+def upgrade() -> None:
+    ${upgrades if upgrades else "pass"}
+
+
+def downgrade() -> None:
+    ${downgrades if downgrades else "pass"}
--- a/alembic/versions/.gitkeep
+++ b/alembic/versions/.gitkeep
--- a/alembic/versions/e1909678e89e_initial_schema.py
+++ b/alembic/versions/e1909678e89e_initial_schema.py
@@ -0,0 +1,165 @@
+"""initial_schema
+
+Revision ID: e1909678e89e
+Revises: 
+Create Date: 2026-04-07 01:50:18.571150
+
+"""
+from typing import Sequence, Union
+
+from alembic import op
+import sqlalchemy as sa
+
+
+# revision identifiers, used by Alembic.
+revision: str = 'e1909678e89e'
+down_revision: Union[str, None] = None
+branch_labels: Union[str, Sequence[str], None] = None
+depends_on: Union[str, Sequence[str], None] = None
+
+
+def upgrade() -> None:
+    # ### commands auto generated by Alembic - please adjust! ###
+    op.create_table('response_cache',
+    sa.Column('config_hash', sa.String(length=64), nullable=False),
+    sa.Column('response', sa.Text(), nullable=False),
+    sa.Column('model', sa.String(length=255), nullable=False),
+    sa.Column('tokens_in', sa.Integer(), nullable=True),
+    sa.Column('tokens_out', sa.Integer(), nullable=True),
+    sa.Column('latency_ms', sa.Integer(), nullable=True),
+    sa.Column('created_at', sa.DateTime(timezone=True), nullable=False),
+    sa.PrimaryKeyConstraint('config_hash')
+    )
+    op.create_table('users',
+    sa.Column('id', sa.Uuid(), nullable=False),
+    sa.Column('username', sa.String(length=255), nullable=False),
+    sa.Column('password_hash', sa.String(length=255), nullable=False),
+    sa.Column('is_admin', sa.Boolean(), nullable=False),
+    sa.Column('created_at', sa.DateTime(timezone=True), nullable=False),
+    sa.PrimaryKeyConstraint('id'),
+    sa.UniqueConstraint('username')
+    )
+    op.create_table('webhook_configs',
+    sa.Column('id', sa.Uuid(), nullable=False),
+    sa.Column('event_type', sa.String(length=255), nullable=False),
+    sa.Column('url', sa.String(length=2048), nullable=False),
+    sa.Column('headers', sa.JSON(), nullable=True),
+    sa.Column('is_active', sa.Boolean(), nullable=False),
+    sa.PrimaryKeyConstraint('id')
+    )
+    with op.batch_alter_table('webhook_configs', schema=None) as batch_op:
+        batch_op.create_index('ix_webhook_configs_event_type', ['event_type'], unique=False)
+
+    op.create_table('projects',
+    sa.Column('id', sa.Uuid(), nullable=False),
+    sa.Column('name', sa.String(length=255), nullable=False),
+    sa.Column('description', sa.Text(), nullable=True),
+    sa.Column('owner_id', sa.Uuid(), nullable=False),
+    sa.Column('created_at', sa.DateTime(timezone=True), nullable=False),
+    sa.Column('updated_at', sa.DateTime(timezone=True), nullable=False),
+    sa.ForeignKeyConstraint(['owner_id'], ['users.id'], ondelete='CASCADE'),
+    sa.PrimaryKeyConstraint('id')
+    )
+    op.create_table('experiments',
+    sa.Column('id', sa.Uuid(), nullable=False),
+    sa.Column('project_id', sa.Uuid(), nullable=False),
+    sa.Column('name', sa.String(length=255), nullable=False),
+    sa.Column('description', sa.Text(), nullable=True),
+    sa.Column('sample_data', sa.JSON(), nullable=True),
+    sa.Column('pipeline_stages', sa.JSON(), nullable=True),
+    sa.Column('scoring_config', sa.JSON(), nullable=True),
+    sa.Column('parameter_space', sa.JSON(), nullable=True),
+    sa.Column('status', sa.Enum('draft', 'running', 'paused', 'completed', name='experiment_status'), nullable=False),
+    sa.Column('created_at', sa.DateTime(timezone=True), nullable=False),
+    sa.Column('updated_at', sa.DateTime(timezone=True), nullable=False),
+    sa.ForeignKeyConstraint(['project_id'], ['projects.id'], ondelete='CASCADE'),
+    sa.PrimaryKeyConstraint('id')
+    )
+    with op.batch_alter_table('experiments', schema=None) as batch_op:
+        batch_op.create_index('ix_experiments_project_id', ['project_id'], unique=False)
+        batch_op.create_index('ix_experiments_status', ['status'], unique=False)
+
+    op.create_table('runs',
+    sa.Column('id', sa.Uuid(), nullable=False),
+    sa.Column('experiment_id', sa.Uuid(), nullable=False),
+    sa.Column('config_hash', sa.String(length=64), nullable=False),
+    sa.Column('config', sa.JSON(), nullable=False),
+    sa.Column('status', sa.Enum('pending', 'running', 'completed', 'failed', 'cached', name='run_status'), nullable=False),
+    sa.Column('started_at', sa.DateTime(timezone=True), nullable=True),
+    sa.Column('completed_at', sa.DateTime(timezone=True), nullable=True),
+    sa.Column('duration_ms', sa.Integer(), nullable=True),
+    sa.Column('tokens_in', sa.Integer(), nullable=True),
+    sa.Column('tokens_out', sa.Integer(), nullable=True),
+    sa.Column('cost_estimate', sa.Numeric(precision=12, scale=6), nullable=True),
+    sa.ForeignKeyConstraint(['experiment_id'], ['experiments.id'], ondelete='CASCADE'),
+    sa.PrimaryKeyConstraint('id')
+    )
+    with op.batch_alter_table('runs', schema=None) as batch_op:
+        batch_op.create_index('ix_runs_config_hash', ['config_hash'], unique=False)
+        batch_op.create_index('ix_runs_experiment_id', ['experiment_id'], unique=False)
+        batch_op.create_index('ix_runs_status', ['status'], unique=False)
+
+    op.create_table('scores',
+    sa.Column('id', sa.Uuid(), nullable=False),
+    sa.Column('run_id', sa.Uuid(), nullable=False),
+    sa.Column('scorer_name', sa.String(length=255), nullable=False),
+    sa.Column('value', sa.Float(), nullable=False),
+    sa.Column('metadata', sa.JSON(), nullable=True),
+    sa.Column('created_at', sa.DateTime(timezone=True), nullable=False),
+    sa.ForeignKeyConstraint(['run_id'], ['runs.id'], ondelete='CASCADE'),
+    sa.PrimaryKeyConstraint('id')
+    )
+    with op.batch_alter_table('scores', schema=None) as batch_op:
+        batch_op.create_index('ix_scores_run_id', ['run_id'], unique=False)
+        batch_op.create_index('ix_scores_scorer_name', ['scorer_name'], unique=False)
+
+    op.create_table('stage_results',
+    sa.Column('id', sa.Uuid(), nullable=False),
+    sa.Column('run_id', sa.Uuid(), nullable=False),
+    sa.Column('stage_index', sa.Integer(), nullable=False),
+    sa.Column('prompt_sent', sa.Text(), nullable=False),
+    sa.Column('response_raw', sa.Text(), nullable=False),
+    sa.Column('model_used', sa.String(length=255), nullable=False),
+    sa.Column('parameters', sa.JSON(), nullable=True),
+    sa.Column('tokens_in', sa.Integer(), nullable=True),
+    sa.Column('tokens_out', sa.Integer(), nullable=True),
+    sa.Column('latency_ms', sa.Integer(), nullable=True),
+    sa.ForeignKeyConstraint(['run_id'], ['runs.id'], ondelete='CASCADE'),
+    sa.PrimaryKeyConstraint('id')
+    )
+    with op.batch_alter_table('stage_results', schema=None) as batch_op:
+        batch_op.create_index('ix_stage_results_run_id', ['run_id'], unique=False)
+
+    # ### end Alembic commands ###
+
+
+def downgrade() -> None:
+    # ### commands auto generated by Alembic - please adjust! ###
+    with op.batch_alter_table('stage_results', schema=None) as batch_op:
+        batch_op.drop_index('ix_stage_results_run_id')
+
+    op.drop_table('stage_results')
+    with op.batch_alter_table('scores', schema=None) as batch_op:
+        batch_op.drop_index('ix_scores_scorer_name')
+        batch_op.drop_index('ix_scores_run_id')
+
+    op.drop_table('scores')
+    with op.batch_alter_table('runs', schema=None) as batch_op:
+        batch_op.drop_index('ix_runs_status')
+        batch_op.drop_index('ix_runs_experiment_id')
+        batch_op.drop_index('ix_runs_config_hash')
+
+    op.drop_table('runs')
+    with op.batch_alter_table('experiments', schema=None) as batch_op:
+        batch_op.drop_index('ix_experiments_status')
+        batch_op.drop_index('ix_experiments_project_id')
+
+    op.drop_table('experiments')
+    op.drop_table('projects')
+    with op.batch_alter_table('webhook_configs', schema=None) as batch_op:
+        batch_op.drop_index('ix_webhook_configs_event_type')
+
+    op.drop_table('webhook_configs')
+    op.drop_table('users')
+    op.drop_table('response_cache')
+    # ### end Alembic commands ###
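The migration gives every child table `ondelete='CASCADE'`. In single-container mode the database is SQLite (`DATA_DIR/promptlooper.db`). SQLite only enforces foreign keys, cascades included, on connections that have issued `PRAGMA foreign_keys = ON`, and `_init_db` in `backend/main.py` does not issue it. The stdlib `sqlite3` sketch below shows the difference. It uses a cut-down two-table schema modeled on `projects`/`experiments`, simplified for illustration rather than copied from the real migration:

```python
import sqlite3


def count_children_after_parent_delete(enable_fk: bool) -> int:
    """Delete a parent row and report how many child rows remain."""
    conn = sqlite3.connect(":memory:")
    if enable_fk:
        # SQLite enforces foreign keys per connection, and only when asked.
        conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE projects (id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE experiments ("
        " id TEXT PRIMARY KEY,"
        " project_id TEXT NOT NULL REFERENCES projects(id) ON DELETE CASCADE)"
    )
    conn.execute("INSERT INTO projects VALUES ('p1')")
    conn.execute("INSERT INTO experiments VALUES ('e1', 'p1')")
    conn.execute("DELETE FROM projects WHERE id = 'p1'")
    (remaining,) = conn.execute("SELECT COUNT(*) FROM experiments").fetchone()
    conn.close()
    return remaining


print(count_children_after_parent_delete(enable_fk=False))  # 1: orphan left behind
print(count_children_after_parent_delete(enable_fk=True))   # 0: cascade fired
```

With SQLAlchemy, this pragma is commonly issued from a `connect` event listener registered with `sqlalchemy.event.listens_for(engine, "connect")`. The ORM-level `cascade="all, delete-orphan"` in `models.py` still covers deletes made through `Session.delete()`. The gap therefore mainly affects bulk or raw-SQL deletes.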
--- a/backend/__init__.py
+++ b/backend/__init__.py
--- a/backend/auth.py
+++ b/backend/auth.py
@@ -0,0 +1,154 @@
+"""PromptLooper authentication — JWT tokens, API keys, first-boot setup."""
+
+import uuid as _uuid
+from datetime import datetime, timedelta, timezone
+from typing import Generator
+
+from fastapi import Depends, HTTPException, Header, status
+from jose import JWTError, jwt
+from passlib.context import CryptContext
+from sqlalchemy.orm import Session
+
+from config import settings
+from models import User
+
+# ---------------------------------------------------------------------------
+# Password hashing
+# ---------------------------------------------------------------------------
+
+pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")
+
+
+def hash_password(password: str) -> str:
+    return pwd_context.hash(password)
+
+
+def verify_password(plain: str, hashed: str) -> bool:
+    return pwd_context.verify(plain, hashed)
+
+
+# ---------------------------------------------------------------------------
+# JWT
+# ---------------------------------------------------------------------------
+
+ALGORITHM = "HS256"
+ACCESS_TOKEN_EXPIRE_MINUTES = 60 * 24  # 24 hours
+
+
+def create_access_token(user_id: str, *, expires_delta: timedelta | None = None) -> str:
+    expire = datetime.now(timezone.utc) + (expires_delta or timedelta(minutes=ACCESS_TOKEN_EXPIRE_MINUTES))
+    payload = {"sub": user_id, "exp": expire}
+    return jwt.encode(payload, settings.jwt_secret, algorithm=ALGORITHM)
+
+
+def decode_access_token(token: str) -> str:
+    """Return the user_id (sub) from a valid JWT, or raise."""
+    try:
+        payload = jwt.decode(token, settings.jwt_secret, algorithms=[ALGORITHM])
+        user_id: str | None = payload.get("sub")
+        if user_id is None:
+            raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid token")
+        return user_id
+    except JWTError:
+        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid token")
+
+
+# ---------------------------------------------------------------------------
+# First-boot setup
+# ---------------------------------------------------------------------------
+
+def needs_setup(db: Session) -> bool:
+    """Return True if no users exist yet (first-boot state)."""
+    return db.query(User).count() == 0
+
+
+def create_admin(db: Session, username: str, password: str) -> User:
+    """Create the first admin user. Raises if users already exist."""
+    if not needs_setup(db):
+        raise HTTPException(
+            status_code=status.HTTP_409_CONFLICT,
+            detail="Admin account already exists",
+        )
+    user = User(
+        username=username,
+        password_hash=hash_password(password),
+        is_admin=True,
+    )
+    db.add(user)
+    db.commit()
+    db.refresh(user)
+    return user
+
+
+# ---------------------------------------------------------------------------
+# Authenticate (login)
+# ---------------------------------------------------------------------------
+
+def authenticate_user(db: Session, username: str, password: str) -> User:
+    """Verify credentials and return the User, or raise 401."""
+    user = db.query(User).filter(User.username == username).first()
+    if user is None or not verify_password(password, user.password_hash):
+        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid credentials")
+    return user
+
+
+# ---------------------------------------------------------------------------
+# Database session dependency (local to avoid circular import with main.py)
+# ---------------------------------------------------------------------------
+
+def _get_db() -> Generator[Session, None, None]:
+    """Yield a DB session. Imported lazily from main to avoid circular import."""
+    from main import get_db
+    yield from get_db()
+
+
+# ---------------------------------------------------------------------------
+# Dependency: get current user (JWT or API key)
+# ---------------------------------------------------------------------------
+
+def get_current_user(
+    authorization: str | None = Header(None),
+    x_api_key: str | None = Header(None),
+    db: Session = Depends(_get_db),
+) -> User:
+    """FastAPI dependency — resolve the current user from JWT Bearer token or API key.
+
+    Priority:
+    1. X-Api-Key header — matched against settings.api_key (grants first admin).
+    2. Authorization: Bearer <jwt> — decoded to get user_id.
+    """
+    # --- API key path ---
+    if x_api_key is not None:
+        from hmac import compare_digest  # constant-time check: timing does not reveal the key
+        if settings.api_key is None or not compare_digest(x_api_key.encode(), settings.api_key.encode()):
+            raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid API key")
+        admin = db.query(User).filter(User.is_admin.is_(True)).order_by(User.created_at).first()  # earliest admin
+        if admin is None:
+            raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="No admin user exists")
+        return admin
+
+    # --- JWT path ---
+    if authorization is None:
+        raise HTTPException(
+            status_code=status.HTTP_401_UNAUTHORIZED,
+            detail="Missing authentication",
+            headers={"WWW-Authenticate": "Bearer"},
+        )
+
+    scheme, _, token = authorization.partition(" ")
+    if scheme.lower() != "bearer" or not token:
+        raise HTTPException(
+            status_code=status.HTTP_401_UNAUTHORIZED,
+            detail="Invalid authorization header",
+            headers={"WWW-Authenticate": "Bearer"},
+        )
+
+    user_id_str = decode_access_token(token)
+    try:
+        user_id = _uuid.UUID(user_id_str)
+    except ValueError:
+        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid token")
+    user = db.query(User).filter(User.id == user_id).first()
+    if user is None:
+        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="User not found")
+    return user
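`create_access_token` and `decode_access_token` delegate to python-jose. To make the token format concrete, here is a stdlib-only sketch of an HS256 JWT carrying the same `sub`/`exp` claims. It is illustrative only. The helper names (`encode_hs256`, `decode_hs256`, `_sign`) are hypothetical, and the sketch skips checks a maintained library performs (`nbf`, `iat`, audience, leeway), so the app should keep using python-jose.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    # JWT segments use URL-safe base64 without "=" padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # Restore the padding stripped by _b64url before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()


def encode_hs256(claims: dict, secret: str) -> str:
    """Build header.claims.signature with an HMAC-SHA256 signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def decode_hs256(token: str, secret: str) -> dict:
    """Return the claims if signature and expiry check out, else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, claims_b64, signature_b64 = parts
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _sign(f"{header_b64}.{claims_b64}", secret)
    # compare_digest runs in constant time, so timing does not reveal matching bytes.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(claims_b64))
    # "exp" is a NumericDate: seconds since the Unix epoch.
    if claims.get("exp", 0) <= time.time():
        raise ValueError("token expired")
    return claims


token = encode_hs256({"sub": "42", "exp": int(time.time()) + 3600}, "change-me")
print(decode_hs256(token, "change-me")["sub"])  # 42
```

python-jose converts a `datetime` `exp`, like the one `create_access_token` builds, into this numeric form before signing.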
--- a/backend/config.py
+++ b/backend/config.py
@@ -0,0 +1,76 @@
+"""PromptLooper configuration — Pydantic Settings loaded from environment."""
+
+import secrets
+from pathlib import Path
+
+from pydantic import field_validator
+from pydantic_settings import BaseSettings, SettingsConfigDict
+
+
+class Settings(BaseSettings):
+    model_config = SettingsConfigDict(
+        env_file=".env",
+        env_file_encoding="utf-8",
+        extra="ignore",
+    )
+
+    # --- Database ---
+    database_url: str | None = None
+
+    # --- Redis ---
+    redis_url: str | None = None
+
+    # --- Server ---
+    host: str = "0.0.0.0"
+    port: int = 8400
+
+    # --- Auth ---
+    jwt_secret: str = ""
+    api_key: str | None = None
+
+    # --- Default LLM Endpoint ---
+    default_endpoint_url: str | None = None
+    default_endpoint_key: str | None = None
+
+    # --- Limits ---
+    max_concurrent_runs: int = 4
+    max_tokens_per_sweep: int = 0  # 0 = unlimited
+
+    # --- Storage ---
+    data_dir: str = "/data"
+
+    # --- MCP ---
+    mcp_enabled: bool = True
+    mcp_port: int = 8401
+
+    def model_post_init(self, __context: object) -> None:
+        # Auto-generate a per-process JWT secret if not provided (not persisted).
+        if not self.jwt_secret:
+            self.jwt_secret = secrets.token_urlsafe(32)
+
+    @property
+    def effective_database_url(self) -> str:
+        """Return DATABASE_URL or construct a SQLite URL from DATA_DIR."""
+        if self.database_url:
+            return self.database_url
+        db_path = Path(self.data_dir) / "promptlooper.db"
+        return f"sqlite:///{db_path}"
+
+    @property
+    def is_sqlite(self) -> bool:
+        return self.effective_database_url.startswith("sqlite")
+
+    @property
+    def use_in_process_queue(self) -> bool:
+        """When Redis is unavailable, use in-process task execution."""
+        return self.redis_url is None
+
+    @field_validator("api_key", "database_url", "redis_url", mode="before")
+    @classmethod
+    def empty_string_to_none(cls, v: str | None) -> str | None:
+        if v is not None and v.strip() == "":
+            return None
+        return v
+
+
+settings = Settings()
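`model_post_init` fills an empty `jwt_secret` with `secrets.token_urlsafe(32)` when `config.py` is imported, and nothing stores the value. Each process, meaning every uvicorn worker and every restart, therefore signs with its own key, and tokens issued by one process are rejected by the others. `.env.example` describes the value as "auto-generated on first boot", which reads as persisted. Below is a minimal sketch of one way to persist it. The `load_or_create_secret` helper and the `jwt_secret` filename inside `DATA_DIR` are hypothetical, and the sketch assumes the data volume supports hard links, as typical Linux filesystems do.

```python
import os
import secrets
import tempfile
from pathlib import Path


def load_or_create_secret(data_dir: str, filename: str = "jwt_secret") -> str:
    """Return the secret stored in data_dir, generating and saving it on first use."""
    directory = Path(data_dir)
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / filename
    if target.exists():
        return target.read_text(encoding="utf-8").strip()
    # Write the new secret to a private temp file (mkstemp uses mode 0o600),
    # then hard-link it into place. os.link refuses to overwrite an existing
    # file, so if several workers race, one wins and the rest read its secret.
    fd, tmp_name = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(secrets.token_urlsafe(32))
        try:
            os.link(tmp_name, target)
        except FileExistsError:
            pass  # another process created the file first
    finally:
        os.unlink(tmp_name)
    return target.read_text(encoding="utf-8").strip()
```

In `Settings.model_post_init`, the `secrets.token_urlsafe(32)` call could then become `load_or_create_secret(self.data_dir)`. Across containers, this only helps if they share the `DATA_DIR` volume.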
--- a/backend/engine/__init__.py
+++ b/backend/engine/__init__.py
--- a/backend/engine/adapters/__init__.py
+++ b/backend/engine/adapters/__init__.py
--- a/backend/engine/scorers/__init__.py
+++ b/backend/engine/scorers/__init__.py
--- a/backend/main.py
+++ b/backend/main.py
@@ -0,0 +1,211 @@
+"""PromptLooper FastAPI application."""
+
+from contextlib import asynccontextmanager
+from typing import AsyncGenerator
+
+from fastapi import FastAPI, WebSocket, WebSocketDisconnect
+from fastapi.middleware.cors import CORSMiddleware
+from sqlalchemy import create_engine, text
+from sqlalchemy.orm import sessionmaker
+
+from config import settings
+
+
+# ---------------------------------------------------------------------------
+# Database engine & session factory (lazy, created at startup)
+# ---------------------------------------------------------------------------
+
+engine = None
+SessionLocal = None
+
+
+def _init_db() -> None:
+    """Create the SQLAlchemy engine and session factory."""
+    global engine, SessionLocal
+    connect_args = {}
+    if settings.is_sqlite:
+        connect_args["check_same_thread"] = False
+    engine = create_engine(
+        settings.effective_database_url,
+        connect_args=connect_args,
+    )
+    SessionLocal = sessionmaker(bind=engine, autoflush=False, expire_on_commit=False)
+
+
+def get_db():
+    """FastAPI dependency that yields a database session."""
+    db = SessionLocal()
+    try:
+        yield db
+    finally:
+        db.close()
+
+
+# ---------------------------------------------------------------------------
+# Redis helper
+# ---------------------------------------------------------------------------
+
+_redis_client = None
+
+
+def _init_redis() -> None:
+    """Connect to Redis if configured."""
+    global _redis_client
+    if not settings.redis_url:
+        _redis_client = None
+        return
+    import redis as redis_lib
+    _redis_client = redis_lib.Redis.from_url(settings.redis_url, decode_responses=True)
+
+
+def get_redis():
+    """Return the Redis client (or None in single-container mode)."""
+    return _redis_client
+
+
+# ---------------------------------------------------------------------------
+# WebSocket connection manager
+# ---------------------------------------------------------------------------
+
+class ConnectionManager:
+    """Manage active WebSocket connections."""
+
+    def __init__(self) -> None:
+        self.active_connections: set[WebSocket] = set()
+
+    async def connect(self, websocket: WebSocket) -> None:
+        await websocket.accept()
+        self.active_connections.add(websocket)
+
+    def disconnect(self, websocket: WebSocket) -> None:
+        self.active_connections.discard(websocket)  # no-op if broadcast() already dropped it
+
+    async def broadcast(self, message: dict) -> None:
+        for connection in list(self.active_connections):
+            try:
+                await connection.send_json(message)
+            except Exception:
+                self.disconnect(connection)
+
+
+ws_manager = ConnectionManager()
+
+
+# ---------------------------------------------------------------------------
+# Lifecycle
+# ---------------------------------------------------------------------------
+
+@asynccontextmanager
+async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
+    """Startup and shutdown lifecycle hooks."""
+    _init_db()
+    _init_redis()
+    yield
+    # Shutdown: clean up connections
+    if _redis_client is not None:
+        _redis_client.close()
+    if engine is not None:
+        engine.dispose()
+
+
+# ---------------------------------------------------------------------------
+# Application
+# ---------------------------------------------------------------------------
+
+app = FastAPI(
+    title="PromptLooper",
+    description="LLM pipeline tuning workbench",
+    version="0.1.0",
+    lifespan=lifespan,
+)
+
+# CORS: allows every origin. No setting restricts this yet, so narrow allow_origins before production.
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    allow_credentials=True,
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+
+
+# ---------------------------------------------------------------------------
+# Health endpoint
+# ---------------------------------------------------------------------------
+
+@app.get("/health", tags=["system"])
+def health_check() -> dict:
+    """Check DB and Redis connectivity."""
+    db_ok = False
+    redis_ok = False
+
+    # Database check
+    if SessionLocal is not None:
+        try:
+            with SessionLocal() as session:
+                session.execute(text("SELECT 1"))
+            db_ok = True
+        except Exception:
+            pass
+
+    # Redis check
+    if not settings.redis_url:
+        redis_ok = True  # No Redis needed — in-process mode
+    elif _redis_client is not None:
+        try:
+            _redis_client.ping()
+            redis_ok = True
+        except Exception:
+            pass
+
+    return {"status": "ok" if (db_ok and redis_ok) else "degraded", "database": db_ok, "redis": redis_ok}
+
+
+# ---------------------------------------------------------------------------
+# WebSocket endpoint
+# ---------------------------------------------------------------------------
+
+@app.websocket("/ws")
+async def websocket_endpoint(websocket: WebSocket) -> None:
+    """WebSocket connection for real-time dashboard updates."""
+    await ws_manager.connect(websocket)
+    try:
+        while True:
+            # Keep connection alive; handle incoming messages if needed
+            data = await websocket.receive_json()
+            # Echo back or handle client messages in future
+            await websocket.send_json({"type": "ack", "data": data})
+    except WebSocketDisconnect:
+        ws_manager.disconnect(websocket)
+
+
+# ---------------------------------------------------------------------------
+# Mount routers (stubs — actual implementations come later)
+# ---------------------------------------------------------------------------
+
+# Router imports are deferred to avoid circular imports and allow
+# stub files to be created independently.  Each module listed below is
+# mounted under its prefix once it imports cleanly and exposes `router`.
+
+def _mount_routers() -> None:
+    """Import and mount all routers. Skip modules that fail to import or lack `router`."""
+    router_configs = [
+        ("routers.auth", "/api/auth", ["auth"]),
+        ("routers.projects", "/api/projects", ["projects"]),
+        ("routers.experiments", "/api/experiments", ["experiments"]),
+        ("routers.runs", "/api/runs", ["runs"]),
+        ("routers.endpoints", "/api/endpoints", ["endpoints"]),
+        ("routers.export", "/api/export", ["export"]),
+        ("routers.webhooks", "/api/webhooks", ["webhooks"]),
+        ("routers.admin", "/api/admin", ["admin"]),
+    ]
+    for module_name, prefix, tags in router_configs:
+        try:
+            import importlib
+            mod = importlib.import_module(module_name)
+            app.include_router(mod.router, prefix=prefix, tags=tags)
+        except (ImportError, AttributeError):
+            pass  # Router not yet implemented
+
+
+_mount_routers()
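`ConnectionManager.broadcast` drops a socket when `send_json` raises, and `websocket_endpoint` calls `disconnect` again once that client's `WebSocketDisconnect` arrives. If connections live in a list, that second `remove` raises `ValueError`. Keeping them in a set and using `discard` makes disconnecting idempotent. Below is a dependency-free asyncio sketch of that behavior. `FakeSocket` is a hypothetical stand-in for Starlette's `WebSocket`, `connect` is synchronous here because there is no handshake to await, and the `"run_update"` message type is made up.

```python
import asyncio


class FakeSocket:
    """Hypothetical stand-in for a WebSocket: records sends, can fail."""

    def __init__(self, broken: bool = False) -> None:
        self.broken = broken
        self.sent: list[dict] = []

    async def send_json(self, message: dict) -> None:
        if self.broken:
            raise RuntimeError("connection closed")
        self.sent.append(message)


class ConnectionManager:
    def __init__(self) -> None:
        self.active_connections: set[FakeSocket] = set()

    def connect(self, websocket: FakeSocket) -> None:
        self.active_connections.add(websocket)

    def disconnect(self, websocket: FakeSocket) -> None:
        # discard() ignores unknown sockets, so repeated calls are safe.
        self.active_connections.discard(websocket)

    async def broadcast(self, message: dict) -> None:
        # Iterate over a snapshot because disconnect() mutates the set.
        for connection in list(self.active_connections):
            try:
                await connection.send_json(message)
            except Exception:
                self.disconnect(connection)


async def demo() -> tuple[int, int]:
    manager = ConnectionManager()
    healthy, dead = FakeSocket(), FakeSocket(broken=True)
    manager.connect(healthy)
    manager.connect(dead)
    await manager.broadcast({"type": "run_update"})
    manager.disconnect(dead)  # the endpoint's own cleanup: must not raise
    return len(manager.active_connections), len(healthy.sent)


print(asyncio.run(demo()))  # (1, 1)
```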
--- a/backend/mcp/__init__.py
+++ b/backend/mcp/__init__.py
--- a/backend/models.py
+++ b/backend/models.py
@@ -0,0 +1,276 @@
+"""PromptLooper SQLAlchemy ORM models."""
+
+import enum
+import uuid
+from datetime import datetime, timezone
+
+from sqlalchemy import (
+    JSON,
+    Boolean,
+    DateTime,
+    Enum,
+    Float,
+    ForeignKey,
+    Index,
+    Integer,
+    Numeric,
+    String,
+    Text,
+)
+from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship
+
+
+def _utcnow() -> datetime:
+    return datetime.now(timezone.utc)
+
+
+def _new_uuid() -> uuid.UUID:
+    return uuid.uuid4()
+
+
+# ---------------------------------------------------------------------------
+# Base
+# ---------------------------------------------------------------------------
+
+class Base(DeclarativeBase):
+    """Shared declarative base for all models."""
+
+    type_annotation_map = {
+        dict: JSON,
+    }
+
+
+# ---------------------------------------------------------------------------
+# Enums
+# ---------------------------------------------------------------------------
+
+class ExperimentStatus(str, enum.Enum):
+    draft = "draft"
+    running = "running"
+    paused = "paused"
+    completed = "completed"
+
+
+class RunStatus(str, enum.Enum):
+    pending = "pending"
+    running = "running"
+    completed = "completed"
+    failed = "failed"
+    cached = "cached"
+
+
+# ---------------------------------------------------------------------------
+# Models
+# ---------------------------------------------------------------------------
+
+class User(Base):
+    __tablename__ = "users"
+
+    id: Mapped[uuid.UUID] = mapped_column(
+        primary_key=True, default=_new_uuid
+    )
+    username: Mapped[str] = mapped_column(String(255), unique=True, nullable=False)
+    password_hash: Mapped[str] = mapped_column(String(255), nullable=False)
+    is_admin: Mapped[bool] = mapped_column(Boolean, default=False, nullable=False)
+    created_at: Mapped[datetime] = mapped_column(
+        DateTime(timezone=True), default=_utcnow, nullable=False
+    )
+
+    # Relationships
+    projects: Mapped[list["Project"]] = relationship(
+        back_populates="owner", cascade="all, delete-orphan"
+    )
+
+
+class Project(Base):
+    __tablename__ = "projects"
+
+    id: Mapped[uuid.UUID] = mapped_column(
+        primary_key=True, default=_new_uuid
+    )
+    name: Mapped[str] = mapped_column(String(255), nullable=False)
+    description: Mapped[str | None] = mapped_column(Text, nullable=True)
+    owner_id: Mapped[uuid.UUID] = mapped_column(
+        ForeignKey("users.id", ondelete="CASCADE"), nullable=False
+    )
+    created_at: Mapped[datetime] = mapped_column(
+        DateTime(timezone=True), default=_utcnow, nullable=False
+    )
+    updated_at: Mapped[datetime] = mapped_column(
+        DateTime(timezone=True), default=_utcnow, onupdate=_utcnow, nullable=False
+    )
+
+    # Relationships
+    owner: Mapped["User"] = relationship(back_populates="projects")
+    experiments: Mapped[list["Experiment"]] = relationship(
+        back_populates="project", cascade="all, delete-orphan"
+    )
+
+
+class Experiment(Base):
+    __tablename__ = "experiments"
+
+    id: Mapped[uuid.UUID] = mapped_column(
+        primary_key=True, default=_new_uuid
+    )
+    project_id: Mapped[uuid.UUID] = mapped_column(
+        ForeignKey("projects.id", ondelete="CASCADE"), nullable=False
+    )
+    name: Mapped[str] = mapped_column(String(255), nullable=False)
+    description: Mapped[str | None] = mapped_column(Text, nullable=True)
+    sample_data: Mapped[dict | None] = mapped_column(JSON, nullable=True)
+    pipeline_stages: Mapped[dict | None] = mapped_column(JSON, nullable=True)
+    scoring_config: Mapped[dict | None] = mapped_column(JSON, nullable=True)
+    parameter_space: Mapped[dict | None] = mapped_column(JSON, nullable=True)
+    status: Mapped[ExperimentStatus] = mapped_column(
+        Enum(ExperimentStatus, name="experiment_status"),
+        default=ExperimentStatus.draft,
+        nullable=False,
+    )
+    created_at: Mapped[datetime] = mapped_column(
+        DateTime(timezone=True), default=_utcnow, nullable=False
+    )
+    updated_at: Mapped[datetime] = mapped_column(
+        DateTime(timezone=True), default=_utcnow, onupdate=_utcnow, nullable=False
+    )
+
+    # Relationships
+    project: Mapped["Project"] = relationship(back_populates="experiments")
+    runs: Mapped[list["Run"]] = relationship(
+        back_populates="experiment", cascade="all, delete-orphan"
+    )
+
+    __table_args__ = (
+        Index("ix_experiments_project_id", "project_id"),
+        Index("ix_experiments_status", "status"),
+    )
+
+
+class Run(Base):
+    __tablename__ = "runs"
+
+    id: Mapped[uuid.UUID] = mapped_column(
+        primary_key=True, default=_new_uuid
+    )
+    experiment_id: Mapped[uuid.UUID] = mapped_column(
+        ForeignKey("experiments.id", ondelete="CASCADE"), nullable=False
+    )
+    config_hash: Mapped[str] = mapped_column(String(64), nullable=False)
+    config: Mapped[dict] = mapped_column(JSON, nullable=False)
+    status: Mapped[RunStatus] = mapped_column(
+        Enum(RunStatus, name="run_status"),
+        default=RunStatus.pending,
+        nullable=False,
+    )
+    started_at: Mapped[datetime | None] = mapped_column(
+        DateTime(timezone=True), nullable=True
+    )
+    completed_at: Mapped[datetime | None] = mapped_column(
+        DateTime(timezone=True), nullable=True
+    )
+    duration_ms: Mapped[int | None] = mapped_column(Integer, nullable=True)
+    tokens_in: Mapped[int | None] = mapped_column(Integer, nullable=True)
+    tokens_out: Mapped[int | None] = mapped_column(Integer, nullable=True)
+    cost_estimate: Mapped[float | None] = mapped_column(
+        Numeric(precision=12, scale=6, asdecimal=False), nullable=True
+    )
+
+    # Relationships
+    experiment: Mapped["Experiment"] = relationship(back_populates="runs")
+    stage_results: Mapped[list["StageResult"]] = relationship(
+        back_populates="run", cascade="all, delete-orphan"
+    )
+    scores: Mapped[list["Score"]] = relationship(
+        back_populates="run", cascade="all, delete-orphan"
+    )
+
+    __table_args__ = (
+        Index("ix_runs_experiment_id", "experiment_id"),
+        Index("ix_runs_config_hash", "config_hash"),
+        Index("ix_runs_status", "status"),
+    )
+
+
+class StageResult(Base):
+    __tablename__ = "stage_results"
+
+    id: Mapped[uuid.UUID] = mapped_column(
+        primary_key=True, default=_new_uuid
+    )
+    run_id: Mapped[uuid.UUID] = mapped_column(
+        ForeignKey("runs.id", ondelete="CASCADE"), nullable=False
+    )
+    stage_index: Mapped[int] = mapped_column(Integer, nullable=False)
+    prompt_sent: Mapped[str] = mapped_column(Text, nullable=False)
+    response_raw: Mapped[str] = mapped_column(Text, nullable=False)
+    model_used: Mapped[str] = mapped_column(String(255), nullable=False)
+    parameters: Mapped[dict | None] = mapped_column(JSON, nullable=True)
+    tokens_in: Mapped[int | None] = mapped_column(Integer, nullable=True)
+    tokens_out: Mapped[int | None] = mapped_column(Integer, nullable=True)
+    latency_ms: Mapped[int | None] = mapped_column(Integer, nullable=True)
+
+    # Relationships
+    run: Mapped["Run"] = relationship(back_populates="stage_results")
+
+    __table_args__ = (
+        Index("ix_stage_results_run_id", "run_id"),
+    )
+
+
+class Score(Base):
+    __tablename__ = "scores"
+
+    id: Mapped[uuid.UUID] = mapped_column(
+        primary_key=True, default=_new_uuid
+    )
+    run_id: Mapped[uuid.UUID] = mapped_column(
+        ForeignKey("runs.id", ondelete="CASCADE"), nullable=False
+    )
+    scorer_name: Mapped[str] = mapped_column(String(255), nullable=False)
+    value: Mapped[float] = mapped_column(Float, nullable=False)
+    scorer_metadata: Mapped[dict | None] = mapped_column(
+        "metadata", JSON, nullable=True
+    )
+    created_at: Mapped[datetime] = mapped_column(
+        DateTime(timezone=True), default=_utcnow, nullable=False
+    )
+
+    # Relationships
+    run: Mapped["Run"] = relationship(back_populates="scores")
+
+    __table_args__ = (
+        Index("ix_scores_run_id", "run_id"),
+        Index("ix_scores_scorer_name", "scorer_name"),
+    )
+
+
+class ResponseCache(Base):
+    __tablename__ = "response_cache"
+
+    config_hash: Mapped[str] = mapped_column(
+        String(64), primary_key=True
+    )
+    response: Mapped[str] = mapped_column(Text, nullable=False)
+    model: Mapped[str] = mapped_column(String(255), nullable=False)
+    tokens_in: Mapped[int | None] = mapped_column(Integer, nullable=True)
+    tokens_out: Mapped[int | None] = mapped_column(Integer, nullable=True)
+    latency_ms: Mapped[int | None] = mapped_column(Integer, nullable=True)
+    created_at: Mapped[datetime] = mapped_column(
+        DateTime(timezone=True), default=_utcnow, nullable=False
+    )
+
+
+class WebhookConfig(Base):
+    __tablename__ = "webhook_configs"
+
+    id: Mapped[uuid.UUID] = mapped_column(
+        primary_key=True, default=_new_uuid
+    )
+    event_type: Mapped[str] = mapped_column(String(255), nullable=False)
+    url: Mapped[str] = mapped_column(String(2048), nullable=False)
+    headers: Mapped[dict | None] = mapped_column(JSON, nullable=True)
+    is_active: Mapped[bool] = mapped_column(Boolean, default=True, nullable=False)
+
+    __table_args__ = (
+        Index("ix_webhook_configs_event_type", "event_type"),
+    )
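`Run.config_hash` and the `ResponseCache` primary key are both `String(64)`, which is exactly the length of a hex SHA-256 digest. The diff does not show how the hash is computed. Below is a sketch assuming SHA-256 over canonical JSON (sorted keys, compact separators), so logically identical configs map to the same cache row regardless of key order. The `config_hash` function name is hypothetical.

```python
import hashlib
import json
from typing import Any


def config_hash(config: dict[str, Any]) -> str:
    """Return a 64-char hex SHA-256 digest of a JSON-serializable config."""
    canonical = json.dumps(
        config,
        sort_keys=True,          # key order (also in nested dicts) must not change the hash
        separators=(",", ":"),   # no incidental whitespace
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = config_hash({"model": "m1", "temperature": 0.2, "stages": [{"prompt": "hi"}]})
b = config_hash({"stages": [{"prompt": "hi"}], "temperature": 0.2, "model": "m1"})
print(len(a), a == b)  # 64 True
```

List order still matters, which suits pipeline stages. Note that `1` and `1.0` serialize differently, so callers would need to normalize numeric types if those should share a cache entry.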
--- a/backend/requirements.txt
+++ b/backend/requirements.txt
@@ -0,0 +1,17 @@
+# PromptLooper — Backend Dependencies
+fastapi>=0.115,<1.0
+uvicorn[standard]>=0.32,<1.0
+sqlalchemy>=2.0,<3.0
+alembic>=1.14,<2.0
+pydantic>=2.0,<3.0
+pydantic-settings>=2.0,<3.0
+python-jose[cryptography]>=3.3,<4.0
+passlib[bcrypt]>=1.7,<2.0
+bcrypt<5.0  # passlib 1.7 probes bcrypt with a >72-byte password, which bcrypt 5 rejects
+celery>=5.4,<6.0
+redis>=5.0,<6.0
+httpx>=0.27,<1.0
+websockets>=13.0,<14.0
+psycopg2-binary>=2.9,<3.0
+aiosqlite>=0.20,<1.0
+python-multipart>=0.0.9
--- a/backend/routers/__init__.py
+++ b/backend/routers/__init__.py
--- a/backend/routers/admin.py
+++ b/backend/routers/admin.py
@@ -0,0 +1,23 @@
+"""Admin router — system settings and stats."""
+
+from fastapi import APIRouter, Response
+
+router = APIRouter()
+
+
+@router.get("/settings", status_code=501)
+def get_settings():
+    """System settings (guest access, default model, etc.)."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.put("/settings", status_code=501)
+def update_settings():
+    """Update settings."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.get("/stats", status_code=501)
+def get_stats():
+    """System-wide stats (total runs, cache hit rate, etc.)."""
+    return Response(status_code=501, content="Not Implemented")
--- a/backend/routers/auth.py
+++ b/backend/routers/auth.py
@@ -0,0 +1,23 @@
+"""Auth router — setup, login, and current user info."""
+
+from fastapi import APIRouter, Response
+
+router = APIRouter()
+
+
+@router.post("/setup", status_code=501)
+def setup():
+    """First-boot admin password setup."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.post("/login", status_code=501)
+def login():
+    """Login, returns JWT."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.get("/me", status_code=501)
+def me():
+    """Current user info."""
+    return Response(status_code=501, content="Not Implemented")
--- a/backend/routers/endpoints.py
+++ b/backend/routers/endpoints.py
@@ -0,0 +1,37 @@
+"""Endpoints router — LLM target management."""
+
+import uuid
+
+from fastapi import APIRouter, Response
+
+router = APIRouter()
+
+
+@router.get("/", status_code=501)
+def list_endpoints():
+    """List configured LLM endpoints."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.post("/", status_code=501)
+def create_endpoint():
+    """Add endpoint (URL, API key, label)."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.put("/{endpoint_id}", status_code=501)
+def update_endpoint(endpoint_id: uuid.UUID):
+    """Update endpoint."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.delete("/{endpoint_id}", status_code=501)
+def delete_endpoint(endpoint_id: uuid.UUID):
+    """Remove endpoint."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.post("/{endpoint_id}/test", status_code=501)
+def test_endpoint(endpoint_id: uuid.UUID):
+    """Test connectivity and list available models."""
+    return Response(status_code=501, content="Not Implemented")
--- a/backend/routers/experiments.py
+++ b/backend/routers/experiments.py
@@ -0,0 +1,61 @@
+"""Experiments router — CRUD and sweep controls."""
+
+import uuid
+
+from fastapi import APIRouter, Response
+
+router = APIRouter()
+
+
+@router.get("/", status_code=501)
+def list_experiments():
+    """List experiments (filter by project)."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.post("/", status_code=501)
+def create_experiment():
+    """Create experiment."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.get("/{experiment_id}", status_code=501)
+def get_experiment(experiment_id: uuid.UUID):
+    """Experiment detail with run summaries."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.put("/{experiment_id}", status_code=501)
+def update_experiment(experiment_id: uuid.UUID):
+    """Update experiment config."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.delete("/{experiment_id}", status_code=501)
+def delete_experiment(experiment_id: uuid.UUID):
+    """Delete experiment."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.post("/{experiment_id}/sweep", status_code=501)
+def start_sweep(experiment_id: uuid.UUID):
+    """Start a sweep (grid, random, or guided)."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.post("/{experiment_id}/pause", status_code=501)
+def pause_sweep(experiment_id: uuid.UUID):
+    """Pause running sweep."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.post("/{experiment_id}/resume", status_code=501)
+def resume_sweep(experiment_id: uuid.UUID):
+    """Resume paused sweep."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.post("/{experiment_id}/stop", status_code=501)
+def stop_sweep(experiment_id: uuid.UUID):
+    """Stop sweep."""
+    return Response(status_code=501, content="Not Implemented")
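`start_sweep` mentions grid, random, and guided strategies, and the `runs` table carries an indexed `config_hash`. The following is a minimal sketch of the grid case only. It assumes that `parameter_space` maps each parameter name to a list of candidate values, and it hashes each config as SHA-256 over sorted-key JSON so that identical configs can be skipped. Both the hashing scheme and the function names are assumptions, not the project's confirmed implementation.

```python
import hashlib
import itertools
import json
from typing import Any, Iterator


def config_hash(config: dict[str, Any]) -> str:
    """Stable hash: sorted keys and compact separators make equal configs hash equally."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def grid_configs(parameter_space: dict[str, list[Any]]) -> Iterator[dict[str, Any]]:
    """Yield one config per point in the Cartesian product of the parameter values."""
    names = sorted(parameter_space)  # fixed order keeps the sweep reproducible
    for values in itertools.product(*(parameter_space[n] for n in names)):
        yield dict(zip(names, values))


def plan_grid_sweep(
    parameter_space: dict[str, list[Any]], done_hashes: set[str]
) -> list[dict[str, Any]]:
    """Return configs not yet run, skipping any whose hash is already recorded."""
    planned = []
    for cfg in grid_configs(parameter_space):
        h = config_hash(cfg)
        if h not in done_hashes:
            planned.append({"config_hash": h, "config": cfg})
    return planned
```

For example, `{"temperature": [0.0, 0.7], "model": ["a", "b", "c"]}` expands to six configs. Passing the hashes of completed runs as `done_hashes` would let a resumed sweep pick up where it stopped.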
--- a/backend/routers/export.py
+++ b/backend/routers/export.py
@@ -0,0 +1,31 @@
+"""Export router — export experiment results in various formats."""
+
+import uuid
+
+from fastapi import APIRouter, Response
+
+router = APIRouter()
+
+
+@router.get("/experiments/{experiment_id}/best", status_code=501)
+def export_best(experiment_id: uuid.UUID):
+    """Best config as JSON."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.get("/experiments/{experiment_id}/env", status_code=501)
+def export_env(experiment_id: uuid.UUID):
+    """Best config as .env snippet."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.get("/experiments/{experiment_id}/yaml", status_code=501)
+def export_yaml(experiment_id: uuid.UUID):
+    """Best config as YAML."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.get("/experiments/{experiment_id}/report", status_code=501)
+def export_report(experiment_id: uuid.UUID):
+    """Full experiment report (markdown)."""
+    return Response(status_code=501, content="Not Implemented")
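`export_env` is meant to emit the best config as a `.env` snippet. The sketch below makes several assumptions. The best config is a flat dict. Keys are upper-cased, with an optional caller-chosen prefix. Nested values are written as compact JSON. Any value outside a conservative safe character set is double-quoted. The prefix and quoting rules are illustrative, since `.env` parsers differ in how they handle quotes.

```python
import json
import re
from typing import Any

# Values made only of these characters are written unquoted.
_SAFE_VALUE = re.compile(r"[A-Za-z0-9_./:@+-]*")


def _env_key(name: str, prefix: str) -> str:
    """Upper-case the key and replace characters not valid in variable names."""
    return prefix + re.sub(r"[^A-Za-z0-9_]", "_", name).upper()


def _env_value(value: Any) -> str:
    """Format one value; anything outside the safe set is double-quoted and escaped."""
    if isinstance(value, bool):  # checked before int, since bool is an int subclass
        text = "true" if value else "false"
    elif isinstance(value, (dict, list)):
        text = json.dumps(value, separators=(",", ":"))
    elif value is None:
        text = ""
    else:
        text = str(value)
    if _SAFE_VALUE.fullmatch(text):
        return text
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def render_env(config: dict[str, Any], prefix: str = "") -> str:
    """Render a flat config dict as KEY=value lines, sorted for stable output."""
    lines = [f"{_env_key(k, prefix)}={_env_value(v)}" for k, v in sorted(config.items())]
    return "\n".join(lines) + "\n"
```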
--- a/backend/routers/projects.py
+++ b/backend/routers/projects.py
@@ -0,0 +1,37 @@
+"""Projects router — CRUD for projects."""
+
+import uuid
+
+from fastapi import APIRouter, Response
+
+router = APIRouter()
+
+
+@router.get("/", status_code=501)
+def list_projects():
+    """List projects."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.post("/", status_code=501)
+def create_project():
+    """Create project."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.get("/{project_id}", status_code=501)
+def get_project(project_id: uuid.UUID):
+    """Project detail with experiment summaries."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.put("/{project_id}", status_code=501)
+def update_project(project_id: uuid.UUID):
+    """Update project."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.delete("/{project_id}", status_code=501)
+def delete_project(project_id: uuid.UUID):
+    """Delete project and all experiments."""
+    return Response(status_code=501, content="Not Implemented")
--- a/backend/routers/runs.py
+++ b/backend/routers/runs.py
@@ -0,0 +1,37 @@
+"""Runs router — execute, detail, score, and leaderboard."""
+
+import uuid
+
+from fastapi import APIRouter, Response
+
+router = APIRouter()
+
+
+@router.get("/experiments/{experiment_id}/runs", status_code=501)
+def list_runs(experiment_id: uuid.UUID):
+    """List runs with scores (sortable, filterable)."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.get("/{run_id}", status_code=501)
+def get_run(run_id: uuid.UUID):
+    """Run detail with stage results."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.post("/", status_code=501)
+def create_run():
+    """Execute a single run (ad-hoc)."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.post("/{run_id}/score", status_code=501)
+def score_run(run_id: uuid.UUID):
+    """Add human rating to a run."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.get("/experiments/{experiment_id}/leaderboard", status_code=501)
+def leaderboard(experiment_id: uuid.UUID):
+    """Top runs ranked by weighted score."""
+    return Response(status_code=501, content="Not Implemented")
--- a/backend/routers/webhooks.py
+++ b/backend/routers/webhooks.py
@@ -0,0 +1,25 @@
+"""Webhooks router — manage webhook configurations."""
+
+import uuid
+
+from fastapi import APIRouter, Response
+
+router = APIRouter()
+
+
+@router.get("/", status_code=501)
+def list_webhooks():
+    """List webhook configs."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.post("/", status_code=501)
+def create_webhook():
+    """Create webhook."""
+    return Response(status_code=501, content="Not Implemented")
+
+
+@router.delete("/{webhook_id}", status_code=501)
+def delete_webhook(webhook_id: uuid.UUID):
+    """Remove webhook."""
+    return Response(status_code=501, content="Not Implemented")
--- a/backend/schemas.py
+++ b/backend/schemas.py
@@ -0,0 +1,298 @@
+"""PromptLooper Pydantic request/response schemas."""
+
+import uuid
+from datetime import datetime
+
+from pydantic import BaseModel, ConfigDict, Field
+
+from models import ExperimentStatus, RunStatus
+
+
+# ---------------------------------------------------------------------------
+# Shared mixins
+# ---------------------------------------------------------------------------
+
+class _TimestampMixin(BaseModel):
+    created_at: datetime
+    updated_at: datetime
+
+
+# ---------------------------------------------------------------------------
+# Project
+# ---------------------------------------------------------------------------
+
+class ProjectCreate(BaseModel):
+    name: str = Field(..., min_length=1, max_length=255)
+    description: str | None = None
+
+
+class ProjectUpdate(BaseModel):
+    name: str | None = Field(None, min_length=1, max_length=255)
+    description: str | None = None
+
+
+class ProjectResponse(BaseModel):
+    model_config = ConfigDict(from_attributes=True)
+
+    id: uuid.UUID
+    name: str
+    description: str | None
+    owner_id: uuid.UUID
+    created_at: datetime
+    updated_at: datetime
+
+
+class ProjectListResponse(BaseModel):
+    items: list[ProjectResponse]
+    total: int
+
+
+# ---------------------------------------------------------------------------
+# Experiment
+# ---------------------------------------------------------------------------
+
+class ExperimentCreate(BaseModel):
+    name: str = Field(..., min_length=1, max_length=255)
+    description: str | None = None
+    sample_data: dict | None = None
+    pipeline_stages: dict | None = None
+    scoring_config: dict | None = None
+    parameter_space: dict | None = None
+
+
+class ExperimentUpdate(BaseModel):
+    name: str | None = Field(None, min_length=1, max_length=255)
+    description: str | None = None
+    sample_data: dict | None = None
+    pipeline_stages: dict | None = None
+    scoring_config: dict | None = None
+    parameter_space: dict | None = None
+    status: ExperimentStatus | None = None
+
+
+class ExperimentResponse(BaseModel):
+    model_config = ConfigDict(from_attributes=True)
+
+    id: uuid.UUID
+    project_id: uuid.UUID
+    name: str
+    description: str | None
+    sample_data: dict | None
+    pipeline_stages: dict | None
+    scoring_config: dict | None
+    parameter_space: dict | None
+    status: ExperimentStatus
+    created_at: datetime
+    updated_at: datetime
+
+
+class ExperimentListResponse(BaseModel):
+    items: list[ExperimentResponse]
+    total: int
+
+
+# ---------------------------------------------------------------------------
+# Run
+# ---------------------------------------------------------------------------
+
+class RunResponse(BaseModel):
+    model_config = ConfigDict(from_attributes=True)
+
+    id: uuid.UUID
+    experiment_id: uuid.UUID
+    config_hash: str
+    config: dict
+    status: RunStatus
+    started_at: datetime | None
+    completed_at: datetime | None
+    duration_ms: int | None
+    tokens_in: int | None
+    tokens_out: int | None
+    cost_estimate: float | None
+
+
+class RunListResponse(BaseModel):
+    items: list[RunResponse]
+    total: int
+
+
+# ---------------------------------------------------------------------------
+# StageResult (read-only, returned inside Run details)
+# ---------------------------------------------------------------------------
+
+class StageResultResponse(BaseModel):
+    model_config = ConfigDict(from_attributes=True)
+
+    id: uuid.UUID
+    run_id: uuid.UUID
+    stage_index: int
+    prompt_sent: str
+    response_raw: str
+    model_used: str
+    parameters: dict | None
+    tokens_in: int | None
+    tokens_out: int | None
+    latency_ms: int | None
+
+
+class RunDetailResponse(RunResponse):
+    """Run with nested stage results and scores."""
+
+    stage_results: list[StageResultResponse] = []
+    scores: list["ScoreResponse"] = []
+
+
+# ---------------------------------------------------------------------------
+# Score
+# ---------------------------------------------------------------------------
+
+class ScoreInput(BaseModel):
+    scorer_name: str = Field(..., min_length=1, max_length=255)
+    value: float
+    metadata: dict | None = None
+
+
+class ScoreResponse(BaseModel):
+    model_config = ConfigDict(from_attributes=True)
+
+    id: uuid.UUID
+    run_id: uuid.UUID
+    scorer_name: str
+    value: float
+    scorer_metadata: dict | None
+    created_at: datetime
+
+
+# ---------------------------------------------------------------------------
+# Endpoint (LLM endpoint configuration)
+# ---------------------------------------------------------------------------
+
+class EndpointCreate(BaseModel):
+    name: str = Field(..., min_length=1, max_length=255)
+    url: str = Field(..., min_length=1, max_length=2048)
+    api_key: str | None = None
+    default_model: str | None = Field(None, max_length=255)
+
+
+class EndpointUpdate(BaseModel):
+    name: str | None = Field(None, min_length=1, max_length=255)
+    url: str | None = Field(None, min_length=1, max_length=2048)
+    api_key: str | None = None
+    default_model: str | None = Field(None, max_length=255)
+
+
+class EndpointResponse(BaseModel):
+    model_config = ConfigDict(from_attributes=True)
+
+    id: uuid.UUID
+    name: str
+    url: str
+    default_model: str | None
+
+
+class EndpointListResponse(BaseModel):
+    items: list[EndpointResponse]
+    total: int
+
+
+# ---------------------------------------------------------------------------
+# Webhook
+# ---------------------------------------------------------------------------
+
+class WebhookCreate(BaseModel):
+    event_type: str = Field(..., min_length=1, max_length=255)
+    url: str = Field(..., min_length=1, max_length=2048)
+    headers: dict | None = None
+    is_active: bool = True
+
+
+class WebhookUpdate(BaseModel):
+    event_type: str | None = Field(None, min_length=1, max_length=255)
+    url: str | None = Field(None, min_length=1, max_length=2048)
+    headers: dict | None = None
+    is_active: bool | None = None
+
+
+class WebhookResponse(BaseModel):
+    model_config = ConfigDict(from_attributes=True)
+
+    id: uuid.UUID
+    event_type: str
+    url: str
+    headers: dict | None
+    is_active: bool
+
+
+class WebhookListResponse(BaseModel):
+    items: list[WebhookResponse]
+    total: int
+
+
+# ---------------------------------------------------------------------------
+# Auth
+# ---------------------------------------------------------------------------
+
+class SetupRequest(BaseModel):
+    username: str = Field(..., min_length=1, max_length=255)
+    password: str = Field(..., min_length=8)
+
+
+class LoginRequest(BaseModel):
+    username: str
+    password: str
+
+
+class TokenResponse(BaseModel):
+    access_token: str
+    token_type: str = "bearer"
+
+
+class UserResponse(BaseModel):
+    model_config = ConfigDict(from_attributes=True)
+
+    id: uuid.UUID
+    username: str
+    is_admin: bool
+    created_at: datetime
+
+
+# ---------------------------------------------------------------------------
+# Export
+# ---------------------------------------------------------------------------
+
+class ExportRunRow(BaseModel):
+    """Flat row for CSV/JSON export of run results."""
+
+    run_id: uuid.UUID
+    experiment_id: uuid.UUID
+    config_hash: str
+    config: dict
+    status: RunStatus
+    duration_ms: int | None = None
+    tokens_in: int | None = None
+    tokens_out: int | None = None
+    cost_estimate: float | None = None
+    scores: dict[str, float] = Field(
+        default_factory=dict,
+        description="Map of scorer_name → value",
+    )
+
+
+class ExportResponse(BaseModel):
+    experiment_id: uuid.UUID
+    experiment_name: str
+    rows: list[ExportRunRow]
+
+
+# ---------------------------------------------------------------------------
+# Health
+# ---------------------------------------------------------------------------
+
+class HealthResponse(BaseModel):
+    status: str = "ok"
+    database: bool
+    redis: bool
+
+
+# Rebuild forward refs for RunDetailResponse
+RunDetailResponse.model_rebuild()
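`ExportRunRow` stores scores as a `scorer_name → value` map, which does not fit a flat CSV row directly. The stdlib-only sketch below assumes rows arrive as plain dicts shaped like `ExportRunRow`; for a Pydantic v2 model, `row.model_dump(mode="json")` produces such a dict. Each scorer becomes its own `score:<name>` column, and `config` is serialized as JSON. The column naming is an assumption.

```python
import csv
import io
import json
from typing import Any

BASE_COLUMNS = [
    "run_id", "experiment_id", "config_hash", "config", "status",
    "duration_ms", "tokens_in", "tokens_out", "cost_estimate",
]


def rows_to_csv(rows: list[dict[str, Any]]) -> str:
    """Flatten ExportRunRow-shaped dicts to CSV with one column per scorer."""
    scorer_names = sorted({name for row in rows for name in row.get("scores", {})})
    header = BASE_COLUMNS + [f"score:{name}" for name in scorer_names]

    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        flat = {col: row.get(col) for col in BASE_COLUMNS}  # None is written as ""
        flat["config"] = json.dumps(row.get("config", {}), sort_keys=True)
        for name, value in row.get("scores", {}).items():
            flat[f"score:{name}"] = value
        writer.writerow(flat)  # scorers a run lacks fall back to DictWriter's restval ""
    return buf.getvalue()
```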
--- a/backend/tests/__init__.py
+++ b/backend/tests/__init__.py
--- a/backend/tests/test_alembic.py
+++ b/backend/tests/test_alembic.py
@@ -0,0 +1,107 @@
+"""Tests for Alembic migration setup."""
+
+import os
+from pathlib import Path
+
+import pytest
+from alembic import command
+from alembic.config import Config
+from sqlalchemy import create_engine, inspect
+
+# Resolve the repo root regardless of where pytest is invoked from.
+_REPO_ROOT = Path(__file__).resolve().parents[2]
+
+
+@pytest.fixture()
+def alembic_cfg(tmp_path):
+    """Create an Alembic config pointing at a temporary SQLite database."""
+    db_path = tmp_path / "test.db"
+    db_url = f"sqlite:///{db_path}"
+
+    cfg = Config(str(_REPO_ROOT / "alembic.ini"))
+    cfg.set_main_option("script_location", str(_REPO_ROOT / "alembic"))
+    cfg.set_main_option("sqlalchemy.url", db_url)
+    return cfg, db_url
+
+
+def test_upgrade_head_creates_all_tables(alembic_cfg):
+    """Running 'upgrade head' should create all expected tables."""
+    cfg, db_url = alembic_cfg
+    command.upgrade(cfg, "head")
+
+    engine = create_engine(db_url)
+    inspector = inspect(engine)
+    tables = set(inspector.get_table_names())
+
+    expected = {
+        "alembic_version",
+        "users",
+        "projects",
+        "experiments",
+        "runs",
+        "stage_results",
+        "scores",
+        "response_cache",
+        "webhook_configs",
+    }
+    assert expected == tables
+
+
+def test_downgrade_base_removes_all_tables(alembic_cfg):
+    """Running 'downgrade base' should remove all application tables."""
+    cfg, db_url = alembic_cfg
+    command.upgrade(cfg, "head")
+    command.downgrade(cfg, "base")
+
+    engine = create_engine(db_url)
+    inspector = inspect(engine)
+    tables = set(inspector.get_table_names())
+
+    # Only alembic_version should remain
+    assert tables == {"alembic_version"}
+
+
+def test_runs_table_has_expected_columns(alembic_cfg):
+    """Spot-check that the runs table has key columns."""
+    cfg, db_url = alembic_cfg
+    command.upgrade(cfg, "head")
+
+    engine = create_engine(db_url)
+    inspector = inspect(engine)
+    columns = {c["name"] for c in inspector.get_columns("runs")}
+
+    assert "id" in columns
+    assert "experiment_id" in columns
+    assert "config_hash" in columns
+    assert "status" in columns
+    assert "cost_estimate" in columns
+
+
+def test_indexes_created(alembic_cfg):
+    """Verify key indexes exist after migration."""
+    cfg, db_url = alembic_cfg
+    command.upgrade(cfg, "head")
+
+    engine = create_engine(db_url)
+    inspector = inspect(engine)
+
+    run_indexes = {idx["name"] for idx in inspector.get_indexes("runs")}
+    assert "ix_runs_config_hash" in run_indexes
+    assert "ix_runs_experiment_id" in run_indexes
+
+    score_indexes = {idx["name"] for idx in inspector.get_indexes("scores")}
+    assert "ix_scores_run_id" in score_indexes
+    assert "ix_scores_scorer_name" in score_indexes
+
+
+def test_foreign_keys_on_experiments(alembic_cfg):
+    """Verify experiments table has FK to projects."""
+    cfg, db_url = alembic_cfg
+    command.upgrade(cfg, "head")
+
+    engine = create_engine(db_url)
+    inspector = inspect(engine)
+    fks = inspector.get_foreign_keys("experiments")
+
+    referred_tables = {fk["referred_table"] for fk in fks}
+    assert "projects" in referred_tables
--- a/backend/tests/test_auth.py
+++ b/backend/tests/test_auth.py
@@ -0,0 +1,238 @@
+"""Tests for backend/auth.py — JWT, API key, setup flow, and auth dependency."""
+
+import os
+from datetime import timedelta
+from unittest.mock import patch
+
+import pytest
+from fastapi import FastAPI, Depends
+from fastapi.testclient import TestClient
+
+
+@pytest.fixture(autouse=True)
+def _isolate_settings(tmp_path):
+    """Ensure tests use a temp SQLite DB and no Redis."""
+    env = {
+        "DATABASE_URL": f"sqlite:///{tmp_path / 'test.db'}",
+        "REDIS_URL": "",
+        "DATA_DIR": str(tmp_path),
+        "JWT_SECRET": "test-secret-key-for-jwt-signing",
+        "API_KEY": "test-api-key-12345",
+    }
+    with patch.dict(os.environ, env, clear=False):
+        import config
+        new_settings = config.Settings(_env_file=None)
+        config.settings = new_settings
+
+        import main
+        main.settings = new_settings
+        main._init_db()
+        main._init_redis()
+
+        from models import Base
+        Base.metadata.create_all(bind=main.engine)
+
+        # Also patch auth module's settings reference
+        import auth
+        auth.settings = new_settings
+
+        yield
+
+
+@pytest.fixture
+def db_session():
+    from main import get_db
+    gen = get_db()
+    session = next(gen)
+    yield session
+    try:
+        next(gen)
+    except StopIteration:
+        pass
+
+
+# ---------------------------------------------------------------------------
+# Password hashing
+# ---------------------------------------------------------------------------
+
+class TestPasswordHashing:
+    def test_hash_and_verify(self):
+        from auth import hash_password, verify_password
+        hashed = hash_password("my-secret-password")
+        assert hashed != "my-secret-password"
+        assert verify_password("my-secret-password", hashed)
+
+    def test_wrong_password_fails(self):
+        from auth import hash_password, verify_password
+        hashed = hash_password("correct-password")
+        assert not verify_password("wrong-password", hashed)
+
+
+# ---------------------------------------------------------------------------
+# JWT
+# ---------------------------------------------------------------------------
+
+class TestJWT:
+    def test_create_and_decode_token(self):
+        from auth import create_access_token, decode_access_token
+        token = create_access_token("user-123")
+        assert decode_access_token(token) == "user-123"
+
+    def test_expired_token_raises(self):
+        from auth import create_access_token, decode_access_token
+        token = create_access_token("user-123", expires_delta=timedelta(seconds=-1))
+        with pytest.raises(Exception) as exc_info:
+            decode_access_token(token)
+        assert exc_info.value.status_code == 401
+
+    def test_invalid_token_raises(self):
+        from auth import decode_access_token
+        with pytest.raises(Exception) as exc_info:
+            decode_access_token("not-a-valid-token")
+        assert exc_info.value.status_code == 401
+
+    def test_token_without_sub_raises(self):
+        from jose import jwt
+        import config
+        token = jwt.encode({"foo": "bar"}, config.settings.jwt_secret, algorithm="HS256")
+        from auth import decode_access_token
+        with pytest.raises(Exception) as exc_info:
+            decode_access_token(token)
+        assert exc_info.value.status_code == 401
+
+
+# ---------------------------------------------------------------------------
+# First-boot setup
+# ---------------------------------------------------------------------------
+
+class TestSetup:
+    def test_needs_setup_true_when_no_users(self, db_session):
+        from auth import needs_setup
+        assert needs_setup(db_session) is True
+
+    def test_create_admin_succeeds(self, db_session):
+        from auth import create_admin, needs_setup
+        user = create_admin(db_session, "admin", "password123")
+        assert user.username == "admin"
+        assert user.is_admin is True
+        assert needs_setup(db_session) is False
+
+    def test_create_admin_twice_raises_409(self, db_session):
+        from auth import create_admin
+        create_admin(db_session, "admin", "password123")
+        with pytest.raises(Exception) as exc_info:
+            create_admin(db_session, "admin2", "password456")
+        assert exc_info.value.status_code == 409
+
+    def test_admin_password_is_hashed(self, db_session):
+        from auth import create_admin
+        user = create_admin(db_session, "admin", "password123")
+        assert user.password_hash != "password123"
+        assert user.password_hash.startswith("$2b$")
+
+
+# ---------------------------------------------------------------------------
+# Authenticate user (login)
+# ---------------------------------------------------------------------------
+
+class TestAuthenticateUser:
+    def test_valid_credentials(self, db_session):
+        from auth import create_admin, authenticate_user
+        create_admin(db_session, "admin", "password123")
+        user = authenticate_user(db_session, "admin", "password123")
+        assert user.username == "admin"
+
+    def test_wrong_password_raises_401(self, db_session):
+        from auth import create_admin, authenticate_user
+        create_admin(db_session, "admin", "password123")
+        with pytest.raises(Exception) as exc_info:
+            authenticate_user(db_session, "admin", "wrong")
+        assert exc_info.value.status_code == 401
+
+    def test_unknown_user_raises_401(self, db_session):
+        from auth import authenticate_user
+        with pytest.raises(Exception) as exc_info:
+            authenticate_user(db_session, "nonexistent", "password")
+        assert exc_info.value.status_code == 401
+
+
+# ---------------------------------------------------------------------------
+# get_current_user dependency (integration via test app)
+# ---------------------------------------------------------------------------
+
+@pytest.fixture
+def auth_app():
+    """Create a minimal FastAPI app with a protected endpoint for testing auth."""
+    from auth import get_current_user
+    from schemas import UserResponse
+
+    test_app = FastAPI()
+
+    @test_app.get("/protected")
+    def protected(user=Depends(get_current_user)):
+        return {"user_id": str(user.id), "username": user.username}
+
+    return test_app
+
+
+@pytest.fixture
+def auth_client(auth_app):
+    return TestClient(auth_app)
+
+
+class TestGetCurrentUser:
+    def test_no_auth_returns_401(self, auth_client):
+        resp = auth_client.get("/protected")
+        assert resp.status_code == 401
+        assert "Missing authentication" in resp.json()["detail"]
+
+    def test_invalid_bearer_format_returns_401(self, auth_client):
+        resp = auth_client.get("/protected", headers={"Authorization": "NotBearer token"})
+        assert resp.status_code == 401
+
+    def test_jwt_auth_succeeds(self, auth_client, db_session):
+        from auth import create_admin, create_access_token
+        user = create_admin(db_session, "admin", "password123")
+        token = create_access_token(str(user.id))
+        resp = auth_client.get("/protected", headers={"Authorization": f"Bearer {token}"})
+        assert resp.status_code == 200
+        assert resp.json()["username"] == "admin"
+
+    def test_jwt_for_deleted_user_returns_401(self, auth_client, db_session):
+        from auth import create_access_token
+        import uuid
+        token = create_access_token(str(uuid.uuid4()))
+        resp = auth_client.get("/protected", headers={"Authorization": f"Bearer {token}"})
+        assert resp.status_code == 401
+
+    def test_api_key_auth_succeeds(self, auth_client, db_session):
+        from auth import create_admin
+        create_admin(db_session, "admin", "password123")
+        resp = auth_client.get("/protected", headers={"X-Api-Key": "test-api-key-12345"})
+        assert resp.status_code == 200
+        assert resp.json()["username"] == "admin"
+
+    def test_wrong_api_key_returns_401(self, auth_client):
+        resp = auth_client.get("/protected", headers={"X-Api-Key": "wrong-key"})
+        assert resp.status_code == 401
+
+    def test_api_key_without_admin_returns_401(self, auth_client):
+        # No admin user created yet
+        resp = auth_client.get("/protected", headers={"X-Api-Key": "test-api-key-12345"})
+        assert resp.status_code == 401
+
+    def test_api_key_disabled_when_not_configured(self, auth_client, db_session):
+        """When API_KEY is not set in config, API key auth should fail."""
+        from auth import create_admin
+        import config, auth
+        create_admin(db_session, "admin", "password123")
+
+        old_key = config.settings.api_key
+        config.settings.api_key = None
+        auth.settings = config.settings
+        try:
+            resp = auth_client.get("/protected", headers={"X-Api-Key": "test-api-key-12345"})
+            assert resp.status_code == 401
+        finally:
+            config.settings.api_key = old_key
+            auth.settings = config.settings
--- a/backend/tests/test_config.py
+++ b/backend/tests/test_config.py
@@ -0,0 +1,105 @@
+"""Tests for backend/config.py."""
+
+import os
+from unittest.mock import patch
+
+import pytest
+from pydantic_settings import BaseSettings
+
+from config import Settings
+
+
+class TestSettings:
+    """Test the Settings configuration class."""
+
+    def _make_settings(self, **env_vars: str) -> Settings:
+        """Create a Settings instance with specific env vars, ignoring .env file."""
+        with patch.dict(os.environ, env_vars, clear=False):
+            return Settings(_env_file=None)
+
+    def test_defaults(self) -> None:
+        s = self._make_settings()
+        assert s.database_url is None
+        assert s.redis_url is None
+        assert s.host == "0.0.0.0"
+        assert s.port == 8400
+        assert s.api_key is None
+        assert s.default_endpoint_url is None
+        assert s.default_endpoint_key is None
+        assert s.max_concurrent_runs == 4
+        assert s.max_tokens_per_sweep == 0
+        assert s.data_dir == "/data"
+        assert s.mcp_enabled is True
+        assert s.mcp_port == 8401
+
+    def test_jwt_secret_auto_generated(self) -> None:
+        s = self._make_settings()
+        assert len(s.jwt_secret) > 0
+
+    def test_jwt_secret_auto_generated_unique(self) -> None:
+        s1 = self._make_settings()
+        s2 = self._make_settings()
+        assert s1.jwt_secret != s2.jwt_secret
+
+    def test_jwt_secret_from_env(self) -> None:
+        s = self._make_settings(JWT_SECRET="my-secret-key")
+        assert s.jwt_secret == "my-secret-key"
+
+    def test_sqlite_fallback_when_no_database_url(self) -> None:
+        s = self._make_settings(DATA_DIR="/tmp/test")
+        url = s.effective_database_url
+        assert url.startswith("sqlite:///")
+        assert url.endswith("promptlooper.db")
+        assert "tmp" in url and "test" in url
+        assert s.is_sqlite is True
+
+    def test_postgres_when_database_url_set(self) -> None:
+        url = "postgresql://user:pass@localhost:5432/promptlooper"
+        s = self._make_settings(DATABASE_URL=url)
+        assert s.effective_database_url == url
+        assert s.is_sqlite is False
+
+    def test_in_process_queue_when_no_redis(self) -> None:
+        s = self._make_settings()
+        assert s.use_in_process_queue is True
+
+    def test_celery_queue_when_redis_set(self) -> None:
+        s = self._make_settings(REDIS_URL="redis://localhost:6379/0")
+        assert s.use_in_process_queue is False
+        assert s.redis_url == "redis://localhost:6379/0"
+
+    def test_empty_api_key_becomes_none(self) -> None:
+        s = self._make_settings(API_KEY="")
+        assert s.api_key is None
+
+    def test_whitespace_api_key_becomes_none(self) -> None:
+        s = self._make_settings(API_KEY="   ")
+        assert s.api_key is None
+
+    def test_valid_api_key_preserved(self) -> None:
+        s = self._make_settings(API_KEY="sk-test-123")
+        assert s.api_key == "sk-test-123"
+
+    def test_env_overrides(self) -> None:
+        s = self._make_settings(
+            HOST="127.0.0.1",
+            PORT="9000",
+            MAX_CONCURRENT_RUNS="8",
+            MAX_TOKENS_PER_SWEEP="100000",
+            MCP_ENABLED="false",
+            MCP_PORT="9001",
+        )
+        assert s.host == "127.0.0.1"
+        assert s.port == 9000
+        assert s.max_concurrent_runs == 8
+        assert s.max_tokens_per_sweep == 100000
+        assert s.mcp_enabled is False
+        assert s.mcp_port == 9001
+
+    def test_default_endpoint_config(self) -> None:
+        s = self._make_settings(
+            DEFAULT_ENDPOINT_URL="http://localhost:11434/v1",
+            DEFAULT_ENDPOINT_KEY="sk-key",
+        )
+        assert s.default_endpoint_url == "http://localhost:11434/v1"
+        assert s.default_endpoint_key == "sk-key"
--- a/backend/tests/test_main.py
+++ b/backend/tests/test_main.py
@@ -0,0 +1,129 @@
+"""Tests for backend/main.py — FastAPI application."""
+
+import os
+from unittest.mock import patch
+
+import pytest
+from fastapi.testclient import TestClient
+
+
+@pytest.fixture(autouse=True)
+def _isolate_settings(tmp_path):
+    """Ensure tests use a temp SQLite DB and no Redis."""
+    env = {
+        "DATABASE_URL": f"sqlite:///{tmp_path / 'test.db'}",
+        "REDIS_URL": "",
+        "DATA_DIR": str(tmp_path),
+    }
+    with patch.dict(os.environ, env, clear=False):
+        # Reload settings so it picks up test env
+        import config
+        new_settings = config.Settings(_env_file=None)
+        config.settings = new_settings
+
+        # Patch main's reference too
+        import main
+        main.settings = new_settings
+        main._init_db()
+        main._init_redis()
+
+        # Create tables
+        from models import Base
+        Base.metadata.create_all(bind=main.engine)
+
+        yield
+
+
+@pytest.fixture
+def client():
+    from main import app
+    return TestClient(app)
+
+
+class TestHealthEndpoint:
+    def test_health_returns_ok(self, client):
+        resp = client.get("/health")
+        assert resp.status_code == 200
+        data = resp.json()
+        assert data["status"] == "ok"
+        assert data["database"] is True
+        assert data["redis"] is True  # in-process mode counts as ok
+
+    def test_health_response_schema(self, client):
+        resp = client.get("/health")
+        data = resp.json()
+        assert set(data.keys()) == {"status", "database", "redis"}
+
+
+class TestCORSMiddleware:
+    def test_cors_headers_present(self, client):
+        resp = client.options(
+            "/health",
+            headers={
+                "Origin": "http://localhost:3000",
+                "Access-Control-Request-Method": "GET",
+            },
+        )
+        assert "access-control-allow-origin" in resp.headers
+
+
+class TestWebSocket:
+    def test_websocket_connect_and_echo(self, client):
+        with client.websocket_connect("/ws") as ws:
+            ws.send_json({"type": "ping"})
+            data = ws.receive_json()
+            assert data["type"] == "ack"
+            assert data["data"]["type"] == "ping"
+
+    def test_websocket_disconnect_cleanup(self, client):
+        from main import ws_manager
+        initial_count = len(ws_manager.active_connections)
+        with client.websocket_connect("/ws") as ws:
+            assert len(ws_manager.active_connections) == initial_count + 1
+        # After disconnect, connection should be removed
+        assert len(ws_manager.active_connections) == initial_count
+
+
+class TestRouterMounting:
+    def test_openapi_schema_loads(self, client):
+        resp = client.get("/openapi.json")
+        assert resp.status_code == 200
+        schema = resp.json()
+        assert schema["info"]["title"] == "PromptLooper"
+
+    def test_unknown_route_returns_404(self, client):
+        resp = client.get("/api/nonexistent")
+        assert resp.status_code == 404
+
+
+class TestConnectionManager:
+    def test_broadcast_with_no_connections_is_noop(self):
+        """ConnectionManager.broadcast does not raise when nothing is connected."""
+        import asyncio
+
+        from main import ConnectionManager
+
+        manager = ConnectionManager()
+        # asyncio.run() creates and closes a fresh event loop; calling
+        # asyncio.get_event_loop() with no current loop is deprecated.
+        asyncio.run(manager.broadcast({"test": True}))
+        assert len(manager.active_connections) == 0
+
+
+class TestGetDb:
+    def test_get_db_yields_session(self):
+        from main import get_db
+        gen = get_db()
+        session = next(gen)
+        assert session is not None
+        # Clean up
+        try:
+            next(gen)
+        except StopIteration:
+            pass
+
+
+class TestGetRedis:
+    def test_get_redis_returns_none_in_process_mode(self):
+        from main import get_redis
+        # In test setup, Redis is not configured
+        assert get_redis() is None
--- a/backend/tests/test_models.py
+++ b/backend/tests/test_models.py
@ -0,0 +1,359 @@
+"""Tests for SQLAlchemy ORM models."""
+
+import uuid
+from datetime import datetime
+
+from sqlalchemy import create_engine, inspect
+from sqlalchemy.orm import Session
+
+from models import (
+    Base,
+    Experiment,
+    ExperimentStatus,
+    Project,
+    ResponseCache,
+    Run,
+    RunStatus,
+    Score,
+    StageResult,
+    User,
+    WebhookConfig,
+)
+
+
+def _engine():
+    engine = create_engine("sqlite:///:memory:")
+    Base.metadata.create_all(engine)
+    return engine
+
+
+def _session(engine):
+    return Session(engine)
+
+
+# ---------------------------------------------------------------------------
+# Table existence
+# ---------------------------------------------------------------------------
+
+
+def test_all_tables_created():
+    engine = _engine()
+    table_names = inspect(engine).get_table_names()
+    expected = {
+        "users",
+        "projects",
+        "experiments",
+        "runs",
+        "stage_results",
+        "scores",
+        "response_cache",
+        "webhook_configs",
+    }
+    assert expected.issubset(set(table_names))
+
+
+# ---------------------------------------------------------------------------
+# User
+# ---------------------------------------------------------------------------
+
+
+def test_user_creation():
+    engine = _engine()
+    with _session(engine) as session:
+        user = User(username="admin", password_hash="hashed", is_admin=True)
+        session.add(user)
+        session.commit()
+
+        assert isinstance(user.id, uuid.UUID)
+        assert user.username == "admin"
+        assert user.is_admin is True
+        assert isinstance(user.created_at, datetime)
+
+
+def test_user_username_unique():
+    engine = _engine()
+    with _session(engine) as session:
+        session.add(User(username="dup", password_hash="h1"))
+        session.commit()
+        session.add(User(username="dup", password_hash="h2"))
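+        try:
+            session.commit()
+        except Exception:
+            # Expected: the UNIQUE constraint on users.username is violated.
+            session.rollback()
+        else:
+            raise AssertionError("Duplicate username should have raised IntegrityError")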
+
+
+# ---------------------------------------------------------------------------
+# Project
+# ---------------------------------------------------------------------------
+
+
+def test_project_with_owner():
+    engine = _engine()
+    with _session(engine) as session:
+        user = User(username="owner", password_hash="h")
+        project = Project(name="Test Project", description="A test", owner=user)
+        session.add(project)
+        session.commit()
+
+        assert project.owner_id == user.id
+        assert project.name == "Test Project"
+        assert isinstance(project.updated_at, datetime)
+
+
+def test_project_cascade_delete_from_user():
+    engine = _engine()
+    with _session(engine) as session:
+        user = User(username="owner", password_hash="h")
+        project = Project(name="P1", owner=user)
+        session.add(project)
+        session.commit()
+        project_id = project.id
+
+        session.delete(user)
+        session.commit()
+
+        assert session.get(Project, project_id) is None
+
+
+# ---------------------------------------------------------------------------
+# Experiment
+# ---------------------------------------------------------------------------
+
+
+def test_experiment_defaults():
+    engine = _engine()
+    with _session(engine) as session:
+        user = User(username="u", password_hash="h")
+        project = Project(name="P", owner=user)
+        exp = Experiment(
+            project=project,
+            name="Exp1",
+            sample_data={"inputs": ["hello"]},
+            pipeline_stages=[{"prompt": "test"}],
+            scoring_config={"scorers": ["keyword"]},
+            parameter_space={"temperature": [0.1, 0.5]},
+        )
+        session.add(exp)
+        session.commit()
+
+        assert exp.status == ExperimentStatus.draft
+        assert exp.sample_data == {"inputs": ["hello"]}
+        assert isinstance(exp.created_at, datetime)
+
+
+def test_experiment_cascade_delete_from_project():
+    engine = _engine()
+    with _session(engine) as session:
+        user = User(username="u", password_hash="h")
+        project = Project(name="P", owner=user)
+        exp = Experiment(project=project, name="E")
+        session.add(exp)
+        session.commit()
+        exp_id = exp.id
+
+        session.delete(project)
+        session.commit()
+
+        assert session.get(Experiment, exp_id) is None
+
+
+# ---------------------------------------------------------------------------
+# Run
+# ---------------------------------------------------------------------------
+
+
+def test_run_creation():
+    engine = _engine()
+    with _session(engine) as session:
+        user = User(username="u", password_hash="h")
+        project = Project(name="P", owner=user)
+        exp = Experiment(project=project, name="E")
+        run = Run(
+            experiment=exp,
+            config_hash="a" * 64,
+            config={"model": "gpt-4", "temperature": 0.5},
+            status=RunStatus.completed,
+            duration_ms=1200,
+            tokens_in=100,
+            tokens_out=50,
+        )
+        session.add(run)
+        session.commit()
+
+        assert run.status == RunStatus.completed
+        assert run.config["model"] == "gpt-4"
+
+
+def test_run_default_status():
+    engine = _engine()
+    with _session(engine) as session:
+        user = User(username="u", password_hash="h")
+        project = Project(name="P", owner=user)
+        exp = Experiment(project=project, name="E")
+        run = Run(experiment=exp, config_hash="b" * 64, config={})
+        session.add(run)
+        session.commit()
+
+        assert run.status == RunStatus.pending
+
+
+# ---------------------------------------------------------------------------
+# StageResult
+# ---------------------------------------------------------------------------
+
+
+def test_stage_result():
+    engine = _engine()
+    with _session(engine) as session:
+        user = User(username="u", password_hash="h")
+        project = Project(name="P", owner=user)
+        exp = Experiment(project=project, name="E")
+        run = Run(experiment=exp, config_hash="c" * 64, config={})
+        sr = StageResult(
+            run=run,
+            stage_index=0,
+            prompt_sent="Hello",
+            response_raw="World",
+            model_used="gpt-4",
+            parameters={"temperature": 0.5},
+            tokens_in=10,
+            tokens_out=5,
+            latency_ms=200,
+        )
+        session.add(sr)
+        session.commit()
+
+        assert sr.stage_index == 0
+        assert sr.model_used == "gpt-4"
+        assert len(run.stage_results) == 1
+
+
+# ---------------------------------------------------------------------------
+# Score
+# ---------------------------------------------------------------------------
+
+
+def test_score():
+    engine = _engine()
+    with _session(engine) as session:
+        user = User(username="u", password_hash="h")
+        project = Project(name="P", owner=user)
+        exp = Experiment(project=project, name="E")
+        run = Run(experiment=exp, config_hash="d" * 64, config={})
+        score = Score(
+            run=run,
+            scorer_name="embedding_similarity",
+            value=0.87,
+            scorer_metadata={"reference_id": "ref1"},
+        )
+        session.add(score)
+        session.commit()
+
+        assert score.value == 0.87
+        assert score.scorer_name == "embedding_similarity"
+        assert len(run.scores) == 1
+
+
+# ---------------------------------------------------------------------------
+# ResponseCache
+# ---------------------------------------------------------------------------
+
+
+def test_response_cache():
+    engine = _engine()
+    with _session(engine) as session:
+        cache = ResponseCache(
+            config_hash="e" * 64,
+            response="cached response",
+            model="gpt-4",
+            tokens_in=50,
+            tokens_out=25,
+            latency_ms=300,
+        )
+        session.add(cache)
+        session.commit()
+
+        fetched = session.get(ResponseCache, "e" * 64)
+        assert fetched is not None
+        assert fetched.response == "cached response"
+
+
+def test_response_cache_pk_is_config_hash():
+    engine = _engine()
+    with _session(engine) as session:
+        session.add(
+            ResponseCache(config_hash="f" * 64, response="r1", model="m1")
+        )
+        session.commit()
+        session.add(
+            ResponseCache(config_hash="f" * 64, response="r2", model="m2")
+        )
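+        try:
+            session.commit()
+        except Exception:
+            # Expected: config_hash is the primary key, so a duplicate is rejected.
+            session.rollback()
+        else:
+            raise AssertionError("Duplicate config_hash should have raised IntegrityError")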
+
+
+# ---------------------------------------------------------------------------
+# WebhookConfig
+# ---------------------------------------------------------------------------
+
+
+def test_webhook_config():
+    engine = _engine()
+    with _session(engine) as session:
+        wh = WebhookConfig(
+            event_type="experiment.completed",
+            url="https://example.com/hook",
+            headers={"Authorization": "Bearer token"},
+            is_active=True,
+        )
+        session.add(wh)
+        session.commit()
+
+        assert isinstance(wh.id, uuid.UUID)
+        assert wh.event_type == "experiment.completed"
+        assert wh.is_active is True
+
+
+def test_webhook_config_default_active():
+    engine = _engine()
+    with _session(engine) as session:
+        wh = WebhookConfig(
+            event_type="run.failed",
+            url="https://example.com/hook",
+        )
+        session.add(wh)
+        session.commit()
+
+        assert wh.is_active is True
+
+
+# ---------------------------------------------------------------------------
+# Relationship cascades: Run → StageResult + Score
+# ---------------------------------------------------------------------------
+
+
+def test_run_cascade_deletes_children():
+    engine = _engine()
+    with _session(engine) as session:
+        user = User(username="u", password_hash="h")
+        project = Project(name="P", owner=user)
+        exp = Experiment(project=project, name="E")
+        run = Run(experiment=exp, config_hash="g" * 64, config={})
+        sr = StageResult(
+            run=run, stage_index=0, prompt_sent="p",
+            response_raw="r", model_used="m",
+        )
+        score = Score(run=run, scorer_name="test", value=0.5)
+        session.add_all([run, sr, score])
+        session.commit()
+
+        sr_id, score_id = sr.id, score.id
+        session.delete(run)
+        session.commit()
+
+        assert session.get(StageResult, sr_id) is None
+        assert session.get(Score, score_id) is None
--- a/backend/tests/test_routers.py
+++ b/backend/tests/test_routers.py
@ -0,0 +1,224 @@
+"""Tests for router stubs — verify all routes are mounted and return 501."""
+
+import pytest
+from fastapi.testclient import TestClient
+
+
+@pytest.fixture()
+def client(tmp_path, monkeypatch):
+    """Create a test client with a temporary database."""
+    monkeypatch.setenv("DATA_DIR", str(tmp_path))
+    monkeypatch.setenv("DATABASE_URL", "")
+    monkeypatch.setenv("REDIS_URL", "")
+
+    # Reload config to pick up test env
+    import importlib
+    import config as config_mod
+    importlib.reload(config_mod)
+
+    import main as main_mod
+    importlib.reload(main_mod)
+
+    with TestClient(main_mod.app) as c:
+        yield c
+
+
+# ---- Auth router (/api/auth) ----
+
+def test_auth_setup(client):
+    resp = client.post("/api/auth/setup")
+    assert resp.status_code == 501
+
+
+def test_auth_login(client):
+    resp = client.post("/api/auth/login")
+    assert resp.status_code == 501
+
+
+def test_auth_me(client):
+    resp = client.get("/api/auth/me")
+    assert resp.status_code == 501
+
+
+# ---- Projects router (/api/projects) ----
+
+def test_projects_list(client):
+    resp = client.get("/api/projects/")
+    assert resp.status_code == 501
+
+
+def test_projects_create(client):
+    resp = client.post("/api/projects/")
+    assert resp.status_code == 501
+
+
+def test_projects_get(client):
+    resp = client.get("/api/projects/00000000-0000-0000-0000-000000000001")
+    assert resp.status_code == 501
+
+
+def test_projects_update(client):
+    resp = client.put("/api/projects/00000000-0000-0000-0000-000000000001")
+    assert resp.status_code == 501
+
+
+def test_projects_delete(client):
+    resp = client.delete("/api/projects/00000000-0000-0000-0000-000000000001")
+    assert resp.status_code == 501
+
+
+# ---- Experiments router (/api/experiments) ----
+
+def test_experiments_list(client):
+    resp = client.get("/api/experiments/")
+    assert resp.status_code == 501
+
+
+def test_experiments_create(client):
+    resp = client.post("/api/experiments/")
+    assert resp.status_code == 501
+
+
+def test_experiments_get(client):
+    resp = client.get("/api/experiments/00000000-0000-0000-0000-000000000001")
+    assert resp.status_code == 501
+
+
+def test_experiments_update(client):
+    resp = client.put("/api/experiments/00000000-0000-0000-0000-000000000001")
+    assert resp.status_code == 501
+
+
+def test_experiments_delete(client):
+    resp = client.delete("/api/experiments/00000000-0000-0000-0000-000000000001")
+    assert resp.status_code == 501
+
+
+def test_experiments_sweep(client):
+    resp = client.post("/api/experiments/00000000-0000-0000-0000-000000000001/sweep")
+    assert resp.status_code == 501
+
+
+def test_experiments_pause(client):
+    resp = client.post("/api/experiments/00000000-0000-0000-0000-000000000001/pause")
+    assert resp.status_code == 501
+
+
+def test_experiments_resume(client):
+    resp = client.post("/api/experiments/00000000-0000-0000-0000-000000000001/resume")
+    assert resp.status_code == 501
+
+
+def test_experiments_stop(client):
+    resp = client.post("/api/experiments/00000000-0000-0000-0000-000000000001/stop")
+    assert resp.status_code == 501
+
+
+# ---- Runs router (/api/runs) ----
+
+def test_runs_list(client):
+    resp = client.get("/api/runs/experiments/00000000-0000-0000-0000-000000000001/runs")
+    assert resp.status_code == 501
+
+
+def test_runs_get(client):
+    resp = client.get("/api/runs/00000000-0000-0000-0000-000000000001")
+    assert resp.status_code == 501
+
+
+def test_runs_create(client):
+    resp = client.post("/api/runs/")
+    assert resp.status_code == 501
+
+
+def test_runs_score(client):
+    resp = client.post("/api/runs/00000000-0000-0000-0000-000000000001/score")
+    assert resp.status_code == 501
+
+
+def test_runs_leaderboard(client):
+    resp = client.get("/api/runs/experiments/00000000-0000-0000-0000-000000000001/leaderboard")
+    assert resp.status_code == 501
+
+
+# ---- Endpoints router (/api/endpoints) ----
+
+def test_endpoints_list(client):
+    resp = client.get("/api/endpoints/")
+    assert resp.status_code == 501
+
+
+def test_endpoints_create(client):
+    resp = client.post("/api/endpoints/")
+    assert resp.status_code == 501
+
+
+def test_endpoints_update(client):
+    resp = client.put("/api/endpoints/00000000-0000-0000-0000-000000000001")
+    assert resp.status_code == 501
+
+
+def test_endpoints_delete(client):
+    resp = client.delete("/api/endpoints/00000000-0000-0000-0000-000000000001")
+    assert resp.status_code == 501
+
+
+def test_endpoints_test(client):
+    resp = client.post("/api/endpoints/00000000-0000-0000-0000-000000000001/test")
+    assert resp.status_code == 501
+
+
+# ---- Export router (/api/export) ----
+
+def test_export_best(client):
+    resp = client.get("/api/export/experiments/00000000-0000-0000-0000-000000000001/best")
+    assert resp.status_code == 501
+
+
+def test_export_env(client):
+    resp = client.get("/api/export/experiments/00000000-0000-0000-0000-000000000001/env")
+    assert resp.status_code == 501
+
+
+def test_export_yaml(client):
+    resp = client.get("/api/export/experiments/00000000-0000-0000-0000-000000000001/yaml")
+    assert resp.status_code == 501
+
+
+def test_export_report(client):
+    resp = client.get("/api/export/experiments/00000000-0000-0000-0000-000000000001/report")
+    assert resp.status_code == 501
+
+
+# ---- Webhooks router (/api/webhooks) ----
+
+def test_webhooks_list(client):
+    resp = client.get("/api/webhooks/")
+    assert resp.status_code == 501
+
+
+def test_webhooks_create(client):
+    resp = client.post("/api/webhooks/")
+    assert resp.status_code == 501
+
+
+def test_webhooks_delete(client):
+    resp = client.delete("/api/webhooks/00000000-0000-0000-0000-000000000001")
+    assert resp.status_code == 501
+
+
+# ---- Admin router (/api/admin) ----
+
+def test_admin_get_settings(client):
+    resp = client.get("/api/admin/settings")
+    assert resp.status_code == 501
+
+
+def test_admin_update_settings(client):
+    resp = client.put("/api/admin/settings")
+    assert resp.status_code == 501
+
+
+def test_admin_stats(client):
+    resp = client.get("/api/admin/stats")
+    assert resp.status_code == 501
--- a/backend/tests/test_schemas.py
+++ b/backend/tests/test_schemas.py
@ -0,0 +1,339 @@
+"""Tests for backend/schemas.py."""
+
+import uuid
+from datetime import datetime, timezone
+
+import pytest
+from pydantic import ValidationError
+
+from models import ExperimentStatus, RunStatus
+from schemas import (
+    EndpointCreate,
+    EndpointResponse,
+    EndpointUpdate,
+    ExperimentCreate,
+    ExperimentResponse,
+    ExperimentUpdate,
+    ExportResponse,
+    ExportRunRow,
+    HealthResponse,
+    LoginRequest,
+    ProjectCreate,
+    ProjectResponse,
+    ProjectUpdate,
+    RunDetailResponse,
+    RunResponse,
+    ScoreInput,
+    ScoreResponse,
+    SetupRequest,
+    StageResultResponse,
+    TokenResponse,
+    UserResponse,
+    WebhookCreate,
+    WebhookResponse,
+    WebhookUpdate,
+)
+
+
+NOW = datetime.now(timezone.utc)
+UUID1 = uuid.uuid4()
+UUID2 = uuid.uuid4()
+
+
+# ---------------------------------------------------------------------------
+# Project schemas
+# ---------------------------------------------------------------------------
+
+
+class TestProjectSchemas:
+    def test_create_valid(self) -> None:
+        p = ProjectCreate(name="My Project", description="desc")
+        assert p.name == "My Project"
+        assert p.description == "desc"
+
+    def test_create_name_required(self) -> None:
+        with pytest.raises(ValidationError):
+            ProjectCreate()  # type: ignore[call-arg]
+
+    def test_create_empty_name_rejected(self) -> None:
+        with pytest.raises(ValidationError):
+            ProjectCreate(name="")
+
+    def test_update_partial(self) -> None:
+        p = ProjectUpdate(name="New Name")
+        assert p.name == "New Name"
+        assert p.description is None
+
+    def test_response_from_attributes(self) -> None:
+        class Fake:
+            id = UUID1
+            name = "Proj"
+            description = None
+            owner_id = UUID2
+            created_at = NOW
+            updated_at = NOW
+
+        r = ProjectResponse.model_validate(Fake())
+        assert r.id == UUID1
+        assert r.name == "Proj"
+
+
+# ---------------------------------------------------------------------------
+# Experiment schemas
+# ---------------------------------------------------------------------------
+
+
+class TestExperimentSchemas:
+    def test_create_minimal(self) -> None:
+        e = ExperimentCreate(name="Exp 1")
+        assert e.name == "Exp 1"
+        assert e.sample_data is None
+
+    def test_create_with_all_fields(self) -> None:
+        e = ExperimentCreate(
+            name="Full",
+            description="desc",
+            sample_data={"key": "value"},
+            pipeline_stages={"stages": []},
+            scoring_config={"scorer": "exact"},
+            parameter_space={"temp": [0.5, 1.0]},
+        )
+        assert e.parameter_space == {"temp": [0.5, 1.0]}
+
+    def test_update_status(self) -> None:
+        e = ExperimentUpdate(status=ExperimentStatus.running)
+        assert e.status == ExperimentStatus.running
+
+    def test_response_from_attributes(self) -> None:
+        class Fake:
+            id = UUID1
+            project_id = UUID2
+            name = "Exp"
+            description = None
+            sample_data = None
+            pipeline_stages = None
+            scoring_config = None
+            parameter_space = None
+            status = ExperimentStatus.draft
+            created_at = NOW
+            updated_at = NOW
+
+        r = ExperimentResponse.model_validate(Fake())
+        assert r.status == ExperimentStatus.draft
+
+
+# ---------------------------------------------------------------------------
+# Run schemas
+# ---------------------------------------------------------------------------
+
+
+class TestRunSchemas:
+    def test_response_from_attributes(self) -> None:
+        class Fake:
+            id = UUID1
+            experiment_id = UUID2
+            config_hash = "abc123"
+            config = {"model": "gpt-4"}
+            status = RunStatus.completed
+            started_at = NOW
+            completed_at = NOW
+            duration_ms = 1234
+            tokens_in = 100
+            tokens_out = 200
+            cost_estimate = 0.003
+
+        r = RunResponse.model_validate(Fake())
+        assert r.duration_ms == 1234
+        assert r.cost_estimate == 0.003
+
+    def test_detail_response_nested(self) -> None:
+        data = {
+            "id": UUID1,
+            "experiment_id": UUID2,
+            "config_hash": "abc",
+            "config": {},
+            "status": RunStatus.pending,
+            "started_at": None,
+            "completed_at": None,
+            "duration_ms": None,
+            "tokens_in": None,
+            "tokens_out": None,
+            "cost_estimate": None,
+            "stage_results": [],
+            "scores": [],
+        }
+        r = RunDetailResponse(**data)
+        assert r.stage_results == []
+        assert r.scores == []
+
+
+# ---------------------------------------------------------------------------
+# Score schemas
+# ---------------------------------------------------------------------------
+
+
+class TestScoreSchemas:
+    def test_input_valid(self) -> None:
+        s = ScoreInput(scorer_name="exact_match", value=0.95, metadata={"note": "ok"})
+        assert s.value == 0.95
+        assert s.metadata == {"note": "ok"}
+
+    def test_input_missing_name(self) -> None:
+        with pytest.raises(ValidationError):
+            ScoreInput(value=0.5)  # type: ignore[call-arg]
+
+    def test_response_from_attributes(self) -> None:
+        class Fake:
+            id = UUID1
+            run_id = UUID2
+            scorer_name = "bleu"
+            value = 0.8
+            scorer_metadata = {"n": 4}
+            created_at = NOW
+
+        r = ScoreResponse.model_validate(Fake())
+        assert r.scorer_metadata == {"n": 4}
+
+
+# ---------------------------------------------------------------------------
+# Endpoint schemas
+# ---------------------------------------------------------------------------
+
+
+class TestEndpointSchemas:
+    def test_create_valid(self) -> None:
+        e = EndpointCreate(name="OpenAI", url="https://api.openai.com/v1")
+        assert e.api_key is None
+
+    def test_create_empty_name_rejected(self) -> None:
+        with pytest.raises(ValidationError):
+            EndpointCreate(name="", url="https://example.com")
+
+    def test_update_partial(self) -> None:
+        e = EndpointUpdate(url="https://new-url.com")
+        assert e.name is None
+
+
+# ---------------------------------------------------------------------------
+# Webhook schemas
+# ---------------------------------------------------------------------------
+
+
+class TestWebhookSchemas:
+    def test_create_valid(self) -> None:
+        w = WebhookCreate(
+            event_type="run.completed",
+            url="https://hooks.example.com/promptlooper",
+            headers={"Authorization": "Bearer xyz"},
+        )
+        assert w.is_active is True
+
+    def test_create_inactive(self) -> None:
+        w = WebhookCreate(
+            event_type="run.failed",
+            url="https://example.com",
+            is_active=False,
+        )
+        assert w.is_active is False
+
+    def test_update_partial(self) -> None:
+        w = WebhookUpdate(is_active=False)
+        assert w.event_type is None
+        assert w.is_active is False
+
+    def test_response_from_attributes(self) -> None:
+        class Fake:
+            id = UUID1
+            event_type = "run.completed"
+            url = "https://example.com"
+            headers = None
+            is_active = True
+
+        r = WebhookResponse.model_validate(Fake())
+        assert r.event_type == "run.completed"
+
+
+# ---------------------------------------------------------------------------
+# Auth schemas
+# ---------------------------------------------------------------------------
+
+
+class TestAuthSchemas:
+    def test_setup_password_min_length(self) -> None:
+        with pytest.raises(ValidationError):
+            SetupRequest(username="admin", password="short")
+
+    def test_setup_valid(self) -> None:
+        s = SetupRequest(username="admin", password="securepass123")
+        assert s.username == "admin"
+
+    def test_login_valid(self) -> None:
+        req = LoginRequest(username="user", password="pass")
+        assert req.username == "user"
+
+    def test_token_response(self) -> None:
+        t = TokenResponse(access_token="jwt.token.here")
+        assert t.token_type == "bearer"
+
+    def test_user_response_from_attributes(self) -> None:
+        class Fake:
+            id = UUID1
+            username = "admin"
+            is_admin = True
+            created_at = NOW
+
+        r = UserResponse.model_validate(Fake())
+        assert r.is_admin is True
+
+
+# ---------------------------------------------------------------------------
+# Export schemas
+# ---------------------------------------------------------------------------
+
+
+class TestExportSchemas:
+    def test_export_run_row(self) -> None:
+        row = ExportRunRow(
+            run_id=UUID1,
+            experiment_id=UUID2,
+            config_hash="abc",
+            config={"model": "gpt-4"},
+            status=RunStatus.completed,
+            duration_ms=500,
+            tokens_in=10,
+            tokens_out=20,
+            cost_estimate=0.001,
+            scores={"exact_match": 1.0, "bleu": 0.85},
+        )
+        assert row.scores["bleu"] == 0.85
+
+    def test_export_run_row_default_scores(self) -> None:
+        row = ExportRunRow(
+            run_id=UUID1,
+            experiment_id=UUID2,
+            config_hash="abc",
+            config={},
+            status=RunStatus.pending,
+        )
+        assert row.scores == {}
+
+    def test_export_response(self) -> None:
+        r = ExportResponse(
+            experiment_id=UUID1,
+            experiment_name="Test Exp",
+            rows=[],
+        )
+        assert r.rows == []
+
+
+# ---------------------------------------------------------------------------
+# Health schema
+# ---------------------------------------------------------------------------
+
+
+class TestHealthSchema:
+    def test_health_response(self) -> None:
+        h = HealthResponse(database=True, redis=False)
+        assert h.status == "ok"
+        assert h.database is True
+        assert h.redis is False
--- a/backend/tests/test_stack_integration.py
+++ b/backend/tests/test_stack_integration.py
@ -0,0 +1,138 @@
+"""Stack integration verification tests.
+
+These tests verify that all configuration files needed for 'docker compose up'
+are present, consistent, and well-formed. They do NOT start actual containers.
+"""
+
+import os
+from pathlib import Path
+
+import pytest
+
+ROOT = Path(__file__).resolve().parents[2]  # repo root
+
+
+class TestDockerComposeConfig:
+    """Verify docker-compose.yml references are satisfied."""
+
+    def test_docker_compose_exists(self):
+        assert (ROOT / "docker-compose.yml").is_file()
+
+    def test_dockerfile_exists(self):
+        assert (ROOT / "docker" / "Dockerfile").is_file()
+
+    def test_nginx_conf_exists(self):
+        assert (ROOT / "docker" / "nginx.conf").is_file()
+
+    def test_entrypoint_exists(self):
+        assert (ROOT / "docker" / "entrypoint.sh").is_file()
+
+    def test_requirements_txt_exists(self):
+        assert (ROOT / "backend" / "requirements.txt").is_file()
+
+    def test_alembic_ini_exists(self):
+        assert (ROOT / "alembic.ini").is_file()
+
+    def test_alembic_env_exists(self):
+        assert (ROOT / "alembic" / "env.py").is_file()
+
+    def test_alembic_has_migration(self):
+        versions = list((ROOT / "alembic" / "versions").glob("*.py"))
+        assert len(versions) >= 1, "Expected at least one Alembic migration"
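+
+    def test_docker_compose_defines_all_services(self):
+        # Sketch: a plain substring check (no YAML parser dependency) that each
+        # service in the stack is declared in docker-compose.yml.
+        content = (ROOT / "docker-compose.yml").read_text()
+        for service in (
+            "promptlooper-db",
+            "promptlooper-redis",
+            "promptlooper-api",
+            "promptlooper-worker",
+            "promptlooper-web",
+        ):
+            assert f"{service}:" in content, f"Missing service {service}"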
+
+
+class TestDockerfileConsistency:
+    """Verify Dockerfile references match actual files."""
+
+    def test_dockerfile_copies_backend(self):
+        content = (ROOT / "docker" / "Dockerfile").read_text()
+        assert "COPY backend/" in content
+
+    def test_dockerfile_copies_alembic(self):
+        content = (ROOT / "docker" / "Dockerfile").read_text()
+        assert "COPY alembic/" in content
+        assert "COPY alembic.ini" in content
+
+    def test_dockerfile_copies_entrypoint(self):
+        content = (ROOT / "docker" / "Dockerfile").read_text()
+        assert "entrypoint.sh" in content
+
+    def test_dockerfile_runs_migrations_via_entrypoint(self):
+        content = (ROOT / "docker" / "entrypoint.sh").read_text()
+        assert "alembic upgrade head" in content
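+
+    def test_entrypoint_execs_uvicorn(self):
+        # Sketch: `set -e` aborts startup if a migration fails, and `exec`
+        # replaces the shell so uvicorn runs as PID 1 and receives SIGTERM
+        # from `docker stop` directly.
+        content = (ROOT / "docker" / "entrypoint.sh").read_text()
+        assert "set -e" in content
+        assert "exec uvicorn main:app" in content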
+
+
+class TestNginxConfig:
+    """Verify nginx proxies correctly."""
+
+    def test_nginx_proxies_api(self):
+        content = (ROOT / "docker" / "nginx.conf").read_text()
+        assert "proxy_pass http://promptlooper-api:8000" in content
+
+    def test_nginx_proxies_websocket(self):
+        content = (ROOT / "docker" / "nginx.conf").read_text()
+        assert "upgrade" in content.lower()
+
+    def test_nginx_serves_spa_fallback(self):
+        content = (ROOT / "docker" / "nginx.conf").read_text()
+        assert "try_files" in content
+        assert "/index.html" in content
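+
+    def test_nginx_websocket_upgrade_headers(self):
+        # Sketch of a stricter check than the lowercase "upgrade" substring test:
+        # WebSocket proxying needs HTTP/1.1 plus the forwarded Upgrade and
+        # Connection headers.
+        content = (ROOT / "docker" / "nginx.conf").read_text()
+        assert "proxy_http_version 1.1;" in content
+        assert "proxy_set_header Upgrade $http_upgrade;" in content
+        assert 'proxy_set_header Connection "upgrade";' in content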
+
+
+class TestFrontendBuildability:
+    """Verify frontend has all files needed for a build."""
+
+    def test_package_json_exists(self):
+        assert (ROOT / "frontend" / "package.json").is_file()
+
+    def test_index_html_exists(self):
+        assert (ROOT / "frontend" / "index.html").is_file()
+
+    def test_main_tsx_exists(self):
+        assert (ROOT / "frontend" / "src" / "main.tsx").is_file()
+
+    def test_app_tsx_exists(self):
+        assert (ROOT / "frontend" / "src" / "App.tsx").is_file()
+
+    def test_all_page_components_exist(self):
+        pages = [
+            "SetupPage", "LoginPage", "DashboardPage", "ProjectsPage",
+            "ExperimentPage", "LivePage", "ComparePage", "AdminPage",
+        ]
+        for page in pages:
+            assert (ROOT / "frontend" / "src" / "pages" / f"{page}.tsx").is_file(), f"Missing {page}.tsx"
+
+    def test_vite_config_exists(self):
+        assert (ROOT / "frontend" / "vite.config.ts").is_file()
+
+    def test_tailwind_config_exists(self):
+        assert (ROOT / "frontend" / "tailwind.config.js").is_file()
+
+
+class TestWorkerConfig:
+    """Verify the Celery worker module exists (importability is covered in test_worker.py)."""
+
+    def test_worker_module_exists(self):
+        assert (ROOT / "backend" / "worker.py").is_file()
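+
+    def test_compose_worker_targets_celery_app(self):
+        # Sketch: the compose worker's Celery command should point at the
+        # `celery_app` object that backend/worker.py defines.
+        compose = (ROOT / "docker-compose.yml").read_text()
+        worker_src = (ROOT / "backend" / "worker.py").read_text()
+        assert "-A worker:celery_app" in compose
+        assert "celery_app = Celery(" in worker_src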
+
+
+class TestHealthEndpoint:
+    """Verify /health endpoint works in test mode."""
+
+    def test_health_returns_ok(self):
+        from fastapi.testclient import TestClient
+
+        # Ensure backend is importable
+        import sys
+        backend_dir = str(ROOT / "backend")
+        if backend_dir not in sys.path:
+            sys.path.insert(0, backend_dir)
+
+        from main import app
+        client = TestClient(app)
+        resp = client.get("/health")
+        assert resp.status_code == 200
+        data = resp.json()
+        assert data["status"] in ("ok", "degraded")
+        assert "database" in data
+        assert "redis" in data
--- a/backend/tests/test_worker.py
+++ b/backend/tests/test_worker.py
@ -0,0 +1,47 @@
+"""Tests for backend/worker.py — Celery configuration."""
+
+import importlib
+import sys
+from pathlib import Path
+from unittest.mock import patch
+
+# Make backend/ importable so `config` and `worker` resolve as top-level modules
+BACKEND_DIR = str(Path(__file__).resolve().parents[1])
+if BACKEND_DIR not in sys.path:
+    sys.path.insert(0, BACKEND_DIR)
+
+
+def test_celery_app_is_importable():
+    """worker.py exports a celery_app instance."""
+    import worker
+    assert hasattr(worker, "celery_app")
+    assert worker.celery_app.main == "promptlooper"
+
+
+def test_celery_app_serializer_settings():
+    """Verify JSON serialization is configured."""
+    import worker
+    assert worker.celery_app.conf.task_serializer == "json"
+    assert worker.celery_app.conf.result_serializer == "json"
+
+
+def test_celery_defaults_to_memory_broker_without_redis():
+    """Without REDIS_URL, broker falls back to memory://."""
+    with patch.dict("os.environ", {"REDIS_URL": ""}, clear=False):
+        # Reload config first, then worker, so worker picks up the new settings
+        import config
+        importlib.reload(config)
+        import worker
+        importlib.reload(worker)
+
+        # An empty REDIS_URL is falsy, so worker.py falls back to in-memory transports
+        assert worker.broker_url == "memory://"
+        assert worker.result_backend == "cache+memory://"
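+
+
+def test_celery_app_reliability_settings():
+    """Sketch: late acks with a prefetch of 1, as configured in worker.py.
+
+    With task_acks_late the message is acknowledged only after the task
+    finishes, so a task interrupted by a worker crash can be redelivered.
+    """
+    import pathlib
+
+    backend_dir = str(pathlib.Path(__file__).resolve().parents[1])
+    if backend_dir not in sys.path:
+        sys.path.insert(0, backend_dir)
+
+    import worker
+    assert worker.celery_app.conf.task_acks_late is True
+    assert worker.celery_app.conf.worker_prefetch_multiplier == 1
+    assert worker.celery_app.conf.task_track_started is True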
--- a/backend/websocket/init.py
+++ b/backend/websocket/init.py
--- a/backend/worker.py
+++ b/backend/worker.py
@ -0,0 +1,30 @@
+"""PromptLooper Celery worker configuration."""
+
+from celery import Celery
+
+from config import settings
+
+# Determine broker and backend URLs
+broker_url = settings.redis_url or "memory://"
+result_backend = settings.redis_url or "cache+memory://"
+
+celery_app = Celery(
+    "promptlooper",
+    broker=broker_url,
+    backend=result_backend,
+)
+
+celery_app.conf.update(
+    task_serializer="json",
+    accept_content=["json"],
+    result_serializer="json",
+    timezone="UTC",
+    enable_utc=True,
+    worker_concurrency=settings.max_concurrent_runs,
+    task_track_started=True,
+    task_acks_late=True,
+    worker_prefetch_multiplier=1,
+)
+
+# Auto-discover tasks in engine package
+celery_app.autodiscover_tasks(["engine"], force=True)
--- a/docker-compose.yml
+++ b/docker-compose.yml
@ -0,0 +1,108 @@
+name: xpltd_promptlooper
+
+networks:
+  promptlooper:
+    driver: bridge
+    ipam:
+      config:
+        - subnet: 172.33.0.0/24
+
+services:
+  promptlooper-db:
+    image: postgres:16-alpine
+    container_name: promptlooper-db
+    restart: unless-stopped
+    networks:
+      - promptlooper
+    ports:
+      - "5434:5432"
+    environment:
+      POSTGRES_USER: promptlooper
+      POSTGRES_PASSWORD: promptlooper
+      POSTGRES_DB: promptlooper
+    volumes:
+      - /vmPool/r/services/promptlooper_db:/var/lib/postgresql/data
+    healthcheck:
+      test: ["CMD-SHELL", "pg_isready -U promptlooper"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+
+  promptlooper-redis:
+    image: redis:7-alpine
+    container_name: promptlooper-redis
+    restart: unless-stopped
+    networks:
+      - promptlooper
+    volumes:
+      - /vmPool/r/services/promptlooper_redis:/data
+    healthcheck:
+      test: ["CMD", "redis-cli", "ping"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+
+  promptlooper-api:
+    build:
+      context: .
+      dockerfile: docker/Dockerfile
+      target: api
+    container_name: promptlooper-api
+    restart: unless-stopped
+    networks:
+      - promptlooper
+    ports:
+      - "8401:8401"  # MCP server
+    environment:
+      DATABASE_URL: postgresql://promptlooper:promptlooper@promptlooper-db:5432/promptlooper
+      REDIS_URL: redis://promptlooper-redis:6379/0
+      JWT_SECRET: ${JWT_SECRET:-dev-secret-change-in-production}
+      API_KEY: ${API_KEY:-}
+      DEFAULT_ENDPOINT_URL: ${DEFAULT_ENDPOINT_URL:-}
+      DEFAULT_ENDPOINT_KEY: ${DEFAULT_ENDPOINT_KEY:-}
+      MAX_CONCURRENT_RUNS: ${MAX_CONCURRENT_RUNS:-4}
+      MAX_TOKENS_PER_SWEEP: ${MAX_TOKENS_PER_SWEEP:-0}
+      MCP_ENABLED: ${MCP_ENABLED:-true}
+      MCP_PORT: "8401"
+    depends_on:
+      promptlooper-db:
+        condition: service_healthy
+      promptlooper-redis:
+        condition: service_healthy
+
+  promptlooper-worker:
+    build:
+      context: .
+      dockerfile: docker/Dockerfile
+      target: api
+    container_name: promptlooper-worker
+    restart: unless-stopped
+    networks:
+      - promptlooper
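+    # The image ENTRYPOINT (docker/entrypoint.sh) runs migrations and then execs
+    # uvicorn with "$@" appended; clear it so this command runs Celery directly.
+    entrypoint: []
+    command: celery -A worker:celery_app worker --loglevel=info --concurrency=${MAX_CONCURRENT_RUNS:-4}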
+    working_dir: /app/backend
+    environment:
+      DATABASE_URL: postgresql://promptlooper:promptlooper@promptlooper-db:5432/promptlooper
+      REDIS_URL: redis://promptlooper-redis:6379/0
+      DEFAULT_ENDPOINT_URL: ${DEFAULT_ENDPOINT_URL:-}
+      DEFAULT_ENDPOINT_KEY: ${DEFAULT_ENDPOINT_KEY:-}
+      MAX_CONCURRENT_RUNS: ${MAX_CONCURRENT_RUNS:-4}
+    depends_on:
+      promptlooper-db:
+        condition: service_healthy
+      promptlooper-redis:
+        condition: service_healthy
+
+  promptlooper-web:
+    build:
+      context: .
+      dockerfile: docker/Dockerfile
+      target: web
+    container_name: promptlooper-web
+    restart: unless-stopped
+    networks:
+      - promptlooper
+    ports:
+      - "8400:80"
+    depends_on:
+      - promptlooper-api
--- a/docker/.gitkeep
+++ b/docker/.gitkeep
--- a/docker/Dockerfile
+++ b/docker/Dockerfile
@ -0,0 +1,67 @@
+# =============================================================================
+# Stage 1: Frontend build
+# =============================================================================
+FROM node:20-alpine AS frontend-build
+
+WORKDIR /build
+
+COPY frontend/package.json frontend/package-lock.json* ./
+RUN npm ci || npm install
+
+COPY frontend/ ./
+RUN npm run build
+
+# =============================================================================
+# Stage 2: Python API runtime
+# =============================================================================
+FROM python:3.12-slim AS api
+
+WORKDIR /app
+
+# Install system dependencies for psycopg2 and general use
+RUN apt-get update && \
+    apt-get install -y --no-install-recommends gcc libpq-dev curl && \
+    rm -rf /var/lib/apt/lists/*
+
+# Install Python dependencies
+COPY backend/requirements.txt /app/backend/requirements.txt
+RUN pip install --no-cache-dir -r /app/backend/requirements.txt
+
+# Copy backend source
+COPY backend/ /app/backend/
+COPY alembic/ /app/alembic/
+COPY alembic.ini /app/alembic.ini
+
+# Copy frontend build for single-container mode
+COPY --from=frontend-build /build/dist /app/static
+
+# Create data directory for SQLite mode
+RUN mkdir -p /data
+
+ENV PYTHONPATH=/app/backend
+ENV DATA_DIR=/data
+
+# Entrypoint runs migrations then starts the app
+COPY docker/entrypoint.sh /app/entrypoint.sh
+RUN chmod +x /app/entrypoint.sh
+
+EXPOSE 8000 8401
+
+# Default: run migrations then start the API server
+ENTRYPOINT ["/app/entrypoint.sh"]
+
+# =============================================================================
+# Stage 3: Nginx frontend (production compose)
+# =============================================================================
+FROM nginx:1.27-alpine AS web
+
+# Remove default config
+RUN rm /etc/nginx/conf.d/default.conf
+
+# Copy custom nginx config
+COPY docker/nginx.conf /etc/nginx/conf.d/default.conf
+
+# Copy built frontend assets
+COPY --from=frontend-build /build/dist /usr/share/nginx/html
+
+EXPOSE 80
--- a/docker/entrypoint.sh
+++ b/docker/entrypoint.sh
@ -0,0 +1,10 @@
+#!/bin/sh
+set -e
+
+# Run database migrations
+echo "Running database migrations..."
+cd /app && alembic upgrade head
+
+# Start the application
+echo "Starting PromptLooper API..."
+exec uvicorn main:app --host 0.0.0.0 --port 8000 --app-dir /app/backend "$@"
--- a/docker/nginx.conf
+++ b/docker/nginx.conf
@ -0,0 +1,44 @@
+server {
+    listen 80;
+    server_name _;
+
+    root /usr/share/nginx/html;
+    index index.html;
+
+    # Frontend static assets
+    location / {
+        try_files $uri $uri/ /index.html;
+    }
+
+    # API proxy
+    location /api/ {
+        proxy_pass http://promptlooper-api:8000;
+        proxy_set_header Host $host;
+        proxy_set_header X-Real-IP $remote_addr;
+        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+        proxy_set_header X-Forwarded-Proto $scheme;
+    }
+
+    # Health endpoint proxy
+    location /health {
+        proxy_pass http://promptlooper-api:8000;
+        proxy_set_header Host $host;
+    }
+
+    # WebSocket proxy
+    location /ws/ {
+        proxy_pass http://promptlooper-api:8000;
+        proxy_http_version 1.1;
+        proxy_set_header Upgrade $http_upgrade;
+        proxy_set_header Connection "upgrade";
+        proxy_set_header Host $host;
+        proxy_set_header X-Real-IP $remote_addr;
+        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+        proxy_read_timeout 86400;
+    }
+
+    # Gzip compression
+    gzip on;
+    gzip_types text/plain text/css application/json application/javascript text/xml application/xml text/javascript;
+    gzip_min_length 256;
+}
--- a/env.example
+++ b/env.example
@ -0,0 +1,23 @@
+# PromptLooper — Environment Configuration
+# Copy to .env and fill in required values
+
+# ── Database ──────────────────────────────────────────────
+POSTGRES_USER=promptlooper
+POSTGRES_PASSWORD=          # REQUIRED: set a strong password
+POSTGRES_DB=promptlooper
+
+# ── Auth ──────────────────────────────────────────────────
+JWT_SECRET=                 # REQUIRED: generate with `openssl rand -hex 32`
+
+# ── Default LLM Endpoint (optional) ──────────────────────
+# Pre-configure an LLM endpoint so users don't have to add one manually
+DEFAULT_ENDPOINT_URL=       # e.g. http://chat.forgetyour.name/api/v1
+DEFAULT_ENDPOINT_KEY=       # API key for the default endpoint
+
+# ── Limits ────────────────────────────────────────────────
+MAX_CONCURRENT_RUNS=4       # Parallel run limit per sweep
+MAX_TOKENS_PER_SWEEP=0      # 0 = unlimited; set a number to cap token spend
+
+# ── MCP Server ────────────────────────────────────────────
+MCP_ENABLED=true            # Enable/disable MCP server for agent access
+# MCP_PORT=8401             # MCP server port (set in docker-compose)
--- a/frontend/index.html
+++ b/frontend/index.html
@ -0,0 +1,12 @@
+<!doctype html>
+<html lang="en">
+  <head>
+    <meta charset="UTF-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <title>PromptLooper</title>
+  </head>
+  <body>
+    <div id="root"></div>
+    <script type="module" src="/src/main.tsx"></script>
+  </body>
+</html>
--- a/frontend/package-lock.json
+++ b/frontend/package-lock.json
--- a/frontend/package.json
+++ b/frontend/package.json
@ -0,0 +1,31 @@
+{
+  "name": "promptlooper-frontend",
+  "private": true,
+  "version": "0.1.0",
+  "type": "module",
+  "scripts": {
+    "dev": "vite",
+    "build": "tsc && vite build",
+    "preview": "vite preview",
+    "test": "vitest run"
+  },
+  "dependencies": {
+    "react": "^18.3.1",
+    "react-dom": "^18.3.1",
+    "react-router-dom": "^6.28.0"
+  },
+  "devDependencies": {
+    "@testing-library/jest-dom": "^6.9.1",
+    "@testing-library/react": "^16.3.2",
+    "@types/react": "^18.3.12",
+    "@types/react-dom": "^18.3.1",
+    "@vitejs/plugin-react": "^4.3.4",
+    "autoprefixer": "^10.4.20",
+    "jsdom": "^29.0.2",
+    "postcss": "^8.4.49",
+    "tailwindcss": "^3.4.15",
+    "typescript": "^5.6.3",
+    "vite": "^6.0.0",
+    "vitest": "^4.1.2"
+  }
+}
--- a/frontend/postcss.config.js
+++ b/frontend/postcss.config.js
@ -0,0 +1,6 @@
+export default {
+  plugins: {
+    tailwindcss: {},
+    autoprefixer: {},
+  },
+};
--- a/frontend/src/App.test.tsx
+++ b/frontend/src/App.test.tsx
@ -0,0 +1,59 @@
+import { render, screen } from "@testing-library/react";
+import { MemoryRouter } from "react-router-dom";
+import { describe, it, expect } from "vitest";
+import App from "./App";
+
+function renderWithRouter(route: string) {
+  return render(
+    <MemoryRouter initialEntries={[route]}>
+      <App />
+    </MemoryRouter>,
+  );
+}
+
+describe("App routing", () => {
+  it("renders SetupPage at /setup", () => {
+    renderWithRouter("/setup");
+    expect(screen.getByText("PromptLooper Setup")).toBeInTheDocument();
+  });
+
+  it("renders LoginPage at /login", () => {
+    renderWithRouter("/login");
+    expect(screen.getByText("Sign In")).toBeInTheDocument();
+  });
+
+  it("renders DashboardPage at /", () => {
+    renderWithRouter("/");
+    expect(screen.getByText("Dashboard")).toBeInTheDocument();
+  });
+
+  it("renders ProjectsPage at /projects", () => {
+    renderWithRouter("/projects");
+    expect(screen.getByText("Projects")).toBeInTheDocument();
+  });
+
+  it("renders ExperimentPage at /experiments/:id", () => {
+    renderWithRouter("/experiments/abc-123");
+    expect(screen.getByText("Experiment")).toBeInTheDocument();
+  });
+
+  it("renders LivePage at /live/:id", () => {
+    renderWithRouter("/live/abc-123");
+    expect(screen.getByText("Live")).toBeInTheDocument();
+  });
+
+  it("renders ComparePage at /compare", () => {
+    renderWithRouter("/compare");
+    expect(screen.getByText("Compare")).toBeInTheDocument();
+  });
+
+  it("renders AdminPage at /admin", () => {
+    renderWithRouter("/admin");
+    expect(screen.getByText("Admin")).toBeInTheDocument();
+  });
+
+  it("redirects unknown routes to dashboard", () => {
+    renderWithRouter("/nonexistent");
+    expect(screen.getByText("Dashboard")).toBeInTheDocument();
+  });
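+
+  it("redirects /experiments without an id to dashboard", () => {
+    // Sketch: /experiments/:id requires an id segment, so the bare path
+    // falls through to the "*" catch-all route.
+    renderWithRouter("/experiments");
+    expect(screen.getByText("Dashboard")).toBeInTheDocument();
+  });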
+});
--- a/frontend/src/App.tsx
+++ b/frontend/src/App.tsx
@ -0,0 +1,25 @@
+import { Routes, Route, Navigate } from "react-router-dom";
+import SetupPage from "./pages/SetupPage";
+import LoginPage from "./pages/LoginPage";
+import DashboardPage from "./pages/DashboardPage";
+import ProjectsPage from "./pages/ProjectsPage";
+import ExperimentPage from "./pages/ExperimentPage";
+import LivePage from "./pages/LivePage";
+import ComparePage from "./pages/ComparePage";
+import AdminPage from "./pages/AdminPage";
+
+export default function App() {
+  return (
+    <Routes>
+      <Route path="/setup" element={<SetupPage />} />
+      <Route path="/login" element={<LoginPage />} />
+      <Route path="/" element={<DashboardPage />} />
+      <Route path="/projects" element={<ProjectsPage />} />
+      <Route path="/experiments/:id" element={<ExperimentPage />} />
+      <Route path="/live/:id" element={<LivePage />} />
+      <Route path="/compare" element={<ComparePage />} />
+      <Route path="/admin" element={<AdminPage />} />
+      <Route path="*" element={<Navigate to="/" replace />} />
+    </Routes>
+  );
+}
--- a/frontend/src/api/client.test.ts
+++ b/frontend/src/api/client.test.ts
@ -0,0 +1,552 @@
+import { describe, it, expect, beforeEach, afterEach, vi } from "vitest";
+import {
+  setToken,
+  getToken,
+  clearToken,
+  ApiError,
+  auth,
+  projects,
+  experiments,
+  runs,
+  endpoints,
+  exportApi,
+  webhooks,
+  admin,
+  health,
+  connectWebSocket,
+} from "./client";
+
+// ---------------------------------------------------------------------------
+// Mock fetch
+// ---------------------------------------------------------------------------
+
+const mockFetch = vi.fn();
+
+beforeEach(() => {
+  mockFetch.mockReset();
+  vi.stubGlobal("fetch", mockFetch);
+  clearToken();
+});
+
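+afterEach(() => {
+  vi.restoreAllMocks();
+  // restoreAllMocks does not undo vi.stubGlobal; restore the real fetch explicitly
+  vi.unstubAllGlobals();
+});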
+
+function jsonResponse(body: unknown, status = 200): Response {
+  return {
+    ok: status >= 200 && status < 300,
+    status,
+    statusText: status === 200 ? "OK" : "Error",
+    json: () => Promise.resolve(body),
+    text: () => Promise.resolve(JSON.stringify(body)),
+    headers: new Headers(),
+  } as unknown as Response;
+}
+
+function noContentResponse(): Response {
+  return {
+    ok: true,
+    status: 204,
+    statusText: "No Content",
+    json: () => Promise.reject(new Error("no body")),
+    text: () => Promise.resolve(""),
+    headers: new Headers(),
+  } as unknown as Response;
+}
+
+// ---------------------------------------------------------------------------
+// Token management
+// ---------------------------------------------------------------------------
+
+describe("token management", () => {
+  it("starts with null token", () => {
+    expect(getToken()).toBeNull();
+  });
+
+  it("sets and gets token", () => {
+    setToken("abc123");
+    expect(getToken()).toBe("abc123");
+  });
+
+  it("clears token", () => {
+    setToken("abc123");
+    clearToken();
+    expect(getToken()).toBeNull();
+  });
+});
+
+// ---------------------------------------------------------------------------
+// Auth header injection
+// ---------------------------------------------------------------------------
+
+describe("auth header injection", () => {
+  it("sends Authorization header when token is set", async () => {
+    setToken("my-jwt");
+    mockFetch.mockResolvedValueOnce(jsonResponse({ status: "ok" }));
+
+    await health.check();
+
+    const [, init] = mockFetch.mock.calls[0] as [string, RequestInit];
+    expect((init.headers as Record<string, string>)["Authorization"]).toBe(
+      "Bearer my-jwt",
+    );
+  });
+
+  it("omits Authorization header when no token", async () => {
+    mockFetch.mockResolvedValueOnce(jsonResponse({ status: "ok" }));
+
+    await health.check();
+
+    const [, init] = mockFetch.mock.calls[0] as [string, RequestInit];
+    expect(
+      (init.headers as Record<string, string>)["Authorization"],
+    ).toBeUndefined();
+  });
+});
+
+// ---------------------------------------------------------------------------
+// ApiError
+// ---------------------------------------------------------------------------
+
+describe("ApiError", () => {
+  it("throws ApiError on non-ok response", async () => {
+    mockFetch.mockResolvedValueOnce(
+      jsonResponse({ detail: "not found" }, 404),
+    );
+
+    await expect(projects.get("some-id")).rejects.toThrow(ApiError);
+
+    // Assert through the promise so the test fails if no error is thrown
+    mockFetch.mockResolvedValueOnce(
+      jsonResponse({ detail: "bad" }, 400),
+    );
+    await expect(projects.get("some-id")).rejects.toMatchObject({
+      status: 400,
+    });
+  });
+});
+
+// ---------------------------------------------------------------------------
+// Content-Type header
+// ---------------------------------------------------------------------------
+
+describe("content-type", () => {
+  it("sets Content-Type for POST with body", async () => {
+    mockFetch.mockResolvedValueOnce(
+      jsonResponse({ access_token: "tok", token_type: "bearer" }),
+    );
+
+    await auth.setup({ username: "admin", password: "password123" });
+
+    const [, init] = mockFetch.mock.calls[0] as [string, RequestInit];
+    expect((init.headers as Record<string, string>)["Content-Type"]).toBe(
+      "application/json",
+    );
+  });
+
+  it("omits Content-Type for GET requests", async () => {
+    mockFetch.mockResolvedValueOnce(jsonResponse({ items: [], total: 0 }));
+
+    await projects.list();
+
+    const [, init] = mockFetch.mock.calls[0] as [string, RequestInit];
+    expect(
+      (init.headers as Record<string, string>)["Content-Type"],
+    ).toBeUndefined();
+  });
+});
+
+// ---------------------------------------------------------------------------
+// Health
+// ---------------------------------------------------------------------------
+
+describe("health", () => {
+  it("calls /health", async () => {
+    mockFetch.mockResolvedValueOnce(
+      jsonResponse({ status: "ok", database: true, redis: true }),
+    );
+
+    const result = await health.check();
+
+    expect(mockFetch).toHaveBeenCalledWith("/health", expect.anything());
+    expect(result.status).toBe("ok");
+  });
+});
+
+// ---------------------------------------------------------------------------
+// Auth endpoints
+// ---------------------------------------------------------------------------
+
+describe("auth", () => {
+  it("setup POSTs to /api/auth/setup", async () => {
+    mockFetch.mockResolvedValueOnce(
+      jsonResponse({ access_token: "tok", token_type: "bearer" }),
+    );
+
+    const result = await auth.setup({
+      username: "admin",
+      password: "password123",
+    });
+
+    expect(mockFetch).toHaveBeenCalledWith(
+      "/api/auth/setup",
+      expect.anything(),
+    );
+    expect(result.access_token).toBe("tok");
+  });
+
+  it("login sets token automatically", async () => {
+    mockFetch.mockResolvedValueOnce(
+      jsonResponse({ access_token: "jwt-123", token_type: "bearer" }),
+    );
+
+    await auth.login({ username: "admin", password: "pass" });
+
+    expect(getToken()).toBe("jwt-123");
+  });
+
+  it("me GETs /api/auth/me", async () => {
+    mockFetch.mockResolvedValueOnce(
+      jsonResponse({
+        id: "u1",
+        username: "admin",
+        is_admin: true,
+        created_at: "2026-01-01T00:00:00Z",
+      }),
+    );
+
+    const user = await auth.me();
+    expect(user.username).toBe("admin");
+  });
+
+  it("logout clears token", () => {
+    setToken("tok");
+    auth.logout();
+    expect(getToken()).toBeNull();
+  });
+});
+
+// ---------------------------------------------------------------------------
+// Projects
+// ---------------------------------------------------------------------------
+
+describe("projects", () => {
+  it("list GETs /api/projects/", async () => {
+    mockFetch.mockResolvedValueOnce(jsonResponse({ items: [], total: 0 }));
+    await projects.list();
+    expect(mockFetch).toHaveBeenCalledWith(
+      "/api/projects/",
+      expect.anything(),
+    );
+  });
+
+  it("create POSTs to /api/projects/", async () => {
+    mockFetch.mockResolvedValueOnce(
+      jsonResponse({ id: "p1", name: "Test" }),
+    );
+    await projects.create({ name: "Test" });
+    const [, init] = mockFetch.mock.calls[0] as [string, RequestInit];
+    expect(init.method).toBe("POST");
+    expect(JSON.parse(init.body as string)).toEqual({ name: "Test" });
+  });
+
+  it("get fetches by id", async () => {
+    mockFetch.mockResolvedValueOnce(jsonResponse({ id: "p1" }));
+    await projects.get("p1");
+    expect(mockFetch).toHaveBeenCalledWith(
+      "/api/projects/p1",
+      expect.anything(),
+    );
+  });
+
+  it("update PUTs by id", async () => {
+    mockFetch.mockResolvedValueOnce(jsonResponse({ id: "p1" }));
+    await projects.update("p1", { name: "New" });
+    const [url, init] = mockFetch.mock.calls[0] as [string, RequestInit];
+    expect(url).toBe("/api/projects/p1");
+    expect(init.method).toBe("PUT");
+  });
+
+  it("delete DELETEs by id", async () => {
+    mockFetch.mockResolvedValueOnce(noContentResponse());
+    await projects.delete("p1");
+    const [url, init] = mockFetch.mock.calls[0] as [string, RequestInit];
+    expect(url).toBe("/api/projects/p1");
+    expect(init.method).toBe("DELETE");
+  });
+});
+
+// ---------------------------------------------------------------------------
+// Experiments
+// ---------------------------------------------------------------------------
+
+describe("experiments", () => {
+  it("list GETs /api/experiments/", async () => {
+    mockFetch.mockResolvedValueOnce(jsonResponse({ items: [], total: 0 }));
+    await experiments.list();
+    expect(mockFetch).toHaveBeenCalledWith(
+      "/api/experiments/",
+      expect.anything(),
+    );
+  });
+
+  it("startSweep POSTs to sweep endpoint", async () => {
+    mockFetch.mockResolvedValueOnce(noContentResponse());
+    await experiments.startSweep("e1");
+    expect(mockFetch).toHaveBeenCalledWith(
+      "/api/experiments/e1/sweep",
+      expect.anything(),
+    );
+  });
+
+  it("pause POSTs to pause endpoint", async () => {
+    mockFetch.mockResolvedValueOnce(noContentResponse());
+    await experiments.pause("e1");
+    expect(mockFetch).toHaveBeenCalledWith(
+      "/api/experiments/e1/pause",
+      expect.anything(),
+    );
+  });
+
+  it("resume POSTs to resume endpoint", async () => {
+    mockFetch.mockResolvedValueOnce(noContentResponse());
+    await experiments.resume("e1");
+    expect(mockFetch).toHaveBeenCalledWith(
+      "/api/experiments/e1/resume",
+      expect.anything(),
+    );
+  });
+
+  it("stop POSTs to stop endpoint", async () => {
+    mockFetch.mockResolvedValueOnce(noContentResponse());
+    await experiments.stop("e1");
+    expect(mockFetch).toHaveBeenCalledWith(
+      "/api/experiments/e1/stop",
+      expect.anything(),
+    );
+  });
+});
+
+// ---------------------------------------------------------------------------
+// Runs
+// ---------------------------------------------------------------------------
+
+describe("runs", () => {
+  it("list GETs runs for experiment", async () => {
+    mockFetch.mockResolvedValueOnce(jsonResponse({ items: [], total: 0 }));
+    await runs.list("e1");
+    expect(mockFetch).toHaveBeenCalledWith(
+      "/api/runs/experiments/e1/runs",
+      expect.anything(),
+    );
+  });
+
+  it("get fetches run detail", async () => {
+    mockFetch.mockResolvedValueOnce(
+      jsonResponse({ id: "r1", stage_results: [], scores: [] }),
+    );
+    await runs.get("r1");
+    expect(mockFetch).toHaveBeenCalledWith(
+      "/api/runs/r1",
+      expect.anything(),
+    );
+  });
+
+  it("score POSTs to run score endpoint", async () => {
+    mockFetch.mockResolvedValueOnce(jsonResponse({ id: "s1" }));
+    await runs.score("r1", { scorer_name: "human", value: 0.9 });
+    const [url, init] = mockFetch.mock.calls[0] as [string, RequestInit];
+    expect(url).toBe("/api/runs/r1/score");
+    expect(init.method).toBe("POST");
+  });
+
+  it("leaderboard GETs leaderboard", async () => {
+    mockFetch.mockResolvedValueOnce(jsonResponse({ items: [], total: 0 }));
+    await runs.leaderboard("e1");
+    expect(mockFetch).toHaveBeenCalledWith(
+      "/api/runs/experiments/e1/leaderboard",
+      expect.anything(),
+    );
+  });
+});
+
+// ---------------------------------------------------------------------------
+// Endpoints
+// ---------------------------------------------------------------------------
+
+describe("endpoints", () => {
+  it("list GETs /api/endpoints/", async () => {
+    mockFetch.mockResolvedValueOnce(jsonResponse({ items: [], total: 0 }));
+    await endpoints.list();
+    expect(mockFetch).toHaveBeenCalledWith(
+      "/api/endpoints/",
+      expect.anything(),
+    );
+  });
+
+  it("test POSTs to test endpoint", async () => {
+    mockFetch.mockResolvedValueOnce(jsonResponse({ models: [] }));
+    await endpoints.test("ep1");
+    expect(mockFetch).toHaveBeenCalledWith(
+      "/api/endpoints/ep1/test",
+      expect.anything(),
+    );
+  });
+});
+
+// ---------------------------------------------------------------------------
+// Export
+// ---------------------------------------------------------------------------
+
+describe("exportApi", () => {
+  it("best GETs best config", async () => {
+    mockFetch.mockResolvedValueOnce(jsonResponse({}));
+    await exportApi.best("e1");
+    expect(mockFetch).toHaveBeenCalledWith(
+      "/api/export/experiments/e1/best",
+      expect.anything(),
+    );
+  });
+
+  it("env GETs env export", async () => {
+    mockFetch.mockResolvedValueOnce(jsonResponse("KEY=val"));
+    await exportApi.env("e1");
+    expect(mockFetch).toHaveBeenCalledWith(
+      "/api/export/experiments/e1/env",
+      expect.anything(),
+    );
+  });
+
+  it("report GETs report", async () => {
+    mockFetch.mockResolvedValueOnce(jsonResponse("# Report"));
+    await exportApi.report("e1");
+    expect(mockFetch).toHaveBeenCalledWith(
+      "/api/export/experiments/e1/report",
+      expect.anything(),
+    );
+  });
+});
+
+// ---------------------------------------------------------------------------
+// Webhooks
+// ---------------------------------------------------------------------------
+
+describe("webhooks", () => {
+  it("list GETs /api/webhooks/", async () => {
+    mockFetch.mockResolvedValueOnce(jsonResponse({ items: [], total: 0 }));
+    await webhooks.list();
+    expect(mockFetch).toHaveBeenCalledWith(
+      "/api/webhooks/",
+      expect.anything(),
+    );
+  });
+
+  it("create POSTs webhook", async () => {
+    mockFetch.mockResolvedValueOnce(jsonResponse({ id: "w1" }));
+    await webhooks.create({ event_type: "run.complete", url: "http://x" });
+    const [, init] = mockFetch.mock.calls[0] as [string, RequestInit];
+    expect(init.method).toBe("POST");
+  });
+
+  it("delete DELETEs webhook", async () => {
+    mockFetch.mockResolvedValueOnce(noContentResponse());
+    await webhooks.delete("w1");
+    const [url, init] = mockFetch.mock.calls[0] as [string, RequestInit];
+    expect(url).toBe("/api/webhooks/w1");
+    expect(init.method).toBe("DELETE");
+  });
+});
+
+// ---------------------------------------------------------------------------
+// Admin
+// ---------------------------------------------------------------------------
+
+describe("admin", () => {
+  it("getSettings GETs /api/admin/settings", async () => {
+    mockFetch.mockResolvedValueOnce(jsonResponse({}));
+    await admin.getSettings();
+    expect(mockFetch).toHaveBeenCalledWith(
+      "/api/admin/settings",
+      expect.anything(),
+    );
+  });
+
+  it("updateSettings PUTs /api/admin/settings", async () => {
+    mockFetch.mockResolvedValueOnce(jsonResponse({}));
+    await admin.updateSettings({ guest_access: true });
+    const [, init] = mockFetch.mock.calls[0] as [string, RequestInit];
+    expect(init.method).toBe("PUT");
+  });
+
+  it("getStats GETs /api/admin/stats", async () => {
+    mockFetch.mockResolvedValueOnce(jsonResponse({}));
+    await admin.getStats();
+    expect(mockFetch).toHaveBeenCalledWith(
+      "/api/admin/stats",
+      expect.anything(),
+    );
+  });
+});
+
+// ---------------------------------------------------------------------------
+// WebSocket helper
+// ---------------------------------------------------------------------------
+
+describe("connectWebSocket", () => {
+  it("creates WebSocket with correct URL and handles messages", () => {
+    const sendSpy = vi.fn();
+    const closeSpy = vi.fn();
+    let capturedInstance: {
+      onmessage: ((ev: { data: string }) => void) | null;
+      onclose: (() => void) | null;
+      readyState: number;
+    };
+
+    // Use a class constructor so `new WebSocket(...)` works
+    class MockWebSocket {
+      static OPEN = 1;
+      readyState = 1;
+      onmessage: ((ev: { data: string }) => void) | null = null;
+      onclose: (() => void) | null = null;
+      send = sendSpy;
+      close = closeSpy;
+      constructor(public url: string) {
+        capturedInstance = this;
+      }
+    }
+
+    vi.stubGlobal("WebSocket", MockWebSocket);
+
+    Object.defineProperty(window, "location", {
+      value: { protocol: "http:", host: "localhost:5173" },
+      writable: true,
+      configurable: true,
+    });
+
+    const onMessage = vi.fn();
+    const onClose = vi.fn();
+    const conn = connectWebSocket(onMessage, onClose);
+
+    expect(capturedInstance!.url).toBe("ws://localhost:5173/ws");
+
+    // Simulate incoming message
+    capturedInstance!.onmessage!({ data: JSON.stringify({ type: "update" }) });
+    expect(onMessage).toHaveBeenCalledWith({ type: "update" });
+
+    // Send message
+    conn.send({ type: "ping" });
+    expect(sendSpy).toHaveBeenCalledWith('{"type":"ping"}');
+
+    // Simulate close
+    capturedInstance!.onclose!();
+    expect(onClose).toHaveBeenCalled();
+
+    // Close from client
+    conn.close();
+    expect(closeSpy).toHaveBeenCalled();
+
+    vi.unstubAllGlobals();
+  });
+});
--- a/frontend/src/api/client.ts
+++ b/frontend/src/api/client.ts
@ -0,0 +1,545 @@
+/**
+ * PromptLooper typed API client.
+ *
+ * - JWT token stored in memory (never localStorage) for security.
+ * - Automatic Authorization header injection.
+ * - Typed wrapper functions for every API endpoint group.
+ * - WebSocket connection helper for real-time updates.
+ */
+
+// ---------------------------------------------------------------------------
+// Types — mirrors backend Pydantic schemas
+// ---------------------------------------------------------------------------
+
+export interface ProjectCreate {
+  name: string;
+  description?: string | null;
+}
+
+export interface ProjectUpdate {
+  name?: string | null;
+  description?: string | null;
+}
+
+export interface ProjectResponse {
+  id: string;
+  name: string;
+  description: string | null;
+  owner_id: string;
+  created_at: string;
+  updated_at: string;
+}
+
+export interface ProjectListResponse {
+  items: ProjectResponse[];
+  total: number;
+}
+
+export interface ExperimentCreate {
+  name: string;
+  description?: string | null;
+  sample_data?: Record<string, unknown> | null;
+  pipeline_stages?: Record<string, unknown> | null;
+  scoring_config?: Record<string, unknown> | null;
+  parameter_space?: Record<string, unknown> | null;
+}
+
+export interface ExperimentUpdate {
+  name?: string | null;
+  description?: string | null;
+  sample_data?: Record<string, unknown> | null;
+  pipeline_stages?: Record<string, unknown> | null;
+  scoring_config?: Record<string, unknown> | null;
+  parameter_space?: Record<string, unknown> | null;
+  status?: string | null;
+}
+
+export interface ExperimentResponse {
+  id: string;
+  project_id: string;
+  name: string;
+  description: string | null;
+  sample_data: Record<string, unknown> | null;
+  pipeline_stages: Record<string, unknown> | null;
+  scoring_config: Record<string, unknown> | null;
+  parameter_space: Record<string, unknown> | null;
+  status: string;
+  created_at: string;
+  updated_at: string;
+}
+
+export interface ExperimentListResponse {
+  items: ExperimentResponse[];
+  total: number;
+}
+
+export interface RunResponse {
+  id: string;
+  experiment_id: string;
+  config_hash: string;
+  config: Record<string, unknown>;
+  status: string;
+  started_at: string | null;
+  completed_at: string | null;
+  duration_ms: number | null;
+  tokens_in: number | null;
+  tokens_out: number | null;
+  cost_estimate: number | null;
+}
+
+export interface RunListResponse {
+  items: RunResponse[];
+  total: number;
+}
+
+export interface StageResultResponse {
+  id: string;
+  run_id: string;
+  stage_index: number;
+  prompt_sent: string;
+  response_raw: string;
+  model_used: string;
+  parameters: Record<string, unknown> | null;
+  tokens_in: number | null;
+  tokens_out: number | null;
+  latency_ms: number | null;
+}
+
+export interface ScoreResponse {
+  id: string;
+  run_id: string;
+  scorer_name: string;
+  value: number;
+  scorer_metadata: Record<string, unknown> | null;
+  created_at: string;
+}
+
+export interface RunDetailResponse extends RunResponse {
+  stage_results: StageResultResponse[];
+  scores: ScoreResponse[];
+}
+
+export interface ScoreInput {
+  scorer_name: string;
+  value: number;
+  metadata?: Record<string, unknown> | null;
+}
+
+export interface EndpointCreate {
+  name: string;
+  url: string;
+  api_key?: string | null;
+  default_model?: string | null;
+}
+
+export interface EndpointUpdate {
+  name?: string | null;
+  url?: string | null;
+  api_key?: string | null;
+  default_model?: string | null;
+}
+
+export interface EndpointResponse {
+  id: string;
+  name: string;
+  url: string;
+  default_model: string | null;
+}
+
+export interface EndpointListResponse {
+  items: EndpointResponse[];
+  total: number;
+}
+
+export interface WebhookCreate {
+  event_type: string;
+  url: string;
+  headers?: Record<string, string> | null;
+  is_active?: boolean;
+}
+
+export interface WebhookUpdate {
+  event_type?: string | null;
+  url?: string | null;
+  headers?: Record<string, string> | null;
+  is_active?: boolean | null;
+}
+
+export interface WebhookResponse {
+  id: string;
+  event_type: string;
+  url: string;
+  headers: Record<string, string> | null;
+  is_active: boolean;
+}
+
+export interface WebhookListResponse {
+  items: WebhookResponse[];
+  total: number;
+}
+
+export interface SetupRequest {
+  username: string;
+  password: string;
+}
+
+export interface LoginRequest {
+  username: string;
+  password: string;
+}
+
+export interface TokenResponse {
+  access_token: string;
+  token_type: string;
+}
+
+export interface UserResponse {
+  id: string;
+  username: string;
+  is_admin: boolean;
+  created_at: string;
+}
+
+export interface HealthResponse {
+  status: string;
+  database: boolean;
+  redis: boolean;
+}
+
+export interface ExportRunRow {
+  run_id: string;
+  experiment_id: string;
+  config_hash: string;
+  config: Record<string, unknown>;
+  status: string;
+  duration_ms: number | null;
+  tokens_in: number | null;
+  tokens_out: number | null;
+  cost_estimate: number | null;
+  scores: Record<string, number>;
+}
+
+export interface ExportResponse {
+  experiment_id: string;
+  experiment_name: string;
+  rows: ExportRunRow[];
+}
+
+// ---------------------------------------------------------------------------
+// API Error
+// ---------------------------------------------------------------------------
+
+export class ApiError extends Error {
+  constructor(
+    public status: number,
+    public statusText: string,
+    public body: unknown,
+  ) {
+    super(`API ${status}: ${statusText}`);
+    this.name = "ApiError";
+  }
+}
+
+// ---------------------------------------------------------------------------
+// Token management (in-memory only)
+// ---------------------------------------------------------------------------
+
+let _accessToken: string | null = null;
+
+export function setToken(token: string | null): void {
+  _accessToken = token;
+}
+
+export function getToken(): string | null {
+  return _accessToken;
+}
+
+export function clearToken(): void {
+  _accessToken = null;
+}
+
+// ---------------------------------------------------------------------------
+// Base fetch wrapper
+// ---------------------------------------------------------------------------
+
+const BASE_URL = ""; // Uses Vite proxy in dev; same origin in prod
+
+async function request<T>(
+  path: string,
+  options: RequestInit = {},
+): Promise<T> {
+  const headers: Record<string, string> = {
+    ...(options.headers as Record<string, string> | undefined),
+  };
+
+  // Inject auth header
+  if (_accessToken) {
+    headers["Authorization"] = `Bearer ${_accessToken}`;
+  }
+
+  // Default content-type for requests with bodies
+  if (options.body && !headers["Content-Type"]) {
+    headers["Content-Type"] = "application/json";
+  }
+
+  const response = await fetch(`${BASE_URL}${path}`, {
+    ...options,
+    headers,
+  });
+
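+  if (!response.ok) {
+    // The body stream can only be read once, so read it as text and then try
+    // to parse it as JSON (calling response.json() and then response.text()
+    // would throw on a non-JSON error body).
+    const text = await response.text();
+    let body: unknown = text;
+    try {
+      body = JSON.parse(text);
+    } catch {
+      // Not JSON: keep the raw text as the error body.
+    }
+    throw new ApiError(response.status, response.statusText, body);
+  }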
+
+  // 204 No Content
+  if (response.status === 204) {
+    return undefined as T;
+  }
+
+  return response.json() as Promise<T>;
+}
+
+function get<T>(path: string): Promise<T> {
+  return request<T>(path, { method: "GET" });
+}
+
+function post<T>(path: string, body?: unknown): Promise<T> {
+  return request<T>(path, {
+    method: "POST",
+    body: body != null ? JSON.stringify(body) : undefined,
+  });
+}
+
+function put<T>(path: string, body?: unknown): Promise<T> {
+  return request<T>(path, {
+    method: "PUT",
+    body: body != null ? JSON.stringify(body) : undefined,
+  });
+}
+
+function del<T>(path: string): Promise<T> {
+  return request<T>(path, { method: "DELETE" });
+}
+
+// ---------------------------------------------------------------------------
+// Health
+// ---------------------------------------------------------------------------
+
+export const health = {
+  check: () => get<HealthResponse>("/health"),
+};
+
+// ---------------------------------------------------------------------------
+// Auth
+// ---------------------------------------------------------------------------
+
+export const auth = {
+  setup: (data: SetupRequest) =>
+    post<TokenResponse>("/api/auth/setup", data),
+
+  login: async (data: LoginRequest): Promise<TokenResponse> => {
+    const resp = await post<TokenResponse>("/api/auth/login", data);
+    setToken(resp.access_token);
+    return resp;
+  },
+
+  me: () => get<UserResponse>("/api/auth/me"),
+
+  logout: () => {
+    clearToken();
+  },
+};
+
+// ---------------------------------------------------------------------------
+// Projects
+// ---------------------------------------------------------------------------
+
+export const projects = {
+  list: () => get<ProjectListResponse>("/api/projects/"),
+
+  create: (data: ProjectCreate) =>
+    post<ProjectResponse>("/api/projects/", data),
+
+  get: (id: string) => get<ProjectResponse>(`/api/projects/${id}`),
+
+  update: (id: string, data: ProjectUpdate) =>
+    put<ProjectResponse>(`/api/projects/${id}`, data),
+
+  delete: (id: string) => del<void>(`/api/projects/${id}`),
+};
+
+// ---------------------------------------------------------------------------
+// Experiments
+// ---------------------------------------------------------------------------
+
+export const experiments = {
+  list: () => get<ExperimentListResponse>("/api/experiments/"),
+
+  create: (data: ExperimentCreate) =>
+    post<ExperimentResponse>("/api/experiments/", data),
+
+  get: (id: string) => get<ExperimentResponse>(`/api/experiments/${id}`),
+
+  update: (id: string, data: ExperimentUpdate) =>
+    put<ExperimentResponse>(`/api/experiments/${id}`, data),
+
+  delete: (id: string) => del<void>(`/api/experiments/${id}`),
+
+  startSweep: (id: string) =>
+    post<void>(`/api/experiments/${id}/sweep`),
+
+  pause: (id: string) =>
+    post<void>(`/api/experiments/${id}/pause`),
+
+  resume: (id: string) =>
+    post<void>(`/api/experiments/${id}/resume`),
+
+  stop: (id: string) =>
+    post<void>(`/api/experiments/${id}/stop`),
+};
+
+// ---------------------------------------------------------------------------
+// Runs
+// ---------------------------------------------------------------------------
+
+export const runs = {
+  list: (experimentId: string) =>
+    get<RunListResponse>(`/api/runs/experiments/${experimentId}/runs`),
+
+  get: (runId: string) =>
+    get<RunDetailResponse>(`/api/runs/${runId}`),
+
+  create: (data: Record<string, unknown>) =>
+    post<RunResponse>("/api/runs/", data),
+
+  score: (runId: string, data: ScoreInput) =>
+    post<ScoreResponse>(`/api/runs/${runId}/score`, data),
+
+  leaderboard: (experimentId: string) =>
+    get<RunListResponse>(
+      `/api/runs/experiments/${experimentId}/leaderboard`,
+    ),
+};
+
+// ---------------------------------------------------------------------------
+// Endpoints (LLM targets)
+// ---------------------------------------------------------------------------
+
+export const endpoints = {
+  list: () => get<EndpointListResponse>("/api/endpoints/"),
+
+  create: (data: EndpointCreate) =>
+    post<EndpointResponse>("/api/endpoints/", data),
+
+  update: (id: string, data: EndpointUpdate) =>
+    put<EndpointResponse>(`/api/endpoints/${id}`, data),
+
+  delete: (id: string) => del<void>(`/api/endpoints/${id}`),
+
+  test: (id: string) =>
+    post<Record<string, unknown>>(`/api/endpoints/${id}/test`),
+};
+
+// ---------------------------------------------------------------------------
+// Export
+// ---------------------------------------------------------------------------
+
+export const exportApi = {
+  best: (experimentId: string) =>
+    get<Record<string, unknown>>(
+      `/api/export/experiments/${experimentId}/best`,
+    ),
+
+  env: (experimentId: string) =>
+    get<string>(`/api/export/experiments/${experimentId}/env`),
+
+  yaml: (experimentId: string) =>
+    get<string>(`/api/export/experiments/${experimentId}/yaml`),
+
+  report: (experimentId: string) =>
+    get<string>(`/api/export/experiments/${experimentId}/report`),
+};
+
+// ---------------------------------------------------------------------------
+// Webhooks
+// ---------------------------------------------------------------------------
+
+export const webhooks = {
+  list: () => get<WebhookListResponse>("/api/webhooks/"),
+
+  create: (data: WebhookCreate) =>
+    post<WebhookResponse>("/api/webhooks/", data),
+
+  delete: (id: string) => del<void>(`/api/webhooks/${id}`),
+};
+
+// ---------------------------------------------------------------------------
+// Admin
+// ---------------------------------------------------------------------------
+
+export const admin = {
+  getSettings: () =>
+    get<Record<string, unknown>>("/api/admin/settings"),
+
+  updateSettings: (data: Record<string, unknown>) =>
+    put<Record<string, unknown>>("/api/admin/settings", data),
+
+  getStats: () => get<Record<string, unknown>>("/api/admin/stats"),
+};
+
+// ---------------------------------------------------------------------------
+// WebSocket helper
+// ---------------------------------------------------------------------------
+
+export type WsMessageHandler = (data: unknown) => void;
+
+export interface WsConnection {
+  send: (data: unknown) => void;
+  close: () => void;
+}
+
+/**
+ * Connect to the real-time WebSocket endpoint.
+ *
+ * @param onMessage  Called for each incoming message.
+ * @param onClose    Optional callback when connection closes.
+ * @returns Object with `send()` and `close()` methods.
+ */
+export function connectWebSocket(
+  onMessage: WsMessageHandler,
+  onClose?: () => void,
+): WsConnection {
+  const protocol = window.location.protocol === "https:" ? "wss:" : "ws:";
+  const wsUrl = `${protocol}//${window.location.host}/ws`;
+  const ws = new WebSocket(wsUrl);
+
+  ws.onmessage = (event) => {
+    try {
+      const data: unknown = JSON.parse(event.data as string);
+      onMessage(data);
+    } catch {
+      onMessage(event.data);
+    }
+  };
+
+  ws.onclose = () => {
+    onClose?.();
+  };
+
+  return {
+    send: (data: unknown) => {
+      if (ws.readyState === WebSocket.OPEN) {
+        ws.send(JSON.stringify(data));
+      }
+    },
+    close: () => {
+      ws.close();
+    },
+  };
+}
--- a/frontend/src/components/.gitkeep
+++ b/frontend/src/components/.gitkeep
--- a/frontend/src/index.css
+++ b/frontend/src/index.css
@ -0,0 +1,3 @@
+@tailwind base;
+@tailwind components;
+@tailwind utilities;
--- a/frontend/src/main.tsx
+++ b/frontend/src/main.tsx
@ -0,0 +1,13 @@
+import React from "react";
+import ReactDOM from "react-dom/client";
+import { BrowserRouter } from "react-router-dom";
+import App from "./App";
+import "./index.css";
+
+ReactDOM.createRoot(document.getElementById("root")!).render(
+  <React.StrictMode>
+    <BrowserRouter>
+      <App />
+    </BrowserRouter>
+  </React.StrictMode>,
+);
--- a/frontend/src/pages/AdminPage.tsx
+++ b/frontend/src/pages/AdminPage.tsx
@ -0,0 +1,8 @@
+export default function AdminPage() {
+  return (
+    <div className="p-8">
+      <h1 className="mb-4 text-2xl font-bold">Admin</h1>
+      <p className="text-gray-600">System administration and user management.</p>
+    </div>
+  );
+}
--- a/frontend/src/pages/ComparePage.tsx
+++ b/frontend/src/pages/ComparePage.tsx
@ -0,0 +1,8 @@
+export default function ComparePage() {
+  return (
+    <div className="p-8">
+      <h1 className="mb-4 text-2xl font-bold">Compare</h1>
+      <p className="text-gray-600">Compare results across runs and experiments.</p>
+    </div>
+  );
+}
--- a/frontend/src/pages/DashboardPage.tsx
+++ b/frontend/src/pages/DashboardPage.tsx
@ -0,0 +1,8 @@
+export default function DashboardPage() {
+  return (
+    <div className="p-8">
+      <h1 className="mb-4 text-2xl font-bold">Dashboard</h1>
+      <p className="text-gray-600">Overview of recent experiments and runs.</p>
+    </div>
+  );
+}
--- a/frontend/src/pages/ExperimentPage.tsx
+++ b/frontend/src/pages/ExperimentPage.tsx
@ -0,0 +1,8 @@
+export default function ExperimentPage() {
+  return (
+    <div className="p-8">
+      <h1 className="mb-4 text-2xl font-bold">Experiment</h1>
+      <p className="text-gray-600">Configure and run prompt experiments.</p>
+    </div>
+  );
+}
--- a/frontend/src/pages/LivePage.tsx
+++ b/frontend/src/pages/LivePage.tsx
@ -0,0 +1,8 @@
+export default function LivePage() {
+  return (
+    <div className="p-8">
+      <h1 className="mb-4 text-2xl font-bold">Live</h1>
+      <p className="text-gray-600">Real-time experiment progress and results.</p>
+    </div>
+  );
+}
--- a/frontend/src/pages/LoginPage.tsx
+++ b/frontend/src/pages/LoginPage.tsx
@ -0,0 +1,10 @@
+export default function LoginPage() {
+  return (
+    <div className="flex min-h-screen items-center justify-center bg-gray-50">
+      <div className="w-full max-w-md rounded-lg bg-white p-8 shadow">
+        <h1 className="mb-4 text-2xl font-bold">Sign In</h1>
+        <p className="text-gray-600">Log in to PromptLooper.</p>
+      </div>
+    </div>
+  );
+}
--- a/frontend/src/pages/ProjectsPage.tsx
+++ b/frontend/src/pages/ProjectsPage.tsx
@ -0,0 +1,8 @@
+export default function ProjectsPage() {
+  return (
+    <div className="p-8">
+      <h1 className="mb-4 text-2xl font-bold">Projects</h1>
+      <p className="text-gray-600">Manage your prompt tuning projects.</p>
+    </div>
+  );
+}
--- a/frontend/src/pages/SetupPage.tsx
+++ b/frontend/src/pages/SetupPage.tsx
@ -0,0 +1,10 @@
+export default function SetupPage() {
+  return (
+    <div className="flex min-h-screen items-center justify-center bg-gray-50">
+      <div className="w-full max-w-md rounded-lg bg-white p-8 shadow">
+        <h1 className="mb-4 text-2xl font-bold">PromptLooper Setup</h1>
+        <p className="text-gray-600">Create your admin account to get started.</p>
+      </div>
+    </div>
+  );
+}
--- a/frontend/src/test-setup.ts
+++ b/frontend/src/test-setup.ts
@ -0,0 +1 @@
+import "@testing-library/jest-dom/vitest";
--- a/frontend/src/vite-env.d.ts
+++ b/frontend/src/vite-env.d.ts
@ -0,0 +1 @@
+/// <reference types="vite/client" />
--- a/frontend/tailwind.config.js
+++ b/frontend/tailwind.config.js
@ -0,0 +1,8 @@
+/** @type {import('tailwindcss').Config} */
+export default {
+  content: ["./index.html", "./src/**/*.{js,ts,jsx,tsx}"],
+  theme: {
+    extend: {},
+  },
+  plugins: [],
+};
--- a/frontend/tsconfig.json
+++ b/frontend/tsconfig.json
@ -0,0 +1,21 @@
+{
+  "compilerOptions": {
+    "target": "ES2020",
+    "useDefineForClassFields": true,
+    "lib": ["ES2020", "DOM", "DOM.Iterable"],
+    "module": "ESNext",
+    "skipLibCheck": true,
+    "moduleResolution": "bundler",
+    "allowImportingTsExtensions": true,
+    "isolatedModules": true,
+    "moduleDetection": "force",
+    "noEmit": true,
+    "jsx": "react-jsx",
+    "strict": true,
+    "noUnusedLocals": true,
+    "noUnusedParameters": true,
+    "noFallthroughCasesInSwitch": true,
+    "forceConsistentCasingInFileNames": true
+  },
+  "include": ["src"]
+}
--- a/frontend/vite.config.ts
+++ b/frontend/vite.config.ts
@ -0,0 +1,25 @@
+import { defineConfig } from "vite";
+import react from "@vitejs/plugin-react";
+
+export default defineConfig({
+  plugins: [react()],
+  build: {
+    outDir: "dist",
+  },
+  server: {
+    port: 5173,
+    proxy: {
+      "/api": "http://localhost:8000",
+      "/ws": {
+        target: "ws://localhost:8000",
+        ws: true,
+      },
+      "/health": "http://localhost:8000",
+    },
+  },
+  test: {
+    environment: "jsdom",
+    globals: true,
+    setupFiles: ["./src/test-setup.ts"],
+  },
+});
--- a/promptlooper-spec.md
+++ b/promptlooper-spec.md
@ -0,0 +1,635 @@
+# PromptLooper
+
+> The one who loops prompts — a universal LLM pipeline tuning workbench.
+
+PromptLooper is a self-hosted tool for systematically optimizing LLM prompts, model selection, and inference parameters. It runs experiments across prompt × model × parameter combinations, caches every response, scores results against pluggable evaluation functions, and surfaces the best configurations through a real-time observability dashboard with human-in-the-loop steering.
+
+It ships as a single Docker container (SQLite mode) for zero-config quickstart, or a Docker Compose stack (Postgres + Redis) for production use. An MCP server enables any AI agent to drive PromptLooper programmatically — creating experiments, running sweeps, and reading results without human intervention.
+
+---
+
+## Problem Statement
+
+Anyone building LLM-powered applications faces the same painful loop:
+
+1. Write a system prompt
+2. Pick a model and parameters (temperature, top_p, max_tokens, etc.)
+3. Run it against sample data
+4. Read the output and decide if it's "good enough"
+5. Tweak something and repeat
+
+This process is manual, unscientific, and wasteful. There's no way to:
+- Systematically compare configurations side-by-side
+- Know if you've already tested a particular combination
+- Quantify "better" beyond gut feeling
+- Let an agent handle the iteration while you steer from above
+- Share optimized configurations between projects or team members
+
+PromptLooper makes this process systematic, observable, cached, and agent-drivable.
+
+---
+
+## Target Users
+
+| User | Use Case |
+|------|----------|
+| **Solo developer** | Tuning prompts for a side project, wants to try 5 models and find the sweet spot |
+| **Team building RAG pipelines** | Optimizing chunking + embedding + retrieval + synthesis prompts across stages |
+| **AI agent (via MCP)** | Autonomously running optimization sweeps, reporting back to human when done |
+| **Prompt engineer** | A/B testing prompt variants at scale with quantified scoring |
+| **Infrastructure team** | Benchmarking new models against existing baselines before migration |
+
+---
+
+## Core Concepts
+
+### Experiment
+
+A named configuration that defines:
+- **Sample data**: Input documents, queries, or any text the pipeline will process
+- **Pipeline stages**: 1-N sequential stages, each with its own prompt template and model config
+- **Evaluation criteria**: Scoring functions that grade the output
+- **Parameter space**: What to vary (prompt text, model, temperature, top_p, chunk_size, etc.)
+
+### Run
+
+A single execution of one specific configuration within an experiment. A run captures:
+- Full input configuration (prompt, model, all parameters)
+- Raw LLM response(s)
+- Timing data (latency, tokens in/out)
+- Evaluation scores
+- Configuration hash (for cache deduplication)
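+
+A minimal sketch of how the configuration hash could be computed. It assumes the config is serialized to JSON with object keys sorted at every level before hashing. The `canonicalJson` and `configHash` names are illustrative, not part of the PromptLooper API.
+
+```typescript
+import { createHash } from "node:crypto";
+
+// Serialize a value as JSON with object keys sorted at every level, so two
+// configs that differ only in key order produce the same string.
+// (Sketch only: undefined values and non-JSON types are not handled.)
+function canonicalJson(value: unknown): string {
+  if (Array.isArray(value)) {
+    return `[${value.map((item) => canonicalJson(item)).join(",")}]`;
+  }
+  if (value !== null && typeof value === "object") {
+    const obj = value as Record<string, unknown>;
+    const entries = Object.keys(obj)
+      .sort()
+      .map((key) => `${JSON.stringify(key)}:${canonicalJson(obj[key])}`);
+    return `{${entries.join(",")}}`;
+  }
+  return JSON.stringify(value);
+}
+
+// Hex-encoded SHA-256 of the canonical form: 64 characters, which fits the
+// string(64) config_hash column.
+function configHash(config: Record<string, unknown>): string {
+  return createHash("sha256").update(canonicalJson(config)).digest("hex");
+}
+```
+
+Sorting keys matters because JavaScript objects keep insertion order. Without it, the same configuration assembled in two places could hash differently and miss the cache.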
+
+### Sweep
+
+A batch of runs that systematically explores a parameter space. Types:
+- **Grid sweep**: Every combination of specified parameter values
+- **Random sweep**: Random sampling from parameter ranges
+- **Guided sweep**: Agent-driven, where results from previous runs inform the next configuration to try
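+
+A grid sweep is the Cartesian product of the listed values. Below is a minimal sketch. It assumes `parameter_space` maps each parameter name to an explicit list of candidate values; the data model stores it as JSONB without fixing a shape, and `gridSweep` is a hypothetical name.
+
+```typescript
+type ParameterSpace = Record<string, unknown[]>;
+type Config = Record<string, unknown>;
+
+// Expand a parameter space into every combination of its values.
+// An empty space yields one empty config; an empty value list yields none.
+function gridSweep(space: ParameterSpace): Config[] {
+  let configs: Config[] = [{}];
+  for (const [name, values] of Object.entries(space)) {
+    const next: Config[] = [];
+    for (const config of configs) {
+      for (const value of values) {
+        next.push({ ...config, [name]: value });
+      }
+    }
+    configs = next;
+  }
+  return configs;
+}
+```
+
+Two models and three temperatures yield 2 × 3 = 6 configurations. Each extra parameter multiplies the count again, which is where config-hash caching pays off on repeated sweeps.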
+
+### Scoring Function
+
+A pluggable evaluation that takes (input, output, context) and returns a numeric score. Built-in options:
+- **Embedding similarity**: How semantically close is the output to a reference answer?
+- **Length compliance**: Does the output meet length constraints?
+- **Format compliance**: Does the output match expected structure (JSON, markdown, etc.)?
+- **Keyword presence**: Do required terms appear in the output?
+- **Human rating**: Manual thumbs-up/down or 1-5 star rating from the dashboard
+- **LLM-as-judge**: Use a separate LLM call to evaluate quality (configurable judge prompt)
+- **Custom function**: User-provided Python snippet or HTTP webhook
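+
+Two of the simpler built-in scorers might look like the sketch below. It assumes the (input, output, context) signature described above, with scorer options passed in `context`. The option names `requiredKeywords`, `minLength`, and `maxLength` are hypothetical. Both scorers return a value in 0.0–1.0, matching the normalized `value` column on Score.
+
+```typescript
+// Hypothetical scorer options; the spec does not define these names.
+interface ScorerContext {
+  requiredKeywords?: string[];
+  minLength?: number; // characters
+  maxLength?: number; // characters
+}
+
+type Scorer = (input: string, output: string, context: ScorerContext) => number;
+
+// Keyword presence: fraction of required keywords found in the output,
+// compared case-insensitively. No required keywords means a perfect score.
+const keywordPresence: Scorer = (_input, output, context) => {
+  const keywords = context.requiredKeywords ?? [];
+  if (keywords.length === 0) return 1;
+  const text = output.toLowerCase();
+  const hits = keywords.filter((k) => text.includes(k.toLowerCase())).length;
+  return hits / keywords.length;
+};
+
+// Length compliance: 1.0 inside [minLength, maxLength], falling off linearly
+// with the distance outside the range, relative to the violated bound.
+const lengthCompliance: Scorer = (_input, output, context) => {
+  const len = output.length;
+  const min = context.minLength ?? 0;
+  const max = context.maxLength ?? Infinity;
+  if (len < min) return len / min;
+  if (len > max) return Math.max(0, 1 - (len - max) / max);
+  return 1;
+};
+```
+
+LLM-as-judge and human rating fit the same contract. In practice they would likely be asynchronous, since they wait on a model call or a person.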
+
+### Project
+
+A workspace that groups related experiments. Users can return to a project and pick up where they left off. Projects store:
+- All experiments and their runs
+- Saved "best" configurations
+- Notes and annotations
+- Export history
+
+---
+
+## Architecture
+
+```
+┌──────────────────────────────────────────────────────────────────────────┐
+│  Docker Compose: xpltd_promptlooper (ub01)                               │
+│  Network: promptlooper (172.33.0.0/24)                                   │
+│                                                                          │
+│  ┌─────────────┐  ┌─────────────┐  ┌──────────────────────────────────┐  │
+│  │  PostgreSQL │  │    Redis    │  │         FastAPI (API)            │  │
+│  │  :5434      │  │  job queue  │  │  Experiments, Runs, Scoring,     │  │
+│  │  experiments│  │  pub/sub    │  │  Projects, Auth, MCP Server      │  │
+│  │  runs, cache│  │  live state │  │  WebSocket for live dashboard    │  │
+│  └─────┬───────┘  └──────┬──────┘  └──────────────┬───────────────────┘  │
+│        │                 │                        │                      │
+│  ┌─────┴─────────────────┴────────────────────────┴───────────────────┐  │
+│  │                      Celery Worker                                 │  │
+│  │  Executes runs against target LLM endpoints                        │  │
+│  │  Caches responses by config hash                                   │  │
+│  │  Streams progress via Redis pub/sub                                │  │
+│  └────────────────────────────────────────────────────────────────────┘  │
+│                                                                          │
+│  ┌────────────────────────────────────────────────────────────────────┐  │
+│  │                    Web UI (React + Vite)                           │  │
+│  │  nginx → :8400                                                     │  │
+│  │  Dashboard, Experiment Builder, Live Observability, Steering       │  │
+│  └────────────────────────────────────────────────────────────────────┘  │
+└──────────────────────────────────────────────────────────────────────────┘
+                              │
+                              │  HTTP (OpenAI-compatible)
+                              ▼
+              ┌───────────────────────────────┐
+              │  Target LLM Endpoints          │
+              │  OpenWebUI, vLLM, Ollama,      │
+              │  OpenAI, Anthropic, any        │
+              │  OpenAI-compatible API         │
+              └───────────────────────────────┘
+```
+
+### Services (Production Compose)
+
+| Service | Image | Port | Purpose |
+|---------|-------|------|---------|
+| `promptlooper-db` | `postgres:16-alpine` | `5434 → 5432` | Primary data store |
+| `promptlooper-redis` | `redis:7-alpine` | — | Celery broker + pub/sub for live dashboard |
+| `promptlooper-api` | `Dockerfile` | `8000` | FastAPI REST API + MCP server |
+| `promptlooper-worker` | `Dockerfile` | — | Celery worker (run execution) |
+| `promptlooper-web` | `Dockerfile` | `8400 → 80` | React frontend (nginx) |
+
+### Single Container Mode
+
+When `DATABASE_URL` is not set, PromptLooper runs with:
+- SQLite at `/data/promptlooper.db`
+- In-process task queue (no Celery/Redis dependency)
+- All services in one container on port 8400
+
+```bash
+docker run -p 8400:8400 -v promptlooper-data:/data ghcr.io/xpltdco/promptlooper
+```
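+
+The mode switch comes down to which connection strings are present. Below is a sketch of that decision. It is written in TypeScript for consistency with the other examples here; the backend itself is Python/FastAPI.
+
+Assumptions in the sketch:
+- `resolveRuntime` is a hypothetical name.
+- `DATABASE_URL` and `REDIS_URL` are treated independently, following `.env.example`. This section ties both choices to `DATABASE_URL`.
+- The `/data` default for `DATA_DIR` is taken from the path above.
+
+```typescript
+type Database =
+  | { kind: "postgres"; url: string }
+  | { kind: "sqlite"; path: string };
+
+interface RuntimeConfig {
+  database: Database;
+  queue: "celery" | "in-process";
+}
+
+// Hypothetical helper: choose storage and queue backends from the environment.
+function resolveRuntime(env: Record<string, string | undefined>): RuntimeConfig {
+  const dataDir = env.DATA_DIR ?? "/data"; // assumed default
+  // Postgres when DATABASE_URL is set, otherwise a SQLite file under DATA_DIR.
+  const database: Database = env.DATABASE_URL
+    ? { kind: "postgres", url: env.DATABASE_URL }
+    : { kind: "sqlite", path: `${dataDir}/promptlooper.db` };
+  // Celery over Redis when REDIS_URL is set, otherwise the in-process queue.
+  const queue = env.REDIS_URL ? "celery" : "in-process";
+  return { database, queue };
+}
+```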
+
+---
+
+## Data Model
+
+### User
+| Field | Type | Notes |
+|-------|------|-------|
+| id | UUID | PK |
+| username | string | Unique, "admin" created on first boot |
+| password_hash | string | bcrypt |
+| is_admin | bool | Default true for first user |
+| created_at | timestamp | |
+
+### Project
+| Field | Type | Notes |
+|-------|------|-------|
+| id | UUID | PK |
+| name | string | |
+| description | text | Optional |
+| owner_id | UUID | FK → User |
+| created_at | timestamp | |
+| updated_at | timestamp | |
+
+### Experiment
+| Field | Type | Notes |
+|-------|------|-------|
+| id | UUID | PK |
+| project_id | UUID | FK → Project |
+| name | string | |
+| description | text | Optional |
+| sample_data | JSONB | Input documents/queries |
+| pipeline_stages | JSONB | Stage definitions with prompt templates |
+| scoring_config | JSONB | Which scoring functions to use and their weights |
+| parameter_space | JSONB | What to vary and ranges/options |
+| status | enum | draft, running, paused, completed |
+| created_at | timestamp | |
+| updated_at | timestamp | |
+
+### Run
+| Field | Type | Notes |
+|-------|------|-------|
+| id | UUID | PK |
+| experiment_id | UUID | FK → Experiment |
+| config_hash | string(64) | SHA-256 of full configuration (for cache dedup) |
+| config | JSONB | Complete configuration snapshot |
+| status | enum | pending, running, completed, failed, cached |
+| started_at | timestamp | |
+| completed_at | timestamp | |
+| duration_ms | int | Wall clock time |
+| tokens_in | int | Total input tokens across all stages |
+| tokens_out | int | Total output tokens |
+| cost_estimate | decimal | Estimated cost based on model pricing |
+
+### StageResult
+| Field | Type | Notes |
+|-------|------|-------|
+| id | UUID | PK |
+| run_id | UUID | FK → Run |
+| stage_index | int | 0-based stage number |
+| prompt_sent | text | Actual prompt after template rendering |
+| response_raw | text | Raw LLM response |
+| model_used | string | Model identifier |
+| parameters | JSONB | Temperature, top_p, etc. |
+| tokens_in | int | This stage |
+| tokens_out | int | This stage |
+| latency_ms | int | This stage |
+
+### Score
+| Field | Type | Notes |
+|-------|------|-------|
+| id | UUID | PK |
+| run_id | UUID | FK → Run |
+| scorer_name | string | e.g. "embedding_similarity", "human_rating" |
+| value | float | Normalized 0.0–1.0 |
+| metadata | JSONB | Scorer-specific details |
+| created_at | timestamp | |
+
+### ResponseCache
+| Field | Type | Notes |
+|-------|------|-------|
+| config_hash | string(64) | PK — SHA-256 of (prompt + model + params + input) |
+| response | text | Cached LLM response |
+| model | string | |
+| tokens_in | int | |
+| tokens_out | int | |
+| latency_ms | int | Original latency |
+| created_at | timestamp | |
+
+### WebhookConfig
+| Field | Type | Notes |
+|-------|------|-------|
+| id | UUID | PK |
+| event_type | string | One of the Webhook Events below, e.g. experiment.completed, new_best_found, budget.exhausted, human_needed |
+| url | string | Target URL |
+| headers | JSONB | Optional auth headers |
+| is_active | bool | |
+
+---
+
+## API Endpoints
+
+### Auth
+| Method | Path | Description |
+|--------|------|-------------|
+| POST | `/api/v1/auth/setup` | First-boot admin password setup |
+| POST | `/api/v1/auth/login` | Login, returns JWT |
+| GET | `/api/v1/auth/me` | Current user info |
+
+### Admin
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/v1/admin/settings` | System settings (guest access, default model, etc.) |
+| PUT | `/api/v1/admin/settings` | Update settings |
+| GET | `/api/v1/admin/stats` | System-wide stats (total runs, cache hit rate, etc.) |
+
+### Projects
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/v1/projects` | List projects |
+| POST | `/api/v1/projects` | Create project |
+| GET | `/api/v1/projects/{id}` | Project detail with experiment summaries |
+| PUT | `/api/v1/projects/{id}` | Update project |
+| DELETE | `/api/v1/projects/{id}` | Delete project and all experiments |
+
+### Experiments
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/v1/experiments` | List experiments (filter by project) |
+| POST | `/api/v1/experiments` | Create experiment |
+| GET | `/api/v1/experiments/{id}` | Experiment detail with run summaries |
+| PUT | `/api/v1/experiments/{id}` | Update experiment config |
+| DELETE | `/api/v1/experiments/{id}` | Delete experiment |
+| POST | `/api/v1/experiments/{id}/sweep` | Start a sweep (grid, random, or guided) |
+| POST | `/api/v1/experiments/{id}/pause` | Pause running sweep |
+| POST | `/api/v1/experiments/{id}/resume` | Resume paused sweep |
+| POST | `/api/v1/experiments/{id}/stop` | Stop sweep |
+
+### Runs
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/v1/experiments/{id}/runs` | List runs with scores (sortable, filterable) |
+| GET | `/api/v1/runs/{id}` | Run detail with stage results |
+| POST | `/api/v1/runs` | Execute a single run (ad-hoc) |
+| POST | `/api/v1/runs/{id}/score` | Add human rating to a run |
+| GET | `/api/v1/experiments/{id}/leaderboard` | Top runs ranked by weighted score |
+
+### Export
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/v1/experiments/{id}/export/best` | Best config as JSON |
+| GET | `/api/v1/experiments/{id}/export/env` | Best config as .env snippet |
+| GET | `/api/v1/experiments/{id}/export/yaml` | Best config as YAML |
+| GET | `/api/v1/experiments/{id}/export/report` | Full experiment report (markdown) |
+
+### LLM Endpoints (Target Management)
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/v1/endpoints` | List configured LLM endpoints |
+| POST | `/api/v1/endpoints` | Add endpoint (URL, API key, label) |
+| PUT | `/api/v1/endpoints/{id}` | Update endpoint |
+| DELETE | `/api/v1/endpoints/{id}` | Remove endpoint |
+| POST | `/api/v1/endpoints/{id}/test` | Test connectivity and list available models |
+
+### Webhooks
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/api/v1/webhooks` | List webhook configs |
+| POST | `/api/v1/webhooks` | Create webhook |
+| DELETE | `/api/v1/webhooks/{id}` | Remove webhook |
+
+### WebSocket
+| Path | Description |
+|------|-------------|
+| `/ws/experiments/{id}` | Live stream: run progress, scores, stage completions |
+| `/ws/dashboard` | Global activity feed across all experiments |
+
+### Health
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/health` | Health check (database connectivity, plus Redis when `REDIS_URL` is set) |
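+
+For scripts and CI, a call against these endpoints looks roughly like the following. This sketch only builds `urllib.request.Request` objects without sending them; the base URL, the helper name `api_request`, and the `{"strategy": "grid"}` sweep body are assumptions, since the request schemas are not specified here.
+
+```python
+import json
+import urllib.request
+
+BASE_URL = "http://localhost:8400"  # assumption: API reachable on the web port
+
+
+def api_request(method: str, path: str, token: str | None = None,
+                body: dict | None = None) -> urllib.request.Request:
+    """Hypothetical helper: build (not send) a PromptLooper API request."""
+    headers = {"Accept": "application/json"}
+    data = None
+    if body is not None:
+        data = json.dumps(body).encode("utf-8")
+        headers["Content-Type"] = "application/json"
+    if token:
+        # JWTs and API keys both travel as a bearer token.
+        headers["Authorization"] = f"Bearer {token}"
+    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)
+
+
+# Hypothetical sweep payload; the real schema belongs in backend/schemas.py.
+req = api_request("POST", "/api/v1/experiments/123/sweep", token="my-api-key",
+                  body={"strategy": "grid"})
+# urllib.request.urlopen(req) would actually send it.
+```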
+
+---
+
+## MCP Server
+
+PromptLooper exposes an MCP (Model Context Protocol) server so AI agents can drive it programmatically. The MCP server runs as part of the API service.
+
+### MCP Tools
+
+| Tool | Description |
+|------|-------------|
+| `create_project` | Create a new project workspace |
+| `create_experiment` | Define an experiment with sample data, stages, and scoring |
+| `configure_endpoint` | Add or update an LLM target endpoint |
+| `run_single` | Execute one specific configuration and return results |
+| `run_sweep` | Start a parameter sweep (grid/random/guided) |
+| `get_leaderboard` | Get top N configurations ranked by score |
+| `get_run_detail` | Get full details of a specific run |
+| `export_best_config` | Export the best configuration in JSON/YAML/env format |
+| `pause_sweep` | Pause a running sweep |
+| `resume_sweep` | Resume a paused sweep |
+| `add_human_score` | Rate a run's output |
+| `get_experiment_status` | Check experiment progress |
+| `list_models` | List available models across all configured endpoints |
+
+### Example Agent Interaction
+
+```
+Agent: "Create a project called 'Chrysopedia Extraction' and an experiment
+        that tests the stage3_extraction prompt against Qwen-72B and Qwen-32B,
+        sweeping temperature from 0.1 to 0.9 in 0.2 increments.
+        Use embedding similarity scoring against these reference outputs.
+        Run a grid sweep."
+
+PromptLooper MCP: [create_project] → [create_experiment] → [run_sweep]
+                  → streams progress → [get_leaderboard]
+
+Agent: "The top config uses Qwen-72B at temperature 0.3. Export it as
+        a .env snippet I can drop into Chrysopedia."
+
+PromptLooper MCP: [export_best_config format=env]
+```
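+
+A grid sweep is the Cartesian product of every option list in `parameter_space`. The sketch below enumerates the sweep from the example request: two models times five temperatures (0.1, 0.3, 0.5, 0.7, 0.9) gives 10 runs. The helper names and the exact model identifiers are illustrative assumptions.
+
+```python
+from itertools import product
+
+
+def frange(start: float, stop: float, step: float) -> list[float]:
+    """Inclusive float range, rounded to avoid 0.30000000000000004-style drift."""
+    count = int(round((stop - start) / step)) + 1
+    return [round(start + i * step, 10) for i in range(count)]
+
+
+def grid(parameter_space: dict[str, list]) -> list[dict]:
+    """Every combination of the option lists, one dict per run configuration."""
+    keys = list(parameter_space)
+    return [dict(zip(keys, combo)) for combo in product(*(parameter_space[k] for k in keys))]
+
+
+space = {
+    "model": ["qwen2.5-72b-instruct", "qwen2.5-32b-instruct"],  # illustrative IDs
+    "temperature": frange(0.1, 0.9, 0.2),
+}
+configs = grid(space)
+print(len(configs))  # 10
+```
+
+Random and guided sweeps would draw from the same space instead of exhausting it, which matters once the product grows large.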
+
+---
+
+## Response Caching
+
+Every LLM call is cached by a SHA-256 hash of:
+- Prompt text (after template rendering)
+- Model identifier
+- All inference parameters (temperature, top_p, max_tokens, etc.)
+- Input data
+
+If an identical configuration has been run before, the cached response is returned instantly with `status: cached`. This means:
+- Re-running experiments with new scoring functions costs zero generation tokens (an LLM-judge scorer still spends tokens on its own calls)
+- Adding a new scorer retroactively evaluates all historical runs
+- Accidentally re-running a sweep wastes nothing
+- Cache can be invalidated per-run or per-experiment if needed
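+
+One way to compute the key is to hash a canonical JSON encoding, so that the order of parameters never changes the hash. This is a sketch assuming JSON canonicalization; the spec fixes SHA-256 but not the serialization, and `cache_key` is a hypothetical name.
+
+```python
+import hashlib
+import json
+
+
+def cache_key(prompt: str, model: str, params: dict, input_data) -> str:
+    """SHA-256 hex digest over everything that can change the LLM response."""
+    payload = {"prompt": prompt, "model": model, "params": params, "input": input_data}
+    # sort_keys + fixed separators make the encoding independent of dict order.
+    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
+    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
+```
+
+The 64-character hex digest fits the `string(64)` primary key of `ResponseCache`.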
+
+---
+
+## Authentication Model
+
+### First Boot
+- App detects no users exist
+- Presents a setup screen: create admin username + password
+- Admin account is created, user is logged in
+
+### Guest Access
+- Admin can toggle `allow_guest_access` in settings
+- Guests can view experiments and results (read-only)
+- Guests cannot create experiments, run sweeps, or modify configs
+- Default: guest access disabled
+
+### API Authentication
+- JWT tokens for the web UI
+- API key (generated in admin settings, or set statically via `API_KEY`) for programmatic access and MCP
+- API key passed via `Authorization: Bearer <key>` header
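+
+The guest and API-key rules above reduce to two checks. A minimal sketch, assuming guests are unauthenticated requests and that "read-only" means safe HTTP methods; the function names are hypothetical, and JWT decoding is left out.
+
+```python
+import hmac
+
+READ_ONLY_METHODS = {"GET", "HEAD", "OPTIONS"}
+
+
+def api_key_matches(presented: str, configured: str | None) -> bool:
+    """Constant-time comparison; API-key auth is disabled when no key is configured."""
+    if not configured:
+        return False
+    return hmac.compare_digest(presented.encode(), configured.encode())
+
+
+def is_allowed(authenticated: bool, method: str, allow_guest_access: bool) -> bool:
+    """Authenticated users may do anything; guests may only read, and only if enabled."""
+    if authenticated:
+        return True
+    return allow_guest_access and method.upper() in READ_ONLY_METHODS
+```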
+
+---
+
+## Real-Time Observability Dashboard
+
+The dashboard is the primary user interface during active experimentation. It provides:
+
+### Live Experiment View
+- Progress bar: X of Y runs completed
+- Token usage accumulator (running total)
+- Cost estimate (based on configured model pricing)
+- Cache hit rate for current sweep
+- Estimated time remaining
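+
+The cache hit rate and time remaining can be derived from a few counters. A naive sketch, assuming the estimate is extrapolated from the average wall-clock time per completed run; `SweepProgress` is a hypothetical name.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class SweepProgress:
+    total_runs: int
+    completed: int
+    cache_hits: int
+    elapsed_s: float  # wall-clock seconds since the sweep started
+
+    @property
+    def cache_hit_rate(self) -> float:
+        return self.cache_hits / self.completed if self.completed else 0.0
+
+    @property
+    def eta_s(self) -> float | None:
+        """Remaining runs times the observed wall-clock seconds per completed run."""
+        if self.completed == 0:
+            return None
+        return self.elapsed_s / self.completed * (self.total_runs - self.completed)
+```
+
+Because cache hits finish almost instantly, a sweep with many early hits will under-estimate the time remaining; averaging over uncached runs only is one refinement.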
+
+### Side-by-Side Output Comparison
+- Pick any two runs and diff their outputs
+- Highlight differences in prompt, parameters, and response
+- Score comparison overlay
+
+### Leaderboard
+- Real-time ranked list of runs by weighted score
+- Sortable by any individual scorer
+- Click to expand full run detail
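+
+Ranking by weighted score combines each run's normalized (0.0 to 1.0) scorer values using the weights in `scoring_config`. A sketch, assuming a weighted mean in which a missing scorer counts as 0.0 and rejected runs are filtered out; the function names are hypothetical.
+
+```python
+def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
+    """Weighted mean of normalized scorer values; a missing scorer counts as 0.0."""
+    total_weight = sum(weights.values())
+    if total_weight == 0:
+        return 0.0
+    return sum(w * scores.get(name, 0.0) for name, w in weights.items()) / total_weight
+
+
+def leaderboard(
+    runs: dict[str, dict[str, float]],
+    weights: dict[str, float],
+    rejected: frozenset[str] = frozenset(),
+    top_n: int = 10,
+) -> list[tuple[str, float]]:
+    """Top-N (run_id, weighted score) pairs, best first, skipping rejected runs."""
+    ranked = [(run_id, weighted_score(s, weights))
+              for run_id, s in runs.items() if run_id not in rejected]
+    ranked.sort(key=lambda item: item[1], reverse=True)
+    return ranked[:top_n]
+```
+
+Sorting by an individual scorer is the same call with a single-entry `weights` dict.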
+
+### Steering Controls
+- **Pause**: Stop the sweep after the current run completes
+- **Fork**: Create a new experiment branching from current best, with modified parameters
+- **Redirect**: Change remaining sweep parameters mid-flight
+- **Approve**: Mark a configuration as "good enough" and export
+- **Reject**: Exclude a run from leaderboard consideration
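+
+Pause and stop take effect only between runs, so an in-flight LLM call is never cut off. A sketch of that control loop using `threading.Event` flags; `run_sweep` and its signature are hypothetical, though per the project structure this logic would belong in `engine/sweep.py`.
+
+```python
+import threading
+from typing import Callable, Iterable
+
+
+def run_sweep(
+    configs: Iterable[dict],
+    execute: Callable[[dict], None],
+    pause: threading.Event,
+    stop: threading.Event,
+) -> int:
+    """Run configs in order, honoring pause/stop between runs. Returns runs executed."""
+    executed = 0
+    for config in configs:
+        # While paused, wait in short slices so a stop request is noticed promptly.
+        while pause.is_set() and not stop.is_set():
+            stop.wait(0.1)
+        if stop.is_set():
+            break
+        execute(config)  # an in-flight run always completes
+        executed += 1
+    return executed
+```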
+
+### Activity Timeline
+- Chronological feed of events: run started, run completed, new best found, cache hit, error
+- Filterable by event type
+
+---
+
+## Webhook Events
+
+| Event | Payload | Trigger |
+|-------|---------|---------|
+| `experiment.started` | experiment_id, sweep config | Sweep begins |
+| `experiment.completed` | experiment_id, best config, summary stats | All runs finished |
+| `experiment.paused` | experiment_id, reason | Manual or budget pause |
+| `new_best_found` | experiment_id, run_id, scores, config | New top-scoring run |
+| `budget.exhausted` | experiment_id, token_count, cost | Token/cost budget hit |
+| `human_needed` | experiment_id, reason, context | Agent requests human review |
+| `run.failed` | run_id, error | Individual run error |
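+
+Delivering an event means POSTing it to every active `WebhookConfig` whose `event_type` matches. The sketch builds those requests without sending them; the dataclass mirrors the data model table, while the envelope fields (`event`, `sent_at`, `data`) and the function name are assumptions rather than a documented format.
+
+```python
+import json
+import urllib.request
+from dataclasses import dataclass, field
+from datetime import datetime, timezone
+
+
+@dataclass
+class WebhookConfig:
+    event_type: str
+    url: str
+    headers: dict[str, str] = field(default_factory=dict)
+    is_active: bool = True
+
+
+def build_deliveries(event: str, payload: dict,
+                     configs: list[WebhookConfig]) -> list[urllib.request.Request]:
+    """One POST per active webhook subscribed to this event (built, not sent)."""
+    envelope = {"event": event, "sent_at": datetime.now(timezone.utc).isoformat(), "data": payload}
+    body = json.dumps(envelope).encode("utf-8")
+    return [
+        urllib.request.Request(c.url, data=body, method="POST",
+                               headers={"Content-Type": "application/json", **c.headers})
+        for c in configs
+        if c.is_active and c.event_type == event
+    ]
+```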
+
+---
+
+## Configuration Export Formats
+
+### JSON
+```json
+{
+  "model": "qwen2.5-72b-instruct",
+  "endpoint": "http://chat.forgetyour.name/api",
+  "temperature": 0.3,
+  "top_p": 0.85,
+  "max_tokens": 2048,
+  "system_prompt": "You are a music production knowledge extractor...",
+  "score": 0.87,
+  "experiment": "chrysopedia-extraction-v2",
+  "exported_at": "2026-04-06T12:00:00Z"
+}
+```
+
+### .env
+```bash
+LLM_MODEL=qwen2.5-72b-instruct
+LLM_API_URL=http://chat.forgetyour.name/api
+LLM_TEMPERATURE=0.3
+LLM_TOP_P=0.85
+LLM_MAX_TOKENS=2048
+# Score: 0.87 | Experiment: chrysopedia-extraction-v2
+```
+
+### YAML
+```yaml
+model: qwen2.5-72b-instruct
+endpoint: http://chat.forgetyour.name/api
+parameters:
+  temperature: 0.3
+  top_p: 0.85
+  max_tokens: 2048
+system_prompt: |
+  You are a music production knowledge extractor...
+metadata:
+  score: 0.87
+  experiment: chrysopedia-extraction-v2
+  exported_at: 2026-04-06T12:00:00Z
+```
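+
+The .env export is a flat rendering of the same fields as the JSON export. A sketch, assuming the JSON shape shown above as input; `to_env` is a hypothetical name. As in the example, `system_prompt` is omitted because multi-line values do not fit cleanly in a .env file.
+
+```python
+def to_env(best: dict) -> str:
+    """Render an exported best config (JSON export shape) as a .env snippet."""
+    lines = [
+        f"LLM_MODEL={best['model']}",
+        f"LLM_API_URL={best['endpoint']}",
+        f"LLM_TEMPERATURE={best['temperature']}",
+        f"LLM_TOP_P={best['top_p']}",
+        f"LLM_MAX_TOKENS={best['max_tokens']}",
+        f"# Score: {best['score']} | Experiment: {best['experiment']}",
+    ]
+    return "\n".join(lines) + "\n"
+```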
+
+---
+
+## Environment Variables
+
+| Group | Variable | Default | Notes |
+|-------|----------|---------|-------|
+| **Database** | `DATABASE_URL` | (none → SQLite) | PostgreSQL connection string |
+| **Redis** | `REDIS_URL` | (none → in-process) | Redis connection string |
+| **Server** | `HOST` | `0.0.0.0` | Bind address |
+| **Server** | `PORT` | `8400` | HTTP port |
+| **Auth** | `JWT_SECRET` | (auto-generated) | JWT signing key |
+| **Auth** | `API_KEY` | (none) | Static API key for programmatic access |
+| **Defaults** | `DEFAULT_ENDPOINT_URL` | (none) | Pre-configured LLM endpoint |
+| **Defaults** | `DEFAULT_ENDPOINT_KEY` | (none) | API key for default endpoint |
+| **Limits** | `MAX_CONCURRENT_RUNS` | `4` | Parallel run limit |
+| **Limits** | `MAX_TOKENS_PER_SWEEP` | `0` (unlimited) | Token budget per sweep |
+| **Storage** | `DATA_DIR` | `/data` | SQLite DB + file storage location |
+| **MCP** | `MCP_ENABLED` | `true` | Enable MCP server |
+| **MCP** | `MCP_PORT` | `8401` | MCP server port |
+
+---
+
+## Docker Compose (Production — XPLTD Conventions)
+
+- Project name: `xpltd_promptlooper`
+- Network: `promptlooper` (`172.33.0.0/24`)
+- Persistent data: `/vmPool/r/services/promptlooper_*`
+- PostgreSQL port: `5434` (external)
+- Web UI port: `8400` (external)
+
+---
+
+## Technology Stack
+
+| Layer | Technology | Rationale |
+|-------|-----------|-----------|
+| **API** | Python 3.12 + FastAPI | Async, OpenAPI auto-gen, matches XPLTD conventions |
+| **Task Queue** | Celery + Redis | Proven for background job execution, matches Chrysopedia |
+| **Database** | PostgreSQL 16 (prod) / SQLite (single-container) | JSONB for flexible experiment configs |
+| **Real-time** | WebSocket via FastAPI + Redis pub/sub | Sub-second dashboard updates |
+| **Frontend** | React 18 + TypeScript + Vite | Real-time dashboard, matches Chrysopedia |
+| **Styling** | Tailwind CSS | Fast iteration, utility-first |
+| **MCP** | Python MCP SDK | Standard protocol for agent integration |
+| **Container** | Multi-stage Docker build | Single image serves both API and frontend |
+
+---
+
+## Development & Deployment
+
+### Local Development
+```bash
+git clone git@git.xpltd.co:xpltdco/promptlooper.git
+cd promptlooper
+cp .env.example .env
+docker compose up -d promptlooper-db promptlooper-redis
+cd backend && pip install -r requirements.txt
+alembic upgrade head
+uvicorn main:app --reload --host 0.0.0.0 --port 8000
+# In another terminal:
+cd frontend && npm install && npm run dev
+```
+
+### Production Deployment (ub01)
+```bash
+ssh ub01
+cd /vmPool/r/repos/xpltdco/promptlooper
+git pull && docker compose build && docker compose up -d
+```
+
+### Project Structure
+```
+promptlooper/
+├── backend/
+│   ├── main.py                 # FastAPI entry point
+│   ├── config.py               # Pydantic Settings
+│   ├── models.py               # SQLAlchemy ORM
+│   ├── schemas.py              # Pydantic request/response
+│   ├── auth.py                 # JWT + API key auth
+│   ├── worker.py               # Celery app config
+│   ├── routers/
+│   │   ├── auth.py
+│   │   ├── projects.py
+│   │   ├── experiments.py
+│   │   ├── runs.py
+│   │   ├── endpoints.py
+│   │   ├── export.py
+│   │   ├── webhooks.py
+│   │   └── admin.py
+│   ├── engine/
+│   │   ├── runner.py           # Run execution logic
+│   │   ├── sweep.py            # Sweep orchestration
+│   │   ├── cache.py            # Response cache layer
+│   │   ├── adapters/           # LLM endpoint adapters
+│   │   │   ├── openai_compat.py
+│   │   │   └── base.py
+│   │   └── scorers/            # Pluggable scoring functions
+│   │       ├── embedding.py
+│   │       ├── format.py
+│   │       ├── keyword.py
+│   │       ├── llm_judge.py
+│   │       └── base.py
+│   ├── mcp/
+│   │   ├── server.py           # MCP server implementation
+│   │   └── tools.py            # MCP tool definitions
+│   ├── websocket/
+│   │   └── manager.py          # WebSocket connection management
+│   └── tests/
+├── frontend/
+│   └── src/
+│       ├── pages/
+│       │   ├── Setup.tsx       # First-boot admin setup
+│       │   ├── Login.tsx
+│       │   ├── Dashboard.tsx   # Global activity
+│       │   ├── Projects.tsx
+│       │   ├── Experiment.tsx  # Experiment builder + config
+│       │   ├── Live.tsx        # Real-time observability
+│       │   ├── Compare.tsx     # Side-by-side run comparison
+│       │   └── Admin.tsx       # System settings
+│       ├── components/
+│       │   ├── Leaderboard.tsx
+│       │   ├── SteeringControls.tsx
+│       │   ├── RunCard.tsx
+│       │   ├── ScoreChart.tsx
+│       │   └── Timeline.tsx
+│       └── api/
+├── docker/
+│   ├── Dockerfile              # Multi-stage: API + frontend
+│   └── nginx.conf
+├── alembic/
+├── docker-compose.yml
+├── .env.example
+├── CLAUDE.md
+└── README.md
+```