73 changed files with 9900 additions and 2 deletions
--- a/.env.example
+++ b/.env.example
@ -0,0 +1,68 @@
 # PromptLooper — Environment Variables
 # Copy to .env and adjust values for your deployment.
 # =============================================================================
 # Database
 # =============================================================================
 # PostgreSQL connection string for production mode.
 # When not set, PromptLooper uses SQLite at DATA_DIR/promptlooper.db (single-container mode).
 # DATABASE_URL=postgresql://promptlooper:promptlooper@promptlooper-db:5432/promptlooper
 # =============================================================================
 # Redis
 # =============================================================================
 # Redis connection string for Celery task queue and pub/sub (live dashboard).
 # When not set, PromptLooper uses an in-process queue (single-container mode).
 # REDIS_URL=redis://promptlooper-redis:6379/0
 # =============================================================================
 # Server
 # =============================================================================
 # Bind address and port for the HTTP server.
 HOST=0.0.0.0
 PORT=8400
 # =============================================================================
 # Authentication
 # =============================================================================
 # Secret key used to sign JWT tokens. Auto-generated on first boot if not set.
 # IMPORTANT: Set this to a long random string in production.
 # JWT_SECRET=change-me-to-a-random-secret
 # Static API key for programmatic access (MCP, scripts, CI).
 # When not set, API key auth is disabled — only JWT login works.
 # API_KEY=
 # =============================================================================
 # Default LLM Endpoint
 # =============================================================================
 # Pre-configured LLM endpoint URL (OpenAI-compatible API).
 # Users can add more endpoints via the UI or API; this is a convenience default.
 # DEFAULT_ENDPOINT_URL=http://localhost:11434/v1
 # API key for the default endpoint, if required.
 # DEFAULT_ENDPOINT_KEY=
 # =============================================================================
 # Limits
 # =============================================================================
 # Maximum number of runs executing in parallel.
 MAX_CONCURRENT_RUNS=4
 # Token budget per sweep. 0 = unlimited.
 MAX_TOKENS_PER_SWEEP=0
 # =============================================================================
 # Storage
 # =============================================================================
 # Directory for SQLite database and file storage (single-container mode).
 DATA_DIR=/data
 # =============================================================================
 # MCP Server
 # =============================================================================
 # Enable the Model Context Protocol server for agent-driven workflows.
 MCP_ENABLED=true
 # Port for the MCP server (separate from the main API).
 MCP_PORT=8401
--- a/.gitignore
+++ b/.gitignore
@ -0,0 +1,57 @@
 # Python
 __pycache__/
 *.py[cod]
 *$py.class
 *.egg-info/
 *.egg
 dist/
 build/
 .eggs/
 *.whl
 .venv/
 venv/
 env/
 .env
 *.pyc
 .pytest_cache/
 .mypy_cache/
 .ruff_cache/
 htmlcov/
 .coverage
 .coverage.*
 # Node / Frontend
 node_modules/
 frontend/dist/
 frontend/build/
 .npm
 *.tsbuildinfo
 # Docker
 docker/nginx.conf.bak
 # IDE
 .vscode/
 .idea/
 *.swp
 *.swo
 *~
 .DS_Store
 # OS
 Thumbs.db
 Desktop.ini
 # Data (single-container mode)
 *.db
 /data/
 # Alembic
 alembic/versions/__pycache__/
 # Auto Run Docs (Maestro working files)
 Auto Run Docs/Working/
 # Misc
 *.log
 *.bak
--- a/Docs/01-scaffold.md
+++ b/Docs/01-scaffold.md
@ -0,0 +1,48 @@
 # Phase 1 — Project Scaffold
 Set up the PromptLooper repository, Docker infrastructure, and basic project skeleton. Read `promptlooper-spec.md` and `CLAUDE.md` before starting any task.
 - [x] Initialize the git repository at git.xpltd.co/xpltdco/promptlooper with a README.md that includes the project description from the spec, a quick-start section showing the single-container docker run command, and badges for license (AGPL-3.0) and status. Add .gitignore for Python, Node, and Docker artifacts.
  > NOTE: Git repo initialized locally with remote set to git@git.xpltd.co:xpltdco/promptlooper.git. Push failed — SSH key not configured for this host or repo not yet created on Gitea. Needs manual setup before pushing.
 - [x] Create the full directory structure as defined in the spec's Project Structure section. Every directory should exist with a placeholder __init__.py or .gitkeep as appropriate. Include backend/, frontend/, docker/, alembic/, and all subdirectories.
  > Created all directories: backend/ (with routers/, engine/adapters/, engine/scorers/, mcp/, websocket/, tests/), frontend/src/ (pages/, components/, api/), docker/, alembic/versions/. Python packages have __init__.py, non-Python dirs have .gitkeep.
 - [x] Create .env.example with all environment variables from the spec's Environment Variables table, with sensible defaults and comments explaining each group. Include DATABASE_URL, REDIS_URL, JWT_SECRET, DEFAULT_ENDPOINT_URL, MAX_CONCURRENT_RUNS, and all others.
  > Created .env.example with all 13 environment variables organized into 7 groups (Database, Redis, Server, Auth, Default LLM Endpoint, Limits, Storage, MCP). Production-only vars (DATABASE_URL, REDIS_URL, JWT_SECRET, API_KEY, DEFAULT_ENDPOINT_*) are commented out with explanatory notes. Single-container defaults work out of the box.
 - [x] Create docker-compose.yml following XPLTD conventions: project name xpltd_promptlooper, network promptlooper (172.33.0.0/24), PostgreSQL on port 5434, Redis, API service, worker service, and web service on port 8400. Use bind mounts under /vmPool/r/services/promptlooper_* for persistent data. Model this after Chrysopedia's docker-compose.yml patterns.
  > Updated existing docker-compose.yml: fixed DATABASE_URL to use standard postgresql:// scheme (not asyncpg), hardcoded DB credentials instead of requiring .env vars, added API_KEY pass-through, added working_dir for worker service, made JWT_SECRET optional with dev default. All 5 services defined: db (:5434), redis, api (MCP :8401), worker (Celery), web (:8400). Bind mounts under /vmPool/r/services/promptlooper_*. Health checks on db and redis with dependency conditions.
 - [x] Create the multi-stage Dockerfile in docker/ that builds both backend and frontend into a single image. Stage 1: Node build for frontend (npm ci && npm run build). Stage 2: Python runtime with uvicorn, copying the built frontend assets. Include nginx.conf that serves the frontend and proxies /api and /ws to uvicorn. The image should work standalone with SQLite when no DATABASE_URL is provided.
  > Created 3-stage Dockerfile: (1) frontend-build with Node 20 Alpine, (2) api stage with Python 3.12-slim + uvicorn + static assets for single-container mode, (3) web stage with nginx 1.27 Alpine for production compose. nginx.conf proxies /api/ and /health to the API, upgrades /ws/ connections for WebSocket. Also created: backend/requirements.txt, frontend scaffolding (package.json, vite.config.ts, tsconfig.json, index.html, App.tsx, Tailwind config), and placeholder alembic.ini/env.py for Dockerfile COPY.
 - [x] Create backend/config.py using Pydantic Settings. Define all configuration from the Environment Variables table. Implement the SQLite fallback logic: when DATABASE_URL is not set, construct a SQLite URL pointing to DATA_DIR/promptlooper.db. When REDIS_URL is not set, set a flag for in-process mode.
  > Created backend/config.py with Pydantic Settings class defining all 13 env vars. SQLite fallback via `effective_database_url` property constructs sqlite:///DATA_DIR/promptlooper.db when DATABASE_URL is unset. `use_in_process_queue` property flags in-process mode when REDIS_URL is absent. JWT_SECRET auto-generates via `secrets.token_urlsafe(32)` when not provided. Empty API_KEY strings normalize to None. 13 tests in tests/test_config.py all passing.
 - [x] Create backend/models.py with all SQLAlchemy ORM models from the spec's Data Model section: User, Project, Experiment, Run, StageResult, Score, ResponseCache, and WebhookConfig. Include all fields, types, relationships, and indexes. Use UUID primary keys and JSONB for flexible fields.
  > Created all 8 ORM models with UUID PKs, JSON columns (using sqlalchemy.JSON for SQLite compatibility — maps to JSONB on PostgreSQL), enum types (ExperimentStatus, RunStatus), full relationship definitions with cascade deletes, and indexes on foreign keys and commonly filtered columns. Score.metadata mapped as `scorer_metadata` Python attribute (column name stays "metadata") to avoid SQLAlchemy reserved name conflict. 16 tests in tests/test_models.py all passing.
 - [x] Set up Alembic: create alembic.ini and alembic/env.py configured to read DATABASE_URL from the config. Generate and apply the initial migration from the models.
  > Created alembic.ini with logging config and script_location pointing to alembic/. env.py reads DATABASE_URL from backend.config.settings (with override support for tests). Added script.py.mako template. Generated initial migration (e1909678e89e) with all 8 tables, indexes, foreign keys, and enums. Migration applies cleanly on SQLite (render_as_batch=True for SQLite compatibility). 5 tests in tests/test_alembic.py covering upgrade/downgrade/columns/indexes/FKs. All 34 backend tests pass.
 - [x] Create backend/schemas.py with Pydantic request/response schemas for all API endpoints. Include create/update/response schemas for Project, Experiment, Run, Endpoint, and Webhook. Include the Score input schema and export format schemas.
  > Created backend/schemas.py with all Pydantic v2 schemas using ConfigDict(from_attributes=True) for ORM compatibility. Includes: Project (create/update/response/list), Experiment (create/update/response/list), Run (response/list/detail with nested stages+scores), StageResult (response), Score (input/response), Endpoint (create/update/response/list), Webhook (create/update/response/list), Auth (setup/login/token/user), Export (run row with scores dict, export response), and Health. 30 tests in tests/test_schemas.py all passing. All 64 backend tests pass.
 - [x] Create backend/main.py with the FastAPI application. Set up CORS middleware, mount all routers (even if they're stubs), configure the WebSocket endpoint, add the /health endpoint that checks DB and Redis connectivity, and add startup/shutdown lifecycle hooks.
  > Created backend/main.py with: CORS middleware (allow all origins), /health endpoint checking DB (SELECT 1) and Redis (ping) connectivity, /ws WebSocket endpoint with ConnectionManager for real-time broadcasts, async lifespan hooks for DB engine + Redis init/teardown, get_db dependency yielding sessions, dynamic router mounting (silently skips missing routers). 10 tests in tests/test_main.py covering health, CORS, WebSocket connect/disconnect/echo, OpenAPI schema, 404s, broadcast, get_db, and get_redis. All 74 backend tests pass.
 - [x] Create backend/auth.py implementing JWT token generation/verification, API key validation, and the first-boot setup flow. The setup endpoint should check if any users exist — if not, accept username + password to create the admin account. Include a dependency function for route-level auth that supports both JWT and API key.
  > Created backend/auth.py with: bcrypt password hashing via passlib, JWT token creation/verification (HS256, 24h expiry) using python-jose, first-boot `needs_setup()` + `create_admin()` flow (409 if admin exists), `authenticate_user()` for login, and `get_current_user` FastAPI dependency supporting both JWT Bearer tokens and X-Api-Key header (API key grants first admin user). UUID string-to-UUID conversion for SQLite compatibility. 21 tests in tests/test_auth.py covering hashing, JWT lifecycle, setup flow, login, and all auth dependency paths. All 95 backend tests pass.
 - [x] Scaffold all router files in backend/routers/ as stubs: auth.py, projects.py, experiments.py, runs.py, endpoints.py, export.py, webhooks.py, admin.py. Each should have the correct APIRouter prefix and tags, with placeholder endpoints that return 501 Not Implemented.
  > Created all 8 router stubs with APIRouter instances, mounted via main.py's _mount_routers(). Endpoints match the spec: auth (3 endpoints), projects (5), experiments (9 incl. sweep/pause/resume/stop), runs (5 incl. leaderboard), endpoints (5 incl. test), export (4 formats), webhooks (3), admin (3). All return 501 Not Implemented. 37 tests in tests/test_routers.py verify every route is mounted and returns 501. All 132 backend tests pass.
 - [x] Initialize the frontend: run npm create vite@latest with React + TypeScript template. Install Tailwind CSS and configure it. Install react-router-dom for routing. Create the basic App.tsx with routes for Setup, Login, Dashboard, Projects, Experiment, Live, Compare, and Admin pages (all as placeholder components). Verify it builds cleanly.
  > Frontend was already scaffolded with Vite + React + TypeScript + Tailwind + react-router-dom from the Dockerfile task. Added 8 placeholder page components (SetupPage, LoginPage, DashboardPage, ProjectsPage, ExperimentPage, LivePage, ComparePage, AdminPage) in frontend/src/pages/. Updated App.tsx with react-router-dom Routes and main.tsx with BrowserRouter. Unknown routes redirect to dashboard. Installed vitest + @testing-library/react for testing. 9 routing tests in App.test.tsx all passing. Build completes cleanly. All 132 backend tests still pass.
 - [x] Create frontend/src/api/client.ts with a typed API client using fetch. Include JWT token management (stored in memory, not localStorage), request/response interceptors for auth headers, and typed wrapper functions for each API endpoint group. Include WebSocket connection helper.
  > Created frontend/src/api/client.ts with: TypeScript interfaces mirroring all backend Pydantic schemas, in-memory JWT token management (setToken/getToken/clearToken — never localStorage), automatic Authorization header injection on all requests, Content-Type header for POST/PUT bodies, ApiError class for non-ok responses, typed wrapper functions for all 8 endpoint groups (auth, projects, experiments, runs, endpoints, export, webhooks, admin) plus health check, and connectWebSocket() helper that derives ws/wss from current protocol and handles JSON message parsing. 39 tests in src/api/client.test.ts covering token management, header injection, all endpoint groups, error handling, and WebSocket lifecycle. All 48 frontend tests pass. All 132 backend tests still pass.
 - [x] Verify the full stack runs: docker compose up should start all services. The API should respond to /health. The frontend should load and show the setup screen (since no admin exists). The database migration should have run. Document any manual steps needed in the README.
  > Created missing backend/worker.py (Celery app config for docker-compose worker service). Created docker/entrypoint.sh that runs `alembic upgrade head` before starting uvicorn, and updated Dockerfile to use it as ENTRYPOINT. Fixed README single-container quick-start (port 8000, not 8400) and added production compose docs (service list, first-boot instructions). Added 24 stack integration tests verifying all Docker/compose/nginx/frontend/alembic files are present and consistent, plus /health endpoint test. 3 worker tests confirm Celery config. All 159 backend + 48 frontend tests pass.
--- a/CLAUDE.md
+++ b/CLAUDE.md
@ -0,0 +1,127 @@
 # CLAUDE.md — PromptLooper
 ## What is this project?
 PromptLooper is a self-hosted LLM pipeline tuning workbench. It runs experiments across prompt × model × parameter combinations, caches every response, scores results, and surfaces optimal configurations through a real-time dashboard. It has an MCP server so AI agents can drive it programmatically.
 ## Repository
 - **Hosted at**: git.xpltd.co/xpltdco/promptlooper
 - **XPLTD project name**: `xpltd_promptlooper`
 - **Sister project**: Chrysopedia (git.xpltd.co/xpltdco/chrysopedia) — a knowledge extraction pipeline that is PromptLooper's first integration target
 ## Tech Stack
 - **Backend**: Python 3.12, FastAPI, Celery, SQLAlchemy, Alembic
 - **Frontend**: React 18, TypeScript, Vite, Tailwind CSS
 - **Database**: PostgreSQL 16 (production) / SQLite (single-container mode)
 - **Cache/Queue**: Redis 7 (production) / in-process (single-container)
 - **Real-time**: WebSocket via FastAPI + Redis pub/sub
 - **MCP**: Python MCP SDK
 - **Container**: Multi-stage Docker build, nginx for frontend
 ## XPLTD Conventions
 These are non-negotiable project conventions shared across all XPLTD projects:
 - Docker Compose project name: `xpltd_promptlooper`
 - Dedicated bridge network: `promptlooper` (`172.33.0.0/24`)
 - Persistent data bind mounts under `/vmPool/r/services/promptlooper_*`
 - PostgreSQL on external port `5434` (internal `5432`)
 - Web UI on port `8400`
 - MCP server on port `8401`
 - Container naming: `promptlooper-{service}` (e.g., `promptlooper-api`, `promptlooper-db`)
 ## Key Architecture Decisions
 1. **No LLM runs inside PromptLooper itself** — it's purely an HTTP client that calls external LLM endpoints. The only exception is the optional "LLM-as-judge" scorer.
 2. **Response caching by config hash** — SHA-256 of (prompt + model + params + input). Cache hits return instantly. This is critical for cost control.
 3. **Single-container mode** — when `DATABASE_URL` is not set, use SQLite + in-process queue. Zero dependencies.
 4. **WebSocket for real-time** — the dashboard connects via WebSocket to receive run progress, score updates, and steering events.
 5. **Pluggable scorers** — all scoring functions implement a base class with `score(input, output, context) → float` signature.
 6. **OpenAI-compatible adapter** — the LLM adapter layer speaks OpenAI's chat completions API. This covers OpenWebUI, vLLM, Ollama, and most providers.
 ## File Organization
 ```
 backend/
  main.py              — FastAPI app, middleware, router mounting
  config.py            — Pydantic Settings from env vars
  models.py            — SQLAlchemy ORM models
  schemas.py           — Pydantic request/response schemas
  auth.py              — JWT + API key authentication
  worker.py            — Celery app configuration
  routers/             — API endpoint handlers
  engine/              — Core experiment execution logic
    runner.py          — Individual run execution
    sweep.py           — Sweep orchestration (grid/random/guided)
    cache.py           — Response cache layer
    adapters/          — LLM endpoint adapters
    scorers/           — Pluggable scoring functions
  mcp/                 — MCP server implementation
  websocket/           — WebSocket connection management
 frontend/src/
  pages/               — Route-level components
  components/          — Shared UI components
  api/                 — Typed API client functions
 ```
 ## Database Migrations
 Use Alembic. Same patterns as Chrysopedia:
 ```bash
 alembic revision --autogenerate -m "describe_change"
 alembic upgrade head
 ```
 ## Running Locally
 ```bash
 docker compose up -d promptlooper-db promptlooper-redis
 cd backend && uvicorn main:app --reload --host 0.0.0.0 --port 8000
 # Frontend in another terminal:
 cd frontend && npm run dev
 ```
 ## Testing
 ```bash
 cd backend && pytest
 cd frontend && npm test
 ```
 ## Important Patterns
 ### Adding a new scorer
 1. Create `backend/engine/scorers/my_scorer.py`
 2. Implement `BaseScorer` with `name`, `score(input, output, context) → float`
 3. Register in `backend/engine/scorers/__init__.py`
 4. Add to frontend scorer picker component
 ### Adding a new LLM adapter
 1. Create `backend/engine/adapters/my_adapter.py`
 2. Implement `BaseAdapter` with `complete(prompt, model, params) → response`
 3. Register in `backend/engine/adapters/__init__.py`
 4. Currently only OpenAI-compatible is implemented; all others should be edge cases
 ### Adding a new MCP tool
 1. Add tool definition in `backend/mcp/tools.py`
 2. Implement handler in `backend/mcp/server.py`
 3. Tools should map 1:1 to API endpoints where possible
 ## Common Gotchas
 - Always hash the FULL config when checking cache — missing a single parameter means cache misses
 - WebSocket connections must be cleaned up on disconnect — use the connection manager
 - SQLite mode doesn't support concurrent writes — the in-process queue must be single-threaded
 - Frontend must handle both WebSocket and polling fallback for environments where WS is blocked
 - MCP server runs on a separate port from the main API
 ## Deployment
 ```bash
 ssh ub01
 cd /vmPool/r/repos/xpltdco/promptlooper
 git pull && docker compose build && docker compose up -d
 ```
--- a/README.md
+++ b/README.md
@ -1,3 +1,79 @@
-# promptlooper
+# PromptLooper
-Universal LLM pipeline tuning workbench — systematically optimize prompts, models, and inference parameters through cached experiments, pluggable scoring, and agent-driven sweeps via MCP.
+[![License: AGPL-3.0](https://img.shields.io/badge/License-AGPL--3.0-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)
 [![Status: Alpha](https://img.shields.io/badge/Status-Alpha-orange.svg)]()
 > The one who loops prompts — a universal LLM pipeline tuning workbench.
 PromptLooper is a self-hosted tool for systematically optimizing LLM prompts, model selection, and inference parameters. It runs experiments across prompt x model x parameter combinations, caches every response, scores results against pluggable evaluation functions, and surfaces the best configurations through a real-time observability dashboard with human-in-the-loop steering.
 It ships as a single Docker container (SQLite mode) for zero-config quickstart, or a Docker Compose stack (Postgres + Redis) for production use. An MCP server enables any AI agent to drive PromptLooper programmatically — creating experiments, running sweeps, and reading results without human intervention.
 ## Quick Start
 ### Single Container (zero dependencies)
 ```bash
 docker run -p 8000:8000 -v promptlooper-data:/data ghcr.io/xpltdco/promptlooper
 ```
 Open `http://localhost:8000` — you'll be prompted to create an admin account on first boot.
 > In single-container mode, the API serves the built frontend as static files at the root.
 > Database migrations run automatically on startup.
 ### Production (Docker Compose)
 ```bash
 git clone git@git.xpltd.co:xpltdco/promptlooper.git
 cd promptlooper
 cp .env.example .env
 # Edit .env — set JWT_SECRET at minimum
 docker compose up -d
 ```
 Open `http://localhost:8400` — nginx proxies the frontend (port 80 → 8400) and API (`/api/` → port 8000).
 **Services started:**
 - `promptlooper-db` — PostgreSQL 16 on port 5434
 - `promptlooper-redis` — Redis 7
 - `promptlooper-api` — FastAPI + Alembic migrations (auto-runs on startup)
 - `promptlooper-worker` — Celery worker for experiment execution
 - `promptlooper-web` — Nginx reverse proxy on port 8400
 **First boot:** Navigate to `http://localhost:8400/setup` to create the admin account.
 ## Features
 - **Systematic experimentation** — grid, random, and guided sweeps across prompt x model x parameter space
 - **Response caching** — SHA-256 deduplication means re-runs cost zero tokens
 - **Pluggable scoring** — embedding similarity, format compliance, keyword presence, LLM-as-judge, human rating, custom webhooks
 - **Real-time dashboard** — live progress, leaderboard, side-by-side comparison, steering controls
 - **MCP server** — AI agents can create experiments, run sweeps, and export results programmatically
 - **Single-container mode** — SQLite + in-process queue when no external dependencies are configured
 ## Development
 ```bash
 # Start backing services
 docker compose up -d promptlooper-db promptlooper-redis
 # Backend
 cd backend && pip install -r requirements.txt
 alembic upgrade head
 uvicorn main:app --reload --host 0.0.0.0 --port 8000
 # Frontend (separate terminal)
 cd frontend && npm install && npm run dev
 ```
 ## Testing
 ```bash
 cd backend && pytest
 cd frontend && npm test
 ```
 ## License
 [AGPL-3.0](https://www.gnu.org/licenses/agpl-3.0.html)
--- a/alembic.ini
+++ b/alembic.ini
@ -0,0 +1,39 @@
 [alembic]
 script_location = alembic
 # sqlalchemy.url is set programmatically in env.py from backend.config
 sqlalchemy.url =
 [post_write_hooks]
 [loggers]
 keys = root,sqlalchemy,alembic
 [handlers]
 keys = console
 [formatters]
 keys = generic
 [logger_root]
 level = WARN
 handlers = console
 [logger_sqlalchemy]
 level = WARN
 handlers =
 qualname = sqlalchemy.engine
 [logger_alembic]
 level = INFO
 handlers =
 qualname = alembic
 [handler_console]
 class = StreamHandler
 args = (sys.stderr,)
 level = NOTSET
 formatter = generic
 [formatter_generic]
 format = %(levelname)-5.5s [%(name)s] %(message)s
 datefmt = %H:%M:%S
--- a/alembic/env.py
+++ b/alembic/env.py
@ -0,0 +1,66 @@
 """Alembic environment configuration for PromptLooper."""
 import sys
 from logging.config import fileConfig
 from pathlib import Path
 from alembic import context
 from sqlalchemy import engine_from_config, pool
 # Ensure the backend package is importable
 sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
 from backend.config import settings
 from backend.models import Base
 config = context.config
 if config.config_file_name is not None:
    fileConfig(config.config_file_name)
 # Use sqlalchemy.url from alembic config if already set (e.g. by tests),
 # otherwise fall back to application settings.
 if not config.get_main_option("sqlalchemy.url"):
    config.set_main_option("sqlalchemy.url", settings.effective_database_url)
 target_metadata = Base.metadata
 def run_migrations_offline() -> None:
    """Run migrations in 'offline' mode — emit SQL to stdout."""
    url = config.get_main_option("sqlalchemy.url")
    context.configure(
        url=url,
        target_metadata=target_metadata,
        literal_binds=True,
        dialect_opts={"paramstyle": "named"},
        render_as_batch=True,
    )
    with context.begin_transaction():
        context.run_migrations()
 def run_migrations_online() -> None:
    """Run migrations against a live database connection."""
    connectable = engine_from_config(
        config.get_section(config.config_ini_section, {}),
        prefix="sqlalchemy.",
        poolclass=pool.NullPool,
    )
    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            render_as_batch=True,
        )
        with context.begin_transaction():
            context.run_migrations()
 if context.is_offline_mode():
    run_migrations_offline()
 else:
    run_migrations_online()
--- a/alembic/script.py.mako
+++ b/alembic/script.py.mako
@ -0,0 +1,26 @@
 """${message}
 Revision ID: ${up_revision}
 Revises: ${down_revision | comma,n}
 Create Date: ${create_date}
 """
 from typing import Sequence, Union
 from alembic import op
 import sqlalchemy as sa
 ${imports if imports else ""}
 # revision identifiers, used by Alembic.
 revision: str = ${repr(up_revision)}
 down_revision: Union[str, None] = ${repr(down_revision)}
 branch_labels: Union[str, Sequence[str], None] = ${repr(branch_labels)}
 depends_on: Union[str, Sequence[str], None] = ${repr(depends_on)}
 def upgrade() -> None:
    ${upgrades if upgrades else "pass"}
 def downgrade() -> None:
    ${downgrades if downgrades else "pass"}
--- a/alembic/versions/.gitkeep
+++ b/alembic/versions/.gitkeep
--- a/alembic/versions/e1909678e89e_initial_schema.py
+++ b/alembic/versions/e1909678e89e_initial_schema.py
@ -0,0 +1,165 @@
 """initial_schema
 Revision ID: e1909678e89e
 Revises: 
 Create Date: 2026-04-07 01:50:18.571150
 """
 from typing import Sequence, Union
 from alembic import op
 import sqlalchemy as sa
 # revision identifiers, used by Alembic.
 revision: str = 'e1909678e89e'
 down_revision: Union[str, None] = None
 branch_labels: Union[str, Sequence[str], None] = None
 depends_on: Union[str, Sequence[str], None] = None
 def upgrade() -> None:
    # ### commands auto generated by Alembic - please adjust! ###
    op.create_table('response_cache',
    sa.Column('config_hash', sa.String(length=64), nullable=False),
    sa.Column('response', sa.Text(), nullable=False),
    sa.Column('model', sa.String(length=255), nullable=False),
    sa.Column('tokens_in', sa.Integer(), nullable=True),
    sa.Column('tokens_out', sa.Integer(), nullable=True),
    sa.Column('latency_ms', sa.Integer(), nullable=True),
    sa.Column('created_at', sa.DateTime(timezone=True), nullable=False),
    sa.PrimaryKeyConstraint('config_hash')
    )
    op.create_table('users',
    sa.Column('id', sa.Uuid(), nullable=False),
    sa.Column('username', sa.String(length=255), nullable=False),
    sa.Column('password_hash', sa.String(length=255), nullable=False),
    sa.Column('is_admin', sa.Boolean(), nullable=False),
    sa.Column('created_at', sa.DateTime(timezone=True), nullable=False),
    sa.PrimaryKeyConstraint('id'),
    sa.UniqueConstraint('username')
    )
    op.create_table('webhook_configs',
    sa.Column('id', sa.Uuid(), nullable=False),
    sa.Column('event_type', sa.String(length=255), nullable=False),
    sa.Column('url', sa.String(length=2048), nullable=False),
    sa.Column('headers', sa.JSON(), nullable=True),
    sa.Column('is_active', sa.Boolean(), nullable=False),
    sa.PrimaryKeyConstraint('id')
    )
    with op.batch_alter_table('webhook_configs', schema=None) as batch_op:
        batch_op.create_index('ix_webhook_configs_event_type', ['event_type'], unique=False)
    op.create_table('projects',
    sa.Column('id', sa.Uuid(), nullable=False),
    sa.Column('name', sa.String(length=255), nullable=False),
    sa.Column('description', sa.Text(), nullable=True),
    sa.Column('owner_id', sa.Uuid(), nullable=False),
    sa.Column('created_at', sa.DateTime(timezone=True), nullable=False),
    sa.Column('updated_at', sa.DateTime(timezone=True), nullable=False),
    sa.ForeignKeyConstraint(['owner_id'], ['users.id'], ondelete='CASCADE'),
    sa.PrimaryKeyConstraint('id')
    )
    op.create_table('experiments',
    sa.Column('id', sa.Uuid(), nullable=False),
    sa.Column('project_id', sa.Uuid(), nullable=False),
    sa.Column('name', sa.String(length=255), nullable=False),
    sa.Column('description', sa.Text(), nullable=True),
    sa.Column('sample_data', sa.JSON(), nullable=True),
    sa.Column('pipeline_stages', sa.JSON(), nullable=True),
    sa.Column('scoring_config', sa.JSON(), nullable=True),
    sa.Column('parameter_space', sa.JSON(), nullable=True),
    sa.Column('status', sa.Enum('draft', 'running', 'paused', 'completed', name='experiment_status'), nullable=False),
    sa.Column('created_at', sa.DateTime(timezone=True), nullable=False),
    sa.Column('updated_at', sa.DateTime(timezone=True), nullable=False),
    sa.ForeignKeyConstraint(['project_id'], ['projects.id'], ondelete='CASCADE'),
    sa.PrimaryKeyConstraint('id')
    )
    with op.batch_alter_table('experiments', schema=None) as batch_op:
        batch_op.create_index('ix_experiments_project_id', ['project_id'], unique=False)
        batch_op.create_index('ix_experiments_status', ['status'], unique=False)
    op.create_table('runs',
    sa.Column('id', sa.Uuid(), nullable=False),
    sa.Column('experiment_id', sa.Uuid(), nullable=False),
    sa.Column('config_hash', sa.String(length=64), nullable=False),
    sa.Column('config', sa.JSON(), nullable=False),
    sa.Column('status', sa.Enum('pending', 'running', 'completed', 'failed', 'cached', name='run_status'), nullable=False),
    sa.Column('started_at', sa.DateTime(timezone=True), nullable=True),
    sa.Column('completed_at', sa.DateTime(timezone=True), nullable=True),
    sa.Column('duration_ms', sa.Integer(), nullable=True),
    sa.Column('tokens_in', sa.Integer(), nullable=True),
    sa.Column('tokens_out', sa.Integer(), nullable=True),
    sa.Column('cost_estimate', sa.Numeric(precision=12, scale=6), nullable=True),
    sa.ForeignKeyConstraint(['experiment_id'], ['experiments.id'], ondelete='CASCADE'),
    sa.PrimaryKeyConstraint('id')
    )
    with op.batch_alter_table('runs', schema=None) as batch_op:
        batch_op.create_index('ix_runs_config_hash', ['config_hash'], unique=False)
        batch_op.create_index('ix_runs_experiment_id', ['experiment_id'], unique=False)
        batch_op.create_index('ix_runs_status', ['status'], unique=False)
    op.create_table('scores',
    sa.Column('id', sa.Uuid(), nullable=False),
    sa.Column('run_id', sa.Uuid(), nullable=False),
    sa.Column('scorer_name', sa.String(length=255), nullable=False),
    sa.Column('value', sa.Float(), nullable=False),
    sa.Column('metadata', sa.JSON(), nullable=True),
    sa.Column('created_at', sa.DateTime(timezone=True), nullable=False),
    sa.ForeignKeyConstraint(['run_id'], ['runs.id'], ondelete='CASCADE'),
    sa.PrimaryKeyConstraint('id')
    )
    with op.batch_alter_table('scores', schema=None) as batch_op:
        batch_op.create_index('ix_scores_run_id', ['run_id'], unique=False)
        batch_op.create_index('ix_scores_scorer_name', ['scorer_name'], unique=False)
    op.create_table('stage_results',
    sa.Column('id', sa.Uuid(), nullable=False),
    sa.Column('run_id', sa.Uuid(), nullable=False),
    sa.Column('stage_index', sa.Integer(), nullable=False),
    sa.Column('prompt_sent', sa.Text(), nullable=False),
    sa.Column('response_raw', sa.Text(), nullable=False),
    sa.Column('model_used', sa.String(length=255), nullable=False),
    sa.Column('parameters', sa.JSON(), nullable=True),
    sa.Column('tokens_in', sa.Integer(), nullable=True),
    sa.Column('tokens_out', sa.Integer(), nullable=True),
    sa.Column('latency_ms', sa.Integer(), nullable=True),
    sa.ForeignKeyConstraint(['run_id'], ['runs.id'], ondelete='CASCADE'),
    sa.PrimaryKeyConstraint('id')
    )
    with op.batch_alter_table('stage_results', schema=None) as batch_op:
        batch_op.create_index('ix_stage_results_run_id', ['run_id'], unique=False)
    # ### end Alembic commands ###
 def downgrade() -> None:
    # ### commands auto generated by Alembic - please adjust! ###
    with op.batch_alter_table('stage_results', schema=None) as batch_op:
        batch_op.drop_index('ix_stage_results_run_id')
    op.drop_table('stage_results')
    with op.batch_alter_table('scores', schema=None) as batch_op:
        batch_op.drop_index('ix_scores_scorer_name')
        batch_op.drop_index('ix_scores_run_id')
    op.drop_table('scores')
    with op.batch_alter_table('runs', schema=None) as batch_op:
        batch_op.drop_index('ix_runs_status')
        batch_op.drop_index('ix_runs_experiment_id')
        batch_op.drop_index('ix_runs_config_hash')
    op.drop_table('runs')
    with op.batch_alter_table('experiments', schema=None) as batch_op:
        batch_op.drop_index('ix_experiments_status')
        batch_op.drop_index('ix_experiments_project_id')
    op.drop_table('experiments')
    op.drop_table('projects')
    with op.batch_alter_table('webhook_configs', schema=None) as batch_op:
        batch_op.drop_index('ix_webhook_configs_event_type')
    op.drop_table('webhook_configs')
    op.drop_table('users')
    op.drop_table('response_cache')
    # ### end Alembic commands ###
--- a/backend/init.py
+++ b/backend/init.py
--- a/backend/auth.py
+++ b/backend/auth.py
@ -0,0 +1,154 @@
 """PromptLooper authentication — JWT tokens, API keys, first-boot setup."""
 import uuid as _uuid
 from datetime import datetime, timedelta, timezone
 from typing import Generator
 from fastapi import Depends, HTTPException, Header, status
 from jose import JWTError, jwt
 from passlib.context import CryptContext
 from sqlalchemy.orm import Session
 from config import settings
 from models import User
 # ---------------------------------------------------------------------------
 # Password hashing
 # ---------------------------------------------------------------------------
 pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")
 def hash_password(password: str) -> str:
    return pwd_context.hash(password)
 def verify_password(plain: str, hashed: str) -> bool:
    return pwd_context.verify(plain, hashed)
 # ---------------------------------------------------------------------------
 # JWT
 # ---------------------------------------------------------------------------
 ALGORITHM = "HS256"
 ACCESS_TOKEN_EXPIRE_MINUTES = 60 * 24  # 24 hours
 def create_access_token(user_id: str, *, expires_delta: timedelta | None = None) -> str:
    expire = datetime.now(timezone.utc) + (expires_delta or timedelta(minutes=ACCESS_TOKEN_EXPIRE_MINUTES))
    payload = {"sub": user_id, "exp": expire}
    return jwt.encode(payload, settings.jwt_secret, algorithm=ALGORITHM)
 def decode_access_token(token: str) -> str:
    """Return the user_id (sub) from a valid JWT, or raise."""
    try:
        payload = jwt.decode(token, settings.jwt_secret, algorithms=[ALGORITHM])
        user_id: str | None = payload.get("sub")
        if user_id is None:
            raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid token")
        return user_id
    except JWTError:
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid token")
 # ---------------------------------------------------------------------------
 # First-boot setup
 # ---------------------------------------------------------------------------
 def needs_setup(db: Session) -> bool:
    """Return True if no users exist yet (first-boot state)."""
    return db.query(User).count() == 0
 def create_admin(db: Session, username: str, password: str) -> User:
    """Create the first admin user. Raises if users already exist."""
    if not needs_setup(db):
        raise HTTPException(
            status_code=status.HTTP_409_CONFLICT,
            detail="Admin account already exists",
        )
    user = User(
        username=username,
        password_hash=hash_password(password),
        is_admin=True,
    )
    db.add(user)
    db.commit()
    db.refresh(user)
    return user
 # ---------------------------------------------------------------------------
 # Authenticate (login)
 # ---------------------------------------------------------------------------
 def authenticate_user(db: Session, username: str, password: str) -> User:
    """Verify credentials and return the User, or raise 401."""
    user = db.query(User).filter(User.username == username).first()
    if user is None or not verify_password(password, user.password_hash):
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid credentials")
    return user
 # ---------------------------------------------------------------------------
 # Database session dependency (local to avoid circular import with main.py)
 # ---------------------------------------------------------------------------
 def _get_db() -> Generator[Session, None, None]:
    """Yield a DB session. Imported lazily from main to avoid circular import."""
    from main import get_db
    yield from get_db()
 # ---------------------------------------------------------------------------
 # Dependency: get current user (JWT or API key)
 # ---------------------------------------------------------------------------
 def get_current_user(
    authorization: str | None = Header(None),
    x_api_key: str | None = Header(None),
    db: Session = Depends(_get_db),
 ) -> User:
    """FastAPI dependency — resolve the current user from JWT Bearer token or API key.
    Priority:
    1. X-Api-Key header — matched against settings.api_key (grants first admin).
    2. Authorization: Bearer <jwt> — decoded to get user_id.
    """
    # --- API key path ---
    if x_api_key is not None:
        if settings.api_key is None or x_api_key != settings.api_key:
            raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid API key")
        # API key grants the first admin user
        admin = db.query(User).filter(User.is_admin.is_(True)).first()
        if admin is None:
            raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="No admin user exists")
        return admin
    # --- JWT path ---
    if authorization is None:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Missing authentication",
            headers={"WWW-Authenticate": "Bearer"},
        )
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid authorization header",
            headers={"WWW-Authenticate": "Bearer"},
        )
    user_id_str = decode_access_token(token)
    try:
        user_id = _uuid.UUID(user_id_str)
    except ValueError:
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid token")
    user = db.query(User).filter(User.id == user_id).first()
    if user is None:
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="User not found")
    return user
--- a/backend/config.py
+++ b/backend/config.py
@ -0,0 +1,76 @@
 """PromptLooper configuration — Pydantic Settings loaded from environment."""
 import secrets
 from pathlib import Path
 from pydantic import field_validator
 from pydantic_settings import BaseSettings, SettingsConfigDict
 class Settings(BaseSettings):
    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        extra="ignore",
    )
    # --- Database ---
    database_url: str | None = None
    # --- Redis ---
    redis_url: str | None = None
    # --- Server ---
    host: str = "0.0.0.0"
    port: int = 8400
    # --- Auth ---
    jwt_secret: str = ""
    api_key: str | None = None
    # --- Default LLM Endpoint ---
    default_endpoint_url: str | None = None
    default_endpoint_key: str | None = None
    # --- Limits ---
    max_concurrent_runs: int = 4
    max_tokens_per_sweep: int = 0  # 0 = unlimited
    # --- Storage ---
    data_dir: str = "/data"
    # --- MCP ---
    mcp_enabled: bool = True
    mcp_port: int = 8401
    def model_post_init(self, __context: object) -> None:
        # Auto-generate JWT secret if not provided
        if not self.jwt_secret:
            self.jwt_secret = secrets.token_urlsafe(32)
    @property
    def effective_database_url(self) -> str:
        """Return DATABASE_URL or construct a SQLite URL from DATA_DIR."""
        if self.database_url:
            return self.database_url
        db_path = Path(self.data_dir) / "promptlooper.db"
        return f"sqlite:///{db_path}"
    @property
    def is_sqlite(self) -> bool:
        return self.effective_database_url.startswith("sqlite")
    @property
    def use_in_process_queue(self) -> bool:
        """When Redis is unavailable, use in-process task execution."""
        return self.redis_url is None
    @field_validator("api_key", mode="before")
    @classmethod
    def empty_string_to_none(cls, v: str | None) -> str | None:
        if v is not None and v.strip() == "":
            return None
        return v
 settings = Settings()
--- a/backend/engine/init.py
+++ b/backend/engine/init.py
--- a/backend/engine/adapters/init.py
+++ b/backend/engine/adapters/init.py
--- a/backend/engine/scorers/init.py
+++ b/backend/engine/scorers/init.py
--- a/backend/main.py
+++ b/backend/main.py
@ -0,0 +1,211 @@
 """PromptLooper FastAPI application."""
 from contextlib import asynccontextmanager
 from typing import AsyncGenerator
 from fastapi import FastAPI, WebSocket, WebSocketDisconnect
 from fastapi.middleware.cors import CORSMiddleware
 from sqlalchemy import create_engine, text
 from sqlalchemy.orm import sessionmaker
 from config import settings
 # ---------------------------------------------------------------------------
 # Database engine & session factory (lazy, created at startup)
 # ---------------------------------------------------------------------------
 engine = None
 SessionLocal = None
 def _init_db() -> None:
    """Create the SQLAlchemy engine and session factory."""
    global engine, SessionLocal
    connect_args = {}
    if settings.is_sqlite:
        connect_args["check_same_thread"] = False
    engine = create_engine(
        settings.effective_database_url,
        connect_args=connect_args,
    )
    SessionLocal = sessionmaker(bind=engine, autoflush=False, expire_on_commit=False)
 def get_db():
    """FastAPI dependency that yields a database session."""
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()
 # ---------------------------------------------------------------------------
 # Redis helper
 # ---------------------------------------------------------------------------
 _redis_client = None
 def _init_redis() -> None:
    """Connect to Redis if configured."""
    global _redis_client
    if not settings.redis_url:
        _redis_client = None
        return
    import redis as redis_lib
    _redis_client = redis_lib.Redis.from_url(settings.redis_url, decode_responses=True)
 def get_redis():
    """Return the Redis client (or None in single-container mode)."""
    return _redis_client
 # ---------------------------------------------------------------------------
 # WebSocket connection manager
 # ---------------------------------------------------------------------------
 class ConnectionManager:
    """Manage active WebSocket connections."""
    def __init__(self) -> None:
        self.active_connections: list[WebSocket] = []
    async def connect(self, websocket: WebSocket) -> None:
        await websocket.accept()
        self.active_connections.append(websocket)
    def disconnect(self, websocket: WebSocket) -> None:
        self.active_connections.remove(websocket)
    async def broadcast(self, message: dict) -> None:
        for connection in list(self.active_connections):
            try:
                await connection.send_json(message)
            except Exception:
                self.disconnect(connection)
 ws_manager = ConnectionManager()
 # ---------------------------------------------------------------------------
 # Lifecycle
 # ---------------------------------------------------------------------------
@asynccontextmanager
 async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
    """Startup and shutdown lifecycle hooks."""
    _init_db()
    _init_redis()
    yield
    # Shutdown: clean up connections
    if _redis_client is not None:
        _redis_client.close()
    if engine is not None:
        engine.dispose()
 # ---------------------------------------------------------------------------
 # Application
 # ---------------------------------------------------------------------------
 app = FastAPI(
    title="PromptLooper",
    description="LLM pipeline tuning workbench",
    version="0.1.0",
    lifespan=lifespan,
 )
 # CORS — allow all origins in development; tighten in production via env
 app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
 )
 # ---------------------------------------------------------------------------
 # Health endpoint
 # ---------------------------------------------------------------------------
@app.get("/health", tags=["system"])
 def health_check() -> dict:
    """Check DB and Redis connectivity."""
    db_ok = False
    redis_ok = False
    # Database check
    if SessionLocal is not None:
        try:
            with SessionLocal() as session:
                session.execute(text("SELECT 1"))
            db_ok = True
        except Exception:
            pass
    # Redis check
    if not settings.redis_url:
        redis_ok = True  # No Redis needed — in-process mode
    elif _redis_client is not None:
        try:
            _redis_client.ping()
            redis_ok = True
        except Exception:
            pass
    return {"status": "ok" if (db_ok and redis_ok) else "degraded", "database": db_ok, "redis": redis_ok}
 # ---------------------------------------------------------------------------
 # WebSocket endpoint
 # ---------------------------------------------------------------------------
@app.websocket("/ws")
 async def websocket_endpoint(websocket: WebSocket) -> None:
    """WebSocket connection for real-time dashboard updates."""
    await ws_manager.connect(websocket)
    try:
        while True:
            # Keep connection alive; handle incoming messages if needed
            data = await websocket.receive_json()
            # Echo back or handle client messages in future
            await websocket.send_json({"type": "ack", "data": data})
    except WebSocketDisconnect:
        ws_manager.disconnect(websocket)
 # ---------------------------------------------------------------------------
 # Mount routers (stubs — actual implementations come later)
 # ---------------------------------------------------------------------------
 # Router imports are deferred to avoid circular imports and allow
 # stub files to be created independently.  Each router will be mounted
 # as it is implemented.  For now we register empty prefixes.
 def _mount_routers() -> None:
    """Import and mount all routers. Silently skip missing ones."""
    router_configs = [
        ("routers.auth", "/api/auth", ["auth"]),
        ("routers.projects", "/api/projects", ["projects"]),
        ("routers.experiments", "/api/experiments", ["experiments"]),
        ("routers.runs", "/api/runs", ["runs"]),
        ("routers.endpoints", "/api/endpoints", ["endpoints"]),
        ("routers.export", "/api/export", ["export"]),
        ("routers.webhooks", "/api/webhooks", ["webhooks"]),
        ("routers.admin", "/api/admin", ["admin"]),
    ]
    for module_name, prefix, tags in router_configs:
        try:
            import importlib
            mod = importlib.import_module(module_name)
            app.include_router(mod.router, prefix=prefix, tags=tags)
        except (ImportError, AttributeError):
            pass  # Router not yet implemented
 _mount_routers()
--- a/backend/mcp/init.py
+++ b/backend/mcp/init.py
--- a/backend/models.py
+++ b/backend/models.py
@ -0,0 +1,276 @@
 """PromptLooper SQLAlchemy ORM models."""
 import enum
 import uuid
 from datetime import datetime, timezone
 from sqlalchemy import (
    JSON,
    Boolean,
    DateTime,
    Enum,
    Float,
    ForeignKey,
    Index,
    Integer,
    Numeric,
    String,
    Text,
 )
 from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship
 def _utcnow() -> datetime:
    return datetime.now(timezone.utc)
 def _new_uuid() -> uuid.UUID:
    return uuid.uuid4()
 # ---------------------------------------------------------------------------
 # Base
 # ---------------------------------------------------------------------------
 class Base(DeclarativeBase):
    """Shared declarative base for all models."""
    type_annotation_map = {
        dict: JSON,
    }
 # ---------------------------------------------------------------------------
 # Enums
 # ---------------------------------------------------------------------------
 class ExperimentStatus(str, enum.Enum):
    draft = "draft"
    running = "running"
    paused = "paused"
    completed = "completed"
 class RunStatus(str, enum.Enum):
    pending = "pending"
    running = "running"
    completed = "completed"
    failed = "failed"
    cached = "cached"
 # ---------------------------------------------------------------------------
 # Models
 # ---------------------------------------------------------------------------
 class User(Base):
    __tablename__ = "users"
    id: Mapped[uuid.UUID] = mapped_column(
        primary_key=True, default=_new_uuid
    )
    username: Mapped[str] = mapped_column(String(255), unique=True, nullable=False)
    password_hash: Mapped[str] = mapped_column(String(255), nullable=False)
    is_admin: Mapped[bool] = mapped_column(Boolean, default=False, nullable=False)
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=_utcnow, nullable=False
    )
    # Relationships
    projects: Mapped[list["Project"]] = relationship(
        back_populates="owner", cascade="all, delete-orphan"
    )
 class Project(Base):
    __tablename__ = "projects"
    id: Mapped[uuid.UUID] = mapped_column(
        primary_key=True, default=_new_uuid
    )
    name: Mapped[str] = mapped_column(String(255), nullable=False)
    description: Mapped[str | None] = mapped_column(Text, nullable=True)
    owner_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("users.id", ondelete="CASCADE"), nullable=False
    )
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=_utcnow, nullable=False
    )
    updated_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=_utcnow, onupdate=_utcnow, nullable=False
    )
    # Relationships
    owner: Mapped["User"] = relationship(back_populates="projects")
    experiments: Mapped[list["Experiment"]] = relationship(
        back_populates="project", cascade="all, delete-orphan"
    )
 class Experiment(Base):
    __tablename__ = "experiments"
    id: Mapped[uuid.UUID] = mapped_column(
        primary_key=True, default=_new_uuid
    )
    project_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("projects.id", ondelete="CASCADE"), nullable=False
    )
    name: Mapped[str] = mapped_column(String(255), nullable=False)
    description: Mapped[str | None] = mapped_column(Text, nullable=True)
    sample_data: Mapped[dict | None] = mapped_column(JSON, nullable=True)
    pipeline_stages: Mapped[dict | None] = mapped_column(JSON, nullable=True)
    scoring_config: Mapped[dict | None] = mapped_column(JSON, nullable=True)
    parameter_space: Mapped[dict | None] = mapped_column(JSON, nullable=True)
    status: Mapped[ExperimentStatus] = mapped_column(
        Enum(ExperimentStatus, name="experiment_status"),
        default=ExperimentStatus.draft,
        nullable=False,
    )
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=_utcnow, nullable=False
    )
    updated_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=_utcnow, onupdate=_utcnow, nullable=False
    )
    # Relationships
    project: Mapped["Project"] = relationship(back_populates="experiments")
    runs: Mapped[list["Run"]] = relationship(
        back_populates="experiment", cascade="all, delete-orphan"
    )
    __table_args__ = (
        Index("ix_experiments_project_id", "project_id"),
        Index("ix_experiments_status", "status"),
    )
 class Run(Base):
    __tablename__ = "runs"
    id: Mapped[uuid.UUID] = mapped_column(
        primary_key=True, default=_new_uuid
    )
    experiment_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("experiments.id", ondelete="CASCADE"), nullable=False
    )
    config_hash: Mapped[str] = mapped_column(String(64), nullable=False)
    config: Mapped[dict] = mapped_column(JSON, nullable=False)
    status: Mapped[RunStatus] = mapped_column(
        Enum(RunStatus, name="run_status"),
        default=RunStatus.pending,
        nullable=False,
    )
    started_at: Mapped[datetime | None] = mapped_column(
        DateTime(timezone=True), nullable=True
    )
    completed_at: Mapped[datetime | None] = mapped_column(
        DateTime(timezone=True), nullable=True
    )
    duration_ms: Mapped[int | None] = mapped_column(Integer, nullable=True)
    tokens_in: Mapped[int | None] = mapped_column(Integer, nullable=True)
    tokens_out: Mapped[int | None] = mapped_column(Integer, nullable=True)
    cost_estimate: Mapped[float | None] = mapped_column(
        Numeric(precision=12, scale=6), nullable=True
    )
    # Relationships
    experiment: Mapped["Experiment"] = relationship(back_populates="runs")
    stage_results: Mapped[list["StageResult"]] = relationship(
        back_populates="run", cascade="all, delete-orphan"
    )
    scores: Mapped[list["Score"]] = relationship(
        back_populates="run", cascade="all, delete-orphan"
    )
    __table_args__ = (
        Index("ix_runs_experiment_id", "experiment_id"),
        Index("ix_runs_config_hash", "config_hash"),
        Index("ix_runs_status", "status"),
    )
 class StageResult(Base):
    __tablename__ = "stage_results"
    id: Mapped[uuid.UUID] = mapped_column(
        primary_key=True, default=_new_uuid
    )
    run_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("runs.id", ondelete="CASCADE"), nullable=False
    )
    stage_index: Mapped[int] = mapped_column(Integer, nullable=False)
    prompt_sent: Mapped[str] = mapped_column(Text, nullable=False)
    response_raw: Mapped[str] = mapped_column(Text, nullable=False)
    model_used: Mapped[str] = mapped_column(String(255), nullable=False)
    parameters: Mapped[dict | None] = mapped_column(JSON, nullable=True)
    tokens_in: Mapped[int | None] = mapped_column(Integer, nullable=True)
    tokens_out: Mapped[int | None] = mapped_column(Integer, nullable=True)
    latency_ms: Mapped[int | None] = mapped_column(Integer, nullable=True)
    # Relationships
    run: Mapped["Run"] = relationship(back_populates="stage_results")
    __table_args__ = (
        Index("ix_stage_results_run_id", "run_id"),
    )
 class Score(Base):
    __tablename__ = "scores"
    id: Mapped[uuid.UUID] = mapped_column(
        primary_key=True, default=_new_uuid
    )
    run_id: Mapped[uuid.UUID] = mapped_column(
        ForeignKey("runs.id", ondelete="CASCADE"), nullable=False
    )
    scorer_name: Mapped[str] = mapped_column(String(255), nullable=False)
    value: Mapped[float] = mapped_column(Float, nullable=False)
    scorer_metadata: Mapped[dict | None] = mapped_column(
        "metadata", JSON, nullable=True
    )
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=_utcnow, nullable=False
    )
    # Relationships
    run: Mapped["Run"] = relationship(back_populates="scores")
    __table_args__ = (
        Index("ix_scores_run_id", "run_id"),
        Index("ix_scores_scorer_name", "scorer_name"),
    )
 class ResponseCache(Base):
    __tablename__ = "response_cache"
    config_hash: Mapped[str] = mapped_column(
        String(64), primary_key=True
    )
    response: Mapped[str] = mapped_column(Text, nullable=False)
    model: Mapped[str] = mapped_column(String(255), nullable=False)
    tokens_in: Mapped[int | None] = mapped_column(Integer, nullable=True)
    tokens_out: Mapped[int | None] = mapped_column(Integer, nullable=True)
    latency_ms: Mapped[int | None] = mapped_column(Integer, nullable=True)
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=_utcnow, nullable=False
    )
 class WebhookConfig(Base):
    __tablename__ = "webhook_configs"
    id: Mapped[uuid.UUID] = mapped_column(
        primary_key=True, default=_new_uuid
    )
    event_type: Mapped[str] = mapped_column(String(255), nullable=False)
    url: Mapped[str] = mapped_column(String(2048), nullable=False)
    headers: Mapped[dict | None] = mapped_column(JSON, nullable=True)
    is_active: Mapped[bool] = mapped_column(Boolean, default=True, nullable=False)
    __table_args__ = (
        Index("ix_webhook_configs_event_type", "event_type"),
    )
--- a/backend/requirements.txt
+++ b/backend/requirements.txt
@ -0,0 +1,16 @@
 # PromptLooper — Backend Dependencies
 fastapi>=0.115,<1.0
 uvicorn[standard]>=0.32,<1.0
 sqlalchemy>=2.0,<3.0
 alembic>=1.14,<2.0
 pydantic>=2.0,<3.0
 pydantic-settings>=2.0,<3.0
 python-jose[cryptography]>=3.3,<4.0
 passlib[bcrypt]>=1.7,<2.0
 celery>=5.4,<6.0
 redis>=5.0,<6.0
 httpx>=0.27,<1.0
 websockets>=13.0,<14.0
 psycopg2-binary>=2.9,<3.0
 aiosqlite>=0.20,<1.0
 python-multipart>=0.0.9
--- a/backend/routers/init.py
+++ b/backend/routers/init.py
--- a/backend/routers/admin.py
+++ b/backend/routers/admin.py
@ -0,0 +1,23 @@
 """Admin router — system settings and stats."""
 from fastapi import APIRouter, Response
 router = APIRouter()
@router.get("/settings", status_code=501)
 def get_settings():
    """System settings (guest access, default model, etc.)."""
    return Response(status_code=501, content="Not Implemented")
@router.put("/settings", status_code=501)
 def update_settings():
    """Update settings."""
    return Response(status_code=501, content="Not Implemented")
@router.get("/stats", status_code=501)
 def get_stats():
    """System-wide stats (total runs, cache hit rate, etc.)."""
    return Response(status_code=501, content="Not Implemented")
--- a/backend/routers/auth.py
+++ b/backend/routers/auth.py
@ -0,0 +1,23 @@
 """Auth router — setup, login, and current user info."""
 from fastapi import APIRouter, Response
 router = APIRouter()
@router.post("/setup", status_code=501)
 def setup():
    """First-boot admin password setup."""
    return Response(status_code=501, content="Not Implemented")
@router.post("/login", status_code=501)
 def login():
    """Login, returns JWT."""
    return Response(status_code=501, content="Not Implemented")
@router.get("/me", status_code=501)
 def me():
    """Current user info."""
    return Response(status_code=501, content="Not Implemented")
--- a/backend/routers/endpoints.py
+++ b/backend/routers/endpoints.py
@ -0,0 +1,37 @@
 """Endpoints router — LLM target management."""
 import uuid
 from fastapi import APIRouter, Response
 router = APIRouter()
@router.get("/", status_code=501)
 def list_endpoints():
    """List configured LLM endpoints."""
    return Response(status_code=501, content="Not Implemented")
@router.post("/", status_code=501)
 def create_endpoint():
    """Add endpoint (URL, API key, label)."""
    return Response(status_code=501, content="Not Implemented")
@router.put("/{endpoint_id}", status_code=501)
 def update_endpoint(endpoint_id: uuid.UUID):
    """Update endpoint."""
    return Response(status_code=501, content="Not Implemented")
@router.delete("/{endpoint_id}", status_code=501)
 def delete_endpoint(endpoint_id: uuid.UUID):
    """Remove endpoint."""
    return Response(status_code=501, content="Not Implemented")
@router.post("/{endpoint_id}/test", status_code=501)
 def test_endpoint(endpoint_id: uuid.UUID):
    """Test connectivity and list available models."""
    return Response(status_code=501, content="Not Implemented")
--- a/backend/routers/experiments.py
+++ b/backend/routers/experiments.py
@ -0,0 +1,61 @@
 """Experiments router — CRUD and sweep controls."""
 import uuid
 from fastapi import APIRouter, Response
 router = APIRouter()
@router.get("/", status_code=501)
 def list_experiments():
    """List experiments (filter by project)."""
    return Response(status_code=501, content="Not Implemented")
@router.post("/", status_code=501)
 def create_experiment():
    """Create experiment."""
    return Response(status_code=501, content="Not Implemented")
@router.get("/{experiment_id}", status_code=501)
 def get_experiment(experiment_id: uuid.UUID):
    """Experiment detail with run summaries."""
    return Response(status_code=501, content="Not Implemented")
@router.put("/{experiment_id}", status_code=501)
 def update_experiment(experiment_id: uuid.UUID):
    """Update experiment config."""
    return Response(status_code=501, content="Not Implemented")
@router.delete("/{experiment_id}", status_code=501)
 def delete_experiment(experiment_id: uuid.UUID):
    """Delete experiment."""
    return Response(status_code=501, content="Not Implemented")
@router.post("/{experiment_id}/sweep", status_code=501)
 def start_sweep(experiment_id: uuid.UUID):
    """Start a sweep (grid, random, or guided)."""
    return Response(status_code=501, content="Not Implemented")
@router.post("/{experiment_id}/pause", status_code=501)
 def pause_sweep(experiment_id: uuid.UUID):
    """Pause running sweep."""
    return Response(status_code=501, content="Not Implemented")
@router.post("/{experiment_id}/resume", status_code=501)
 def resume_sweep(experiment_id: uuid.UUID):
    """Resume paused sweep."""
    return Response(status_code=501, content="Not Implemented")
@router.post("/{experiment_id}/stop", status_code=501)
 def stop_sweep(experiment_id: uuid.UUID):
    """Stop sweep."""
    return Response(status_code=501, content="Not Implemented")
--- a/backend/routers/export.py
+++ b/backend/routers/export.py
@ -0,0 +1,31 @@
 """Export router — export experiment results in various formats."""
 import uuid
 from fastapi import APIRouter, Response
 router = APIRouter()
@router.get("/experiments/{experiment_id}/best", status_code=501)
 def export_best(experiment_id: uuid.UUID):
    """Best config as JSON."""
    return Response(status_code=501, content="Not Implemented")
@router.get("/experiments/{experiment_id}/env", status_code=501)
 def export_env(experiment_id: uuid.UUID):
    """Best config as .env snippet."""
    return Response(status_code=501, content="Not Implemented")
@router.get("/experiments/{experiment_id}/yaml", status_code=501)
 def export_yaml(experiment_id: uuid.UUID):
    """Best config as YAML."""
    return Response(status_code=501, content="Not Implemented")
@router.get("/experiments/{experiment_id}/report", status_code=501)
 def export_report(experiment_id: uuid.UUID):
    """Full experiment report (markdown)."""
    return Response(status_code=501, content="Not Implemented")
--- a/backend/routers/projects.py
+++ b/backend/routers/projects.py
@ -0,0 +1,37 @@
 """Projects router — CRUD for projects."""
 import uuid
 from fastapi import APIRouter, Response
 router = APIRouter()
@router.get("/", status_code=501)
 def list_projects():
    """List projects."""
    return Response(status_code=501, content="Not Implemented")
@router.post("/", status_code=501)
 def create_project():
    """Create project."""
    return Response(status_code=501, content="Not Implemented")
@router.get("/{project_id}", status_code=501)
 def get_project(project_id: uuid.UUID):
    """Project detail with experiment summaries."""
    return Response(status_code=501, content="Not Implemented")
@router.put("/{project_id}", status_code=501)
 def update_project(project_id: uuid.UUID):
    """Update project."""
    return Response(status_code=501, content="Not Implemented")
@router.delete("/{project_id}", status_code=501)
 def delete_project(project_id: uuid.UUID):
    """Delete project and all experiments."""
    return Response(status_code=501, content="Not Implemented")
--- a/backend/routers/runs.py
+++ b/backend/routers/runs.py
@ -0,0 +1,37 @@
 """Runs router — execute, detail, score, and leaderboard."""
 import uuid
 from fastapi import APIRouter, Response
 router = APIRouter()
@router.get("/experiments/{experiment_id}/runs", status_code=501)
 def list_runs(experiment_id: uuid.UUID):
    """List runs with scores (sortable, filterable)."""
    return Response(status_code=501, content="Not Implemented")
@router.get("/{run_id}", status_code=501)
 def get_run(run_id: uuid.UUID):
    """Run detail with stage results."""
    return Response(status_code=501, content="Not Implemented")
@router.post("/", status_code=501)
 def create_run():
    """Execute a single run (ad-hoc)."""
    return Response(status_code=501, content="Not Implemented")
@router.post("/{run_id}/score", status_code=501)
 def score_run(run_id: uuid.UUID):
    """Add human rating to a run."""
    return Response(status_code=501, content="Not Implemented")
@router.get("/experiments/{experiment_id}/leaderboard", status_code=501)
 def leaderboard(experiment_id: uuid.UUID):
    """Top runs ranked by weighted score."""
    return Response(status_code=501, content="Not Implemented")
--- a/backend/routers/webhooks.py
+++ b/backend/routers/webhooks.py
@ -0,0 +1,25 @@
 """Webhooks router — manage webhook configurations."""
 import uuid
 from fastapi import APIRouter, Response
 router = APIRouter()
@router.get("/", status_code=501)
 def list_webhooks():
    """List webhook configs."""
    return Response(status_code=501, content="Not Implemented")
@router.post("/", status_code=501)
 def create_webhook():
    """Create webhook."""
    return Response(status_code=501, content="Not Implemented")
@router.delete("/{webhook_id}", status_code=501)
 def delete_webhook(webhook_id: uuid.UUID):
    """Remove webhook."""
    return Response(status_code=501, content="Not Implemented")
--- a/backend/schemas.py
+++ b/backend/schemas.py
@ -0,0 +1,298 @@
 """PromptLooper Pydantic request/response schemas."""
 import uuid
 from datetime import datetime
 from pydantic import BaseModel, ConfigDict, Field
 from models import ExperimentStatus, RunStatus
 # ---------------------------------------------------------------------------
 # Shared mixins
 # ---------------------------------------------------------------------------
 class _TimestampMixin(BaseModel):
    created_at: datetime
    updated_at: datetime
 # ---------------------------------------------------------------------------
 # Project
 # ---------------------------------------------------------------------------
 class ProjectCreate(BaseModel):
    name: str = Field(..., min_length=1, max_length=255)
    description: str | None = None
 class ProjectUpdate(BaseModel):
    name: str | None = Field(None, min_length=1, max_length=255)
    description: str | None = None
 class ProjectResponse(BaseModel):
    model_config = ConfigDict(from_attributes=True)
    id: uuid.UUID
    name: str
    description: str | None
    owner_id: uuid.UUID
    created_at: datetime
    updated_at: datetime
 class ProjectListResponse(BaseModel):
    items: list[ProjectResponse]
    total: int
 # ---------------------------------------------------------------------------
 # Experiment
 # ---------------------------------------------------------------------------
 class ExperimentCreate(BaseModel):
    name: str = Field(..., min_length=1, max_length=255)
    description: str | None = None
    sample_data: dict | None = None
    pipeline_stages: dict | None = None
    scoring_config: dict | None = None
    parameter_space: dict | None = None
 class ExperimentUpdate(BaseModel):
    name: str | None = Field(None, min_length=1, max_length=255)
    description: str | None = None
    sample_data: dict | None = None
    pipeline_stages: dict | None = None
    scoring_config: dict | None = None
    parameter_space: dict | None = None
    status: ExperimentStatus | None = None
 class ExperimentResponse(BaseModel):
    model_config = ConfigDict(from_attributes=True)
    id: uuid.UUID
    project_id: uuid.UUID
    name: str
    description: str | None
    sample_data: dict | None
    pipeline_stages: dict | None
    scoring_config: dict | None
    parameter_space: dict | None
    status: ExperimentStatus
    created_at: datetime
    updated_at: datetime
 class ExperimentListResponse(BaseModel):
    items: list[ExperimentResponse]
    total: int
 # ---------------------------------------------------------------------------
 # Run
 # ---------------------------------------------------------------------------
 class RunResponse(BaseModel):
    model_config = ConfigDict(from_attributes=True)
    id: uuid.UUID
    experiment_id: uuid.UUID
    config_hash: str
    config: dict
    status: RunStatus
    started_at: datetime | None
    completed_at: datetime | None
    duration_ms: int | None
    tokens_in: int | None
    tokens_out: int | None
    cost_estimate: float | None
 class RunListResponse(BaseModel):
    items: list[RunResponse]
    total: int
 # ---------------------------------------------------------------------------
 # StageResult (read-only, returned inside Run details)
 # ---------------------------------------------------------------------------
 class StageResultResponse(BaseModel):
    model_config = ConfigDict(from_attributes=True)
    id: uuid.UUID
    run_id: uuid.UUID
    stage_index: int
    prompt_sent: str
    response_raw: str
    model_used: str
    parameters: dict | None
    tokens_in: int | None
    tokens_out: int | None
    latency_ms: int | None
 class RunDetailResponse(RunResponse):
    """Run with nested stage results and scores."""
    stage_results: list[StageResultResponse] = []
    scores: list["ScoreResponse"] = []
 # ---------------------------------------------------------------------------
 # Score
 # ---------------------------------------------------------------------------
 class ScoreInput(BaseModel):
    scorer_name: str = Field(..., min_length=1, max_length=255)
    value: float
    metadata: dict | None = None
 class ScoreResponse(BaseModel):
    model_config = ConfigDict(from_attributes=True)
    id: uuid.UUID
    run_id: uuid.UUID
    scorer_name: str
    value: float
    scorer_metadata: dict | None
    created_at: datetime
 # ---------------------------------------------------------------------------
 # Endpoint (LLM endpoint configuration)
 # ---------------------------------------------------------------------------
 class EndpointCreate(BaseModel):
    name: str = Field(..., min_length=1, max_length=255)
    url: str = Field(..., min_length=1, max_length=2048)
    api_key: str | None = None
    default_model: str | None = Field(None, max_length=255)
 class EndpointUpdate(BaseModel):
    name: str | None = Field(None, min_length=1, max_length=255)
    url: str | None = Field(None, min_length=1, max_length=2048)
    api_key: str | None = None
    default_model: str | None = Field(None, max_length=255)
 class EndpointResponse(BaseModel):
    model_config = ConfigDict(from_attributes=True)
    id: uuid.UUID
    name: str
    url: str
    default_model: str | None
 class EndpointListResponse(BaseModel):
    items: list[EndpointResponse]
    total: int
 # ---------------------------------------------------------------------------
 # Webhook
 # ---------------------------------------------------------------------------
 class WebhookCreate(BaseModel):
    event_type: str = Field(..., min_length=1, max_length=255)
    url: str = Field(..., min_length=1, max_length=2048)
    headers: dict | None = None
    is_active: bool = True
 class WebhookUpdate(BaseModel):
    event_type: str | None = Field(None, min_length=1, max_length=255)
    url: str | None = Field(None, min_length=1, max_length=2048)
    headers: dict | None = None
    is_active: bool | None = None
 class WebhookResponse(BaseModel):
    model_config = ConfigDict(from_attributes=True)
    id: uuid.UUID
    event_type: str
    url: str
    headers: dict | None
    is_active: bool
 class WebhookListResponse(BaseModel):
    items: list[WebhookResponse]
    total: int
 # ---------------------------------------------------------------------------
 # Auth
 # ---------------------------------------------------------------------------
 class SetupRequest(BaseModel):
    username: str = Field(..., min_length=1, max_length=255)
    password: str = Field(..., min_length=8)
 class LoginRequest(BaseModel):
    username: str
    password: str
 class TokenResponse(BaseModel):
    access_token: str
    token_type: str = "bearer"
 class UserResponse(BaseModel):
    model_config = ConfigDict(from_attributes=True)
    id: uuid.UUID
    username: str
    is_admin: bool
    created_at: datetime
 # ---------------------------------------------------------------------------
 # Export
 # ---------------------------------------------------------------------------
 class ExportRunRow(BaseModel):
    """Flat row for CSV/JSON export of run results."""
    run_id: uuid.UUID
    experiment_id: uuid.UUID
    config_hash: str
    config: dict
    status: RunStatus
    duration_ms: int | None = None
    tokens_in: int | None = None
    tokens_out: int | None = None
    cost_estimate: float | None = None
    scores: dict[str, float] = Field(
        default_factory=dict,
        description="Map of scorer_name → value",
    )
 class ExportResponse(BaseModel):
    experiment_id: uuid.UUID
    experiment_name: str
    rows: list[ExportRunRow]
 # ---------------------------------------------------------------------------
 # Health
 # ---------------------------------------------------------------------------
 class HealthResponse(BaseModel):
    status: str = "ok"
    database: bool
    redis: bool
 # Rebuild forward refs for RunDetailResponse
 RunDetailResponse.model_rebuild()
--- a/backend/tests/init.py
+++ b/backend/tests/init.py
--- a/backend/tests/test_alembic.py
+++ b/backend/tests/test_alembic.py
@ -0,0 +1,107 @@
 """Tests for Alembic migration setup."""
 import os
 from pathlib import Path
 import pytest
 from alembic import command
 from alembic.config import Config
 from sqlalchemy import create_engine, inspect
 # Resolve the repo root regardless of where pytest is invoked from.
 _REPO_ROOT = Path(__file__).resolve().parents[2]
@pytest.fixture()
 def alembic_cfg(tmp_path):
    """Create an Alembic config pointing at a temporary SQLite database."""
    db_path = tmp_path / "test.db"
    db_url = f"sqlite:///{db_path}"
    cfg = Config(str(_REPO_ROOT / "alembic.ini"))
    cfg.set_main_option("script_location", str(_REPO_ROOT / "alembic"))
    cfg.set_main_option("sqlalchemy.url", db_url)
    return cfg, db_url
 def test_upgrade_head_creates_all_tables(alembic_cfg):
    """Running 'upgrade head' should create all expected tables."""
    cfg, db_url = alembic_cfg
    command.upgrade(cfg, "head")
    engine = create_engine(db_url)
    inspector = inspect(engine)
    tables = set(inspector.get_table_names())
    expected = {
        "alembic_version",
        "users",
        "projects",
        "experiments",
        "runs",
        "stage_results",
        "scores",
        "response_cache",
        "webhook_configs",
    }
    assert expected == tables
 def test_downgrade_base_removes_all_tables(alembic_cfg):
    """Running 'downgrade base' should remove all application tables."""
    cfg, db_url = alembic_cfg
    command.upgrade(cfg, "head")
    command.downgrade(cfg, "base")
    engine = create_engine(db_url)
    inspector = inspect(engine)
    tables = set(inspector.get_table_names())
    # Only alembic_version should remain
    assert tables == {"alembic_version"}
 def test_runs_table_has_expected_columns(alembic_cfg):
    """Spot-check that the runs table has key columns."""
    cfg, db_url = alembic_cfg
    command.upgrade(cfg, "head")
    engine = create_engine(db_url)
    inspector = inspect(engine)
    columns = {c["name"] for c in inspector.get_columns("runs")}
    assert "id" in columns
    assert "experiment_id" in columns
    assert "config_hash" in columns
    assert "status" in columns
    assert "cost_estimate" in columns
 def test_indexes_created(alembic_cfg):
    """Verify key indexes exist after migration."""
    cfg, db_url = alembic_cfg
    command.upgrade(cfg, "head")
    engine = create_engine(db_url)
    inspector = inspect(engine)
    run_indexes = {idx["name"] for idx in inspector.get_indexes("runs")}
    assert "ix_runs_config_hash" in run_indexes
    assert "ix_runs_experiment_id" in run_indexes
    score_indexes = {idx["name"] for idx in inspector.get_indexes("scores")}
    assert "ix_scores_run_id" in score_indexes
    assert "ix_scores_scorer_name" in score_indexes
 def test_foreign_keys_on_experiments(alembic_cfg):
    """Verify experiments table has FK to projects."""
    cfg, db_url = alembic_cfg
    command.upgrade(cfg, "head")
    engine = create_engine(db_url)
    inspector = inspect(engine)
    fks = inspector.get_foreign_keys("experiments")
    referred_tables = {fk["referred_table"] for fk in fks}
    assert "projects" in referred_tables
--- a/backend/tests/test_auth.py
+++ b/backend/tests/test_auth.py
@ -0,0 +1,238 @@
 """Tests for backend/auth.py — JWT, API key, setup flow, and auth dependency."""
 import os
 from datetime import timedelta
 from unittest.mock import patch
 import pytest
 from fastapi import FastAPI, Depends
 from fastapi.testclient import TestClient
@pytest.fixture(autouse=True)
 def _isolate_settings(tmp_path):
    """Ensure tests use a temp SQLite DB and no Redis."""
    env = {
        "DATABASE_URL": f"sqlite:///{tmp_path / 'test.db'}",
        "REDIS_URL": "",
        "DATA_DIR": str(tmp_path),
        "JWT_SECRET": "test-secret-key-for-jwt-signing",
        "API_KEY": "test-api-key-12345",
    }
    with patch.dict(os.environ, env, clear=False):
        import config
        new_settings = config.Settings(_env_file=None)
        config.settings = new_settings
        import main
        main.settings = new_settings
        main._init_db()
        main._init_redis()
        from models import Base
        Base.metadata.create_all(bind=main.engine)
        # Also patch auth module's settings reference
        import auth
        auth.settings = new_settings
        yield
@pytest.fixture
 def db_session():
    from main import get_db
    gen = get_db()
    session = next(gen)
    yield session
    try:
        next(gen)
    except StopIteration:
        pass
 # ---------------------------------------------------------------------------
 # Password hashing
 # ---------------------------------------------------------------------------
 class TestPasswordHashing:
    def test_hash_and_verify(self):
        from auth import hash_password, verify_password
        hashed = hash_password("my-secret-password")
        assert hashed != "my-secret-password"
        assert verify_password("my-secret-password", hashed)
    def test_wrong_password_fails(self):
        from auth import hash_password, verify_password
        hashed = hash_password("correct-password")
        assert not verify_password("wrong-password", hashed)
 # ---------------------------------------------------------------------------
 # JWT
 # ---------------------------------------------------------------------------
 class TestJWT:
    def test_create_and_decode_token(self):
        from auth import create_access_token, decode_access_token
        token = create_access_token("user-123")
        assert decode_access_token(token) == "user-123"
    def test_expired_token_raises(self):
        from auth import create_access_token, decode_access_token
        token = create_access_token("user-123", expires_delta=timedelta(seconds=-1))
        with pytest.raises(Exception) as exc_info:
            decode_access_token(token)
        assert exc_info.value.status_code == 401
    def test_invalid_token_raises(self):
        from auth import decode_access_token
        with pytest.raises(Exception) as exc_info:
            decode_access_token("not-a-valid-token")
        assert exc_info.value.status_code == 401
    def test_token_without_sub_raises(self):
        from jose import jwt
        import config
        token = jwt.encode({"foo": "bar"}, config.settings.jwt_secret, algorithm="HS256")
        from auth import decode_access_token
        with pytest.raises(Exception) as exc_info:
            decode_access_token(token)
        assert exc_info.value.status_code == 401
 # ---------------------------------------------------------------------------
 # First-boot setup
 # ---------------------------------------------------------------------------
 class TestSetup:
    def test_needs_setup_true_when_no_users(self, db_session):
        from auth import needs_setup
        assert needs_setup(db_session) is True
    def test_create_admin_succeeds(self, db_session):
        from auth import create_admin, needs_setup
        user = create_admin(db_session, "admin", "password123")
        assert user.username == "admin"
        assert user.is_admin is True
        assert needs_setup(db_session) is False
    def test_create_admin_twice_raises_409(self, db_session):
        from auth import create_admin
        create_admin(db_session, "admin", "password123")
        with pytest.raises(Exception) as exc_info:
            create_admin(db_session, "admin2", "password456")
        assert exc_info.value.status_code == 409
    def test_admin_password_is_hashed(self, db_session):
        from auth import create_admin
        user = create_admin(db_session, "admin", "password123")
        assert user.password_hash != "password123"
        assert user.password_hash.startswith("$2b$")
 # ---------------------------------------------------------------------------
 # Authenticate user (login)
 # ---------------------------------------------------------------------------
 class TestAuthenticateUser:
    def test_valid_credentials(self, db_session):
        from auth import create_admin, authenticate_user
        create_admin(db_session, "admin", "password123")
        user = authenticate_user(db_session, "admin", "password123")
        assert user.username == "admin"
    def test_wrong_password_raises_401(self, db_session):
        from auth import create_admin, authenticate_user
        create_admin(db_session, "admin", "password123")
        with pytest.raises(Exception) as exc_info:
            authenticate_user(db_session, "admin", "wrong")
        assert exc_info.value.status_code == 401
    def test_unknown_user_raises_401(self, db_session):
        from auth import authenticate_user
        with pytest.raises(Exception) as exc_info:
            authenticate_user(db_session, "nonexistent", "password")
        assert exc_info.value.status_code == 401
 # ---------------------------------------------------------------------------
 # get_current_user dependency (integration via test app)
 # ---------------------------------------------------------------------------
@pytest.fixture
 def auth_app():
    """Create a minimal FastAPI app with a protected endpoint for testing auth."""
    from auth import get_current_user
    from schemas import UserResponse
    test_app = FastAPI()
    @test_app.get("/protected")
    def protected(user=Depends(get_current_user)):
        return {"user_id": str(user.id), "username": user.username}
    return test_app
@pytest.fixture
 def auth_client(auth_app):
    return TestClient(auth_app)
 class TestGetCurrentUser:
    def test_no_auth_returns_401(self, auth_client):
        resp = auth_client.get("/protected")
        assert resp.status_code == 401
        assert "Missing authentication" in resp.json()["detail"]
    def test_invalid_bearer_format_returns_401(self, auth_client):
        resp = auth_client.get("/protected", headers={"Authorization": "NotBearer token"})
        assert resp.status_code == 401
    def test_jwt_auth_succeeds(self, auth_client, db_session):
        from auth import create_admin, create_access_token
        user = create_admin(db_session, "admin", "password123")
        token = create_access_token(str(user.id))
        resp = auth_client.get("/protected", headers={"Authorization": f"Bearer {token}"})
        assert resp.status_code == 200
        assert resp.json()["username"] == "admin"
    def test_jwt_for_deleted_user_returns_401(self, auth_client, db_session):
        from auth import create_access_token
        import uuid
        token = create_access_token(str(uuid.uuid4()))
        resp = auth_client.get("/protected", headers={"Authorization": f"Bearer {token}"})
        assert resp.status_code == 401
    def test_api_key_auth_succeeds(self, auth_client, db_session):
        from auth import create_admin
        create_admin(db_session, "admin", "password123")
        resp = auth_client.get("/protected", headers={"X-Api-Key": "test-api-key-12345"})
        assert resp.status_code == 200
        assert resp.json()["username"] == "admin"
    def test_wrong_api_key_returns_401(self, auth_client):
        resp = auth_client.get("/protected", headers={"X-Api-Key": "wrong-key"})
        assert resp.status_code == 401
    def test_api_key_without_admin_returns_401(self, auth_client):
        # No admin user created yet
        resp = auth_client.get("/protected", headers={"X-Api-Key": "test-api-key-12345"})
        assert resp.status_code == 401
    def test_api_key_disabled_when_not_configured(self, auth_client, db_session):
        """When API_KEY is not set in config, API key auth should fail."""
        from auth import create_admin
        import config, auth
        create_admin(db_session, "admin", "password123")
        old_key = config.settings.api_key
        config.settings.api_key = None
        auth.settings = config.settings
        try:
            resp = auth_client.get("/protected", headers={"X-Api-Key": "test-api-key-12345"})
            assert resp.status_code == 401
        finally:
            config.settings.api_key = old_key
            auth.settings = config.settings
--- a/backend/tests/test_config.py
+++ b/backend/tests/test_config.py
@ -0,0 +1,105 @@
 """Tests for backend/config.py."""
 import os
 from unittest.mock import patch
 import pytest
 from pydantic_settings import BaseSettings
 from config import Settings
 class TestSettings:
    """Test the Settings configuration class."""
    def _make_settings(self, **env_vars: str) -> Settings:
        """Create a Settings instance with specific env vars, ignoring .env file."""
        with patch.dict(os.environ, env_vars, clear=False):
            return Settings(_env_file=None)
    def test_defaults(self) -> None:
        s = self._make_settings()
        assert s.database_url is None
        assert s.redis_url is None
        assert s.host == "0.0.0.0"
        assert s.port == 8400
        assert s.api_key is None
        assert s.default_endpoint_url is None
        assert s.default_endpoint_key is None
        assert s.max_concurrent_runs == 4
        assert s.max_tokens_per_sweep == 0
        assert s.data_dir == "/data"
        assert s.mcp_enabled is True
        assert s.mcp_port == 8401
    def test_jwt_secret_auto_generated(self) -> None:
        s = self._make_settings()
        assert len(s.jwt_secret) > 0
    def test_jwt_secret_auto_generated_unique(self) -> None:
        s1 = self._make_settings()
        s2 = self._make_settings()
        assert s1.jwt_secret != s2.jwt_secret
    def test_jwt_secret_from_env(self) -> None:
        s = self._make_settings(JWT_SECRET="my-secret-key")
        assert s.jwt_secret == "my-secret-key"
    def test_sqlite_fallback_when_no_database_url(self) -> None:
        s = self._make_settings(DATA_DIR="/tmp/test")
        url = s.effective_database_url
        assert url.startswith("sqlite:///")
        assert url.endswith("promptlooper.db")
        assert "tmp" in url and "test" in url
        assert s.is_sqlite is True
    def test_postgres_when_database_url_set(self) -> None:
        url = "postgresql://user:pass@localhost:5432/promptlooper"
        s = self._make_settings(DATABASE_URL=url)
        assert s.effective_database_url == url
        assert s.is_sqlite is False
    def test_in_process_queue_when_no_redis(self) -> None:
        s = self._make_settings()
        assert s.use_in_process_queue is True
    def test_celery_queue_when_redis_set(self) -> None:
        s = self._make_settings(REDIS_URL="redis://localhost:6379/0")
        assert s.use_in_process_queue is False
        assert s.redis_url == "redis://localhost:6379/0"
    def test_empty_api_key_becomes_none(self) -> None:
        s = self._make_settings(API_KEY="")
        assert s.api_key is None
    def test_whitespace_api_key_becomes_none(self) -> None:
        s = self._make_settings(API_KEY="   ")
        assert s.api_key is None
    def test_valid_api_key_preserved(self) -> None:
        s = self._make_settings(API_KEY="sk-test-123")
        assert s.api_key == "sk-test-123"
    def test_env_overrides(self) -> None:
        s = self._make_settings(
            HOST="127.0.0.1",
            PORT="9000",
            MAX_CONCURRENT_RUNS="8",
            MAX_TOKENS_PER_SWEEP="100000",
            MCP_ENABLED="false",
            MCP_PORT="9001",
        )
        assert s.host == "127.0.0.1"
        assert s.port == 9000
        assert s.max_concurrent_runs == 8
        assert s.max_tokens_per_sweep == 100000
        assert s.mcp_enabled is False
        assert s.mcp_port == 9001
    def test_default_endpoint_config(self) -> None:
        s = self._make_settings(
            DEFAULT_ENDPOINT_URL="http://localhost:11434/v1",
            DEFAULT_ENDPOINT_KEY="sk-key",
        )
        assert s.default_endpoint_url == "http://localhost:11434/v1"
        assert s.default_endpoint_key == "sk-key"
--- a/backend/tests/test_main.py
+++ b/backend/tests/test_main.py
@ -0,0 +1,129 @@
 """Tests for backend/main.py — FastAPI application."""
 import os
 from unittest.mock import patch
 import pytest
 from fastapi.testclient import TestClient
@pytest.fixture(autouse=True)
 def _isolate_settings(tmp_path):
    """Ensure tests use a temp SQLite DB and no Redis."""
    env = {
        "DATABASE_URL": f"sqlite:///{tmp_path / 'test.db'}",
        "REDIS_URL": "",
        "DATA_DIR": str(tmp_path),
    }
    with patch.dict(os.environ, env, clear=False):
        # Reload settings so it picks up test env
        import config
        new_settings = config.Settings(_env_file=None)
        config.settings = new_settings
        # Patch main's reference too
        import main
        main.settings = new_settings
        main._init_db()
        main._init_redis()
        # Create tables
        from models import Base
        Base.metadata.create_all(bind=main.engine)
        yield
@pytest.fixture
 def client():
    from main import app
    return TestClient(app)
 class TestHealthEndpoint:
    def test_health_returns_ok(self, client):
        resp = client.get("/health")
        assert resp.status_code == 200
        data = resp.json()
        assert data["status"] == "ok"
        assert data["database"] is True
        assert data["redis"] is True  # in-process mode counts as ok
    def test_health_response_schema(self, client):
        resp = client.get("/health")
        data = resp.json()
        assert set(data.keys()) == {"status", "database", "redis"}
 class TestCORSMiddleware:
    def test_cors_headers_present(self, client):
        resp = client.options(
            "/health",
            headers={
                "Origin": "http://localhost:3000",
                "Access-Control-Request-Method": "GET",
            },
        )
        assert "access-control-allow-origin" in resp.headers
 class TestWebSocket:
    def test_websocket_connect_and_echo(self, client):
        with client.websocket_connect("/ws") as ws:
            ws.send_json({"type": "ping"})
            data = ws.receive_json()
            assert data["type"] == "ack"
            assert data["data"]["type"] == "ping"
    def test_websocket_disconnect_cleanup(self, client):
        from main import ws_manager
        initial_count = len(ws_manager.active_connections)
        with client.websocket_connect("/ws") as ws:
            assert len(ws_manager.active_connections) == initial_count + 1
        # After disconnect, connection should be removed
        assert len(ws_manager.active_connections) == initial_count
 class TestRouterMounting:
    def test_openapi_schema_loads(self, client):
        resp = client.get("/openapi.json")
        assert resp.status_code == 200
        schema = resp.json()
        assert schema["info"]["title"] == "PromptLooper"
    def test_unknown_route_returns_404(self, client):
        resp = client.get("/api/nonexistent")
        assert resp.status_code == 404
 class TestConnectionManager:
    def test_broadcast_removes_dead_connections(self):
        """ConnectionManager.broadcast skips and removes broken connections."""
        from main import ConnectionManager
        manager = ConnectionManager()
        # No connections — broadcast should not raise
        import asyncio
        asyncio.get_event_loop().run_until_complete(
            manager.broadcast({"test": True})
        )
        assert len(manager.active_connections) == 0
 class TestGetDb:
    def test_get_db_yields_session(self):
        from main import get_db
        gen = get_db()
        session = next(gen)
        assert session is not None
        # Clean up
        try:
            next(gen)
        except StopIteration:
            pass
 class TestGetRedis:
    def test_get_redis_returns_none_in_process_mode(self):
        from main import get_redis
        # In test setup, Redis is not configured
        assert get_redis() is None
--- a/backend/tests/test_models.py
+++ b/backend/tests/test_models.py
@ -0,0 +1,359 @@
 """Tests for SQLAlchemy ORM models."""
 import uuid
 from datetime import datetime, timezone
 from sqlalchemy import create_engine, inspect
 from sqlalchemy.orm import Session
 from models import (
    Base,
    Experiment,
    ExperimentStatus,
    Project,
    ResponseCache,
    Run,
    RunStatus,
    Score,
    StageResult,
    User,
    WebhookConfig,
 )
 def _engine():
    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)
    return engine
 def _session(engine):
    return Session(engine)
 # ---------------------------------------------------------------------------
 # Table existence
 # ---------------------------------------------------------------------------
 def test_all_tables_created():
    engine = _engine()
    table_names = inspect(engine).get_table_names()
    expected = {
        "users",
        "projects",
        "experiments",
        "runs",
        "stage_results",
        "scores",
        "response_cache",
        "webhook_configs",
    }
    assert expected.issubset(set(table_names))
 # ---------------------------------------------------------------------------
 # User
 # ---------------------------------------------------------------------------
 def test_user_creation():
    engine = _engine()
    with _session(engine) as session:
        user = User(username="admin", password_hash="hashed", is_admin=True)
        session.add(user)
        session.commit()
        assert isinstance(user.id, uuid.UUID)
        assert user.username == "admin"
        assert user.is_admin is True
        assert isinstance(user.created_at, datetime)
 def test_user_username_unique():
    engine = _engine()
    with _session(engine) as session:
        session.add(User(username="dup", password_hash="h1"))
        session.commit()
        session.add(User(username="dup", password_hash="h2"))
        try:
            session.commit()
            assert False, "Should have raised IntegrityError"
        except Exception:
            session.rollback()
 # ---------------------------------------------------------------------------
 # Project
 # ---------------------------------------------------------------------------
 def test_project_with_owner():
    engine = _engine()
    with _session(engine) as session:
        user = User(username="owner", password_hash="h")
        project = Project(name="Test Project", description="A test", owner=user)
        session.add(project)
        session.commit()
        assert project.owner_id == user.id
        assert project.name == "Test Project"
        assert isinstance(project.updated_at, datetime)
 def test_project_cascade_delete_from_user():
    engine = _engine()
    with _session(engine) as session:
        user = User(username="owner", password_hash="h")
        project = Project(name="P1", owner=user)
        session.add(project)
        session.commit()
        project_id = project.id
        session.delete(user)
        session.commit()
        assert session.get(Project, project_id) is None
 # ---------------------------------------------------------------------------
 # Experiment
 # ---------------------------------------------------------------------------
 def test_experiment_defaults():
    engine = _engine()
    with _session(engine) as session:
        user = User(username="u", password_hash="h")
        project = Project(name="P", owner=user)
        exp = Experiment(
            project=project,
            name="Exp1",
            sample_data={"inputs": ["hello"]},
            pipeline_stages=[{"prompt": "test"}],
            scoring_config={"scorers": ["keyword"]},
            parameter_space={"temperature": [0.1, 0.5]},
        )
        session.add(exp)
        session.commit()
        assert exp.status == ExperimentStatus.draft
        assert exp.sample_data == {"inputs": ["hello"]}
        assert isinstance(exp.created_at, datetime)
 def test_experiment_cascade_delete_from_project():
    engine = _engine()
    with _session(engine) as session:
        user = User(username="u", password_hash="h")
        project = Project(name="P", owner=user)
        exp = Experiment(project=project, name="E")
        session.add(exp)
        session.commit()
        exp_id = exp.id
        session.delete(project)
        session.commit()
        assert session.get(Experiment, exp_id) is None
 # ---------------------------------------------------------------------------
 # Run
 # ---------------------------------------------------------------------------
 def test_run_creation():
    engine = _engine()
    with _session(engine) as session:
        user = User(username="u", password_hash="h")
        project = Project(name="P", owner=user)
        exp = Experiment(project=project, name="E")
        run = Run(
            experiment=exp,
            config_hash="a" * 64,
            config={"model": "gpt-4", "temperature": 0.5},
            status=RunStatus.completed,
            duration_ms=1200,
            tokens_in=100,
            tokens_out=50,
        )
        session.add(run)
        session.commit()
        assert run.status == RunStatus.completed
        assert run.config["model"] == "gpt-4"
 def test_run_default_status():
    engine = _engine()
    with _session(engine) as session:
        user = User(username="u", password_hash="h")
        project = Project(name="P", owner=user)
        exp = Experiment(project=project, name="E")
        run = Run(experiment=exp, config_hash="b" * 64, config={})
        session.add(run)
        session.commit()
        assert run.status == RunStatus.pending
 # ---------------------------------------------------------------------------
 # StageResult
 # ---------------------------------------------------------------------------
 def test_stage_result():
    engine = _engine()
    with _session(engine) as session:
        user = User(username="u", password_hash="h")
        project = Project(name="P", owner=user)
        exp = Experiment(project=project, name="E")
        run = Run(experiment=exp, config_hash="c" * 64, config={})
        sr = StageResult(
            run=run,
            stage_index=0,
            prompt_sent="Hello",
            response_raw="World",
            model_used="gpt-4",
            parameters={"temperature": 0.5},
            tokens_in=10,
            tokens_out=5,
            latency_ms=200,
        )
        session.add(sr)
        session.commit()
        assert sr.stage_index == 0
        assert sr.model_used == "gpt-4"
        assert len(run.stage_results) == 1
 # ---------------------------------------------------------------------------
 # Score
 # ---------------------------------------------------------------------------
 def test_score():
    engine = _engine()
    with _session(engine) as session:
        user = User(username="u", password_hash="h")
        project = Project(name="P", owner=user)
        exp = Experiment(project=project, name="E")
        run = Run(experiment=exp, config_hash="d" * 64, config={})
        score = Score(
            run=run,
            scorer_name="embedding_similarity",
            value=0.87,
            scorer_metadata={"reference_id": "ref1"},
        )
        session.add(score)
        session.commit()
        assert score.value == 0.87
        assert score.scorer_name == "embedding_similarity"
        assert len(run.scores) == 1
 # ---------------------------------------------------------------------------
 # ResponseCache
 # ---------------------------------------------------------------------------
 def test_response_cache():
    engine = _engine()
    with _session(engine) as session:
        cache = ResponseCache(
            config_hash="e" * 64,
            response="cached response",
            model="gpt-4",
            tokens_in=50,
            tokens_out=25,
            latency_ms=300,
        )
        session.add(cache)
        session.commit()
        fetched = session.get(ResponseCache, "e" * 64)
        assert fetched is not None
        assert fetched.response == "cached response"
 def test_response_cache_pk_is_config_hash():
    engine = _engine()
    with _session(engine) as session:
        session.add(
            ResponseCache(config_hash="f" * 64, response="r1", model="m1")
        )
        session.commit()
        session.add(
            ResponseCache(config_hash="f" * 64, response="r2", model="m2")
        )
        try:
            session.commit()
            assert False, "Should have raised IntegrityError"
        except Exception:
            session.rollback()
 # ---------------------------------------------------------------------------
 # WebhookConfig
 # ---------------------------------------------------------------------------
 def test_webhook_config():
    engine = _engine()
    with _session(engine) as session:
        wh = WebhookConfig(
            event_type="experiment.completed",
            url="https://example.com/hook",
            headers={"Authorization": "Bearer token"},
            is_active=True,
        )
        session.add(wh)
        session.commit()
        assert isinstance(wh.id, uuid.UUID)
        assert wh.event_type == "experiment.completed"
        assert wh.is_active is True
 def test_webhook_config_default_active():
    engine = _engine()
    with _session(engine) as session:
        wh = WebhookConfig(
            event_type="run.failed",
            url="https://example.com/hook",
        )
        session.add(wh)
        session.commit()
        assert wh.is_active is True
 # ---------------------------------------------------------------------------
 # Relationship cascades: Run → StageResult + Score
 # ---------------------------------------------------------------------------
 def test_run_cascade_deletes_children():
    engine = _engine()
    with _session(engine) as session:
        user = User(username="u", password_hash="h")
        project = Project(name="P", owner=user)
        exp = Experiment(project=project, name="E")
        run = Run(experiment=exp, config_hash="g" * 64, config={})
        sr = StageResult(
            run=run, stage_index=0, prompt_sent="p",
            response_raw="r", model_used="m",
        )
        score = Score(run=run, scorer_name="test", value=0.5)
        session.add_all([run, sr, score])
        session.commit()
        sr_id, score_id = sr.id, score.id
        session.delete(run)
        session.commit()
        assert session.get(StageResult, sr_id) is None
        assert session.get(Score, score_id) is None
--- a/backend/tests/test_routers.py
+++ b/backend/tests/test_routers.py
@ -0,0 +1,224 @@
 """Tests for router stubs — verify all routes are mounted and return 501."""
 import pytest
 from fastapi.testclient import TestClient
@pytest.fixture()
 def client(tmp_path, monkeypatch):
    """Create a test client with a temporary database."""
    monkeypatch.setenv("DATA_DIR", str(tmp_path))
    monkeypatch.setenv("DATABASE_URL", "")
    monkeypatch.setenv("REDIS_URL", "")
    # Reload config to pick up test env
    import importlib
    import config as config_mod
    importlib.reload(config_mod)
    import main as main_mod
    importlib.reload(main_mod)
    with TestClient(main_mod.app) as c:
        yield c
 # ---- Auth router (/api/auth) ----
 def test_auth_setup(client):
    resp = client.post("/api/auth/setup")
    assert resp.status_code == 501
 def test_auth_login(client):
    resp = client.post("/api/auth/login")
    assert resp.status_code == 501
 def test_auth_me(client):
    resp = client.get("/api/auth/me")
    assert resp.status_code == 501
 # ---- Projects router (/api/projects) ----
 def test_projects_list(client):
    resp = client.get("/api/projects/")
    assert resp.status_code == 501
 def test_projects_create(client):
    resp = client.post("/api/projects/")
    assert resp.status_code == 501
 def test_projects_get(client):
    resp = client.get("/api/projects/00000000-0000-0000-0000-000000000001")
    assert resp.status_code == 501
 def test_projects_update(client):
    resp = client.put("/api/projects/00000000-0000-0000-0000-000000000001")
    assert resp.status_code == 501
 def test_projects_delete(client):
    resp = client.delete("/api/projects/00000000-0000-0000-0000-000000000001")
    assert resp.status_code == 501
 # ---- Experiments router (/api/experiments) ----
 def test_experiments_list(client):
    resp = client.get("/api/experiments/")
    assert resp.status_code == 501
 def test_experiments_create(client):
    resp = client.post("/api/experiments/")
    assert resp.status_code == 501
 def test_experiments_get(client):
    resp = client.get("/api/experiments/00000000-0000-0000-0000-000000000001")
    assert resp.status_code == 501
 def test_experiments_update(client):
    resp = client.put("/api/experiments/00000000-0000-0000-0000-000000000001")
    assert resp.status_code == 501
 def test_experiments_delete(client):
    resp = client.delete("/api/experiments/00000000-0000-0000-0000-000000000001")
    assert resp.status_code == 501
 def test_experiments_sweep(client):
    resp = client.post("/api/experiments/00000000-0000-0000-0000-000000000001/sweep")
    assert resp.status_code == 501
 def test_experiments_pause(client):
    resp = client.post("/api/experiments/00000000-0000-0000-0000-000000000001/pause")
    assert resp.status_code == 501
 def test_experiments_resume(client):
    resp = client.post("/api/experiments/00000000-0000-0000-0000-000000000001/resume")
    assert resp.status_code == 501
 def test_experiments_stop(client):
    resp = client.post("/api/experiments/00000000-0000-0000-0000-000000000001/stop")
    assert resp.status_code == 501
 # ---- Runs router (/api/runs) ----
 def test_runs_list(client):
    resp = client.get("/api/runs/experiments/00000000-0000-0000-0000-000000000001/runs")
    assert resp.status_code == 501
 def test_runs_get(client):
    resp = client.get("/api/runs/00000000-0000-0000-0000-000000000001")
    assert resp.status_code == 501
 def test_runs_create(client):
    resp = client.post("/api/runs/")
    assert resp.status_code == 501
 def test_runs_score(client):
    resp = client.post("/api/runs/00000000-0000-0000-0000-000000000001/score")
    assert resp.status_code == 501
 def test_runs_leaderboard(client):
    resp = client.get("/api/runs/experiments/00000000-0000-0000-0000-000000000001/leaderboard")
    assert resp.status_code == 501
 # ---- Endpoints router (/api/endpoints) ----
 def test_endpoints_list(client):
    resp = client.get("/api/endpoints/")
    assert resp.status_code == 501
 def test_endpoints_create(client):
    resp = client.post("/api/endpoints/")
    assert resp.status_code == 501
 def test_endpoints_update(client):
    resp = client.put("/api/endpoints/00000000-0000-0000-0000-000000000001")
    assert resp.status_code == 501
 def test_endpoints_delete(client):
    resp = client.delete("/api/endpoints/00000000-0000-0000-0000-000000000001")
    assert resp.status_code == 501
 def test_endpoints_test(client):
    resp = client.post("/api/endpoints/00000000-0000-0000-0000-000000000001/test")
    assert resp.status_code == 501
 # ---- Export router (/api/export) ----
 def test_export_best(client):
    resp = client.get("/api/export/experiments/00000000-0000-0000-0000-000000000001/best")
    assert resp.status_code == 501
 def test_export_env(client):
    resp = client.get("/api/export/experiments/00000000-0000-0000-0000-000000000001/env")
    assert resp.status_code == 501
 def test_export_yaml(client):
    resp = client.get("/api/export/experiments/00000000-0000-0000-0000-000000000001/yaml")
    assert resp.status_code == 501
 def test_export_report(client):
    resp = client.get("/api/export/experiments/00000000-0000-0000-0000-000000000001/report")
    assert resp.status_code == 501
 # ---- Webhooks router (/api/webhooks) ----
 def test_webhooks_list(client):
    resp = client.get("/api/webhooks/")
    assert resp.status_code == 501
 def test_webhooks_create(client):
    resp = client.post("/api/webhooks/")
    assert resp.status_code == 501
 def test_webhooks_delete(client):
    resp = client.delete("/api/webhooks/00000000-0000-0000-0000-000000000001")
    assert resp.status_code == 501
 # ---- Admin router (/api/admin) ----
 def test_admin_get_settings(client):
    resp = client.get("/api/admin/settings")
    assert resp.status_code == 501
 def test_admin_update_settings(client):
    resp = client.put("/api/admin/settings")
    assert resp.status_code == 501
 def test_admin_stats(client):
    resp = client.get("/api/admin/stats")
    assert resp.status_code == 501
--- a/backend/tests/test_schemas.py
+++ b/backend/tests/test_schemas.py
@ -0,0 +1,339 @@
 """Tests for backend/schemas.py."""
 import uuid
 from datetime import datetime, timezone
 import pytest
 from pydantic import ValidationError
 from models import ExperimentStatus, RunStatus
 from schemas import (
    EndpointCreate,
    EndpointResponse,
    EndpointUpdate,
    ExperimentCreate,
    ExperimentResponse,
    ExperimentUpdate,
    ExportResponse,
    ExportRunRow,
    HealthResponse,
    LoginRequest,
    ProjectCreate,
    ProjectResponse,
    ProjectUpdate,
    RunDetailResponse,
    RunResponse,
    ScoreInput,
    ScoreResponse,
    SetupRequest,
    StageResultResponse,
    TokenResponse,
    UserResponse,
    WebhookCreate,
    WebhookResponse,
    WebhookUpdate,
 )
 NOW = datetime.now(timezone.utc)
 UUID1 = uuid.uuid4()
 UUID2 = uuid.uuid4()
 # ---------------------------------------------------------------------------
 # Project schemas
 # ---------------------------------------------------------------------------
 class TestProjectSchemas:
    def test_create_valid(self) -> None:
        p = ProjectCreate(name="My Project", description="desc")
        assert p.name == "My Project"
        assert p.description == "desc"
    def test_create_name_required(self) -> None:
        with pytest.raises(ValidationError):
            ProjectCreate()  # type: ignore[call-arg]
    def test_create_empty_name_rejected(self) -> None:
        with pytest.raises(ValidationError):
            ProjectCreate(name="")
    def test_update_partial(self) -> None:
        p = ProjectUpdate(name="New Name")
        assert p.name == "New Name"
        assert p.description is None
    def test_response_from_attributes(self) -> None:
        class Fake:
            id = UUID1
            name = "Proj"
            description = None
            owner_id = UUID2
            created_at = NOW
            updated_at = NOW
        r = ProjectResponse.model_validate(Fake())
        assert r.id == UUID1
        assert r.name == "Proj"
 # ---------------------------------------------------------------------------
 # Experiment schemas
 # ---------------------------------------------------------------------------
 class TestExperimentSchemas:
    def test_create_minimal(self) -> None:
        e = ExperimentCreate(name="Exp 1")
        assert e.name == "Exp 1"
        assert e.sample_data is None
    def test_create_with_all_fields(self) -> None:
        e = ExperimentCreate(
            name="Full",
            description="desc",
            sample_data={"key": "value"},
            pipeline_stages={"stages": []},
            scoring_config={"scorer": "exact"},
            parameter_space={"temp": [0.5, 1.0]},
        )
        assert e.parameter_space == {"temp": [0.5, 1.0]}
    def test_update_status(self) -> None:
        e = ExperimentUpdate(status=ExperimentStatus.running)
        assert e.status == ExperimentStatus.running
    def test_response_from_attributes(self) -> None:
        class Fake:
            id = UUID1
            project_id = UUID2
            name = "Exp"
            description = None
            sample_data = None
            pipeline_stages = None
            scoring_config = None
            parameter_space = None
            status = ExperimentStatus.draft
            created_at = NOW
            updated_at = NOW
        r = ExperimentResponse.model_validate(Fake())
        assert r.status == ExperimentStatus.draft
 # ---------------------------------------------------------------------------
 # Run schemas
 # ---------------------------------------------------------------------------
 class TestRunSchemas:
    def test_response_from_attributes(self) -> None:
        class Fake:
            id = UUID1
            experiment_id = UUID2
            config_hash = "abc123"
            config = {"model": "gpt-4"}
            status = RunStatus.completed
            started_at = NOW
            completed_at = NOW
            duration_ms = 1234
            tokens_in = 100
            tokens_out = 200
            cost_estimate = 0.003
        r = RunResponse.model_validate(Fake())
        assert r.duration_ms == 1234
        assert r.cost_estimate == 0.003
    def test_detail_response_nested(self) -> None:
        data = {
            "id": UUID1,
            "experiment_id": UUID2,
            "config_hash": "abc",
            "config": {},
            "status": RunStatus.pending,
            "started_at": None,
            "completed_at": None,
            "duration_ms": None,
            "tokens_in": None,
            "tokens_out": None,
            "cost_estimate": None,
            "stage_results": [],
            "scores": [],
        }
        r = RunDetailResponse(**data)
        assert r.stage_results == []
        assert r.scores == []
 # ---------------------------------------------------------------------------
 # Score schemas
 # ---------------------------------------------------------------------------
 class TestScoreSchemas:
    def test_input_valid(self) -> None:
        s = ScoreInput(scorer_name="exact_match", value=0.95, metadata={"note": "ok"})
        assert s.value == 0.95
        assert s.metadata == {"note": "ok"}
    def test_input_missing_name(self) -> None:
        with pytest.raises(ValidationError):
            ScoreInput(value=0.5)  # type: ignore[call-arg]
    def test_response_from_attributes(self) -> None:
        class Fake:
            id = UUID1
            run_id = UUID2
            scorer_name = "bleu"
            value = 0.8
            scorer_metadata = {"n": 4}
            created_at = NOW
        r = ScoreResponse.model_validate(Fake())
        assert r.scorer_metadata == {"n": 4}
 # ---------------------------------------------------------------------------
 # Endpoint schemas
 # ---------------------------------------------------------------------------
 class TestEndpointSchemas:
    def test_create_valid(self) -> None:
        e = EndpointCreate(name="OpenAI", url="https://api.openai.com/v1")
        assert e.api_key is None
    def test_create_empty_name_rejected(self) -> None:
        with pytest.raises(ValidationError):
            EndpointCreate(name="", url="https://example.com")
    def test_update_partial(self) -> None:
        e = EndpointUpdate(url="https://new-url.com")
        assert e.name is None
 # ---------------------------------------------------------------------------
 # Webhook schemas
 # ---------------------------------------------------------------------------
 class TestWebhookSchemas:
    def test_create_valid(self) -> None:
        w = WebhookCreate(
            event_type="run.completed",
            url="https://hooks.example.com/promptlooper",
            headers={"Authorization": "Bearer xyz"},
        )
        assert w.is_active is True
    def test_create_inactive(self) -> None:
        w = WebhookCreate(
            event_type="run.failed",
            url="https://example.com",
            is_active=False,
        )
        assert w.is_active is False
    def test_update_partial(self) -> None:
        w = WebhookUpdate(is_active=False)
        assert w.event_type is None
        assert w.is_active is False
    def test_response_from_attributes(self) -> None:
        class Fake:
            id = UUID1
            event_type = "run.completed"
            url = "https://example.com"
            headers = None
            is_active = True
        r = WebhookResponse.model_validate(Fake())
        assert r.event_type == "run.completed"
 # ---------------------------------------------------------------------------
 # Auth schemas
 # ---------------------------------------------------------------------------
 class TestAuthSchemas:
    def test_setup_password_min_length(self) -> None:
        with pytest.raises(ValidationError):
            SetupRequest(username="admin", password="short")
    def test_setup_valid(self) -> None:
        s = SetupRequest(username="admin", password="securepass123")
        assert s.username == "admin"
    def test_login_valid(self) -> None:
        l = LoginRequest(username="user", password="pass")
        assert l.username == "user"
    def test_token_response(self) -> None:
        t = TokenResponse(access_token="jwt.token.here")
        assert t.token_type == "bearer"
    def test_user_response_from_attributes(self) -> None:
        class Fake:
            id = UUID1
            username = "admin"
            is_admin = True
            created_at = NOW
        r = UserResponse.model_validate(Fake())
        assert r.is_admin is True
 # ---------------------------------------------------------------------------
 # Export schemas
 # ---------------------------------------------------------------------------
 class TestExportSchemas:
    def test_export_run_row(self) -> None:
        row = ExportRunRow(
            run_id=UUID1,
            experiment_id=UUID2,
            config_hash="abc",
            config={"model": "gpt-4"},
            status=RunStatus.completed,
            duration_ms=500,
            tokens_in=10,
            tokens_out=20,
            cost_estimate=0.001,
            scores={"exact_match": 1.0, "bleu": 0.85},
        )
        assert row.scores["bleu"] == 0.85
    def test_export_run_row_default_scores(self) -> None:
        row = ExportRunRow(
            run_id=UUID1,
            experiment_id=UUID2,
            config_hash="abc",
            config={},
            status=RunStatus.pending,
        )
        assert row.scores == {}
    def test_export_response(self) -> None:
        r = ExportResponse(
            experiment_id=UUID1,
            experiment_name="Test Exp",
            rows=[],
        )
        assert r.rows == []
 # ---------------------------------------------------------------------------
 # Health schema
 # ---------------------------------------------------------------------------
 class TestHealthSchema:
    def test_health_response(self) -> None:
        h = HealthResponse(database=True, redis=False)
        assert h.status == "ok"
        assert h.database is True
        assert h.redis is False
--- a/backend/tests/test_stack_integration.py
+++ b/backend/tests/test_stack_integration.py
@ -0,0 +1,138 @@
 """Stack integration verification tests.
 These tests verify that all configuration files needed for 'docker compose up'
 are present, consistent, and well-formed. They do NOT start actual containers.
 """
 import os
 from pathlib import Path
 import pytest
 ROOT = Path(__file__).resolve().parents[2]  # repo root
 class TestDockerComposeConfig:
    """Verify docker-compose.yml references are satisfied."""
    def test_docker_compose_exists(self):
        assert (ROOT / "docker-compose.yml").is_file()
    def test_dockerfile_exists(self):
        assert (ROOT / "docker" / "Dockerfile").is_file()
    def test_nginx_conf_exists(self):
        assert (ROOT / "docker" / "nginx.conf").is_file()
    def test_entrypoint_exists(self):
        assert (ROOT / "docker" / "entrypoint.sh").is_file()
    def test_requirements_txt_exists(self):
        assert (ROOT / "backend" / "requirements.txt").is_file()
    def test_alembic_ini_exists(self):
        assert (ROOT / "alembic.ini").is_file()
    def test_alembic_env_exists(self):
        assert (ROOT / "alembic" / "env.py").is_file()
    def test_alembic_has_migration(self):
        versions = list((ROOT / "alembic" / "versions").glob("*.py"))
        assert len(versions) >= 1, "Expected at least one Alembic migration"
 class TestDockerfileConsistency:
    """Verify Dockerfile references match actual files."""
    def test_dockerfile_copies_backend(self):
        content = (ROOT / "docker" / "Dockerfile").read_text()
        assert "COPY backend/" in content
    def test_dockerfile_copies_alembic(self):
        content = (ROOT / "docker" / "Dockerfile").read_text()
        assert "COPY alembic/" in content
        assert "COPY alembic.ini" in content
    def test_dockerfile_copies_entrypoint(self):
        content = (ROOT / "docker" / "Dockerfile").read_text()
        assert "entrypoint.sh" in content
    def test_dockerfile_runs_migrations_via_entrypoint(self):
        content = (ROOT / "docker" / "entrypoint.sh").read_text()
        assert "alembic upgrade head" in content
 class TestNginxConfig:
    """Verify nginx proxies correctly."""
    def test_nginx_proxies_api(self):
        content = (ROOT / "docker" / "nginx.conf").read_text()
        assert "proxy_pass http://promptlooper-api:8000" in content
    def test_nginx_proxies_websocket(self):
        content = (ROOT / "docker" / "nginx.conf").read_text()
        assert "upgrade" in content.lower()
    def test_nginx_serves_spa_fallback(self):
        content = (ROOT / "docker" / "nginx.conf").read_text()
        assert "try_files" in content
        assert "/index.html" in content
 class TestFrontendBuildability:
    """Verify frontend has all files needed for a build."""
    def test_package_json_exists(self):
        assert (ROOT / "frontend" / "package.json").is_file()
    def test_index_html_exists(self):
        assert (ROOT / "frontend" / "index.html").is_file()
    def test_main_tsx_exists(self):
        assert (ROOT / "frontend" / "src" / "main.tsx").is_file()
    def test_app_tsx_exists(self):
        assert (ROOT / "frontend" / "src" / "App.tsx").is_file()
    def test_all_page_components_exist(self):
        pages = [
            "SetupPage", "LoginPage", "DashboardPage", "ProjectsPage",
            "ExperimentPage", "LivePage", "ComparePage", "AdminPage",
        ]
        for page in pages:
            assert (ROOT / "frontend" / "src" / "pages" / f"{page}.tsx").is_file(), f"Missing {page}.tsx"
    def test_vite_config_exists(self):
        assert (ROOT / "frontend" / "vite.config.ts").is_file()
    def test_tailwind_config_exists(self):
        assert (ROOT / "frontend" / "tailwind.config.js").is_file()
 class TestWorkerConfig:
    """Verify Celery worker module exists and is importable."""
    def test_worker_module_exists(self):
        assert (ROOT / "backend" / "worker.py").is_file()
 class TestHealthEndpoint:
    """Verify /health endpoint works in test mode."""
    def test_health_returns_ok(self):
        from fastapi.testclient import TestClient
        # Ensure backend is importable
        import sys
        backend_dir = str(ROOT / "backend")
        if backend_dir not in sys.path:
            sys.path.insert(0, backend_dir)
        from main import app
        client = TestClient(app)
        resp = client.get("/health")
        assert resp.status_code == 200
        data = resp.json()
        assert data["status"] in ("ok", "degraded")
        assert "database" in data
        assert "redis" in data
--- a/backend/tests/test_worker.py
+++ b/backend/tests/test_worker.py
@ -0,0 +1,47 @@
 """Tests for backend/worker.py — Celery configuration."""
 import importlib
 import sys
 from unittest.mock import patch
 def test_celery_app_is_importable():
    """worker.py exports a celery_app instance."""
    # Need to ensure config module is importable
    backend_dir = str(__import__("pathlib").Path(__file__).resolve().parents[1])
    if backend_dir not in sys.path:
        sys.path.insert(0, backend_dir)
    import worker
    assert hasattr(worker, "celery_app")
    assert worker.celery_app.main == "promptlooper"
 def test_celery_app_serializer_settings():
    """Verify JSON serialization is configured."""
    backend_dir = str(__import__("pathlib").Path(__file__).resolve().parents[1])
    if backend_dir not in sys.path:
        sys.path.insert(0, backend_dir)
    import worker
    assert worker.celery_app.conf.task_serializer == "json"
    assert worker.celery_app.conf.result_serializer == "json"
 def test_celery_defaults_to_memory_broker_without_redis():
    """Without REDIS_URL, broker falls back to memory://."""
    backend_dir = str(__import__("pathlib").Path(__file__).resolve().parents[1])
    if backend_dir not in sys.path:
        sys.path.insert(0, backend_dir)
    with patch.dict("os.environ", {"REDIS_URL": ""}, clear=False):
        # Force reload to pick up env change
        if "config" in sys.modules:
            importlib.reload(sys.modules["config"])
        if "worker" in sys.modules:
            importlib.reload(sys.modules["worker"])
        import worker
        # In no-redis mode, broker should be memory://
        # (may have been set from settings.redis_url == None)
        assert worker.celery_app is not None
--- a/backend/websocket/init.py
+++ b/backend/websocket/init.py
--- a/backend/worker.py
+++ b/backend/worker.py
@ -0,0 +1,30 @@
 """PromptLooper Celery worker configuration."""
 from celery import Celery
 from config import settings
 # Determine broker and backend URLs
 broker_url = settings.redis_url or "memory://"
 result_backend = settings.redis_url or "cache+memory://"
 celery_app = Celery(
    "promptlooper",
    broker=broker_url,
    backend=result_backend,
 )
 celery_app.conf.update(
    task_serializer="json",
    accept_content=["json"],
    result_serializer="json",
    timezone="UTC",
    enable_utc=True,
    worker_concurrency=settings.max_concurrent_runs,
    task_track_started=True,
    task_acks_late=True,
    worker_prefetch_multiplier=1,
 )
 # Auto-discover tasks in engine package
 celery_app.autodiscover_tasks(["engine"], force=True)
--- a/docker-compose.yml
+++ b/docker-compose.yml
@ -0,0 +1,108 @@
 name: xpltd_promptlooper
 networks:
  promptlooper:
    driver: bridge
    ipam:
      config:
        - subnet: 172.33.0.0/24
 services:
  promptlooper-db:
    image: postgres:16-alpine
    container_name: promptlooper-db
    restart: unless-stopped
    networks:
      - promptlooper
    ports:
      - "5434:5432"
    environment:
      POSTGRES_USER: promptlooper
      POSTGRES_PASSWORD: promptlooper
      POSTGRES_DB: promptlooper
    volumes:
      - /vmPool/r/services/promptlooper_db:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U promptlooper"]
      interval: 10s
      timeout: 5s
      retries: 5
  promptlooper-redis:
    image: redis:7-alpine
    container_name: promptlooper-redis
    restart: unless-stopped
    networks:
      - promptlooper
    volumes:
      - /vmPool/r/services/promptlooper_redis:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
  promptlooper-api:
    build:
      context: .
      dockerfile: docker/Dockerfile
      target: api
    container_name: promptlooper-api
    restart: unless-stopped
    networks:
      - promptlooper
    ports:
      - "8401:8401"  # MCP server
    environment:
      DATABASE_URL: postgresql://promptlooper:promptlooper@promptlooper-db:5432/promptlooper
      REDIS_URL: redis://promptlooper-redis:6379/0
      JWT_SECRET: ${JWT_SECRET:-dev-secret-change-in-production}
      API_KEY: ${API_KEY:-}
      DEFAULT_ENDPOINT_URL: ${DEFAULT_ENDPOINT_URL:-}
      DEFAULT_ENDPOINT_KEY: ${DEFAULT_ENDPOINT_KEY:-}
      MAX_CONCURRENT_RUNS: ${MAX_CONCURRENT_RUNS:-4}
      MAX_TOKENS_PER_SWEEP: ${MAX_TOKENS_PER_SWEEP:-0}
      MCP_ENABLED: ${MCP_ENABLED:-true}
      MCP_PORT: "8401"
    depends_on:
      promptlooper-db:
        condition: service_healthy
      promptlooper-redis:
        condition: service_healthy
  promptlooper-worker:
    build:
      context: .
      dockerfile: docker/Dockerfile
      target: api
    container_name: promptlooper-worker
    restart: unless-stopped
    networks:
      - promptlooper
    command: celery -A worker:celery_app worker --loglevel=info --concurrency=${MAX_CONCURRENT_RUNS:-4}
    working_dir: /app/backend
    environment:
      DATABASE_URL: postgresql://promptlooper:promptlooper@promptlooper-db:5432/promptlooper
      REDIS_URL: redis://promptlooper-redis:6379/0
      DEFAULT_ENDPOINT_URL: ${DEFAULT_ENDPOINT_URL:-}
      DEFAULT_ENDPOINT_KEY: ${DEFAULT_ENDPOINT_KEY:-}
      MAX_CONCURRENT_RUNS: ${MAX_CONCURRENT_RUNS:-4}
    depends_on:
      promptlooper-db:
        condition: service_healthy
      promptlooper-redis:
        condition: service_healthy
  promptlooper-web:
    build:
      context: .
      dockerfile: docker/Dockerfile
      target: web
    container_name: promptlooper-web
    restart: unless-stopped
    networks:
      - promptlooper
    ports:
      - "8400:80"
    depends_on:
      - promptlooper-api
--- a/docker/.gitkeep
+++ b/docker/.gitkeep
--- a/docker/Dockerfile
+++ b/docker/Dockerfile
@ -0,0 +1,67 @@
 # =============================================================================
 # Stage 1: Frontend build
 # =============================================================================
 FROM node:20-alpine AS frontend-build
 WORKDIR /build
 COPY frontend/package.json frontend/package-lock.json* ./
 RUN npm ci || npm install
 COPY frontend/ ./
 RUN npm run build
 # =============================================================================
 # Stage 2: Python API runtime
 # =============================================================================
 FROM python:3.12-slim AS api
 WORKDIR /app
 # Install system dependencies for psycopg2 and general use
 RUN apt-get update && \
    apt-get install -y --no-install-recommends gcc libpq-dev curl && \
    rm -rf /var/lib/apt/lists/*
 # Install Python dependencies
 COPY backend/requirements.txt /app/backend/requirements.txt
 RUN pip install --no-cache-dir -r /app/backend/requirements.txt
 # Copy backend source
 COPY backend/ /app/backend/
 COPY alembic/ /app/alembic/
 COPY alembic.ini /app/alembic.ini
 # Copy frontend build for single-container mode
 COPY --from=frontend-build /build/dist /app/static
 # Create data directory for SQLite mode
 RUN mkdir -p /data
 ENV PYTHONPATH=/app/backend
 ENV DATA_DIR=/data
 # Entrypoint runs migrations then starts the app
 COPY docker/entrypoint.sh /app/entrypoint.sh
 RUN chmod +x /app/entrypoint.sh
 EXPOSE 8000 8401
 # Default: run migrations then start the API server
 ENTRYPOINT ["/app/entrypoint.sh"]
 # =============================================================================
 # Stage 3: Nginx frontend (production compose)
 # =============================================================================
 FROM nginx:1.27-alpine AS web
 # Remove default config
 RUN rm /etc/nginx/conf.d/default.conf
 # Copy custom nginx config
 COPY docker/nginx.conf /etc/nginx/conf.d/default.conf
 # Copy built frontend assets
 COPY --from=frontend-build /build/dist /usr/share/nginx/html
 EXPOSE 80
--- a/docker/entrypoint.sh
+++ b/docker/entrypoint.sh
@ -0,0 +1,10 @@
 #!/bin/sh
 set -e
 # Run database migrations
 echo "Running database migrations..."
 cd /app && alembic upgrade head
 # Start the application
 echo "Starting PromptLooper API..."
 exec uvicorn main:app --host 0.0.0.0 --port 8000 --app-dir /app/backend "$@"
--- a/docker/nginx.conf
+++ b/docker/nginx.conf
@ -0,0 +1,44 @@
 server {
    listen 80;
    server_name _;
    root /usr/share/nginx/html;
    index index.html;
    # Frontend static assets
    location / {
        try_files $uri $uri/ /index.html;
    }
    # API proxy
    location /api/ {
        proxy_pass http://promptlooper-api:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
    # Health endpoint proxy
    location /health {
        proxy_pass http://promptlooper-api:8000;
        proxy_set_header Host $host;
    }
    # WebSocket proxy
    location /ws/ {
        proxy_pass http://promptlooper-api:8000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_read_timeout 86400;
    }
    # Gzip compression
    gzip on;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml text/javascript;
    gzip_min_length 256;
 }
--- a/env.example
+++ b/env.example
@ -0,0 +1,23 @@
 # PromptLooper — Environment Configuration
 # Copy to .env and fill in required values
 # ── Database ──────────────────────────────────────────────
 POSTGRES_USER=promptlooper
 POSTGRES_PASSWORD=          # REQUIRED: set a strong password
 POSTGRES_DB=promptlooper
 # ── Auth ──────────────────────────────────────────────────
 JWT_SECRET=                 # REQUIRED: generate with `openssl rand -hex 32`
 # ── Default LLM Endpoint (optional) ──────────────────────
 # Pre-configure an LLM endpoint so users don't have to add one manually
 DEFAULT_ENDPOINT_URL=       # e.g. http://chat.forgetyour.name/api/v1
 DEFAULT_ENDPOINT_KEY=       # API key for the default endpoint
 # ── Limits ────────────────────────────────────────────────
 MAX_CONCURRENT_RUNS=4       # Parallel run limit per sweep
 MAX_TOKENS_PER_SWEEP=0      # 0 = unlimited; set a number to cap token spend
 # ── MCP Server ────────────────────────────────────────────
 MCP_ENABLED=true            # Enable/disable MCP server for agent access
 # MCP_PORT=8401             # MCP server port (set in docker-compose)
--- a/frontend/index.html
+++ b/frontend/index.html
@ -0,0 +1,12 @@
 <!doctype html>
 <html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>PromptLooper</title>
  </head>
  <body>
    <div id="root"></div>
    <script type="module" src="/src/main.tsx"></script>
  </body>
 </html>
--- a/frontend/package-lock.json
+++ b/frontend/package-lock.json
--- a/frontend/package.json
+++ b/frontend/package.json
@ -0,0 +1,31 @@
 {
  "name": "promptlooper-frontend",
  "private": true,
  "version": "0.1.0",
  "type": "module",
  "scripts": {
    "dev": "vite",
    "build": "tsc && vite build",
    "preview": "vite preview",
    "test": "vitest run"
  },
  "dependencies": {
    "react": "^18.3.1",
    "react-dom": "^18.3.1",
    "react-router-dom": "^6.28.0"
  },
  "devDependencies": {
    "@testing-library/jest-dom": "^6.9.1",
    "@testing-library/react": "^16.3.2",
    "@types/react": "^18.3.12",
    "@types/react-dom": "^18.3.1",
    "@vitejs/plugin-react": "^4.3.4",
    "autoprefixer": "^10.4.20",
    "jsdom": "^29.0.2",
    "postcss": "^8.4.49",
    "tailwindcss": "^3.4.15",
    "typescript": "^5.6.3",
    "vite": "^6.0.0",
    "vitest": "^4.1.2"
  }
 }
--- a/frontend/postcss.config.js
+++ b/frontend/postcss.config.js
@ -0,0 +1,6 @@
 export default {
  plugins: {
    tailwindcss: {},
    autoprefixer: {},
  },
 };
--- a/frontend/src/App.test.tsx
+++ b/frontend/src/App.test.tsx
@ -0,0 +1,59 @@
 import { render, screen } from "@testing-library/react";
 import { MemoryRouter } from "react-router-dom";
 import { describe, it, expect } from "vitest";
 import App from "./App";
 function renderWithRouter(route: string) {
  return render(
    <MemoryRouter initialEntries={[route]}>
      <App />
    </MemoryRouter>,
  );
 }
 describe("App routing", () => {
  it("renders SetupPage at /setup", () => {
    renderWithRouter("/setup");
    expect(screen.getByText("PromptLooper Setup")).toBeInTheDocument();
  });
  it("renders LoginPage at /login", () => {
    renderWithRouter("/login");
    expect(screen.getByText("Sign In")).toBeInTheDocument();
  });
  it("renders DashboardPage at /", () => {
    renderWithRouter("/");
    expect(screen.getByText("Dashboard")).toBeInTheDocument();
  });
  it("renders ProjectsPage at /projects", () => {
    renderWithRouter("/projects");
    expect(screen.getByText("Projects")).toBeInTheDocument();
  });
  it("renders ExperimentPage at /experiments/:id", () => {
    renderWithRouter("/experiments/abc-123");
    expect(screen.getByText("Experiment")).toBeInTheDocument();
  });
  it("renders LivePage at /live/:id", () => {
    renderWithRouter("/live/abc-123");
    expect(screen.getByText("Live")).toBeInTheDocument();
  });
  it("renders ComparePage at /compare", () => {
    renderWithRouter("/compare");
    expect(screen.getByText("Compare")).toBeInTheDocument();
  });
  it("renders AdminPage at /admin", () => {
    renderWithRouter("/admin");
    expect(screen.getByText("Admin")).toBeInTheDocument();
  });
  it("redirects unknown routes to dashboard", () => {
    renderWithRouter("/nonexistent");
    expect(screen.getByText("Dashboard")).toBeInTheDocument();
  });
 });
--- a/frontend/src/App.tsx
+++ b/frontend/src/App.tsx
@ -0,0 +1,25 @@
 import { Routes, Route, Navigate } from "react-router-dom";
 import SetupPage from "./pages/SetupPage";
 import LoginPage from "./pages/LoginPage";
 import DashboardPage from "./pages/DashboardPage";
 import ProjectsPage from "./pages/ProjectsPage";
 import ExperimentPage from "./pages/ExperimentPage";
 import LivePage from "./pages/LivePage";
 import ComparePage from "./pages/ComparePage";
 import AdminPage from "./pages/AdminPage";
 export default function App() {
  return (
    <Routes>
      <Route path="/setup" element={<SetupPage />} />
      <Route path="/login" element={<LoginPage />} />
      <Route path="/" element={<DashboardPage />} />
      <Route path="/projects" element={<ProjectsPage />} />
      <Route path="/experiments/:id" element={<ExperimentPage />} />
      <Route path="/live/:id" element={<LivePage />} />
      <Route path="/compare" element={<ComparePage />} />
      <Route path="/admin" element={<AdminPage />} />
      <Route path="*" element={<Navigate to="/" replace />} />
    </Routes>
  );
 }
--- a/frontend/src/api/client.test.ts
+++ b/frontend/src/api/client.test.ts
@ -0,0 +1,552 @@
 import { describe, it, expect, beforeEach, afterEach, vi } from "vitest";
 import {
  setToken,
  getToken,
  clearToken,
  ApiError,
  auth,
  projects,
  experiments,
  runs,
  endpoints,
  exportApi,
  webhooks,
  admin,
  health,
  connectWebSocket,
 } from "./client";
 // ---------------------------------------------------------------------------
 // Mock fetch
 // ---------------------------------------------------------------------------
 const mockFetch = vi.fn();
 beforeEach(() => {
  mockFetch.mockReset();
  vi.stubGlobal("fetch", mockFetch);
  clearToken();
 });
 afterEach(() => {
  vi.restoreAllMocks();
 });
 function jsonResponse(body: unknown, status = 200): Response {
  return {
    ok: status >= 200 && status < 300,
    status,
    statusText: status === 200 ? "OK" : "Error",
    json: () => Promise.resolve(body),
    text: () => Promise.resolve(JSON.stringify(body)),
    headers: new Headers(),
  } as unknown as Response;
 }
 function noContentResponse(): Response {
  return {
    ok: true,
    status: 204,
    statusText: "No Content",
    json: () => Promise.reject(new Error("no body")),
    text: () => Promise.resolve(""),
    headers: new Headers(),
  } as unknown as Response;
 }
 // ---------------------------------------------------------------------------
 // Token management
 // ---------------------------------------------------------------------------
 describe("token management", () => {
  it("starts with null token", () => {
    expect(getToken()).toBeNull();
  });
  it("sets and gets token", () => {
    setToken("abc123");
    expect(getToken()).toBe("abc123");
  });
  it("clears token", () => {
    setToken("abc123");
    clearToken();
    expect(getToken()).toBeNull();
  });
 });
 // ---------------------------------------------------------------------------
 // Auth header injection
 // ---------------------------------------------------------------------------
 describe("auth header injection", () => {
  it("sends Authorization header when token is set", async () => {
    setToken("my-jwt");
    mockFetch.mockResolvedValueOnce(jsonResponse({ status: "ok" }));
    await health.check();
    const [, init] = mockFetch.mock.calls[0] as [string, RequestInit];
    expect((init.headers as Record<string, string>)["Authorization"]).toBe(
      "Bearer my-jwt",
    );
  });
  it("omits Authorization header when no token", async () => {
    mockFetch.mockResolvedValueOnce(jsonResponse({ status: "ok" }));
    await health.check();
    const [, init] = mockFetch.mock.calls[0] as [string, RequestInit];
    expect(
      (init.headers as Record<string, string>)["Authorization"],
    ).toBeUndefined();
  });
 });
 // ---------------------------------------------------------------------------
 // ApiError
 // ---------------------------------------------------------------------------
 describe("ApiError", () => {
  it("throws ApiError on non-ok response", async () => {
    mockFetch.mockResolvedValueOnce(
      jsonResponse({ detail: "not found" }, 404),
    );
    await expect(projects.get("some-id")).rejects.toThrow(ApiError);
    try {
      mockFetch.mockResolvedValueOnce(
        jsonResponse({ detail: "bad" }, 400),
      );
      await projects.get("some-id");
    } catch (e) {
      expect(e).toBeInstanceOf(ApiError);
      expect((e as ApiError).status).toBe(400);
    }
  });
 });
 // ---------------------------------------------------------------------------
 // Content-Type header
 // ---------------------------------------------------------------------------
 describe("content-type", () => {
  it("sets Content-Type for POST with body", async () => {
    mockFetch.mockResolvedValueOnce(
      jsonResponse({ access_token: "tok", token_type: "bearer" }),
    );
    await auth.setup({ username: "admin", password: "password123" });
    const [, init] = mockFetch.mock.calls[0] as [string, RequestInit];
    expect((init.headers as Record<string, string>)["Content-Type"]).toBe(
      "application/json",
    );
  });
  it("omits Content-Type for GET requests", async () => {
    mockFetch.mockResolvedValueOnce(jsonResponse({ items: [], total: 0 }));
    await projects.list();
    const [, init] = mockFetch.mock.calls[0] as [string, RequestInit];
    expect(
      (init.headers as Record<string, string>)["Content-Type"],
    ).toBeUndefined();
  });
 });
 // ---------------------------------------------------------------------------
 // Health
 // ---------------------------------------------------------------------------
 describe("health", () => {
  it("calls /health", async () => {
    mockFetch.mockResolvedValueOnce(
      jsonResponse({ status: "ok", database: true, redis: true }),
    );
    const result = await health.check();
    expect(mockFetch).toHaveBeenCalledWith("/health", expect.anything());
    expect(result.status).toBe("ok");
  });
 });
 // ---------------------------------------------------------------------------
 // Auth endpoints
 // ---------------------------------------------------------------------------
 describe("auth", () => {
  it("setup POSTs to /api/auth/setup", async () => {
    mockFetch.mockResolvedValueOnce(
      jsonResponse({ access_token: "tok", token_type: "bearer" }),
    );
    const result = await auth.setup({
      username: "admin",
      password: "password123",
    });
    expect(mockFetch).toHaveBeenCalledWith(
      "/api/auth/setup",
      expect.anything(),
    );
    expect(result.access_token).toBe("tok");
  });
  it("login sets token automatically", async () => {
    mockFetch.mockResolvedValueOnce(
      jsonResponse({ access_token: "jwt-123", token_type: "bearer" }),
    );
    await auth.login({ username: "admin", password: "pass" });
    expect(getToken()).toBe("jwt-123");
  });
  it("me GETs /api/auth/me", async () => {
    mockFetch.mockResolvedValueOnce(
      jsonResponse({
        id: "u1",
        username: "admin",
        is_admin: true,
        created_at: "2026-01-01T00:00:00Z",
      }),
    );
    const user = await auth.me();
    expect(user.username).toBe("admin");
  });
  it("logout clears token", () => {
    setToken("tok");
    auth.logout();
    expect(getToken()).toBeNull();
  });
 });
 // ---------------------------------------------------------------------------
 // Projects
 // ---------------------------------------------------------------------------
 describe("projects", () => {
  it("list GETs /api/projects/", async () => {
    mockFetch.mockResolvedValueOnce(jsonResponse({ items: [], total: 0 }));
    await projects.list();
    expect(mockFetch).toHaveBeenCalledWith(
      "/api/projects/",
      expect.anything(),
    );
  });
  it("create POSTs to /api/projects/", async () => {
    mockFetch.mockResolvedValueOnce(
      jsonResponse({ id: "p1", name: "Test" }),
    );
    await projects.create({ name: "Test" });
    const [, init] = mockFetch.mock.calls[0] as [string, RequestInit];
    expect(init.method).toBe("POST");
    expect(JSON.parse(init.body as string)).toEqual({ name: "Test" });
  });
  it("get fetches by id", async () => {
    mockFetch.mockResolvedValueOnce(jsonResponse({ id: "p1" }));
    await projects.get("p1");
    expect(mockFetch).toHaveBeenCalledWith(
      "/api/projects/p1",
      expect.anything(),
    );
  });
  it("update PUTs by id", async () => {
    mockFetch.mockResolvedValueOnce(jsonResponse({ id: "p1" }));
    await projects.update("p1", { name: "New" });
    const [url, init] = mockFetch.mock.calls[0] as [string, RequestInit];
    expect(url).toBe("/api/projects/p1");
    expect(init.method).toBe("PUT");
  });
  it("delete DELETEs by id", async () => {
    mockFetch.mockResolvedValueOnce(noContentResponse());
    await projects.delete("p1");
    const [url, init] = mockFetch.mock.calls[0] as [string, RequestInit];
    expect(url).toBe("/api/projects/p1");
    expect(init.method).toBe("DELETE");
  });
 });
 // ---------------------------------------------------------------------------
 // Experiments
 // ---------------------------------------------------------------------------
 describe("experiments", () => {
  it("list GETs /api/experiments/", async () => {
    mockFetch.mockResolvedValueOnce(jsonResponse({ items: [], total: 0 }));
    await experiments.list();
    expect(mockFetch).toHaveBeenCalledWith(
      "/api/experiments/",
      expect.anything(),
    );
  });
  it("startSweep POSTs to sweep endpoint", async () => {
    mockFetch.mockResolvedValueOnce(noContentResponse());
    await experiments.startSweep("e1");
    expect(mockFetch).toHaveBeenCalledWith(
      "/api/experiments/e1/sweep",
      expect.anything(),
    );
  });
  it("pause POSTs to pause endpoint", async () => {
    mockFetch.mockResolvedValueOnce(noContentResponse());
    await experiments.pause("e1");
    expect(mockFetch).toHaveBeenCalledWith(
      "/api/experiments/e1/pause",
      expect.anything(),
    );
  });
  it("resume POSTs to resume endpoint", async () => {
    mockFetch.mockResolvedValueOnce(noContentResponse());
    await experiments.resume("e1");
    expect(mockFetch).toHaveBeenCalledWith(
      "/api/experiments/e1/resume",
      expect.anything(),
    );
  });
  it("stop POSTs to stop endpoint", async () => {
    mockFetch.mockResolvedValueOnce(noContentResponse());
    await experiments.stop("e1");
    expect(mockFetch).toHaveBeenCalledWith(
      "/api/experiments/e1/stop",
      expect.anything(),
    );
  });
 });
 // ---------------------------------------------------------------------------
 // Runs
 // ---------------------------------------------------------------------------
 describe("runs", () => {
  it("list GETs runs for experiment", async () => {
    mockFetch.mockResolvedValueOnce(jsonResponse({ items: [], total: 0 }));
    await runs.list("e1");
    expect(mockFetch).toHaveBeenCalledWith(
      "/api/runs/experiments/e1/runs",
      expect.anything(),
    );
  });
  it("get fetches run detail", async () => {
    mockFetch.mockResolvedValueOnce(
      jsonResponse({ id: "r1", stage_results: [], scores: [] }),
    );
    await runs.get("r1");
    expect(mockFetch).toHaveBeenCalledWith(
      "/api/runs/r1",
      expect.anything(),
    );
  });
  it("score POSTs to run score endpoint", async () => {
    mockFetch.mockResolvedValueOnce(jsonResponse({ id: "s1" }));
    await runs.score("r1", { scorer_name: "human", value: 0.9 });
    const [url, init] = mockFetch.mock.calls[0] as [string, RequestInit];
    expect(url).toBe("/api/runs/r1/score");
    expect(init.method).toBe("POST");
  });
  it("leaderboard GETs leaderboard", async () => {
    mockFetch.mockResolvedValueOnce(jsonResponse({ items: [], total: 0 }));
    await runs.leaderboard("e1");
    expect(mockFetch).toHaveBeenCalledWith(
      "/api/runs/experiments/e1/leaderboard",
      expect.anything(),
    );
  });
 });
 // ---------------------------------------------------------------------------
 // Endpoints
 // ---------------------------------------------------------------------------
 describe("endpoints", () => {
  it("list GETs /api/endpoints/", async () => {
    mockFetch.mockResolvedValueOnce(jsonResponse({ items: [], total: 0 }));
    await endpoints.list();
    expect(mockFetch).toHaveBeenCalledWith(
      "/api/endpoints/",
      expect.anything(),
    );
  });
  it("test POSTs to test endpoint", async () => {
    mockFetch.mockResolvedValueOnce(jsonResponse({ models: [] }));
    await endpoints.test("ep1");
    expect(mockFetch).toHaveBeenCalledWith(
      "/api/endpoints/ep1/test",
      expect.anything(),
    );
  });
 });
 // ---------------------------------------------------------------------------
 // Export
 // ---------------------------------------------------------------------------
 describe("exportApi", () => {
  it("best GETs best config", async () => {
    mockFetch.mockResolvedValueOnce(jsonResponse({}));
    await exportApi.best("e1");
    expect(mockFetch).toHaveBeenCalledWith(
      "/api/export/experiments/e1/best",
      expect.anything(),
    );
  });
  it("env GETs env export", async () => {
    mockFetch.mockResolvedValueOnce(jsonResponse("KEY=val"));
    await exportApi.env("e1");
    expect(mockFetch).toHaveBeenCalledWith(
      "/api/export/experiments/e1/env",
      expect.anything(),
    );
  });
  it("report GETs report", async () => {
    mockFetch.mockResolvedValueOnce(jsonResponse("# Report"));
    await exportApi.report("e1");
    expect(mockFetch).toHaveBeenCalledWith(
      "/api/export/experiments/e1/report",
      expect.anything(),
    );
  });
 });
 // ---------------------------------------------------------------------------
 // Webhooks
 // ---------------------------------------------------------------------------
 describe("webhooks", () => {
  it("list GETs /api/webhooks/", async () => {
    mockFetch.mockResolvedValueOnce(jsonResponse({ items: [], total: 0 }));
    await webhooks.list();
    expect(mockFetch).toHaveBeenCalledWith(
      "/api/webhooks/",
      expect.anything(),
    );
  });
  it("create POSTs webhook", async () => {
    mockFetch.mockResolvedValueOnce(jsonResponse({ id: "w1" }));
    await webhooks.create({ event_type: "run.complete", url: "http://x" });
    const [, init] = mockFetch.mock.calls[0] as [string, RequestInit];
    expect(init.method).toBe("POST");
  });
  it("delete DELETEs webhook", async () => {
    mockFetch.mockResolvedValueOnce(noContentResponse());
    await webhooks.delete("w1");
    const [url, init] = mockFetch.mock.calls[0] as [string, RequestInit];
    expect(url).toBe("/api/webhooks/w1");
    expect(init.method).toBe("DELETE");
  });
 });
 // ---------------------------------------------------------------------------
 // Admin
 // ---------------------------------------------------------------------------
 describe("admin", () => {
  it("getSettings GETs /api/admin/settings", async () => {
    mockFetch.mockResolvedValueOnce(jsonResponse({}));
    await admin.getSettings();
    expect(mockFetch).toHaveBeenCalledWith(
      "/api/admin/settings",
      expect.anything(),
    );
  });
  it("updateSettings PUTs /api/admin/settings", async () => {
    mockFetch.mockResolvedValueOnce(jsonResponse({}));
    await admin.updateSettings({ guest_access: true });
    const [, init] = mockFetch.mock.calls[0] as [string, RequestInit];
    expect(init.method).toBe("PUT");
  });
  it("getStats GETs /api/admin/stats", async () => {
    mockFetch.mockResolvedValueOnce(jsonResponse({}));
    await admin.getStats();
    expect(mockFetch).toHaveBeenCalledWith(
      "/api/admin/stats",
      expect.anything(),
    );
  });
 });
 // ---------------------------------------------------------------------------
 // WebSocket helper
 // ---------------------------------------------------------------------------
 describe("connectWebSocket", () => {
  it("creates WebSocket with correct URL and handles messages", () => {
    const sendSpy = vi.fn();
    const closeSpy = vi.fn();
    let capturedInstance: {
      onmessage: ((ev: { data: string }) => void) | null;
      onclose: (() => void) | null;
      readyState: number;
    };
    // Use a class constructor so `new WebSocket(...)` works
    class MockWebSocket {
      static OPEN = 1;
      readyState = 1;
      onmessage: ((ev: { data: string }) => void) | null = null;
      onclose: (() => void) | null = null;
      send = sendSpy;
      close = closeSpy;
      constructor(public url: string) {
        capturedInstance = this;
      }
    }
    vi.stubGlobal("WebSocket", MockWebSocket);
    Object.defineProperty(window, "location", {
      value: { protocol: "http:", host: "localhost:5173" },
      writable: true,
      configurable: true,
    });
    const onMessage = vi.fn();
    const onClose = vi.fn();
    const conn = connectWebSocket(onMessage, onClose);
    expect(capturedInstance!.url).toBe("ws://localhost:5173/ws");
    // Simulate incoming message
    capturedInstance!.onmessage!({ data: JSON.stringify({ type: "update" }) });
    expect(onMessage).toHaveBeenCalledWith({ type: "update" });
    // Send message
    conn.send({ type: "ping" });
    expect(sendSpy).toHaveBeenCalledWith('{"type":"ping"}');
    // Simulate close
    capturedInstance!.onclose!();
    expect(onClose).toHaveBeenCalled();
    // Close from client
    conn.close();
    expect(closeSpy).toHaveBeenCalled();
    vi.unstubAllGlobals();
  });
 });
--- a/frontend/src/api/client.ts
+++ b/frontend/src/api/client.ts
@ -0,0 +1,545 @@
 /**
 * PromptLooper typed API client.
 *
 * - JWT token stored in memory (never localStorage) for security.
 * - Automatic Authorization header injection.
 * - Typed wrapper functions for every API endpoint group.
 * - WebSocket connection helper for real-time updates.
 */
 // ---------------------------------------------------------------------------
 // Types — mirrors backend Pydantic schemas
 // ---------------------------------------------------------------------------
 export interface ProjectCreate {
  name: string;
  description?: string | null;
 }
 export interface ProjectUpdate {
  name?: string | null;
  description?: string | null;
 }
 export interface ProjectResponse {
  id: string;
  name: string;
  description: string | null;
  owner_id: string;
  created_at: string;
  updated_at: string;
 }
 export interface ProjectListResponse {
  items: ProjectResponse[];
  total: number;
 }
 export interface ExperimentCreate {
  name: string;
  description?: string | null;
  sample_data?: Record<string, unknown> | null;
  pipeline_stages?: Record<string, unknown> | null;
  scoring_config?: Record<string, unknown> | null;
  parameter_space?: Record<string, unknown> | null;
 }
 export interface ExperimentUpdate {
  name?: string | null;
  description?: string | null;
  sample_data?: Record<string, unknown> | null;
  pipeline_stages?: Record<string, unknown> | null;
  scoring_config?: Record<string, unknown> | null;
  parameter_space?: Record<string, unknown> | null;
  status?: string | null;
 }
 export interface ExperimentResponse {
  id: string;
  project_id: string;
  name: string;
  description: string | null;
  sample_data: Record<string, unknown> | null;
  pipeline_stages: Record<string, unknown> | null;
  scoring_config: Record<string, unknown> | null;
  parameter_space: Record<string, unknown> | null;
  status: string;
  created_at: string;
  updated_at: string;
 }
 export interface ExperimentListResponse {
  items: ExperimentResponse[];
  total: number;
 }
 export interface RunResponse {
  id: string;
  experiment_id: string;
  config_hash: string;
  config: Record<string, unknown>;
  status: string;
  started_at: string | null;
  completed_at: string | null;
  duration_ms: number | null;
  tokens_in: number | null;
  tokens_out: number | null;
  cost_estimate: number | null;
 }
 export interface RunListResponse {
  items: RunResponse[];
  total: number;
 }
 export interface StageResultResponse {
  id: string;
  run_id: string;
  stage_index: number;
  prompt_sent: string;
  response_raw: string;
  model_used: string;
  parameters: Record<string, unknown> | null;
  tokens_in: number | null;
  tokens_out: number | null;
  latency_ms: number | null;
 }
 export interface ScoreResponse {
  id: string;
  run_id: string;
  scorer_name: string;
  value: number;
  scorer_metadata: Record<string, unknown> | null;
  created_at: string;
 }
 export interface RunDetailResponse extends RunResponse {
  stage_results: StageResultResponse[];
  scores: ScoreResponse[];
 }
 export interface ScoreInput {
  scorer_name: string;
  value: number;
  metadata?: Record<string, unknown> | null;
 }
 export interface EndpointCreate {
  name: string;
  url: string;
  api_key?: string | null;
  default_model?: string | null;
 }
 export interface EndpointUpdate {
  name?: string | null;
  url?: string | null;
  api_key?: string | null;
  default_model?: string | null;
 }
 export interface EndpointResponse {
  id: string;
  name: string;
  url: string;
  default_model: string | null;
 }
 export interface EndpointListResponse {
  items: EndpointResponse[];
  total: number;
 }
 export interface WebhookCreate {
  event_type: string;
  url: string;
  headers?: Record<string, string> | null;
  is_active?: boolean;
 }
 export interface WebhookUpdate {
  event_type?: string | null;
  url?: string | null;
  headers?: Record<string, string> | null;
  is_active?: boolean | null;
 }
 export interface WebhookResponse {
  id: string;
  event_type: string;
  url: string;
  headers: Record<string, string> | null;
  is_active: boolean;
 }
 export interface WebhookListResponse {
  items: WebhookResponse[];
  total: number;
 }
 export interface SetupRequest {
  username: string;
  password: string;
 }
 export interface LoginRequest {
  username: string;
  password: string;
 }
 export interface TokenResponse {
  access_token: string;
  token_type: string;
 }
 export interface UserResponse {
  id: string;
  username: string;
  is_admin: boolean;
  created_at: string;
 }
 export interface HealthResponse {
  status: string;
  database: boolean;
  redis: boolean;
 }
 export interface ExportRunRow {
  run_id: string;
  experiment_id: string;
  config_hash: string;
  config: Record<string, unknown>;
  status: string;
  duration_ms: number | null;
  tokens_in: number | null;
  tokens_out: number | null;
  cost_estimate: number | null;
  scores: Record<string, number>;
 }
 export interface ExportResponse {
  experiment_id: string;
  experiment_name: string;
  rows: ExportRunRow[];
 }
 // ---------------------------------------------------------------------------
 // API Error
 // ---------------------------------------------------------------------------
 export class ApiError extends Error {
  constructor(
    public status: number,
    public statusText: string,
    public body: unknown,
  ) {
    super(`API ${status}: ${statusText}`);
    this.name = "ApiError";
  }
 }
 // ---------------------------------------------------------------------------
 // Token management (in-memory only)
 // ---------------------------------------------------------------------------
 let _accessToken: string | null = null;
 export function setToken(token: string | null): void {
  _accessToken = token;
 }
 export function getToken(): string | null {
  return _accessToken;
 }
 export function clearToken(): void {
  _accessToken = null;
 }
 // ---------------------------------------------------------------------------
 // Base fetch wrapper
 // ---------------------------------------------------------------------------
 const BASE_URL = ""; // Uses Vite proxy in dev; same origin in prod
 async function request<T>(
  path: string,
  options: RequestInit = {},
 ): Promise<T> {
  const headers: Record<string, string> = {
    ...(options.headers as Record<string, string> | undefined),
  };
  // Inject auth header
  if (_accessToken) {
    headers["Authorization"] = `Bearer ${_accessToken}`;
  }
  // Default content-type for requests with bodies
  if (options.body && !headers["Content-Type"]) {
    headers["Content-Type"] = "application/json";
  }
  const response = await fetch(`${BASE_URL}${path}`, {
    ...options,
    headers,
  });
  if (!response.ok) {
    let body: unknown;
    try {
      body = await response.json();
    } catch {
      body = await response.text();
    }
    throw new ApiError(response.status, response.statusText, body);
  }
  // 204 No Content
  if (response.status === 204) {
    return undefined as T;
  }
  return response.json() as Promise<T>;
 }
 function get<T>(path: string): Promise<T> {
  return request<T>(path, { method: "GET" });
 }
 function post<T>(path: string, body?: unknown): Promise<T> {
  return request<T>(path, {
    method: "POST",
    body: body != null ? JSON.stringify(body) : undefined,
  });
 }
 function put<T>(path: string, body?: unknown): Promise<T> {
  return request<T>(path, {
    method: "PUT",
    body: body != null ? JSON.stringify(body) : undefined,
  });
 }
 function del<T>(path: string): Promise<T> {
  return request<T>(path, { method: "DELETE" });
 }
 // ---------------------------------------------------------------------------
 // Health
 // ---------------------------------------------------------------------------
 export const health = {
  check: () => get<HealthResponse>("/health"),
 };
 // ---------------------------------------------------------------------------
 // Auth
 // ---------------------------------------------------------------------------
 export const auth = {
  setup: (data: SetupRequest) =>
    post<TokenResponse>("/api/auth/setup", data),
  login: async (data: LoginRequest): Promise<TokenResponse> => {
    const resp = await post<TokenResponse>("/api/auth/login", data);
    setToken(resp.access_token);
    return resp;
  },
  me: () => get<UserResponse>("/api/auth/me"),
  logout: () => {
    clearToken();
  },
 };
 // ---------------------------------------------------------------------------
 // Projects
 // ---------------------------------------------------------------------------
 export const projects = {
  list: () => get<ProjectListResponse>("/api/projects/"),
  create: (data: ProjectCreate) =>
    post<ProjectResponse>("/api/projects/", data),
  get: (id: string) => get<ProjectResponse>(`/api/projects/${id}`),
  update: (id: string, data: ProjectUpdate) =>
    put<ProjectResponse>(`/api/projects/${id}`, data),
  delete: (id: string) => del<void>(`/api/projects/${id}`),
 };
 // ---------------------------------------------------------------------------
 // Experiments
 // ---------------------------------------------------------------------------
 export const experiments = {
  list: () => get<ExperimentListResponse>("/api/experiments/"),
  create: (data: ExperimentCreate) =>
    post<ExperimentResponse>("/api/experiments/", data),
  get: (id: string) => get<ExperimentResponse>(`/api/experiments/${id}`),
  update: (id: string, data: ExperimentUpdate) =>
    put<ExperimentResponse>(`/api/experiments/${id}`, data),
  delete: (id: string) => del<void>(`/api/experiments/${id}`),
  startSweep: (id: string) =>
    post<void>(`/api/experiments/${id}/sweep`),
  pause: (id: string) =>
    post<void>(`/api/experiments/${id}/pause`),
  resume: (id: string) =>
    post<void>(`/api/experiments/${id}/resume`),
  stop: (id: string) =>
    post<void>(`/api/experiments/${id}/stop`),
 };
 // ---------------------------------------------------------------------------
 // Runs
 // ---------------------------------------------------------------------------
 export const runs = {
  list: (experimentId: string) =>
    get<RunListResponse>(`/api/runs/experiments/${experimentId}/runs`),
  get: (runId: string) =>
    get<RunDetailResponse>(`/api/runs/${runId}`),
  create: (data: Record<string, unknown>) =>
    post<RunResponse>("/api/runs/", data),
  score: (runId: string, data: ScoreInput) =>
    post<ScoreResponse>(`/api/runs/${runId}/score`, data),
  leaderboard: (experimentId: string) =>
    get<RunListResponse>(
      `/api/runs/experiments/${experimentId}/leaderboard`,
    ),
 };
 // ---------------------------------------------------------------------------
 // Endpoints (LLM targets)
 // ---------------------------------------------------------------------------
 export const endpoints = {
  list: () => get<EndpointListResponse>("/api/endpoints/"),
  create: (data: EndpointCreate) =>
    post<EndpointResponse>("/api/endpoints/", data),
  update: (id: string, data: EndpointUpdate) =>
    put<EndpointResponse>(`/api/endpoints/${id}`, data),
  delete: (id: string) => del<void>(`/api/endpoints/${id}`),
  test: (id: string) =>
    post<Record<string, unknown>>(`/api/endpoints/${id}/test`),
 };
 // ---------------------------------------------------------------------------
 // Export
 // ---------------------------------------------------------------------------
 export const exportApi = {
  best: (experimentId: string) =>
    get<Record<string, unknown>>(
      `/api/export/experiments/${experimentId}/best`,
    ),
  env: (experimentId: string) =>
    get<string>(`/api/export/experiments/${experimentId}/env`),
  yaml: (experimentId: string) =>
    get<string>(`/api/export/experiments/${experimentId}/yaml`),
  report: (experimentId: string) =>
    get<string>(`/api/export/experiments/${experimentId}/report`),
 };
 // ---------------------------------------------------------------------------
 // Webhooks
 // ---------------------------------------------------------------------------
 export const webhooks = {
  list: () => get<WebhookListResponse>("/api/webhooks/"),
  create: (data: WebhookCreate) =>
    post<WebhookResponse>("/api/webhooks/", data),
  delete: (id: string) => del<void>(`/api/webhooks/${id}`),
 };
 // ---------------------------------------------------------------------------
 // Admin
 // ---------------------------------------------------------------------------
 export const admin = {
  getSettings: () =>
    get<Record<string, unknown>>("/api/admin/settings"),
  updateSettings: (data: Record<string, unknown>) =>
    put<Record<string, unknown>>("/api/admin/settings", data),
  getStats: () => get<Record<string, unknown>>("/api/admin/stats"),
 };
 // ---------------------------------------------------------------------------
 // WebSocket helper
 // ---------------------------------------------------------------------------
 export type WsMessageHandler = (data: unknown) => void;
 export interface WsConnection {
  send: (data: unknown) => void;
  close: () => void;
 }
 /**
 * Connect to the real-time WebSocket endpoint.
 *
 * @param onMessage  Called for each incoming message.
 * @param onClose    Optional callback when connection closes.
 * @returns Object with `send()` and `close()` methods.
 */
 export function connectWebSocket(
  onMessage: WsMessageHandler,
  onClose?: () => void,
 ): WsConnection {
  const protocol = window.location.protocol === "https:" ? "wss:" : "ws:";
  const wsUrl = `${protocol}//${window.location.host}/ws`;
  const ws = new WebSocket(wsUrl);
  ws.onmessage = (event) => {
    try {
      const data: unknown = JSON.parse(event.data as string);
      onMessage(data);
    } catch {
      onMessage(event.data);
    }
  };
  ws.onclose = () => {
    onClose?.();
  };
  return {
    send: (data: unknown) => {
      if (ws.readyState === WebSocket.OPEN) {
        ws.send(JSON.stringify(data));
      }
    },
    close: () => {
      ws.close();
    },
  };
 }
--- a/frontend/src/components/.gitkeep
+++ b/frontend/src/components/.gitkeep
--- a/frontend/src/index.css
+++ b/frontend/src/index.css
@ -0,0 +1,3 @@
@tailwind base;
@tailwind components;
@tailwind utilities;
--- a/frontend/src/main.tsx
+++ b/frontend/src/main.tsx
@ -0,0 +1,13 @@
 import React from "react";
 import ReactDOM from "react-dom/client";
 import { BrowserRouter } from "react-router-dom";
 import App from "./App";
 import "./index.css";
 ReactDOM.createRoot(document.getElementById("root")!).render(
  <React.StrictMode>
    <BrowserRouter>
      <App />
    </BrowserRouter>
  </React.StrictMode>,
 );
--- a/frontend/src/pages/AdminPage.tsx
+++ b/frontend/src/pages/AdminPage.tsx
@ -0,0 +1,8 @@
 export default function AdminPage() {
  return (
    <div className="p-8">
      <h1 className="mb-4 text-2xl font-bold">Admin</h1>
      <p className="text-gray-600">System administration and user management.</p>
    </div>
  );
 }
--- a/frontend/src/pages/ComparePage.tsx
+++ b/frontend/src/pages/ComparePage.tsx
@ -0,0 +1,8 @@
 export default function ComparePage() {
  return (
    <div className="p-8">
      <h1 className="mb-4 text-2xl font-bold">Compare</h1>
      <p className="text-gray-600">Compare results across runs and experiments.</p>
    </div>
  );
 }
--- a/frontend/src/pages/DashboardPage.tsx
+++ b/frontend/src/pages/DashboardPage.tsx
@ -0,0 +1,8 @@
 export default function DashboardPage() {
  return (
    <div className="p-8">
      <h1 className="mb-4 text-2xl font-bold">Dashboard</h1>
      <p className="text-gray-600">Overview of recent experiments and runs.</p>
    </div>
  );
 }
--- a/frontend/src/pages/ExperimentPage.tsx
+++ b/frontend/src/pages/ExperimentPage.tsx
@ -0,0 +1,8 @@
 export default function ExperimentPage() {
  return (
    <div className="p-8">
      <h1 className="mb-4 text-2xl font-bold">Experiment</h1>
      <p className="text-gray-600">Configure and run prompt experiments.</p>
    </div>
  );
 }
--- a/frontend/src/pages/LivePage.tsx
+++ b/frontend/src/pages/LivePage.tsx
@ -0,0 +1,8 @@
 export default function LivePage() {
  return (
    <div className="p-8">
      <h1 className="mb-4 text-2xl font-bold">Live</h1>
      <p className="text-gray-600">Real-time experiment progress and results.</p>
    </div>
  );
 }
--- a/frontend/src/pages/LoginPage.tsx
+++ b/frontend/src/pages/LoginPage.tsx
@ -0,0 +1,10 @@
 export default function LoginPage() {
  return (
    <div className="flex min-h-screen items-center justify-center bg-gray-50">
      <div className="w-full max-w-md rounded-lg bg-white p-8 shadow">
        <h1 className="mb-4 text-2xl font-bold">Sign In</h1>
        <p className="text-gray-600">Log in to PromptLooper.</p>
      </div>
    </div>
  );
 }
--- a/frontend/src/pages/ProjectsPage.tsx
+++ b/frontend/src/pages/ProjectsPage.tsx
@ -0,0 +1,8 @@
 export default function ProjectsPage() {
  return (
    <div className="p-8">
      <h1 className="mb-4 text-2xl font-bold">Projects</h1>
      <p className="text-gray-600">Manage your prompt tuning projects.</p>
    </div>
  );
 }
--- a/frontend/src/pages/SetupPage.tsx
+++ b/frontend/src/pages/SetupPage.tsx
@ -0,0 +1,10 @@
 export default function SetupPage() {
  return (
    <div className="flex min-h-screen items-center justify-center bg-gray-50">
      <div className="w-full max-w-md rounded-lg bg-white p-8 shadow">
        <h1 className="mb-4 text-2xl font-bold">PromptLooper Setup</h1>
        <p className="text-gray-600">Create your admin account to get started.</p>
      </div>
    </div>
  );
 }
--- a/frontend/src/test-setup.ts
+++ b/frontend/src/test-setup.ts
@ -0,0 +1 @@
 import "@testing-library/jest-dom/vitest";
--- a/frontend/src/vite-env.d.ts
+++ b/frontend/src/vite-env.d.ts
@ -0,0 +1 @@
 /// <reference types="vite/client" />
--- a/frontend/tailwind.config.js
+++ b/frontend/tailwind.config.js
@ -0,0 +1,8 @@
 /** @type {import('tailwindcss').Config} */
 export default {
  content: ["./index.html", "./src/**/*.{js,ts,jsx,tsx}"],
  theme: {
    extend: {},
  },
  plugins: [],
 };
--- a/frontend/tsconfig.json
+++ b/frontend/tsconfig.json
@ -0,0 +1,21 @@
 {
  "compilerOptions": {
    "target": "ES2020",
    "useDefineForClassFields": true,
    "lib": ["ES2020", "DOM", "DOM.Iterable"],
    "module": "ESNext",
    "skipLibCheck": true,
    "moduleResolution": "bundler",
    "allowImportingTsExtensions": true,
    "isolatedModules": true,
    "moduleDetection": "force",
    "noEmit": true,
    "jsx": "react-jsx",
    "strict": true,
    "noUnusedLocals": true,
    "noUnusedParameters": true,
    "noFallthroughCasesInSwitch": true,
    "forceConsistentCasingInFileNames": true
  },
  "include": ["src"]
 }
--- a/frontend/vite.config.ts
+++ b/frontend/vite.config.ts
@ -0,0 +1,25 @@
 import { defineConfig } from "vite";
 import react from "@vitejs/plugin-react";
 export default defineConfig({
  plugins: [react()],
  build: {
    outDir: "dist",
  },
  server: {
    port: 5173,
    proxy: {
      "/api": "http://localhost:8000",
      "/ws": {
        target: "ws://localhost:8000",
        ws: true,
      },
      "/health": "http://localhost:8000",
    },
  },
  test: {
    environment: "jsdom",
    globals: true,
    setupFiles: ["./src/test-setup.ts"],
  },
 });
--- a/promptlooper-spec.md
+++ b/promptlooper-spec.md
@ -0,0 +1,635 @@
 # PromptLooper
 > The one who loops prompts — a universal LLM pipeline tuning workbench.
 PromptLooper is a self-hosted tool for systematically optimizing LLM prompts, model selection, and inference parameters. It runs experiments across prompt × model × parameter combinations, caches every response, scores results against pluggable evaluation functions, and surfaces the best configurations through a real-time observability dashboard with human-in-the-loop steering.
 It ships as a single Docker container (SQLite mode) for zero-config quickstart, or a Docker Compose stack (Postgres + Redis) for production use. An MCP server enables any AI agent to drive PromptLooper programmatically — creating experiments, running sweeps, and reading results without human intervention.
 ---
 ## Problem Statement
 Anyone building LLM-powered applications faces the same painful loop:
 1. Write a system prompt
 2. Pick a model and parameters (temperature, top_p, max_tokens, etc.)
 3. Run it against sample data
 4. Read the output and decide if it's "good enough"
 5. Tweak something and repeat
 This process is manual, unscientific, and wasteful. There's no way to:
 - Systematically compare configurations side-by-side
 - Know if you've already tested a particular combination
 - Quantify "better" beyond gut feeling
 - Let an agent handle the iteration while you steer from above
 - Share optimized configurations between projects or team members
 PromptLooper makes this process systematic, observable, cached, and agent-drivable.
 ---
 ## Target Users
 | User | Use Case |
 |------|----------|
 | **Solo developer** | Tuning prompts for a side project, wants to try 5 models and find the sweet spot |
 | **Team building RAG pipelines** | Optimizing chunking + embedding + retrieval + synthesis prompts across stages |
 | **AI agent (via MCP)** | Autonomously running optimization sweeps, reporting back to human when done |
 | **Prompt engineer** | A/B testing prompt variants at scale with quantified scoring |
 | **Infrastructure team** | Benchmarking new models against existing baselines before migration |
 ---
 ## Core Concepts
 ### Experiment
 A named configuration that defines:
 - **Sample data**: Input documents, queries, or any text the pipeline will process
 - **Pipeline stages**: 1-N sequential stages, each with its own prompt template and model config
 - **Evaluation criteria**: Scoring functions that grade the output
 - **Parameter space**: What to vary (prompt text, model, temperature, top_p, chunk_size, etc.)
 ### Run
 A single execution of one specific configuration within an experiment. A run captures:
 - Full input configuration (prompt, model, all parameters)
 - Raw LLM response(s)
 - Timing data (latency, tokens in/out)
 - Evaluation scores
 - Configuration hash (for cache deduplication)
 ### Sweep
 A batch of runs that systematically explores a parameter space. Types:
 - **Grid sweep**: Every combination of specified parameter values
 - **Random sweep**: Random sampling from parameter ranges
 - **Guided sweep**: Agent-driven, where results from previous runs inform the next configuration to try
 ### Scoring Function
 A pluggable evaluation that takes (input, output, context) and returns a numeric score. Built-in options:
 - **Embedding similarity**: How semantically close is the output to a reference answer?
 - **Length compliance**: Does the output meet length constraints?
 - **Format compliance**: Does the output match expected structure (JSON, markdown, etc.)?
 - **Keyword presence**: Do required terms appear in the output?
 - **Human rating**: Manual thumbs-up/down or 1-5 star rating from the dashboard
 - **LLM-as-judge**: Use a separate LLM call to evaluate quality (configurable judge prompt)
 - **Custom function**: User-provided Python snippet or HTTP webhook
 ### Project
 A workspace that groups related experiments. Users can return to a project and pick up where they left off. Projects store:
 - All experiments and their runs
 - Saved "best" configurations
 - Notes and annotations
 - Export history
 ---
 ## Architecture
 ```
 ┌──────────────────────────────────────────────────────────────────────────┐
 │  Docker Compose: xpltd_promptlooper (ub01)                               │
 │  Network: promptlooper (172.33.0.0/24)                                   │
 │                                                                          │
 │  ┌────────────┐  ┌─────────────┐  ┌──────────────────────────────────┐  │
 │  │  PostgreSQL │  │    Redis    │  │         FastAPI (API)            │  │
 │  │  :5434      │  │  job queue  │  │  Experiments, Runs, Scoring,     │  │
 │  │  experiments│  │  pub/sub    │  │  Projects, Auth, MCP Server      │  │
 │  │  runs, cache│  │  live state │  │  WebSocket for live dashboard    │  │
 │  └─────┬───────┘  └──────┬──────┘  └──────────────┬───────────────────┘  │
 │        │                 │                        │                      │
 │  ┌─────┴─────────────────┴────────────────────────┴───────────────────┐  │
 │  │                      Celery Worker                                 │  │
 │  │  Executes runs against target LLM endpoints                        │  │
 │  │  Caches responses by config hash                                   │  │
 │  │  Streams progress via Redis pub/sub                                │  │
 │  └────────────────────────────────────────────────────────────────────┘  │
 │                                                                          │
 │  ┌────────────────────────────────────────────────────────────────────┐  │
 │  │                    Web UI (React + Vite)                           │  │
 │  │  nginx → :8400                                                     │  │
 │  │  Dashboard, Experiment Builder, Live Observability, Steering       │  │
 │  └────────────────────────────────────────────────────────────────────┘  │
 └──────────────────────────────────────────────────────────────────────────┘
                              │
                              │  HTTP (OpenAI-compatible)
                              ▼
              ┌───────────────────────────────┐
              │  Target LLM Endpoints          │
              │  OpenWebUI, vLLM, Ollama,      │
              │  OpenAI, Anthropic, any        │
              │  OpenAI-compatible API          │
              └───────────────────────────────┘
 ```
 ### Services (Production Compose)
 | Service | Image | Port | Purpose |
 |---------|-------|------|---------|
 | `promptlooper-db` | `postgres:16-alpine` | `5434 → 5432` | Primary data store |
 | `promptlooper-redis` | `redis:7-alpine` | — | Celery broker + pub/sub for live dashboard |
 | `promptlooper-api` | `Dockerfile` | `8000` | FastAPI REST API + MCP server |
 | `promptlooper-worker` | `Dockerfile` | — | Celery worker (run execution) |
 | `promptlooper-web` | `Dockerfile` | `8400 → 80` | React frontend (nginx) |
 ### Single Container Mode
 When `DATABASE_URL` is not set, PromptLooper runs with:
 - SQLite at `/data/promptlooper.db`
 - In-process task queue (no Celery/Redis dependency)
 - All services in one container on port 8400
 ```bash
 docker run -p 8400:8400 -v promptlooper-data:/data ghcr.io/xpltdco/promptlooper
 ```
 ---
 ## Data Model
 ### User
 | Field | Type | Notes |
 |-------|------|-------|
 | id | UUID | PK |
 | username | string | Unique, "admin" created on first boot |
 | password_hash | string | bcrypt |
 | is_admin | bool | Default true for first user |
 | created_at | timestamp | |
 ### Project
 | Field | Type | Notes |
 |-------|------|-------|
 | id | UUID | PK |
 | name | string | |
 | description | text | Optional |
 | owner_id | UUID | FK → User |
 | created_at | timestamp | |
 | updated_at | timestamp | |
 ### Experiment
 | Field | Type | Notes |
 |-------|------|-------|
 | id | UUID | PK |
 | project_id | UUID | FK → Project |
 | name | string | |
 | description | text | Optional |
 | sample_data | JSONB | Input documents/queries |
 | pipeline_stages | JSONB | Stage definitions with prompt templates |
 | scoring_config | JSONB | Which scoring functions to use and their weights |
 | parameter_space | JSONB | What to vary and ranges/options |
 | status | enum | draft, running, paused, completed |
 | created_at | timestamp | |
 | updated_at | timestamp | |
 ### Run
 | Field | Type | Notes |
 |-------|------|-------|
 | id | UUID | PK |
 | experiment_id | UUID | FK → Experiment |
 | config_hash | string(64) | SHA-256 of full configuration (for cache dedup) |
 | config | JSONB | Complete configuration snapshot |
 | status | enum | pending, running, completed, failed, cached |
 | started_at | timestamp | |
 | completed_at | timestamp | |
 | duration_ms | int | Wall clock time |
 | tokens_in | int | Total input tokens across all stages |
 | tokens_out | int | Total output tokens |
 | cost_estimate | decimal | Estimated cost based on model pricing |
 ### StageResult
 | Field | Type | Notes |
 |-------|------|-------|
 | id | UUID | PK |
 | run_id | UUID | FK → Run |
 | stage_index | int | 0-based stage number |
 | prompt_sent | text | Actual prompt after template rendering |
 | response_raw | text | Raw LLM response |
 | model_used | string | Model identifier |
 | parameters | JSONB | Temperature, top_p, etc. |
 | tokens_in | int | This stage |
 | tokens_out | int | This stage |
 | latency_ms | int | This stage |
 ### Score
 | Field | Type | Notes |
 |-------|------|-------|
 | id | UUID | PK |
 | run_id | UUID | FK → Run |
 | scorer_name | string | e.g. "embedding_similarity", "human_rating" |
 | value | float | Normalized 0.0–1.0 |
 | metadata | JSONB | Scorer-specific details |
 | created_at | timestamp | |
 ### ResponseCache
 | Field | Type | Notes |
 |-------|------|-------|
 | config_hash | string(64) | PK — SHA-256 of (prompt + model + params + input) |
 | response | text | Cached LLM response |
 | model | string | |
 | tokens_in | int | |
 | tokens_out | int | |
 | latency_ms | int | Original latency |
 | created_at | timestamp | |
 ### WebhookConfig
 | Field | Type | Notes |
 |-------|------|-------|
 | id | UUID | PK |
 | event_type | string | experiment.complete, new_best_found, budget.exhausted, human_needed |
 | url | string | Target URL |
 | headers | JSONB | Optional auth headers |
 | is_active | bool | |
 ---
 ## API Endpoints
 ### Auth
 | Method | Path | Description |
 |--------|------|-------------|
 | POST | `/api/v1/auth/setup` | First-boot admin password setup |
 | POST | `/api/v1/auth/login` | Login, returns JWT |
 | GET | `/api/v1/auth/me` | Current user info |
 ### Admin
 | Method | Path | Description |
 |--------|------|-------------|
 | GET | `/api/v1/admin/settings` | System settings (guest access, default model, etc.) |
 | PUT | `/api/v1/admin/settings` | Update settings |
 | GET | `/api/v1/admin/stats` | System-wide stats (total runs, cache hit rate, etc.) |
 ### Projects
 | Method | Path | Description |
 |--------|------|-------------|
 | GET | `/api/v1/projects` | List projects |
 | POST | `/api/v1/projects` | Create project |
 | GET | `/api/v1/projects/{id}` | Project detail with experiment summaries |
 | PUT | `/api/v1/projects/{id}` | Update project |
 | DELETE | `/api/v1/projects/{id}` | Delete project and all experiments |
 ### Experiments
 | Method | Path | Description |
 |--------|------|-------------|
 | GET | `/api/v1/experiments` | List experiments (filter by project) |
 | POST | `/api/v1/experiments` | Create experiment |
 | GET | `/api/v1/experiments/{id}` | Experiment detail with run summaries |
 | PUT | `/api/v1/experiments/{id}` | Update experiment config |
 | DELETE | `/api/v1/experiments/{id}` | Delete experiment |
 | POST | `/api/v1/experiments/{id}/sweep` | Start a sweep (grid, random, or guided) |
 | POST | `/api/v1/experiments/{id}/pause` | Pause running sweep |
 | POST | `/api/v1/experiments/{id}/resume` | Resume paused sweep |
 | POST | `/api/v1/experiments/{id}/stop` | Stop sweep |
 ### Runs
 | Method | Path | Description |
 |--------|------|-------------|
 | GET | `/api/v1/experiments/{id}/runs` | List runs with scores (sortable, filterable) |
 | GET | `/api/v1/runs/{id}` | Run detail with stage results |
 | POST | `/api/v1/runs` | Execute a single run (ad-hoc) |
 | POST | `/api/v1/runs/{id}/score` | Add human rating to a run |
 | GET | `/api/v1/experiments/{id}/leaderboard` | Top runs ranked by weighted score |
 ### Export
 | Method | Path | Description |
 |--------|------|-------------|
 | GET | `/api/v1/experiments/{id}/export/best` | Best config as JSON |
 | GET | `/api/v1/experiments/{id}/export/env` | Best config as .env snippet |
 | GET | `/api/v1/experiments/{id}/export/yaml` | Best config as YAML |
 | GET | `/api/v1/experiments/{id}/export/report` | Full experiment report (markdown) |
 ### LLM Endpoints (Target Management)
 | Method | Path | Description |
 |--------|------|-------------|
 | GET | `/api/v1/endpoints` | List configured LLM endpoints |
 | POST | `/api/v1/endpoints` | Add endpoint (URL, API key, label) |
 | PUT | `/api/v1/endpoints/{id}` | Update endpoint |
 | DELETE | `/api/v1/endpoints/{id}` | Remove endpoint |
 | POST | `/api/v1/endpoints/{id}/test` | Test connectivity and list available models |
 ### Webhooks
 | Method | Path | Description |
 |--------|------|-------------|
 | GET | `/api/v1/webhooks` | List webhook configs |
 | POST | `/api/v1/webhooks` | Create webhook |
 | DELETE | `/api/v1/webhooks/{id}` | Remove webhook |
 ### WebSocket
 | Path | Description |
 |------|-------------|
 | `/ws/experiments/{id}` | Live stream: run progress, scores, stage completions |
 | `/ws/dashboard` | Global activity feed across all experiments |
 ### Health
 | Method | Path | Description |
 |--------|------|-------------|
 | GET | `/health` | Health check (DB + Redis connectivity) |
 ---
 ## MCP Server
 PromptLooper exposes an MCP (Model Context Protocol) server so AI agents can drive it programmatically. The MCP server runs as part of the API service.
 ### MCP Tools
 | Tool | Description |
 |------|-------------|
 | `create_project` | Create a new project workspace |
 | `create_experiment` | Define an experiment with sample data, stages, and scoring |
 | `configure_endpoint` | Add or update an LLM target endpoint |
 | `run_single` | Execute one specific configuration and return results |
 | `run_sweep` | Start a parameter sweep (grid/random/guided) |
 | `get_leaderboard` | Get top N configurations ranked by score |
 | `get_run_detail` | Get full details of a specific run |
 | `export_best_config` | Export the best configuration in JSON/YAML/env format |
 | `pause_sweep` | Pause a running sweep |
 | `resume_sweep` | Resume a paused sweep |
 | `add_human_score` | Rate a run's output |
 | `get_experiment_status` | Check experiment progress |
 | `list_models` | List available models across all configured endpoints |
 ### Example Agent Interaction
 ```
 Agent: "Create a project called 'Chrysopedia Extraction' and an experiment
        that tests the stage3_extraction prompt against Qwen-72B and Qwen-32B,
        sweeping temperature from 0.1 to 0.9 in 0.2 increments.
        Use embedding similarity scoring against these reference outputs.
        Run a grid sweep."
 PromptLooper MCP: [create_project] → [create_experiment] → [run_sweep]
                  → streams progress → [get_leaderboard]
 Agent: "The top config uses Qwen-72B at temperature 0.3. Export it as
        a .env snippet I can drop into Chrysopedia."
 PromptLooper MCP: [export_best_config format=env]
 ```
 ---
 ## Response Caching
 Every LLM call is cached by a SHA-256 hash of:
 - Prompt text (after template rendering)
 - Model identifier
 - All inference parameters (temperature, top_p, max_tokens, etc.)
 - Input data
 If an identical configuration has been run before, the cached response is returned instantly with `status: cached`. This means:
 - Re-running experiments with new scoring functions costs zero tokens
 - Adding a new scorer retroactively evaluates all historical runs
 - Accidentally re-running a sweep wastes nothing
 - Cache can be invalidated per-run or per-experiment if needed
 ---
 ## Authentication Model
 ### First Boot
 - App detects no users exist
 - Presents a setup screen: create admin username + password
 - Admin account is created, user is logged in
 ### Guest Access
 - Admin can toggle `allow_guest_access` in settings
 - Guests can view experiments and results (read-only)
 - Guests cannot create experiments, run sweeps, or modify configs
 - Default: guest access disabled
 ### API Authentication
 - JWT tokens for the web UI
 - API key (generated in admin settings) for programmatic access and MCP
 - API key passed via `Authorization: Bearer <key>` header
 ---
 ## Real-Time Observability Dashboard
 The dashboard is the primary user interface during active experimentation. It provides:
 ### Live Experiment View
 - Progress bar: X of Y runs completed
 - Token usage accumulator (running total)
 - Cost estimate (based on configured model pricing)
 - Cache hit rate for current sweep
 - Estimated time remaining
 ### Side-by-Side Output Comparison
 - Pick any two runs and diff their outputs
 - Highlight differences in prompt, parameters, and response
 - Score comparison overlay
 ### Leaderboard
 - Real-time ranked list of runs by weighted score
 - Sortable by any individual scorer
 - Click to expand full run detail
 ### Steering Controls
 - **Pause**: Stop the sweep after current run completes
 - **Fork**: Create a new experiment branching from current best, with modified parameters
 - **Redirect**: Change remaining sweep parameters mid-flight
 - **Approve**: Mark a configuration as "good enough" and export
 - **Reject**: Exclude a run from leaderboard consideration
 ### Activity Timeline
 - Chronological feed of events: run started, run completed, new best found, cache hit, error
 - Filterable by event type
 ---
 ## Webhook Events
 | Event | Payload | Trigger |
 |-------|---------|---------|
 | `experiment.started` | experiment_id, sweep config | Sweep begins |
 | `experiment.completed` | experiment_id, best config, summary stats | All runs finished |
 | `experiment.paused` | experiment_id, reason | Manual or budget pause |
 | `new_best_found` | experiment_id, run_id, scores, config | New top-scoring run |
 | `budget.exhausted` | experiment_id, token_count, cost | Token/cost budget hit |
 | `human_needed` | experiment_id, reason, context | Agent requests human review |
 | `run.failed` | run_id, error | Individual run error |
 ---
 ## Configuration Export Formats
 ### JSON
 ```json
 {
  "model": "qwen2.5-72b-instruct",
  "endpoint": "http://chat.forgetyour.name/api",
  "temperature": 0.3,
  "top_p": 0.85,
  "max_tokens": 2048,
  "system_prompt": "You are a music production knowledge extractor...",
  "score": 0.87,
  "experiment": "chrysopedia-extraction-v2",
  "exported_at": "2026-04-06T12:00:00Z"
 }
 ```
 ### .env
 ```bash
 LLM_MODEL=qwen2.5-72b-instruct
 LLM_API_URL=http://chat.forgetyour.name/api
 LLM_TEMPERATURE=0.3
 LLM_TOP_P=0.85
 LLM_MAX_TOKENS=2048
 # Score: 0.87 | Experiment: chrysopedia-extraction-v2
 ```
 ### YAML
 ```yaml
 model: qwen2.5-72b-instruct
 endpoint: http://chat.forgetyour.name/api
 parameters:
  temperature: 0.3
  top_p: 0.85
  max_tokens: 2048
 system_prompt: |
  You are a music production knowledge extractor...
 metadata:
  score: 0.87
  experiment: chrysopedia-extraction-v2
  exported_at: 2026-04-06T12:00:00Z
 ```
 ---
 ## Environment Variables
 | Group | Variable | Default | Notes |
 |-------|----------|---------|-------|
 | **Database** | `DATABASE_URL` | (none → SQLite) | PostgreSQL connection string |
 | **Redis** | `REDIS_URL` | (none → in-process) | Redis connection string |
 | **Server** | `HOST` | `0.0.0.0` | Bind address |
 | **Server** | `PORT` | `8400` | HTTP port |
 | **Auth** | `JWT_SECRET` | (auto-generated) | JWT signing key |
 | **Auth** | `API_KEY` | (none) | Static API key for programmatic access |
 | **Defaults** | `DEFAULT_ENDPOINT_URL` | (none) | Pre-configured LLM endpoint |
 | **Defaults** | `DEFAULT_ENDPOINT_KEY` | (none) | API key for default endpoint |
 | **Limits** | `MAX_CONCURRENT_RUNS` | `4` | Parallel run limit |
 | **Limits** | `MAX_TOKENS_PER_SWEEP` | `0` (unlimited) | Token budget per sweep |
 | **Storage** | `DATA_DIR` | `/data` | SQLite DB + file storage location |
 | **MCP** | `MCP_ENABLED` | `true` | Enable MCP server |
 | **MCP** | `MCP_PORT` | `8401` | MCP server port |
 ---
 ## Docker Compose (Production — XPLTD Conventions)
 Project name: `xpltd_promptlooper`
 Network: `promptlooper` (`172.33.0.0/24`)
 Persistent data: `/vmPool/r/services/promptlooper_*`
 PostgreSQL port: `5434` (external)
 Web UI port: `8400` (external)
 ---
 ## Technology Stack
 | Layer | Technology | Rationale |
 |-------|-----------|-----------|
 | **API** | Python 3.12 + FastAPI | Async, OpenAPI auto-gen, matches XPLTD conventions |
 | **Task Queue** | Celery + Redis | Proven for background job execution, matches Chrysopedia |
 | **Database** | PostgreSQL 16 (prod) / SQLite (single-container) | JSONB for flexible experiment configs |
 | **Real-time** | WebSocket via FastAPI + Redis pub/sub | Sub-second dashboard updates |
 | **Frontend** | React 18 + TypeScript + Vite | Real-time dashboard, matches Chrysopedia |
 | **Styling** | Tailwind CSS | Fast iteration, utility-first |
 | **MCP** | Python MCP SDK | Standard protocol for agent integration |
 | **Container** | Multi-stage Docker build | Single image serves both API and frontend |
 ---
 ## Development & Deployment
 ### Local Development
 ```bash
 git clone git@git.xpltd.co:xpltdco/promptlooper.git
 cd promptlooper
 cp .env.example .env
 docker compose up -d promptlooper-db promptlooper-redis
 cd backend && pip install -r requirements.txt
 alembic upgrade head
 uvicorn main:app --reload --host 0.0.0.0 --port 8000
 # In another terminal:
 cd frontend && npm install && npm run dev
 ```
 ### Production Deployment (ub01)
 ```bash
 ssh ub01
 cd /vmPool/r/repos/xpltdco/promptlooper
 git pull && docker compose build && docker compose up -d
 ```
 ### Project Structure
 ```
 promptlooper/
 ├── backend/
 │   ├── main.py                 # FastAPI entry point
 │   ├── config.py               # Pydantic Settings
 │   ├── models.py               # SQLAlchemy ORM
 │   ├── schemas.py              # Pydantic request/response
 │   ├── auth.py                 # JWT + API key auth
 │   ├── worker.py               # Celery app config
 │   ├── routers/
 │   │   ├── auth.py
 │   │   ├── projects.py
 │   │   ├── experiments.py
 │   │   ├── runs.py
 │   │   ├── endpoints.py
 │   │   ├── export.py
 │   │   ├── webhooks.py
 │   │   └── admin.py
 │   ├── engine/
 │   │   ├── runner.py           # Run execution logic
 │   │   ├── sweep.py            # Sweep orchestration
 │   │   ├── cache.py            # Response cache layer
 │   │   ├── adapters/           # LLM endpoint adapters
 │   │   │   ├── openai_compat.py
 │   │   │   └── base.py
 │   │   └── scorers/            # Pluggable scoring functions
 │   │       ├── embedding.py
 │   │       ├── format.py
 │   │       ├── keyword.py
 │   │       ├── llm_judge.py
 │   │       └── base.py
 │   ├── mcp/
 │   │   ├── server.py           # MCP server implementation
 │   │   └── tools.py            # MCP tool definitions
 │   ├── websocket/
 │   │   └── manager.py          # WebSocket connection management
 │   └── tests/
 ├── frontend/
 │   └── src/
 │       ├── pages/
 │       │   ├── Setup.tsx       # First-boot admin setup
 │       │   ├── Login.tsx
 │       │   ├── Dashboard.tsx   # Global activity
 │       │   ├── Projects.tsx
 │       │   ├── Experiment.tsx  # Experiment builder + config
 │       │   ├── Live.tsx        # Real-time observability
 │       │   ├── Compare.tsx     # Side-by-side run comparison
 │       │   └── Admin.tsx       # System settings
 │       ├── components/
 │       │   ├── Leaderboard.tsx
 │       │   ├── SteeringControls.tsx
 │       │   ├── RunCard.tsx
 │       │   ├── ScoreChart.tsx
 │       │   └── Timeline.tsx
 │       └── api/
 ├── docker/
 │   ├── Dockerfile              # Multi-stage: API + frontend
 │   └── nginx.conf
 ├── alembic/
 ├── docker-compose.yml
 ├── .env.example
 ├── CLAUDE.md
 └── README.md
 ```