Compare commits

...

No commits in common. "8208fe6f9f6ea2aa85887a83f36ced665df8b240" and "7dad9d97afbf1a4dd5d14ef73bb7bd3856468431" have entirely different histories.

73 changed files with 9900 additions and 2 deletions

68
.env.example Normal file
View file

@ -0,0 +1,68 @@
# PromptLooper — Environment Variables
# Copy to .env and adjust values for your deployment.
# =============================================================================
# Database
# =============================================================================
# PostgreSQL connection string for production mode.
# When not set, PromptLooper uses SQLite at DATA_DIR/promptlooper.db (single-container mode).
# DATABASE_URL=postgresql://promptlooper:promptlooper@promptlooper-db:5432/promptlooper
# =============================================================================
# Redis
# =============================================================================
# Redis connection string for Celery task queue and pub/sub (live dashboard).
# When not set, PromptLooper uses an in-process queue (single-container mode).
# REDIS_URL=redis://promptlooper-redis:6379/0
# =============================================================================
# Server
# =============================================================================
# Bind address and port for the HTTP server.
HOST=0.0.0.0
PORT=8400
# =============================================================================
# Authentication
# =============================================================================
# Secret key used to sign JWT tokens. Auto-generated on first boot if not set.
# IMPORTANT: Set this to a long random string in production.
# JWT_SECRET=change-me-to-a-random-secret
# Static API key for programmatic access (MCP, scripts, CI).
# When not set, API key auth is disabled — only JWT login works.
# API_KEY=
# =============================================================================
# Default LLM Endpoint
# =============================================================================
# Pre-configured LLM endpoint URL (OpenAI-compatible API).
# Users can add more endpoints via the UI or API; this is a convenience default.
# DEFAULT_ENDPOINT_URL=http://localhost:11434/v1
# API key for the default endpoint, if required.
# DEFAULT_ENDPOINT_KEY=
# =============================================================================
# Limits
# =============================================================================
# Maximum number of runs executing in parallel.
MAX_CONCURRENT_RUNS=4
# Token budget per sweep. 0 = unlimited.
MAX_TOKENS_PER_SWEEP=0
# =============================================================================
# Storage
# =============================================================================
# Directory for SQLite database and file storage (single-container mode).
DATA_DIR=/data
# =============================================================================
# MCP Server
# =============================================================================
# Enable the Model Context Protocol server for agent-driven workflows.
MCP_ENABLED=true
# Port for the MCP server (separate from the main API).
MCP_PORT=8401

57
.gitignore vendored Normal file
View file

@ -0,0 +1,57 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.egg-info/
*.egg
dist/
build/
.eggs/
*.whl
.venv/
venv/
env/
.env
*.pyc
.pytest_cache/
.mypy_cache/
.ruff_cache/
htmlcov/
.coverage
.coverage.*
# Node / Frontend
node_modules/
frontend/dist/
frontend/build/
.npm
*.tsbuildinfo
# Docker
docker/nginx.conf.bak
# IDE
.vscode/
.idea/
*.swp
*.swo
*~
.DS_Store
# OS
Thumbs.db
Desktop.ini
# Data (single-container mode)
*.db
/data/
# Alembic
alembic/versions/__pycache__/
# Auto Run Docs (Maestro working files)
Auto Run Docs/Working/
# Misc
*.log
*.bak

View file

@ -0,0 +1,48 @@
# Phase 1 — Project Scaffold
Set up the PromptLooper repository, Docker infrastructure, and basic project skeleton. Read `promptlooper-spec.md` and `CLAUDE.md` before starting any task.
- [x] Initialize the git repository at git.xpltd.co/xpltdco/promptlooper with a README.md that includes the project description from the spec, a quick-start section showing the single-container docker run command, and badges for license (AGPL-3.0) and status. Add .gitignore for Python, Node, and Docker artifacts.
> NOTE: Git repo initialized locally with remote set to git@git.xpltd.co:xpltdco/promptlooper.git. Push failed — SSH key not configured for this host or repo not yet created on Gitea. Needs manual setup before pushing.
- [x] Create the full directory structure as defined in the spec's Project Structure section. Every directory should exist with a placeholder __init__.py or .gitkeep as appropriate. Include backend/, frontend/, docker/, alembic/, and all subdirectories.
> Created all directories: backend/ (with routers/, engine/adapters/, engine/scorers/, mcp/, websocket/, tests/), frontend/src/ (pages/, components/, api/), docker/, alembic/versions/. Python packages have __init__.py, non-Python dirs have .gitkeep.
- [x] Create .env.example with all environment variables from the spec's Environment Variables table, with sensible defaults and comments explaining each group. Include DATABASE_URL, REDIS_URL, JWT_SECRET, DEFAULT_ENDPOINT_URL, MAX_CONCURRENT_RUNS, and all others.
> Created .env.example with all 13 environment variables organized into 7 groups (Database, Redis, Server, Auth, Default LLM Endpoint, Limits, Storage, MCP). Production-only vars (DATABASE_URL, REDIS_URL, JWT_SECRET, API_KEY, DEFAULT_ENDPOINT_*) are commented out with explanatory notes. Single-container defaults work out of the box.
- [x] Create docker-compose.yml following XPLTD conventions: project name xpltd_promptlooper, network promptlooper (172.33.0.0/24), PostgreSQL on port 5434, Redis, API service, worker service, and web service on port 8400. Use bind mounts under /vmPool/r/services/promptlooper_* for persistent data. Model this after Chrysopedia's docker-compose.yml patterns.
> Updated existing docker-compose.yml: fixed DATABASE_URL to use standard postgresql:// scheme (not asyncpg), hardcoded DB credentials instead of requiring .env vars, added API_KEY pass-through, added working_dir for worker service, made JWT_SECRET optional with dev default. All 5 services defined: db (:5434), redis, api (MCP :8401), worker (Celery), web (:8400). Bind mounts under /vmPool/r/services/promptlooper_*. Health checks on db and redis with dependency conditions.
- [x] Create the multi-stage Dockerfile in docker/ that builds both backend and frontend into a single image. Stage 1: Node build for frontend (npm ci && npm run build). Stage 2: Python runtime with uvicorn, copying the built frontend assets. Include nginx.conf that serves the frontend and proxies /api and /ws to uvicorn. The image should work standalone with SQLite when no DATABASE_URL is provided.
> Created 3-stage Dockerfile: (1) frontend-build with Node 20 Alpine, (2) api stage with Python 3.12-slim + uvicorn + static assets for single-container mode, (3) web stage with nginx 1.27 Alpine for production compose. nginx.conf proxies /api/ and /health to the API, upgrades /ws/ connections for WebSocket. Also created: backend/requirements.txt, frontend scaffolding (package.json, vite.config.ts, tsconfig.json, index.html, App.tsx, Tailwind config), and placeholder alembic.ini/env.py for Dockerfile COPY.
- [x] Create backend/config.py using Pydantic Settings. Define all configuration from the Environment Variables table. Implement the SQLite fallback logic: when DATABASE_URL is not set, construct a SQLite URL pointing to DATA_DIR/promptlooper.db. When REDIS_URL is not set, set a flag for in-process mode.
> Created backend/config.py with Pydantic Settings class defining all 13 env vars. SQLite fallback via `effective_database_url` property constructs sqlite:///DATA_DIR/promptlooper.db when DATABASE_URL is unset. `use_in_process_queue` property flags in-process mode when REDIS_URL is absent. JWT_SECRET auto-generates via `secrets.token_urlsafe(32)` when not provided. Empty API_KEY strings normalize to None. 13 tests in tests/test_config.py all passing.
- [x] Create backend/models.py with all SQLAlchemy ORM models from the spec's Data Model section: User, Project, Experiment, Run, StageResult, Score, ResponseCache, and WebhookConfig. Include all fields, types, relationships, and indexes. Use UUID primary keys and JSONB for flexible fields.
> Created all 8 ORM models with UUID PKs, JSON columns (using sqlalchemy.JSON for SQLite compatibility — maps to JSONB on PostgreSQL), enum types (ExperimentStatus, RunStatus), full relationship definitions with cascade deletes, and indexes on foreign keys and commonly filtered columns. Score.metadata mapped as `scorer_metadata` Python attribute (column name stays "metadata") to avoid SQLAlchemy reserved name conflict. 16 tests in tests/test_models.py all passing.
- [x] Set up Alembic: create alembic.ini and alembic/env.py configured to read DATABASE_URL from the config. Generate and apply the initial migration from the models.
> Created alembic.ini with logging config and script_location pointing to alembic/. env.py reads DATABASE_URL from backend.config.settings (with override support for tests). Added script.py.mako template. Generated initial migration (e1909678e89e) with all 8 tables, indexes, foreign keys, and enums. Migration applies cleanly on SQLite (render_as_batch=True for SQLite compatibility). 5 tests in tests/test_alembic.py covering upgrade/downgrade/columns/indexes/FKs. All 34 backend tests pass.
- [x] Create backend/schemas.py with Pydantic request/response schemas for all API endpoints. Include create/update/response schemas for Project, Experiment, Run, Endpoint, and Webhook. Include the Score input schema and export format schemas.
> Created backend/schemas.py with all Pydantic v2 schemas using ConfigDict(from_attributes=True) for ORM compatibility. Includes: Project (create/update/response/list), Experiment (create/update/response/list), Run (response/list/detail with nested stages+scores), StageResult (response), Score (input/response), Endpoint (create/update/response/list), Webhook (create/update/response/list), Auth (setup/login/token/user), Export (run row with scores dict, export response), and Health. 30 tests in tests/test_schemas.py all passing. All 64 backend tests pass.
- [x] Create backend/main.py with the FastAPI application. Set up CORS middleware, mount all routers (even if they're stubs), configure the WebSocket endpoint, add the /health endpoint that checks DB and Redis connectivity, and add startup/shutdown lifecycle hooks.
> Created backend/main.py with: CORS middleware (allow all origins), /health endpoint checking DB (SELECT 1) and Redis (ping) connectivity, /ws WebSocket endpoint with ConnectionManager for real-time broadcasts, async lifespan hooks for DB engine + Redis init/teardown, get_db dependency yielding sessions, dynamic router mounting (silently skips missing routers). 10 tests in tests/test_main.py covering health, CORS, WebSocket connect/disconnect/echo, OpenAPI schema, 404s, broadcast, get_db, and get_redis. All 74 backend tests pass.
- [x] Create backend/auth.py implementing JWT token generation/verification, API key validation, and the first-boot setup flow. The setup endpoint should check if any users exist — if not, accept username + password to create the admin account. Include a dependency function for route-level auth that supports both JWT and API key.
> Created backend/auth.py with: bcrypt password hashing via passlib, JWT token creation/verification (HS256, 24h expiry) using python-jose, first-boot `needs_setup()` + `create_admin()` flow (409 if admin exists), `authenticate_user()` for login, and `get_current_user` FastAPI dependency supporting both JWT Bearer tokens and X-Api-Key header (API key grants first admin user). UUID string-to-UUID conversion for SQLite compatibility. 21 tests in tests/test_auth.py covering hashing, JWT lifecycle, setup flow, login, and all auth dependency paths. All 95 backend tests pass.
- [x] Scaffold all router files in backend/routers/ as stubs: auth.py, projects.py, experiments.py, runs.py, endpoints.py, export.py, webhooks.py, admin.py. Each should have the correct APIRouter prefix and tags, with placeholder endpoints that return 501 Not Implemented.
> Created all 8 router stubs with APIRouter instances, mounted via main.py's _mount_routers(). Endpoints match the spec: auth (3 endpoints), projects (5), experiments (9 incl. sweep/pause/resume/stop), runs (5 incl. leaderboard), endpoints (5 incl. test), export (4 formats), webhooks (3), admin (3). All return 501 Not Implemented. 37 tests in tests/test_routers.py verify every route is mounted and returns 501. All 132 backend tests pass.
- [x] Initialize the frontend: run npm create vite@latest with React + TypeScript template. Install Tailwind CSS and configure it. Install react-router-dom for routing. Create the basic App.tsx with routes for Setup, Login, Dashboard, Projects, Experiment, Live, Compare, and Admin pages (all as placeholder components). Verify it builds cleanly.
> Frontend was already scaffolded with Vite + React + TypeScript + Tailwind + react-router-dom from the Dockerfile task. Added 8 placeholder page components (SetupPage, LoginPage, DashboardPage, ProjectsPage, ExperimentPage, LivePage, ComparePage, AdminPage) in frontend/src/pages/. Updated App.tsx with react-router-dom Routes and main.tsx with BrowserRouter. Unknown routes redirect to dashboard. Installed vitest + @testing-library/react for testing. 9 routing tests in App.test.tsx all passing. Build completes cleanly. All 132 backend tests still pass.
- [x] Create frontend/src/api/client.ts with a typed API client using fetch. Include JWT token management (stored in memory, not localStorage), request/response interceptors for auth headers, and typed wrapper functions for each API endpoint group. Include WebSocket connection helper.
> Created frontend/src/api/client.ts with: TypeScript interfaces mirroring all backend Pydantic schemas, in-memory JWT token management (setToken/getToken/clearToken — never localStorage), automatic Authorization header injection on all requests, Content-Type header for POST/PUT bodies, ApiError class for non-ok responses, typed wrapper functions for all 8 endpoint groups (auth, projects, experiments, runs, endpoints, export, webhooks, admin) plus health check, and connectWebSocket() helper that derives ws/wss from current protocol and handles JSON message parsing. 39 tests in src/api/client.test.ts covering token management, header injection, all endpoint groups, error handling, and WebSocket lifecycle. All 48 frontend tests pass. All 132 backend tests still pass.
- [x] Verify the full stack runs: docker compose up should start all services. The API should respond to /health. The frontend should load and show the setup screen (since no admin exists). The database migration should have run. Document any manual steps needed in the README.
> Created missing backend/worker.py (Celery app config for docker-compose worker service). Created docker/entrypoint.sh that runs `alembic upgrade head` before starting uvicorn, and updated Dockerfile to use it as ENTRYPOINT. Fixed README single-container quick-start (port 8000, not 8400) and added production compose docs (service list, first-boot instructions). Added 24 stack integration tests verifying all Docker/compose/nginx/frontend/alembic files are present and consistent, plus /health endpoint test. 3 worker tests confirm Celery config. All 159 backend + 48 frontend tests pass.

127
CLAUDE.md Normal file
View file

@ -0,0 +1,127 @@
# CLAUDE.md — PromptLooper
## What is this project?
PromptLooper is a self-hosted LLM pipeline tuning workbench. It runs experiments across prompt × model × parameter combinations, caches every response, scores results, and surfaces optimal configurations through a real-time dashboard. It has an MCP server so AI agents can drive it programmatically.
## Repository
- **Hosted at**: git.xpltd.co/xpltdco/promptlooper
- **XPLTD project name**: `xpltd_promptlooper`
- **Sister project**: Chrysopedia (git.xpltd.co/xpltdco/chrysopedia) — a knowledge extraction pipeline that is PromptLooper's first integration target
## Tech Stack
- **Backend**: Python 3.12, FastAPI, Celery, SQLAlchemy, Alembic
- **Frontend**: React 18, TypeScript, Vite, Tailwind CSS
- **Database**: PostgreSQL 16 (production) / SQLite (single-container mode)
- **Cache/Queue**: Redis 7 (production) / in-process (single-container)
- **Real-time**: WebSocket via FastAPI + Redis pub/sub
- **MCP**: Python MCP SDK
- **Container**: Multi-stage Docker build, nginx for frontend
## XPLTD Conventions
These are non-negotiable project conventions shared across all XPLTD projects:
- Docker Compose project name: `xpltd_promptlooper`
- Dedicated bridge network: `promptlooper` (`172.33.0.0/24`)
- Persistent data bind mounts under `/vmPool/r/services/promptlooper_*`
- PostgreSQL on external port `5434` (internal `5432`)
- Web UI on port `8400`
- MCP server on port `8401`
- Container naming: `promptlooper-{service}` (e.g., `promptlooper-api`, `promptlooper-db`)
## Key Architecture Decisions
1. **No LLM runs inside PromptLooper itself** — it's purely an HTTP client that calls external LLM endpoints. The only exception is the optional "LLM-as-judge" scorer.
2. **Response caching by config hash** — SHA-256 of (prompt + model + params + input). Cache hits return instantly. This is critical for cost control.
3. **Single-container mode** — when `DATABASE_URL` is not set, use SQLite + in-process queue. Zero dependencies.
4. **WebSocket for real-time** — the dashboard connects via WebSocket to receive run progress, score updates, and steering events.
5. **Pluggable scorers** — all scoring functions implement a base class with `score(input, output, context) → float` signature.
6. **OpenAI-compatible adapter** — the LLM adapter layer speaks OpenAI's chat completions API. This covers OpenWebUI, vLLM, Ollama, and most providers.
## File Organization
```
backend/
main.py — FastAPI app, middleware, router mounting
config.py — Pydantic Settings from env vars
models.py — SQLAlchemy ORM models
schemas.py — Pydantic request/response schemas
auth.py — JWT + API key authentication
worker.py — Celery app configuration
routers/ — API endpoint handlers
engine/ — Core experiment execution logic
runner.py — Individual run execution
sweep.py — Sweep orchestration (grid/random/guided)
cache.py — Response cache layer
adapters/ — LLM endpoint adapters
scorers/ — Pluggable scoring functions
mcp/ — MCP server implementation
websocket/ — WebSocket connection management
frontend/src/
pages/ — Route-level components
components/ — Shared UI components
api/ — Typed API client functions
```
## Database Migrations
Use Alembic. Same patterns as Chrysopedia:
```bash
alembic revision --autogenerate -m "describe_change"
alembic upgrade head
```
## Running Locally
```bash
docker compose up -d promptlooper-db promptlooper-redis
cd backend && uvicorn main:app --reload --host 0.0.0.0 --port 8000
# Frontend in another terminal:
cd frontend && npm run dev
```
## Testing
```bash
cd backend && pytest
cd frontend && npm test
```
## Important Patterns
### Adding a new scorer
1. Create `backend/engine/scorers/my_scorer.py`
2. Implement `BaseScorer` with `name`, `score(input, output, context) → float`
3. Register in `backend/engine/scorers/__init__.py`
4. Add to frontend scorer picker component
### Adding a new LLM adapter
1. Create `backend/engine/adapters/my_adapter.py`
2. Implement `BaseAdapter` with `complete(prompt, model, params) → response`
3. Register in `backend/engine/adapters/__init__.py`
4. Currently only OpenAI-compatible is implemented; all others should be edge cases
### Adding a new MCP tool
1. Add tool definition in `backend/mcp/tools.py`
2. Implement handler in `backend/mcp/server.py`
3. Tools should map 1:1 to API endpoints where possible
## Common Gotchas
- Always hash the FULL config when checking cache — missing a single parameter means cache misses
- WebSocket connections must be cleaned up on disconnect — use the connection manager
- SQLite mode doesn't support concurrent writes — the in-process queue must be single-threaded
- Frontend must handle both WebSocket and polling fallback for environments where WS is blocked
- MCP server runs on a separate port from the main API
## Deployment
```bash
ssh ub01
cd /vmPool/r/repos/xpltdco/promptlooper
git pull && docker compose build && docker compose up -d
```

View file

@ -1,3 +1,79 @@
# promptlooper # PromptLooper
Universal LLM pipeline tuning workbench — systematically optimize prompts, models, and inference parameters through cached experiments, pluggable scoring, and agent-driven sweeps via MCP. [![License: AGPL-3.0](https://img.shields.io/badge/License-AGPL--3.0-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)
[![Status: Alpha](https://img.shields.io/badge/Status-Alpha-orange.svg)]()
> The one who loops prompts — a universal LLM pipeline tuning workbench.
PromptLooper is a self-hosted tool for systematically optimizing LLM prompts, model selection, and inference parameters. It runs experiments across prompt x model x parameter combinations, caches every response, scores results against pluggable evaluation functions, and surfaces the best configurations through a real-time observability dashboard with human-in-the-loop steering.
It ships as a single Docker container (SQLite mode) for zero-config quickstart, or a Docker Compose stack (Postgres + Redis) for production use. An MCP server enables any AI agent to drive PromptLooper programmatically — creating experiments, running sweeps, and reading results without human intervention.
## Quick Start
### Single Container (zero dependencies)
```bash
docker run -p 8000:8000 -v promptlooper-data:/data ghcr.io/xpltdco/promptlooper
```
Open `http://localhost:8000` — you'll be prompted to create an admin account on first boot.
> In single-container mode, the API serves the built frontend as static files at the root.
> Database migrations run automatically on startup.
### Production (Docker Compose)
```bash
git clone git@git.xpltd.co:xpltdco/promptlooper.git
cd promptlooper
cp .env.example .env
# Edit .env — set JWT_SECRET at minimum
docker compose up -d
```
Open `http://localhost:8400` — nginx proxies the frontend (port 80 → 8400) and API (`/api/` → port 8000).
**Services started:**
- `promptlooper-db` — PostgreSQL 16 on port 5434
- `promptlooper-redis` — Redis 7
- `promptlooper-api` — FastAPI + Alembic migrations (auto-runs on startup)
- `promptlooper-worker` — Celery worker for experiment execution
- `promptlooper-web` — Nginx reverse proxy on port 8400
**First boot:** Navigate to `http://localhost:8400/setup` to create the admin account.
## Features
- **Systematic experimentation** — grid, random, and guided sweeps across prompt x model x parameter space
- **Response caching** — SHA-256 deduplication means re-runs cost zero tokens
- **Pluggable scoring** — embedding similarity, format compliance, keyword presence, LLM-as-judge, human rating, custom webhooks
- **Real-time dashboard** — live progress, leaderboard, side-by-side comparison, steering controls
- **MCP server** — AI agents can create experiments, run sweeps, and export results programmatically
- **Single-container mode** — SQLite + in-process queue when no external dependencies are configured
## Development
```bash
# Start backing services
docker compose up -d promptlooper-db promptlooper-redis
# Backend
cd backend && pip install -r requirements.txt
alembic upgrade head
uvicorn main:app --reload --host 0.0.0.0 --port 8000
# Frontend (separate terminal)
cd frontend && npm install && npm run dev
```
## Testing
```bash
cd backend && pytest
cd frontend && npm test
```
## License
[AGPL-3.0](https://www.gnu.org/licenses/agpl-3.0.html)

39
alembic.ini Normal file
View file

@ -0,0 +1,39 @@
[alembic]
script_location = alembic
# sqlalchemy.url is set programmatically in env.py from backend.config
sqlalchemy.url =
[post_write_hooks]
[loggers]
keys = root,sqlalchemy,alembic
[handlers]
keys = console
[formatters]
keys = generic
[logger_root]
level = WARN
handlers = console
[logger_sqlalchemy]
level = WARN
handlers =
qualname = sqlalchemy.engine
[logger_alembic]
level = INFO
handlers =
qualname = alembic
[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic
[formatter_generic]
format = %(levelname)-5.5s [%(name)s] %(message)s
datefmt = %H:%M:%S

66
alembic/env.py Normal file
View file

@ -0,0 +1,66 @@
"""Alembic environment configuration for PromptLooper."""
import sys
from logging.config import fileConfig
from pathlib import Path
from alembic import context
from sqlalchemy import engine_from_config, pool
# Ensure the backend package is importable
sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
from backend.config import settings
from backend.models import Base
config = context.config
if config.config_file_name is not None:
fileConfig(config.config_file_name)
# Use sqlalchemy.url from alembic config if already set (e.g. by tests),
# otherwise fall back to application settings.
if not config.get_main_option("sqlalchemy.url"):
config.set_main_option("sqlalchemy.url", settings.effective_database_url)
target_metadata = Base.metadata
def run_migrations_offline() -> None:
"""Run migrations in 'offline' mode — emit SQL to stdout."""
url = config.get_main_option("sqlalchemy.url")
context.configure(
url=url,
target_metadata=target_metadata,
literal_binds=True,
dialect_opts={"paramstyle": "named"},
render_as_batch=True,
)
with context.begin_transaction():
context.run_migrations()
def run_migrations_online() -> None:
"""Run migrations against a live database connection."""
connectable = engine_from_config(
config.get_section(config.config_ini_section, {}),
prefix="sqlalchemy.",
poolclass=pool.NullPool,
)
with connectable.connect() as connection:
context.configure(
connection=connection,
target_metadata=target_metadata,
render_as_batch=True,
)
with context.begin_transaction():
context.run_migrations()
if context.is_offline_mode():
run_migrations_offline()
else:
run_migrations_online()

26
alembic/script.py.mako Normal file
View file

@ -0,0 +1,26 @@
"""${message}
Revision ID: ${up_revision}
Revises: ${down_revision | comma,n}
Create Date: ${create_date}
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
${imports if imports else ""}
# revision identifiers, used by Alembic.
revision: str = ${repr(up_revision)}
down_revision: Union[str, None] = ${repr(down_revision)}
branch_labels: Union[str, Sequence[str], None] = ${repr(branch_labels)}
depends_on: Union[str, Sequence[str], None] = ${repr(depends_on)}
def upgrade() -> None:
${upgrades if upgrades else "pass"}
def downgrade() -> None:
${downgrades if downgrades else "pass"}

View file

View file

@ -0,0 +1,165 @@
"""initial_schema
Revision ID: e1909678e89e
Revises:
Create Date: 2026-04-07 01:50:18.571150
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = 'e1909678e89e'
down_revision: Union[str, None] = None
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
op.create_table('response_cache',
sa.Column('config_hash', sa.String(length=64), nullable=False),
sa.Column('response', sa.Text(), nullable=False),
sa.Column('model', sa.String(length=255), nullable=False),
sa.Column('tokens_in', sa.Integer(), nullable=True),
sa.Column('tokens_out', sa.Integer(), nullable=True),
sa.Column('latency_ms', sa.Integer(), nullable=True),
sa.Column('created_at', sa.DateTime(timezone=True), nullable=False),
sa.PrimaryKeyConstraint('config_hash')
)
op.create_table('users',
sa.Column('id', sa.Uuid(), nullable=False),
sa.Column('username', sa.String(length=255), nullable=False),
sa.Column('password_hash', sa.String(length=255), nullable=False),
sa.Column('is_admin', sa.Boolean(), nullable=False),
sa.Column('created_at', sa.DateTime(timezone=True), nullable=False),
sa.PrimaryKeyConstraint('id'),
sa.UniqueConstraint('username')
)
op.create_table('webhook_configs',
sa.Column('id', sa.Uuid(), nullable=False),
sa.Column('event_type', sa.String(length=255), nullable=False),
sa.Column('url', sa.String(length=2048), nullable=False),
sa.Column('headers', sa.JSON(), nullable=True),
sa.Column('is_active', sa.Boolean(), nullable=False),
sa.PrimaryKeyConstraint('id')
)
with op.batch_alter_table('webhook_configs', schema=None) as batch_op:
batch_op.create_index('ix_webhook_configs_event_type', ['event_type'], unique=False)
op.create_table('projects',
sa.Column('id', sa.Uuid(), nullable=False),
sa.Column('name', sa.String(length=255), nullable=False),
sa.Column('description', sa.Text(), nullable=True),
sa.Column('owner_id', sa.Uuid(), nullable=False),
sa.Column('created_at', sa.DateTime(timezone=True), nullable=False),
sa.Column('updated_at', sa.DateTime(timezone=True), nullable=False),
sa.ForeignKeyConstraint(['owner_id'], ['users.id'], ondelete='CASCADE'),
sa.PrimaryKeyConstraint('id')
)
op.create_table('experiments',
sa.Column('id', sa.Uuid(), nullable=False),
sa.Column('project_id', sa.Uuid(), nullable=False),
sa.Column('name', sa.String(length=255), nullable=False),
sa.Column('description', sa.Text(), nullable=True),
sa.Column('sample_data', sa.JSON(), nullable=True),
sa.Column('pipeline_stages', sa.JSON(), nullable=True),
sa.Column('scoring_config', sa.JSON(), nullable=True),
sa.Column('parameter_space', sa.JSON(), nullable=True),
sa.Column('status', sa.Enum('draft', 'running', 'paused', 'completed', name='experiment_status'), nullable=False),
sa.Column('created_at', sa.DateTime(timezone=True), nullable=False),
sa.Column('updated_at', sa.DateTime(timezone=True), nullable=False),
sa.ForeignKeyConstraint(['project_id'], ['projects.id'], ondelete='CASCADE'),
sa.PrimaryKeyConstraint('id')
)
with op.batch_alter_table('experiments', schema=None) as batch_op:
batch_op.create_index('ix_experiments_project_id', ['project_id'], unique=False)
batch_op.create_index('ix_experiments_status', ['status'], unique=False)
op.create_table('runs',
sa.Column('id', sa.Uuid(), nullable=False),
sa.Column('experiment_id', sa.Uuid(), nullable=False),
sa.Column('config_hash', sa.String(length=64), nullable=False),
sa.Column('config', sa.JSON(), nullable=False),
sa.Column('status', sa.Enum('pending', 'running', 'completed', 'failed', 'cached', name='run_status'), nullable=False),
sa.Column('started_at', sa.DateTime(timezone=True), nullable=True),
sa.Column('completed_at', sa.DateTime(timezone=True), nullable=True),
sa.Column('duration_ms', sa.Integer(), nullable=True),
sa.Column('tokens_in', sa.Integer(), nullable=True),
sa.Column('tokens_out', sa.Integer(), nullable=True),
sa.Column('cost_estimate', sa.Numeric(precision=12, scale=6), nullable=True),
sa.ForeignKeyConstraint(['experiment_id'], ['experiments.id'], ondelete='CASCADE'),
sa.PrimaryKeyConstraint('id')
)
with op.batch_alter_table('runs', schema=None) as batch_op:
batch_op.create_index('ix_runs_config_hash', ['config_hash'], unique=False)
batch_op.create_index('ix_runs_experiment_id', ['experiment_id'], unique=False)
batch_op.create_index('ix_runs_status', ['status'], unique=False)
op.create_table('scores',
sa.Column('id', sa.Uuid(), nullable=False),
sa.Column('run_id', sa.Uuid(), nullable=False),
sa.Column('scorer_name', sa.String(length=255), nullable=False),
sa.Column('value', sa.Float(), nullable=False),
sa.Column('metadata', sa.JSON(), nullable=True),
sa.Column('created_at', sa.DateTime(timezone=True), nullable=False),
sa.ForeignKeyConstraint(['run_id'], ['runs.id'], ondelete='CASCADE'),
sa.PrimaryKeyConstraint('id')
)
with op.batch_alter_table('scores', schema=None) as batch_op:
batch_op.create_index('ix_scores_run_id', ['run_id'], unique=False)
batch_op.create_index('ix_scores_scorer_name', ['scorer_name'], unique=False)
op.create_table('stage_results',
sa.Column('id', sa.Uuid(), nullable=False),
sa.Column('run_id', sa.Uuid(), nullable=False),
sa.Column('stage_index', sa.Integer(), nullable=False),
sa.Column('prompt_sent', sa.Text(), nullable=False),
sa.Column('response_raw', sa.Text(), nullable=False),
sa.Column('model_used', sa.String(length=255), nullable=False),
sa.Column('parameters', sa.JSON(), nullable=True),
sa.Column('tokens_in', sa.Integer(), nullable=True),
sa.Column('tokens_out', sa.Integer(), nullable=True),
sa.Column('latency_ms', sa.Integer(), nullable=True),
sa.ForeignKeyConstraint(['run_id'], ['runs.id'], ondelete='CASCADE'),
sa.PrimaryKeyConstraint('id')
)
with op.batch_alter_table('stage_results', schema=None) as batch_op:
batch_op.create_index('ix_stage_results_run_id', ['run_id'], unique=False)
# ### end Alembic commands ###
def downgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
with op.batch_alter_table('stage_results', schema=None) as batch_op:
batch_op.drop_index('ix_stage_results_run_id')
op.drop_table('stage_results')
with op.batch_alter_table('scores', schema=None) as batch_op:
batch_op.drop_index('ix_scores_scorer_name')
batch_op.drop_index('ix_scores_run_id')
op.drop_table('scores')
with op.batch_alter_table('runs', schema=None) as batch_op:
batch_op.drop_index('ix_runs_status')
batch_op.drop_index('ix_runs_experiment_id')
batch_op.drop_index('ix_runs_config_hash')
op.drop_table('runs')
with op.batch_alter_table('experiments', schema=None) as batch_op:
batch_op.drop_index('ix_experiments_status')
batch_op.drop_index('ix_experiments_project_id')
op.drop_table('experiments')
op.drop_table('projects')
with op.batch_alter_table('webhook_configs', schema=None) as batch_op:
batch_op.drop_index('ix_webhook_configs_event_type')
op.drop_table('webhook_configs')
op.drop_table('users')
op.drop_table('response_cache')
# ### end Alembic commands ###

0
backend/__init__.py Normal file
View file

154
backend/auth.py Normal file
View file

@ -0,0 +1,154 @@
"""PromptLooper authentication — JWT tokens, API keys, first-boot setup."""
import uuid as _uuid
from datetime import datetime, timedelta, timezone
from typing import Generator
from fastapi import Depends, HTTPException, Header, status
from jose import JWTError, jwt
from passlib.context import CryptContext
from sqlalchemy.orm import Session
from config import settings
from models import User
# ---------------------------------------------------------------------------
# Password hashing
# ---------------------------------------------------------------------------
pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")
def hash_password(password: str) -> str:
return pwd_context.hash(password)
def verify_password(plain: str, hashed: str) -> bool:
return pwd_context.verify(plain, hashed)
# ---------------------------------------------------------------------------
# JWT
# ---------------------------------------------------------------------------
ALGORITHM = "HS256"
ACCESS_TOKEN_EXPIRE_MINUTES = 60 * 24 # 24 hours
def create_access_token(user_id: str, *, expires_delta: timedelta | None = None) -> str:
expire = datetime.now(timezone.utc) + (expires_delta or timedelta(minutes=ACCESS_TOKEN_EXPIRE_MINUTES))
payload = {"sub": user_id, "exp": expire}
return jwt.encode(payload, settings.jwt_secret, algorithm=ALGORITHM)
def decode_access_token(token: str) -> str:
"""Return the user_id (sub) from a valid JWT, or raise."""
try:
payload = jwt.decode(token, settings.jwt_secret, algorithms=[ALGORITHM])
user_id: str | None = payload.get("sub")
if user_id is None:
raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid token")
return user_id
except JWTError:
raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid token")
# ---------------------------------------------------------------------------
# First-boot setup
# ---------------------------------------------------------------------------
def needs_setup(db: Session) -> bool:
"""Return True if no users exist yet (first-boot state)."""
return db.query(User).count() == 0
def create_admin(db: Session, username: str, password: str) -> User:
"""Create the first admin user. Raises if users already exist."""
if not needs_setup(db):
raise HTTPException(
status_code=status.HTTP_409_CONFLICT,
detail="Admin account already exists",
)
user = User(
username=username,
password_hash=hash_password(password),
is_admin=True,
)
db.add(user)
db.commit()
db.refresh(user)
return user
# ---------------------------------------------------------------------------
# Authenticate (login)
# ---------------------------------------------------------------------------
def authenticate_user(db: Session, username: str, password: str) -> User:
"""Verify credentials and return the User, or raise 401."""
user = db.query(User).filter(User.username == username).first()
if user is None or not verify_password(password, user.password_hash):
raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid credentials")
return user
# ---------------------------------------------------------------------------
# Database session dependency (local to avoid circular import with main.py)
# ---------------------------------------------------------------------------
def _get_db() -> Generator[Session, None, None]:
"""Yield a DB session. Imported lazily from main to avoid circular import."""
from main import get_db
yield from get_db()
# ---------------------------------------------------------------------------
# Dependency: get current user (JWT or API key)
# ---------------------------------------------------------------------------
def get_current_user(
authorization: str | None = Header(None),
x_api_key: str | None = Header(None),
db: Session = Depends(_get_db),
) -> User:
"""FastAPI dependency — resolve the current user from JWT Bearer token or API key.
Priority:
1. X-Api-Key header matched against settings.api_key (grants first admin).
2. Authorization: Bearer <jwt> decoded to get user_id.
"""
# --- API key path ---
if x_api_key is not None:
if settings.api_key is None or x_api_key != settings.api_key:
raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid API key")
# API key grants the first admin user
admin = db.query(User).filter(User.is_admin.is_(True)).first()
if admin is None:
raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="No admin user exists")
return admin
# --- JWT path ---
if authorization is None:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Missing authentication",
headers={"WWW-Authenticate": "Bearer"},
)
scheme, _, token = authorization.partition(" ")
if scheme.lower() != "bearer" or not token:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Invalid authorization header",
headers={"WWW-Authenticate": "Bearer"},
)
user_id_str = decode_access_token(token)
try:
user_id = _uuid.UUID(user_id_str)
except ValueError:
raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid token")
user = db.query(User).filter(User.id == user_id).first()
if user is None:
raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="User not found")
return user

76
backend/config.py Normal file
View file

@ -0,0 +1,76 @@
"""PromptLooper configuration — Pydantic Settings loaded from environment."""
import secrets
from pathlib import Path
from pydantic import field_validator
from pydantic_settings import BaseSettings, SettingsConfigDict
class Settings(BaseSettings):
model_config = SettingsConfigDict(
env_file=".env",
env_file_encoding="utf-8",
extra="ignore",
)
# --- Database ---
database_url: str | None = None
# --- Redis ---
redis_url: str | None = None
# --- Server ---
host: str = "0.0.0.0"
port: int = 8400
# --- Auth ---
jwt_secret: str = ""
api_key: str | None = None
# --- Default LLM Endpoint ---
default_endpoint_url: str | None = None
default_endpoint_key: str | None = None
# --- Limits ---
max_concurrent_runs: int = 4
max_tokens_per_sweep: int = 0 # 0 = unlimited
# --- Storage ---
data_dir: str = "/data"
# --- MCP ---
mcp_enabled: bool = True
mcp_port: int = 8401
def model_post_init(self, __context: object) -> None:
# Auto-generate JWT secret if not provided
if not self.jwt_secret:
self.jwt_secret = secrets.token_urlsafe(32)
@property
def effective_database_url(self) -> str:
"""Return DATABASE_URL or construct a SQLite URL from DATA_DIR."""
if self.database_url:
return self.database_url
db_path = Path(self.data_dir) / "promptlooper.db"
return f"sqlite:///{db_path}"
@property
def is_sqlite(self) -> bool:
return self.effective_database_url.startswith("sqlite")
@property
def use_in_process_queue(self) -> bool:
"""When Redis is unavailable, use in-process task execution."""
return self.redis_url is None
@field_validator("api_key", mode="before")
@classmethod
def empty_string_to_none(cls, v: str | None) -> str | None:
if v is not None and v.strip() == "":
return None
return v
settings = Settings()

View file

View file

View file

211
backend/main.py Normal file
View file

@ -0,0 +1,211 @@
"""PromptLooper FastAPI application."""
from contextlib import asynccontextmanager
from typing import AsyncGenerator
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from fastapi.middleware.cors import CORSMiddleware
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker
from config import settings
# ---------------------------------------------------------------------------
# Database engine & session factory (lazy, created at startup)
# ---------------------------------------------------------------------------
engine = None
SessionLocal = None
def _init_db() -> None:
"""Create the SQLAlchemy engine and session factory."""
global engine, SessionLocal
connect_args = {}
if settings.is_sqlite:
connect_args["check_same_thread"] = False
engine = create_engine(
settings.effective_database_url,
connect_args=connect_args,
)
SessionLocal = sessionmaker(bind=engine, autoflush=False, expire_on_commit=False)
def get_db():
"""FastAPI dependency that yields a database session."""
db = SessionLocal()
try:
yield db
finally:
db.close()
# ---------------------------------------------------------------------------
# Redis helper
# ---------------------------------------------------------------------------
_redis_client = None
def _init_redis() -> None:
"""Connect to Redis if configured."""
global _redis_client
if not settings.redis_url:
_redis_client = None
return
import redis as redis_lib
_redis_client = redis_lib.Redis.from_url(settings.redis_url, decode_responses=True)
def get_redis():
"""Return the Redis client (or None in single-container mode)."""
return _redis_client
# ---------------------------------------------------------------------------
# WebSocket connection manager
# ---------------------------------------------------------------------------
class ConnectionManager:
"""Manage active WebSocket connections."""
def __init__(self) -> None:
self.active_connections: list[WebSocket] = []
async def connect(self, websocket: WebSocket) -> None:
await websocket.accept()
self.active_connections.append(websocket)
def disconnect(self, websocket: WebSocket) -> None:
self.active_connections.remove(websocket)
async def broadcast(self, message: dict) -> None:
for connection in list(self.active_connections):
try:
await connection.send_json(message)
except Exception:
self.disconnect(connection)
ws_manager = ConnectionManager()
# ---------------------------------------------------------------------------
# Lifecycle
# ---------------------------------------------------------------------------
@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
"""Startup and shutdown lifecycle hooks."""
_init_db()
_init_redis()
yield
# Shutdown: clean up connections
if _redis_client is not None:
_redis_client.close()
if engine is not None:
engine.dispose()
# ---------------------------------------------------------------------------
# Application
# ---------------------------------------------------------------------------
app = FastAPI(
title="PromptLooper",
description="LLM pipeline tuning workbench",
version="0.1.0",
lifespan=lifespan,
)
# CORS — allow all origins in development; tighten in production via env
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
# ---------------------------------------------------------------------------
# Health endpoint
# ---------------------------------------------------------------------------
@app.get("/health", tags=["system"])
def health_check() -> dict:
"""Check DB and Redis connectivity."""
db_ok = False
redis_ok = False
# Database check
if SessionLocal is not None:
try:
with SessionLocal() as session:
session.execute(text("SELECT 1"))
db_ok = True
except Exception:
pass
# Redis check
if not settings.redis_url:
redis_ok = True # No Redis needed — in-process mode
elif _redis_client is not None:
try:
_redis_client.ping()
redis_ok = True
except Exception:
pass
return {"status": "ok" if (db_ok and redis_ok) else "degraded", "database": db_ok, "redis": redis_ok}
# ---------------------------------------------------------------------------
# WebSocket endpoint
# ---------------------------------------------------------------------------
@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket) -> None:
"""WebSocket connection for real-time dashboard updates."""
await ws_manager.connect(websocket)
try:
while True:
# Keep connection alive; handle incoming messages if needed
data = await websocket.receive_json()
# Echo back or handle client messages in future
await websocket.send_json({"type": "ack", "data": data})
except WebSocketDisconnect:
ws_manager.disconnect(websocket)
# ---------------------------------------------------------------------------
# Mount routers (stubs — actual implementations come later)
# ---------------------------------------------------------------------------
# Router imports are deferred to avoid circular imports and allow
# stub files to be created independently. Each router will be mounted
# as it is implemented. For now we register empty prefixes.
def _mount_routers() -> None:
"""Import and mount all routers. Silently skip missing ones."""
router_configs = [
("routers.auth", "/api/auth", ["auth"]),
("routers.projects", "/api/projects", ["projects"]),
("routers.experiments", "/api/experiments", ["experiments"]),
("routers.runs", "/api/runs", ["runs"]),
("routers.endpoints", "/api/endpoints", ["endpoints"]),
("routers.export", "/api/export", ["export"]),
("routers.webhooks", "/api/webhooks", ["webhooks"]),
("routers.admin", "/api/admin", ["admin"]),
]
for module_name, prefix, tags in router_configs:
try:
import importlib
mod = importlib.import_module(module_name)
app.include_router(mod.router, prefix=prefix, tags=tags)
except (ImportError, AttributeError):
pass # Router not yet implemented
_mount_routers()

0
backend/mcp/__init__.py Normal file
View file

276
backend/models.py Normal file
View file

@ -0,0 +1,276 @@
"""PromptLooper SQLAlchemy ORM models."""
import enum
import uuid
from datetime import datetime, timezone
from sqlalchemy import (
JSON,
Boolean,
DateTime,
Enum,
Float,
ForeignKey,
Index,
Integer,
Numeric,
String,
Text,
)
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship
def _utcnow() -> datetime:
return datetime.now(timezone.utc)
def _new_uuid() -> uuid.UUID:
return uuid.uuid4()
# ---------------------------------------------------------------------------
# Base
# ---------------------------------------------------------------------------
class Base(DeclarativeBase):
"""Shared declarative base for all models."""
type_annotation_map = {
dict: JSON,
}
# ---------------------------------------------------------------------------
# Enums
# ---------------------------------------------------------------------------
class ExperimentStatus(str, enum.Enum):
draft = "draft"
running = "running"
paused = "paused"
completed = "completed"
class RunStatus(str, enum.Enum):
pending = "pending"
running = "running"
completed = "completed"
failed = "failed"
cached = "cached"
# ---------------------------------------------------------------------------
# Models
# ---------------------------------------------------------------------------
class User(Base):
__tablename__ = "users"
id: Mapped[uuid.UUID] = mapped_column(
primary_key=True, default=_new_uuid
)
username: Mapped[str] = mapped_column(String(255), unique=True, nullable=False)
password_hash: Mapped[str] = mapped_column(String(255), nullable=False)
is_admin: Mapped[bool] = mapped_column(Boolean, default=False, nullable=False)
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=_utcnow, nullable=False
)
# Relationships
projects: Mapped[list["Project"]] = relationship(
back_populates="owner", cascade="all, delete-orphan"
)
class Project(Base):
__tablename__ = "projects"
id: Mapped[uuid.UUID] = mapped_column(
primary_key=True, default=_new_uuid
)
name: Mapped[str] = mapped_column(String(255), nullable=False)
description: Mapped[str | None] = mapped_column(Text, nullable=True)
owner_id: Mapped[uuid.UUID] = mapped_column(
ForeignKey("users.id", ondelete="CASCADE"), nullable=False
)
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=_utcnow, nullable=False
)
updated_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=_utcnow, onupdate=_utcnow, nullable=False
)
# Relationships
owner: Mapped["User"] = relationship(back_populates="projects")
experiments: Mapped[list["Experiment"]] = relationship(
back_populates="project", cascade="all, delete-orphan"
)
class Experiment(Base):
__tablename__ = "experiments"
id: Mapped[uuid.UUID] = mapped_column(
primary_key=True, default=_new_uuid
)
project_id: Mapped[uuid.UUID] = mapped_column(
ForeignKey("projects.id", ondelete="CASCADE"), nullable=False
)
name: Mapped[str] = mapped_column(String(255), nullable=False)
description: Mapped[str | None] = mapped_column(Text, nullable=True)
sample_data: Mapped[dict | None] = mapped_column(JSON, nullable=True)
pipeline_stages: Mapped[dict | None] = mapped_column(JSON, nullable=True)
scoring_config: Mapped[dict | None] = mapped_column(JSON, nullable=True)
parameter_space: Mapped[dict | None] = mapped_column(JSON, nullable=True)
status: Mapped[ExperimentStatus] = mapped_column(
Enum(ExperimentStatus, name="experiment_status"),
default=ExperimentStatus.draft,
nullable=False,
)
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=_utcnow, nullable=False
)
updated_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=_utcnow, onupdate=_utcnow, nullable=False
)
# Relationships
project: Mapped["Project"] = relationship(back_populates="experiments")
runs: Mapped[list["Run"]] = relationship(
back_populates="experiment", cascade="all, delete-orphan"
)
__table_args__ = (
Index("ix_experiments_project_id", "project_id"),
Index("ix_experiments_status", "status"),
)
class Run(Base):
__tablename__ = "runs"
id: Mapped[uuid.UUID] = mapped_column(
primary_key=True, default=_new_uuid
)
experiment_id: Mapped[uuid.UUID] = mapped_column(
ForeignKey("experiments.id", ondelete="CASCADE"), nullable=False
)
config_hash: Mapped[str] = mapped_column(String(64), nullable=False)
config: Mapped[dict] = mapped_column(JSON, nullable=False)
status: Mapped[RunStatus] = mapped_column(
Enum(RunStatus, name="run_status"),
default=RunStatus.pending,
nullable=False,
)
started_at: Mapped[datetime | None] = mapped_column(
DateTime(timezone=True), nullable=True
)
completed_at: Mapped[datetime | None] = mapped_column(
DateTime(timezone=True), nullable=True
)
duration_ms: Mapped[int | None] = mapped_column(Integer, nullable=True)
tokens_in: Mapped[int | None] = mapped_column(Integer, nullable=True)
tokens_out: Mapped[int | None] = mapped_column(Integer, nullable=True)
cost_estimate: Mapped[float | None] = mapped_column(
Numeric(precision=12, scale=6), nullable=True
)
# Relationships
experiment: Mapped["Experiment"] = relationship(back_populates="runs")
stage_results: Mapped[list["StageResult"]] = relationship(
back_populates="run", cascade="all, delete-orphan"
)
scores: Mapped[list["Score"]] = relationship(
back_populates="run", cascade="all, delete-orphan"
)
__table_args__ = (
Index("ix_runs_experiment_id", "experiment_id"),
Index("ix_runs_config_hash", "config_hash"),
Index("ix_runs_status", "status"),
)
class StageResult(Base):
__tablename__ = "stage_results"
id: Mapped[uuid.UUID] = mapped_column(
primary_key=True, default=_new_uuid
)
run_id: Mapped[uuid.UUID] = mapped_column(
ForeignKey("runs.id", ondelete="CASCADE"), nullable=False
)
stage_index: Mapped[int] = mapped_column(Integer, nullable=False)
prompt_sent: Mapped[str] = mapped_column(Text, nullable=False)
response_raw: Mapped[str] = mapped_column(Text, nullable=False)
model_used: Mapped[str] = mapped_column(String(255), nullable=False)
parameters: Mapped[dict | None] = mapped_column(JSON, nullable=True)
tokens_in: Mapped[int | None] = mapped_column(Integer, nullable=True)
tokens_out: Mapped[int | None] = mapped_column(Integer, nullable=True)
latency_ms: Mapped[int | None] = mapped_column(Integer, nullable=True)
# Relationships
run: Mapped["Run"] = relationship(back_populates="stage_results")
__table_args__ = (
Index("ix_stage_results_run_id", "run_id"),
)
class Score(Base):
__tablename__ = "scores"
id: Mapped[uuid.UUID] = mapped_column(
primary_key=True, default=_new_uuid
)
run_id: Mapped[uuid.UUID] = mapped_column(
ForeignKey("runs.id", ondelete="CASCADE"), nullable=False
)
scorer_name: Mapped[str] = mapped_column(String(255), nullable=False)
value: Mapped[float] = mapped_column(Float, nullable=False)
scorer_metadata: Mapped[dict | None] = mapped_column(
"metadata", JSON, nullable=True
)
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=_utcnow, nullable=False
)
# Relationships
run: Mapped["Run"] = relationship(back_populates="scores")
__table_args__ = (
Index("ix_scores_run_id", "run_id"),
Index("ix_scores_scorer_name", "scorer_name"),
)
class ResponseCache(Base):
__tablename__ = "response_cache"
config_hash: Mapped[str] = mapped_column(
String(64), primary_key=True
)
response: Mapped[str] = mapped_column(Text, nullable=False)
model: Mapped[str] = mapped_column(String(255), nullable=False)
tokens_in: Mapped[int | None] = mapped_column(Integer, nullable=True)
tokens_out: Mapped[int | None] = mapped_column(Integer, nullable=True)
latency_ms: Mapped[int | None] = mapped_column(Integer, nullable=True)
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=_utcnow, nullable=False
)
class WebhookConfig(Base):
__tablename__ = "webhook_configs"
id: Mapped[uuid.UUID] = mapped_column(
primary_key=True, default=_new_uuid
)
event_type: Mapped[str] = mapped_column(String(255), nullable=False)
url: Mapped[str] = mapped_column(String(2048), nullable=False)
headers: Mapped[dict | None] = mapped_column(JSON, nullable=True)
is_active: Mapped[bool] = mapped_column(Boolean, default=True, nullable=False)
__table_args__ = (
Index("ix_webhook_configs_event_type", "event_type"),
)

16
backend/requirements.txt Normal file
View file

@ -0,0 +1,16 @@
# PromptLooper — Backend Dependencies
fastapi>=0.115,<1.0
uvicorn[standard]>=0.32,<1.0
sqlalchemy>=2.0,<3.0
alembic>=1.14,<2.0
pydantic>=2.0,<3.0
pydantic-settings>=2.0,<3.0
python-jose[cryptography]>=3.3,<4.0
passlib[bcrypt]>=1.7,<2.0
celery>=5.4,<6.0
redis>=5.0,<6.0
httpx>=0.27,<1.0
websockets>=13.0,<14.0
psycopg2-binary>=2.9,<3.0
aiosqlite>=0.20,<1.0
python-multipart>=0.0.9

View file

23
backend/routers/admin.py Normal file
View file

@ -0,0 +1,23 @@
"""Admin router — system settings and stats."""
from fastapi import APIRouter, Response
router = APIRouter()
@router.get("/settings", status_code=501)
def get_settings():
"""System settings (guest access, default model, etc.)."""
return Response(status_code=501, content="Not Implemented")
@router.put("/settings", status_code=501)
def update_settings():
"""Update settings."""
return Response(status_code=501, content="Not Implemented")
@router.get("/stats", status_code=501)
def get_stats():
"""System-wide stats (total runs, cache hit rate, etc.)."""
return Response(status_code=501, content="Not Implemented")

23
backend/routers/auth.py Normal file
View file

@ -0,0 +1,23 @@
"""Auth router — setup, login, and current user info."""
from fastapi import APIRouter, Response
router = APIRouter()
@router.post("/setup", status_code=501)
def setup():
"""First-boot admin password setup."""
return Response(status_code=501, content="Not Implemented")
@router.post("/login", status_code=501)
def login():
"""Login, returns JWT."""
return Response(status_code=501, content="Not Implemented")
@router.get("/me", status_code=501)
def me():
"""Current user info."""
return Response(status_code=501, content="Not Implemented")

View file

@ -0,0 +1,37 @@
"""Endpoints router — LLM target management."""
import uuid
from fastapi import APIRouter, Response
router = APIRouter()
@router.get("/", status_code=501)
def list_endpoints():
"""List configured LLM endpoints."""
return Response(status_code=501, content="Not Implemented")
@router.post("/", status_code=501)
def create_endpoint():
"""Add endpoint (URL, API key, label)."""
return Response(status_code=501, content="Not Implemented")
@router.put("/{endpoint_id}", status_code=501)
def update_endpoint(endpoint_id: uuid.UUID):
"""Update endpoint."""
return Response(status_code=501, content="Not Implemented")
@router.delete("/{endpoint_id}", status_code=501)
def delete_endpoint(endpoint_id: uuid.UUID):
"""Remove endpoint."""
return Response(status_code=501, content="Not Implemented")
@router.post("/{endpoint_id}/test", status_code=501)
def test_endpoint(endpoint_id: uuid.UUID):
"""Test connectivity and list available models."""
return Response(status_code=501, content="Not Implemented")

View file

@ -0,0 +1,61 @@
"""Experiments router — CRUD and sweep controls."""
import uuid
from fastapi import APIRouter, Response
router = APIRouter()
@router.get("/", status_code=501)
def list_experiments():
"""List experiments (filter by project)."""
return Response(status_code=501, content="Not Implemented")
@router.post("/", status_code=501)
def create_experiment():
"""Create experiment."""
return Response(status_code=501, content="Not Implemented")
@router.get("/{experiment_id}", status_code=501)
def get_experiment(experiment_id: uuid.UUID):
"""Experiment detail with run summaries."""
return Response(status_code=501, content="Not Implemented")
@router.put("/{experiment_id}", status_code=501)
def update_experiment(experiment_id: uuid.UUID):
"""Update experiment config."""
return Response(status_code=501, content="Not Implemented")
@router.delete("/{experiment_id}", status_code=501)
def delete_experiment(experiment_id: uuid.UUID):
"""Delete experiment."""
return Response(status_code=501, content="Not Implemented")
@router.post("/{experiment_id}/sweep", status_code=501)
def start_sweep(experiment_id: uuid.UUID):
"""Start a sweep (grid, random, or guided)."""
return Response(status_code=501, content="Not Implemented")
@router.post("/{experiment_id}/pause", status_code=501)
def pause_sweep(experiment_id: uuid.UUID):
"""Pause running sweep."""
return Response(status_code=501, content="Not Implemented")
@router.post("/{experiment_id}/resume", status_code=501)
def resume_sweep(experiment_id: uuid.UUID):
"""Resume paused sweep."""
return Response(status_code=501, content="Not Implemented")
@router.post("/{experiment_id}/stop", status_code=501)
def stop_sweep(experiment_id: uuid.UUID):
"""Stop sweep."""
return Response(status_code=501, content="Not Implemented")

31
backend/routers/export.py Normal file
View file

@ -0,0 +1,31 @@
"""Export router — export experiment results in various formats."""
import uuid
from fastapi import APIRouter, Response
router = APIRouter()
@router.get("/experiments/{experiment_id}/best", status_code=501)
def export_best(experiment_id: uuid.UUID):
"""Best config as JSON."""
return Response(status_code=501, content="Not Implemented")
@router.get("/experiments/{experiment_id}/env", status_code=501)
def export_env(experiment_id: uuid.UUID):
"""Best config as .env snippet."""
return Response(status_code=501, content="Not Implemented")
@router.get("/experiments/{experiment_id}/yaml", status_code=501)
def export_yaml(experiment_id: uuid.UUID):
"""Best config as YAML."""
return Response(status_code=501, content="Not Implemented")
@router.get("/experiments/{experiment_id}/report", status_code=501)
def export_report(experiment_id: uuid.UUID):
"""Full experiment report (markdown)."""
return Response(status_code=501, content="Not Implemented")

View file

@ -0,0 +1,37 @@
"""Projects router — CRUD for projects."""
import uuid
from fastapi import APIRouter, Response
router = APIRouter()
@router.get("/", status_code=501)
def list_projects():
"""List projects."""
return Response(status_code=501, content="Not Implemented")
@router.post("/", status_code=501)
def create_project():
"""Create project."""
return Response(status_code=501, content="Not Implemented")
@router.get("/{project_id}", status_code=501)
def get_project(project_id: uuid.UUID):
"""Project detail with experiment summaries."""
return Response(status_code=501, content="Not Implemented")
@router.put("/{project_id}", status_code=501)
def update_project(project_id: uuid.UUID):
"""Update project."""
return Response(status_code=501, content="Not Implemented")
@router.delete("/{project_id}", status_code=501)
def delete_project(project_id: uuid.UUID):
"""Delete project and all experiments."""
return Response(status_code=501, content="Not Implemented")

37
backend/routers/runs.py Normal file
View file

@ -0,0 +1,37 @@
"""Runs router — execute, detail, score, and leaderboard."""
import uuid
from fastapi import APIRouter, Response
router = APIRouter()
@router.get("/experiments/{experiment_id}/runs", status_code=501)
def list_runs(experiment_id: uuid.UUID):
"""List runs with scores (sortable, filterable)."""
return Response(status_code=501, content="Not Implemented")
@router.get("/{run_id}", status_code=501)
def get_run(run_id: uuid.UUID):
"""Run detail with stage results."""
return Response(status_code=501, content="Not Implemented")
@router.post("/", status_code=501)
def create_run():
"""Execute a single run (ad-hoc)."""
return Response(status_code=501, content="Not Implemented")
@router.post("/{run_id}/score", status_code=501)
def score_run(run_id: uuid.UUID):
"""Add human rating to a run."""
return Response(status_code=501, content="Not Implemented")
@router.get("/experiments/{experiment_id}/leaderboard", status_code=501)
def leaderboard(experiment_id: uuid.UUID):
"""Top runs ranked by weighted score."""
return Response(status_code=501, content="Not Implemented")

View file

@ -0,0 +1,25 @@
"""Webhooks router — manage webhook configurations."""
import uuid
from fastapi import APIRouter, Response
router = APIRouter()
@router.get("/", status_code=501)
def list_webhooks():
"""List webhook configs."""
return Response(status_code=501, content="Not Implemented")
@router.post("/", status_code=501)
def create_webhook():
"""Create webhook."""
return Response(status_code=501, content="Not Implemented")
@router.delete("/{webhook_id}", status_code=501)
def delete_webhook(webhook_id: uuid.UUID):
"""Remove webhook."""
return Response(status_code=501, content="Not Implemented")

298
backend/schemas.py Normal file
View file

@ -0,0 +1,298 @@
"""PromptLooper Pydantic request/response schemas."""
import uuid
from datetime import datetime
from pydantic import BaseModel, ConfigDict, Field
from models import ExperimentStatus, RunStatus
# ---------------------------------------------------------------------------
# Shared mixins
# ---------------------------------------------------------------------------
class _TimestampMixin(BaseModel):
created_at: datetime
updated_at: datetime
# ---------------------------------------------------------------------------
# Project
# ---------------------------------------------------------------------------
class ProjectCreate(BaseModel):
name: str = Field(..., min_length=1, max_length=255)
description: str | None = None
class ProjectUpdate(BaseModel):
name: str | None = Field(None, min_length=1, max_length=255)
description: str | None = None
class ProjectResponse(BaseModel):
model_config = ConfigDict(from_attributes=True)
id: uuid.UUID
name: str
description: str | None
owner_id: uuid.UUID
created_at: datetime
updated_at: datetime
class ProjectListResponse(BaseModel):
items: list[ProjectResponse]
total: int
# ---------------------------------------------------------------------------
# Experiment
# ---------------------------------------------------------------------------
class ExperimentCreate(BaseModel):
name: str = Field(..., min_length=1, max_length=255)
description: str | None = None
sample_data: dict | None = None
pipeline_stages: dict | None = None
scoring_config: dict | None = None
parameter_space: dict | None = None
class ExperimentUpdate(BaseModel):
name: str | None = Field(None, min_length=1, max_length=255)
description: str | None = None
sample_data: dict | None = None
pipeline_stages: dict | None = None
scoring_config: dict | None = None
parameter_space: dict | None = None
status: ExperimentStatus | None = None
class ExperimentResponse(BaseModel):
model_config = ConfigDict(from_attributes=True)
id: uuid.UUID
project_id: uuid.UUID
name: str
description: str | None
sample_data: dict | None
pipeline_stages: dict | None
scoring_config: dict | None
parameter_space: dict | None
status: ExperimentStatus
created_at: datetime
updated_at: datetime
class ExperimentListResponse(BaseModel):
items: list[ExperimentResponse]
total: int
# ---------------------------------------------------------------------------
# Run
# ---------------------------------------------------------------------------
class RunResponse(BaseModel):
model_config = ConfigDict(from_attributes=True)
id: uuid.UUID
experiment_id: uuid.UUID
config_hash: str
config: dict
status: RunStatus
started_at: datetime | None
completed_at: datetime | None
duration_ms: int | None
tokens_in: int | None
tokens_out: int | None
cost_estimate: float | None
class RunListResponse(BaseModel):
items: list[RunResponse]
total: int
# ---------------------------------------------------------------------------
# StageResult (read-only, returned inside Run details)
# ---------------------------------------------------------------------------
class StageResultResponse(BaseModel):
model_config = ConfigDict(from_attributes=True)
id: uuid.UUID
run_id: uuid.UUID
stage_index: int
prompt_sent: str
response_raw: str
model_used: str
parameters: dict | None
tokens_in: int | None
tokens_out: int | None
latency_ms: int | None
class RunDetailResponse(RunResponse):
"""Run with nested stage results and scores."""
stage_results: list[StageResultResponse] = []
scores: list["ScoreResponse"] = []
# ---------------------------------------------------------------------------
# Score
# ---------------------------------------------------------------------------
class ScoreInput(BaseModel):
scorer_name: str = Field(..., min_length=1, max_length=255)
value: float
metadata: dict | None = None
class ScoreResponse(BaseModel):
model_config = ConfigDict(from_attributes=True)
id: uuid.UUID
run_id: uuid.UUID
scorer_name: str
value: float
scorer_metadata: dict | None
created_at: datetime
# ---------------------------------------------------------------------------
# Endpoint (LLM endpoint configuration)
# ---------------------------------------------------------------------------
class EndpointCreate(BaseModel):
name: str = Field(..., min_length=1, max_length=255)
url: str = Field(..., min_length=1, max_length=2048)
api_key: str | None = None
default_model: str | None = Field(None, max_length=255)
class EndpointUpdate(BaseModel):
name: str | None = Field(None, min_length=1, max_length=255)
url: str | None = Field(None, min_length=1, max_length=2048)
api_key: str | None = None
default_model: str | None = Field(None, max_length=255)
class EndpointResponse(BaseModel):
model_config = ConfigDict(from_attributes=True)
id: uuid.UUID
name: str
url: str
default_model: str | None
class EndpointListResponse(BaseModel):
items: list[EndpointResponse]
total: int
# ---------------------------------------------------------------------------
# Webhook
# ---------------------------------------------------------------------------
class WebhookCreate(BaseModel):
event_type: str = Field(..., min_length=1, max_length=255)
url: str = Field(..., min_length=1, max_length=2048)
headers: dict | None = None
is_active: bool = True
class WebhookUpdate(BaseModel):
event_type: str | None = Field(None, min_length=1, max_length=255)
url: str | None = Field(None, min_length=1, max_length=2048)
headers: dict | None = None
is_active: bool | None = None
class WebhookResponse(BaseModel):
model_config = ConfigDict(from_attributes=True)
id: uuid.UUID
event_type: str
url: str
headers: dict | None
is_active: bool
class WebhookListResponse(BaseModel):
items: list[WebhookResponse]
total: int
# ---------------------------------------------------------------------------
# Auth
# ---------------------------------------------------------------------------
class SetupRequest(BaseModel):
username: str = Field(..., min_length=1, max_length=255)
password: str = Field(..., min_length=8)
class LoginRequest(BaseModel):
username: str
password: str
class TokenResponse(BaseModel):
access_token: str
token_type: str = "bearer"
class UserResponse(BaseModel):
model_config = ConfigDict(from_attributes=True)
id: uuid.UUID
username: str
is_admin: bool
created_at: datetime
# ---------------------------------------------------------------------------
# Export
# ---------------------------------------------------------------------------
class ExportRunRow(BaseModel):
"""Flat row for CSV/JSON export of run results."""
run_id: uuid.UUID
experiment_id: uuid.UUID
config_hash: str
config: dict
status: RunStatus
duration_ms: int | None = None
tokens_in: int | None = None
tokens_out: int | None = None
cost_estimate: float | None = None
scores: dict[str, float] = Field(
default_factory=dict,
description="Map of scorer_name → value",
)
class ExportResponse(BaseModel):
experiment_id: uuid.UUID
experiment_name: str
rows: list[ExportRunRow]
# ---------------------------------------------------------------------------
# Health
# ---------------------------------------------------------------------------
class HealthResponse(BaseModel):
status: str = "ok"
database: bool
redis: bool
# Rebuild forward refs for RunDetailResponse
RunDetailResponse.model_rebuild()

View file

View file

@ -0,0 +1,107 @@
"""Tests for Alembic migration setup."""
import os
from pathlib import Path
import pytest
from alembic import command
from alembic.config import Config
from sqlalchemy import create_engine, inspect
# Resolve the repo root regardless of where pytest is invoked from.
_REPO_ROOT = Path(__file__).resolve().parents[2]
@pytest.fixture()
def alembic_cfg(tmp_path):
"""Create an Alembic config pointing at a temporary SQLite database."""
db_path = tmp_path / "test.db"
db_url = f"sqlite:///{db_path}"
cfg = Config(str(_REPO_ROOT / "alembic.ini"))
cfg.set_main_option("script_location", str(_REPO_ROOT / "alembic"))
cfg.set_main_option("sqlalchemy.url", db_url)
return cfg, db_url
def test_upgrade_head_creates_all_tables(alembic_cfg):
"""Running 'upgrade head' should create all expected tables."""
cfg, db_url = alembic_cfg
command.upgrade(cfg, "head")
engine = create_engine(db_url)
inspector = inspect(engine)
tables = set(inspector.get_table_names())
expected = {
"alembic_version",
"users",
"projects",
"experiments",
"runs",
"stage_results",
"scores",
"response_cache",
"webhook_configs",
}
assert expected == tables
def test_downgrade_base_removes_all_tables(alembic_cfg):
"""Running 'downgrade base' should remove all application tables."""
cfg, db_url = alembic_cfg
command.upgrade(cfg, "head")
command.downgrade(cfg, "base")
engine = create_engine(db_url)
inspector = inspect(engine)
tables = set(inspector.get_table_names())
# Only alembic_version should remain
assert tables == {"alembic_version"}
def test_runs_table_has_expected_columns(alembic_cfg):
"""Spot-check that the runs table has key columns."""
cfg, db_url = alembic_cfg
command.upgrade(cfg, "head")
engine = create_engine(db_url)
inspector = inspect(engine)
columns = {c["name"] for c in inspector.get_columns("runs")}
assert "id" in columns
assert "experiment_id" in columns
assert "config_hash" in columns
assert "status" in columns
assert "cost_estimate" in columns
def test_indexes_created(alembic_cfg):
"""Verify key indexes exist after migration."""
cfg, db_url = alembic_cfg
command.upgrade(cfg, "head")
engine = create_engine(db_url)
inspector = inspect(engine)
run_indexes = {idx["name"] for idx in inspector.get_indexes("runs")}
assert "ix_runs_config_hash" in run_indexes
assert "ix_runs_experiment_id" in run_indexes
score_indexes = {idx["name"] for idx in inspector.get_indexes("scores")}
assert "ix_scores_run_id" in score_indexes
assert "ix_scores_scorer_name" in score_indexes
def test_foreign_keys_on_experiments(alembic_cfg):
"""Verify experiments table has FK to projects."""
cfg, db_url = alembic_cfg
command.upgrade(cfg, "head")
engine = create_engine(db_url)
inspector = inspect(engine)
fks = inspector.get_foreign_keys("experiments")
referred_tables = {fk["referred_table"] for fk in fks}
assert "projects" in referred_tables

238
backend/tests/test_auth.py Normal file
View file

@ -0,0 +1,238 @@
"""Tests for backend/auth.py — JWT, API key, setup flow, and auth dependency."""
import os
from datetime import timedelta
from unittest.mock import patch
import pytest
from fastapi import FastAPI, Depends
from fastapi.testclient import TestClient
@pytest.fixture(autouse=True)
def _isolate_settings(tmp_path):
"""Ensure tests use a temp SQLite DB and no Redis."""
env = {
"DATABASE_URL": f"sqlite:///{tmp_path / 'test.db'}",
"REDIS_URL": "",
"DATA_DIR": str(tmp_path),
"JWT_SECRET": "test-secret-key-for-jwt-signing",
"API_KEY": "test-api-key-12345",
}
with patch.dict(os.environ, env, clear=False):
import config
new_settings = config.Settings(_env_file=None)
config.settings = new_settings
import main
main.settings = new_settings
main._init_db()
main._init_redis()
from models import Base
Base.metadata.create_all(bind=main.engine)
# Also patch auth module's settings reference
import auth
auth.settings = new_settings
yield
@pytest.fixture
def db_session():
from main import get_db
gen = get_db()
session = next(gen)
yield session
try:
next(gen)
except StopIteration:
pass
# ---------------------------------------------------------------------------
# Password hashing
# ---------------------------------------------------------------------------
class TestPasswordHashing:
def test_hash_and_verify(self):
from auth import hash_password, verify_password
hashed = hash_password("my-secret-password")
assert hashed != "my-secret-password"
assert verify_password("my-secret-password", hashed)
def test_wrong_password_fails(self):
from auth import hash_password, verify_password
hashed = hash_password("correct-password")
assert not verify_password("wrong-password", hashed)
# ---------------------------------------------------------------------------
# JWT
# ---------------------------------------------------------------------------
class TestJWT:
def test_create_and_decode_token(self):
from auth import create_access_token, decode_access_token
token = create_access_token("user-123")
assert decode_access_token(token) == "user-123"
def test_expired_token_raises(self):
from auth import create_access_token, decode_access_token
token = create_access_token("user-123", expires_delta=timedelta(seconds=-1))
with pytest.raises(Exception) as exc_info:
decode_access_token(token)
assert exc_info.value.status_code == 401
def test_invalid_token_raises(self):
from auth import decode_access_token
with pytest.raises(Exception) as exc_info:
decode_access_token("not-a-valid-token")
assert exc_info.value.status_code == 401
def test_token_without_sub_raises(self):
from jose import jwt
import config
token = jwt.encode({"foo": "bar"}, config.settings.jwt_secret, algorithm="HS256")
from auth import decode_access_token
with pytest.raises(Exception) as exc_info:
decode_access_token(token)
assert exc_info.value.status_code == 401
# ---------------------------------------------------------------------------
# First-boot setup
# ---------------------------------------------------------------------------
class TestSetup:
def test_needs_setup_true_when_no_users(self, db_session):
from auth import needs_setup
assert needs_setup(db_session) is True
def test_create_admin_succeeds(self, db_session):
from auth import create_admin, needs_setup
user = create_admin(db_session, "admin", "password123")
assert user.username == "admin"
assert user.is_admin is True
assert needs_setup(db_session) is False
def test_create_admin_twice_raises_409(self, db_session):
from auth import create_admin
create_admin(db_session, "admin", "password123")
with pytest.raises(Exception) as exc_info:
create_admin(db_session, "admin2", "password456")
assert exc_info.value.status_code == 409
def test_admin_password_is_hashed(self, db_session):
from auth import create_admin
user = create_admin(db_session, "admin", "password123")
assert user.password_hash != "password123"
assert user.password_hash.startswith("$2b$")
# ---------------------------------------------------------------------------
# Authenticate user (login)
# ---------------------------------------------------------------------------
class TestAuthenticateUser:
def test_valid_credentials(self, db_session):
from auth import create_admin, authenticate_user
create_admin(db_session, "admin", "password123")
user = authenticate_user(db_session, "admin", "password123")
assert user.username == "admin"
def test_wrong_password_raises_401(self, db_session):
from auth import create_admin, authenticate_user
create_admin(db_session, "admin", "password123")
with pytest.raises(Exception) as exc_info:
authenticate_user(db_session, "admin", "wrong")
assert exc_info.value.status_code == 401
def test_unknown_user_raises_401(self, db_session):
from auth import authenticate_user
with pytest.raises(Exception) as exc_info:
authenticate_user(db_session, "nonexistent", "password")
assert exc_info.value.status_code == 401
# ---------------------------------------------------------------------------
# get_current_user dependency (integration via test app)
# ---------------------------------------------------------------------------
@pytest.fixture
def auth_app():
"""Create a minimal FastAPI app with a protected endpoint for testing auth."""
from auth import get_current_user
from schemas import UserResponse
test_app = FastAPI()
@test_app.get("/protected")
def protected(user=Depends(get_current_user)):
return {"user_id": str(user.id), "username": user.username}
return test_app
@pytest.fixture
def auth_client(auth_app):
return TestClient(auth_app)
class TestGetCurrentUser:
def test_no_auth_returns_401(self, auth_client):
resp = auth_client.get("/protected")
assert resp.status_code == 401
assert "Missing authentication" in resp.json()["detail"]
def test_invalid_bearer_format_returns_401(self, auth_client):
resp = auth_client.get("/protected", headers={"Authorization": "NotBearer token"})
assert resp.status_code == 401
def test_jwt_auth_succeeds(self, auth_client, db_session):
from auth import create_admin, create_access_token
user = create_admin(db_session, "admin", "password123")
token = create_access_token(str(user.id))
resp = auth_client.get("/protected", headers={"Authorization": f"Bearer {token}"})
assert resp.status_code == 200
assert resp.json()["username"] == "admin"
def test_jwt_for_deleted_user_returns_401(self, auth_client, db_session):
from auth import create_access_token
import uuid
token = create_access_token(str(uuid.uuid4()))
resp = auth_client.get("/protected", headers={"Authorization": f"Bearer {token}"})
assert resp.status_code == 401
def test_api_key_auth_succeeds(self, auth_client, db_session):
from auth import create_admin
create_admin(db_session, "admin", "password123")
resp = auth_client.get("/protected", headers={"X-Api-Key": "test-api-key-12345"})
assert resp.status_code == 200
assert resp.json()["username"] == "admin"
def test_wrong_api_key_returns_401(self, auth_client):
resp = auth_client.get("/protected", headers={"X-Api-Key": "wrong-key"})
assert resp.status_code == 401
def test_api_key_without_admin_returns_401(self, auth_client):
# No admin user created yet
resp = auth_client.get("/protected", headers={"X-Api-Key": "test-api-key-12345"})
assert resp.status_code == 401
def test_api_key_disabled_when_not_configured(self, auth_client, db_session):
"""When API_KEY is not set in config, API key auth should fail."""
from auth import create_admin
import config, auth
create_admin(db_session, "admin", "password123")
old_key = config.settings.api_key
config.settings.api_key = None
auth.settings = config.settings
try:
resp = auth_client.get("/protected", headers={"X-Api-Key": "test-api-key-12345"})
assert resp.status_code == 401
finally:
config.settings.api_key = old_key
auth.settings = config.settings

View file

@ -0,0 +1,105 @@
"""Tests for backend/config.py."""
import os
from unittest.mock import patch
import pytest
from pydantic_settings import BaseSettings
from config import Settings
class TestSettings:
"""Test the Settings configuration class."""
def _make_settings(self, **env_vars: str) -> Settings:
"""Create a Settings instance with specific env vars, ignoring .env file."""
with patch.dict(os.environ, env_vars, clear=False):
return Settings(_env_file=None)
def test_defaults(self) -> None:
s = self._make_settings()
assert s.database_url is None
assert s.redis_url is None
assert s.host == "0.0.0.0"
assert s.port == 8400
assert s.api_key is None
assert s.default_endpoint_url is None
assert s.default_endpoint_key is None
assert s.max_concurrent_runs == 4
assert s.max_tokens_per_sweep == 0
assert s.data_dir == "/data"
assert s.mcp_enabled is True
assert s.mcp_port == 8401
def test_jwt_secret_auto_generated(self) -> None:
s = self._make_settings()
assert len(s.jwt_secret) > 0
def test_jwt_secret_auto_generated_unique(self) -> None:
s1 = self._make_settings()
s2 = self._make_settings()
assert s1.jwt_secret != s2.jwt_secret
def test_jwt_secret_from_env(self) -> None:
s = self._make_settings(JWT_SECRET="my-secret-key")
assert s.jwt_secret == "my-secret-key"
def test_sqlite_fallback_when_no_database_url(self) -> None:
s = self._make_settings(DATA_DIR="/tmp/test")
url = s.effective_database_url
assert url.startswith("sqlite:///")
assert url.endswith("promptlooper.db")
assert "tmp" in url and "test" in url
assert s.is_sqlite is True
def test_postgres_when_database_url_set(self) -> None:
url = "postgresql://user:pass@localhost:5432/promptlooper"
s = self._make_settings(DATABASE_URL=url)
assert s.effective_database_url == url
assert s.is_sqlite is False
def test_in_process_queue_when_no_redis(self) -> None:
s = self._make_settings()
assert s.use_in_process_queue is True
def test_celery_queue_when_redis_set(self) -> None:
s = self._make_settings(REDIS_URL="redis://localhost:6379/0")
assert s.use_in_process_queue is False
assert s.redis_url == "redis://localhost:6379/0"
def test_empty_api_key_becomes_none(self) -> None:
s = self._make_settings(API_KEY="")
assert s.api_key is None
def test_whitespace_api_key_becomes_none(self) -> None:
s = self._make_settings(API_KEY=" ")
assert s.api_key is None
def test_valid_api_key_preserved(self) -> None:
s = self._make_settings(API_KEY="sk-test-123")
assert s.api_key == "sk-test-123"
def test_env_overrides(self) -> None:
s = self._make_settings(
HOST="127.0.0.1",
PORT="9000",
MAX_CONCURRENT_RUNS="8",
MAX_TOKENS_PER_SWEEP="100000",
MCP_ENABLED="false",
MCP_PORT="9001",
)
assert s.host == "127.0.0.1"
assert s.port == 9000
assert s.max_concurrent_runs == 8
assert s.max_tokens_per_sweep == 100000
assert s.mcp_enabled is False
assert s.mcp_port == 9001
def test_default_endpoint_config(self) -> None:
s = self._make_settings(
DEFAULT_ENDPOINT_URL="http://localhost:11434/v1",
DEFAULT_ENDPOINT_KEY="sk-key",
)
assert s.default_endpoint_url == "http://localhost:11434/v1"
assert s.default_endpoint_key == "sk-key"

129
backend/tests/test_main.py Normal file
View file

@ -0,0 +1,129 @@
"""Tests for backend/main.py — FastAPI application."""
import os
from unittest.mock import patch
import pytest
from fastapi.testclient import TestClient
@pytest.fixture(autouse=True)
def _isolate_settings(tmp_path):
"""Ensure tests use a temp SQLite DB and no Redis."""
env = {
"DATABASE_URL": f"sqlite:///{tmp_path / 'test.db'}",
"REDIS_URL": "",
"DATA_DIR": str(tmp_path),
}
with patch.dict(os.environ, env, clear=False):
# Reload settings so it picks up test env
import config
new_settings = config.Settings(_env_file=None)
config.settings = new_settings
# Patch main's reference too
import main
main.settings = new_settings
main._init_db()
main._init_redis()
# Create tables
from models import Base
Base.metadata.create_all(bind=main.engine)
yield
@pytest.fixture
def client():
from main import app
return TestClient(app)
class TestHealthEndpoint:
def test_health_returns_ok(self, client):
resp = client.get("/health")
assert resp.status_code == 200
data = resp.json()
assert data["status"] == "ok"
assert data["database"] is True
assert data["redis"] is True # in-process mode counts as ok
def test_health_response_schema(self, client):
resp = client.get("/health")
data = resp.json()
assert set(data.keys()) == {"status", "database", "redis"}
class TestCORSMiddleware:
def test_cors_headers_present(self, client):
resp = client.options(
"/health",
headers={
"Origin": "http://localhost:3000",
"Access-Control-Request-Method": "GET",
},
)
assert "access-control-allow-origin" in resp.headers
class TestWebSocket:
def test_websocket_connect_and_echo(self, client):
with client.websocket_connect("/ws") as ws:
ws.send_json({"type": "ping"})
data = ws.receive_json()
assert data["type"] == "ack"
assert data["data"]["type"] == "ping"
def test_websocket_disconnect_cleanup(self, client):
from main import ws_manager
initial_count = len(ws_manager.active_connections)
with client.websocket_connect("/ws") as ws:
assert len(ws_manager.active_connections) == initial_count + 1
# After disconnect, connection should be removed
assert len(ws_manager.active_connections) == initial_count
class TestRouterMounting:
def test_openapi_schema_loads(self, client):
resp = client.get("/openapi.json")
assert resp.status_code == 200
schema = resp.json()
assert schema["info"]["title"] == "PromptLooper"
def test_unknown_route_returns_404(self, client):
resp = client.get("/api/nonexistent")
assert resp.status_code == 404
class TestConnectionManager:
def test_broadcast_removes_dead_connections(self):
"""ConnectionManager.broadcast skips and removes broken connections."""
from main import ConnectionManager
manager = ConnectionManager()
# No connections — broadcast should not raise
import asyncio
asyncio.get_event_loop().run_until_complete(
manager.broadcast({"test": True})
)
assert len(manager.active_connections) == 0
class TestGetDb:
def test_get_db_yields_session(self):
from main import get_db
gen = get_db()
session = next(gen)
assert session is not None
# Clean up
try:
next(gen)
except StopIteration:
pass
class TestGetRedis:
def test_get_redis_returns_none_in_process_mode(self):
from main import get_redis
# In test setup, Redis is not configured
assert get_redis() is None

View file

@ -0,0 +1,359 @@
"""Tests for SQLAlchemy ORM models."""
import uuid
from datetime import datetime, timezone
from sqlalchemy import create_engine, inspect
from sqlalchemy.orm import Session
from models import (
Base,
Experiment,
ExperimentStatus,
Project,
ResponseCache,
Run,
RunStatus,
Score,
StageResult,
User,
WebhookConfig,
)
def _engine():
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
return engine
def _session(engine):
return Session(engine)
# ---------------------------------------------------------------------------
# Table existence
# ---------------------------------------------------------------------------
def test_all_tables_created():
engine = _engine()
table_names = inspect(engine).get_table_names()
expected = {
"users",
"projects",
"experiments",
"runs",
"stage_results",
"scores",
"response_cache",
"webhook_configs",
}
assert expected.issubset(set(table_names))
# ---------------------------------------------------------------------------
# User
# ---------------------------------------------------------------------------
def test_user_creation():
engine = _engine()
with _session(engine) as session:
user = User(username="admin", password_hash="hashed", is_admin=True)
session.add(user)
session.commit()
assert isinstance(user.id, uuid.UUID)
assert user.username == "admin"
assert user.is_admin is True
assert isinstance(user.created_at, datetime)
def test_user_username_unique():
engine = _engine()
with _session(engine) as session:
session.add(User(username="dup", password_hash="h1"))
session.commit()
session.add(User(username="dup", password_hash="h2"))
try:
session.commit()
assert False, "Should have raised IntegrityError"
except Exception:
session.rollback()
# ---------------------------------------------------------------------------
# Project
# ---------------------------------------------------------------------------
def test_project_with_owner():
engine = _engine()
with _session(engine) as session:
user = User(username="owner", password_hash="h")
project = Project(name="Test Project", description="A test", owner=user)
session.add(project)
session.commit()
assert project.owner_id == user.id
assert project.name == "Test Project"
assert isinstance(project.updated_at, datetime)
def test_project_cascade_delete_from_user():
engine = _engine()
with _session(engine) as session:
user = User(username="owner", password_hash="h")
project = Project(name="P1", owner=user)
session.add(project)
session.commit()
project_id = project.id
session.delete(user)
session.commit()
assert session.get(Project, project_id) is None
# ---------------------------------------------------------------------------
# Experiment
# ---------------------------------------------------------------------------
def test_experiment_defaults():
engine = _engine()
with _session(engine) as session:
user = User(username="u", password_hash="h")
project = Project(name="P", owner=user)
exp = Experiment(
project=project,
name="Exp1",
sample_data={"inputs": ["hello"]},
pipeline_stages=[{"prompt": "test"}],
scoring_config={"scorers": ["keyword"]},
parameter_space={"temperature": [0.1, 0.5]},
)
session.add(exp)
session.commit()
assert exp.status == ExperimentStatus.draft
assert exp.sample_data == {"inputs": ["hello"]}
assert isinstance(exp.created_at, datetime)
def test_experiment_cascade_delete_from_project():
engine = _engine()
with _session(engine) as session:
user = User(username="u", password_hash="h")
project = Project(name="P", owner=user)
exp = Experiment(project=project, name="E")
session.add(exp)
session.commit()
exp_id = exp.id
session.delete(project)
session.commit()
assert session.get(Experiment, exp_id) is None
# ---------------------------------------------------------------------------
# Run
# ---------------------------------------------------------------------------
def test_run_creation():
engine = _engine()
with _session(engine) as session:
user = User(username="u", password_hash="h")
project = Project(name="P", owner=user)
exp = Experiment(project=project, name="E")
run = Run(
experiment=exp,
config_hash="a" * 64,
config={"model": "gpt-4", "temperature": 0.5},
status=RunStatus.completed,
duration_ms=1200,
tokens_in=100,
tokens_out=50,
)
session.add(run)
session.commit()
assert run.status == RunStatus.completed
assert run.config["model"] == "gpt-4"
def test_run_default_status():
engine = _engine()
with _session(engine) as session:
user = User(username="u", password_hash="h")
project = Project(name="P", owner=user)
exp = Experiment(project=project, name="E")
run = Run(experiment=exp, config_hash="b" * 64, config={})
session.add(run)
session.commit()
assert run.status == RunStatus.pending
# ---------------------------------------------------------------------------
# StageResult
# ---------------------------------------------------------------------------
def test_stage_result():
engine = _engine()
with _session(engine) as session:
user = User(username="u", password_hash="h")
project = Project(name="P", owner=user)
exp = Experiment(project=project, name="E")
run = Run(experiment=exp, config_hash="c" * 64, config={})
sr = StageResult(
run=run,
stage_index=0,
prompt_sent="Hello",
response_raw="World",
model_used="gpt-4",
parameters={"temperature": 0.5},
tokens_in=10,
tokens_out=5,
latency_ms=200,
)
session.add(sr)
session.commit()
assert sr.stage_index == 0
assert sr.model_used == "gpt-4"
assert len(run.stage_results) == 1
# ---------------------------------------------------------------------------
# Score
# ---------------------------------------------------------------------------
def test_score():
engine = _engine()
with _session(engine) as session:
user = User(username="u", password_hash="h")
project = Project(name="P", owner=user)
exp = Experiment(project=project, name="E")
run = Run(experiment=exp, config_hash="d" * 64, config={})
score = Score(
run=run,
scorer_name="embedding_similarity",
value=0.87,
scorer_metadata={"reference_id": "ref1"},
)
session.add(score)
session.commit()
assert score.value == 0.87
assert score.scorer_name == "embedding_similarity"
assert len(run.scores) == 1
# ---------------------------------------------------------------------------
# ResponseCache
# ---------------------------------------------------------------------------
def test_response_cache():
engine = _engine()
with _session(engine) as session:
cache = ResponseCache(
config_hash="e" * 64,
response="cached response",
model="gpt-4",
tokens_in=50,
tokens_out=25,
latency_ms=300,
)
session.add(cache)
session.commit()
fetched = session.get(ResponseCache, "e" * 64)
assert fetched is not None
assert fetched.response == "cached response"
def test_response_cache_pk_is_config_hash():
engine = _engine()
with _session(engine) as session:
session.add(
ResponseCache(config_hash="f" * 64, response="r1", model="m1")
)
session.commit()
session.add(
ResponseCache(config_hash="f" * 64, response="r2", model="m2")
)
try:
session.commit()
assert False, "Should have raised IntegrityError"
except Exception:
session.rollback()
# ---------------------------------------------------------------------------
# WebhookConfig
# ---------------------------------------------------------------------------
def test_webhook_config():
engine = _engine()
with _session(engine) as session:
wh = WebhookConfig(
event_type="experiment.completed",
url="https://example.com/hook",
headers={"Authorization": "Bearer token"},
is_active=True,
)
session.add(wh)
session.commit()
assert isinstance(wh.id, uuid.UUID)
assert wh.event_type == "experiment.completed"
assert wh.is_active is True
def test_webhook_config_default_active():
engine = _engine()
with _session(engine) as session:
wh = WebhookConfig(
event_type="run.failed",
url="https://example.com/hook",
)
session.add(wh)
session.commit()
assert wh.is_active is True
# ---------------------------------------------------------------------------
# Relationship cascades: Run → StageResult + Score
# ---------------------------------------------------------------------------
def test_run_cascade_deletes_children():
engine = _engine()
with _session(engine) as session:
user = User(username="u", password_hash="h")
project = Project(name="P", owner=user)
exp = Experiment(project=project, name="E")
run = Run(experiment=exp, config_hash="g" * 64, config={})
sr = StageResult(
run=run, stage_index=0, prompt_sent="p",
response_raw="r", model_used="m",
)
score = Score(run=run, scorer_name="test", value=0.5)
session.add_all([run, sr, score])
session.commit()
sr_id, score_id = sr.id, score.id
session.delete(run)
session.commit()
assert session.get(StageResult, sr_id) is None
assert session.get(Score, score_id) is None

View file

@ -0,0 +1,224 @@
"""Tests for router stubs — verify all routes are mounted and return 501."""
import pytest
from fastapi.testclient import TestClient
@pytest.fixture()
def client(tmp_path, monkeypatch):
"""Create a test client with a temporary database."""
monkeypatch.setenv("DATA_DIR", str(tmp_path))
monkeypatch.setenv("DATABASE_URL", "")
monkeypatch.setenv("REDIS_URL", "")
# Reload config to pick up test env
import importlib
import config as config_mod
importlib.reload(config_mod)
import main as main_mod
importlib.reload(main_mod)
with TestClient(main_mod.app) as c:
yield c
# ---- Auth router (/api/auth) ----
def test_auth_setup(client):
resp = client.post("/api/auth/setup")
assert resp.status_code == 501
def test_auth_login(client):
resp = client.post("/api/auth/login")
assert resp.status_code == 501
def test_auth_me(client):
resp = client.get("/api/auth/me")
assert resp.status_code == 501
# ---- Projects router (/api/projects) ----
def test_projects_list(client):
resp = client.get("/api/projects/")
assert resp.status_code == 501
def test_projects_create(client):
resp = client.post("/api/projects/")
assert resp.status_code == 501
def test_projects_get(client):
resp = client.get("/api/projects/00000000-0000-0000-0000-000000000001")
assert resp.status_code == 501
def test_projects_update(client):
resp = client.put("/api/projects/00000000-0000-0000-0000-000000000001")
assert resp.status_code == 501
def test_projects_delete(client):
resp = client.delete("/api/projects/00000000-0000-0000-0000-000000000001")
assert resp.status_code == 501
# ---- Experiments router (/api/experiments) ----
def test_experiments_list(client):
resp = client.get("/api/experiments/")
assert resp.status_code == 501
def test_experiments_create(client):
resp = client.post("/api/experiments/")
assert resp.status_code == 501
def test_experiments_get(client):
resp = client.get("/api/experiments/00000000-0000-0000-0000-000000000001")
assert resp.status_code == 501
def test_experiments_update(client):
resp = client.put("/api/experiments/00000000-0000-0000-0000-000000000001")
assert resp.status_code == 501
def test_experiments_delete(client):
resp = client.delete("/api/experiments/00000000-0000-0000-0000-000000000001")
assert resp.status_code == 501
def test_experiments_sweep(client):
resp = client.post("/api/experiments/00000000-0000-0000-0000-000000000001/sweep")
assert resp.status_code == 501
def test_experiments_pause(client):
resp = client.post("/api/experiments/00000000-0000-0000-0000-000000000001/pause")
assert resp.status_code == 501
def test_experiments_resume(client):
resp = client.post("/api/experiments/00000000-0000-0000-0000-000000000001/resume")
assert resp.status_code == 501
def test_experiments_stop(client):
resp = client.post("/api/experiments/00000000-0000-0000-0000-000000000001/stop")
assert resp.status_code == 501
# ---- Runs router (/api/runs) ----
def test_runs_list(client):
resp = client.get("/api/runs/experiments/00000000-0000-0000-0000-000000000001/runs")
assert resp.status_code == 501
def test_runs_get(client):
resp = client.get("/api/runs/00000000-0000-0000-0000-000000000001")
assert resp.status_code == 501
def test_runs_create(client):
resp = client.post("/api/runs/")
assert resp.status_code == 501
def test_runs_score(client):
resp = client.post("/api/runs/00000000-0000-0000-0000-000000000001/score")
assert resp.status_code == 501
def test_runs_leaderboard(client):
resp = client.get("/api/runs/experiments/00000000-0000-0000-0000-000000000001/leaderboard")
assert resp.status_code == 501
# ---- Endpoints router (/api/endpoints) ----
def test_endpoints_list(client):
resp = client.get("/api/endpoints/")
assert resp.status_code == 501
def test_endpoints_create(client):
resp = client.post("/api/endpoints/")
assert resp.status_code == 501
def test_endpoints_update(client):
resp = client.put("/api/endpoints/00000000-0000-0000-0000-000000000001")
assert resp.status_code == 501
def test_endpoints_delete(client):
resp = client.delete("/api/endpoints/00000000-0000-0000-0000-000000000001")
assert resp.status_code == 501
def test_endpoints_test(client):
resp = client.post("/api/endpoints/00000000-0000-0000-0000-000000000001/test")
assert resp.status_code == 501
# ---- Export router (/api/export) ----
def test_export_best(client):
resp = client.get("/api/export/experiments/00000000-0000-0000-0000-000000000001/best")
assert resp.status_code == 501
def test_export_env(client):
resp = client.get("/api/export/experiments/00000000-0000-0000-0000-000000000001/env")
assert resp.status_code == 501
def test_export_yaml(client):
resp = client.get("/api/export/experiments/00000000-0000-0000-0000-000000000001/yaml")
assert resp.status_code == 501
def test_export_report(client):
resp = client.get("/api/export/experiments/00000000-0000-0000-0000-000000000001/report")
assert resp.status_code == 501
# ---- Webhooks router (/api/webhooks) ----
def test_webhooks_list(client):
resp = client.get("/api/webhooks/")
assert resp.status_code == 501
def test_webhooks_create(client):
resp = client.post("/api/webhooks/")
assert resp.status_code == 501
def test_webhooks_delete(client):
resp = client.delete("/api/webhooks/00000000-0000-0000-0000-000000000001")
assert resp.status_code == 501
# ---- Admin router (/api/admin) ----
def test_admin_get_settings(client):
resp = client.get("/api/admin/settings")
assert resp.status_code == 501
def test_admin_update_settings(client):
resp = client.put("/api/admin/settings")
assert resp.status_code == 501
def test_admin_stats(client):
resp = client.get("/api/admin/stats")
assert resp.status_code == 501

View file

@ -0,0 +1,339 @@
"""Tests for backend/schemas.py."""
import uuid
from datetime import datetime, timezone
import pytest
from pydantic import ValidationError
from models import ExperimentStatus, RunStatus
from schemas import (
EndpointCreate,
EndpointResponse,
EndpointUpdate,
ExperimentCreate,
ExperimentResponse,
ExperimentUpdate,
ExportResponse,
ExportRunRow,
HealthResponse,
LoginRequest,
ProjectCreate,
ProjectResponse,
ProjectUpdate,
RunDetailResponse,
RunResponse,
ScoreInput,
ScoreResponse,
SetupRequest,
StageResultResponse,
TokenResponse,
UserResponse,
WebhookCreate,
WebhookResponse,
WebhookUpdate,
)
NOW = datetime.now(timezone.utc)
UUID1 = uuid.uuid4()
UUID2 = uuid.uuid4()
# ---------------------------------------------------------------------------
# Project schemas
# ---------------------------------------------------------------------------
class TestProjectSchemas:
def test_create_valid(self) -> None:
p = ProjectCreate(name="My Project", description="desc")
assert p.name == "My Project"
assert p.description == "desc"
def test_create_name_required(self) -> None:
with pytest.raises(ValidationError):
ProjectCreate() # type: ignore[call-arg]
def test_create_empty_name_rejected(self) -> None:
with pytest.raises(ValidationError):
ProjectCreate(name="")
def test_update_partial(self) -> None:
p = ProjectUpdate(name="New Name")
assert p.name == "New Name"
assert p.description is None
def test_response_from_attributes(self) -> None:
class Fake:
id = UUID1
name = "Proj"
description = None
owner_id = UUID2
created_at = NOW
updated_at = NOW
r = ProjectResponse.model_validate(Fake())
assert r.id == UUID1
assert r.name == "Proj"
# ---------------------------------------------------------------------------
# Experiment schemas
# ---------------------------------------------------------------------------
class TestExperimentSchemas:
def test_create_minimal(self) -> None:
e = ExperimentCreate(name="Exp 1")
assert e.name == "Exp 1"
assert e.sample_data is None
def test_create_with_all_fields(self) -> None:
e = ExperimentCreate(
name="Full",
description="desc",
sample_data={"key": "value"},
pipeline_stages={"stages": []},
scoring_config={"scorer": "exact"},
parameter_space={"temp": [0.5, 1.0]},
)
assert e.parameter_space == {"temp": [0.5, 1.0]}
def test_update_status(self) -> None:
e = ExperimentUpdate(status=ExperimentStatus.running)
assert e.status == ExperimentStatus.running
def test_response_from_attributes(self) -> None:
class Fake:
id = UUID1
project_id = UUID2
name = "Exp"
description = None
sample_data = None
pipeline_stages = None
scoring_config = None
parameter_space = None
status = ExperimentStatus.draft
created_at = NOW
updated_at = NOW
r = ExperimentResponse.model_validate(Fake())
assert r.status == ExperimentStatus.draft
# ---------------------------------------------------------------------------
# Run schemas
# ---------------------------------------------------------------------------
class TestRunSchemas:
def test_response_from_attributes(self) -> None:
class Fake:
id = UUID1
experiment_id = UUID2
config_hash = "abc123"
config = {"model": "gpt-4"}
status = RunStatus.completed
started_at = NOW
completed_at = NOW
duration_ms = 1234
tokens_in = 100
tokens_out = 200
cost_estimate = 0.003
r = RunResponse.model_validate(Fake())
assert r.duration_ms == 1234
assert r.cost_estimate == 0.003
def test_detail_response_nested(self) -> None:
data = {
"id": UUID1,
"experiment_id": UUID2,
"config_hash": "abc",
"config": {},
"status": RunStatus.pending,
"started_at": None,
"completed_at": None,
"duration_ms": None,
"tokens_in": None,
"tokens_out": None,
"cost_estimate": None,
"stage_results": [],
"scores": [],
}
r = RunDetailResponse(**data)
assert r.stage_results == []
assert r.scores == []
# ---------------------------------------------------------------------------
# Score schemas
# ---------------------------------------------------------------------------
class TestScoreSchemas:
def test_input_valid(self) -> None:
s = ScoreInput(scorer_name="exact_match", value=0.95, metadata={"note": "ok"})
assert s.value == 0.95
assert s.metadata == {"note": "ok"}
def test_input_missing_name(self) -> None:
with pytest.raises(ValidationError):
ScoreInput(value=0.5) # type: ignore[call-arg]
def test_response_from_attributes(self) -> None:
class Fake:
id = UUID1
run_id = UUID2
scorer_name = "bleu"
value = 0.8
scorer_metadata = {"n": 4}
created_at = NOW
r = ScoreResponse.model_validate(Fake())
assert r.scorer_metadata == {"n": 4}
# ---------------------------------------------------------------------------
# Endpoint schemas
# ---------------------------------------------------------------------------
class TestEndpointSchemas:
def test_create_valid(self) -> None:
e = EndpointCreate(name="OpenAI", url="https://api.openai.com/v1")
assert e.api_key is None
def test_create_empty_name_rejected(self) -> None:
with pytest.raises(ValidationError):
EndpointCreate(name="", url="https://example.com")
def test_update_partial(self) -> None:
e = EndpointUpdate(url="https://new-url.com")
assert e.name is None
# ---------------------------------------------------------------------------
# Webhook schemas
# ---------------------------------------------------------------------------
class TestWebhookSchemas:
def test_create_valid(self) -> None:
w = WebhookCreate(
event_type="run.completed",
url="https://hooks.example.com/promptlooper",
headers={"Authorization": "Bearer xyz"},
)
assert w.is_active is True
def test_create_inactive(self) -> None:
w = WebhookCreate(
event_type="run.failed",
url="https://example.com",
is_active=False,
)
assert w.is_active is False
def test_update_partial(self) -> None:
w = WebhookUpdate(is_active=False)
assert w.event_type is None
assert w.is_active is False
def test_response_from_attributes(self) -> None:
class Fake:
id = UUID1
event_type = "run.completed"
url = "https://example.com"
headers = None
is_active = True
r = WebhookResponse.model_validate(Fake())
assert r.event_type == "run.completed"
# ---------------------------------------------------------------------------
# Auth schemas
# ---------------------------------------------------------------------------
class TestAuthSchemas:
def test_setup_password_min_length(self) -> None:
with pytest.raises(ValidationError):
SetupRequest(username="admin", password="short")
def test_setup_valid(self) -> None:
s = SetupRequest(username="admin", password="securepass123")
assert s.username == "admin"
def test_login_valid(self) -> None:
l = LoginRequest(username="user", password="pass")
assert l.username == "user"
def test_token_response(self) -> None:
t = TokenResponse(access_token="jwt.token.here")
assert t.token_type == "bearer"
def test_user_response_from_attributes(self) -> None:
class Fake:
id = UUID1
username = "admin"
is_admin = True
created_at = NOW
r = UserResponse.model_validate(Fake())
assert r.is_admin is True
# ---------------------------------------------------------------------------
# Export schemas
# ---------------------------------------------------------------------------
class TestExportSchemas:
def test_export_run_row(self) -> None:
row = ExportRunRow(
run_id=UUID1,
experiment_id=UUID2,
config_hash="abc",
config={"model": "gpt-4"},
status=RunStatus.completed,
duration_ms=500,
tokens_in=10,
tokens_out=20,
cost_estimate=0.001,
scores={"exact_match": 1.0, "bleu": 0.85},
)
assert row.scores["bleu"] == 0.85
def test_export_run_row_default_scores(self) -> None:
row = ExportRunRow(
run_id=UUID1,
experiment_id=UUID2,
config_hash="abc",
config={},
status=RunStatus.pending,
)
assert row.scores == {}
def test_export_response(self) -> None:
r = ExportResponse(
experiment_id=UUID1,
experiment_name="Test Exp",
rows=[],
)
assert r.rows == []
# ---------------------------------------------------------------------------
# Health schema
# ---------------------------------------------------------------------------
class TestHealthSchema:
def test_health_response(self) -> None:
h = HealthResponse(database=True, redis=False)
assert h.status == "ok"
assert h.database is True
assert h.redis is False

View file

@ -0,0 +1,138 @@
"""Stack integration verification tests.
These tests verify that all configuration files needed for 'docker compose up'
are present, consistent, and well-formed. They do NOT start actual containers.
"""
import os
from pathlib import Path
import pytest
ROOT = Path(__file__).resolve().parents[2] # repo root
class TestDockerComposeConfig:
"""Verify docker-compose.yml references are satisfied."""
def test_docker_compose_exists(self):
assert (ROOT / "docker-compose.yml").is_file()
def test_dockerfile_exists(self):
assert (ROOT / "docker" / "Dockerfile").is_file()
def test_nginx_conf_exists(self):
assert (ROOT / "docker" / "nginx.conf").is_file()
def test_entrypoint_exists(self):
assert (ROOT / "docker" / "entrypoint.sh").is_file()
def test_requirements_txt_exists(self):
assert (ROOT / "backend" / "requirements.txt").is_file()
def test_alembic_ini_exists(self):
assert (ROOT / "alembic.ini").is_file()
def test_alembic_env_exists(self):
assert (ROOT / "alembic" / "env.py").is_file()
def test_alembic_has_migration(self):
versions = list((ROOT / "alembic" / "versions").glob("*.py"))
assert len(versions) >= 1, "Expected at least one Alembic migration"
class TestDockerfileConsistency:
"""Verify Dockerfile references match actual files."""
def test_dockerfile_copies_backend(self):
content = (ROOT / "docker" / "Dockerfile").read_text()
assert "COPY backend/" in content
def test_dockerfile_copies_alembic(self):
content = (ROOT / "docker" / "Dockerfile").read_text()
assert "COPY alembic/" in content
assert "COPY alembic.ini" in content
def test_dockerfile_copies_entrypoint(self):
content = (ROOT / "docker" / "Dockerfile").read_text()
assert "entrypoint.sh" in content
def test_dockerfile_runs_migrations_via_entrypoint(self):
content = (ROOT / "docker" / "entrypoint.sh").read_text()
assert "alembic upgrade head" in content
class TestNginxConfig:
"""Verify nginx proxies correctly."""
def test_nginx_proxies_api(self):
content = (ROOT / "docker" / "nginx.conf").read_text()
assert "proxy_pass http://promptlooper-api:8000" in content
def test_nginx_proxies_websocket(self):
content = (ROOT / "docker" / "nginx.conf").read_text()
assert "upgrade" in content.lower()
def test_nginx_serves_spa_fallback(self):
content = (ROOT / "docker" / "nginx.conf").read_text()
assert "try_files" in content
assert "/index.html" in content
class TestFrontendBuildability:
"""Verify frontend has all files needed for a build."""
def test_package_json_exists(self):
assert (ROOT / "frontend" / "package.json").is_file()
def test_index_html_exists(self):
assert (ROOT / "frontend" / "index.html").is_file()
def test_main_tsx_exists(self):
assert (ROOT / "frontend" / "src" / "main.tsx").is_file()
def test_app_tsx_exists(self):
assert (ROOT / "frontend" / "src" / "App.tsx").is_file()
def test_all_page_components_exist(self):
pages = [
"SetupPage", "LoginPage", "DashboardPage", "ProjectsPage",
"ExperimentPage", "LivePage", "ComparePage", "AdminPage",
]
for page in pages:
assert (ROOT / "frontend" / "src" / "pages" / f"{page}.tsx").is_file(), f"Missing {page}.tsx"
def test_vite_config_exists(self):
assert (ROOT / "frontend" / "vite.config.ts").is_file()
def test_tailwind_config_exists(self):
assert (ROOT / "frontend" / "tailwind.config.js").is_file()
class TestWorkerConfig:
"""Verify Celery worker module exists and is importable."""
def test_worker_module_exists(self):
assert (ROOT / "backend" / "worker.py").is_file()
class TestHealthEndpoint:
"""Verify /health endpoint works in test mode."""
def test_health_returns_ok(self):
from fastapi.testclient import TestClient
# Ensure backend is importable
import sys
backend_dir = str(ROOT / "backend")
if backend_dir not in sys.path:
sys.path.insert(0, backend_dir)
from main import app
client = TestClient(app)
resp = client.get("/health")
assert resp.status_code == 200
data = resp.json()
assert data["status"] in ("ok", "degraded")
assert "database" in data
assert "redis" in data

View file

@ -0,0 +1,47 @@
"""Tests for backend/worker.py — Celery configuration."""
import importlib
import sys
from unittest.mock import patch
def test_celery_app_is_importable():
"""worker.py exports a celery_app instance."""
# Need to ensure config module is importable
backend_dir = str(__import__("pathlib").Path(__file__).resolve().parents[1])
if backend_dir not in sys.path:
sys.path.insert(0, backend_dir)
import worker
assert hasattr(worker, "celery_app")
assert worker.celery_app.main == "promptlooper"
def test_celery_app_serializer_settings():
"""Verify JSON serialization is configured."""
backend_dir = str(__import__("pathlib").Path(__file__).resolve().parents[1])
if backend_dir not in sys.path:
sys.path.insert(0, backend_dir)
import worker
assert worker.celery_app.conf.task_serializer == "json"
assert worker.celery_app.conf.result_serializer == "json"
def test_celery_defaults_to_memory_broker_without_redis():
"""Without REDIS_URL, broker falls back to memory://."""
backend_dir = str(__import__("pathlib").Path(__file__).resolve().parents[1])
if backend_dir not in sys.path:
sys.path.insert(0, backend_dir)
with patch.dict("os.environ", {"REDIS_URL": ""}, clear=False):
# Force reload to pick up env change
if "config" in sys.modules:
importlib.reload(sys.modules["config"])
if "worker" in sys.modules:
importlib.reload(sys.modules["worker"])
import worker
# In no-redis mode, broker should be memory://
# (may have been set from settings.redis_url == None)
assert worker.celery_app is not None

View file

30
backend/worker.py Normal file
View file

@ -0,0 +1,30 @@
"""PromptLooper Celery worker configuration."""
from celery import Celery
from config import settings
# Determine broker and backend URLs
broker_url = settings.redis_url or "memory://"
result_backend = settings.redis_url or "cache+memory://"
celery_app = Celery(
"promptlooper",
broker=broker_url,
backend=result_backend,
)
celery_app.conf.update(
task_serializer="json",
accept_content=["json"],
result_serializer="json",
timezone="UTC",
enable_utc=True,
worker_concurrency=settings.max_concurrent_runs,
task_track_started=True,
task_acks_late=True,
worker_prefetch_multiplier=1,
)
# Auto-discover tasks in engine package
celery_app.autodiscover_tasks(["engine"], force=True)

108
docker-compose.yml Normal file
View file

@ -0,0 +1,108 @@
name: xpltd_promptlooper
networks:
promptlooper:
driver: bridge
ipam:
config:
- subnet: 172.33.0.0/24
services:
promptlooper-db:
image: postgres:16-alpine
container_name: promptlooper-db
restart: unless-stopped
networks:
- promptlooper
ports:
- "5434:5432"
environment:
POSTGRES_USER: promptlooper
POSTGRES_PASSWORD: promptlooper
POSTGRES_DB: promptlooper
volumes:
- /vmPool/r/services/promptlooper_db:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -U promptlooper"]
interval: 10s
timeout: 5s
retries: 5
promptlooper-redis:
image: redis:7-alpine
container_name: promptlooper-redis
restart: unless-stopped
networks:
- promptlooper
volumes:
- /vmPool/r/services/promptlooper_redis:/data
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 10s
timeout: 5s
retries: 5
promptlooper-api:
build:
context: .
dockerfile: docker/Dockerfile
target: api
container_name: promptlooper-api
restart: unless-stopped
networks:
- promptlooper
ports:
- "8401:8401" # MCP server
environment:
DATABASE_URL: postgresql://promptlooper:promptlooper@promptlooper-db:5432/promptlooper
REDIS_URL: redis://promptlooper-redis:6379/0
JWT_SECRET: ${JWT_SECRET:-dev-secret-change-in-production}
API_KEY: ${API_KEY:-}
DEFAULT_ENDPOINT_URL: ${DEFAULT_ENDPOINT_URL:-}
DEFAULT_ENDPOINT_KEY: ${DEFAULT_ENDPOINT_KEY:-}
MAX_CONCURRENT_RUNS: ${MAX_CONCURRENT_RUNS:-4}
MAX_TOKENS_PER_SWEEP: ${MAX_TOKENS_PER_SWEEP:-0}
MCP_ENABLED: ${MCP_ENABLED:-true}
MCP_PORT: "8401"
depends_on:
promptlooper-db:
condition: service_healthy
promptlooper-redis:
condition: service_healthy
promptlooper-worker:
build:
context: .
dockerfile: docker/Dockerfile
target: api
container_name: promptlooper-worker
restart: unless-stopped
networks:
- promptlooper
command: celery -A worker:celery_app worker --loglevel=info --concurrency=${MAX_CONCURRENT_RUNS:-4}
working_dir: /app/backend
environment:
DATABASE_URL: postgresql://promptlooper:promptlooper@promptlooper-db:5432/promptlooper
REDIS_URL: redis://promptlooper-redis:6379/0
DEFAULT_ENDPOINT_URL: ${DEFAULT_ENDPOINT_URL:-}
DEFAULT_ENDPOINT_KEY: ${DEFAULT_ENDPOINT_KEY:-}
MAX_CONCURRENT_RUNS: ${MAX_CONCURRENT_RUNS:-4}
depends_on:
promptlooper-db:
condition: service_healthy
promptlooper-redis:
condition: service_healthy
promptlooper-web:
build:
context: .
dockerfile: docker/Dockerfile
target: web
container_name: promptlooper-web
restart: unless-stopped
networks:
- promptlooper
ports:
- "8400:80"
depends_on:
- promptlooper-api

0
docker/.gitkeep Normal file
View file

67
docker/Dockerfile Normal file
View file

@ -0,0 +1,67 @@
# =============================================================================
# Stage 1: Frontend build
# =============================================================================
FROM node:20-alpine AS frontend-build
WORKDIR /build
COPY frontend/package.json frontend/package-lock.json* ./
RUN npm ci || npm install
COPY frontend/ ./
RUN npm run build
# =============================================================================
# Stage 2: Python API runtime
# =============================================================================
FROM python:3.12-slim AS api
WORKDIR /app
# Install system dependencies for psycopg2 and general use
RUN apt-get update && \
apt-get install -y --no-install-recommends gcc libpq-dev curl && \
rm -rf /var/lib/apt/lists/*
# Install Python dependencies
COPY backend/requirements.txt /app/backend/requirements.txt
RUN pip install --no-cache-dir -r /app/backend/requirements.txt
# Copy backend source
COPY backend/ /app/backend/
COPY alembic/ /app/alembic/
COPY alembic.ini /app/alembic.ini
# Copy frontend build for single-container mode
COPY --from=frontend-build /build/dist /app/static
# Create data directory for SQLite mode
RUN mkdir -p /data
ENV PYTHONPATH=/app/backend
ENV DATA_DIR=/data
# Entrypoint runs migrations then starts the app
COPY docker/entrypoint.sh /app/entrypoint.sh
RUN chmod +x /app/entrypoint.sh
EXPOSE 8000 8401
# Default: run migrations then start the API server
ENTRYPOINT ["/app/entrypoint.sh"]
# =============================================================================
# Stage 3: Nginx frontend (production compose)
# =============================================================================
FROM nginx:1.27-alpine AS web
# Remove default config
RUN rm /etc/nginx/conf.d/default.conf
# Copy custom nginx config
COPY docker/nginx.conf /etc/nginx/conf.d/default.conf
# Copy built frontend assets
COPY --from=frontend-build /build/dist /usr/share/nginx/html
EXPOSE 80

10
docker/entrypoint.sh Normal file
View file

@ -0,0 +1,10 @@
#!/bin/sh
set -e
# Run database migrations
echo "Running database migrations..."
cd /app && alembic upgrade head
# Start the application
echo "Starting PromptLooper API..."
exec uvicorn main:app --host 0.0.0.0 --port 8000 --app-dir /app/backend "$@"

44
docker/nginx.conf Normal file
View file

@ -0,0 +1,44 @@
server {
listen 80;
server_name _;
root /usr/share/nginx/html;
index index.html;
# Frontend static assets
location / {
try_files $uri $uri/ /index.html;
}
# API proxy
location /api/ {
proxy_pass http://promptlooper-api:8000;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
# Health endpoint proxy
location /health {
proxy_pass http://promptlooper-api:8000;
proxy_set_header Host $host;
}
# WebSocket proxy
location /ws/ {
proxy_pass http://promptlooper-api:8000;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_read_timeout 86400;
}
# Gzip compression
gzip on;
gzip_types text/plain text/css application/json application/javascript text/xml application/xml text/javascript;
gzip_min_length 256;
}

23
env.example Normal file
View file

@ -0,0 +1,23 @@
# PromptLooper — Environment Configuration
# Copy to .env and fill in required values
# ── Database ──────────────────────────────────────────────
POSTGRES_USER=promptlooper
POSTGRES_PASSWORD= # REQUIRED: set a strong password
POSTGRES_DB=promptlooper
# ── Auth ──────────────────────────────────────────────────
JWT_SECRET= # REQUIRED: generate with `openssl rand -hex 32`
# ── Default LLM Endpoint (optional) ──────────────────────
# Pre-configure an LLM endpoint so users don't have to add one manually
DEFAULT_ENDPOINT_URL= # e.g. http://chat.forgetyour.name/api/v1
DEFAULT_ENDPOINT_KEY= # API key for the default endpoint
# ── Limits ────────────────────────────────────────────────
MAX_CONCURRENT_RUNS=4 # Parallel run limit per sweep
MAX_TOKENS_PER_SWEEP=0 # 0 = unlimited; set a number to cap token spend
# ── MCP Server ────────────────────────────────────────────
MCP_ENABLED=true # Enable/disable MCP server for agent access
# MCP_PORT=8401 # MCP server port (set in docker-compose)

12
frontend/index.html Normal file
View file

@ -0,0 +1,12 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>PromptLooper</title>
</head>
<body>
<div id="root"></div>
<script type="module" src="/src/main.tsx"></script>
</body>
</html>

3948
frontend/package-lock.json generated Normal file

File diff suppressed because it is too large Load diff

31
frontend/package.json Normal file
View file

@ -0,0 +1,31 @@
{
"name": "promptlooper-frontend",
"private": true,
"version": "0.1.0",
"type": "module",
"scripts": {
"dev": "vite",
"build": "tsc && vite build",
"preview": "vite preview",
"test": "vitest run"
},
"dependencies": {
"react": "^18.3.1",
"react-dom": "^18.3.1",
"react-router-dom": "^6.28.0"
},
"devDependencies": {
"@testing-library/jest-dom": "^6.9.1",
"@testing-library/react": "^16.3.2",
"@types/react": "^18.3.12",
"@types/react-dom": "^18.3.1",
"@vitejs/plugin-react": "^4.3.4",
"autoprefixer": "^10.4.20",
"jsdom": "^29.0.2",
"postcss": "^8.4.49",
"tailwindcss": "^3.4.15",
"typescript": "^5.6.3",
"vite": "^6.0.0",
"vitest": "^4.1.2"
}
}

View file

@ -0,0 +1,6 @@
export default {
plugins: {
tailwindcss: {},
autoprefixer: {},
},
};

59
frontend/src/App.test.tsx Normal file
View file

@ -0,0 +1,59 @@
import { render, screen } from "@testing-library/react";
import { MemoryRouter } from "react-router-dom";
import { describe, it, expect } from "vitest";
import App from "./App";
function renderWithRouter(route: string) {
return render(
<MemoryRouter initialEntries={[route]}>
<App />
</MemoryRouter>,
);
}
describe("App routing", () => {
it("renders SetupPage at /setup", () => {
renderWithRouter("/setup");
expect(screen.getByText("PromptLooper Setup")).toBeInTheDocument();
});
it("renders LoginPage at /login", () => {
renderWithRouter("/login");
expect(screen.getByText("Sign In")).toBeInTheDocument();
});
it("renders DashboardPage at /", () => {
renderWithRouter("/");
expect(screen.getByText("Dashboard")).toBeInTheDocument();
});
it("renders ProjectsPage at /projects", () => {
renderWithRouter("/projects");
expect(screen.getByText("Projects")).toBeInTheDocument();
});
it("renders ExperimentPage at /experiments/:id", () => {
renderWithRouter("/experiments/abc-123");
expect(screen.getByText("Experiment")).toBeInTheDocument();
});
it("renders LivePage at /live/:id", () => {
renderWithRouter("/live/abc-123");
expect(screen.getByText("Live")).toBeInTheDocument();
});
it("renders ComparePage at /compare", () => {
renderWithRouter("/compare");
expect(screen.getByText("Compare")).toBeInTheDocument();
});
it("renders AdminPage at /admin", () => {
renderWithRouter("/admin");
expect(screen.getByText("Admin")).toBeInTheDocument();
});
it("redirects unknown routes to dashboard", () => {
renderWithRouter("/nonexistent");
expect(screen.getByText("Dashboard")).toBeInTheDocument();
});
});

25
frontend/src/App.tsx Normal file
View file

@ -0,0 +1,25 @@
import { Routes, Route, Navigate } from "react-router-dom";
import SetupPage from "./pages/SetupPage";
import LoginPage from "./pages/LoginPage";
import DashboardPage from "./pages/DashboardPage";
import ProjectsPage from "./pages/ProjectsPage";
import ExperimentPage from "./pages/ExperimentPage";
import LivePage from "./pages/LivePage";
import ComparePage from "./pages/ComparePage";
import AdminPage from "./pages/AdminPage";
export default function App() {
return (
<Routes>
<Route path="/setup" element={<SetupPage />} />
<Route path="/login" element={<LoginPage />} />
<Route path="/" element={<DashboardPage />} />
<Route path="/projects" element={<ProjectsPage />} />
<Route path="/experiments/:id" element={<ExperimentPage />} />
<Route path="/live/:id" element={<LivePage />} />
<Route path="/compare" element={<ComparePage />} />
<Route path="/admin" element={<AdminPage />} />
<Route path="*" element={<Navigate to="/" replace />} />
</Routes>
);
}

View file

@ -0,0 +1,552 @@
import { describe, it, expect, beforeEach, afterEach, vi } from "vitest";
import {
setToken,
getToken,
clearToken,
ApiError,
auth,
projects,
experiments,
runs,
endpoints,
exportApi,
webhooks,
admin,
health,
connectWebSocket,
} from "./client";
// ---------------------------------------------------------------------------
// Mock fetch
// ---------------------------------------------------------------------------
const mockFetch = vi.fn();
beforeEach(() => {
mockFetch.mockReset();
vi.stubGlobal("fetch", mockFetch);
clearToken();
});
afterEach(() => {
vi.restoreAllMocks();
});
function jsonResponse(body: unknown, status = 200): Response {
return {
ok: status >= 200 && status < 300,
status,
statusText: status === 200 ? "OK" : "Error",
json: () => Promise.resolve(body),
text: () => Promise.resolve(JSON.stringify(body)),
headers: new Headers(),
} as unknown as Response;
}
function noContentResponse(): Response {
return {
ok: true,
status: 204,
statusText: "No Content",
json: () => Promise.reject(new Error("no body")),
text: () => Promise.resolve(""),
headers: new Headers(),
} as unknown as Response;
}
// ---------------------------------------------------------------------------
// Token management
// ---------------------------------------------------------------------------
describe("token management", () => {
it("starts with null token", () => {
expect(getToken()).toBeNull();
});
it("sets and gets token", () => {
setToken("abc123");
expect(getToken()).toBe("abc123");
});
it("clears token", () => {
setToken("abc123");
clearToken();
expect(getToken()).toBeNull();
});
});
// ---------------------------------------------------------------------------
// Auth header injection
// ---------------------------------------------------------------------------
describe("auth header injection", () => {
it("sends Authorization header when token is set", async () => {
setToken("my-jwt");
mockFetch.mockResolvedValueOnce(jsonResponse({ status: "ok" }));
await health.check();
const [, init] = mockFetch.mock.calls[0] as [string, RequestInit];
expect((init.headers as Record<string, string>)["Authorization"]).toBe(
"Bearer my-jwt",
);
});
it("omits Authorization header when no token", async () => {
mockFetch.mockResolvedValueOnce(jsonResponse({ status: "ok" }));
await health.check();
const [, init] = mockFetch.mock.calls[0] as [string, RequestInit];
expect(
(init.headers as Record<string, string>)["Authorization"],
).toBeUndefined();
});
});
// ---------------------------------------------------------------------------
// ApiError
// ---------------------------------------------------------------------------
describe("ApiError", () => {
it("throws ApiError on non-ok response", async () => {
mockFetch.mockResolvedValueOnce(
jsonResponse({ detail: "not found" }, 404),
);
await expect(projects.get("some-id")).rejects.toThrow(ApiError);
try {
mockFetch.mockResolvedValueOnce(
jsonResponse({ detail: "bad" }, 400),
);
await projects.get("some-id");
} catch (e) {
expect(e).toBeInstanceOf(ApiError);
expect((e as ApiError).status).toBe(400);
}
});
});
// ---------------------------------------------------------------------------
// Content-Type header
// ---------------------------------------------------------------------------
describe("content-type", () => {
it("sets Content-Type for POST with body", async () => {
mockFetch.mockResolvedValueOnce(
jsonResponse({ access_token: "tok", token_type: "bearer" }),
);
await auth.setup({ username: "admin", password: "password123" });
const [, init] = mockFetch.mock.calls[0] as [string, RequestInit];
expect((init.headers as Record<string, string>)["Content-Type"]).toBe(
"application/json",
);
});
it("omits Content-Type for GET requests", async () => {
mockFetch.mockResolvedValueOnce(jsonResponse({ items: [], total: 0 }));
await projects.list();
const [, init] = mockFetch.mock.calls[0] as [string, RequestInit];
expect(
(init.headers as Record<string, string>)["Content-Type"],
).toBeUndefined();
});
});
// ---------------------------------------------------------------------------
// Health
// ---------------------------------------------------------------------------
describe("health", () => {
it("calls /health", async () => {
mockFetch.mockResolvedValueOnce(
jsonResponse({ status: "ok", database: true, redis: true }),
);
const result = await health.check();
expect(mockFetch).toHaveBeenCalledWith("/health", expect.anything());
expect(result.status).toBe("ok");
});
});
// ---------------------------------------------------------------------------
// Auth endpoints
// ---------------------------------------------------------------------------
describe("auth", () => {
it("setup POSTs to /api/auth/setup", async () => {
mockFetch.mockResolvedValueOnce(
jsonResponse({ access_token: "tok", token_type: "bearer" }),
);
const result = await auth.setup({
username: "admin",
password: "password123",
});
expect(mockFetch).toHaveBeenCalledWith(
"/api/auth/setup",
expect.anything(),
);
expect(result.access_token).toBe("tok");
});
it("login sets token automatically", async () => {
mockFetch.mockResolvedValueOnce(
jsonResponse({ access_token: "jwt-123", token_type: "bearer" }),
);
await auth.login({ username: "admin", password: "pass" });
expect(getToken()).toBe("jwt-123");
});
it("me GETs /api/auth/me", async () => {
mockFetch.mockResolvedValueOnce(
jsonResponse({
id: "u1",
username: "admin",
is_admin: true,
created_at: "2026-01-01T00:00:00Z",
}),
);
const user = await auth.me();
expect(user.username).toBe("admin");
});
it("logout clears token", () => {
setToken("tok");
auth.logout();
expect(getToken()).toBeNull();
});
});
// ---------------------------------------------------------------------------
// Projects
// ---------------------------------------------------------------------------
describe("projects", () => {
it("list GETs /api/projects/", async () => {
mockFetch.mockResolvedValueOnce(jsonResponse({ items: [], total: 0 }));
await projects.list();
expect(mockFetch).toHaveBeenCalledWith(
"/api/projects/",
expect.anything(),
);
});
it("create POSTs to /api/projects/", async () => {
mockFetch.mockResolvedValueOnce(
jsonResponse({ id: "p1", name: "Test" }),
);
await projects.create({ name: "Test" });
const [, init] = mockFetch.mock.calls[0] as [string, RequestInit];
expect(init.method).toBe("POST");
expect(JSON.parse(init.body as string)).toEqual({ name: "Test" });
});
it("get fetches by id", async () => {
mockFetch.mockResolvedValueOnce(jsonResponse({ id: "p1" }));
await projects.get("p1");
expect(mockFetch).toHaveBeenCalledWith(
"/api/projects/p1",
expect.anything(),
);
});
it("update PUTs by id", async () => {
mockFetch.mockResolvedValueOnce(jsonResponse({ id: "p1" }));
await projects.update("p1", { name: "New" });
const [url, init] = mockFetch.mock.calls[0] as [string, RequestInit];
expect(url).toBe("/api/projects/p1");
expect(init.method).toBe("PUT");
});
it("delete DELETEs by id", async () => {
mockFetch.mockResolvedValueOnce(noContentResponse());
await projects.delete("p1");
const [url, init] = mockFetch.mock.calls[0] as [string, RequestInit];
expect(url).toBe("/api/projects/p1");
expect(init.method).toBe("DELETE");
});
});
// ---------------------------------------------------------------------------
// Experiments
// ---------------------------------------------------------------------------
describe("experiments", () => {
it("list GETs /api/experiments/", async () => {
mockFetch.mockResolvedValueOnce(jsonResponse({ items: [], total: 0 }));
await experiments.list();
expect(mockFetch).toHaveBeenCalledWith(
"/api/experiments/",
expect.anything(),
);
});
it("startSweep POSTs to sweep endpoint", async () => {
mockFetch.mockResolvedValueOnce(noContentResponse());
await experiments.startSweep("e1");
expect(mockFetch).toHaveBeenCalledWith(
"/api/experiments/e1/sweep",
expect.anything(),
);
});
it("pause POSTs to pause endpoint", async () => {
mockFetch.mockResolvedValueOnce(noContentResponse());
await experiments.pause("e1");
expect(mockFetch).toHaveBeenCalledWith(
"/api/experiments/e1/pause",
expect.anything(),
);
});
it("resume POSTs to resume endpoint", async () => {
mockFetch.mockResolvedValueOnce(noContentResponse());
await experiments.resume("e1");
expect(mockFetch).toHaveBeenCalledWith(
"/api/experiments/e1/resume",
expect.anything(),
);
});
it("stop POSTs to stop endpoint", async () => {
mockFetch.mockResolvedValueOnce(noContentResponse());
await experiments.stop("e1");
expect(mockFetch).toHaveBeenCalledWith(
"/api/experiments/e1/stop",
expect.anything(),
);
});
});
// ---------------------------------------------------------------------------
// Runs
// ---------------------------------------------------------------------------
describe("runs", () => {
it("list GETs runs for experiment", async () => {
mockFetch.mockResolvedValueOnce(jsonResponse({ items: [], total: 0 }));
await runs.list("e1");
expect(mockFetch).toHaveBeenCalledWith(
"/api/runs/experiments/e1/runs",
expect.anything(),
);
});
it("get fetches run detail", async () => {
mockFetch.mockResolvedValueOnce(
jsonResponse({ id: "r1", stage_results: [], scores: [] }),
);
await runs.get("r1");
expect(mockFetch).toHaveBeenCalledWith(
"/api/runs/r1",
expect.anything(),
);
});
it("score POSTs to run score endpoint", async () => {
mockFetch.mockResolvedValueOnce(jsonResponse({ id: "s1" }));
await runs.score("r1", { scorer_name: "human", value: 0.9 });
const [url, init] = mockFetch.mock.calls[0] as [string, RequestInit];
expect(url).toBe("/api/runs/r1/score");
expect(init.method).toBe("POST");
});
it("leaderboard GETs leaderboard", async () => {
mockFetch.mockResolvedValueOnce(jsonResponse({ items: [], total: 0 }));
await runs.leaderboard("e1");
expect(mockFetch).toHaveBeenCalledWith(
"/api/runs/experiments/e1/leaderboard",
expect.anything(),
);
});
});
// ---------------------------------------------------------------------------
// Endpoints
// ---------------------------------------------------------------------------
describe("endpoints", () => {
it("list GETs /api/endpoints/", async () => {
mockFetch.mockResolvedValueOnce(jsonResponse({ items: [], total: 0 }));
await endpoints.list();
expect(mockFetch).toHaveBeenCalledWith(
"/api/endpoints/",
expect.anything(),
);
});
it("test POSTs to test endpoint", async () => {
mockFetch.mockResolvedValueOnce(jsonResponse({ models: [] }));
await endpoints.test("ep1");
expect(mockFetch).toHaveBeenCalledWith(
"/api/endpoints/ep1/test",
expect.anything(),
);
});
});
// ---------------------------------------------------------------------------
// Export
// ---------------------------------------------------------------------------
describe("exportApi", () => {
it("best GETs best config", async () => {
mockFetch.mockResolvedValueOnce(jsonResponse({}));
await exportApi.best("e1");
expect(mockFetch).toHaveBeenCalledWith(
"/api/export/experiments/e1/best",
expect.anything(),
);
});
it("env GETs env export", async () => {
mockFetch.mockResolvedValueOnce(jsonResponse("KEY=val"));
await exportApi.env("e1");
expect(mockFetch).toHaveBeenCalledWith(
"/api/export/experiments/e1/env",
expect.anything(),
);
});
it("report GETs report", async () => {
mockFetch.mockResolvedValueOnce(jsonResponse("# Report"));
await exportApi.report("e1");
expect(mockFetch).toHaveBeenCalledWith(
"/api/export/experiments/e1/report",
expect.anything(),
);
});
});
// ---------------------------------------------------------------------------
// Webhooks
// ---------------------------------------------------------------------------
describe("webhooks", () => {
it("list GETs /api/webhooks/", async () => {
mockFetch.mockResolvedValueOnce(jsonResponse({ items: [], total: 0 }));
await webhooks.list();
expect(mockFetch).toHaveBeenCalledWith(
"/api/webhooks/",
expect.anything(),
);
});
it("create POSTs webhook", async () => {
mockFetch.mockResolvedValueOnce(jsonResponse({ id: "w1" }));
await webhooks.create({ event_type: "run.complete", url: "http://x" });
const [, init] = mockFetch.mock.calls[0] as [string, RequestInit];
expect(init.method).toBe("POST");
});
it("delete DELETEs webhook", async () => {
mockFetch.mockResolvedValueOnce(noContentResponse());
await webhooks.delete("w1");
const [url, init] = mockFetch.mock.calls[0] as [string, RequestInit];
expect(url).toBe("/api/webhooks/w1");
expect(init.method).toBe("DELETE");
});
});
// ---------------------------------------------------------------------------
// Admin
// ---------------------------------------------------------------------------
describe("admin", () => {
it("getSettings GETs /api/admin/settings", async () => {
mockFetch.mockResolvedValueOnce(jsonResponse({}));
await admin.getSettings();
expect(mockFetch).toHaveBeenCalledWith(
"/api/admin/settings",
expect.anything(),
);
});
it("updateSettings PUTs /api/admin/settings", async () => {
mockFetch.mockResolvedValueOnce(jsonResponse({}));
await admin.updateSettings({ guest_access: true });
const [, init] = mockFetch.mock.calls[0] as [string, RequestInit];
expect(init.method).toBe("PUT");
});
it("getStats GETs /api/admin/stats", async () => {
mockFetch.mockResolvedValueOnce(jsonResponse({}));
await admin.getStats();
expect(mockFetch).toHaveBeenCalledWith(
"/api/admin/stats",
expect.anything(),
);
});
});
// ---------------------------------------------------------------------------
// WebSocket helper
// ---------------------------------------------------------------------------
describe("connectWebSocket", () => {
it("creates WebSocket with correct URL and handles messages", () => {
const sendSpy = vi.fn();
const closeSpy = vi.fn();
let capturedInstance: {
onmessage: ((ev: { data: string }) => void) | null;
onclose: (() => void) | null;
readyState: number;
};
// Use a class constructor so `new WebSocket(...)` works
class MockWebSocket {
static OPEN = 1;
readyState = 1;
onmessage: ((ev: { data: string }) => void) | null = null;
onclose: (() => void) | null = null;
send = sendSpy;
close = closeSpy;
constructor(public url: string) {
capturedInstance = this;
}
}
vi.stubGlobal("WebSocket", MockWebSocket);
Object.defineProperty(window, "location", {
value: { protocol: "http:", host: "localhost:5173" },
writable: true,
configurable: true,
});
const onMessage = vi.fn();
const onClose = vi.fn();
const conn = connectWebSocket(onMessage, onClose);
expect(capturedInstance!.url).toBe("ws://localhost:5173/ws");
// Simulate incoming message
capturedInstance!.onmessage!({ data: JSON.stringify({ type: "update" }) });
expect(onMessage).toHaveBeenCalledWith({ type: "update" });
// Send message
conn.send({ type: "ping" });
expect(sendSpy).toHaveBeenCalledWith('{"type":"ping"}');
// Simulate close
capturedInstance!.onclose!();
expect(onClose).toHaveBeenCalled();
// Close from client
conn.close();
expect(closeSpy).toHaveBeenCalled();
vi.unstubAllGlobals();
});
});

545
frontend/src/api/client.ts Normal file
View file

@ -0,0 +1,545 @@
/**
* PromptLooper typed API client.
*
* - JWT token stored in memory (never localStorage) for security.
* - Automatic Authorization header injection.
* - Typed wrapper functions for every API endpoint group.
* - WebSocket connection helper for real-time updates.
*/
// ---------------------------------------------------------------------------
// Types — mirrors backend Pydantic schemas
// ---------------------------------------------------------------------------
export interface ProjectCreate {
name: string;
description?: string | null;
}
export interface ProjectUpdate {
name?: string | null;
description?: string | null;
}
export interface ProjectResponse {
id: string;
name: string;
description: string | null;
owner_id: string;
created_at: string;
updated_at: string;
}
export interface ProjectListResponse {
items: ProjectResponse[];
total: number;
}
export interface ExperimentCreate {
name: string;
description?: string | null;
sample_data?: Record<string, unknown> | null;
pipeline_stages?: Record<string, unknown> | null;
scoring_config?: Record<string, unknown> | null;
parameter_space?: Record<string, unknown> | null;
}
export interface ExperimentUpdate {
name?: string | null;
description?: string | null;
sample_data?: Record<string, unknown> | null;
pipeline_stages?: Record<string, unknown> | null;
scoring_config?: Record<string, unknown> | null;
parameter_space?: Record<string, unknown> | null;
status?: string | null;
}
export interface ExperimentResponse {
id: string;
project_id: string;
name: string;
description: string | null;
sample_data: Record<string, unknown> | null;
pipeline_stages: Record<string, unknown> | null;
scoring_config: Record<string, unknown> | null;
parameter_space: Record<string, unknown> | null;
status: string;
created_at: string;
updated_at: string;
}
export interface ExperimentListResponse {
items: ExperimentResponse[];
total: number;
}
export interface RunResponse {
id: string;
experiment_id: string;
config_hash: string;
config: Record<string, unknown>;
status: string;
started_at: string | null;
completed_at: string | null;
duration_ms: number | null;
tokens_in: number | null;
tokens_out: number | null;
cost_estimate: number | null;
}
export interface RunListResponse {
items: RunResponse[];
total: number;
}
export interface StageResultResponse {
id: string;
run_id: string;
stage_index: number;
prompt_sent: string;
response_raw: string;
model_used: string;
parameters: Record<string, unknown> | null;
tokens_in: number | null;
tokens_out: number | null;
latency_ms: number | null;
}
export interface ScoreResponse {
id: string;
run_id: string;
scorer_name: string;
value: number;
scorer_metadata: Record<string, unknown> | null;
created_at: string;
}
export interface RunDetailResponse extends RunResponse {
stage_results: StageResultResponse[];
scores: ScoreResponse[];
}
export interface ScoreInput {
scorer_name: string;
value: number;
metadata?: Record<string, unknown> | null;
}
export interface EndpointCreate {
name: string;
url: string;
api_key?: string | null;
default_model?: string | null;
}
export interface EndpointUpdate {
name?: string | null;
url?: string | null;
api_key?: string | null;
default_model?: string | null;
}
export interface EndpointResponse {
id: string;
name: string;
url: string;
default_model: string | null;
}
export interface EndpointListResponse {
items: EndpointResponse[];
total: number;
}
export interface WebhookCreate {
event_type: string;
url: string;
headers?: Record<string, string> | null;
is_active?: boolean;
}
export interface WebhookUpdate {
event_type?: string | null;
url?: string | null;
headers?: Record<string, string> | null;
is_active?: boolean | null;
}
export interface WebhookResponse {
id: string;
event_type: string;
url: string;
headers: Record<string, string> | null;
is_active: boolean;
}
export interface WebhookListResponse {
items: WebhookResponse[];
total: number;
}
export interface SetupRequest {
username: string;
password: string;
}
export interface LoginRequest {
username: string;
password: string;
}
export interface TokenResponse {
access_token: string;
token_type: string;
}
export interface UserResponse {
id: string;
username: string;
is_admin: boolean;
created_at: string;
}
export interface HealthResponse {
status: string;
database: boolean;
redis: boolean;
}
export interface ExportRunRow {
run_id: string;
experiment_id: string;
config_hash: string;
config: Record<string, unknown>;
status: string;
duration_ms: number | null;
tokens_in: number | null;
tokens_out: number | null;
cost_estimate: number | null;
scores: Record<string, number>;
}
export interface ExportResponse {
experiment_id: string;
experiment_name: string;
rows: ExportRunRow[];
}
// ---------------------------------------------------------------------------
// API Error
// ---------------------------------------------------------------------------
export class ApiError extends Error {
constructor(
public status: number,
public statusText: string,
public body: unknown,
) {
super(`API ${status}: ${statusText}`);
this.name = "ApiError";
}
}
// ---------------------------------------------------------------------------
// Token management (in-memory only)
// ---------------------------------------------------------------------------
let _accessToken: string | null = null;
export function setToken(token: string | null): void {
_accessToken = token;
}
export function getToken(): string | null {
return _accessToken;
}
export function clearToken(): void {
_accessToken = null;
}
// ---------------------------------------------------------------------------
// Base fetch wrapper
// ---------------------------------------------------------------------------
const BASE_URL = ""; // Uses Vite proxy in dev; same origin in prod
async function request<T>(
path: string,
options: RequestInit = {},
): Promise<T> {
const headers: Record<string, string> = {
...(options.headers as Record<string, string> | undefined),
};
// Inject auth header
if (_accessToken) {
headers["Authorization"] = `Bearer ${_accessToken}`;
}
// Default content-type for requests with bodies
if (options.body && !headers["Content-Type"]) {
headers["Content-Type"] = "application/json";
}
const response = await fetch(`${BASE_URL}${path}`, {
...options,
headers,
});
if (!response.ok) {
let body: unknown;
try {
body = await response.json();
} catch {
body = await response.text();
}
throw new ApiError(response.status, response.statusText, body);
}
// 204 No Content
if (response.status === 204) {
return undefined as T;
}
return response.json() as Promise<T>;
}
function get<T>(path: string): Promise<T> {
return request<T>(path, { method: "GET" });
}
function post<T>(path: string, body?: unknown): Promise<T> {
return request<T>(path, {
method: "POST",
body: body != null ? JSON.stringify(body) : undefined,
});
}
function put<T>(path: string, body?: unknown): Promise<T> {
return request<T>(path, {
method: "PUT",
body: body != null ? JSON.stringify(body) : undefined,
});
}
function del<T>(path: string): Promise<T> {
return request<T>(path, { method: "DELETE" });
}
// ---------------------------------------------------------------------------
// Health
// ---------------------------------------------------------------------------
export const health = {
check: () => get<HealthResponse>("/health"),
};
// ---------------------------------------------------------------------------
// Auth
// ---------------------------------------------------------------------------
export const auth = {
setup: (data: SetupRequest) =>
post<TokenResponse>("/api/auth/setup", data),
login: async (data: LoginRequest): Promise<TokenResponse> => {
const resp = await post<TokenResponse>("/api/auth/login", data);
setToken(resp.access_token);
return resp;
},
me: () => get<UserResponse>("/api/auth/me"),
logout: () => {
clearToken();
},
};
// ---------------------------------------------------------------------------
// Projects
// ---------------------------------------------------------------------------
export const projects = {
list: () => get<ProjectListResponse>("/api/projects/"),
create: (data: ProjectCreate) =>
post<ProjectResponse>("/api/projects/", data),
get: (id: string) => get<ProjectResponse>(`/api/projects/${id}`),
update: (id: string, data: ProjectUpdate) =>
put<ProjectResponse>(`/api/projects/${id}`, data),
delete: (id: string) => del<void>(`/api/projects/${id}`),
};
// ---------------------------------------------------------------------------
// Experiments
// ---------------------------------------------------------------------------
export const experiments = {
list: () => get<ExperimentListResponse>("/api/experiments/"),
create: (data: ExperimentCreate) =>
post<ExperimentResponse>("/api/experiments/", data),
get: (id: string) => get<ExperimentResponse>(`/api/experiments/${id}`),
update: (id: string, data: ExperimentUpdate) =>
put<ExperimentResponse>(`/api/experiments/${id}`, data),
delete: (id: string) => del<void>(`/api/experiments/${id}`),
startSweep: (id: string) =>
post<void>(`/api/experiments/${id}/sweep`),
pause: (id: string) =>
post<void>(`/api/experiments/${id}/pause`),
resume: (id: string) =>
post<void>(`/api/experiments/${id}/resume`),
stop: (id: string) =>
post<void>(`/api/experiments/${id}/stop`),
};
// ---------------------------------------------------------------------------
// Runs
// ---------------------------------------------------------------------------
export const runs = {
list: (experimentId: string) =>
get<RunListResponse>(`/api/runs/experiments/${experimentId}/runs`),
get: (runId: string) =>
get<RunDetailResponse>(`/api/runs/${runId}`),
create: (data: Record<string, unknown>) =>
post<RunResponse>("/api/runs/", data),
score: (runId: string, data: ScoreInput) =>
post<ScoreResponse>(`/api/runs/${runId}/score`, data),
leaderboard: (experimentId: string) =>
get<RunListResponse>(
`/api/runs/experiments/${experimentId}/leaderboard`,
),
};
// ---------------------------------------------------------------------------
// Endpoints (LLM targets)
// ---------------------------------------------------------------------------
export const endpoints = {
list: () => get<EndpointListResponse>("/api/endpoints/"),
create: (data: EndpointCreate) =>
post<EndpointResponse>("/api/endpoints/", data),
update: (id: string, data: EndpointUpdate) =>
put<EndpointResponse>(`/api/endpoints/${id}`, data),
delete: (id: string) => del<void>(`/api/endpoints/${id}`),
test: (id: string) =>
post<Record<string, unknown>>(`/api/endpoints/${id}/test`),
};
// ---------------------------------------------------------------------------
// Export
// ---------------------------------------------------------------------------
export const exportApi = {
best: (experimentId: string) =>
get<Record<string, unknown>>(
`/api/export/experiments/${experimentId}/best`,
),
env: (experimentId: string) =>
get<string>(`/api/export/experiments/${experimentId}/env`),
yaml: (experimentId: string) =>
get<string>(`/api/export/experiments/${experimentId}/yaml`),
report: (experimentId: string) =>
get<string>(`/api/export/experiments/${experimentId}/report`),
};
// ---------------------------------------------------------------------------
// Webhooks
// ---------------------------------------------------------------------------
export const webhooks = {
list: () => get<WebhookListResponse>("/api/webhooks/"),
create: (data: WebhookCreate) =>
post<WebhookResponse>("/api/webhooks/", data),
delete: (id: string) => del<void>(`/api/webhooks/${id}`),
};
// ---------------------------------------------------------------------------
// Admin
// ---------------------------------------------------------------------------
export const admin = {
getSettings: () =>
get<Record<string, unknown>>("/api/admin/settings"),
updateSettings: (data: Record<string, unknown>) =>
put<Record<string, unknown>>("/api/admin/settings", data),
getStats: () => get<Record<string, unknown>>("/api/admin/stats"),
};
// ---------------------------------------------------------------------------
// WebSocket helper
// ---------------------------------------------------------------------------
export type WsMessageHandler = (data: unknown) => void;
export interface WsConnection {
send: (data: unknown) => void;
close: () => void;
}
/**
* Connect to the real-time WebSocket endpoint.
*
* @param onMessage Called for each incoming message.
* @param onClose Optional callback when connection closes.
* @returns Object with `send()` and `close()` methods.
*/
export function connectWebSocket(
onMessage: WsMessageHandler,
onClose?: () => void,
): WsConnection {
const protocol = window.location.protocol === "https:" ? "wss:" : "ws:";
const wsUrl = `${protocol}//${window.location.host}/ws`;
const ws = new WebSocket(wsUrl);
ws.onmessage = (event) => {
try {
const data: unknown = JSON.parse(event.data as string);
onMessage(data);
} catch {
onMessage(event.data);
}
};
ws.onclose = () => {
onClose?.();
};
return {
send: (data: unknown) => {
if (ws.readyState === WebSocket.OPEN) {
ws.send(JSON.stringify(data));
}
},
close: () => {
ws.close();
},
};
}

View file

3
frontend/src/index.css Normal file
View file

@ -0,0 +1,3 @@
@tailwind base;
@tailwind components;
@tailwind utilities;

13
frontend/src/main.tsx Normal file
View file

@ -0,0 +1,13 @@
import React from "react";
import ReactDOM from "react-dom/client";
import { BrowserRouter } from "react-router-dom";
import App from "./App";
import "./index.css";
ReactDOM.createRoot(document.getElementById("root")!).render(
<React.StrictMode>
<BrowserRouter>
<App />
</BrowserRouter>
</React.StrictMode>,
);

View file

@ -0,0 +1,8 @@
export default function AdminPage() {
return (
<div className="p-8">
<h1 className="mb-4 text-2xl font-bold">Admin</h1>
<p className="text-gray-600">System administration and user management.</p>
</div>
);
}

View file

@ -0,0 +1,8 @@
export default function ComparePage() {
return (
<div className="p-8">
<h1 className="mb-4 text-2xl font-bold">Compare</h1>
<p className="text-gray-600">Compare results across runs and experiments.</p>
</div>
);
}

View file

@ -0,0 +1,8 @@
export default function DashboardPage() {
return (
<div className="p-8">
<h1 className="mb-4 text-2xl font-bold">Dashboard</h1>
<p className="text-gray-600">Overview of recent experiments and runs.</p>
</div>
);
}

View file

@ -0,0 +1,8 @@
export default function ExperimentPage() {
return (
<div className="p-8">
<h1 className="mb-4 text-2xl font-bold">Experiment</h1>
<p className="text-gray-600">Configure and run prompt experiments.</p>
</div>
);
}

View file

@ -0,0 +1,8 @@
export default function LivePage() {
return (
<div className="p-8">
<h1 className="mb-4 text-2xl font-bold">Live</h1>
<p className="text-gray-600">Real-time experiment progress and results.</p>
</div>
);
}

View file

@ -0,0 +1,10 @@
export default function LoginPage() {
return (
<div className="flex min-h-screen items-center justify-center bg-gray-50">
<div className="w-full max-w-md rounded-lg bg-white p-8 shadow">
<h1 className="mb-4 text-2xl font-bold">Sign In</h1>
<p className="text-gray-600">Log in to PromptLooper.</p>
</div>
</div>
);
}

View file

@ -0,0 +1,8 @@
export default function ProjectsPage() {
return (
<div className="p-8">
<h1 className="mb-4 text-2xl font-bold">Projects</h1>
<p className="text-gray-600">Manage your prompt tuning projects.</p>
</div>
);
}

View file

@ -0,0 +1,10 @@
export default function SetupPage() {
return (
<div className="flex min-h-screen items-center justify-center bg-gray-50">
<div className="w-full max-w-md rounded-lg bg-white p-8 shadow">
<h1 className="mb-4 text-2xl font-bold">PromptLooper Setup</h1>
<p className="text-gray-600">Create your admin account to get started.</p>
</div>
</div>
);
}

View file

@ -0,0 +1 @@
import "@testing-library/jest-dom/vitest";

1
frontend/src/vite-env.d.ts vendored Normal file
View file

@ -0,0 +1 @@
/// <reference types="vite/client" />

View file

@ -0,0 +1,8 @@
/** @type {import('tailwindcss').Config} */
export default {
content: ["./index.html", "./src/**/*.{js,ts,jsx,tsx}"],
theme: {
extend: {},
},
plugins: [],
};

21
frontend/tsconfig.json Normal file
View file

@ -0,0 +1,21 @@
{
"compilerOptions": {
"target": "ES2020",
"useDefineForClassFields": true,
"lib": ["ES2020", "DOM", "DOM.Iterable"],
"module": "ESNext",
"skipLibCheck": true,
"moduleResolution": "bundler",
"allowImportingTsExtensions": true,
"isolatedModules": true,
"moduleDetection": "force",
"noEmit": true,
"jsx": "react-jsx",
"strict": true,
"noUnusedLocals": true,
"noUnusedParameters": true,
"noFallthroughCasesInSwitch": true,
"forceConsistentCasingInFileNames": true
},
"include": ["src"]
}

25
frontend/vite.config.ts Normal file
View file

@ -0,0 +1,25 @@
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";
export default defineConfig({
plugins: [react()],
build: {
outDir: "dist",
},
server: {
port: 5173,
proxy: {
"/api": "http://localhost:8000",
"/ws": {
target: "ws://localhost:8000",
ws: true,
},
"/health": "http://localhost:8000",
},
},
test: {
environment: "jsdom",
globals: true,
setupFiles: ["./src/test-setup.ts"],
},
});

635
promptlooper-spec.md Normal file
View file

@ -0,0 +1,635 @@
# PromptLooper
> The one who loops prompts — a universal LLM pipeline tuning workbench.
PromptLooper is a self-hosted tool for systematically optimizing LLM prompts, model selection, and inference parameters. It runs experiments across prompt × model × parameter combinations, caches every response, scores results against pluggable evaluation functions, and surfaces the best configurations through a real-time observability dashboard with human-in-the-loop steering.
It ships as a single Docker container (SQLite mode) for zero-config quickstart, or a Docker Compose stack (Postgres + Redis) for production use. An MCP server enables any AI agent to drive PromptLooper programmatically — creating experiments, running sweeps, and reading results without human intervention.
---
## Problem Statement
Anyone building LLM-powered applications faces the same painful loop:
1. Write a system prompt
2. Pick a model and parameters (temperature, top_p, max_tokens, etc.)
3. Run it against sample data
4. Read the output and decide if it's "good enough"
5. Tweak something and repeat
This process is manual, unscientific, and wasteful. There's no way to:
- Systematically compare configurations side-by-side
- Know if you've already tested a particular combination
- Quantify "better" beyond gut feeling
- Let an agent handle the iteration while you steer from above
- Share optimized configurations between projects or team members
PromptLooper makes this process systematic, observable, cached, and agent-drivable.
---
## Target Users
| User | Use Case |
|------|----------|
| **Solo developer** | Tuning prompts for a side project, wants to try 5 models and find the sweet spot |
| **Team building RAG pipelines** | Optimizing chunking + embedding + retrieval + synthesis prompts across stages |
| **AI agent (via MCP)** | Autonomously running optimization sweeps, reporting back to human when done |
| **Prompt engineer** | A/B testing prompt variants at scale with quantified scoring |
| **Infrastructure team** | Benchmarking new models against existing baselines before migration |
---
## Core Concepts
### Experiment
A named configuration that defines:
- **Sample data**: Input documents, queries, or any text the pipeline will process
- **Pipeline stages**: 1-N sequential stages, each with its own prompt template and model config
- **Evaluation criteria**: Scoring functions that grade the output
- **Parameter space**: What to vary (prompt text, model, temperature, top_p, chunk_size, etc.)
### Run
A single execution of one specific configuration within an experiment. A run captures:
- Full input configuration (prompt, model, all parameters)
- Raw LLM response(s)
- Timing data (latency, tokens in/out)
- Evaluation scores
- Configuration hash (for cache deduplication)
### Sweep
A batch of runs that systematically explores a parameter space. Types:
- **Grid sweep**: Every combination of specified parameter values
- **Random sweep**: Random sampling from parameter ranges
- **Guided sweep**: Agent-driven, where results from previous runs inform the next configuration to try
### Scoring Function
A pluggable evaluation that takes (input, output, context) and returns a numeric score. Built-in options:
- **Embedding similarity**: How semantically close is the output to a reference answer?
- **Length compliance**: Does the output meet length constraints?
- **Format compliance**: Does the output match expected structure (JSON, markdown, etc.)?
- **Keyword presence**: Do required terms appear in the output?
- **Human rating**: Manual thumbs-up/down or 1-5 star rating from the dashboard
- **LLM-as-judge**: Use a separate LLM call to evaluate quality (configurable judge prompt)
- **Custom function**: User-provided Python snippet or HTTP webhook
### Project
A workspace that groups related experiments. Users can return to a project and pick up where they left off. Projects store:
- All experiments and their runs
- Saved "best" configurations
- Notes and annotations
- Export history
---
## Architecture
```
┌──────────────────────────────────────────────────────────────────────────┐
│ Docker Compose: xpltd_promptlooper (ub01) │
│ Network: promptlooper (172.33.0.0/24) │
│ │
│ ┌────────────┐ ┌─────────────┐ ┌──────────────────────────────────┐ │
│ │ PostgreSQL │ │ Redis │ │ FastAPI (API) │ │
│ │ :5434 │ │ job queue │ │ Experiments, Runs, Scoring, │ │
│ │ experiments│ │ pub/sub │ │ Projects, Auth, MCP Server │ │
│ │ runs, cache│ │ live state │ │ WebSocket for live dashboard │ │
│ └─────┬───────┘ └──────┬──────┘ └──────────────┬───────────────────┘ │
│ │ │ │ │
│ ┌─────┴─────────────────┴────────────────────────┴───────────────────┐ │
│ │ Celery Worker │ │
│ │ Executes runs against target LLM endpoints │ │
│ │ Caches responses by config hash │ │
│ │ Streams progress via Redis pub/sub │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ Web UI (React + Vite) │ │
│ │ nginx → :8400 │ │
│ │ Dashboard, Experiment Builder, Live Observability, Steering │ │
│ └────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────┘
│ HTTP (OpenAI-compatible)
┌───────────────────────────────┐
│ Target LLM Endpoints │
│ OpenWebUI, vLLM, Ollama, │
│ OpenAI, Anthropic, any │
│ OpenAI-compatible API │
└───────────────────────────────┘
```
### Services (Production Compose)
| Service | Image | Port | Purpose |
|---------|-------|------|---------|
| `promptlooper-db` | `postgres:16-alpine` | `5434 → 5432` | Primary data store |
| `promptlooper-redis` | `redis:7-alpine` | — | Celery broker + pub/sub for live dashboard |
| `promptlooper-api` | `Dockerfile` | `8000` | FastAPI REST API + MCP server |
| `promptlooper-worker` | `Dockerfile` | — | Celery worker (run execution) |
| `promptlooper-web` | `Dockerfile` | `8400 → 80` | React frontend (nginx) |
### Single Container Mode
When `DATABASE_URL` is not set, PromptLooper runs with:
- SQLite at `/data/promptlooper.db`
- In-process task queue (no Celery/Redis dependency)
- All services in one container on port 8400
```bash
docker run -p 8400:8400 -v promptlooper-data:/data ghcr.io/xpltdco/promptlooper
```
---
## Data Model
### User
| Field | Type | Notes |
|-------|------|-------|
| id | UUID | PK |
| username | string | Unique, "admin" created on first boot |
| password_hash | string | bcrypt |
| is_admin | bool | Default true for first user |
| created_at | timestamp | |
### Project
| Field | Type | Notes |
|-------|------|-------|
| id | UUID | PK |
| name | string | |
| description | text | Optional |
| owner_id | UUID | FK → User |
| created_at | timestamp | |
| updated_at | timestamp | |
### Experiment
| Field | Type | Notes |
|-------|------|-------|
| id | UUID | PK |
| project_id | UUID | FK → Project |
| name | string | |
| description | text | Optional |
| sample_data | JSONB | Input documents/queries |
| pipeline_stages | JSONB | Stage definitions with prompt templates |
| scoring_config | JSONB | Which scoring functions to use and their weights |
| parameter_space | JSONB | What to vary and ranges/options |
| status | enum | draft, running, paused, completed |
| created_at | timestamp | |
| updated_at | timestamp | |
### Run
| Field | Type | Notes |
|-------|------|-------|
| id | UUID | PK |
| experiment_id | UUID | FK → Experiment |
| config_hash | string(64) | SHA-256 of full configuration (for cache dedup) |
| config | JSONB | Complete configuration snapshot |
| status | enum | pending, running, completed, failed, cached |
| started_at | timestamp | |
| completed_at | timestamp | |
| duration_ms | int | Wall clock time |
| tokens_in | int | Total input tokens across all stages |
| tokens_out | int | Total output tokens |
| cost_estimate | decimal | Estimated cost based on model pricing |
### StageResult
| Field | Type | Notes |
|-------|------|-------|
| id | UUID | PK |
| run_id | UUID | FK → Run |
| stage_index | int | 0-based stage number |
| prompt_sent | text | Actual prompt after template rendering |
| response_raw | text | Raw LLM response |
| model_used | string | Model identifier |
| parameters | JSONB | Temperature, top_p, etc. |
| tokens_in | int | This stage |
| tokens_out | int | This stage |
| latency_ms | int | This stage |
### Score
| Field | Type | Notes |
|-------|------|-------|
| id | UUID | PK |
| run_id | UUID | FK → Run |
| scorer_name | string | e.g. "embedding_similarity", "human_rating" |
| value | float | Normalized 0.01.0 |
| metadata | JSONB | Scorer-specific details |
| created_at | timestamp | |
### ResponseCache
| Field | Type | Notes |
|-------|------|-------|
| config_hash | string(64) | PK — SHA-256 of (prompt + model + params + input) |
| response | text | Cached LLM response |
| model | string | |
| tokens_in | int | |
| tokens_out | int | |
| latency_ms | int | Original latency |
| created_at | timestamp | |
### WebhookConfig
| Field | Type | Notes |
|-------|------|-------|
| id | UUID | PK |
| event_type | string | experiment.complete, new_best_found, budget.exhausted, human_needed |
| url | string | Target URL |
| headers | JSONB | Optional auth headers |
| is_active | bool | |
---
## API Endpoints
### Auth
| Method | Path | Description |
|--------|------|-------------|
| POST | `/api/v1/auth/setup` | First-boot admin password setup |
| POST | `/api/v1/auth/login` | Login, returns JWT |
| GET | `/api/v1/auth/me` | Current user info |
### Admin
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/admin/settings` | System settings (guest access, default model, etc.) |
| PUT | `/api/v1/admin/settings` | Update settings |
| GET | `/api/v1/admin/stats` | System-wide stats (total runs, cache hit rate, etc.) |
### Projects
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/projects` | List projects |
| POST | `/api/v1/projects` | Create project |
| GET | `/api/v1/projects/{id}` | Project detail with experiment summaries |
| PUT | `/api/v1/projects/{id}` | Update project |
| DELETE | `/api/v1/projects/{id}` | Delete project and all experiments |
### Experiments
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/experiments` | List experiments (filter by project) |
| POST | `/api/v1/experiments` | Create experiment |
| GET | `/api/v1/experiments/{id}` | Experiment detail with run summaries |
| PUT | `/api/v1/experiments/{id}` | Update experiment config |
| DELETE | `/api/v1/experiments/{id}` | Delete experiment |
| POST | `/api/v1/experiments/{id}/sweep` | Start a sweep (grid, random, or guided) |
| POST | `/api/v1/experiments/{id}/pause` | Pause running sweep |
| POST | `/api/v1/experiments/{id}/resume` | Resume paused sweep |
| POST | `/api/v1/experiments/{id}/stop` | Stop sweep |
### Runs
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/experiments/{id}/runs` | List runs with scores (sortable, filterable) |
| GET | `/api/v1/runs/{id}` | Run detail with stage results |
| POST | `/api/v1/runs` | Execute a single run (ad-hoc) |
| POST | `/api/v1/runs/{id}/score` | Add human rating to a run |
| GET | `/api/v1/experiments/{id}/leaderboard` | Top runs ranked by weighted score |
### Export
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/experiments/{id}/export/best` | Best config as JSON |
| GET | `/api/v1/experiments/{id}/export/env` | Best config as .env snippet |
| GET | `/api/v1/experiments/{id}/export/yaml` | Best config as YAML |
| GET | `/api/v1/experiments/{id}/export/report` | Full experiment report (markdown) |
### LLM Endpoints (Target Management)
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/endpoints` | List configured LLM endpoints |
| POST | `/api/v1/endpoints` | Add endpoint (URL, API key, label) |
| PUT | `/api/v1/endpoints/{id}` | Update endpoint |
| DELETE | `/api/v1/endpoints/{id}` | Remove endpoint |
| POST | `/api/v1/endpoints/{id}/test` | Test connectivity and list available models |
### Webhooks
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/webhooks` | List webhook configs |
| POST | `/api/v1/webhooks` | Create webhook |
| DELETE | `/api/v1/webhooks/{id}` | Remove webhook |
### WebSocket
| Path | Description |
|------|-------------|
| `/ws/experiments/{id}` | Live stream: run progress, scores, stage completions |
| `/ws/dashboard` | Global activity feed across all experiments |
### Health
| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Health check (DB + Redis connectivity) |
---
## MCP Server
PromptLooper exposes an MCP (Model Context Protocol) server so AI agents can drive it programmatically. The MCP server runs as part of the API service.
### MCP Tools
| Tool | Description |
|------|-------------|
| `create_project` | Create a new project workspace |
| `create_experiment` | Define an experiment with sample data, stages, and scoring |
| `configure_endpoint` | Add or update an LLM target endpoint |
| `run_single` | Execute one specific configuration and return results |
| `run_sweep` | Start a parameter sweep (grid/random/guided) |
| `get_leaderboard` | Get top N configurations ranked by score |
| `get_run_detail` | Get full details of a specific run |
| `export_best_config` | Export the best configuration in JSON/YAML/env format |
| `pause_sweep` | Pause a running sweep |
| `resume_sweep` | Resume a paused sweep |
| `add_human_score` | Rate a run's output |
| `get_experiment_status` | Check experiment progress |
| `list_models` | List available models across all configured endpoints |
### Example Agent Interaction
```
Agent: "Create a project called 'Chrysopedia Extraction' and an experiment
that tests the stage3_extraction prompt against Qwen-72B and Qwen-32B,
sweeping temperature from 0.1 to 0.9 in 0.2 increments.
Use embedding similarity scoring against these reference outputs.
Run a grid sweep."
PromptLooper MCP: [create_project] → [create_experiment] → [run_sweep]
→ streams progress → [get_leaderboard]
Agent: "The top config uses Qwen-72B at temperature 0.3. Export it as
a .env snippet I can drop into Chrysopedia."
PromptLooper MCP: [export_best_config format=env]
```
---
## Response Caching
Every LLM call is cached by a SHA-256 hash of:
- Prompt text (after template rendering)
- Model identifier
- All inference parameters (temperature, top_p, max_tokens, etc.)
- Input data
If an identical configuration has been run before, the cached response is returned instantly with `status: cached`. This means:
- Re-running experiments with new scoring functions costs zero tokens
- Adding a new scorer retroactively evaluates all historical runs
- Accidentally re-running a sweep wastes nothing
- Cache can be invalidated per-run or per-experiment if needed
---
## Authentication Model
### First Boot
- App detects no users exist
- Presents a setup screen: create admin username + password
- Admin account is created, user is logged in
### Guest Access
- Admin can toggle `allow_guest_access` in settings
- Guests can view experiments and results (read-only)
- Guests cannot create experiments, run sweeps, or modify configs
- Default: guest access disabled
### API Authentication
- JWT tokens for the web UI
- API key (generated in admin settings) for programmatic access and MCP
- API key passed via `Authorization: Bearer <key>` header
---
## Real-Time Observability Dashboard
The dashboard is the primary user interface during active experimentation. It provides:
### Live Experiment View
- Progress bar: X of Y runs completed
- Token usage accumulator (running total)
- Cost estimate (based on configured model pricing)
- Cache hit rate for current sweep
- Estimated time remaining
### Side-by-Side Output Comparison
- Pick any two runs and diff their outputs
- Highlight differences in prompt, parameters, and response
- Score comparison overlay
### Leaderboard
- Real-time ranked list of runs by weighted score
- Sortable by any individual scorer
- Click to expand full run detail
### Steering Controls
- **Pause**: Stop the sweep after current run completes
- **Fork**: Create a new experiment branching from current best, with modified parameters
- **Redirect**: Change remaining sweep parameters mid-flight
- **Approve**: Mark a configuration as "good enough" and export
- **Reject**: Exclude a run from leaderboard consideration
### Activity Timeline
- Chronological feed of events: run started, run completed, new best found, cache hit, error
- Filterable by event type
---
## Webhook Events
| Event | Payload | Trigger |
|-------|---------|---------|
| `experiment.started` | experiment_id, sweep config | Sweep begins |
| `experiment.completed` | experiment_id, best config, summary stats | All runs finished |
| `experiment.paused` | experiment_id, reason | Manual or budget pause |
| `new_best_found` | experiment_id, run_id, scores, config | New top-scoring run |
| `budget.exhausted` | experiment_id, token_count, cost | Token/cost budget hit |
| `human_needed` | experiment_id, reason, context | Agent requests human review |
| `run.failed` | run_id, error | Individual run error |
---
## Configuration Export Formats
### JSON
```json
{
"model": "qwen2.5-72b-instruct",
"endpoint": "http://chat.forgetyour.name/api",
"temperature": 0.3,
"top_p": 0.85,
"max_tokens": 2048,
"system_prompt": "You are a music production knowledge extractor...",
"score": 0.87,
"experiment": "chrysopedia-extraction-v2",
"exported_at": "2026-04-06T12:00:00Z"
}
```
### .env
```bash
LLM_MODEL=qwen2.5-72b-instruct
LLM_API_URL=http://chat.forgetyour.name/api
LLM_TEMPERATURE=0.3
LLM_TOP_P=0.85
LLM_MAX_TOKENS=2048
# Score: 0.87 | Experiment: chrysopedia-extraction-v2
```
### YAML
```yaml
model: qwen2.5-72b-instruct
endpoint: http://chat.forgetyour.name/api
parameters:
temperature: 0.3
top_p: 0.85
max_tokens: 2048
system_prompt: |
You are a music production knowledge extractor...
metadata:
score: 0.87
experiment: chrysopedia-extraction-v2
exported_at: 2026-04-06T12:00:00Z
```
---
## Environment Variables
| Group | Variable | Default | Notes |
|-------|----------|---------|-------|
| **Database** | `DATABASE_URL` | (none → SQLite) | PostgreSQL connection string |
| **Redis** | `REDIS_URL` | (none → in-process) | Redis connection string |
| **Server** | `HOST` | `0.0.0.0` | Bind address |
| **Server** | `PORT` | `8400` | HTTP port |
| **Auth** | `JWT_SECRET` | (auto-generated) | JWT signing key |
| **Auth** | `API_KEY` | (none) | Static API key for programmatic access |
| **Defaults** | `DEFAULT_ENDPOINT_URL` | (none) | Pre-configured LLM endpoint |
| **Defaults** | `DEFAULT_ENDPOINT_KEY` | (none) | API key for default endpoint |
| **Limits** | `MAX_CONCURRENT_RUNS` | `4` | Parallel run limit |
| **Limits** | `MAX_TOKENS_PER_SWEEP` | `0` (unlimited) | Token budget per sweep |
| **Storage** | `DATA_DIR` | `/data` | SQLite DB + file storage location |
| **MCP** | `MCP_ENABLED` | `true` | Enable MCP server |
| **MCP** | `MCP_PORT` | `8401` | MCP server port |
---
## Docker Compose (Production — XPLTD Conventions)
Project name: `xpltd_promptlooper`
Network: `promptlooper` (`172.33.0.0/24`)
Persistent data: `/vmPool/r/services/promptlooper_*`
PostgreSQL port: `5434` (external)
Web UI port: `8400` (external)
---
## Technology Stack
| Layer | Technology | Rationale |
|-------|-----------|-----------|
| **API** | Python 3.12 + FastAPI | Async, OpenAPI auto-gen, matches XPLTD conventions |
| **Task Queue** | Celery + Redis | Proven for background job execution, matches Chrysopedia |
| **Database** | PostgreSQL 16 (prod) / SQLite (single-container) | JSONB for flexible experiment configs |
| **Real-time** | WebSocket via FastAPI + Redis pub/sub | Sub-second dashboard updates |
| **Frontend** | React 18 + TypeScript + Vite | Real-time dashboard, matches Chrysopedia |
| **Styling** | Tailwind CSS | Fast iteration, utility-first |
| **MCP** | Python MCP SDK | Standard protocol for agent integration |
| **Container** | Multi-stage Docker build | Single image serves both API and frontend |
---
## Development & Deployment
### Local Development
```bash
git clone git@git.xpltd.co:xpltdco/promptlooper.git
cd promptlooper
cp .env.example .env
docker compose up -d promptlooper-db promptlooper-redis
cd backend && pip install -r requirements.txt
alembic upgrade head
uvicorn main:app --reload --host 0.0.0.0 --port 8000
# In another terminal:
cd frontend && npm install && npm run dev
```
### Production Deployment (ub01)
```bash
ssh ub01
cd /vmPool/r/repos/xpltdco/promptlooper
git pull && docker compose build && docker compose up -d
```
### Project Structure
```
promptlooper/
├── backend/
│ ├── main.py # FastAPI entry point
│ ├── config.py # Pydantic Settings
│ ├── models.py # SQLAlchemy ORM
│ ├── schemas.py # Pydantic request/response
│ ├── auth.py # JWT + API key auth
│ ├── worker.py # Celery app config
│ ├── routers/
│ │ ├── auth.py
│ │ ├── projects.py
│ │ ├── experiments.py
│ │ ├── runs.py
│ │ ├── endpoints.py
│ │ ├── export.py
│ │ ├── webhooks.py
│ │ └── admin.py
│ ├── engine/
│ │ ├── runner.py # Run execution logic
│ │ ├── sweep.py # Sweep orchestration
│ │ ├── cache.py # Response cache layer
│ │ ├── adapters/ # LLM endpoint adapters
│ │ │ ├── openai_compat.py
│ │ │ └── base.py
│ │ └── scorers/ # Pluggable scoring functions
│ │ ├── embedding.py
│ │ ├── format.py
│ │ ├── keyword.py
│ │ ├── llm_judge.py
│ │ └── base.py
│ ├── mcp/
│ │ ├── server.py # MCP server implementation
│ │ └── tools.py # MCP tool definitions
│ ├── websocket/
│ │ └── manager.py # WebSocket connection management
│ └── tests/
├── frontend/
│ └── src/
│ ├── pages/
│ │ ├── Setup.tsx # First-boot admin setup
│ │ ├── Login.tsx
│ │ ├── Dashboard.tsx # Global activity
│ │ ├── Projects.tsx
│ │ ├── Experiment.tsx # Experiment builder + config
│ │ ├── Live.tsx # Real-time observability
│ │ ├── Compare.tsx # Side-by-side run comparison
│ │ └── Admin.tsx # System settings
│ ├── components/
│ │ ├── Leaderboard.tsx
│ │ ├── SteeringControls.tsx
│ │ ├── RunCard.tsx
│ │ ├── ScoreChart.tsx
│ │ └── Timeline.tsx
│ └── api/
├── docker/
│ ├── Dockerfile # Multi-stage: API + frontend
│ └── nginx.conf
├── alembic/
├── docker-compose.yml
├── .env.example
├── CLAUDE.md
└── README.md
```