MAESTRO: Initialize repository with README, .gitignore, and project files

Add README.md with project description, quick-start instructions, and
AGPL-3.0 license badge. Add .gitignore for Python, Node, and Docker
artifacts. Include existing CLAUDE.md, spec, docker-compose.yml, and
env.example.
This commit is contained in:
John Lightner 2026-04-07 01:39:18 -05:00
commit fc2e4cd7d1
6 changed files with 1013 additions and 0 deletions

57
.gitignore vendored Normal file
View file

@ -0,0 +1,57 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.egg-info/
*.egg
dist/
build/
.eggs/
*.whl
.venv/
venv/
env/
.env
*.pyc
.pytest_cache/
.mypy_cache/
.ruff_cache/
htmlcov/
.coverage
.coverage.*
# Node / Frontend
node_modules/
frontend/dist/
frontend/build/
.npm
*.tsbuildinfo
# Docker
docker/nginx.conf.bak
# IDE
.vscode/
.idea/
*.swp
*.swo
*~
.DS_Store
# OS
Thumbs.db
Desktop.ini
# Data (single-container mode)
*.db
/data/
# Alembic
alembic/versions/__pycache__/
# Auto Run Docs (Maestro working files)
Auto Run Docs/Working/
# Misc
*.log
*.bak

127
CLAUDE.md Normal file
View file

@ -0,0 +1,127 @@
# CLAUDE.md — PromptLooper
## What is this project?
PromptLooper is a self-hosted LLM pipeline tuning workbench. It runs experiments across prompt × model × parameter combinations, caches every response, scores results, and surfaces optimal configurations through a real-time dashboard. It has an MCP server so AI agents can drive it programmatically.
## Repository
- **Hosted at**: git.xpltd.co/xpltdco/promptlooper
- **XPLTD project name**: `xpltd_promptlooper`
- **Sister project**: Chrysopedia (git.xpltd.co/xpltdco/chrysopedia) — a knowledge extraction pipeline that is PromptLooper's first integration target
## Tech Stack
- **Backend**: Python 3.12, FastAPI, Celery, SQLAlchemy, Alembic
- **Frontend**: React 18, TypeScript, Vite, Tailwind CSS
- **Database**: PostgreSQL 16 (production) / SQLite (single-container mode)
- **Cache/Queue**: Redis 7 (production) / in-process (single-container)
- **Real-time**: WebSocket via FastAPI + Redis pub/sub
- **MCP**: Python MCP SDK
- **Container**: Multi-stage Docker build, nginx for frontend
## XPLTD Conventions
These are non-negotiable project conventions shared across all XPLTD projects:
- Docker Compose project name: `xpltd_promptlooper`
- Dedicated bridge network: `promptlooper` (`172.33.0.0/24`)
- Persistent data bind mounts under `/vmPool/r/services/promptlooper_*`
- PostgreSQL on external port `5434` (internal `5432`)
- Web UI on port `8400`
- MCP server on port `8401`
- Container naming: `promptlooper-{service}` (e.g., `promptlooper-api`, `promptlooper-db`)
## Key Architecture Decisions
1. **No LLM runs inside PromptLooper itself** — it's purely an HTTP client that calls external LLM endpoints. The only exception is the optional "LLM-as-judge" scorer.
2. **Response caching by config hash** — SHA-256 of (prompt + model + params + input). Cache hits return instantly. This is critical for cost control.
3. **Single-container mode** — when `DATABASE_URL` is not set, use SQLite + in-process queue. Zero dependencies.
4. **WebSocket for real-time** — the dashboard connects via WebSocket to receive run progress, score updates, and steering events.
5. **Pluggable scorers** — all scoring functions implement a base class with `score(input, output, context) → float` signature.
6. **OpenAI-compatible adapter** — the LLM adapter layer speaks OpenAI's chat completions API. This covers OpenWebUI, vLLM, Ollama, and most providers.
## File Organization
```
backend/
main.py — FastAPI app, middleware, router mounting
config.py — Pydantic Settings from env vars
models.py — SQLAlchemy ORM models
schemas.py — Pydantic request/response schemas
auth.py — JWT + API key authentication
worker.py — Celery app configuration
routers/ — API endpoint handlers
engine/ — Core experiment execution logic
runner.py — Individual run execution
sweep.py — Sweep orchestration (grid/random/guided)
cache.py — Response cache layer
adapters/ — LLM endpoint adapters
scorers/ — Pluggable scoring functions
mcp/ — MCP server implementation
websocket/ — WebSocket connection management
frontend/src/
pages/ — Route-level components
components/ — Shared UI components
api/ — Typed API client functions
```
## Database Migrations
Use Alembic. Same patterns as Chrysopedia:
```bash
alembic revision --autogenerate -m "describe_change"
alembic upgrade head
```
## Running Locally
```bash
docker compose up -d promptlooper-db promptlooper-redis
cd backend && uvicorn main:app --reload --host 0.0.0.0 --port 8000
# Frontend in another terminal:
cd frontend && npm run dev
```
## Testing
```bash
cd backend && pytest
cd frontend && npm test
```
## Important Patterns
### Adding a new scorer
1. Create `backend/engine/scorers/my_scorer.py`
2. Implement `BaseScorer` with `name`, `score(input, output, context) → float`
3. Register in `backend/engine/scorers/__init__.py`
4. Add to frontend scorer picker component
### Adding a new LLM adapter
1. Create `backend/engine/adapters/my_adapter.py`
2. Implement `BaseAdapter` with `complete(prompt, model, params) → response`
3. Register in `backend/engine/adapters/__init__.py`
4. Currently only OpenAI-compatible is implemented; all others should be edge cases
### Adding a new MCP tool
1. Add tool definition in `backend/mcp/tools.py`
2. Implement handler in `backend/mcp/server.py`
3. Tools should map 1:1 to API endpoints where possible
## Common Gotchas
- Always hash the FULL config when checking cache — missing a single parameter means cache misses
- WebSocket connections must be cleaned up on disconnect — use the connection manager
- SQLite mode doesn't support concurrent writes — the in-process queue must be single-threaded
- Frontend must handle both WebSocket and polling fallback for environments where WS is blocked
- MCP server runs on a separate port from the main API
## Deployment
```bash
ssh ub01
cd /vmPool/r/repos/xpltdco/promptlooper
git pull && docker compose build && docker compose up -d
```

65
README.md Normal file
View file

@ -0,0 +1,65 @@
# PromptLooper
[![License: AGPL-3.0](https://img.shields.io/badge/License-AGPL--3.0-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)
[![Status: Alpha](https://img.shields.io/badge/Status-Alpha-orange.svg)]()
> The one who loops prompts — a universal LLM pipeline tuning workbench.
PromptLooper is a self-hosted tool for systematically optimizing LLM prompts, model selection, and inference parameters. It runs experiments across prompt x model x parameter combinations, caches every response, scores results against pluggable evaluation functions, and surfaces the best configurations through a real-time observability dashboard with human-in-the-loop steering.
It ships as a single Docker container (SQLite mode) for zero-config quickstart, or a Docker Compose stack (Postgres + Redis) for production use. An MCP server enables any AI agent to drive PromptLooper programmatically — creating experiments, running sweeps, and reading results without human intervention.
## Quick Start
### Single Container (zero dependencies)
```bash
docker run -p 8400:8400 -v promptlooper-data:/data ghcr.io/xpltdco/promptlooper
```
Open `http://localhost:8400` — you'll be prompted to create an admin account on first boot.
### Production (Docker Compose)
```bash
git clone git@git.xpltd.co:xpltdco/promptlooper.git
cd promptlooper
cp .env.example .env
# Edit .env — set POSTGRES_PASSWORD and JWT_SECRET at minimum
docker compose up -d
```
## Features
- **Systematic experimentation** — grid, random, and guided sweeps across prompt x model x parameter space
- **Response caching** — SHA-256 deduplication means re-runs cost zero tokens
- **Pluggable scoring** — embedding similarity, format compliance, keyword presence, LLM-as-judge, human rating, custom webhooks
- **Real-time dashboard** — live progress, leaderboard, side-by-side comparison, steering controls
- **MCP server** — AI agents can create experiments, run sweeps, and export results programmatically
- **Single-container mode** — SQLite + in-process queue when no external dependencies are configured
## Development
```bash
# Start backing services
docker compose up -d promptlooper-db promptlooper-redis
# Backend
cd backend && pip install -r requirements.txt
alembic upgrade head
uvicorn main:app --reload --host 0.0.0.0 --port 8000
# Frontend (separate terminal)
cd frontend && npm install && npm run dev
```
## Testing
```bash
cd backend && pytest
cd frontend && npm test
```
## License
[AGPL-3.0](https://www.gnu.org/licenses/agpl-3.0.html)

106
docker-compose.yml Normal file
View file

@ -0,0 +1,106 @@
name: xpltd_promptlooper
networks:
promptlooper:
driver: bridge
ipam:
config:
- subnet: 172.33.0.0/24
services:
promptlooper-db:
image: postgres:16-alpine
container_name: promptlooper-db
restart: unless-stopped
networks:
- promptlooper
ports:
- "5434:5432"
environment:
POSTGRES_USER: ${POSTGRES_USER:-promptlooper}
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:?Set POSTGRES_PASSWORD in .env}
POSTGRES_DB: ${POSTGRES_DB:-promptlooper}
volumes:
- /vmPool/r/services/promptlooper_db:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-promptlooper}"]
interval: 10s
timeout: 5s
retries: 5
promptlooper-redis:
image: redis:7-alpine
container_name: promptlooper-redis
restart: unless-stopped
networks:
- promptlooper
volumes:
- /vmPool/r/services/promptlooper_redis:/data
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 10s
timeout: 5s
retries: 5
promptlooper-api:
build:
context: .
dockerfile: docker/Dockerfile
target: api
container_name: promptlooper-api
restart: unless-stopped
networks:
- promptlooper
ports:
- "8401:8401" # MCP server
environment:
DATABASE_URL: postgresql+asyncpg://${POSTGRES_USER:-promptlooper}:${POSTGRES_PASSWORD}@promptlooper-db:5432/${POSTGRES_DB:-promptlooper}
REDIS_URL: redis://promptlooper-redis:6379/0
JWT_SECRET: ${JWT_SECRET:?Set JWT_SECRET in .env}
DEFAULT_ENDPOINT_URL: ${DEFAULT_ENDPOINT_URL:-}
DEFAULT_ENDPOINT_KEY: ${DEFAULT_ENDPOINT_KEY:-}
MAX_CONCURRENT_RUNS: ${MAX_CONCURRENT_RUNS:-4}
MAX_TOKENS_PER_SWEEP: ${MAX_TOKENS_PER_SWEEP:-0}
MCP_ENABLED: ${MCP_ENABLED:-true}
MCP_PORT: 8401
depends_on:
promptlooper-db:
condition: service_healthy
promptlooper-redis:
condition: service_healthy
promptlooper-worker:
build:
context: .
dockerfile: docker/Dockerfile
target: api
container_name: promptlooper-worker
restart: unless-stopped
networks:
- promptlooper
command: celery -A backend.worker:app worker --loglevel=info --concurrency=${MAX_CONCURRENT_RUNS:-4}
environment:
DATABASE_URL: postgresql+asyncpg://${POSTGRES_USER:-promptlooper}:${POSTGRES_PASSWORD}@promptlooper-db:5432/${POSTGRES_DB:-promptlooper}
REDIS_URL: redis://promptlooper-redis:6379/0
DEFAULT_ENDPOINT_URL: ${DEFAULT_ENDPOINT_URL:-}
DEFAULT_ENDPOINT_KEY: ${DEFAULT_ENDPOINT_KEY:-}
MAX_CONCURRENT_RUNS: ${MAX_CONCURRENT_RUNS:-4}
depends_on:
promptlooper-db:
condition: service_healthy
promptlooper-redis:
condition: service_healthy
promptlooper-web:
build:
context: .
dockerfile: docker/Dockerfile
target: web
container_name: promptlooper-web
restart: unless-stopped
networks:
- promptlooper
ports:
- "8400:80"
depends_on:
- promptlooper-api

23
env.example Normal file
View file

@ -0,0 +1,23 @@
# PromptLooper — Environment Configuration
# Copy to .env and fill in required values
# ── Database ──────────────────────────────────────────────
POSTGRES_USER=promptlooper
POSTGRES_PASSWORD= # REQUIRED: set a strong password
POSTGRES_DB=promptlooper
# ── Auth ──────────────────────────────────────────────────
JWT_SECRET= # REQUIRED: generate with `openssl rand -hex 32`
# ── Default LLM Endpoint (optional) ──────────────────────
# Pre-configure an LLM endpoint so users don't have to add one manually
DEFAULT_ENDPOINT_URL= # e.g. http://chat.forgetyour.name/api/v1
DEFAULT_ENDPOINT_KEY= # API key for the default endpoint
# ── Limits ────────────────────────────────────────────────
MAX_CONCURRENT_RUNS=4 # Parallel run limit per sweep
MAX_TOKENS_PER_SWEEP=0 # 0 = unlimited; set a number to cap token spend
# ── MCP Server ────────────────────────────────────────────
MCP_ENABLED=true # Enable/disable MCP server for agent access
# MCP_PORT=8401 # MCP server port (set in docker-compose)

635
promptlooper-spec.md Normal file
View file

@ -0,0 +1,635 @@
# PromptLooper
> The one who loops prompts — a universal LLM pipeline tuning workbench.
PromptLooper is a self-hosted tool for systematically optimizing LLM prompts, model selection, and inference parameters. It runs experiments across prompt × model × parameter combinations, caches every response, scores results against pluggable evaluation functions, and surfaces the best configurations through a real-time observability dashboard with human-in-the-loop steering.
It ships as a single Docker container (SQLite mode) for zero-config quickstart, or a Docker Compose stack (Postgres + Redis) for production use. An MCP server enables any AI agent to drive PromptLooper programmatically — creating experiments, running sweeps, and reading results without human intervention.
---
## Problem Statement
Anyone building LLM-powered applications faces the same painful loop:
1. Write a system prompt
2. Pick a model and parameters (temperature, top_p, max_tokens, etc.)
3. Run it against sample data
4. Read the output and decide if it's "good enough"
5. Tweak something and repeat
This process is manual, unscientific, and wasteful. There's no way to:
- Systematically compare configurations side-by-side
- Know if you've already tested a particular combination
- Quantify "better" beyond gut feeling
- Let an agent handle the iteration while you steer from above
- Share optimized configurations between projects or team members
PromptLooper makes this process systematic, observable, cached, and agent-drivable.
---
## Target Users
| User | Use Case |
|------|----------|
| **Solo developer** | Tuning prompts for a side project, wants to try 5 models and find the sweet spot |
| **Team building RAG pipelines** | Optimizing chunking + embedding + retrieval + synthesis prompts across stages |
| **AI agent (via MCP)** | Autonomously running optimization sweeps, reporting back to human when done |
| **Prompt engineer** | A/B testing prompt variants at scale with quantified scoring |
| **Infrastructure team** | Benchmarking new models against existing baselines before migration |
---
## Core Concepts
### Experiment
A named configuration that defines:
- **Sample data**: Input documents, queries, or any text the pipeline will process
- **Pipeline stages**: 1-N sequential stages, each with its own prompt template and model config
- **Evaluation criteria**: Scoring functions that grade the output
- **Parameter space**: What to vary (prompt text, model, temperature, top_p, chunk_size, etc.)
### Run
A single execution of one specific configuration within an experiment. A run captures:
- Full input configuration (prompt, model, all parameters)
- Raw LLM response(s)
- Timing data (latency, tokens in/out)
- Evaluation scores
- Configuration hash (for cache deduplication)
### Sweep
A batch of runs that systematically explores a parameter space. Types:
- **Grid sweep**: Every combination of specified parameter values
- **Random sweep**: Random sampling from parameter ranges
- **Guided sweep**: Agent-driven, where results from previous runs inform the next configuration to try
### Scoring Function
A pluggable evaluation that takes (input, output, context) and returns a numeric score. Built-in options:
- **Embedding similarity**: How semantically close is the output to a reference answer?
- **Length compliance**: Does the output meet length constraints?
- **Format compliance**: Does the output match expected structure (JSON, markdown, etc.)?
- **Keyword presence**: Do required terms appear in the output?
- **Human rating**: Manual thumbs-up/down or 1-5 star rating from the dashboard
- **LLM-as-judge**: Use a separate LLM call to evaluate quality (configurable judge prompt)
- **Custom function**: User-provided Python snippet or HTTP webhook
### Project
A workspace that groups related experiments. Users can return to a project and pick up where they left off. Projects store:
- All experiments and their runs
- Saved "best" configurations
- Notes and annotations
- Export history
---
## Architecture
```
┌──────────────────────────────────────────────────────────────────────────┐
│ Docker Compose: xpltd_promptlooper (ub01) │
│ Network: promptlooper (172.33.0.0/24) │
│ │
│ ┌────────────┐ ┌─────────────┐ ┌──────────────────────────────────┐ │
│ │ PostgreSQL │ │ Redis │ │ FastAPI (API) │ │
│ │ :5434 │ │ job queue │ │ Experiments, Runs, Scoring, │ │
│ │ experiments│ │ pub/sub │ │ Projects, Auth, MCP Server │ │
│ │ runs, cache│ │ live state │ │ WebSocket for live dashboard │ │
│ └─────┬───────┘ └──────┬──────┘ └──────────────┬───────────────────┘ │
│ │ │ │ │
│ ┌─────┴─────────────────┴────────────────────────┴───────────────────┐ │
│ │ Celery Worker │ │
│ │ Executes runs against target LLM endpoints │ │
│ │ Caches responses by config hash │ │
│ │ Streams progress via Redis pub/sub │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ Web UI (React + Vite) │ │
│ │ nginx → :8400 │ │
│ │ Dashboard, Experiment Builder, Live Observability, Steering │ │
│ └────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────┘
│ HTTP (OpenAI-compatible)
┌───────────────────────────────┐
│ Target LLM Endpoints │
│ OpenWebUI, vLLM, Ollama, │
│ OpenAI, Anthropic, any │
│ OpenAI-compatible API │
└───────────────────────────────┘
```
### Services (Production Compose)
| Service | Image | Port | Purpose |
|---------|-------|------|---------|
| `promptlooper-db` | `postgres:16-alpine` | `5434 → 5432` | Primary data store |
| `promptlooper-redis` | `redis:7-alpine` | — | Celery broker + pub/sub for live dashboard |
| `promptlooper-api` | `Dockerfile` | `8000` | FastAPI REST API + MCP server |
| `promptlooper-worker` | `Dockerfile` | — | Celery worker (run execution) |
| `promptlooper-web` | `Dockerfile` | `8400 → 80` | React frontend (nginx) |
### Single Container Mode
When `DATABASE_URL` is not set, PromptLooper runs with:
- SQLite at `/data/promptlooper.db`
- In-process task queue (no Celery/Redis dependency)
- All services in one container on port 8400
```bash
docker run -p 8400:8400 -v promptlooper-data:/data ghcr.io/xpltdco/promptlooper
```
---
## Data Model
### User
| Field | Type | Notes |
|-------|------|-------|
| id | UUID | PK |
| username | string | Unique, "admin" created on first boot |
| password_hash | string | bcrypt |
| is_admin | bool | Default true for first user |
| created_at | timestamp | |
### Project
| Field | Type | Notes |
|-------|------|-------|
| id | UUID | PK |
| name | string | |
| description | text | Optional |
| owner_id | UUID | FK → User |
| created_at | timestamp | |
| updated_at | timestamp | |
### Experiment
| Field | Type | Notes |
|-------|------|-------|
| id | UUID | PK |
| project_id | UUID | FK → Project |
| name | string | |
| description | text | Optional |
| sample_data | JSONB | Input documents/queries |
| pipeline_stages | JSONB | Stage definitions with prompt templates |
| scoring_config | JSONB | Which scoring functions to use and their weights |
| parameter_space | JSONB | What to vary and ranges/options |
| status | enum | draft, running, paused, completed |
| created_at | timestamp | |
| updated_at | timestamp | |
### Run
| Field | Type | Notes |
|-------|------|-------|
| id | UUID | PK |
| experiment_id | UUID | FK → Experiment |
| config_hash | string(64) | SHA-256 of full configuration (for cache dedup) |
| config | JSONB | Complete configuration snapshot |
| status | enum | pending, running, completed, failed, cached |
| started_at | timestamp | |
| completed_at | timestamp | |
| duration_ms | int | Wall clock time |
| tokens_in | int | Total input tokens across all stages |
| tokens_out | int | Total output tokens |
| cost_estimate | decimal | Estimated cost based on model pricing |
### StageResult
| Field | Type | Notes |
|-------|------|-------|
| id | UUID | PK |
| run_id | UUID | FK → Run |
| stage_index | int | 0-based stage number |
| prompt_sent | text | Actual prompt after template rendering |
| response_raw | text | Raw LLM response |
| model_used | string | Model identifier |
| parameters | JSONB | Temperature, top_p, etc. |
| tokens_in | int | This stage |
| tokens_out | int | This stage |
| latency_ms | int | This stage |
### Score
| Field | Type | Notes |
|-------|------|-------|
| id | UUID | PK |
| run_id | UUID | FK → Run |
| scorer_name | string | e.g. "embedding_similarity", "human_rating" |
| value | float | Normalized 0.01.0 |
| metadata | JSONB | Scorer-specific details |
| created_at | timestamp | |
### ResponseCache
| Field | Type | Notes |
|-------|------|-------|
| config_hash | string(64) | PK — SHA-256 of (prompt + model + params + input) |
| response | text | Cached LLM response |
| model | string | |
| tokens_in | int | |
| tokens_out | int | |
| latency_ms | int | Original latency |
| created_at | timestamp | |
### WebhookConfig
| Field | Type | Notes |
|-------|------|-------|
| id | UUID | PK |
| event_type | string | experiment.complete, new_best_found, budget.exhausted, human_needed |
| url | string | Target URL |
| headers | JSONB | Optional auth headers |
| is_active | bool | |
---
## API Endpoints
### Auth
| Method | Path | Description |
|--------|------|-------------|
| POST | `/api/v1/auth/setup` | First-boot admin password setup |
| POST | `/api/v1/auth/login` | Login, returns JWT |
| GET | `/api/v1/auth/me` | Current user info |
### Admin
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/admin/settings` | System settings (guest access, default model, etc.) |
| PUT | `/api/v1/admin/settings` | Update settings |
| GET | `/api/v1/admin/stats` | System-wide stats (total runs, cache hit rate, etc.) |
### Projects
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/projects` | List projects |
| POST | `/api/v1/projects` | Create project |
| GET | `/api/v1/projects/{id}` | Project detail with experiment summaries |
| PUT | `/api/v1/projects/{id}` | Update project |
| DELETE | `/api/v1/projects/{id}` | Delete project and all experiments |
### Experiments
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/experiments` | List experiments (filter by project) |
| POST | `/api/v1/experiments` | Create experiment |
| GET | `/api/v1/experiments/{id}` | Experiment detail with run summaries |
| PUT | `/api/v1/experiments/{id}` | Update experiment config |
| DELETE | `/api/v1/experiments/{id}` | Delete experiment |
| POST | `/api/v1/experiments/{id}/sweep` | Start a sweep (grid, random, or guided) |
| POST | `/api/v1/experiments/{id}/pause` | Pause running sweep |
| POST | `/api/v1/experiments/{id}/resume` | Resume paused sweep |
| POST | `/api/v1/experiments/{id}/stop` | Stop sweep |
### Runs
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/experiments/{id}/runs` | List runs with scores (sortable, filterable) |
| GET | `/api/v1/runs/{id}` | Run detail with stage results |
| POST | `/api/v1/runs` | Execute a single run (ad-hoc) |
| POST | `/api/v1/runs/{id}/score` | Add human rating to a run |
| GET | `/api/v1/experiments/{id}/leaderboard` | Top runs ranked by weighted score |
### Export
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/experiments/{id}/export/best` | Best config as JSON |
| GET | `/api/v1/experiments/{id}/export/env` | Best config as .env snippet |
| GET | `/api/v1/experiments/{id}/export/yaml` | Best config as YAML |
| GET | `/api/v1/experiments/{id}/export/report` | Full experiment report (markdown) |
### LLM Endpoints (Target Management)
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/endpoints` | List configured LLM endpoints |
| POST | `/api/v1/endpoints` | Add endpoint (URL, API key, label) |
| PUT | `/api/v1/endpoints/{id}` | Update endpoint |
| DELETE | `/api/v1/endpoints/{id}` | Remove endpoint |
| POST | `/api/v1/endpoints/{id}/test` | Test connectivity and list available models |
### Webhooks
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/webhooks` | List webhook configs |
| POST | `/api/v1/webhooks` | Create webhook |
| DELETE | `/api/v1/webhooks/{id}` | Remove webhook |
### WebSocket
| Path | Description |
|------|-------------|
| `/ws/experiments/{id}` | Live stream: run progress, scores, stage completions |
| `/ws/dashboard` | Global activity feed across all experiments |
### Health
| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Health check (DB + Redis connectivity) |
---
## MCP Server
PromptLooper exposes an MCP (Model Context Protocol) server so AI agents can drive it programmatically. The MCP server runs as part of the API service.
### MCP Tools
| Tool | Description |
|------|-------------|
| `create_project` | Create a new project workspace |
| `create_experiment` | Define an experiment with sample data, stages, and scoring |
| `configure_endpoint` | Add or update an LLM target endpoint |
| `run_single` | Execute one specific configuration and return results |
| `run_sweep` | Start a parameter sweep (grid/random/guided) |
| `get_leaderboard` | Get top N configurations ranked by score |
| `get_run_detail` | Get full details of a specific run |
| `export_best_config` | Export the best configuration in JSON/YAML/env format |
| `pause_sweep` | Pause a running sweep |
| `resume_sweep` | Resume a paused sweep |
| `add_human_score` | Rate a run's output |
| `get_experiment_status` | Check experiment progress |
| `list_models` | List available models across all configured endpoints |
### Example Agent Interaction
```
Agent: "Create a project called 'Chrysopedia Extraction' and an experiment
that tests the stage3_extraction prompt against Qwen-72B and Qwen-32B,
sweeping temperature from 0.1 to 0.9 in 0.2 increments.
Use embedding similarity scoring against these reference outputs.
Run a grid sweep."
PromptLooper MCP: [create_project] → [create_experiment] → [run_sweep]
→ streams progress → [get_leaderboard]
Agent: "The top config uses Qwen-72B at temperature 0.3. Export it as
a .env snippet I can drop into Chrysopedia."
PromptLooper MCP: [export_best_config format=env]
```
---
## Response Caching
Every LLM call is cached by a SHA-256 hash of:
- Prompt text (after template rendering)
- Model identifier
- All inference parameters (temperature, top_p, max_tokens, etc.)
- Input data
If an identical configuration has been run before, the cached response is returned instantly with `status: cached`. This means:
- Re-running experiments with new scoring functions costs zero tokens
- Adding a new scorer retroactively evaluates all historical runs
- Accidentally re-running a sweep wastes nothing
- Cache can be invalidated per-run or per-experiment if needed
---
## Authentication Model
### First Boot
- App detects no users exist
- Presents a setup screen: create admin username + password
- Admin account is created, user is logged in
### Guest Access
- Admin can toggle `allow_guest_access` in settings
- Guests can view experiments and results (read-only)
- Guests cannot create experiments, run sweeps, or modify configs
- Default: guest access disabled
### API Authentication
- JWT tokens for the web UI
- API key (generated in admin settings) for programmatic access and MCP
- API key passed via `Authorization: Bearer <key>` header
---
## Real-Time Observability Dashboard
The dashboard is the primary user interface during active experimentation. It provides:
### Live Experiment View
- Progress bar: X of Y runs completed
- Token usage accumulator (running total)
- Cost estimate (based on configured model pricing)
- Cache hit rate for current sweep
- Estimated time remaining
### Side-by-Side Output Comparison
- Pick any two runs and diff their outputs
- Highlight differences in prompt, parameters, and response
- Score comparison overlay
### Leaderboard
- Real-time ranked list of runs by weighted score
- Sortable by any individual scorer
- Click to expand full run detail
### Steering Controls
- **Pause**: Stop the sweep after current run completes
- **Fork**: Create a new experiment branching from current best, with modified parameters
- **Redirect**: Change remaining sweep parameters mid-flight
- **Approve**: Mark a configuration as "good enough" and export
- **Reject**: Exclude a run from leaderboard consideration
### Activity Timeline
- Chronological feed of events: run started, run completed, new best found, cache hit, error
- Filterable by event type
---
## Webhook Events
| Event | Payload | Trigger |
|-------|---------|---------|
| `experiment.started` | experiment_id, sweep config | Sweep begins |
| `experiment.completed` | experiment_id, best config, summary stats | All runs finished |
| `experiment.paused` | experiment_id, reason | Manual or budget pause |
| `new_best_found` | experiment_id, run_id, scores, config | New top-scoring run |
| `budget.exhausted` | experiment_id, token_count, cost | Token/cost budget hit |
| `human_needed` | experiment_id, reason, context | Agent requests human review |
| `run.failed` | run_id, error | Individual run error |
---
## Configuration Export Formats
### JSON
```json
{
"model": "qwen2.5-72b-instruct",
"endpoint": "http://chat.forgetyour.name/api",
"temperature": 0.3,
"top_p": 0.85,
"max_tokens": 2048,
"system_prompt": "You are a music production knowledge extractor...",
"score": 0.87,
"experiment": "chrysopedia-extraction-v2",
"exported_at": "2026-04-06T12:00:00Z"
}
```
### .env
```bash
LLM_MODEL=qwen2.5-72b-instruct
LLM_API_URL=http://chat.forgetyour.name/api
LLM_TEMPERATURE=0.3
LLM_TOP_P=0.85
LLM_MAX_TOKENS=2048
# Score: 0.87 | Experiment: chrysopedia-extraction-v2
```
### YAML
```yaml
model: qwen2.5-72b-instruct
endpoint: http://chat.forgetyour.name/api
parameters:
temperature: 0.3
top_p: 0.85
max_tokens: 2048
system_prompt: |
You are a music production knowledge extractor...
metadata:
score: 0.87
experiment: chrysopedia-extraction-v2
exported_at: 2026-04-06T12:00:00Z
```
---
## Environment Variables
| Group | Variable | Default | Notes |
|-------|----------|---------|-------|
| **Database** | `DATABASE_URL` | (none → SQLite) | PostgreSQL connection string |
| **Redis** | `REDIS_URL` | (none → in-process) | Redis connection string |
| **Server** | `HOST` | `0.0.0.0` | Bind address |
| **Server** | `PORT` | `8400` | HTTP port |
| **Auth** | `JWT_SECRET` | (auto-generated) | JWT signing key |
| **Auth** | `API_KEY` | (none) | Static API key for programmatic access |
| **Defaults** | `DEFAULT_ENDPOINT_URL` | (none) | Pre-configured LLM endpoint |
| **Defaults** | `DEFAULT_ENDPOINT_KEY` | (none) | API key for default endpoint |
| **Limits** | `MAX_CONCURRENT_RUNS` | `4` | Parallel run limit |
| **Limits** | `MAX_TOKENS_PER_SWEEP` | `0` (unlimited) | Token budget per sweep |
| **Storage** | `DATA_DIR` | `/data` | SQLite DB + file storage location |
| **MCP** | `MCP_ENABLED` | `true` | Enable MCP server |
| **MCP** | `MCP_PORT` | `8401` | MCP server port |
---
## Docker Compose (Production — XPLTD Conventions)
Project name: `xpltd_promptlooper`
Network: `promptlooper` (`172.33.0.0/24`)
Persistent data: `/vmPool/r/services/promptlooper_*`
PostgreSQL port: `5434` (external)
Web UI port: `8400` (external)
---
## Technology Stack
| Layer | Technology | Rationale |
|-------|-----------|-----------|
| **API** | Python 3.12 + FastAPI | Async, OpenAPI auto-gen, matches XPLTD conventions |
| **Task Queue** | Celery + Redis | Proven for background job execution, matches Chrysopedia |
| **Database** | PostgreSQL 16 (prod) / SQLite (single-container) | JSONB for flexible experiment configs |
| **Real-time** | WebSocket via FastAPI + Redis pub/sub | Sub-second dashboard updates |
| **Frontend** | React 18 + TypeScript + Vite | Real-time dashboard, matches Chrysopedia |
| **Styling** | Tailwind CSS | Fast iteration, utility-first |
| **MCP** | Python MCP SDK | Standard protocol for agent integration |
| **Container** | Multi-stage Docker build | Single image serves both API and frontend |
---
## Development & Deployment
### Local Development
```bash
git clone git@git.xpltd.co:xpltdco/promptlooper.git
cd promptlooper
cp .env.example .env
docker compose up -d promptlooper-db promptlooper-redis
cd backend && pip install -r requirements.txt
alembic upgrade head
uvicorn main:app --reload --host 0.0.0.0 --port 8000
# In another terminal:
cd frontend && npm install && npm run dev
```
### Production Deployment (ub01)
```bash
ssh ub01
cd /vmPool/r/repos/xpltdco/promptlooper
git pull && docker compose build && docker compose up -d
```
### Project Structure
```
promptlooper/
├── backend/
│ ├── main.py # FastAPI entry point
│ ├── config.py # Pydantic Settings
│ ├── models.py # SQLAlchemy ORM
│ ├── schemas.py # Pydantic request/response
│ ├── auth.py # JWT + API key auth
│ ├── worker.py # Celery app config
│ ├── routers/
│ │ ├── auth.py
│ │ ├── projects.py
│ │ ├── experiments.py
│ │ ├── runs.py
│ │ ├── endpoints.py
│ │ ├── export.py
│ │ ├── webhooks.py
│ │ └── admin.py
│ ├── engine/
│ │ ├── runner.py # Run execution logic
│ │ ├── sweep.py # Sweep orchestration
│ │ ├── cache.py # Response cache layer
│ │ ├── adapters/ # LLM endpoint adapters
│ │ │ ├── openai_compat.py
│ │ │ └── base.py
│ │ └── scorers/ # Pluggable scoring functions
│ │ ├── embedding.py
│ │ ├── format.py
│ │ ├── keyword.py
│ │ ├── llm_judge.py
│ │ └── base.py
│ ├── mcp/
│ │ ├── server.py # MCP server implementation
│ │ └── tools.py # MCP tool definitions
│ ├── websocket/
│ │ └── manager.py # WebSocket connection management
│ └── tests/
├── frontend/
│ └── src/
│ ├── pages/
│ │ ├── Setup.tsx # First-boot admin setup
│ │ ├── Login.tsx
│ │ ├── Dashboard.tsx # Global activity
│ │ ├── Projects.tsx
│ │ ├── Experiment.tsx # Experiment builder + config
│ │ ├── Live.tsx # Real-time observability
│ │ ├── Compare.tsx # Side-by-side run comparison
│ │ └── Admin.tsx # System settings
│ ├── components/
│ │ ├── Leaderboard.tsx
│ │ ├── SteeringControls.tsx
│ │ ├── RunCard.tsx
│ │ ├── ScoreChart.tsx
│ │ └── Timeline.tsx
│ └── api/
├── docker/
│ ├── Dockerfile # Multi-stage: API + frontend
│ └── nginx.conf
├── alembic/
├── docker-compose.yml
├── .env.example
├── CLAUDE.md
└── README.md
```