MAESTRO: Initialize repository with README, .gitignore, and project files
Add README.md with project description, quick-start instructions, and AGPL-3.0 license badge. Add .gitignore for Python, Node, and Docker artifacts. Include existing CLAUDE.md, spec, docker-compose.yml, and env.example.
This commit is contained in:
commit
fc2e4cd7d1
6 changed files with 1013 additions and 0 deletions
57
.gitignore
vendored
Normal file
57
.gitignore
vendored
Normal file
|
|
@ -0,0 +1,57 @@
|
|||
# Python
|
||||
__pycache__/
|
||||
*.py[cod]
|
||||
*$py.class
|
||||
*.egg-info/
|
||||
*.egg
|
||||
dist/
|
||||
build/
|
||||
.eggs/
|
||||
*.whl
|
||||
.venv/
|
||||
venv/
|
||||
env/
|
||||
.env
|
||||
*.pyc
|
||||
.pytest_cache/
|
||||
.mypy_cache/
|
||||
.ruff_cache/
|
||||
htmlcov/
|
||||
.coverage
|
||||
.coverage.*
|
||||
|
||||
# Node / Frontend
|
||||
node_modules/
|
||||
frontend/dist/
|
||||
frontend/build/
|
||||
.npm
|
||||
*.tsbuildinfo
|
||||
|
||||
# Docker
|
||||
docker/nginx.conf.bak
|
||||
|
||||
# IDE
|
||||
.vscode/
|
||||
.idea/
|
||||
*.swp
|
||||
*.swo
|
||||
*~
|
||||
.DS_Store
|
||||
|
||||
# OS
|
||||
Thumbs.db
|
||||
Desktop.ini
|
||||
|
||||
# Data (single-container mode)
|
||||
*.db
|
||||
/data/
|
||||
|
||||
# Alembic
|
||||
alembic/versions/__pycache__/
|
||||
|
||||
# Auto Run Docs (Maestro working files)
|
||||
Auto Run Docs/Working/
|
||||
|
||||
# Misc
|
||||
*.log
|
||||
*.bak
|
||||
127
CLAUDE.md
Normal file
127
CLAUDE.md
Normal file
|
|
@ -0,0 +1,127 @@
|
|||
# CLAUDE.md — PromptLooper
|
||||
|
||||
## What is this project?
|
||||
|
||||
PromptLooper is a self-hosted LLM pipeline tuning workbench. It runs experiments across prompt × model × parameter combinations, caches every response, scores results, and surfaces optimal configurations through a real-time dashboard. It has an MCP server so AI agents can drive it programmatically.
|
||||
|
||||
## Repository
|
||||
|
||||
- **Hosted at**: git.xpltd.co/xpltdco/promptlooper
|
||||
- **XPLTD project name**: `xpltd_promptlooper`
|
||||
- **Sister project**: Chrysopedia (git.xpltd.co/xpltdco/chrysopedia) — a knowledge extraction pipeline that is PromptLooper's first integration target
|
||||
|
||||
## Tech Stack
|
||||
|
||||
- **Backend**: Python 3.12, FastAPI, Celery, SQLAlchemy, Alembic
|
||||
- **Frontend**: React 18, TypeScript, Vite, Tailwind CSS
|
||||
- **Database**: PostgreSQL 16 (production) / SQLite (single-container mode)
|
||||
- **Cache/Queue**: Redis 7 (production) / in-process (single-container)
|
||||
- **Real-time**: WebSocket via FastAPI + Redis pub/sub
|
||||
- **MCP**: Python MCP SDK
|
||||
- **Container**: Multi-stage Docker build, nginx for frontend
|
||||
|
||||
## XPLTD Conventions
|
||||
|
||||
These are non-negotiable project conventions shared across all XPLTD projects:
|
||||
|
||||
- Docker Compose project name: `xpltd_promptlooper`
|
||||
- Dedicated bridge network: `promptlooper` (`172.33.0.0/24`)
|
||||
- Persistent data bind mounts under `/vmPool/r/services/promptlooper_*`
|
||||
- PostgreSQL on external port `5434` (internal `5432`)
|
||||
- Web UI on port `8400`
|
||||
- MCP server on port `8401`
|
||||
- Container naming: `promptlooper-{service}` (e.g., `promptlooper-api`, `promptlooper-db`)
|
||||
|
||||
## Key Architecture Decisions
|
||||
|
||||
1. **No LLM runs inside PromptLooper itself** — it's purely an HTTP client that calls external LLM endpoints. The only exception is the optional "LLM-as-judge" scorer.
|
||||
2. **Response caching by config hash** — SHA-256 of (prompt + model + params + input). Cache hits return instantly. This is critical for cost control.
|
||||
3. **Single-container mode** — when `DATABASE_URL` is not set, use SQLite + in-process queue. Zero dependencies.
|
||||
4. **WebSocket for real-time** — the dashboard connects via WebSocket to receive run progress, score updates, and steering events.
|
||||
5. **Pluggable scorers** — all scoring functions implement a base class with `score(input, output, context) → float` signature.
|
||||
6. **OpenAI-compatible adapter** — the LLM adapter layer speaks OpenAI's chat completions API. This covers OpenWebUI, vLLM, Ollama, and most providers.
|
||||
|
||||
## File Organization
|
||||
|
||||
```
|
||||
backend/
|
||||
main.py — FastAPI app, middleware, router mounting
|
||||
config.py — Pydantic Settings from env vars
|
||||
models.py — SQLAlchemy ORM models
|
||||
schemas.py — Pydantic request/response schemas
|
||||
auth.py — JWT + API key authentication
|
||||
worker.py — Celery app configuration
|
||||
routers/ — API endpoint handlers
|
||||
engine/ — Core experiment execution logic
|
||||
runner.py — Individual run execution
|
||||
sweep.py — Sweep orchestration (grid/random/guided)
|
||||
cache.py — Response cache layer
|
||||
adapters/ — LLM endpoint adapters
|
||||
scorers/ — Pluggable scoring functions
|
||||
mcp/ — MCP server implementation
|
||||
websocket/ — WebSocket connection management
|
||||
|
||||
frontend/src/
|
||||
pages/ — Route-level components
|
||||
components/ — Shared UI components
|
||||
api/ — Typed API client functions
|
||||
```
|
||||
|
||||
## Database Migrations
|
||||
|
||||
Use Alembic. Same patterns as Chrysopedia:
|
||||
```bash
|
||||
alembic revision --autogenerate -m "describe_change"
|
||||
alembic upgrade head
|
||||
```
|
||||
|
||||
## Running Locally
|
||||
|
||||
```bash
|
||||
docker compose up -d promptlooper-db promptlooper-redis
|
||||
cd backend && uvicorn main:app --reload --host 0.0.0.0 --port 8000
|
||||
# Frontend in another terminal:
|
||||
cd frontend && npm run dev
|
||||
```
|
||||
|
||||
## Testing
|
||||
|
||||
```bash
|
||||
cd backend && pytest
|
||||
cd frontend && npm test
|
||||
```
|
||||
|
||||
## Important Patterns
|
||||
|
||||
### Adding a new scorer
|
||||
1. Create `backend/engine/scorers/my_scorer.py`
|
||||
2. Implement `BaseScorer` with `name`, `score(input, output, context) → float`
|
||||
3. Register in `backend/engine/scorers/__init__.py`
|
||||
4. Add to frontend scorer picker component
|
||||
|
||||
### Adding a new LLM adapter
|
||||
1. Create `backend/engine/adapters/my_adapter.py`
|
||||
2. Implement `BaseAdapter` with `complete(prompt, model, params) → response`
|
||||
3. Register in `backend/engine/adapters/__init__.py`
|
||||
4. Currently only OpenAI-compatible is implemented; all others should be edge cases
|
||||
|
||||
### Adding a new MCP tool
|
||||
1. Add tool definition in `backend/mcp/tools.py`
|
||||
2. Implement handler in `backend/mcp/server.py`
|
||||
3. Tools should map 1:1 to API endpoints where possible
|
||||
|
||||
## Common Gotchas
|
||||
|
||||
- Always hash the FULL config when checking cache — missing a single parameter means cache misses
|
||||
- WebSocket connections must be cleaned up on disconnect — use the connection manager
|
||||
- SQLite mode doesn't support concurrent writes — the in-process queue must be single-threaded
|
||||
- Frontend must handle both WebSocket and polling fallback for environments where WS is blocked
|
||||
- MCP server runs on a separate port from the main API
|
||||
|
||||
## Deployment
|
||||
|
||||
```bash
|
||||
ssh ub01
|
||||
cd /vmPool/r/repos/xpltdco/promptlooper
|
||||
git pull && docker compose build && docker compose up -d
|
||||
```
|
||||
65
README.md
Normal file
65
README.md
Normal file
|
|
@ -0,0 +1,65 @@
|
|||
# PromptLooper
|
||||
|
||||
[](https://www.gnu.org/licenses/agpl-3.0)
|
||||
[]()
|
||||
|
||||
> The one who loops prompts — a universal LLM pipeline tuning workbench.
|
||||
|
||||
PromptLooper is a self-hosted tool for systematically optimizing LLM prompts, model selection, and inference parameters. It runs experiments across prompt x model x parameter combinations, caches every response, scores results against pluggable evaluation functions, and surfaces the best configurations through a real-time observability dashboard with human-in-the-loop steering.
|
||||
|
||||
It ships as a single Docker container (SQLite mode) for zero-config quickstart, or a Docker Compose stack (Postgres + Redis) for production use. An MCP server enables any AI agent to drive PromptLooper programmatically — creating experiments, running sweeps, and reading results without human intervention.
|
||||
|
||||
## Quick Start
|
||||
|
||||
### Single Container (zero dependencies)
|
||||
|
||||
```bash
|
||||
docker run -p 8400:8400 -v promptlooper-data:/data ghcr.io/xpltdco/promptlooper
|
||||
```
|
||||
|
||||
Open `http://localhost:8400` — you'll be prompted to create an admin account on first boot.
|
||||
|
||||
### Production (Docker Compose)
|
||||
|
||||
```bash
|
||||
git clone git@git.xpltd.co:xpltdco/promptlooper.git
|
||||
cd promptlooper
|
||||
cp .env.example .env
|
||||
# Edit .env — set POSTGRES_PASSWORD and JWT_SECRET at minimum
|
||||
docker compose up -d
|
||||
```
|
||||
|
||||
## Features
|
||||
|
||||
- **Systematic experimentation** — grid, random, and guided sweeps across prompt x model x parameter space
|
||||
- **Response caching** — SHA-256 deduplication means re-runs cost zero tokens
|
||||
- **Pluggable scoring** — embedding similarity, format compliance, keyword presence, LLM-as-judge, human rating, custom webhooks
|
||||
- **Real-time dashboard** — live progress, leaderboard, side-by-side comparison, steering controls
|
||||
- **MCP server** — AI agents can create experiments, run sweeps, and export results programmatically
|
||||
- **Single-container mode** — SQLite + in-process queue when no external dependencies are configured
|
||||
|
||||
## Development
|
||||
|
||||
```bash
|
||||
# Start backing services
|
||||
docker compose up -d promptlooper-db promptlooper-redis
|
||||
|
||||
# Backend
|
||||
cd backend && pip install -r requirements.txt
|
||||
alembic upgrade head
|
||||
uvicorn main:app --reload --host 0.0.0.0 --port 8000
|
||||
|
||||
# Frontend (separate terminal)
|
||||
cd frontend && npm install && npm run dev
|
||||
```
|
||||
|
||||
## Testing
|
||||
|
||||
```bash
|
||||
cd backend && pytest
|
||||
cd frontend && npm test
|
||||
```
|
||||
|
||||
## License
|
||||
|
||||
[AGPL-3.0](https://www.gnu.org/licenses/agpl-3.0.html)
|
||||
106
docker-compose.yml
Normal file
106
docker-compose.yml
Normal file
|
|
@ -0,0 +1,106 @@
|
|||
name: xpltd_promptlooper
|
||||
|
||||
networks:
|
||||
promptlooper:
|
||||
driver: bridge
|
||||
ipam:
|
||||
config:
|
||||
- subnet: 172.33.0.0/24
|
||||
|
||||
services:
|
||||
promptlooper-db:
|
||||
image: postgres:16-alpine
|
||||
container_name: promptlooper-db
|
||||
restart: unless-stopped
|
||||
networks:
|
||||
- promptlooper
|
||||
ports:
|
||||
- "5434:5432"
|
||||
environment:
|
||||
POSTGRES_USER: ${POSTGRES_USER:-promptlooper}
|
||||
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:?Set POSTGRES_PASSWORD in .env}
|
||||
POSTGRES_DB: ${POSTGRES_DB:-promptlooper}
|
||||
volumes:
|
||||
- /vmPool/r/services/promptlooper_db:/var/lib/postgresql/data
|
||||
healthcheck:
|
||||
test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-promptlooper}"]
|
||||
interval: 10s
|
||||
timeout: 5s
|
||||
retries: 5
|
||||
|
||||
promptlooper-redis:
|
||||
image: redis:7-alpine
|
||||
container_name: promptlooper-redis
|
||||
restart: unless-stopped
|
||||
networks:
|
||||
- promptlooper
|
||||
volumes:
|
||||
- /vmPool/r/services/promptlooper_redis:/data
|
||||
healthcheck:
|
||||
test: ["CMD", "redis-cli", "ping"]
|
||||
interval: 10s
|
||||
timeout: 5s
|
||||
retries: 5
|
||||
|
||||
promptlooper-api:
|
||||
build:
|
||||
context: .
|
||||
dockerfile: docker/Dockerfile
|
||||
target: api
|
||||
container_name: promptlooper-api
|
||||
restart: unless-stopped
|
||||
networks:
|
||||
- promptlooper
|
||||
ports:
|
||||
- "8401:8401" # MCP server
|
||||
environment:
|
||||
DATABASE_URL: postgresql+asyncpg://${POSTGRES_USER:-promptlooper}:${POSTGRES_PASSWORD}@promptlooper-db:5432/${POSTGRES_DB:-promptlooper}
|
||||
REDIS_URL: redis://promptlooper-redis:6379/0
|
||||
JWT_SECRET: ${JWT_SECRET:?Set JWT_SECRET in .env}
|
||||
DEFAULT_ENDPOINT_URL: ${DEFAULT_ENDPOINT_URL:-}
|
||||
DEFAULT_ENDPOINT_KEY: ${DEFAULT_ENDPOINT_KEY:-}
|
||||
MAX_CONCURRENT_RUNS: ${MAX_CONCURRENT_RUNS:-4}
|
||||
MAX_TOKENS_PER_SWEEP: ${MAX_TOKENS_PER_SWEEP:-0}
|
||||
MCP_ENABLED: ${MCP_ENABLED:-true}
|
||||
MCP_PORT: 8401
|
||||
depends_on:
|
||||
promptlooper-db:
|
||||
condition: service_healthy
|
||||
promptlooper-redis:
|
||||
condition: service_healthy
|
||||
|
||||
promptlooper-worker:
|
||||
build:
|
||||
context: .
|
||||
dockerfile: docker/Dockerfile
|
||||
target: api
|
||||
container_name: promptlooper-worker
|
||||
restart: unless-stopped
|
||||
networks:
|
||||
- promptlooper
|
||||
command: celery -A backend.worker:app worker --loglevel=info --concurrency=${MAX_CONCURRENT_RUNS:-4}
|
||||
environment:
|
||||
DATABASE_URL: postgresql+asyncpg://${POSTGRES_USER:-promptlooper}:${POSTGRES_PASSWORD}@promptlooper-db:5432/${POSTGRES_DB:-promptlooper}
|
||||
REDIS_URL: redis://promptlooper-redis:6379/0
|
||||
DEFAULT_ENDPOINT_URL: ${DEFAULT_ENDPOINT_URL:-}
|
||||
DEFAULT_ENDPOINT_KEY: ${DEFAULT_ENDPOINT_KEY:-}
|
||||
MAX_CONCURRENT_RUNS: ${MAX_CONCURRENT_RUNS:-4}
|
||||
depends_on:
|
||||
promptlooper-db:
|
||||
condition: service_healthy
|
||||
promptlooper-redis:
|
||||
condition: service_healthy
|
||||
|
||||
promptlooper-web:
|
||||
build:
|
||||
context: .
|
||||
dockerfile: docker/Dockerfile
|
||||
target: web
|
||||
container_name: promptlooper-web
|
||||
restart: unless-stopped
|
||||
networks:
|
||||
- promptlooper
|
||||
ports:
|
||||
- "8400:80"
|
||||
depends_on:
|
||||
- promptlooper-api
|
||||
23
env.example
Normal file
23
env.example
Normal file
|
|
@ -0,0 +1,23 @@
|
|||
# PromptLooper — Environment Configuration
|
||||
# Copy to .env and fill in required values
|
||||
|
||||
# ── Database ──────────────────────────────────────────────
|
||||
POSTGRES_USER=promptlooper
|
||||
POSTGRES_PASSWORD= # REQUIRED: set a strong password
|
||||
POSTGRES_DB=promptlooper
|
||||
|
||||
# ── Auth ──────────────────────────────────────────────────
|
||||
JWT_SECRET= # REQUIRED: generate with `openssl rand -hex 32`
|
||||
|
||||
# ── Default LLM Endpoint (optional) ──────────────────────
|
||||
# Pre-configure an LLM endpoint so users don't have to add one manually
|
||||
DEFAULT_ENDPOINT_URL= # e.g. http://chat.forgetyour.name/api/v1
|
||||
DEFAULT_ENDPOINT_KEY= # API key for the default endpoint
|
||||
|
||||
# ── Limits ────────────────────────────────────────────────
|
||||
MAX_CONCURRENT_RUNS=4 # Parallel run limit per sweep
|
||||
MAX_TOKENS_PER_SWEEP=0 # 0 = unlimited; set a number to cap token spend
|
||||
|
||||
# ── MCP Server ────────────────────────────────────────────
|
||||
MCP_ENABLED=true # Enable/disable MCP server for agent access
|
||||
# MCP_PORT=8401 # MCP server port (set in docker-compose)
|
||||
635
promptlooper-spec.md
Normal file
635
promptlooper-spec.md
Normal file
|
|
@ -0,0 +1,635 @@
|
|||
# PromptLooper
|
||||
|
||||
> The one who loops prompts — a universal LLM pipeline tuning workbench.
|
||||
|
||||
PromptLooper is a self-hosted tool for systematically optimizing LLM prompts, model selection, and inference parameters. It runs experiments across prompt × model × parameter combinations, caches every response, scores results against pluggable evaluation functions, and surfaces the best configurations through a real-time observability dashboard with human-in-the-loop steering.
|
||||
|
||||
It ships as a single Docker container (SQLite mode) for zero-config quickstart, or a Docker Compose stack (Postgres + Redis) for production use. An MCP server enables any AI agent to drive PromptLooper programmatically — creating experiments, running sweeps, and reading results without human intervention.
|
||||
|
||||
---
|
||||
|
||||
## Problem Statement
|
||||
|
||||
Anyone building LLM-powered applications faces the same painful loop:
|
||||
|
||||
1. Write a system prompt
|
||||
2. Pick a model and parameters (temperature, top_p, max_tokens, etc.)
|
||||
3. Run it against sample data
|
||||
4. Read the output and decide if it's "good enough"
|
||||
5. Tweak something and repeat
|
||||
|
||||
This process is manual, unscientific, and wasteful. There's no way to:
|
||||
- Systematically compare configurations side-by-side
|
||||
- Know if you've already tested a particular combination
|
||||
- Quantify "better" beyond gut feeling
|
||||
- Let an agent handle the iteration while you steer from above
|
||||
- Share optimized configurations between projects or team members
|
||||
|
||||
PromptLooper makes this process systematic, observable, cached, and agent-drivable.
|
||||
|
||||
---
|
||||
|
||||
## Target Users
|
||||
|
||||
| User | Use Case |
|
||||
|------|----------|
|
||||
| **Solo developer** | Tuning prompts for a side project, wants to try 5 models and find the sweet spot |
|
||||
| **Team building RAG pipelines** | Optimizing chunking + embedding + retrieval + synthesis prompts across stages |
|
||||
| **AI agent (via MCP)** | Autonomously running optimization sweeps, reporting back to human when done |
|
||||
| **Prompt engineer** | A/B testing prompt variants at scale with quantified scoring |
|
||||
| **Infrastructure team** | Benchmarking new models against existing baselines before migration |
|
||||
|
||||
---
|
||||
|
||||
## Core Concepts
|
||||
|
||||
### Experiment
|
||||
|
||||
A named configuration that defines:
|
||||
- **Sample data**: Input documents, queries, or any text the pipeline will process
|
||||
- **Pipeline stages**: 1-N sequential stages, each with its own prompt template and model config
|
||||
- **Evaluation criteria**: Scoring functions that grade the output
|
||||
- **Parameter space**: What to vary (prompt text, model, temperature, top_p, chunk_size, etc.)
|
||||
|
||||
### Run
|
||||
|
||||
A single execution of one specific configuration within an experiment. A run captures:
|
||||
- Full input configuration (prompt, model, all parameters)
|
||||
- Raw LLM response(s)
|
||||
- Timing data (latency, tokens in/out)
|
||||
- Evaluation scores
|
||||
- Configuration hash (for cache deduplication)
|
||||
|
||||
### Sweep
|
||||
|
||||
A batch of runs that systematically explores a parameter space. Types:
|
||||
- **Grid sweep**: Every combination of specified parameter values
|
||||
- **Random sweep**: Random sampling from parameter ranges
|
||||
- **Guided sweep**: Agent-driven, where results from previous runs inform the next configuration to try
|
||||
|
||||
### Scoring Function
|
||||
|
||||
A pluggable evaluation that takes (input, output, context) and returns a numeric score. Built-in options:
|
||||
- **Embedding similarity**: How semantically close is the output to a reference answer?
|
||||
- **Length compliance**: Does the output meet length constraints?
|
||||
- **Format compliance**: Does the output match expected structure (JSON, markdown, etc.)?
|
||||
- **Keyword presence**: Do required terms appear in the output?
|
||||
- **Human rating**: Manual thumbs-up/down or 1-5 star rating from the dashboard
|
||||
- **LLM-as-judge**: Use a separate LLM call to evaluate quality (configurable judge prompt)
|
||||
- **Custom function**: User-provided Python snippet or HTTP webhook
|
||||
|
||||
### Project
|
||||
|
||||
A workspace that groups related experiments. Users can return to a project and pick up where they left off. Projects store:
|
||||
- All experiments and their runs
|
||||
- Saved "best" configurations
|
||||
- Notes and annotations
|
||||
- Export history
|
||||
|
||||
---
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────────────────────────────────────────┐
|
||||
│ Docker Compose: xpltd_promptlooper (ub01) │
|
||||
│ Network: promptlooper (172.33.0.0/24) │
|
||||
│ │
|
||||
│ ┌────────────┐ ┌─────────────┐ ┌──────────────────────────────────┐ │
|
||||
│ │ PostgreSQL │ │ Redis │ │ FastAPI (API) │ │
|
||||
│ │ :5434 │ │ job queue │ │ Experiments, Runs, Scoring, │ │
|
||||
│ │ experiments│ │ pub/sub │ │ Projects, Auth, MCP Server │ │
|
||||
│ │ runs, cache│ │ live state │ │ WebSocket for live dashboard │ │
|
||||
│ └─────┬───────┘ └──────┬──────┘ └──────────────┬───────────────────┘ │
|
||||
│ │ │ │ │
|
||||
│ ┌─────┴─────────────────┴────────────────────────┴───────────────────┐ │
|
||||
│ │ Celery Worker │ │
|
||||
│ │ Executes runs against target LLM endpoints │ │
|
||||
│ │ Caches responses by config hash │ │
|
||||
│ │ Streams progress via Redis pub/sub │ │
|
||||
│ └────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ ┌────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ Web UI (React + Vite) │ │
|
||||
│ │ nginx → :8400 │ │
|
||||
│ │ Dashboard, Experiment Builder, Live Observability, Steering │ │
|
||||
│ └────────────────────────────────────────────────────────────────────┘ │
|
||||
└──────────────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
│ HTTP (OpenAI-compatible)
|
||||
▼
|
||||
┌───────────────────────────────┐
|
||||
│ Target LLM Endpoints │
|
||||
│ OpenWebUI, vLLM, Ollama, │
|
||||
│ OpenAI, Anthropic, any │
|
||||
│ OpenAI-compatible API │
|
||||
└───────────────────────────────┘
|
||||
```
|
||||
|
||||
### Services (Production Compose)
|
||||
|
||||
| Service | Image | Port | Purpose |
|
||||
|---------|-------|------|---------|
|
||||
| `promptlooper-db` | `postgres:16-alpine` | `5434 → 5432` | Primary data store |
|
||||
| `promptlooper-redis` | `redis:7-alpine` | — | Celery broker + pub/sub for live dashboard |
|
||||
| `promptlooper-api` | `Dockerfile` | `8000` | FastAPI REST API + MCP server |
|
||||
| `promptlooper-worker` | `Dockerfile` | — | Celery worker (run execution) |
|
||||
| `promptlooper-web` | `Dockerfile` | `8400 → 80` | React frontend (nginx) |
|
||||
|
||||
### Single Container Mode
|
||||
|
||||
When `DATABASE_URL` is not set, PromptLooper runs with:
|
||||
- SQLite at `/data/promptlooper.db`
|
||||
- In-process task queue (no Celery/Redis dependency)
|
||||
- All services in one container on port 8400
|
||||
|
||||
```bash
|
||||
docker run -p 8400:8400 -v promptlooper-data:/data ghcr.io/xpltdco/promptlooper
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Data Model
|
||||
|
||||
### User
|
||||
| Field | Type | Notes |
|
||||
|-------|------|-------|
|
||||
| id | UUID | PK |
|
||||
| username | string | Unique, "admin" created on first boot |
|
||||
| password_hash | string | bcrypt |
|
||||
| is_admin | bool | Default true for first user |
|
||||
| created_at | timestamp | |
|
||||
|
||||
### Project
|
||||
| Field | Type | Notes |
|
||||
|-------|------|-------|
|
||||
| id | UUID | PK |
|
||||
| name | string | |
|
||||
| description | text | Optional |
|
||||
| owner_id | UUID | FK → User |
|
||||
| created_at | timestamp | |
|
||||
| updated_at | timestamp | |
|
||||
|
||||
### Experiment
|
||||
| Field | Type | Notes |
|
||||
|-------|------|-------|
|
||||
| id | UUID | PK |
|
||||
| project_id | UUID | FK → Project |
|
||||
| name | string | |
|
||||
| description | text | Optional |
|
||||
| sample_data | JSONB | Input documents/queries |
|
||||
| pipeline_stages | JSONB | Stage definitions with prompt templates |
|
||||
| scoring_config | JSONB | Which scoring functions to use and their weights |
|
||||
| parameter_space | JSONB | What to vary and ranges/options |
|
||||
| status | enum | draft, running, paused, completed |
|
||||
| created_at | timestamp | |
|
||||
| updated_at | timestamp | |
|
||||
|
||||
### Run
|
||||
| Field | Type | Notes |
|
||||
|-------|------|-------|
|
||||
| id | UUID | PK |
|
||||
| experiment_id | UUID | FK → Experiment |
|
||||
| config_hash | string(64) | SHA-256 of full configuration (for cache dedup) |
|
||||
| config | JSONB | Complete configuration snapshot |
|
||||
| status | enum | pending, running, completed, failed, cached |
|
||||
| started_at | timestamp | |
|
||||
| completed_at | timestamp | |
|
||||
| duration_ms | int | Wall clock time |
|
||||
| tokens_in | int | Total input tokens across all stages |
|
||||
| tokens_out | int | Total output tokens |
|
||||
| cost_estimate | decimal | Estimated cost based on model pricing |
|
||||
|
||||
### StageResult
|
||||
| Field | Type | Notes |
|
||||
|-------|------|-------|
|
||||
| id | UUID | PK |
|
||||
| run_id | UUID | FK → Run |
|
||||
| stage_index | int | 0-based stage number |
|
||||
| prompt_sent | text | Actual prompt after template rendering |
|
||||
| response_raw | text | Raw LLM response |
|
||||
| model_used | string | Model identifier |
|
||||
| parameters | JSONB | Temperature, top_p, etc. |
|
||||
| tokens_in | int | This stage |
|
||||
| tokens_out | int | This stage |
|
||||
| latency_ms | int | This stage |
|
||||
|
||||
### Score
|
||||
| Field | Type | Notes |
|
||||
|-------|------|-------|
|
||||
| id | UUID | PK |
|
||||
| run_id | UUID | FK → Run |
|
||||
| scorer_name | string | e.g. "embedding_similarity", "human_rating" |
|
||||
| value | float | Normalized 0.0–1.0 |
|
||||
| metadata | JSONB | Scorer-specific details |
|
||||
| created_at | timestamp | |
|
||||
|
||||
### ResponseCache
|
||||
| Field | Type | Notes |
|
||||
|-------|------|-------|
|
||||
| config_hash | string(64) | PK — SHA-256 of (prompt + model + params + input) |
|
||||
| response | text | Cached LLM response |
|
||||
| model | string | |
|
||||
| tokens_in | int | |
|
||||
| tokens_out | int | |
|
||||
| latency_ms | int | Original latency |
|
||||
| created_at | timestamp | |
|
||||
|
||||
### WebhookConfig
|
||||
| Field | Type | Notes |
|
||||
|-------|------|-------|
|
||||
| id | UUID | PK |
|
||||
| event_type | string | experiment.complete, new_best_found, budget.exhausted, human_needed |
|
||||
| url | string | Target URL |
|
||||
| headers | JSONB | Optional auth headers |
|
||||
| is_active | bool | |
|
||||
|
||||
---
|
||||
|
||||
## API Endpoints
|
||||
|
||||
### Auth
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| POST | `/api/v1/auth/setup` | First-boot admin password setup |
|
||||
| POST | `/api/v1/auth/login` | Login, returns JWT |
|
||||
| GET | `/api/v1/auth/me` | Current user info |
|
||||
|
||||
### Admin
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/api/v1/admin/settings` | System settings (guest access, default model, etc.) |
|
||||
| PUT | `/api/v1/admin/settings` | Update settings |
|
||||
| GET | `/api/v1/admin/stats` | System-wide stats (total runs, cache hit rate, etc.) |
|
||||
|
||||
### Projects
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/api/v1/projects` | List projects |
|
||||
| POST | `/api/v1/projects` | Create project |
|
||||
| GET | `/api/v1/projects/{id}` | Project detail with experiment summaries |
|
||||
| PUT | `/api/v1/projects/{id}` | Update project |
|
||||
| DELETE | `/api/v1/projects/{id}` | Delete project and all experiments |
|
||||
|
||||
### Experiments
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/api/v1/experiments` | List experiments (filter by project) |
|
||||
| POST | `/api/v1/experiments` | Create experiment |
|
||||
| GET | `/api/v1/experiments/{id}` | Experiment detail with run summaries |
|
||||
| PUT | `/api/v1/experiments/{id}` | Update experiment config |
|
||||
| DELETE | `/api/v1/experiments/{id}` | Delete experiment |
|
||||
| POST | `/api/v1/experiments/{id}/sweep` | Start a sweep (grid, random, or guided) |
|
||||
| POST | `/api/v1/experiments/{id}/pause` | Pause running sweep |
|
||||
| POST | `/api/v1/experiments/{id}/resume` | Resume paused sweep |
|
||||
| POST | `/api/v1/experiments/{id}/stop` | Stop sweep |
|
||||
|
||||
### Runs
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/api/v1/experiments/{id}/runs` | List runs with scores (sortable, filterable) |
|
||||
| GET | `/api/v1/runs/{id}` | Run detail with stage results |
|
||||
| POST | `/api/v1/runs` | Execute a single run (ad-hoc) |
|
||||
| POST | `/api/v1/runs/{id}/score` | Add human rating to a run |
|
||||
| GET | `/api/v1/experiments/{id}/leaderboard` | Top runs ranked by weighted score |
|
||||
|
||||
### Export
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/api/v1/experiments/{id}/export/best` | Best config as JSON |
|
||||
| GET | `/api/v1/experiments/{id}/export/env` | Best config as .env snippet |
|
||||
| GET | `/api/v1/experiments/{id}/export/yaml` | Best config as YAML |
|
||||
| GET | `/api/v1/experiments/{id}/export/report` | Full experiment report (markdown) |
|
||||
|
||||
### LLM Endpoints (Target Management)
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/api/v1/endpoints` | List configured LLM endpoints |
|
||||
| POST | `/api/v1/endpoints` | Add endpoint (URL, API key, label) |
|
||||
| PUT | `/api/v1/endpoints/{id}` | Update endpoint |
|
||||
| DELETE | `/api/v1/endpoints/{id}` | Remove endpoint |
|
||||
| POST | `/api/v1/endpoints/{id}/test` | Test connectivity and list available models |
|
||||
|
||||
### Webhooks
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/api/v1/webhooks` | List webhook configs |
|
||||
| POST | `/api/v1/webhooks` | Create webhook |
|
||||
| DELETE | `/api/v1/webhooks/{id}` | Remove webhook |
|
||||
|
||||
### WebSocket
|
||||
| Path | Description |
|
||||
|------|-------------|
|
||||
| `/ws/experiments/{id}` | Live stream: run progress, scores, stage completions |
|
||||
| `/ws/dashboard` | Global activity feed across all experiments |
|
||||
|
||||
### Health
|
||||
| Method | Path | Description |
|
||||
|--------|------|-------------|
|
||||
| GET | `/health` | Health check (DB + Redis connectivity) |
|
||||
|
||||
---
|
||||
|
||||
## MCP Server
|
||||
|
||||
PromptLooper exposes an MCP (Model Context Protocol) server so AI agents can drive it programmatically. The MCP server runs as part of the API service.
|
||||
|
||||
### MCP Tools
|
||||
|
||||
| Tool | Description |
|
||||
|------|-------------|
|
||||
| `create_project` | Create a new project workspace |
|
||||
| `create_experiment` | Define an experiment with sample data, stages, and scoring |
|
||||
| `configure_endpoint` | Add or update an LLM target endpoint |
|
||||
| `run_single` | Execute one specific configuration and return results |
|
||||
| `run_sweep` | Start a parameter sweep (grid/random/guided) |
|
||||
| `get_leaderboard` | Get top N configurations ranked by score |
|
||||
| `get_run_detail` | Get full details of a specific run |
|
||||
| `export_best_config` | Export the best configuration in JSON/YAML/env format |
|
||||
| `pause_sweep` | Pause a running sweep |
|
||||
| `resume_sweep` | Resume a paused sweep |
|
||||
| `add_human_score` | Rate a run's output |
|
||||
| `get_experiment_status` | Check experiment progress |
|
||||
| `list_models` | List available models across all configured endpoints |
|
||||
|
||||
### Example Agent Interaction
|
||||
|
||||
```
|
||||
Agent: "Create a project called 'Chrysopedia Extraction' and an experiment
|
||||
that tests the stage3_extraction prompt against Qwen-72B and Qwen-32B,
|
||||
sweeping temperature from 0.1 to 0.9 in 0.2 increments.
|
||||
Use embedding similarity scoring against these reference outputs.
|
||||
Run a grid sweep."
|
||||
|
||||
PromptLooper MCP: [create_project] → [create_experiment] → [run_sweep]
|
||||
→ streams progress → [get_leaderboard]
|
||||
|
||||
Agent: "The top config uses Qwen-72B at temperature 0.3. Export it as
|
||||
a .env snippet I can drop into Chrysopedia."
|
||||
|
||||
PromptLooper MCP: [export_best_config format=env]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Response Caching
|
||||
|
||||
Every LLM call is cached by a SHA-256 hash of:
|
||||
- Prompt text (after template rendering)
|
||||
- Model identifier
|
||||
- All inference parameters (temperature, top_p, max_tokens, etc.)
|
||||
- Input data
|
||||
|
||||
If an identical configuration has been run before, the cached response is returned instantly with `status: cached`. This means:
|
||||
- Re-running experiments with new scoring functions costs zero tokens
|
||||
- Adding a new scorer retroactively evaluates all historical runs
|
||||
- Accidentally re-running a sweep wastes nothing
|
||||
- Cache can be invalidated per-run or per-experiment if needed
|
||||
|
||||
---
|
||||
|
||||
## Authentication Model
|
||||
|
||||
### First Boot
|
||||
- App detects no users exist
|
||||
- Presents a setup screen: create admin username + password
|
||||
- Admin account is created, user is logged in
|
||||
|
||||
### Guest Access
|
||||
- Admin can toggle `allow_guest_access` in settings
|
||||
- Guests can view experiments and results (read-only)
|
||||
- Guests cannot create experiments, run sweeps, or modify configs
|
||||
- Default: guest access disabled
|
||||
|
||||
### API Authentication
|
||||
- JWT tokens for the web UI
|
||||
- API key (generated in admin settings) for programmatic access and MCP
|
||||
- API key passed via `Authorization: Bearer <key>` header
|
||||
|
||||
---
|
||||
|
||||
## Real-Time Observability Dashboard
|
||||
|
||||
The dashboard is the primary user interface during active experimentation. It provides:
|
||||
|
||||
### Live Experiment View
|
||||
- Progress bar: X of Y runs completed
|
||||
- Token usage accumulator (running total)
|
||||
- Cost estimate (based on configured model pricing)
|
||||
- Cache hit rate for current sweep
|
||||
- Estimated time remaining
|
||||
|
||||
### Side-by-Side Output Comparison
|
||||
- Pick any two runs and diff their outputs
|
||||
- Highlight differences in prompt, parameters, and response
|
||||
- Score comparison overlay
|
||||
|
||||
### Leaderboard
|
||||
- Real-time ranked list of runs by weighted score
|
||||
- Sortable by any individual scorer
|
||||
- Click to expand full run detail
|
||||
|
||||
### Steering Controls
|
||||
- **Pause**: Stop the sweep after current run completes
|
||||
- **Fork**: Create a new experiment branching from current best, with modified parameters
|
||||
- **Redirect**: Change remaining sweep parameters mid-flight
|
||||
- **Approve**: Mark a configuration as "good enough" and export
|
||||
- **Reject**: Exclude a run from leaderboard consideration
|
||||
|
||||
### Activity Timeline
|
||||
- Chronological feed of events: run started, run completed, new best found, cache hit, error
|
||||
- Filterable by event type
|
||||
|
||||
---
|
||||
|
||||
## Webhook Events
|
||||
|
||||
| Event | Payload | Trigger |
|
||||
|-------|---------|---------|
|
||||
| `experiment.started` | experiment_id, sweep config | Sweep begins |
|
||||
| `experiment.completed` | experiment_id, best config, summary stats | All runs finished |
|
||||
| `experiment.paused` | experiment_id, reason | Manual or budget pause |
|
||||
| `new_best_found` | experiment_id, run_id, scores, config | New top-scoring run |
|
||||
| `budget.exhausted` | experiment_id, token_count, cost | Token/cost budget hit |
|
||||
| `human_needed` | experiment_id, reason, context | Agent requests human review |
|
||||
| `run.failed` | run_id, error | Individual run error |
|
||||
|
||||
---
|
||||
|
||||
## Configuration Export Formats
|
||||
|
||||
### JSON
|
||||
```json
|
||||
{
|
||||
"model": "qwen2.5-72b-instruct",
|
||||
"endpoint": "http://chat.forgetyour.name/api",
|
||||
"temperature": 0.3,
|
||||
"top_p": 0.85,
|
||||
"max_tokens": 2048,
|
||||
"system_prompt": "You are a music production knowledge extractor...",
|
||||
"score": 0.87,
|
||||
"experiment": "chrysopedia-extraction-v2",
|
||||
"exported_at": "2026-04-06T12:00:00Z"
|
||||
}
|
||||
```
|
||||
|
||||
### .env
|
||||
```bash
|
||||
LLM_MODEL=qwen2.5-72b-instruct
|
||||
LLM_API_URL=http://chat.forgetyour.name/api
|
||||
LLM_TEMPERATURE=0.3
|
||||
LLM_TOP_P=0.85
|
||||
LLM_MAX_TOKENS=2048
|
||||
# Score: 0.87 | Experiment: chrysopedia-extraction-v2
|
||||
```
|
||||
|
||||
### YAML
|
||||
```yaml
|
||||
model: qwen2.5-72b-instruct
|
||||
endpoint: http://chat.forgetyour.name/api
|
||||
parameters:
|
||||
temperature: 0.3
|
||||
top_p: 0.85
|
||||
max_tokens: 2048
|
||||
system_prompt: |
|
||||
You are a music production knowledge extractor...
|
||||
metadata:
|
||||
score: 0.87
|
||||
experiment: chrysopedia-extraction-v2
|
||||
exported_at: 2026-04-06T12:00:00Z
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Environment Variables
|
||||
|
||||
| Group | Variable | Default | Notes |
|
||||
|-------|----------|---------|-------|
|
||||
| **Database** | `DATABASE_URL` | (none → SQLite) | PostgreSQL connection string |
|
||||
| **Redis** | `REDIS_URL` | (none → in-process) | Redis connection string |
|
||||
| **Server** | `HOST` | `0.0.0.0` | Bind address |
|
||||
| **Server** | `PORT` | `8400` | HTTP port |
|
||||
| **Auth** | `JWT_SECRET` | (auto-generated) | JWT signing key |
|
||||
| **Auth** | `API_KEY` | (none) | Static API key for programmatic access |
|
||||
| **Defaults** | `DEFAULT_ENDPOINT_URL` | (none) | Pre-configured LLM endpoint |
|
||||
| **Defaults** | `DEFAULT_ENDPOINT_KEY` | (none) | API key for default endpoint |
|
||||
| **Limits** | `MAX_CONCURRENT_RUNS` | `4` | Parallel run limit |
|
||||
| **Limits** | `MAX_TOKENS_PER_SWEEP` | `0` (unlimited) | Token budget per sweep |
|
||||
| **Storage** | `DATA_DIR` | `/data` | SQLite DB + file storage location |
|
||||
| **MCP** | `MCP_ENABLED` | `true` | Enable MCP server |
|
||||
| **MCP** | `MCP_PORT` | `8401` | MCP server port |
|
||||
|
||||
---
|
||||
|
||||
## Docker Compose (Production — XPLTD Conventions)
|
||||
|
||||
Project name: `xpltd_promptlooper`
|
||||
Network: `promptlooper` (`172.33.0.0/24`)
|
||||
Persistent data: `/vmPool/r/services/promptlooper_*`
|
||||
PostgreSQL port: `5434` (external)
|
||||
Web UI port: `8400` (external)
|
||||
|
||||
---
|
||||
|
||||
## Technology Stack
|
||||
|
||||
| Layer | Technology | Rationale |
|
||||
|-------|-----------|-----------|
|
||||
| **API** | Python 3.12 + FastAPI | Async, OpenAPI auto-gen, matches XPLTD conventions |
|
||||
| **Task Queue** | Celery + Redis | Proven for background job execution, matches Chrysopedia |
|
||||
| **Database** | PostgreSQL 16 (prod) / SQLite (single-container) | JSONB for flexible experiment configs |
|
||||
| **Real-time** | WebSocket via FastAPI + Redis pub/sub | Sub-second dashboard updates |
|
||||
| **Frontend** | React 18 + TypeScript + Vite | Real-time dashboard, matches Chrysopedia |
|
||||
| **Styling** | Tailwind CSS | Fast iteration, utility-first |
|
||||
| **MCP** | Python MCP SDK | Standard protocol for agent integration |
|
||||
| **Container** | Multi-stage Docker build | Single image serves both API and frontend |
|
||||
|
||||
---
|
||||
|
||||
## Development & Deployment
|
||||
|
||||
### Local Development
|
||||
```bash
|
||||
git clone git@git.xpltd.co:xpltdco/promptlooper.git
|
||||
cd promptlooper
|
||||
cp .env.example .env
|
||||
docker compose up -d promptlooper-db promptlooper-redis
|
||||
cd backend && pip install -r requirements.txt
|
||||
alembic upgrade head
|
||||
uvicorn main:app --reload --host 0.0.0.0 --port 8000
|
||||
# In another terminal:
|
||||
cd frontend && npm install && npm run dev
|
||||
```
|
||||
|
||||
### Production Deployment (ub01)
|
||||
```bash
|
||||
ssh ub01
|
||||
cd /vmPool/r/repos/xpltdco/promptlooper
|
||||
git pull && docker compose build && docker compose up -d
|
||||
```
|
||||
|
||||
### Project Structure
|
||||
```
|
||||
promptlooper/
|
||||
├── backend/
|
||||
│ ├── main.py # FastAPI entry point
|
||||
│ ├── config.py # Pydantic Settings
|
||||
│ ├── models.py # SQLAlchemy ORM
|
||||
│ ├── schemas.py # Pydantic request/response
|
||||
│ ├── auth.py # JWT + API key auth
|
||||
│ ├── worker.py # Celery app config
|
||||
│ ├── routers/
|
||||
│ │ ├── auth.py
|
||||
│ │ ├── projects.py
|
||||
│ │ ├── experiments.py
|
||||
│ │ ├── runs.py
|
||||
│ │ ├── endpoints.py
|
||||
│ │ ├── export.py
|
||||
│ │ ├── webhooks.py
|
||||
│ │ └── admin.py
|
||||
│ ├── engine/
|
||||
│ │ ├── runner.py # Run execution logic
|
||||
│ │ ├── sweep.py # Sweep orchestration
|
||||
│ │ ├── cache.py # Response cache layer
|
||||
│ │ ├── adapters/ # LLM endpoint adapters
|
||||
│ │ │ ├── openai_compat.py
|
||||
│ │ │ └── base.py
|
||||
│ │ └── scorers/ # Pluggable scoring functions
|
||||
│ │ ├── embedding.py
|
||||
│ │ ├── format.py
|
||||
│ │ ├── keyword.py
|
||||
│ │ ├── llm_judge.py
|
||||
│ │ └── base.py
|
||||
│ ├── mcp/
|
||||
│ │ ├── server.py # MCP server implementation
|
||||
│ │ └── tools.py # MCP tool definitions
|
||||
│ ├── websocket/
|
||||
│ │ └── manager.py # WebSocket connection management
|
||||
│ └── tests/
|
||||
├── frontend/
|
||||
│ └── src/
|
||||
│ ├── pages/
|
||||
│ │ ├── Setup.tsx # First-boot admin setup
|
||||
│ │ ├── Login.tsx
|
||||
│ │ ├── Dashboard.tsx # Global activity
|
||||
│ │ ├── Projects.tsx
|
||||
│ │ ├── Experiment.tsx # Experiment builder + config
|
||||
│ │ ├── Live.tsx # Real-time observability
|
||||
│ │ ├── Compare.tsx # Side-by-side run comparison
|
||||
│ │ └── Admin.tsx # System settings
|
||||
│ ├── components/
|
||||
│ │ ├── Leaderboard.tsx
|
||||
│ │ ├── SteeringControls.tsx
|
||||
│ │ ├── RunCard.tsx
|
||||
│ │ ├── ScoreChart.tsx
|
||||
│ │ └── Timeline.tsx
|
||||
│ └── api/
|
||||
├── docker/
|
||||
│ ├── Dockerfile # Multi-stage: API + frontend
|
||||
│ └── nginx.conf
|
||||
├── alembic/
|
||||
├── docker-compose.yml
|
||||
├── .env.example
|
||||
├── CLAUDE.md
|
||||
└── README.md
|
||||
```
|
||||
Loading…
Add table
Reference in a new issue