# PromptLooper

> The one who loops prompts — a universal LLM pipeline tuning workbench.

PromptLooper is a self-hosted tool for systematically optimizing LLM prompts, model selection, and inference parameters. It runs experiments across prompt × model × parameter combinations, caches every response, scores results against pluggable evaluation functions, and surfaces the best configurations through a real-time observability dashboard with human-in-the-loop steering.

It ships as a single Docker container (SQLite mode) for zero-config quickstart, or a Docker Compose stack (Postgres + Redis) for production use. An MCP server enables any AI agent to drive PromptLooper programmatically — creating experiments, running sweeps, and reading results without human intervention.

---

## Problem Statement

Anyone building LLM-powered applications faces the same painful loop:

1. Write a system prompt
2. Pick a model and parameters (temperature, top_p, max_tokens, etc.)
3. Run it against sample data
4. Read the output and decide if it's "good enough"
5. Tweak something and repeat

This process is manual, unscientific, and wasteful. There's no way to:
- Systematically compare configurations side-by-side
- Know if you've already tested a particular combination
- Quantify "better" beyond gut feeling
- Let an agent handle the iteration while you steer from above
- Share optimized configurations between projects or team members

PromptLooper makes this process systematic, observable, cached, and agent-drivable.

---

## Target Users

| User | Use Case |
|------|----------|
| **Solo developer** | Tuning prompts for a side project, wants to try 5 models and find the sweet spot |
| **Team building RAG pipelines** | Optimizing chunking + embedding + retrieval + synthesis prompts across stages |
| **AI agent (via MCP)** | Autonomously running optimization sweeps, reporting back to human when done |
| **Prompt engineer** | A/B testing prompt variants at scale with quantified scoring |
| **Infrastructure team** | Benchmarking new models against existing baselines before migration |

---

## Core Concepts

### Experiment

A named configuration that defines:
- **Sample data**: Input documents, queries, or any text the pipeline will process
- **Pipeline stages**: 1-N sequential stages, each with its own prompt template and model config
- **Evaluation criteria**: Scoring functions that grade the output
- **Parameter space**: What to vary (prompt text, model, temperature, top_p, chunk_size, etc.)

### Run

A single execution of one specific configuration within an experiment. A run captures:
- Full input configuration (prompt, model, all parameters)
- Raw LLM response(s)
- Timing data (latency, tokens in/out)
- Evaluation scores
- Configuration hash (for cache deduplication)

### Sweep

A batch of runs that systematically explores a parameter space. Types:
- **Grid sweep**: Every combination of specified parameter values
- **Random sweep**: Random sampling from parameter ranges
- **Guided sweep**: Agent-driven, where results from previous runs inform the next configuration to try

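
To make the difference between grid and random sweeps concrete, here is a minimal sketch that expands a parameter space into run configurations. The `parameter_space` layout, the model labels, and the function names are assumptions made for illustration; they are not PromptLooper's actual schema or API.

```python
import itertools
import random

# Hypothetical parameter space: each key maps to the values a sweep may try.
parameter_space = {
    "model": ["qwen-72b", "qwen-32b"],
    "temperature": [0.1, 0.3, 0.5, 0.7, 0.9],
}


def grid_sweep(space):
    """Yield every combination of the listed parameter values."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def random_sweep(space, n, seed=0):
    """Yield n configurations, sampling each parameter independently."""
    rng = random.Random(seed)
    for _ in range(n):
        yield {k: rng.choice(v) for k, v in space.items()}


grid_configs = list(grid_sweep(parameter_space))
random_configs = list(random_sweep(parameter_space, n=3))
print(len(grid_configs))  # 2 models x 5 temperatures = 10
```

A guided sweep would replace the sampling step with a decision made by an agent after inspecting earlier scores, so it is not sketched here.
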
### Scoring Function

A pluggable evaluation that takes (input, output, context) and returns a numeric score. Built-in options:
- **Embedding similarity**: How semantically close is the output to a reference answer?
- **Length compliance**: Does the output meet length constraints?
- **Format compliance**: Does the output match expected structure (JSON, markdown, etc.)?
- **Keyword presence**: Do required terms appear in the output?
- **Human rating**: Manual thumbs-up/down or 1-5 star rating from the dashboard
- **LLM-as-judge**: Use a separate LLM call to evaluate quality (configurable judge prompt)
- **Custom function**: User-provided Python snippet or HTTP webhook

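
As a rough sketch of the `(input, output, context)` shape described above, the two functions below approximate keyword presence and length compliance and return values in the 0.0 to 1.0 range. The function names and the `context` keys (`required_keywords`, `min_words`, `max_words`) are hypothetical; the real scorer interface may differ.

```python
def keyword_presence(input_text, output_text, context):
    """Fraction of required keywords found in the output (case-insensitive)."""
    required = context.get("required_keywords", [])
    if not required:
        return 1.0
    lowered = output_text.lower()
    hits = sum(1 for kw in required if kw.lower() in lowered)
    return hits / len(required)


def length_compliance(input_text, output_text, context):
    """1.0 if the word count lies within [min_words, max_words], else 0.0."""
    n_words = len(output_text.split())
    lo = context.get("min_words", 0)
    hi = context.get("max_words", float("inf"))
    return 1.0 if lo <= n_words <= hi else 0.0


output = "Sidechain compression ducks the bass under the kick."
ctx = {"required_keywords": ["sidechain", "kick", "reverb"], "max_words": 20}
keyword_score = keyword_presence("", output, ctx)   # 2 of 3 keywords present
length_score = length_compliance("", output, ctx)   # 8 words, within the limit
```
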
### Project

A workspace that groups related experiments. Users can return to a project and pick up where they left off. Projects store:
- All experiments and their runs
- Saved "best" configurations
- Notes and annotations
- Export history

---

## Architecture

```
┌─────────────────────────────────────────────────────────────────────────┐
│  Docker Compose: xpltd_promptlooper (ub01)                              │
│  Network: promptlooper (172.33.0.0/24)                                  │
│                                                                         │
│  ┌────────────┐  ┌─────────────┐  ┌──────────────────────────────────┐  │
│  │ PostgreSQL │  │ Redis       │  │ FastAPI (API)                    │  │
│  │ :5434      │  │ job queue   │  │ Experiments, Runs, Scoring,      │  │
│  │ experiments│  │ pub/sub     │  │ Projects, Auth, MCP Server       │  │
│  │ runs, cache│  │ live state  │  │ WebSocket for live dashboard     │  │
│  └─────┬──────┘  └──────┬──────┘  └──────────────┬───────────────────┘  │
│        │                │                        │                      │
│  ┌─────┴────────────────┴────────────────────────┴───────────────────┐  │
│  │ Celery Worker                                                     │  │
│  │ Executes runs against target LLM endpoints                        │  │
│  │ Caches responses by config hash                                   │  │
│  │ Streams progress via Redis pub/sub                                │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ Web UI (React + Vite)                                             │  │
│  │ nginx → :8400                                                     │  │
│  │ Dashboard, Experiment Builder, Live Observability, Steering       │  │
│  └───────────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────┘
                                     │
                                     │ HTTP (OpenAI-compatible)
                                     ▼
                     ┌───────────────────────────────┐
                     │ Target LLM Endpoints          │
                     │ OpenWebUI, vLLM, Ollama,      │
                     │ OpenAI, Anthropic, any        │
                     │ OpenAI-compatible API         │
                     └───────────────────────────────┘
```

### Services (Production Compose)

| Service | Image | Port | Purpose |
|---------|-------|------|---------|
| `promptlooper-db` | `postgres:16-alpine` | `5434 → 5432` | Primary data store |
| `promptlooper-redis` | `redis:7-alpine` | — | Celery broker + pub/sub for live dashboard |
| `promptlooper-api` | `Dockerfile` | `8000` | FastAPI REST API + MCP server |
| `promptlooper-worker` | `Dockerfile` | — | Celery worker (run execution) |
| `promptlooper-web` | `Dockerfile` | `8400 → 80` | React frontend (nginx) |

### Single Container Mode

When `DATABASE_URL` is not set, PromptLooper runs with:
- SQLite at `/data/promptlooper.db`
- In-process task queue (no Celery/Redis dependency)
- All services in one container on port 8400

```bash
docker run -p 8400:8400 -v promptlooper-data:/data ghcr.io/xpltdco/promptlooper
```

---

## Data Model

### User
| Field | Type | Notes |
|-------|------|-------|
| id | UUID | PK |
| username | string | Unique, "admin" created on first boot |
| password_hash | string | bcrypt |
| is_admin | bool | Default true for first user |
| created_at | timestamp | |

### Project
| Field | Type | Notes |
|-------|------|-------|
| id | UUID | PK |
| name | string | |
| description | text | Optional |
| owner_id | UUID | FK → User |
| created_at | timestamp | |
| updated_at | timestamp | |

### Experiment
| Field | Type | Notes |
|-------|------|-------|
| id | UUID | PK |
| project_id | UUID | FK → Project |
| name | string | |
| description | text | Optional |
| sample_data | JSONB | Input documents/queries |
| pipeline_stages | JSONB | Stage definitions with prompt templates |
| scoring_config | JSONB | Which scoring functions to use and their weights |
| parameter_space | JSONB | What to vary and ranges/options |
| status | enum | draft, running, paused, completed |
| created_at | timestamp | |
| updated_at | timestamp | |

### Run
| Field | Type | Notes |
|-------|------|-------|
| id | UUID | PK |
| experiment_id | UUID | FK → Experiment |
| config_hash | string(64) | SHA-256 of full configuration (for cache dedup) |
| config | JSONB | Complete configuration snapshot |
| status | enum | pending, running, completed, failed, cached |
| started_at | timestamp | |
| completed_at | timestamp | |
| duration_ms | int | Wall clock time |
| tokens_in | int | Total input tokens across all stages |
| tokens_out | int | Total output tokens |
| cost_estimate | decimal | Estimated cost based on model pricing |

### StageResult
| Field | Type | Notes |
|-------|------|-------|
| id | UUID | PK |
| run_id | UUID | FK → Run |
| stage_index | int | 0-based stage number |
| prompt_sent | text | Actual prompt after template rendering |
| response_raw | text | Raw LLM response |
| model_used | string | Model identifier |
| parameters | JSONB | Temperature, top_p, etc. |
| tokens_in | int | This stage |
| tokens_out | int | This stage |
| latency_ms | int | This stage |

### Score
| Field | Type | Notes |
|-------|------|-------|
| id | UUID | PK |
| run_id | UUID | FK → Run |
| scorer_name | string | e.g. "embedding_similarity", "human_rating" |
| value | float | Normalized 0.0–1.0 |
| metadata | JSONB | Scorer-specific details |
| created_at | timestamp | |

### ResponseCache
| Field | Type | Notes |
|-------|------|-------|
| config_hash | string(64) | PK — SHA-256 of (prompt + model + params + input) |
| response | text | Cached LLM response |
| model | string | |
| tokens_in | int | |
| tokens_out | int | |
| latency_ms | int | Original latency |
| created_at | timestamp | |

### WebhookConfig
| Field | Type | Notes |
|-------|------|-------|
| id | UUID | PK |
| event_type | string | e.g. experiment.completed, new_best_found, budget.exhausted, human_needed |
| url | string | Target URL |
| headers | JSONB | Optional auth headers |
| is_active | bool | |

---

## API Endpoints

### Auth
| Method | Path | Description |
|--------|------|-------------|
| POST | `/api/v1/auth/setup` | First-boot admin password setup |
| POST | `/api/v1/auth/login` | Login, returns JWT |
| GET | `/api/v1/auth/me` | Current user info |

### Admin
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/admin/settings` | System settings (guest access, default model, etc.) |
| PUT | `/api/v1/admin/settings` | Update settings |
| GET | `/api/v1/admin/stats` | System-wide stats (total runs, cache hit rate, etc.) |

### Projects
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/projects` | List projects |
| POST | `/api/v1/projects` | Create project |
| GET | `/api/v1/projects/{id}` | Project detail with experiment summaries |
| PUT | `/api/v1/projects/{id}` | Update project |
| DELETE | `/api/v1/projects/{id}` | Delete project and all experiments |

### Experiments
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/experiments` | List experiments (filter by project) |
| POST | `/api/v1/experiments` | Create experiment |
| GET | `/api/v1/experiments/{id}` | Experiment detail with run summaries |
| PUT | `/api/v1/experiments/{id}` | Update experiment config |
| DELETE | `/api/v1/experiments/{id}` | Delete experiment |
| POST | `/api/v1/experiments/{id}/sweep` | Start a sweep (grid, random, or guided) |
| POST | `/api/v1/experiments/{id}/pause` | Pause running sweep |
| POST | `/api/v1/experiments/{id}/resume` | Resume paused sweep |
| POST | `/api/v1/experiments/{id}/stop` | Stop sweep |

### Runs
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/experiments/{id}/runs` | List runs with scores (sortable, filterable) |
| GET | `/api/v1/runs/{id}` | Run detail with stage results |
| POST | `/api/v1/runs` | Execute a single run (ad-hoc) |
| POST | `/api/v1/runs/{id}/score` | Add human rating to a run |
| GET | `/api/v1/experiments/{id}/leaderboard` | Top runs ranked by weighted score |

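
The leaderboard ranks runs by a weighted score, and `scoring_config` stores the scorer weights. The exact formula is not specified here, so the sketch below assumes a plain weighted mean of normalized scorer values; `weighted_score`, `leaderboard`, and the sample data are hypothetical.

```python
def weighted_score(scores, weights):
    """Weighted mean of scorer values; scorers without a weight count as 0."""
    total_weight = sum(weights.get(name, 0.0) for name in scores)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(value * weights.get(name, 0.0) for name, value in scores.items())
    return weighted_sum / total_weight


def leaderboard(runs, weights, top_n=10):
    """Return (run_id, weighted score) pairs, best first."""
    ranked = [(run_id, weighted_score(scores, weights)) for run_id, scores in runs.items()]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


weights = {"embedding_similarity": 0.75, "format_compliance": 0.25}
runs = {
    "run-a": {"embedding_similarity": 0.8, "format_compliance": 1.0},
    "run-b": {"embedding_similarity": 0.9, "format_compliance": 0.0},
}
board = leaderboard(runs, weights)
```

With these weights, `run-a` ranks first even though `run-b` has the higher similarity, because `run-b` fails format compliance entirely.
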
### Export
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/experiments/{id}/export/best` | Best config as JSON |
| GET | `/api/v1/experiments/{id}/export/env` | Best config as .env snippet |
| GET | `/api/v1/experiments/{id}/export/yaml` | Best config as YAML |
| GET | `/api/v1/experiments/{id}/export/report` | Full experiment report (markdown) |

### LLM Endpoints (Target Management)
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/endpoints` | List configured LLM endpoints |
| POST | `/api/v1/endpoints` | Add endpoint (URL, API key, label) |
| PUT | `/api/v1/endpoints/{id}` | Update endpoint |
| DELETE | `/api/v1/endpoints/{id}` | Remove endpoint |
| POST | `/api/v1/endpoints/{id}/test` | Test connectivity and list available models |

### Webhooks
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/webhooks` | List webhook configs |
| POST | `/api/v1/webhooks` | Create webhook |
| DELETE | `/api/v1/webhooks/{id}` | Remove webhook |

### WebSocket
| Path | Description |
|------|-------------|
| `/ws/experiments/{id}` | Live stream: run progress, scores, stage completions |
| `/ws/dashboard` | Global activity feed across all experiments |

### Health
| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Health check (DB + Redis connectivity) |

---

## MCP Server

PromptLooper exposes an MCP (Model Context Protocol) server so AI agents can drive it programmatically. The MCP server runs as part of the API service.

### MCP Tools

| Tool | Description |
|------|-------------|
| `create_project` | Create a new project workspace |
| `create_experiment` | Define an experiment with sample data, stages, and scoring |
| `configure_endpoint` | Add or update an LLM target endpoint |
| `run_single` | Execute one specific configuration and return results |
| `run_sweep` | Start a parameter sweep (grid/random/guided) |
| `get_leaderboard` | Get top N configurations ranked by score |
| `get_run_detail` | Get full details of a specific run |
| `export_best_config` | Export the best configuration in JSON/YAML/env format |
| `pause_sweep` | Pause a running sweep |
| `resume_sweep` | Resume a paused sweep |
| `add_human_score` | Rate a run's output |
| `get_experiment_status` | Check experiment progress |
| `list_models` | List available models across all configured endpoints |

### Example Agent Interaction

```
Agent: "Create a project called 'Chrysopedia Extraction' and an experiment
        that tests the stage3_extraction prompt against Qwen-72B and Qwen-32B,
        sweeping temperature from 0.1 to 0.9 in 0.2 increments.
        Use embedding similarity scoring against these reference outputs.
        Run a grid sweep."

PromptLooper MCP: [create_project] → [create_experiment] → [run_sweep]
                  → streams progress → [get_leaderboard]

Agent: "The top config uses Qwen-72B at temperature 0.3. Export it as
        a .env snippet I can drop into Chrysopedia."

PromptLooper MCP: [export_best_config format=env]
```

---

## Response Caching

Every LLM call is cached by a SHA-256 hash of:
- Prompt text (after template rendering)
- Model identifier
- All inference parameters (temperature, top_p, max_tokens, etc.)
- Input data

If an identical configuration has been run before, the cached response is returned instantly with `status: cached`. This means:
- Re-running experiments with new scoring functions costs zero tokens
- Adding a new scorer retroactively evaluates all historical runs
- Accidentally re-running a sweep wastes nothing
- Cache can be invalidated per-run or per-experiment if needed

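
One way the cache key and lookup could work is sketched below. The canonical JSON encoding, the in-memory dictionary standing in for the `ResponseCache` table, and the `cached_call` / `fake_llm` helpers are all assumptions for illustration; only the hashed inputs and the use of SHA-256 come from the description above.

```python
import hashlib
import json


def config_hash(prompt, model, params, input_data):
    """SHA-256 over a canonical JSON encoding of everything that affects the response."""
    payload = json.dumps(
        {"prompt": prompt, "model": model, "params": params, "input": input_data},
        sort_keys=True,            # key order must not change the hash
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


_cache = {}  # stands in for the ResponseCache table


def cached_call(prompt, model, params, input_data, call_llm):
    """Return (response, status); call_llm runs only on a cache miss."""
    key = config_hash(prompt, model, params, input_data)
    if key in _cache:
        return _cache[key], "cached"
    response = call_llm(prompt, model, params, input_data)
    _cache[key] = response
    return response, "completed"


calls = []


def fake_llm(prompt, model, params, input_data):
    """Test double that records each real call."""
    calls.append(model)
    return f"echo: {input_data}"


first = cached_call("Summarize:", "model-x", {"temperature": 0.3, "top_p": 0.85}, "doc", fake_llm)
second = cached_call("Summarize:", "model-x", {"top_p": 0.85, "temperature": 0.3}, "doc", fake_llm)
```
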

---

## Authentication Model

### First Boot
- App detects no users exist
- Presents a setup screen: create admin username + password
- Admin account is created, user is logged in

### Guest Access
- Admin can toggle `allow_guest_access` in settings
- Guests can view experiments and results (read-only)
- Guests cannot create experiments, run sweeps, or modify configs
- Default: guest access disabled

### API Authentication
- JWT tokens for the web UI
- API key (generated in admin settings) for programmatic access and MCP
- API key passed via `Authorization: Bearer <key>` header

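
For programmatic access, a client only needs to attach the API key as a bearer token. The sketch below builds such a request with the standard library without sending it; `BASE_URL`, the placeholder key, and `build_request` are assumptions, and the host and port depend on how the instance is deployed.

```python
import urllib.request

BASE_URL = "http://localhost:8400"   # assumed address of a local instance
API_KEY = "replace-with-your-key"    # placeholder; real keys come from admin settings


def build_request(path, api_key=API_KEY):
    """Build (but do not send) an authenticated GET request."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


req = build_request("/api/v1/projects")
# urllib.request.urlopen(req) would send it to a running instance.
```
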

---

## Real-Time Observability Dashboard

The dashboard is the primary user interface during active experimentation. It provides:

### Live Experiment View
- Progress bar: X of Y runs completed
- Token usage accumulator (running total)
- Cost estimate (based on configured model pricing)
- Cache hit rate for current sweep
- Estimated time remaining

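
The live counters above can be derived from a handful of running totals. The following is a minimal sketch under the assumption that the time estimate is a simple average of elapsed time per completed run; `sweep_progress` and its field names are hypothetical.

```python
def sweep_progress(completed, total, cache_hits, elapsed_s):
    """Summarize sweep progress from simple counters."""
    remaining = total - completed
    hit_rate = cache_hits / completed if completed else 0.0
    # Naive ETA: average wall-clock time per completed run times runs left.
    eta_s = (elapsed_s / completed) * remaining if completed else None
    return {
        "percent_done": 100 * completed / total if total else 0.0,
        "cache_hit_rate": hit_rate,
        "eta_s": eta_s,
    }


stats = sweep_progress(completed=20, total=50, cache_hits=5, elapsed_s=60.0)
```

Because cached runs finish almost instantly, an estimate of this kind would run optimistic or pessimistic depending on how many of the remaining runs are cache hits.
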
### Side-by-Side Output Comparison
- Pick any two runs and diff their outputs
- Highlight differences in prompt, parameters, and response
- Score comparison overlay

### Leaderboard
- Real-time ranked list of runs by weighted score
- Sortable by any individual scorer
- Click to expand full run detail

### Steering Controls
- **Pause**: Stop the sweep after current run completes
- **Fork**: Create a new experiment branching from current best, with modified parameters
- **Redirect**: Change remaining sweep parameters mid-flight
- **Approve**: Mark a configuration as "good enough" and export
- **Reject**: Exclude a run from leaderboard consideration

### Activity Timeline
- Chronological feed of events: run started, run completed, new best found, cache hit, error
- Filterable by event type

---

## Webhook Events

| Event | Payload | Trigger |
|-------|---------|---------|
| `experiment.started` | experiment_id, sweep config | Sweep begins |
| `experiment.completed` | experiment_id, best config, summary stats | All runs finished |
| `experiment.paused` | experiment_id, reason | Manual or budget pause |
| `new_best_found` | experiment_id, run_id, scores, config | New top-scoring run |
| `budget.exhausted` | experiment_id, token_count, cost | Token/cost budget hit |
| `human_needed` | experiment_id, reason, context | Agent requests human review |
| `run.failed` | run_id, error | Individual run error |

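
As an illustration of how an event could be matched against a `WebhookConfig` row and turned into an HTTP request, consider the sketch below. The `{"event": ..., "data": ...}` envelope, the `build_webhook_request` helper, and the example URL are assumptions; the actual delivery format is not specified in this document.

```python
import json


def build_webhook_request(webhook, event_type, payload):
    """Return (url, headers, body) for a matching active webhook, else None."""
    if not webhook["is_active"] or webhook["event_type"] != event_type:
        return None
    # Merge the optional auth headers stored on the webhook config.
    headers = {"Content-Type": "application/json", **(webhook.get("headers") or {})}
    body = json.dumps({"event": event_type, "data": payload})
    return webhook["url"], headers, body


webhook = {
    "event_type": "new_best_found",
    "url": "https://example.com/hooks/promptlooper",
    "headers": {"X-Token": "secret"},
    "is_active": True,
}
delivery = build_webhook_request(
    webhook, "new_best_found", {"experiment_id": "exp-1", "run_id": "run-a"}
)
skipped = build_webhook_request(webhook, "run.failed", {"run_id": "run-b", "error": "timeout"})
```
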

---

## Configuration Export Formats

### JSON
```json
{
  "model": "qwen2.5-72b-instruct",
  "endpoint": "http://chat.forgetyour.name/api",
  "temperature": 0.3,
  "top_p": 0.85,
  "max_tokens": 2048,
  "system_prompt": "You are a music production knowledge extractor...",
  "score": 0.87,
  "experiment": "chrysopedia-extraction-v2",
  "exported_at": "2026-04-06T12:00:00Z"
}
```

### .env
```bash
LLM_MODEL=qwen2.5-72b-instruct
LLM_API_URL=http://chat.forgetyour.name/api
LLM_TEMPERATURE=0.3
LLM_TOP_P=0.85
LLM_MAX_TOKENS=2048
# Score: 0.87 | Experiment: chrysopedia-extraction-v2
```

### YAML
```yaml
model: qwen2.5-72b-instruct
endpoint: http://chat.forgetyour.name/api
parameters:
  temperature: 0.3
  top_p: 0.85
  max_tokens: 2048
system_prompt: |
  You are a music production knowledge extractor...
metadata:
  score: 0.87
  experiment: chrysopedia-extraction-v2
  exported_at: 2026-04-06T12:00:00Z
```

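
The three formats carry the same fields, so converting between them is mostly a matter of renaming keys. A minimal sketch of the JSON-to-.env mapping follows, assuming the key names shown in the examples above; `to_env_snippet` is a hypothetical helper, not the exporter's actual code.

```python
def to_env_snippet(config):
    """Render a best-config dict as .env lines, mirroring the .env example."""
    lines = [
        f"LLM_MODEL={config['model']}",
        f"LLM_API_URL={config['endpoint']}",
        f"LLM_TEMPERATURE={config['temperature']}",
        f"LLM_TOP_P={config['top_p']}",
        f"LLM_MAX_TOKENS={config['max_tokens']}",
        # Score and experiment name travel as a comment, not as variables.
        f"# Score: {config['score']} | Experiment: {config['experiment']}",
    ]
    return "\n".join(lines)


best = {
    "model": "qwen2.5-72b-instruct",
    "endpoint": "http://chat.forgetyour.name/api",
    "temperature": 0.3,
    "top_p": 0.85,
    "max_tokens": 2048,
    "score": 0.87,
    "experiment": "chrysopedia-extraction-v2",
}
snippet = to_env_snippet(best)
```
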

---

## Environment Variables

| Group | Variable | Default | Notes |
|-------|----------|---------|-------|
| **Database** | `DATABASE_URL` | (none → SQLite) | PostgreSQL connection string |
| **Redis** | `REDIS_URL` | (none → in-process) | Redis connection string |
| **Server** | `HOST` | `0.0.0.0` | Bind address |
| **Server** | `PORT` | `8400` | HTTP port |
| **Auth** | `JWT_SECRET` | (auto-generated) | JWT signing key |
| **Auth** | `API_KEY` | (none) | Static API key for programmatic access |
| **Defaults** | `DEFAULT_ENDPOINT_URL` | (none) | Pre-configured LLM endpoint |
| **Defaults** | `DEFAULT_ENDPOINT_KEY` | (none) | API key for default endpoint |
| **Limits** | `MAX_CONCURRENT_RUNS` | `4` | Parallel run limit |
| **Limits** | `MAX_TOKENS_PER_SWEEP` | `0` (unlimited) | Token budget per sweep |
| **Storage** | `DATA_DIR` | `/data` | SQLite DB + file storage location |
| **MCP** | `MCP_ENABLED` | `true` | Enable MCP server |
| **MCP** | `MCP_PORT` | `8401` | MCP server port |

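
To show how the defaults in the table interact, here is a standard-library sketch that reads a subset of these variables and derives the deployment mode. The `Settings` class and `load_settings` are hypothetical and cover only some variables; the project layout indicates the real configuration lives in `config.py` using Pydantic Settings.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    database_url: Optional[str]
    redis_url: Optional[str]
    port: int
    max_concurrent_runs: int
    max_tokens_per_sweep: int  # 0 means unlimited
    mcp_enabled: bool

    @property
    def single_container(self) -> bool:
        # No DATABASE_URL means SQLite mode with an in-process task queue.
        return self.database_url is None


def load_settings(env=os.environ):
    """Read settings from a mapping, applying the documented defaults."""
    return Settings(
        database_url=env.get("DATABASE_URL") or None,
        redis_url=env.get("REDIS_URL") or None,
        port=int(env.get("PORT", "8400")),
        max_concurrent_runs=int(env.get("MAX_CONCURRENT_RUNS", "4")),
        max_tokens_per_sweep=int(env.get("MAX_TOKENS_PER_SWEEP", "0")),
        mcp_enabled=env.get("MCP_ENABLED", "true").lower() == "true",
    )


settings = load_settings({"PORT": "9000"})
```
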

---

## Docker Compose (Production — XPLTD Conventions)

- Project name: `xpltd_promptlooper`
- Network: `promptlooper` (`172.33.0.0/24`)
- Persistent data: `/vmPool/r/services/promptlooper_*`
- PostgreSQL port: `5434` (external)
- Web UI port: `8400` (external)

---

## Technology Stack

| Layer | Technology | Rationale |
|-------|-----------|-----------|
| **API** | Python 3.12 + FastAPI | Async, OpenAPI auto-gen, matches XPLTD conventions |
| **Task Queue** | Celery + Redis | Proven for background job execution, matches Chrysopedia |
| **Database** | PostgreSQL 16 (prod) / SQLite (single-container) | JSONB for flexible experiment configs |
| **Real-time** | WebSocket via FastAPI + Redis pub/sub | Sub-second dashboard updates |
| **Frontend** | React 18 + TypeScript + Vite | Real-time dashboard, matches Chrysopedia |
| **Styling** | Tailwind CSS | Fast iteration, utility-first |
| **MCP** | Python MCP SDK | Standard protocol for agent integration |
| **Container** | Multi-stage Docker build | Single image serves both API and frontend |

---

## Development & Deployment

### Local Development
```bash
git clone git@git.xpltd.co:xpltdco/promptlooper.git
cd promptlooper
cp .env.example .env
docker compose up -d promptlooper-db promptlooper-redis
cd backend && pip install -r requirements.txt
alembic upgrade head
uvicorn main:app --reload --host 0.0.0.0 --port 8000
# In another terminal:
cd frontend && npm install && npm run dev
```

### Production Deployment (ub01)
```bash
ssh ub01
cd /vmPool/r/repos/xpltdco/promptlooper
git pull && docker compose build && docker compose up -d
```

### Project Structure
```
promptlooper/
├── backend/
│   ├── main.py                  # FastAPI entry point
│   ├── config.py                # Pydantic Settings
│   ├── models.py                # SQLAlchemy ORM
│   ├── schemas.py               # Pydantic request/response
│   ├── auth.py                  # JWT + API key auth
│   ├── worker.py                # Celery app config
│   ├── routers/
│   │   ├── auth.py
│   │   ├── projects.py
│   │   ├── experiments.py
│   │   ├── runs.py
│   │   ├── endpoints.py
│   │   ├── export.py
│   │   ├── webhooks.py
│   │   └── admin.py
│   ├── engine/
│   │   ├── runner.py            # Run execution logic
│   │   ├── sweep.py             # Sweep orchestration
│   │   ├── cache.py             # Response cache layer
│   │   ├── adapters/            # LLM endpoint adapters
│   │   │   ├── openai_compat.py
│   │   │   └── base.py
│   │   └── scorers/             # Pluggable scoring functions
│   │       ├── embedding.py
│   │       ├── format.py
│   │       ├── keyword.py
│   │       ├── llm_judge.py
│   │       └── base.py
│   ├── mcp/
│   │   ├── server.py            # MCP server implementation
│   │   └── tools.py             # MCP tool definitions
│   ├── websocket/
│   │   └── manager.py           # WebSocket connection management
│   └── tests/
├── frontend/
│   └── src/
│       ├── pages/
│       │   ├── Setup.tsx        # First-boot admin setup
│       │   ├── Login.tsx
│       │   ├── Dashboard.tsx    # Global activity
│       │   ├── Projects.tsx
│       │   ├── Experiment.tsx   # Experiment builder + config
│       │   ├── Live.tsx         # Real-time observability
│       │   ├── Compare.tsx      # Side-by-side run comparison
│       │   └── Admin.tsx        # System settings
│       ├── components/
│       │   ├── Leaderboard.tsx
│       │   ├── SteeringControls.tsx
│       │   ├── RunCard.tsx
│       │   ├── ScoreChart.tsx
│       │   └── Timeline.tsx
│       └── api/
├── docker/
│   ├── Dockerfile               # Multi-stage: API + frontend
│   └── nginx.conf
├── alembic/
├── docker-compose.yml
├── .env.example
├── CLAUDE.md
└── README.md
```