Build the full Experiment Builder (ExperimentPage.tsx) with: basic info form,
sample data input (text/JSON/file upload), pipeline stage builder with template
variables and preview, scoring configuration with enable toggles and weight
sliders, parameter space definition (fixed/range/options types), and action
buttons (Save Draft, Run Single, Start Sweep). Supports both creating new
experiments and editing existing ones. 20 tests added.
Adds backend/engine/runner.py with run_single() that iterates pipeline stages,
renders Jinja2 prompt templates with stage history context, checks/stores response
cache, calls LLM adapters, runs configured scorers, creates StageResult and Score
records, and publishes progress events via Redis pub/sub or in-process EventBus.
Includes 21 passing tests covering all execution paths.
Add OpenAICompatAdapter that works with any OpenAI-compatible API endpoint
(OpenWebUI, vLLM, Ollama, OpenAI, Anthropic via proxy). Features:
- Async HTTP calls via httpx with configurable timeout
- Chat completions format with system + user messages
- Token usage parsing from responses
- Exponential backoff retries (configurable, default 3 attempts)
- Both streaming (SSE) and non-streaming modes
- Model listing and connection testing
- 21 tests covering construction, request building, response parsing,
retry logic, and error handling
- Full setup form with username, password, confirm password
- Auth detection on mount (redirects if already authenticated)
- Client-side validation (empty username, short password, mismatch)
- Server error handling (409 conflict, network errors)
- Welcoming UI with gradient background, dark mode support
- 9 new tests covering all states and error paths
- Updated App.test.tsx to handle async SetupPage rendering
- Added @testing-library/user-event dependency
Define the LLM adapter interface in backend/engine/adapters/base.py with
async methods complete(), list_models(), and test_connection(). The
AdapterResponse dataclass holds response text, token counts, latency,
model name, and raw metadata. Includes 11 tests covering instantiation
guards, concrete subclass behavior, and dataclass semantics.
Create docker/entrypoint.sh to run alembic migrations on API startup.
Create backend/worker.py with Celery app config for the compose worker service.
Fix README single-container port (8000) and add production compose documentation.
Add 27 tests (stack integration + worker) verifying all Docker/compose artifacts
are present, consistent, and the /health endpoint responds correctly.
Add SetupPage, LoginPage, DashboardPage, ProjectsPage, ExperimentPage, LivePage,
ComparePage, and AdminPage as placeholder components. Wire up react-router-dom routing
in App.tsx with BrowserRouter in main.tsx. Unknown routes redirect to dashboard.
Install vitest + @testing-library/react and add 9 routing tests. Build passes cleanly.
FastAPI application with:
- CORS middleware (permissive for dev)
- /health endpoint checking DB and Redis connectivity
- /ws WebSocket endpoint with ConnectionManager for real-time updates
- Async lifespan hooks for DB engine and Redis init/teardown
- get_db dependency for session management
- Dynamic router mounting that silently skips missing router modules
- 10 tests covering all endpoints and utilities
All 13 environment variables from the spec defined with proper defaults.
SQLite fallback when DATABASE_URL is unset, in-process queue flag when
REDIS_URL is unset, JWT_SECRET auto-generation, empty API_KEY normalization.
13 unit tests covering all configuration paths.
Three-stage Dockerfile: frontend-build (Node 20), api (Python 3.12 + uvicorn),
web (nginx 1.27). nginx.conf proxies /api and /ws to the API service with
WebSocket upgrade support. Includes backend/requirements.txt with all Python
deps, frontend scaffolding (Vite + React + TypeScript + Tailwind), and
placeholder alembic files for Docker COPY compatibility.
Fixed DATABASE_URL to use standard postgresql:// scheme, hardcoded DB
credentials for dev simplicity, added API_KEY pass-through, set worker
working_dir, and made JWT_SECRET optional with dev default. All 5 services:
db (:5434), redis, api (MCP :8401), worker (Celery), web (:8400).
Includes all 13 env vars organized into 7 groups: Database, Redis,
Server, Auth, Default LLM Endpoint, Limits, Storage, and MCP.
Production-only variables are commented out; single-container defaults
work out of the box.
Set up all directories from the spec's Project Structure section:
- backend/ with routers/, engine/adapters/, engine/scorers/, mcp/,
websocket/, tests/ (all with __init__.py)
- frontend/src/ with pages/, components/, api/ (.gitkeep)
- docker/ (.gitkeep)
- alembic/versions/ (.gitkeep)
Add README.md with project description, quick-start instructions, and
AGPL-3.0 license badge. Add .gitignore for Python, Node, and Docker
artifacts. Include existing CLAUDE.md, spec, docker-compose.yml, and
env.example.