Extract the inline LeaderboardTable from LivePage into a standalone
Leaderboard component with click-to-expand detail rows, sortable
columns, smooth slide-in animation for new entries, and a subtle
glow effect on the best run. 29 tests added.
Created engine/tasks.py with:
- execute_run and execute_sweep Celery tasks registered via autodiscover
- SyncTaskResult class mimicking Celery AsyncResult for in-process mode
- dispatch_run/dispatch_sweep helpers that route to Celery or sync based on config
- Proper async-to-sync bridging for the async engine functions
- 17 tests covering task execution, sync fallback, error handling, and Celery dispatch
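A minimal sketch of how the sync fallback and async-to-sync bridging might look. The attribute set on SyncTaskResult (id, state, ready(), successful(), get()) mirrors a small subset of Celery's AsyncResult; the helper name run_sync and the constructor arguments are hypothetical, not the actual contents of engine/tasks.py.

```python
import asyncio
import uuid
from typing import Any, Awaitable, Callable, Optional


class SyncTaskResult:
    """Hypothetical stand-in exposing a small subset of AsyncResult's surface."""

    def __init__(self, value: Any = None, error: Optional[BaseException] = None):
        self.id = str(uuid.uuid4())
        self._value = value
        self._error = error

    @property
    def state(self) -> str:
        # The work has already finished, so only terminal states are possible.
        return "FAILURE" if self._error is not None else "SUCCESS"

    def ready(self) -> bool:
        return True

    def successful(self) -> bool:
        return self._error is None

    def get(self, timeout: Optional[float] = None) -> Any:
        # timeout is accepted for signature compatibility and ignored.
        if self._error is not None:
            raise self._error
        return self._value


def run_sync(coro_fn: Callable[..., Awaitable[Any]], *args: Any, **kwargs: Any) -> SyncTaskResult:
    """Run an async engine function to completion and wrap the outcome.

    asyncio.run() starts a fresh event loop, so this must not be called
    from code that is already running inside a loop.
    """
    try:
        return SyncTaskResult(value=asyncio.run(coro_fn(*args, **kwargs)))
    except Exception as exc:
        return SyncTaskResult(error=exc)
```

Capturing the exception and re-raising it from get() keeps the calling code identical in both modes: callers check ready()/successful() or call get() without knowing whether Celery was involved.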
Full LivePage implementation with 60/40 split layout:
- Left column: Activity Timeline with color-coded event cards (run.started, run.completed, new_best_found, cache_hit, run.failed), event type filtering, and auto-scroll toggle
- Right column: Leaderboard table with sortable columns, best-run highlighting, and status badges; Steering Controls with pause/resume/stop (with confirmation dialogs), progress bar, token counter, cost estimate, and cache hit rate
- WebSocket integration with exponential backoff reconnect, connection status indicator, and experiment subscription
- 35 tests covering loading/error states, WebSocket events, timeline filtering, leaderboard updates, progress tracking, and steering control interactions
Built standalone PromptEditor with transparent-textarea overlay for syntax
highlighting of Jinja2 expressions, statements, and comments. Includes
clickable variable sidebar for insertion and preview panel with sample data
substitution. Integrated into ExperimentPage PipelineStageCard. 27 tests added.
Adds backend/engine/sweep.py with three sweep strategies:
- GridSweep: exhaustive enumeration of all parameter combinations
- RandomSweep: N random samples from parameter ranges (list, min/max, step)
- GuidedSweep: top-K exploitation + random exploration from previous results
Features: bounded parallelism via asyncio.Semaphore, token budget enforcement,
Redis-based pause/resume/stop control flags, sweep-level event publishing.
36 tests in test_sweep.py covering config generation, helpers, and full sweep execution.
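A rough sketch of grid and random config generation plus semaphore-bounded execution. The function names (grid_configs, random_configs, run_bounded) and the dict-of-lists parameter space are assumptions for illustration; the real sweep.py also handles min/max/step ranges, token budgets, and control flags, which are omitted here.

```python
import asyncio
import itertools
import random
from typing import Any, Awaitable, Callable, Optional


def grid_configs(space: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Enumerate every combination of the listed parameter values."""
    keys = list(space)
    combos = itertools.product(*(space[k] for k in keys))
    return [dict(zip(keys, combo)) for combo in combos]


def random_configs(
    space: dict[str, list[Any]], n: int, seed: Optional[int] = None
) -> list[dict[str, Any]]:
    """Draw n independent samples, one value per parameter."""
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in space.items()} for _ in range(n)]


async def run_bounded(
    configs: list[dict[str, Any]],
    run_one: Callable[[dict[str, Any]], Awaitable[Any]],
    max_parallel: int,
) -> list[Any]:
    """Run all configs, allowing at most max_parallel in flight at once."""
    sem = asyncio.Semaphore(max_parallel)

    async def guarded(cfg: dict[str, Any]) -> Any:
        async with sem:
            return await run_one(cfg)

    # gather() preserves input order in its result list.
    return await asyncio.gather(*(guarded(c) for c in configs))
```

The semaphore caps concurrency without batching, so a slow run does not hold up the next config once any slot frees.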
Build the full Experiment Builder (ExperimentPage.tsx) with: basic info form,
sample data input (text/JSON/file upload), pipeline stage builder with template
variables and preview, scoring configuration with enable toggles and weight
sliders, parameter space definition (fixed/range/options types), and action
buttons (Save Draft, Run Single, Start Sweep). Supports both creating new
experiments and editing existing ones. 20 tests added.
Adds backend/engine/runner.py with run_single() that iterates pipeline stages,
renders Jinja2 prompt templates with stage history context, checks/stores response
cache, calls LLM adapters, runs configured scorers, creates StageResult and Score
records, and publishes progress events via Redis pub/sub or in-process EventBus.
Includes 21 passing tests covering all execution paths.
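A simplified, synchronous sketch of the stage loop described above. It swaps in the standard library's string.Template for Jinja2 and a plain dict for the response cache, and it skips scoring and event publishing; run_stages, cache_key, and the stage dict keys are hypothetical names, not the runner's actual API.

```python
import hashlib
import json
from string import Template
from typing import Any, Callable


def cache_key(model: str, prompt: str, params: dict[str, Any]) -> str:
    """Derive a stable key from everything that affects the response."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "params": params}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_stages(
    stages: list[dict[str, Any]],
    sample: str,
    call_llm: Callable[[str, str], str],
    cache: dict[str, str],
) -> list[dict[str, Any]]:
    history: dict[str, str] = {}  # earlier stage outputs, keyed by stage name
    results = []
    for stage in stages:
        # Each template sees the sample input plus all prior stage outputs.
        context = {"input": sample, **history}
        prompt = Template(stage["template"]).substitute(context)
        key = cache_key(stage["model"], prompt, stage.get("params", {}))
        hit = key in cache
        if not hit:
            cache[key] = call_llm(stage["model"], prompt)
        output = cache[key]
        history[stage["name"]] = output
        results.append({"stage": stage["name"], "output": output, "cache_hit": hit})
    return results
```

Keying the cache on the rendered prompt rather than the template means a change in an upstream stage's output naturally invalidates downstream entries.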
Add OpenAICompatAdapter that works with any OpenAI-compatible API endpoint
(OpenWebUI, vLLM, Ollama, OpenAI, Anthropic via proxy). Features:
- Async HTTP calls via httpx with configurable timeout
- Chat completions format with system + user messages
- Token usage parsing from responses
- Exponential backoff retries (configurable, default 3 attempts)
- Both streaming (SSE) and non-streaming modes
- Model listing and connection testing
- 21 tests covering construction, request building, response parsing,
retry logic, and error handling
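A small sketch of the exponential backoff retry idea. The helper name with_retries, the delay constants, and the injectable sleep parameter are assumptions; the adapter itself presumably retries only on transient HTTP or network errors, whereas this version retries on any exception for brevity.

```python
import asyncio
from typing import Any, Awaitable, Callable


def backoff_delay(attempt: int, base_delay: float, max_delay: float) -> float:
    """Delay before the retry that follows a failed attempt (0-indexed)."""
    return min(max_delay, base_delay * (2 ** attempt))


async def with_retries(
    call: Callable[[], Awaitable[Any]],
    attempts: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> Any:
    """Await call() up to `attempts` times, doubling the wait between tries."""
    for attempt in range(attempts):
        try:
            return await call()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == attempts - 1:
                raise
            await sleep(backoff_delay(attempt, base_delay, max_delay))
```

Passing sleep as a parameter lets tests record the delays instead of actually waiting.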
SetupPage implementation with:
- Full setup form with username, password, confirm password
- Auth detection on mount (redirects if already authenticated)
- Client-side validation (empty username, short password, mismatch)
- Server error handling (409 conflict, network errors)
- Welcoming UI with gradient background, dark mode support
- 9 new tests covering all states and error paths
- Updated App.test.tsx to handle async SetupPage rendering
- Added @testing-library/user-event dependency
Define the LLM adapter interface in backend/engine/adapters/base.py with
async methods complete(), list_models(), and test_connection(). The
AdapterResponse dataclass holds response text, token counts, latency,
model name, and raw metadata. Includes 11 tests covering instantiation
guards, concrete subclass behavior, and dataclass semantics.
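A sketch of what such an interface could look like. The three method names and AdapterResponse come from the description above; the base class name LLMAdapter, the dataclass field names, and the EchoAdapter subclass are illustrative assumptions.

```python
import asyncio
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AdapterResponse:
    # Field names are assumed; the source lists only what the class holds.
    text: str
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float
    model: str
    raw: dict[str, Any] = field(default_factory=dict)


class LLMAdapter(ABC):
    """Abstract base: cannot be instantiated until all methods are defined."""

    @abstractmethod
    async def complete(self, prompt: str, **params: Any) -> AdapterResponse: ...

    @abstractmethod
    async def list_models(self) -> list[str]: ...

    @abstractmethod
    async def test_connection(self) -> bool: ...


class EchoAdapter(LLMAdapter):
    """Toy concrete adapter that returns the prompt unchanged."""

    async def complete(self, prompt: str, **params: Any) -> AdapterResponse:
        start = time.perf_counter()
        words = len(prompt.split())
        return AdapterResponse(
            text=prompt,
            prompt_tokens=words,
            completion_tokens=words,
            latency_ms=(time.perf_counter() - start) * 1000.0,
            model="echo",
        )

    async def list_models(self) -> list[str]:
        return ["echo"]

    async def test_connection(self) -> bool:
        return True
```

Using default_factory for the raw metadata field avoids sharing one mutable dict across every response instance.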
Create docker/entrypoint.sh to run alembic migrations on API startup.
Create backend/worker.py with Celery app config for the compose worker service.
Fix README single-container port (8000) and add production compose documentation.
Add 27 tests (stack integration + worker) verifying all Docker/compose artifacts
are present, consistent, and the /health endpoint responds correctly.
Add SetupPage, LoginPage, DashboardPage, ProjectsPage, ExperimentPage, LivePage,
ComparePage, and AdminPage as placeholder components. Wire up react-router-dom routing
in App.tsx with BrowserRouter in main.tsx. Unknown routes redirect to dashboard.
Install vitest + @testing-library/react and add 9 routing tests. Build passes cleanly.
FastAPI application with:
- CORS middleware (permissive for dev)
- /health endpoint checking DB and Redis connectivity
- /ws WebSocket endpoint with ConnectionManager for real-time updates
- Async lifespan hooks for DB engine and Redis init/teardown
- get_db dependency for session management
- Dynamic router mounting that silently skips missing router modules
- 10 tests covering all endpoints and utilities
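One way the skip-if-missing router discovery could work, shown without the FastAPI dependency. The function name discover_routers and the "router" attribute convention are assumptions; in the app, each discovered object would then be passed to app.include_router().

```python
import importlib
from typing import Any


def discover_routers(module_names: list[str], attr: str = "router") -> list[tuple[str, Any]]:
    """Import each module and collect its router, skipping absent modules."""
    found = []
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ModuleNotFoundError as exc:
            missing = exc.name or ""
            # Skip only when the router module itself (or one of its parent
            # packages) is absent; a missing dependency inside an existing
            # router is a real error and should still surface.
            if name == missing or name.startswith(missing + "."):
                continue
            raise
        router = getattr(module, attr, None)
        if router is not None:
            found.append((name, router))
    return found
```

Checking exc.name keeps the "silent" part narrow, so a typo in an import inside a router module does not quietly drop its endpoints.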
All 13 environment variables from the spec defined with proper defaults.
SQLite fallback when DATABASE_URL is unset, in-process queue flag when
REDIS_URL is unset, JWT_SECRET auto-generation, empty API_KEY normalization.
13 unit tests covering all configuration paths.
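A condensed sketch of the fallback rules listed above, covering four of the variables. The Settings field names, the load_settings helper, and the SQLite path are assumptions made for illustration; only the environment variable names come from the description.

```python
import secrets
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: Optional[str]
    use_in_process_queue: bool
    jwt_secret: str
    api_key: Optional[str]


def load_settings(env: Mapping[str, str]) -> Settings:
    """Build settings from an environment mapping such as os.environ."""
    redis_url = env.get("REDIS_URL") or None
    return Settings(
        # Assumed default path; unset or empty DATABASE_URL falls back to SQLite.
        database_url=env.get("DATABASE_URL") or "sqlite:///./app.db",
        redis_url=redis_url,
        # Without Redis there is no broker, so tasks run in-process.
        use_in_process_queue=redis_url is None,
        # Generated per process when unset, so tokens do not survive a restart.
        jwt_secret=env.get("JWT_SECRET") or secrets.token_urlsafe(32),
        # Empty or whitespace-only API_KEY is treated as "not configured".
        api_key=(env.get("API_KEY") or "").strip() or None,
    )
```

Taking the environment as a parameter instead of reading os.environ directly makes each configuration path easy to exercise in a unit test.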
Three-stage Dockerfile: frontend-build (Node 20), api (Python 3.12 + uvicorn),
web (nginx 1.27). nginx.conf proxies /api and /ws to the API service with
WebSocket upgrade support. Includes backend/requirements.txt with all Python
deps, frontend scaffolding (Vite + React + TypeScript + Tailwind), and
placeholder alembic files for Docker COPY compatibility.
Fixed DATABASE_URL to use standard postgresql:// scheme, hardcoded DB
credentials for dev simplicity, added API_KEY pass-through, set worker
working_dir, and made JWT_SECRET optional with a dev default. All 5 services
are defined: db (:5434), redis, api (MCP :8401), worker (Celery), web (:8400).
Includes all 13 env vars organized into 8 groups: Database, Redis,
Server, Auth, Default LLM Endpoint, Limits, Storage, and MCP.
Production-only variables are commented out; single-container defaults
work out of the box.
Set up all directories from the spec's Project Structure section:
- backend/ with routers/, engine/adapters/, engine/scorers/, mcp/,
websocket/, tests/ (all with __init__.py)
- frontend/src/ with pages/, components/, api/ (.gitkeep)
- docker/ (.gitkeep)
- alembic/versions/ (.gitkeep)
Add README.md with project description, quick-start instructions, and
AGPL-3.0 license badge. Add .gitignore for Python, Node, and Docker
artifacts. Include existing CLAUDE.md, spec, docker-compose.yml, and
env.example.