- Two-column run selectors with experiment→run cascading dropdowns and URL state sync
- Config diff with color-coded same/changed/added/removed entries using key-level comparison
- Line-level LCS response diff with added/removed/same highlighting
- Score comparison with overlaid indigo/emerald bars per scorer
- Pick Winner buttons submit human_preference score via API
- Full RunCard detail view for each run side by side
- 15 tests added (5 diff helper unit tests + 10 component integration tests)
- App.test.tsx updated to mock experiments.list for ComparePage route
Extracted inline SteeringControls from LivePage into standalone component.
Added Fork button (modal to clone experiment config), Export Best dropdown
(JSON/YAML/.env download), and estimated time remaining stat. LivePage
updated to import the new component. 33 tests added, all 284 tests pass.
Extract the inline LeaderboardTable from LivePage into a standalone
Leaderboard component with click-to-expand detail rows, sortable
columns, smooth slide-in animation for new entries, and a subtle
glow effect on the best run. 29 tests added.
Full LivePage implementation with 60/40 split layout:
- Left column: Activity Timeline with color-coded event cards (run.started, run.completed, new_best_found, cache_hit, run.failed), event type filtering, and auto-scroll toggle
- Right column: Leaderboard table with sortable columns, best-run highlighting, and status badges; Steering Controls with pause/resume/stop (with confirmation dialogs), progress bar, token counter, cost estimate, and cache hit rate
- WebSocket integration with exponential backoff reconnect, connection status indicator, and experiment subscription
- 35 tests covering loading/error states, WebSocket events, timeline filtering, leaderboard updates, progress tracking, and steering control interactions
Built standalone PromptEditor with transparent-textarea overlay for syntax
highlighting of Jinja2 expressions, statements, and comments. Includes
clickable variable sidebar for insertion and preview panel with sample data
substitution. Integrated into ExperimentPage PipelineStageCard. 27 tests added.
Build the full Experiment Builder (ExperimentPage.tsx) with: basic info form,
sample data input (text/JSON/file upload), pipeline stage builder with template
variables and preview, scoring configuration with enable toggles and weight
sliders, parameter space definition (fixed/range/options types), and action
buttons (Save Draft, Run Single, Start Sweep). Supports both creating new
experiments and editing existing ones. 20 tests added.
- Full setup form with username, password, confirm password
- Auth detection on mount (redirects if already authenticated)
- Client-side validation (empty username, short password, mismatch)
- Server error handling (409 conflict, network errors)
- Welcoming UI with gradient background, dark mode support
- 9 new tests covering all states and error paths
- Updated App.test.tsx to handle async SetupPage rendering
- Added @testing-library/user-event dependency
Add SetupPage, LoginPage, DashboardPage, ProjectsPage, ExperimentPage, LivePage,
ComparePage, and AdminPage as placeholder components. Wire up react-router-dom routing
in App.tsx with BrowserRouter in main.tsx. Unknown routes redirect to dashboard.
Install vitest + @testing-library/react and add 9 routing tests. Build passes cleanly.
Three-stage Dockerfile: frontend-build (Node 20), api (Python 3.12 + uvicorn),
web (nginx 1.27). nginx.conf proxies /api and /ws to the API service with
WebSocket upgrade support. Includes backend/requirements.txt with all Python
deps, frontend scaffolding (Vite + React + TypeScript + Tailwind), and
placeholder alembic files for Docker COPY compatibility.
Set up all directories from the spec's Project Structure section:
- backend/ with routers/, engine/adapters/, engine/scorers/, mcp/,
websocket/, tests/ (all with __init__.py)
- frontend/src/ with pages/, components/, api/ (.gitkeep)
- docker/ (.gitkeep)
- alembic/versions/ (.gitkeep)