Commit graph

36 commits

Author SHA1 Message Date
John Lightner
e6c344d554 MAESTRO: Build RunCard expandable component with scores, prompts, responses, and stage timing
Implements RunCard.tsx with expandable card showing config summary, cache
status badge, score bars, config JSON, per-stage timing breakdown,
collapsible prompt/response sections (with copy button), and metadata footer.
26 tests added, all 310 tests pass.
2026-04-07 03:20:38 -05:00
John Lightner
82e97e9dba MAESTRO: Implement experiments router with full CRUD and sweep control endpoints
Add complete experiments API: list (with project filter), get, create, update,
delete, plus sweep lifecycle (start/pause/resume/stop/status). Adds
SweepRequest and SweepStatusResponse schemas. Sweep dispatch routes through
Celery with synchronous fallback for single-container mode. Redis flags control
pause/resume/stop; direct DB updates used when Redis unavailable. 34 tests.
2026-04-07 03:19:43 -05:00
John Lightner
59f18a11c3 MAESTRO: Extract SteeringControls into standalone component with Fork, Export, and ETA
Extracted inline SteeringControls from LivePage into standalone component.
Added Fork button (modal to clone experiment config), Export Best dropdown
(JSON/YAML/.env download), and estimated time remaining stat. LivePage
updated to import the new component. 33 tests added, all 284 tests pass.
2026-04-07 03:17:47 -05:00
John Lightner
35d72e7fa8 MAESTRO: Implement LLM endpoints router with CRUD, test_connection, and Fernet-encrypted API key storage
- Add LLMEndpoint model to models.py with encrypted api_key field
- Create encryption.py with Fernet symmetric encryption (key derived from JWT_SECRET via PBKDF2)
- Implement full endpoints router: list, get, create, update, delete + test_connection
- Test endpoint calls adapter.test_connection() and list_models()
- API keys never exposed in responses; has_api_key boolean flag added
- 25 tests in test_endpoints.py, all 444 tests passing
2026-04-07 03:13:52 -05:00
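The key derivation mentioned in this commit (a Fernet key derived from JWT_SECRET via PBKDF2) could look roughly like the sketch below. It uses only the standard library and stops at producing a key in the format Fernet expects (32 bytes, URL-safe base64). The function name, salt, and iteration count are assumptions for illustration; the commit does not state them.

```python
import base64
import hashlib


def derive_fernet_key(
    secret: str,
    salt: bytes = b"hypothetical-salt",  # assumption: the real salt is not given in the commit
    iterations: int = 100_000,           # assumption: the real iteration count is not given
) -> bytes:
    """Stretch a secret into a 32-byte key, encoded as URL-safe base64."""
    raw = hashlib.pbkdf2_hmac(
        "sha256", secret.encode("utf-8"), salt, iterations, dklen=32
    )
    # Fernet keys are 32 raw bytes in URL-safe base64 (44 characters with padding).
    return base64.urlsafe_b64encode(raw)
```

The result could then be passed to a Fernet implementation; that step is left out here so the sketch stays free of third-party dependencies.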
John Lightner
1253994c9e MAESTRO: Extract Activity Timeline into standalone component with filter, auto-scroll, and color-coded events
2026-04-07 03:13:31 -05:00
John Lightner
cf49e9c888 MAESTRO: Extract Leaderboard into standalone component with expand, sort, and animation
Extract the inline LeaderboardTable from LivePage into a standalone
Leaderboard component with click-to-expand detail rows, sortable
columns, smooth slide-in animation for new entries, and a subtle
glow effect on the best run. 29 tests added.
2026-04-07 03:10:08 -05:00
John Lightner
b16454994e MAESTRO: Implement Celery tasks (execute_run, execute_sweep) with synchronous fallback for single-container mode
Created engine/tasks.py with:
- execute_run and execute_sweep Celery tasks registered via autodiscover
- SyncTaskResult class mimicking Celery AsyncResult for in-process mode
- dispatch_run/dispatch_sweep helpers that route to Celery or sync based on config
- Proper async-to-sync bridging for the async engine functions
- 17 tests covering task execution, sync fallback, error handling, and Celery dispatch
2026-04-07 03:08:41 -05:00
John Lightner
fb78eac1b0 MAESTRO: Implement LLMJudgeScorer with configurable judge prompt, rating parsing, and response caching
2026-04-07 03:05:00 -05:00
John Lightner
0d5a6169c5 MAESTRO: Implement KeywordScorer with presence/absence keyword checking and ratio scoring
2026-04-07 03:02:40 -05:00
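One plausible reading of "presence/absence keyword checking and ratio scoring" is sketched below: each required keyword must appear, each forbidden keyword must not, and the score is the fraction of checks that pass. The function name, the case-insensitive substring matching, and the empty-input behavior are assumptions, not details taken from the commit.

```python
def keyword_score(text: str, required: list[str], forbidden: list[str]) -> float:
    """Return the fraction of keyword checks that pass (assumed semantics)."""
    lowered = text.lower()
    # Presence checks: each required keyword should occur in the text.
    checks = [kw.lower() in lowered for kw in required]
    # Absence checks: each forbidden keyword should not occur in the text.
    checks += [kw.lower() not in lowered for kw in forbidden]
    # With no keywords configured there is nothing to fail, so return 1.0.
    return sum(checks) / len(checks) if checks else 1.0
```

For example, `keyword_score("The cat sat", ["cat", "dog"], ["bird"])` passes two of three checks.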
John Lightner
bc1d41e3a6 MAESTRO: Implement FormatScorer with json, markdown, length, and structure format checks
Adds format.py scorer supporting four validation modes:
- json: validates parseable JSON
- markdown: checks for headers (0.5) and lists (0.5)
- length: proportional scoring against min/max token bounds
- structure: JSON schema validation via jsonschema library

Includes 38 passing tests covering all format types, edge cases, and async delegation.
2026-04-07 03:00:56 -05:00
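The markdown mode described above (0.5 for headers, 0.5 for lists) might be sketched as follows. The regular expressions and the function name are illustrative assumptions; the actual format.py may detect headers and lists differently.

```python
import re

# Assumed patterns: ATX headers ("# Title") and bullet or numbered list items.
HEADER_RE = re.compile(r"^#{1,6}\s+\S", re.MULTILINE)
LIST_RE = re.compile(r"^\s*(?:[-*+]|\d+\.)\s+\S", re.MULTILINE)


def markdown_score(text: str) -> float:
    """Award 0.5 if any header is present and 0.5 if any list item is present."""
    has_header = HEADER_RE.search(text) is not None
    has_list = LIST_RE.search(text) is not None
    return 0.5 * has_header + 0.5 * has_list
```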
John Lightner
7fc2a2b8c3 MAESTRO: Implement ModelSelector component with endpoint grouping, refresh, and connectivity indicators
2026-04-07 03:00:10 -05:00
John Lightner
3cc1e22e3f MAESTRO: Implement EmbeddingScorer with cosine similarity scoring via OpenAI-compatible embedding API
2026-04-07 02:58:00 -05:00
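The cosine similarity at the core of such a scorer can be written in a few lines of plain Python. This sketch covers only the similarity step; fetching embeddings from the API is omitted, and returning 0.0 for a zero vector is an assumption rather than documented behavior.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption: treat a zero vector as having no similarity to anything.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```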
John Lightner
f2e6baa56f MAESTRO: Implement PromptEditor component with Jinja2 syntax highlighting, variable sidebar, and preview
Built standalone PromptEditor with transparent-textarea overlay for syntax
highlighting of Jinja2 expressions, statements, and comments. Includes
clickable variable sidebar for insertion and preview panel with sample data
substitution. Integrated into ExperimentPage PipelineStageCard. 27 tests added.
2026-04-07 02:56:48 -05:00
John Lightner
405bbf8206 MAESTRO: Implement BaseScorer abstract class with sync/async scoring interface
Adds backend/engine/scorers/base.py with abstract name property, score() method,
and score_async() default implementation. Updates scorers __init__.py to export
BaseScorer. Includes 9 tests covering instantiation guards, sync/async dispatch,
context dict usage, and partial implementation rejection.
2026-04-07 02:55:05 -05:00
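A minimal sketch of the interface this commit describes: an abstract name property, an abstract score() method, and a score_async() default that delegates to the synchronous path. The method signatures and the LengthScorer subclass are assumptions made for illustration; only the three member names come from the commit message.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Optional


class BaseScorer(ABC):
    @property
    @abstractmethod
    def name(self) -> str:
        """Identifier under which scores are recorded."""

    @abstractmethod
    def score(self, response: str, context: Optional[dict[str, Any]] = None) -> float:
        """Synchronously score a response."""

    async def score_async(
        self, response: str, context: Optional[dict[str, Any]] = None
    ) -> float:
        # Default: reuse the synchronous implementation.
        return self.score(response, context)


class LengthScorer(BaseScorer):  # hypothetical concrete scorer
    @property
    def name(self) -> str:
        return "length"

    def score(self, response: str, context: Optional[dict[str, Any]] = None) -> float:
        limit = (context or {}).get("max_chars", 100)
        return min(len(response) / limit, 1.0)
```

Because both name and score() are abstract, instantiating BaseScorer directly, or a subclass that implements only one of them, raises TypeError.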
John Lightner
ba8cb7e2c6 MAESTRO: Implement sweep orchestration engine with grid, random, and guided sweep types
Adds backend/engine/sweep.py with three sweep strategies:
- GridSweep: exhaustive enumeration of all parameter combinations
- RandomSweep: N random samples from parameter ranges (list, min/max, step)
- GuidedSweep: top-K exploitation + random exploration from previous results

Features: bounded parallelism via asyncio.Semaphore, token budget enforcement,
Redis-based pause/resume/stop control flags, sweep-level event publishing.
36 tests in test_sweep.py covering config generation, helpers, and full sweep execution.
2026-04-07 02:53:30 -05:00
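Two of the ideas above, exhaustive grid enumeration and bounded parallelism via asyncio.Semaphore, can be sketched as follows. The function names and the shape of the parameter space (a dict of lists) are assumptions; token budgets, Redis control flags, and event publishing are left out.

```python
import asyncio
import itertools
from typing import Any, Awaitable, Callable


def grid_configs(space: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Enumerate every combination of the listed parameter values."""
    keys = list(space)
    value_lists = (space[k] for k in keys)
    return [dict(zip(keys, combo)) for combo in itertools.product(*value_lists)]


async def run_bounded(
    configs: list[dict[str, Any]],
    run_one: Callable[[dict[str, Any]], Awaitable[Any]],
    max_parallel: int = 2,
) -> list[Any]:
    """Run one coroutine per config, with at most max_parallel in flight."""
    semaphore = asyncio.Semaphore(max_parallel)

    async def guarded(cfg: dict[str, Any]) -> Any:
        async with semaphore:
            return await run_one(cfg)

    # gather returns results in the same order as the input configs.
    return await asyncio.gather(*(guarded(cfg) for cfg in configs))
```

A space such as `{"temperature": [0.0, 0.7], "top_p": [0.9, 1.0]}` yields four configs, which run_bounded then executes no more than two at a time by default.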
John Lightner
d607970f0c MAESTRO: Implement run execution engine with Jinja2 templating, caching, scoring, and event bus
Adds backend/engine/runner.py with run_single() that iterates pipeline stages,
renders Jinja2 prompt templates with stage history context, checks/stores response
cache, calls LLM adapters, runs configured scorers, creates StageResult and Score
records, and publishes progress events via Redis pub/sub or in-process EventBus.
Includes 21 passing tests covering all execution paths.
2026-04-07 02:48:20 -05:00
John Lightner
04a96f3dc3 MAESTRO: Implement Projects page with card grid, creation modal, and comprehensive tests
2026-04-07 02:47:24 -05:00
John Lightner
0e6ae49b3c MAESTRO: Implement AuthContext provider with JWT management, session validation, and protected route redirects
2026-04-07 02:38:23 -05:00
John Lightner
f60128604f MAESTRO: Implement ResponseCache layer with SHA-256 config hashing and hit-rate tracking
2026-04-07 02:37:58 -05:00
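A cache keyed by a SHA-256 hash of the run config, with hit-rate tracking, might look like the sketch below. Canonicalizing the config with sorted-key JSON is an assumption about how the hash is made order-independent, and the in-memory class is a stand-in for the real storage, which per the models commit is a database-backed ResponseCache table.

```python
import hashlib
import json
from typing import Any, Optional


def config_hash(config: dict[str, Any]) -> str:
    """Hash a config so that key order does not change the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class InMemoryResponseCache:  # hypothetical stand-in for the real cache layer
    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, config: dict[str, Any]) -> Optional[str]:
        key = config_hash(config)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        return None

    def put(self, config: dict[str, Any], response: str) -> None:
        self._store[config_hash(config)] = response

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```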
John Lightner
bf1e9d1c84 MAESTRO: Implement OpenAI-compatible LLM adapter with streaming, retries, and tests
Add OpenAICompatAdapter that works with any OpenAI-compatible API endpoint
(OpenWebUI, vLLM, Ollama, OpenAI, Anthropic via proxy). Features:
- Async HTTP calls via httpx with configurable timeout
- Chat completions format with system + user messages
- Token usage parsing from responses
- Exponential backoff retries (configurable, default 3 attempts)
- Both streaming (SSE) and non-streaming modes
- Model listing and connection testing
- 21 tests covering construction, request building, response parsing,
  retry logic, and error handling
2026-04-07 02:35:52 -05:00
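The retry behavior listed above (exponential backoff, default 3 attempts) can be sketched independently of httpx. The helper name, the base delay, and the choice to retry on any exception are assumptions; a real adapter would more likely retry only on transient network or HTTP errors.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    call: Callable[[], Awaitable[T]],
    attempts: int = 3,        # matches the default of 3 attempts in the commit
    base_delay: float = 0.5,  # assumption: the real base delay is not stated
) -> T:
    """Await call(), retrying on failure with delays of base_delay * 2**attempt."""
    for attempt in range(attempts):
        try:
            return await call()
        except Exception:
            # Out of attempts: re-raise the last error to the caller.
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

With the defaults, a call that keeps failing waits 0.5 s and then 1 s before the third and final attempt.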
John Lightner
060f399789 MAESTRO: Implement Login page with form validation, error handling, and guest access link
2026-04-07 02:35:34 -05:00
John Lightner
1050109777 MAESTRO: Implement Setup page with first-boot admin creation flow
- Full setup form with username, password, confirm password
- Auth detection on mount (redirects if already authenticated)
- Client-side validation (empty username, short password, mismatch)
- Server error handling (409 conflict, network errors)
- Welcoming UI with gradient background, dark mode support
- 9 new tests covering all states and error paths
- Updated App.test.tsx to handle async SetupPage rendering
- Added @testing-library/user-event dependency
2026-04-07 02:34:00 -05:00
John Lightner
9e0dc4e9fe MAESTRO: Implement BaseAdapter abstract class and AdapterResponse dataclass
Define the LLM adapter interface in backend/engine/adapters/base.py with
async methods complete(), list_models(), and test_connection(). The
AdapterResponse dataclass holds response text, token counts, latency,
model name, and raw metadata. Includes 11 tests covering instantiation
guards, concrete subclass behavior, and dataclass semantics.
2026-04-07 02:32:57 -05:00
John Lightner
7dad9d97af MAESTRO: Add entrypoint migrations, worker config, and stack integration tests
Create docker/entrypoint.sh to run alembic migrations on API startup.
Create backend/worker.py with Celery app config for the compose worker service.
Fix README single-container port (8000) and add production compose documentation.
Add 27 tests (stack integration + worker) verifying all Docker/compose artifacts
are present, consistent, and the /health endpoint responds correctly.
2026-04-07 02:09:56 -05:00
John Lightner
43d2aafbbe MAESTRO: Create typed API client with in-memory JWT auth, fetch wrappers, and WebSocket helper
2026-04-07 02:07:03 -05:00
John Lightner
4cd0b8a1c8 MAESTRO: Initialize frontend routing with 8 placeholder page components and vitest test suite
Add SetupPage, LoginPage, DashboardPage, ProjectsPage, ExperimentPage, LivePage,
ComparePage, and AdminPage as placeholder components. Wire up react-router-dom routing
in App.tsx with BrowserRouter in main.tsx. Unknown routes redirect to dashboard.
Install vitest + @testing-library/react and add 9 routing tests. Build passes cleanly.
2026-04-07 02:03:48 -05:00
John Lightner
848fb06407 MAESTRO: Create backend/auth.py with JWT, API key auth, and first-boot setup flow
2026-04-07 01:59:24 -05:00
John Lightner
15ca2c922a MAESTRO: Create backend/main.py with FastAPI app, CORS, health check, WebSocket, and router mounting
FastAPI application with:
- CORS middleware (permissive for dev)
- /health endpoint checking DB and Redis connectivity
- /ws WebSocket endpoint with ConnectionManager for real-time updates
- Async lifespan hooks for DB engine and Redis init/teardown
- get_db dependency for session management
- Dynamic router mounting that silently skips missing router modules
- 10 tests covering all endpoints and utilities
2026-04-07 01:56:40 -05:00
John Lightner
42668eeeb1 MAESTRO: Create backend/schemas.py with all Pydantic request/response schemas
Create/update/response schemas for Project, Experiment, Run, Endpoint,
Webhook, Score, Auth (setup/login/token), Export, and Health. All use
Pydantic v2 ConfigDict(from_attributes=True) for ORM compatibility.
RunDetailResponse nests StageResults and Scores. ExportRunRow provides
flat scorer_name→value dict for CSV/JSON export. 30 tests added.
2026-04-07 01:54:02 -05:00
John Lightner
0ec75ab617 MAESTRO: Set up Alembic with initial migration for all 8 ORM models
2026-04-07 01:52:03 -05:00
John Lightner
7ef116e2f9 MAESTRO: Create backend/models.py with all 8 SQLAlchemy ORM models from spec
Define User, Project, Experiment, Run, StageResult, Score, ResponseCache,
and WebhookConfig with UUID primary keys, JSON columns, enum types
(ExperimentStatus, RunStatus), full relationship cascades, and indexes.
Uses sqlalchemy.JSON (not JSONB) for SQLite compatibility in single-container
mode. 16 tests added covering table creation, CRUD, uniqueness constraints,
default values, and cascade deletes — all passing.
2026-04-07 01:49:10 -05:00
John Lightner
309bbacb5d MAESTRO: Create backend/config.py with Pydantic Settings and SQLite/in-process fallback
All 13 environment variables from the spec defined with proper defaults.
SQLite fallback when DATABASE_URL is unset, in-process queue flag when
REDIS_URL is unset, JWT_SECRET auto-generation, empty API_KEY normalization.
13 unit tests covering all configuration paths.
2026-04-07 01:46:30 -05:00
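The fallback rules in this commit can be illustrated with a plain dataclass. This is a stand-in, not the Pydantic Settings class the commit actually adds: the field names, the SQLite URL, and the load_settings helper are hypothetical, and only four of the 13 variables are shown.

```python
import secrets
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: Optional[str]
    use_in_process_queue: bool
    jwt_secret: str
    api_key: Optional[str]


def load_settings(env: Mapping[str, str]) -> Settings:
    """Apply the fallbacks described in the commit to an environment mapping."""
    redis_url = env.get("REDIS_URL") or None
    return Settings(
        # SQLite fallback when DATABASE_URL is unset (hypothetical file path).
        database_url=env.get("DATABASE_URL") or "sqlite:///./app.db",
        redis_url=redis_url,
        # In-process queue flag when REDIS_URL is unset.
        use_in_process_queue=redis_url is None,
        # Auto-generate a secret when JWT_SECRET is unset.
        jwt_secret=env.get("JWT_SECRET") or secrets.token_urlsafe(32),
        # Normalize an empty API_KEY to None.
        api_key=env.get("API_KEY") or None,
    )
```

Passing `os.environ` as the mapping would give the single-container defaults whenever none of these variables are set.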
John Lightner
9e2961d648 MAESTRO: Create multi-stage Dockerfile, nginx.conf, and frontend/backend scaffolding
Three-stage Dockerfile: frontend-build (Node 20), api (Python 3.12 + uvicorn),
web (nginx 1.27). nginx.conf proxies /api and /ws to the API service with
WebSocket upgrade support. Includes backend/requirements.txt with all Python
deps, frontend scaffolding (Vite + React + TypeScript + Tailwind), and
placeholder alembic files for Docker COPY compatibility.
2026-04-07 01:44:52 -05:00
John Lightner
3c5fdace31 MAESTRO: Update docker-compose.yml with corrected XPLTD conventions
Fixed DATABASE_URL to use standard postgresql:// scheme, hardcoded DB
credentials for dev simplicity, added API_KEY pass-through, set worker
working_dir, and made JWT_SECRET optional with dev default. All 5 services:
db (:5434), redis, api (MCP :8401), worker (Celery), web (:8400).
2026-04-07 01:42:58 -05:00
John Lightner
4a0e4b6c65 MAESTRO: Add .env.example with all environment variables from spec
Includes all 13 env vars organized into 8 groups: Database, Redis,
Server, Auth, Default LLM Endpoint, Limits, Storage, and MCP.
Production-only variables are commented out; single-container defaults
work out of the box.
2026-04-07 01:41:22 -05:00
John Lightner
cb4af5f707 MAESTRO: Create full directory structure with placeholder files
Set up all directories from the spec's Project Structure section:
- backend/ with routers/, engine/adapters/, engine/scorers/, mcp/,
  websocket/, tests/ (all with __init__.py)
- frontend/src/ with pages/, components/, api/ (.gitkeep)
- docker/ (.gitkeep)
- alembic/versions/ (.gitkeep)
2026-04-07 01:40:27 -05:00