Commit graph

17 commits

Author SHA1 Message Date
John Lightner
0f64dfbb02 MAESTRO: Implement webhook CRUD router, async dispatch with retry logic, and delivery logging
Full webhook system: CRUD endpoints (list/filter/get/create/update/delete),
WebhookDelivery model for delivery audit trail, dispatch engine with 3-attempt
retry and exponential backoff, Celery task integration with sync fallback,
and webhook firing hooks in runner.py and sweep.py event paths.
2026-04-07 03:41:04 -05:00
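The 3-attempt retry with exponential backoff mentioned in 0f64dfbb02 could be sketched roughly as follows. This is a minimal illustration, not the project's dispatch engine: the function name `deliver_with_retry`, the `send` callable, the 1-second base delay, and the log entry fields are all assumptions made for the example.

```python
import time
from typing import Callable, Optional


def deliver_with_retry(
    send: Callable[[], int],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Call send() up to max_attempts times, doubling the wait after each failure.

    send() is a hypothetical callable returning an HTTP status code.
    Returns one dict per attempt, loosely standing in for a delivery audit trail.
    """
    log = []
    for attempt in range(1, max_attempts + 1):
        status: Optional[int] = None
        error: Optional[str] = None
        try:
            status = send()
            ok = 200 <= status < 300
        except Exception as exc:  # transport errors count as failed attempts
            ok = False
            error = str(exc)
        log.append({"attempt": attempt, "status": status, "ok": ok, "error": error})
        if ok:
            break
        if attempt < max_attempts:
            # Backoff grows as base_delay * 2^(attempt - 1): 1s, 2s, ...
            sleep(base_delay * 2 ** (attempt - 1))
    return log
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting.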
John Lightner
30fd15ec7a MAESTRO: Implement WebSocket connection manager with per-experiment routing, Redis pub/sub bridge, and message replay
- WebSocketManager in backend/websocket/manager.py with per-experiment and global subscriptions
- Redis pub/sub bridge (sync + async) broadcasting events to relevant WebSocket clients
- Deque-based replay buffers with since_ts/limit filtering for reconnection support
- Runtime subscribe/unsubscribe and stats API
- Enhanced /ws endpoint in main.py with subscribe/unsubscribe/replay actions
- 35 tests in test_ws_manager.py, all passing
2026-04-07 03:34:21 -05:00
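A deque-based replay buffer with `since_ts`/`limit` filtering, as described in 30fd15ec7a, might look something like the sketch below. The class name `ReplayBuffer`, its method names, and the choice to return the most recent `limit` events are assumptions; the actual `WebSocketManager` may differ.

```python
from collections import deque
from typing import Optional


class ReplayBuffer:
    """Keeps the last `maxlen` events so a reconnecting client can catch up."""

    def __init__(self, maxlen: int = 100) -> None:
        # A bounded deque silently drops the oldest event once full.
        self._events = deque(maxlen=maxlen)

    def append(self, ts: float, payload: dict) -> None:
        self._events.append((ts, payload))

    def replay(self, since_ts: Optional[float] = None, limit: Optional[int] = None) -> list:
        # Keep only events strictly newer than since_ts (assumed semantics).
        events = [p for ts, p in self._events if since_ts is None or ts > since_ts]
        if limit is not None:
            # Return the most recent `limit` events; a non-positive limit yields none.
            events = events[-limit:] if limit > 0 else []
        return events
```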
John Lightner
e42117c8ee MAESTRO: Implement export router with JSON, .env, YAML, and markdown report endpoints
Four fully authenticated endpoints at /api/export/experiments/{id}/:
- /best: Returns best config as JSON with weighted score and metadata
- /env: Flattened KEY=VALUE format with metadata comments
- /yaml: Simple YAML serialization (no external dependency)
- /report: Full markdown report with config space, top N configs,
  score distributions, token usage, and timing stats

34 tests in test_export.py covering all endpoints, auth, 404s, and helpers.
Updated test_routers.py to expect 401 (auth required) instead of 501 (stub).
2026-04-07 03:30:45 -05:00
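The flattened KEY=VALUE output of the /env endpoint in e42117c8ee could be produced by something along these lines. The helper name `flatten_to_env`, the underscore separator, and the upper-casing rule are assumptions for illustration, and the sketch omits the metadata comments the commit mentions.

```python
def flatten_to_env(config: dict, prefix: str = "") -> list:
    """Flatten a nested config dict into KEY=VALUE lines.

    Nested keys are joined with underscores and upper-cased (assumed convention).
    """
    lines = []
    for key, value in config.items():
        name = f"{prefix}_{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested sections, carrying the joined prefix along.
            lines.extend(flatten_to_env(value, name))
        else:
            lines.append(f"{name.upper()}={value}")
    return lines
```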
John Lightner
b3fb8e3063 MAESTRO: Implement runs router with full CRUD, filtering, scoring, and leaderboard
- List runs with filtering by experiment, status, and score range plus pagination
- Get run detail with eager-loaded stage results and scores
- Ad-hoc single run creation with Celery/sync dispatch
- Human scoring endpoint (POST /{id}/score)
- Leaderboard endpoint with configurable weighted scoring from experiment scoring_config
- Added AdHocRunCreate, LeaderboardEntry, LeaderboardResponse schemas
- 25 tests in test_runs.py, all passing (503 total tests passing)
2026-04-07 03:24:56 -05:00
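The weighted leaderboard scoring in b3fb8e3063 can be illustrated with a small sketch. The shape of the weights mapping, the normalisation by total weight, and the names `weighted_score` and `leaderboard` are assumptions; the real `scoring_config` format is not shown in the commit message.

```python
def weighted_score(scores: dict, weights: dict) -> float:
    """Weighted mean of per-scorer values; scorers without a weight count as 0."""
    total_weight = sum(weights.get(name, 0.0) for name in scores)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(value * weights.get(name, 0.0) for name, value in scores.items())
    return weighted_sum / total_weight


def leaderboard(runs: list, weights: dict) -> list:
    """Rank runs by weighted score, best first."""
    ranked = [
        {"run_id": run["run_id"], "score": weighted_score(run["scores"], weights)}
        for run in runs
    ]
    return sorted(ranked, key=lambda entry: entry["score"], reverse=True)
```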
John Lightner
82e97e9dba MAESTRO: Implement experiments router with full CRUD and sweep control endpoints
Add complete experiments API: list (with project filter), get, create, update,
delete, plus sweep lifecycle (start/pause/resume/stop/status). Adds
SweepRequest and SweepStatusResponse schemas. Sweep dispatch routes through
Celery with synchronous fallback for single-container mode. Redis flags control
pause/resume/stop; direct DB updates used when Redis unavailable. 34 tests.
2026-04-07 03:19:43 -05:00
John Lightner
35d72e7fa8 MAESTRO: Implement LLM endpoints router with CRUD, test_connection, and Fernet-encrypted API key storage
- Add LLMEndpoint model to models.py with encrypted api_key field
- Create encryption.py with Fernet symmetric encryption (key derived from JWT_SECRET via PBKDF2)
- Implement full endpoints router: list, get, create, update, delete + test_connection
- Test endpoint calls adapter.test_connection() and list_models()
- API keys never exposed in responses; has_api_key boolean flag added
- 25 tests in test_endpoints.py, all 444 tests passing
2026-04-07 03:13:52 -05:00
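Deriving a Fernet key from a secret via PBKDF2, as 35d72e7fa8 describes, might be sketched with the standard library alone. Fernet expects a URL-safe base64 encoding of 32 bytes; the salt, the iteration count, the hash choice (SHA-256), and the function name `derive_fernet_key` are assumptions here, not values taken from encryption.py.

```python
import base64
import hashlib


def derive_fernet_key(secret: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Stretch a secret into a 32-byte key and encode it the way Fernet expects."""
    raw = hashlib.pbkdf2_hmac(
        "sha256", secret.encode("utf-8"), salt, iterations, dklen=32
    )
    # Fernet keys are the URL-safe base64 form of exactly 32 raw bytes.
    return base64.urlsafe_b64encode(raw)
```

The resulting bytes could then be passed to `cryptography.fernet.Fernet(key)` to encrypt and decrypt stored API keys.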
John Lightner
b16454994e MAESTRO: Implement Celery tasks (execute_run, execute_sweep) with synchronous fallback for single-container mode
Created engine/tasks.py with:
- execute_run and execute_sweep Celery tasks registered via autodiscover
- SyncTaskResult class mimicking Celery AsyncResult for in-process mode
- dispatch_run/dispatch_sweep helpers that route to Celery or sync based on config
- Proper async-to-sync bridging for the async engine functions
- 17 tests covering task execution, sync fallback, error handling, and Celery dispatch
2026-04-07 03:08:41 -05:00
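A result object that mimics part of Celery's `AsyncResult` for in-process execution, in the spirit of the `SyncTaskResult` class in b16454994e, could look roughly like this. Only the class name comes from the commit message; the constructor signature and the subset of attributes covered (`id`, `status`, `result`, `ready()`, `successful()`, `get()`) are assumptions.

```python
import uuid
from typing import Any, Callable


class SyncTaskResult:
    """Runs a callable immediately and exposes an AsyncResult-like surface."""

    def __init__(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> None:
        self.id = str(uuid.uuid4())
        try:
            self.result = fn(*args, **kwargs)
            self.status = "SUCCESS"
        except Exception as exc:
            # Mirror Celery's convention of storing the exception as the result.
            self.result = exc
            self.status = "FAILURE"

    def ready(self) -> bool:
        # The work already ran in the constructor, so the result is always ready.
        return True

    def successful(self) -> bool:
        return self.status == "SUCCESS"

    def get(self, timeout: Any = None) -> Any:
        # timeout is accepted for signature compatibility and ignored.
        if self.status == "FAILURE":
            raise self.result
        return self.result
```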
John Lightner
fb78eac1b0 MAESTRO: Implement LLMJudgeScorer with configurable judge prompt, rating parsing, and response caching
2026-04-07 03:05:00 -05:00
John Lightner
0d5a6169c5 MAESTRO: Implement KeywordScorer with presence/absence keyword checking and ratio scoring
2026-04-07 03:02:40 -05:00
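Presence/absence keyword checking with ratio scoring, as named in 0d5a6169c5, could work along these lines. The function name `keyword_score`, the case-insensitive substring matching, and the score of 1.0 when no keywords are configured are assumptions for the sketch.

```python
def keyword_score(text: str, required=(), forbidden=()) -> float:
    """Fraction of keyword checks that pass.

    A required keyword passes when present; a forbidden keyword passes when absent.
    """
    lowered = text.lower()
    checks = [kw.lower() in lowered for kw in required]
    checks += [kw.lower() not in lowered for kw in forbidden]
    if not checks:
        return 1.0  # nothing to check, so nothing failed (assumed behaviour)
    return sum(checks) / len(checks)
```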
John Lightner
bc1d41e3a6 MAESTRO: Implement FormatScorer with json, markdown, length, and structure format checks
Adds format.py scorer supporting four validation modes:
- json: validates parseable JSON
- markdown: checks for headers (0.5) and lists (0.5)
- length: proportional scoring against min/max token bounds
- structure: JSON schema validation via jsonschema library

Includes 38 passing tests covering all format types, edge cases, and async delegation.
2026-04-07 03:00:56 -05:00
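The json and markdown modes from bc1d41e3a6 can be approximated as below. The 0.5 weights for headers and lists come from the commit message; the regular expressions and the function names `score_json` and `score_markdown` are assumptions, and format.py may detect these features differently.

```python
import json
import re

# ATX-style header: one to six '#' characters followed by text.
_HEADER_RE = re.compile(r"^#{1,6}\s+\S", re.MULTILINE)
# Bullet ('-', '*', '+') or numbered ('1.') list item.
_LIST_RE = re.compile(r"^\s*(?:[-*+]|\d+\.)\s+\S", re.MULTILINE)


def score_json(text: str) -> float:
    """1.0 if the text parses as JSON, else 0.0."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return 0.0
    return 1.0


def score_markdown(text: str) -> float:
    """0.5 for containing a header plus 0.5 for containing a list item."""
    has_header = _HEADER_RE.search(text) is not None
    has_list = _LIST_RE.search(text) is not None
    return 0.5 * has_header + 0.5 * has_list
```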
John Lightner
3cc1e22e3f MAESTRO: Implement EmbeddingScorer with cosine similarity scoring via OpenAI-compatible embedding API
2026-04-07 02:58:00 -05:00
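The cosine similarity at the heart of 3cc1e22e3f is the dot product of two embedding vectors divided by the product of their norms. A dependency-free sketch follows; the function name and the choice to return 0.0 for a zero vector are assumptions, and the call to the embedding API itself is left out.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """cos(theta) = (a . b) / (|a| * |b|), or 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```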
John Lightner
405bbf8206 MAESTRO: Implement BaseScorer abstract class with sync/async scoring interface
Adds backend/engine/scorers/base.py with abstract name property, score() method,
and score_async() default implementation. Updates scorers __init__.py to export
BaseScorer. Includes 9 tests covering instantiation guards, sync/async dispatch,
context dict usage, and partial implementation rejection.
2026-04-07 02:55:05 -05:00
John Lightner
ba8cb7e2c6 MAESTRO: Implement sweep orchestration engine with grid, random, and guided sweep types
Adds backend/engine/sweep.py with three sweep strategies:
- GridSweep: exhaustive enumeration of all parameter combinations
- RandomSweep: N random samples from parameter ranges (list, min/max, step)
- GuidedSweep: top-K exploitation + random exploration from previous results

Features: bounded parallelism via asyncio.Semaphore, token budget enforcement,
Redis-based pause/resume/stop control flags, sweep-level event publishing.
36 tests in test_sweep.py covering config generation, helpers, and full sweep execution.
2026-04-07 02:53:30 -05:00
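Two of the ideas in ba8cb7e2c6, exhaustive grid enumeration and bounded parallelism via `asyncio.Semaphore`, can be sketched as follows. The names `grid_configs` and `run_bounded`, the parameter-space shape (a dict of value lists), and the `worker` coroutine are assumptions for illustration; token budgets and Redis control flags are omitted.

```python
import asyncio
import itertools


def grid_configs(space: dict) -> list:
    """Enumerate every combination of the listed parameter values."""
    keys = list(space)
    return [
        dict(zip(keys, combo))
        for combo in itertools.product(*(space[k] for k in keys))
    ]


async def run_bounded(configs: list, worker, max_parallel: int = 2) -> list:
    """Run worker(cfg) for each config with at most max_parallel in flight."""
    sem = asyncio.Semaphore(max_parallel)

    async def guarded(cfg):
        async with sem:  # blocks while max_parallel workers are already running
            return await worker(cfg)

    # gather preserves input order regardless of completion order.
    return await asyncio.gather(*(guarded(c) for c in configs))
```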
John Lightner
d607970f0c MAESTRO: Implement run execution engine with Jinja2 templating, caching, scoring, and event bus
Adds backend/engine/runner.py with run_single() that iterates pipeline stages,
renders Jinja2 prompt templates with stage history context, checks/stores response
cache, calls LLM adapters, runs configured scorers, creates StageResult and Score
records, and publishes progress events via Redis pub/sub or in-process EventBus.
Includes 21 passing tests covering all execution paths.
2026-04-07 02:48:20 -05:00
John Lightner
f60128604f MAESTRO: Implement ResponseCache layer with SHA-256 config hashing and hit-rate tracking
2026-04-07 02:37:58 -05:00
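SHA-256 config hashing with hit-rate tracking, as in f60128604f, might be structured like the sketch below. Only the class name `ResponseCache` appears in the commit message; the canonical-JSON key scheme, the in-memory dict store, and the method names are assumptions.

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """Hash a config via canonical JSON so key order does not change the digest."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ResponseCache:
    """In-memory cache keyed by config hash, counting hits and misses."""

    def __init__(self) -> None:
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, config: dict):
        key = config_hash(config)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        return None

    def set(self, config: dict, response: str) -> None:
        self._store[config_hash(config)] = response

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```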
John Lightner
bf1e9d1c84 MAESTRO: Implement OpenAI-compatible LLM adapter with streaming, retries, and tests
Add OpenAICompatAdapter that works with any OpenAI-compatible API endpoint
(OpenWebUI, vLLM, Ollama, OpenAI, Anthropic via proxy). Features:
- Async HTTP calls via httpx with configurable timeout
- Chat completions format with system + user messages
- Token usage parsing from responses
- Exponential backoff retries (configurable, default 3 attempts)
- Both streaming (SSE) and non-streaming modes
- Model listing and connection testing
- 21 tests covering construction, request building, response parsing,
  retry logic, and error handling
2026-04-07 02:35:52 -05:00
John Lightner
9e0dc4e9fe MAESTRO: Implement BaseAdapter abstract class and AdapterResponse dataclass
Define the LLM adapter interface in backend/engine/adapters/base.py with
async methods complete(), list_models(), and test_connection(). The
AdapterResponse dataclass holds response text, token counts, latency,
model name, and raw metadata. Includes 11 tests covering instantiation
guards, concrete subclass behavior, and dataclass semantics.
2026-04-07 02:32:57 -05:00
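The adapter interface from 9e0dc4e9fe could be outlined as below. The class names and the three async method names come from the commit message; the dataclass field names, the method signatures, and the `EchoAdapter` subclass are hypothetical and exist only to show the pattern.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class AdapterResponse:
    # Field names are illustrative guesses at "text, token counts, latency,
    # model name, and raw metadata".
    text: str
    prompt_tokens: int = 0
    completion_tokens: int = 0
    latency_ms: float = 0.0
    model: str = ""
    raw: dict = field(default_factory=dict)


class BaseAdapter(ABC):
    @abstractmethod
    async def complete(self, prompt: str, **params) -> AdapterResponse: ...

    @abstractmethod
    async def list_models(self) -> list: ...

    @abstractmethod
    async def test_connection(self) -> bool: ...


class EchoAdapter(BaseAdapter):
    """Hypothetical concrete adapter that echoes the prompt back."""

    async def complete(self, prompt: str, **params) -> AdapterResponse:
        return AdapterResponse(text=prompt, model="echo")

    async def list_models(self) -> list:
        return ["echo"]

    async def test_connection(self) -> bool:
        return True
```

Instantiating `BaseAdapter` directly raises `TypeError` because its abstract methods are unimplemented, which is the kind of instantiation guard the commit's tests refer to.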