Custom SVG-based charts with no external dependencies. Scatter plot for score vs parameter value, bar chart for top N configs comparison, line chart for score progression over time. Interactive tooltips, click callbacks, chart type switching, dark mode support. 30 tests added.
55 lines
14 KiB
Markdown
55 lines
14 KiB
Markdown
# Phase 2b — Frontend Dashboard
|
|
|
|
Build the React frontend: setup wizard, experiment builder, real-time observability dashboard, and steering controls. The UI should feel dynamic, responsive, and fun to use — not like a boring admin panel.
|
|
|
|
- [x] Implement the Setup page (frontend/src/pages/Setup.tsx). This is the first-boot experience: detect via /api/v1/auth/me whether an admin exists. If not, show a clean setup form with username + password + confirm password. On submit, call /api/v1/auth/setup. Redirect to Dashboard on success. Make this feel welcoming — this is the user's first impression.
|
|
<!-- Implemented in SetupPage.tsx (existing file convention). Checks auth on mount, shows form with validation, handles 409/network errors, redirects on success. 9 tests added. -->
|
|
|
|
- [x] Implement the Login page (frontend/src/pages/Login.tsx). Simple form with username + password. Call /api/v1/auth/login, store JWT in memory (React context, NOT localStorage). Redirect to Dashboard. Include a subtle "guest access" link if the system has it enabled. Handle error states clearly.
|
|
<!-- Implemented in LoginPage.tsx (existing file convention). Username + password form, calls auth.login() which stores JWT in memory via client.ts, redirects to dashboard on success. Handles 401, server errors, and network errors. Includes guest access link and setup page link. 10 tests added. -->
|
|
|
|
- [x] Build the auth context provider (frontend/src/contexts/AuthContext.tsx). Manage JWT state, provide login/logout functions, expose current user info, handle token expiry with automatic redirect to login. Wrap the entire app in this provider.
|
|
<!-- AuthProvider wraps app in App.tsx. Manages JWT via client.ts token functions, validates session on mount via auth.me(), auto-redirects to /login on expired/missing token, exposes useAuth() hook with user/isAuthenticated/isLoading/login/logout. Public paths (/login, /setup) bypass redirect. 10 tests added. App.test.tsx updated for auth-aware routing. -->
|
|
|
|
- [x] Implement the Projects page (frontend/src/pages/Projects.tsx). Card grid showing all projects with name, description, experiment count, last activity timestamp, and a progress indicator showing best score across all experiments. Include a "New Project" button that opens a creation modal. Click a card to navigate to its experiments.
|
|
<!-- Implemented in ProjectsPage.tsx. Shows loading/error/empty states. Card grid with name, description, last activity timestamp, experiment & best score indicators. "New Project" button opens creation modal with name + description fields, validation, and error handling. Cards navigate to /experiments/:id on click. 12 tests added. -->
|
|
|
|
- [x] Implement the Experiment Builder (frontend/src/pages/Experiment.tsx). This is the most complex page. It has several sections: (1) Basic info (name, description), (2) Sample data input (paste text, upload file, or enter JSON), (3) Pipeline stage builder (add/remove stages, each with a prompt template editor with syntax highlighting, model selector dropdown populated from configured endpoints, and parameter controls), (4) Scoring config (checkboxes for which scorers to enable, weight sliders for each), (5) Parameter space definition (for each parameter, set type: fixed/range/options and values), (6) Action buttons: Save Draft, Run Single, Start Sweep.
|
|
<!-- Implemented in ExperimentPage.tsx (existing file convention). All six sections: BasicInfoSection (name + description), SampleDataSection (text/JSON/file upload modes with JSON validation), PipelineSection (add/remove stages, prompt template editor with variable hints and preview, model selector from endpoints), ScoringSection (5 scorers with enable checkboxes and weight sliders), ParameterSpaceSection (add/remove params, fixed/range/options types), ActionButtons (Save Draft, Run Single, Start Sweep). Supports new + edit modes via route param. Loads endpoints for model selectors. 20 tests added. App.test.tsx updated for new page behavior. -->
|
|
|
|
- [x] Build the prompt template editor component (frontend/src/components/PromptEditor.tsx). Use a code editor library (CodeMirror or Monaco, loaded from CDN). Support Jinja2 template syntax highlighting. Show available template variables in a sidebar (input_data, previous_stage_output, etc.). Include a "Preview" button that renders the template with sample data.
|
|
<!-- Implemented PromptEditor with transparent-textarea-over-highlighted-backdrop approach for Jinja2 syntax highlighting ({{ expressions }}, {% statements %}, {# comments #}). Variable sidebar with clickable insert buttons. Preview panel renders template with sample data substitution. Integrated into ExperimentPage's PipelineStageCard, replacing the old inline textarea. 27 tests added, existing ExperimentPage tests updated. -->
|
|
|
|
- [x] Build the model selector component (frontend/src/components/ModelSelector.tsx). Dropdown grouped by endpoint. Each option shows model name + endpoint label. Include a "refresh models" button that calls the endpoint test API to refresh available models. Show a connectivity indicator (green dot = reachable, red = error).
|
|
<!-- Implemented ModelSelector with optgroup-based endpoint grouping, each option showing model name + endpoint label. Refresh button tests all endpoints in parallel via endpoints.test() API, updates connectivity indicators (green=reachable, red=error, yellow=testing, gray=unknown). Integrated into ExperimentPage PipelineStageCard replacing the inline select. 16 tests added. -->
|
|
|
|
- [x] Implement the Live Observability page (frontend/src/pages/Live.tsx). This is the star of the show — the real-time dashboard during active sweeps. Layout: left column (60%) shows the activity timeline and current run details, right column (40%) shows the leaderboard and steering controls. Connect via WebSocket to /ws/experiments/{id}. Everything updates in real-time without page refresh.
|
|
<!-- Implemented in LivePage.tsx. 60/40 grid layout with: Activity Timeline (color-coded event cards for run.started/completed/failed, new_best_found, cache_hit; event filter dropdown; auto-scroll toggle), Leaderboard (sortable columns, best-run amber highlight, status badges), Steering Controls (pause/resume/stop with confirmation dialogs, progress bar, token/cost/cache-rate stats), WebSocket connection with exponential backoff reconnect and connection status indicator. 35 tests added. App.test.tsx updated. -->
|
|
|
|
- [x] Build the Leaderboard component (frontend/src/components/Leaderboard.tsx). Real-time ranked table of runs. Columns: rank, config summary (model + key params), individual scores, weighted total, status (completed/cached/running). Click a row to expand full details. Sortable by any column. New entries animate in smoothly. Highlight the current best with a subtle glow effect.
|
|
<!-- Extracted from LivePage's inline LeaderboardTable into standalone component. Ranked table with sortable columns (individual scores + weighted total). Click-to-expand detail panel shows scores breakdown with visual bars, run ID, status, duration, token counts, and full config JSON. Best run highlighted with amber background + subtle glow shadow. New entries animate in with slide-in keyframe animation (auto-removed after 600ms). StatusBadge handles completed/running/failed/cached/unknown states. LivePage updated to import from new component. 29 tests added. -->
|
|
|
|
- [x] Build the Activity Timeline component (frontend/src/components/Timeline.tsx). Chronological feed of events received via WebSocket. Each event is a card: run.started (blue), run.completed (green), new_best_found (gold), cache_hit (gray), run.failed (red). Include timestamps and key metrics. Auto-scroll to latest, with a "pause scroll" button. Filterable by event type.
|
|
<!-- Extracted from LivePage's inline implementation into standalone component. Color-coded event cards (blue=started, green=completed, amber=new_best, slate=cache_hit, red=failed, indigo=progress, emerald=sweep_done). Each card shows event label, formatted timestamp, message, and optional detail. Filter dropdown with 6 options (all/started/completed/new best/cache hits/failed). Auto-scroll toggle with visual state indicator. Handles empty states ("Waiting for events…" vs "No matching events"). Entry animation via CSS keyframes. LivePage updated to import and delegate to Timeline component. 33 tests added. -->
|
|
|
|
- [x] Build the Steering Controls component (frontend/src/components/SteeringControls.tsx). Buttons for: Pause (yellow, shows confirmation), Resume (green), Stop (red, shows confirmation), Fork (opens modal to create new experiment from current best), Export Best (dropdown: JSON/YAML/.env). Also show: progress bar (X of Y runs), token counter (running total), estimated cost, cache hit rate percentage, and estimated time remaining.
|
|
<!-- Extracted from LivePage's inline SteeringControls into standalone component. Pause (amber, confirmation dialog), Resume (green, direct action), Stop (red, confirmation dialog). Fork button opens modal to create new experiment from current config (fetches experiment, clones config, calls create). Export Best dropdown with JSON/YAML/.env options using downloadBlob helper. Progress bar with X/Y runs and percentage. Stats grid: token count, estimated cost, cache hit rate, status, and estimated time remaining (computed from elapsedSeconds prop). LivePage updated to import from standalone component. 33 tests added. -->
|
|
|
|
- [x] Build the Run Card component (frontend/src/components/RunCard.tsx). Expandable card showing: config summary, all scores with visual bars, prompt sent (collapsible), raw response (collapsible with copy button), timing breakdown per stage, cache status badge. Used in both the leaderboard detail view and the Compare page.
|
|
<!-- Implemented RunCard as expandable card with: header showing config_summary + CacheStatusBadge + duration; expandable detail with scores (ScoreBar visual bars per scorer), config JSON display, stage timing breakdown (model + latency + tokens per stage), collapsible prompt sections per stage, collapsible response sections per stage with copy-to-clipboard button, and metadata footer (run_id + total tokens). Uses CollapsibleSection sub-component for prompt/response sections. Supports defaultExpanded prop for use in Compare page. 26 tests added. -->
|
|
|
|
- [x] Implement the Compare page (frontend/src/pages/Compare.tsx). Side-by-side comparison of any two runs. Two columns, each with a run selector (dropdown or search). Show: config diff (highlight what changed), response diff (inline text diff with highlights), score comparison (bar chart overlay), and a "pick winner" button for human rating.
|
|
<!-- Implemented in ComparePage.tsx. Two-column run selectors with experiment→run cascading dropdowns (URL state synced via searchParams). Config diff with color-coded entries (same/changed/added/removed). Line-level LCS response diff with added/removed/same highlighting. Score comparison with overlaid indigo/emerald bars per scorer. Full RunCard detail view for each run side by side. "Pick Winner" buttons submit human_preference score via runs.score() API with metadata. Winner state resets on run change. App.test.tsx updated for new page behavior. 15 tests added (5 unit tests for diff helpers + 10 component integration tests). -->
|
|
|
|
- [x] Build the Score Chart component (frontend/src/components/ScoreChart.tsx). Multiple chart types: (1) scatter plot of score vs parameter value (e.g. score vs temperature), (2) bar chart comparing top N configs, (3) line chart showing score progression over time as sweep runs. Use a lightweight charting library (recharts via CDN).
|
|
<!-- Implemented ScoreChart with custom SVG-based charts (zero external dependencies, consistent with project's dependency-minimal approach). Three chart types: ScatterPlot (score vs parameter value with filtered points, grid lines, axis labels), BarChart (top N configs sorted by score, truncated labels, score annotations), LineChart (score progression with area gradient fill, timestamp-sorted, adaptive tick labels). All charts share: interactive hover tooltips, click-to-select callbacks, dark mode support, responsive SVG viewBox. Chart type selector allows switching between views at runtime. Handles edge cases: identical scores, negative values, single data point, missing paramValue. 30 tests added. -->
|
|
|
|
- [ ] Implement the Admin page (frontend/src/pages/Admin.tsx). Settings management: toggle guest access, manage API keys (generate/revoke), configure default endpoint, set token budgets, view system stats (total runs, cache entries, storage usage). Include a section for webhook management (list/create/delete).
|
|
|
|
- [ ] Implement the Dashboard page (frontend/src/pages/Dashboard.tsx). Landing page after login. Show: recent projects with activity, any actively running sweeps (with mini progress bars), global stats (total experiments, total runs, cache hit rate, tokens spent), and quick-action buttons (New Project, New Experiment).
|
|
|
|
- [ ] Build the WebSocket hook (frontend/src/hooks/useExperimentWS.ts). Custom React hook that manages WebSocket connection to /ws/experiments/{id}. Handles connect/disconnect/reconnect, parses incoming events, exposes typed event stream, and provides connection status. Reconnect with exponential backoff on disconnect.
|
|
|
|
- [ ] Style pass — go through every page and component ensuring consistent Tailwind usage, proper dark mode support (use Tailwind dark: prefix), responsive layout (works on tablet+), smooth transitions on state changes, and accessible form inputs. The UI should feel alive and dynamic, not static. Use subtle animations for new data arriving.
|
|
|
|
- [ ] Build a "Wizard" flow for first-time users (frontend/src/components/Wizard.tsx). A guided multi-step flow: (1) Configure your first LLM endpoint, (2) Paste some sample text, (3) Write a prompt, (4) Run it and see the result, (5) Congratulations, now try a sweep! This should be accessible from the Dashboard for new users.
|