feat: Created async search service with embedding+Qdrant+keyword fallba…

- "backend/search_service.py"
- "backend/schemas.py"
- "backend/routers/search.py"
- "backend/routers/techniques.py"
- "backend/routers/topics.py"
- "backend/routers/creators.py"
- "backend/main.py"

GSD-Task: S05/T01
jlightner 2026-03-29 23:55:52 +00:00
parent 2cb0f9c381
commit c0df369018
20 changed files with 1935 additions and 13 deletions


@@ -13,3 +13,5 @@
| D005 | | architecture | Embedding/Qdrant failure handling strategy in pipeline | Embedding/Qdrant failures (stage 6) log errors but do not fail the pipeline. Processing_status is set by stages 2-5 only. Embeddings can be regenerated by manual re-trigger. | Qdrant is at 10.0.0.10 on the hypervisor network and may not be reachable during all pipeline runs. Making embedding a non-blocking side-effect ensures core pipeline output (KeyMoments, TechniquePages in PostgreSQL) is never lost due to vector store issues. The manual re-trigger endpoint allows regenerating embeddings at any time. | Yes | agent |
| D006 | | requirement | R013 Prompt Template System status | validated | 4 prompt template files in prompts/ directory loaded from configurable settings.prompts_path. Templates use XML-style content fencing. Pipeline stages read templates from disk at runtime, enabling edits without code changes. Manual re-trigger endpoint (POST /api/v1/pipeline/trigger/{video_id}) allows re-processing after prompt edits. | Yes | agent |
| D007 | M001/S04 | architecture | Runtime review mode toggle persistence mechanism | Store review mode toggle in Redis key `chrysopedia:review_mode` with async redis client. Fall back to `settings.review_mode` config default when key is absent. | The config.py `review_mode` setting is loaded via lru_cache from environment variables and cannot be mutated at runtime. Redis is already used by the project (Celery broker, stage 4 classification data) so it adds no new infrastructure. A system_settings DB table would work but Redis is simpler for a single boolean toggle on a single-admin tool. The pipeline's stages.py reads settings.review_mode from config — the admin toggle only affects new pipeline runs if stages.py is updated to check Redis too, but that's deferred since the toggle is primarily a UI-level concept for the review queue. | Yes | agent |
| D008 | M001/S04 | requirement | R004 Review Queue UI status | validated | All R004 capabilities delivered and verified: 9 API endpoints (approve, reject, edit, split, merge, queue list, stats, mode get/set) with 24 passing integration tests covering happy paths and error boundaries. React+TypeScript frontend with queue page (filter tabs, stats, pagination), moment detail page (all review actions with modals), and review-vs-auto mode toggle. Frontend builds with zero TypeScript errors. | Yes | agent |
| D009 | M001/S05 | architecture | Async search service pattern for FastAPI request path | Create a separate `SearchService` class using `openai.AsyncOpenAI` and `qdrant_client.AsyncQdrantClient` for the search endpoint. Keep existing sync `EmbeddingClient` and `QdrantManager` for Celery pipeline. Search endpoint has 300ms timeout on embedding API and falls back to SQL ILIKE keyword search on Qdrant/embedding failure. | The existing EmbeddingClient and QdrantManager are sync (using `openai.OpenAI` and `QdrantClient`) because Celery tasks run synchronously. FastAPI request handlers are async — reusing sync clients would block the event loop. Creating a thin async wrapper avoids modifying the battle-tested pipeline code while providing non-blocking search. The 300ms timeout and keyword fallback ensure the search endpoint always returns results, even when Qdrant or the embedding service is degraded. | Yes | agent |
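
A minimal sketch of the D009 timeout pattern, assuming the embedding call is wrapped as a coroutine function `embed(text) -> list[float]` (a hypothetical stand-in for an `AsyncOpenAI` embeddings call, not the project's real API). `asyncio.wait_for` enforces the 300ms budget and cancels the in-flight call on timeout; any failure, or an empty vector, yields `None` so the caller can switch to keyword search. The stub coroutines at the bottom exist only for the demo.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger("search_service")

# Hypothetical shape of the async embedding call (not the real EmbeddingClient API)
EmbedFn = Callable[[str], Awaitable[list[float]]]


async def embed_query(
    embed: EmbedFn, text: str, timeout_s: float = 0.3
) -> Optional[list[float]]:
    """Return an embedding vector, or None if the call times out, fails, or is empty."""
    try:
        vector = await asyncio.wait_for(embed(text), timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the in-flight call
        logger.warning("embedding timed out after %.0f ms", timeout_s * 1000)
        return None
    except Exception as exc:  # connection errors, API errors, etc.
        logger.warning("embedding failed: %s", exc)
        return None
    # Treat an empty vector as a malformed response, which also triggers fallback
    return vector or None


# Demo stubs standing in for the real embedding service
async def fast_embed(text: str) -> list[float]:
    return [0.1, 0.2, 0.3]


async def slow_embed(text: str) -> list[float]:
    await asyncio.sleep(1.0)  # simulates a degraded embedding API
    return [0.1, 0.2, 0.3]


async def failing_embed(text: str) -> list[float]:
    raise ConnectionError("embedding API unreachable")
```

Because the wrapper never raises for embedding problems, the request handler only needs a single `None` check to decide between semantic and keyword search, and the event loop is never blocked by a sync client.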


@@ -42,6 +42,12 @@
**Fix:** Patch at the source module: `unittest.mock.patch('pipeline.stages.run_pipeline')`. The lazy import will pick up the mock from the source module. This applies to any handler that uses lazy imports to avoid circular dependencies at module load time.
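
The lazy-import rule can be demonstrated without the real project. This sketch builds a fake `pipeline.stages` module in `sys.modules` (a hypothetical stand-in for the real one), defines a handler that imports `run_pipeline` inside its body, and shows that patching the source module is what the handler sees at call time.

```python
import sys
import types
from unittest import mock

# Hypothetical in-memory stand-ins for the real `pipeline` package
pipeline_pkg = types.ModuleType("pipeline")
stages = types.ModuleType("pipeline.stages")


def run_pipeline(video_id: str) -> str:
    return f"real run for {video_id}"


stages.run_pipeline = run_pipeline
pipeline_pkg.stages = stages
sys.modules["pipeline"] = pipeline_pkg
sys.modules["pipeline.stages"] = stages


def trigger_handler(video_id: str) -> str:
    # Lazy import at call time, the pattern used to dodge circular imports
    from pipeline.stages import run_pipeline
    return run_pipeline(video_id)


# Patch where the name is defined, not where the handler lives
with mock.patch("pipeline.stages.run_pipeline", return_value="mocked") as fake:
    result = trigger_handler("v1")
    fake.assert_called_once_with("v1")
```

Patching a name on the handler's own module would not work here, because the handler never binds `run_pipeline` at module level; the lookup happens against `pipeline.stages` each time the handler runs.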
## Frontend detail page without a single-resource GET endpoint
**Context:** The review queue backend has `GET /review/queue` (list, paginated) but no `GET /review/moments/{id}` for fetching a single moment. The MomentDetail page needs to display one specific moment by ID from the URL params.
**Fix:** MomentDetail fetches the full queue with `limit=500` and filters client-side by ID. This works for a single-admin tool with small datasets but would not scale. If the moment count grows significantly, add a dedicated `GET /review/moments/{id}` endpoint to the review router.
## Stage 4 classification data stored in Redis (not DB columns)
**Context:** The KeyMoment SQLAlchemy model doesn't have `topic_tags` or `topic_category` columns. Stage 4 classification needs somewhere to store per-moment tag assignments that stage 5 can read.


@@ -9,5 +9,5 @@ Stand up the complete Chrysopedia stack: Docker Compose deployment on ub01, Post
| S01 | Docker Compose + Database + Whisper Script | low | — | ✅ | docker compose up -d starts all services on ub01; Whisper script transcribes a sample video to JSON |
| S02 | Transcript Ingestion API | low | S01 | ✅ | POST a transcript JSON file to the API; Creator and Source Video records appear in PostgreSQL |
| S03 | LLM Extraction Pipeline + Qdrant Integration | high | S02 | ✅ | A transcript JSON triggers stages 2-5: segmentation → extraction → classification → synthesis. Technique pages with key moments appear in DB. Qdrant has searchable embeddings. |
| S04 | Review Queue Admin UI | medium | S03 | | Admin views pending key moments, approves/edits/rejects them, toggles between review and auto mode |
| S05 | Search-First Web UI | medium | S03 | ⬜ | User searches for a technique, gets semantic results in <500ms, clicks through to a full technique page with study guide prose, key moments, and related links |


@@ -0,0 +1,140 @@
---
id: S04
parent: M001
milestone: M001
provides:
- 9 review queue API endpoints mounted at /api/v1/review/*
- React+Vite+TypeScript frontend with admin UI at /admin/review
- Typed API client (frontend/src/api/client.ts) for all review endpoints
- Reusable StatusBadge and ModeToggle components
- Redis-backed review mode toggle with config fallback
- 24 integration tests for review endpoints
requires:
- slice: S03
  provides: KeyMoment model with review_status field, pipeline that creates moments in DB
affects:
- S05
key_files:
- backend/routers/review.py
- backend/schemas.py
- backend/redis_client.py
- backend/main.py
- backend/tests/test_review.py
- frontend/package.json
- frontend/vite.config.ts
- frontend/tsconfig.json
- frontend/index.html
- frontend/src/main.tsx
- frontend/src/App.tsx
- frontend/src/App.css
- frontend/src/api/client.ts
- frontend/src/pages/ReviewQueue.tsx
- frontend/src/pages/MomentDetail.tsx
- frontend/src/components/StatusBadge.tsx
- frontend/src/components/ModeToggle.tsx
key_decisions:
- Redis mode toggle uses per-request get_redis() with aclose() — no connection pool (D007)
- API client uses bare fetch() with shared request() helper — no external HTTP library
- MomentDetail fetches full queue to find moment by ID since no single-moment GET endpoint exists
- Split creates new moment with '(split)' title suffix; merge combines summaries with double-newline separator
- Split dialog validates timestamp client-side before API call
patterns_established:
- React + Vite + TypeScript frontend pattern: strict TS config, Vite dev proxy to backend, typed API client with fetch()-based request helper
- Reusable component extraction (StatusBadge, ModeToggle) for consistent styling across admin pages
- Review router pattern: async SQLAlchemy with joined loads for cross-table data (moment + video + creator)
- Redis as runtime config store with config.py fallback for settings that need to be mutable at runtime
observability_surfaces:
- GET /api/v1/review/stats — returns pending/approved/edited/rejected counts
- GET /api/v1/review/mode — returns current review/auto mode
- INFO log on each review action (approve/reject/edit/split/merge) with moment_id
- 404 responses name the moment_id that was not found; 400 responses include validation details
drill_down_paths:
- .gsd/milestones/M001/slices/S04/tasks/T01-SUMMARY.md
- .gsd/milestones/M001/slices/S04/tasks/T02-SUMMARY.md
- .gsd/milestones/M001/slices/S04/tasks/T03-SUMMARY.md
duration: ""
verification_result: passed
completed_at: 2026-03-29T23:35:54.561Z
blocker_discovered: false
---
# S04: Review Queue Admin UI
**Delivered the complete review queue admin UI: 9 backend API endpoints with 24 integration tests, a React+Vite+TypeScript frontend with typed API client, and full admin pages for queue browsing, moment review/edit/split/merge, and review-vs-auto mode toggle.**
## What Happened
This slice built the admin review interface across three tasks spanning backend API, frontend scaffold, and full UI implementation.
**T01 — Review Queue Backend API.** Added 8 Pydantic schemas and 9 async endpoints to a new `backend/routers/review.py`: list queue (paginated, status-filtered), stats (counts by status), approve, reject, edit (with review_status=edited), split (validates split_time within range), merge (validates same source video), and get/set review mode. Review mode is persisted in Redis (`chrysopedia:review_mode` key) with fallback to `settings.review_mode` config default. A `backend/redis_client.py` module provides `get_redis()` for per-request connections. The router is mounted in `main.py` under `/api/v1`. 24 integration tests cover happy paths, 404s, 400s for boundary conditions (split outside range, merge across videos, merge with self), and Redis mock tests for mode toggle (including error fallback).
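
A framework-free sketch of the split and merge rules described above, using a hypothetical `Moment` dataclass in place of the SQLAlchemy `KeyMoment` model and raising a hypothetical `ReviewError` where the router would return 400. It follows the documented behavior: split_time strictly inside the range, a "(split)" suffix on the new moment, merged summaries joined by a blank line, and a merged range spanning min(start_time) to max(end_time). Keeping the surviving moment's title on merge is an assumption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Moment:
    """Hypothetical stand-in for the KeyMoment model."""
    id: int
    source_video_id: int
    title: str
    summary: str
    start_time: float
    end_time: float


class ReviewError(ValueError):
    """Raised where the real router would answer 400."""


def split_moment(m: Moment, split_time: float, new_id: int) -> tuple[Moment, Moment]:
    # Boundaries are rejected: the split point must be strictly inside the range
    if not (m.start_time < split_time < m.end_time):
        raise ReviewError("split_time must be between start_time and end_time")
    first = replace(m, end_time=split_time)
    second = replace(m, id=new_id, title=f"{m.title} (split)", start_time=split_time)
    return first, second


def merge_moments(keep: Moment, other: Moment) -> Moment:
    if keep.id == other.id:
        raise ReviewError("cannot merge a moment with itself")
    if keep.source_video_id != other.source_video_id:
        raise ReviewError("moments must come from the same source video")
    return replace(
        keep,
        summary=f"{keep.summary}\n\n{other.summary}",  # double-newline separator
        start_time=min(keep.start_time, other.start_time),
        end_time=max(keep.end_time, other.end_time),
    )
```

In the real endpoint the `other` row is then deleted, as UAT Test 8 expects.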
**T02 — Frontend Scaffold.** Replaced the placeholder frontend with React 18 + Vite 6 + TypeScript 5.6. Created `vite.config.ts` with dev proxy (`/api` → `localhost:8001`), strict TypeScript config, BrowserRouter routes (`/admin/review` → ReviewQueue, `/admin/review/:momentId` → MomentDetail), and a fully typed `src/api/client.ts` with 9 exported functions and TypeScript interfaces matching the backend schemas.
**T03 — Admin UI Pages.** Built the full queue page with stats bar (counts per status), 5 filter tabs, paginated card list, and ModeToggle in header. Built the detail page with full moment display, action buttons (Approve/Reject navigate back, Edit toggles inline editing, Split opens modal with timestamp validation, Merge opens modal with same-video dropdown), and loading/error states. Created reusable StatusBadge and ModeToggle components. Comprehensive CSS with responsive layout.
## Verification
**All slice-level verification checks passed:**
1. `cd backend && python -m pytest tests/test_review.py -v` — 24/24 passed (11.4s)
2. `cd backend && python -m pytest tests/ -v` — 40/40 passed, zero regressions (133.5s)
3. `cd backend && python -c "from routers.review import router; print(len(router.routes))"` — prints 9
4. `cd frontend && npm run build && test -f dist/index.html` — build succeeds, dist/index.html exists
5. `cd frontend && npx tsc --noEmit` — zero TypeScript errors
6. `grep -q 'fetchQueue\|approveMoment\|getReviewMode' frontend/src/api/client.ts` — API client has all key functions
7. `grep -q 'StatusBadge\|ModeToggle' frontend/src/pages/ReviewQueue.tsx` — components integrated
8. `grep -q 'approve\|reject\|split\|merge' frontend/src/pages/MomentDetail.tsx` — all actions present
## Requirements Advanced
- R004 — All R004 capabilities delivered: approve, edit+approve, split, merge, reject actions via API and UI; mode toggle for review vs auto-publish; queue organized with status filters and stats
## Requirements Validated
- R004 — 24 backend integration tests verify all review actions (approve, reject, edit, split, merge) with correct status transitions, boundary validation (split_time range, same-video merge), and mode toggle. Frontend builds with TypeScript and renders queue list with filters, detail page with all action buttons/modals.
## New Requirements Surfaced
None.
## Requirements Invalidated or Re-scoped
None.
## Deviations
1. MomentDetail page fetches the full queue (limit=500) to find an individual moment by ID, since no dedicated single-moment GET endpoint was built. Acceptable for a single-admin tool.
2. T02 added `src/vite-env.d.ts` for Vite type declarations (not in plan, required by TypeScript).
3. ReviewQueue page fetches real data with loading/error states on mount instead of bare placeholder text (exceeded plan expectations).
## Known Limitations
1. No dedicated `GET /review/moments/{id}` endpoint — MomentDetail works around this by fetching the full queue. Fine for single-admin scale.
2. Redis mode toggle is a UI-level concept — the pipeline's `stages.py` still reads `settings.review_mode` from config. To fully enforce mode, stages.py would need a Redis check (deferred).
3. No authentication/authorization on review endpoints — acceptable for internal admin tool on private network.
## Follow-ups
1. Consider adding a `GET /review/moments/{id}` endpoint to avoid the queue-scan workaround in MomentDetail.
2. Wire the pipeline's `stages.py` to check Redis `chrysopedia:review_mode` so the toggle actually controls whether new moments are auto-approved or queued for review.
3. Add basic auth or API key protection to review endpoints before exposing beyond local network.
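
For follow-up 2, a sketch of how a synchronous Celery stage could resolve the effective mode. It assumes a redis-py style client whose `get()` returns `bytes` (or `str` with `decode_responses=True`), and it assumes the key stores `"true"`/`"false"`; the value format the review router actually writes is not shown in this summary. Any Redis error or a missing key falls back to the config default, mirroring D007.

```python
from __future__ import annotations

from typing import Optional, Protocol, Union

REVIEW_MODE_KEY = "chrysopedia:review_mode"


class RedisGetter(Protocol):
    """Anything with a redis-py style synchronous get()."""

    def get(self, name: str) -> Optional[Union[bytes, str]]: ...


def review_mode_enabled(client: RedisGetter, config_default: bool) -> bool:
    """True if new moments should be queued for review, False for auto-publish."""
    try:
        raw = client.get(REVIEW_MODE_KEY)
    except Exception:
        # Redis unreachable: behave exactly as before the toggle existed
        return config_default
    if raw is None:
        return config_default  # toggle never set
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    # Assumed encoding "true"/"false" (hypothetical; check the review router)
    return raw.strip().lower() == "true"
```

Passing `settings.review_mode` as `config_default` keeps the current behavior whenever Redis is absent.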
## Files Created/Modified
- `backend/routers/review.py` — New: 9 async review queue endpoints (354 lines)
- `backend/schemas.py` — Added 8 Pydantic schemas for review queue (ReviewQueueItem, ReviewQueueResponse, ReviewStatsResponse, MomentEditRequest, MomentSplitRequest, MomentMergeRequest, ReviewModeResponse, ReviewModeUpdate)
- `backend/redis_client.py` — New: async Redis client helper with get_redis()
- `backend/main.py` — Mounted review router under /api/v1
- `backend/tests/test_review.py` — New: 24 integration tests for review endpoints (495 lines)
- `frontend/package.json` — New: React 18 + Vite 6 + TypeScript 5.6 dependencies
- `frontend/vite.config.ts` — New: Vite config with React plugin and /api dev proxy
- `frontend/tsconfig.json` — New: strict TypeScript config
- `frontend/index.html` — New: Vite entry point
- `frontend/src/main.tsx` — New: React app entry with BrowserRouter
- `frontend/src/App.tsx` — New: App shell with routes and nav header
- `frontend/src/App.css` — New: Comprehensive admin CSS (620 lines)
- `frontend/src/api/client.ts` — New: Typed API client with 9 functions and TypeScript interfaces (187 lines)
- `frontend/src/pages/ReviewQueue.tsx` — New: Queue list page with stats bar, filter tabs, pagination, mode toggle
- `frontend/src/pages/MomentDetail.tsx` — New: Moment detail page with approve/reject/edit/split/merge actions (458 lines)
- `frontend/src/components/StatusBadge.tsx` — New: Reusable status badge with color coding
- `frontend/src/components/ModeToggle.tsx` — New: Review/auto mode toggle component


@@ -0,0 +1,93 @@
# S04: Review Queue Admin UI — UAT
**Milestone:** M001
**Written:** 2026-03-29T23:35:54.561Z
## UAT: S04 — Review Queue Admin UI
### Preconditions
- Backend running with PostgreSQL and Redis available
- At least one video processed through the pipeline (stages 2-5) with KeyMoments in DB
- Frontend built and served (or `npm run dev` for development)
---
### Test 1: Queue Page Loads with Stats
1. Navigate to `/admin/review`
2. **Expected:** Stats bar shows counts for Pending, Approved, Edited, Rejected
3. **Expected:** Queue cards display moment title, summary excerpt, video filename, creator name, status badge
4. **Expected:** Default filter shows Pending moments
### Test 2: Status Filter Tabs
1. On queue page, click "Approved" tab
2. **Expected:** Only moments with status=approved are shown
3. Click "All" tab
4. **Expected:** All moments across all statuses are shown
5. Click "Rejected" tab with no rejected moments
6. **Expected:** Empty state message displayed
### Test 3: Pagination
1. Ensure >20 moments exist in queue
2. Navigate to `/admin/review`
3. **Expected:** First 20 moments shown, "Next" button visible, "Previous" disabled
4. Click "Next"
5. **Expected:** Next page of moments shown, "Previous" now enabled
### Test 4: Approve Moment
1. Click a pending moment card to navigate to detail page
2. Verify moment displays: title, summary, content_type, timestamps (mm:ss format), status badge
3. Click "Approve" button
4. **Expected:** Navigated back to queue, moment now shows "approved" status badge
5. Stats bar pending count decreased by 1, approved count increased by 1
### Test 5: Reject Moment
1. Click a pending moment card
2. Click "Reject" button
3. **Expected:** Navigated back to queue, moment now shows "rejected" status badge
### Test 6: Edit Moment
1. Click a pending moment card
2. Click "Edit" button
3. **Expected:** Title, summary, content_type fields become editable inline
4. Change the title to "Edited Test Title"
5. Click "Save"
6. **Expected:** Moment title updated, status changes to "edited"
7. Click "Cancel" during edit
8. **Expected:** Fields revert to original values
### Test 7: Split Moment
1. Click a moment with start_time=10.0, end_time=60.0
2. Click "Split" button
3. **Expected:** Modal opens with timestamp input field
4. Enter split time: 35.0, click "Split"
5. **Expected:** Two moments created: [10.0, 35.0) and [35.0, 60.0]. Second has "(split)" suffix in title.
6. Retry with split_time=5.0 (below start_time)
7. **Expected:** Error message — split time must be between start and end
### Test 8: Merge Moments
1. Click a moment from video "example.mp4"
2. Click "Merge" button
3. **Expected:** Modal opens with dropdown showing other moments from same video
4. Select a target moment, click "Merge"
5. **Expected:** Merged moment has combined summary, min(start_time), max(end_time). Target moment deleted.
### Test 9: Mode Toggle
1. On queue page, observe mode toggle in header
2. **Expected:** Shows current mode with colored dot indicator (green=review, amber=auto)
3. Click toggle to switch mode
4. **Expected:** Mode changes, API confirms new mode via GET /api/v1/review/mode
5. Refresh page
6. **Expected:** Mode persists (stored in Redis)
### Test 10: Error Handling
1. Navigate to `/admin/review/99999` (nonexistent moment)
2. **Expected:** Error state displayed (moment not found)
3. Attempt to merge moments from different source videos via API
4. **Expected:** 400 response with validation error message
### Edge Cases
- **Empty queue:** Navigate to `/admin/review` with no moments — empty state message shown
- **Split at boundary:** Try split_time = start_time or end_time — 400 error returned
- **Merge with self:** Try merging moment with itself — 400 error returned
- **Redis down:** Mode toggle falls back to config default; mode set returns 503
- **Concurrent actions:** Approve then immediately approve again — idempotent, no error


@@ -0,0 +1,36 @@
{
  "schemaVersion": 1,
  "taskId": "T03",
  "unitId": "M001/S04/T03",
  "timestamp": 1774826941339,
  "passed": false,
  "discoverySource": "task-plan",
  "checks": [
    {
      "command": "cd frontend",
      "exitCode": 0,
      "durationMs": 4,
      "verdict": "pass"
    },
    {
      "command": "npm run build",
      "exitCode": 254,
      "durationMs": 89,
      "verdict": "fail"
    },
    {
      "command": "test -f dist/index.html",
      "exitCode": 1,
      "durationMs": 6,
      "verdict": "fail"
    },
    {
      "command": "npx tsc --noEmit",
      "exitCode": 1,
      "durationMs": 758,
      "verdict": "fail"
    }
  ],
  "retryAttempt": 1,
  "maxRetries": 2
}


@@ -1,6 +1,276 @@
# S05: Search-First Web UI
**Goal:** User searches for a technique, gets semantic results in <500ms, clicks through to a full technique page with study guide prose, key moments, and related links. Browse pages for creators (randomized default sort) and topics (two-level hierarchy) are navigable.
**Demo:** After this: User searches for a technique, gets semantic results in <500ms, clicks through to a full technique page with study guide prose, key moments, and related links
## Tasks
- [x] **T01: Created async search service with embedding+Qdrant+keyword fallback and all public API endpoints (search, techniques, topics, enhanced creators) mounted at /api/v1** — ## Description
Create the backend API surface for S05: the async search service (embedding + Qdrant), search endpoint, technique pages CRUD, topics hierarchy, and enhanced creators endpoint. This is the highest-risk task because it introduces async embedding/Qdrant clients for the FastAPI request path (existing ones are sync for Celery).
## Failure Modes
| Dependency | On error | On timeout | On malformed response |
|------------|----------|-----------|----------------------|
| Embedding API (AsyncOpenAI) | Fall back to keyword-only search | 300ms timeout → keyword fallback | Return empty vectors → keyword fallback |
| Qdrant (AsyncQdrantClient) | Fall back to keyword-only search | 300ms timeout → keyword fallback | Log warning, return empty results → keyword fallback |
| PostgreSQL | Return 500 (standard FastAPI error handling) | Connection pool timeout → 500 | N/A (SQLAlchemy typed) |
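
The failure-mode table reduces to a short control flow. This sketch injects the three collaborators as hypothetical coroutine functions (`embed`, `vector_search`, `keyword_search`) instead of the real `SearchService` methods, so it runs without Qdrant or PostgreSQL. Embedding and Qdrant problems, including an empty result set, route to keyword search and set `fallback_used`; keyword-search (PostgreSQL) errors are deliberately not caught, so they surface as a 500.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

Result = dict[str, Any]
# Hypothetical collaborator signatures (not the real SearchService API)
EmbedFn = Callable[[str], Awaitable[Optional[list[float]]]]
VectorSearchFn = Callable[[list[float]], Awaitable[list[Result]]]
KeywordSearchFn = Callable[[str], Awaitable[list[Result]]]


async def search(
    query: str,
    embed: EmbedFn,
    vector_search: VectorSearchFn,
    keyword_search: KeywordSearchFn,
    qdrant_timeout_s: float = 0.3,
) -> Result:
    items: list[Result] = []
    vector = await embed(query)  # None signals an embedding failure or timeout
    if vector:
        try:
            items = await asyncio.wait_for(vector_search(vector), timeout=qdrant_timeout_s)
        except Exception:
            items = []  # Qdrant error or timeout: fall through to keyword search
    fallback_used = not items
    if fallback_used:
        # No vector hits or upstream failure: use the ILIKE keyword search.
        # Errors raised here propagate, which FastAPI turns into a 500.
        items = await keyword_search(query)
    return {"items": items, "total": len(items), "query": query, "fallback_used": fallback_used}
```

One consequence of this shape: a healthy semantic search that matches nothing also reports `fallback_used=true`. That matches the "Empty Qdrant collection → keyword-only results" boundary case, but it means the flag really signals "no vector results" rather than strictly "degraded".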
## Load Profile
- **Shared resources**: AsyncQdrantClient connection pool, AsyncOpenAI HTTP pool, SQLAlchemy async session pool
- **Per-operation cost**: Search = 1 embedding API call + 1 Qdrant query + 1-3 SQL queries for enrichment. Read endpoints = 1-2 SQL queries each.
- **10x breakpoint**: Embedding API rate limiting (external dependency). Mitigated by client-side debounce (300ms) reducing request rate.
## Negative Tests
- **Malformed inputs**: Empty search query → return empty results. Query > 500 chars → truncate to 500. Invalid scope parameter → default to 'all'.
- **Error paths**: Embedding API unreachable → keyword fallback. Qdrant unreachable → keyword fallback. Invalid slug → 404.
- **Boundary conditions**: Empty Qdrant collection → keyword-only results. Zero matching techniques/creators → empty list.
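
The malformed-input rules can be expressed as a small pure function that the router calls before touching any backend. The function and constant names are hypothetical; the 500-character cap and the `all` default come from the list above.

```python
from typing import Optional

VALID_SCOPES = frozenset({"all", "topics", "creators"})
MAX_QUERY_CHARS = 500


def normalize_search_params(q: Optional[str], scope: Optional[str]) -> tuple[str, str]:
    """Return (query, scope) with the negative-test rules applied."""
    query = (q or "").strip()[:MAX_QUERY_CHARS]  # truncate long queries, never reject
    normalized_scope = scope if scope in VALID_SCOPES else "all"  # unknown scope -> 'all'
    return query, normalized_scope
```

If the normalized query is empty, the endpoint can return an empty result set immediately, without spending an embedding API call.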
## Steps
1. Create `backend/search_service.py` with `SearchService` class:
- `__init__` takes Settings, creates `openai.AsyncOpenAI` client and `qdrant_client.AsyncQdrantClient`
- `async embed_query(text: str) -> list[float] | None` — embeds query text with 300ms timeout, returns None on failure
- `async search_qdrant(vector: list[float], limit: int, type_filter: str | None) -> list[dict]` — queries Qdrant with optional payload type filter, returns scored results with payloads
- `async keyword_search(query: str, scope: str, limit: int, db: AsyncSession) -> list[dict]` — ILIKE search on technique_pages.title, key_moments.title, creators.name
- `async search(query: str, scope: str, limit: int, db: AsyncSession) -> dict` — orchestrates: embed → Qdrant → enrich with DB metadata → fallback to keyword if needed
2. Add new Pydantic response schemas to `backend/schemas.py`:
- `SearchResultItem(title, slug, type, score, summary, creator_name, creator_slug, topic_category, topic_tags)`
- `SearchResponse(items: list[SearchResultItem], total: int, query: str, fallback_used: bool)`
- `TechniquePageDetail` (extends TechniquePageRead with nested key_moments, creator info, related links)
- `TopicCategory(name, description, sub_topics: list[TopicSubTopic])` and `TopicSubTopic(name, technique_count, creator_count)`
- `CreatorBrowseItem` (extends CreatorRead with technique_count, video_count)
3. Create `backend/routers/search.py`:
- `GET /search?q=...&scope=all|topics|creators&limit=20`
- Instantiate SearchService from get_settings(), call search(), return SearchResponse
- Log query, latency_ms, result_count, fallback_used at INFO level
4. Create `backend/routers/techniques.py`:
- `GET /techniques` — list technique pages with optional `category`, `creator_slug` query filters, pagination
- `GET /techniques/{slug}` — full detail with eager-loaded key_moments (ordered by start_time), creator info, outgoing+incoming related links
- Return 404 for unknown slug
5. Create `backend/routers/topics.py`:
- `GET /topics` — load `canonical_tags.yaml`, for each category aggregate technique_count and creator_count per sub_topic from DB
- `GET /topics/{category_slug}` — return technique pages filtered by topic_category
6. Extend `backend/routers/creators.py`:
- Add `sort` query param: `random` (default), `alpha`, `views`
- Add `genre` query param for filtering by genre
- Add technique_count and video_count subqueries to list endpoint
- For `sort=random`, use `func.random()` ORDER BY (dataset is small, <100 creators)
7. Mount all new routers in `backend/main.py`:
- `from routers import search, techniques, topics`
- `app.include_router(search.router, prefix="/api/v1")`
- `app.include_router(techniques.router, prefix="/api/v1")`
- `app.include_router(topics.router, prefix="/api/v1")`
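
Step 5's aggregation can be sketched in plain Python. This assumes `canonical_tags.yaml` has already been parsed into `{category: {"description": ..., "sub_topics": [...]}}` (the real file layout is not shown here), that technique pages expose `topic_category`, `topic_tags` and a creator id, and that tag matching is case-insensitive. The production endpoint may push these counts into SQL instead.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PageRow:
    """Hypothetical projection of a technique_pages row."""
    topic_category: str
    creator_id: int
    topic_tags: list[str] = field(default_factory=list)


def build_topic_hierarchy(
    canonical: dict[str, dict[str, Any]], pages: list[PageRow]
) -> list[dict[str, Any]]:
    hierarchy = []
    for category, spec in canonical.items():
        in_category = [p for p in pages if p.topic_category == category]
        sub_topics = []
        for sub_topic in spec.get("sub_topics", []):
            wanted = sub_topic.lower()
            matching = [p for p in in_category if wanted in {t.lower() for t in p.topic_tags}]
            sub_topics.append({
                "name": sub_topic,
                "technique_count": len(matching),
                "creator_count": len({p.creator_id for p in matching}),  # distinct creators
            })
        hierarchy.append({
            "name": category,
            "description": spec.get("description", ""),
            "sub_topics": sub_topics,
        })
    return hierarchy
```

The output shape mirrors the planned `TopicCategory(name, description, sub_topics)` and `TopicSubTopic(name, technique_count, creator_count)` schemas, and sub-topics with no pages still appear with zero counts, as the T02 boundary tests expect.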
## Must-Haves
- [ ] SearchService with async embedding + Qdrant + keyword fallback
- [ ] GET /api/v1/search returns SearchResponse with enriched results
- [ ] GET /api/v1/techniques and GET /api/v1/techniques/{slug} with full detail
- [ ] GET /api/v1/topics returns category hierarchy with counts
- [ ] GET /api/v1/creators supports sort=random (default), genre filter, technique/video counts
- [ ] All new routers mounted in main.py
- [ ] Embedding/Qdrant failures gracefully degrade to keyword search
## Verification
- `cd backend && python -c "from search_service import SearchService; print('OK')"` — imports clean
- `cd backend && python -c "from routers.search import router; print(router.routes)"` — search router has routes
- `cd backend && python -c "from routers.techniques import router; print(router.routes)"` — techniques router has routes
- `cd backend && python -c "from routers.topics import router; print(router.routes)"` — topics router has routes
- `cd backend && python -c "from main import app; routes = [r.path for r in app.routes]; assert '/api/v1/search' in str(routes) or any('search' in str(r.path) for r in app.routes); print('Mounted')"` — routers mounted
## Observability Impact
- Signals added: INFO log per search query with latency_ms, result_count, fallback_used. WARNING on embedding/Qdrant failure with error details.
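- How a future agent inspects this: `curl 'localhost:8001/api/v1/search?q=test'` returns structured JSON (`items`, `total`, `query`, `fallback_used`); per-query `latency_ms` appears in the INFO log line rather than in the response body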
- Failure state exposed: fallback_used=true in search response indicates Qdrant/embedding degradation
- Estimate: 2h
- Files: backend/search_service.py, backend/schemas.py, backend/routers/search.py, backend/routers/techniques.py, backend/routers/topics.py, backend/routers/creators.py, backend/main.py
- Verify: cd backend && python -c "from search_service import SearchService; from routers.search import router as sr; from routers.techniques import router as tr; from routers.topics import router as tpr; print('All imports OK')" && python -c "from main import app; print([r.path for r in app.routes])"
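
The INFO signal listed under Observability Impact can be produced by timing the call around the search orchestration. A sketch, assuming a hypothetical `run` coroutine that returns a dict with `items` and `fallback_used` keys (mirroring `SearchResponse`) and a logger named `routers.search`:

```python
import logging
import time
from typing import Any, Awaitable, Callable

logger = logging.getLogger("routers.search")  # hypothetical logger name


async def timed_search(
    query: str, run: Callable[[str], Awaitable[dict[str, Any]]]
) -> dict[str, Any]:
    start = time.perf_counter()
    response = await run(query)
    latency_ms = (time.perf_counter() - start) * 1000
    # One structured INFO line per query, matching the planned fields
    logger.info(
        "search query=%r latency_ms=%.1f result_count=%d fallback_used=%s",
        query, latency_ms, len(response["items"]), response["fallback_used"],
    )
    return response
```

Using `time.perf_counter()` rather than wall-clock time keeps the measurement monotonic even if the host clock is adjusted mid-request.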
- [ ] **T02: Add integration tests for search and public API endpoints** — ## Description
Write integration tests for all new S05 backend endpoints: search (with mocked embedding API and Qdrant), techniques list/detail, topics hierarchy, and enhanced creators (randomized sort, genre filter, counts). Tests run against real PostgreSQL with the existing conftest.py fixtures. All 40 existing tests must continue to pass.
## Negative Tests
- **Malformed inputs**: Empty search query returns empty results. Invalid technique slug returns 404. Invalid topic category returns empty list.
- **Error paths**: Search with mocked embedding failure → keyword fallback results returned. Search with mocked Qdrant failure → keyword fallback.
- **Boundary conditions**: Search with no matching results → empty items list. Topics with no technique pages → zero counts. Creators list with no creators → empty list.
## Steps
1. Create `backend/tests/test_search.py`:
- Fixture: seed DB with 2 creators, 3 technique pages (different categories/tags), 5 key moments
- Test search endpoint with mocked SearchService that returns canned results → verify response shape (items, total, query, fallback_used)
- Test search with empty query → returns empty results or validation error
- Test search keyword fallback: mock embedding to return None → verify keyword results returned and fallback_used=true
- Test search scope filtering (scope=topics returns only technique_page type results)
2. Create `backend/tests/test_public_api.py`:
- Test GET /api/v1/techniques — returns list of technique pages, supports category filter
- Test GET /api/v1/techniques/{slug} — returns full detail with key_moments, creator info, related links
- Test GET /api/v1/techniques/{slug} with invalid slug → 404
- Test GET /api/v1/topics — returns category hierarchy with counts matching seeded data
- Test GET /api/v1/creators?sort=random — returns creators (verify all returned, order may vary)
- Test GET /api/v1/creators?sort=alpha — returns creators in alphabetical order
- Test GET /api/v1/creators?genre=Bass+music — returns only matching creators
- Test GET /api/v1/creators/{slug} — returns detail with technique_count, video_count
3. Run full test suite: `cd backend && python -m pytest tests/ -v` — all 40 existing + new tests pass
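The keyword-fallback test in step 1 depends on mocking the embedding call. Below is a minimal, self-contained sketch of that pattern. It uses `unittest.mock.AsyncMock` against a hypothetical `MiniSearchService` stand-in rather than the real T01 class, so it runs without PostgreSQL, Qdrant, or FastAPI. In `test_search.py` the same idea would apply to the real service, for example via `unittest.mock.patch.object(SearchService, "embed_query", ...)`, with requests going through the HTTP endpoint.

```python
from __future__ import annotations

import asyncio
from unittest.mock import AsyncMock


class MiniSearchService:
    """Hypothetical stand-in that borrows the SearchService method names from T01."""

    async def embed_query(self, text: str) -> list[float] | None:
        raise NotImplementedError  # replaced by a mock in each test

    async def search_qdrant(self, vector: list[float], limit: int) -> list[dict]:
        raise NotImplementedError  # replaced by a mock in each test

    async def keyword_search(self, query: str, limit: int) -> list[dict]:
        # Canned result standing in for the ILIKE query against PostgreSQL.
        return [{"title": f"Keyword match: {query}", "type": "technique_page"}][:limit]

    async def search(self, query: str, limit: int = 20) -> dict:
        vector = await self.embed_query(query)
        # Only query Qdrant when embedding produced a usable vector.
        items = await self.search_qdrant(vector, limit) if vector else []
        fallback_used = not items
        if fallback_used:
            items = await self.keyword_search(query, limit)
        return {"items": items, "total": len(items), "query": query, "fallback_used": fallback_used}


def test_embedding_failure_uses_keyword_fallback() -> None:
    svc = MiniSearchService()
    svc.embed_query = AsyncMock(return_value=None)  # simulate embedding failure or timeout
    svc.search_qdrant = AsyncMock()
    result = asyncio.run(svc.search("snare"))
    assert result["fallback_used"] is True
    assert result["items"][0]["title"] == "Keyword match: snare"
    svc.search_qdrant.assert_not_awaited()  # no vector, so Qdrant is never queried


def test_semantic_hits_skip_fallback() -> None:
    svc = MiniSearchService()
    svc.embed_query = AsyncMock(return_value=[0.1, 0.2, 0.3])
    svc.search_qdrant = AsyncMock(return_value=[{"title": "Snare layering", "type": "technique_page"}])
    result = asyncio.run(svc.search("snare"))
    assert result["fallback_used"] is False
    assert set(result) == {"items", "total", "query", "fallback_used"}  # SearchResponse shape
```

Asserting `assert_not_awaited()` on the Qdrant mock also pins down that no vector query is attempted once embedding has failed.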
## Must-Haves
- [ ] test_search.py with ≥4 tests covering happy path, empty query, keyword fallback, scope filter
- [ ] test_public_api.py with ≥8 tests covering techniques list/detail/404, topics hierarchy, creators sort/filter/detail
- [ ] All 40 existing tests still pass (regression)
- [ ] Tests use real PostgreSQL with seeded data (not mocked DB)
## Verification
- `cd backend && python -m pytest tests/test_search.py tests/test_public_api.py -v` — all new tests pass
- `cd backend && python -m pytest tests/ -v` — all tests pass (40 existing + new)
- Estimate: 1.5h
- Files: backend/tests/test_search.py, backend/tests/test_public_api.py, backend/tests/conftest.py
- Verify: cd backend && python -m pytest tests/test_search.py tests/test_public_api.py -v && python -m pytest tests/ -v
- [ ] **T03: Build frontend search flow: landing page, search results, and technique page** — ## Description
Build the primary user flow: landing page with search bar → search results page → technique page detail. This is the R005/R006/R015 critical path. Includes the new typed API client for public endpoints, App.tsx routing with both admin and public routes, and 3 new page components with CSS.
The frontend uses React 18 + Vite + TypeScript with strict mode (`noUnusedLocals`, `noUnusedParameters`, `noUncheckedIndexedAccess`). Existing pattern: plain CSS in `App.css`, typed `fetch()` API client, React Router v6.
## Steps
1. Create `frontend/src/api/public-client.ts` — typed API client for public endpoints:
- Types: `SearchResultItem`, `SearchResponse`, `TechniquePageDetail`, `KeyMomentSummary`, `CreatorInfo`, `RelatedLink`, `TopicCategory`, `TopicSubTopic`, `CreatorBrowseItem`
- Functions: `searchApi(q, scope?, limit?)`, `fetchTechnique(slug)`, `fetchTechniques(params?)`, `fetchTopics()`, `fetchCreators(params?)`, `fetchCreator(slug)`
- Reuse the `request<T>` helper pattern from existing `client.ts` (or extract shared helper)
2. Create `frontend/src/pages/Home.tsx` — landing page:
- Prominent search bar (auto-focus on mount) with debounce (300ms)
- Live typeahead: after 2+ chars, show top 5 results in dropdown below search bar
- On Enter or "See all results" link, navigate to `/search?q=...`
- Two navigation cards: "Topics" (links to `/topics`) and "Creators" (links to `/creators`)
- "Recently Added" section showing last 5 technique pages (fetch from `/api/v1/techniques?limit=5`)
3. Create `frontend/src/pages/SearchResults.tsx` — full search results page:
- Read `q` from URL search params
- Display results grouped by type (technique_pages first, then key_moments)
- Each result: title (linked to technique page), summary snippet, creator name, category/tags
- Show "No results found" for empty results, "Showing keyword results" when fallback_used=true
- Search bar at top for refining query
4. Create `frontend/src/pages/TechniquePage.tsx` — technique page display (R006):
- Fetch technique by slug from URL params via `fetchTechnique(slug)`
- Header: title, topic_category badge, topic_tags pills, creator name (linked to `/creators/{slug}`), source_quality indicator
- Amber banner if source_quality === 'unstructured' (livestream-sourced)
- Study guide prose: render `body_sections` JSONB — iterate Object.entries, render each section as `<h2>` + paragraph. Handle both string and object values gracefully.
- Key moments index: ordered list with title, start_time→end_time, content_type badge, summary
- Signal chains section (if present): render each chain as name + ordered steps
- Plugins referenced (if present): pill list
- Related techniques (if present): linked list
- Loading state and 404 error state
5. Update `frontend/src/App.tsx` routing:
- Import new pages
- Add public routes: `/` → Home, `/search` → SearchResults, `/techniques/:slug` → TechniquePage
- Keep admin routes at `/admin/*`
- Update header: "Chrysopedia" title (not "Chrysopedia Admin"), nav links to Home, Topics, Creators, and Admin
6. Add CSS to `frontend/src/App.css` for new pages:
- Search bar styles (large, centered on home, inline on results page)
- Typeahead dropdown styles
- Navigation cards (grid layout)
- Technique page layout (readable prose width, section spacing)
- Search result items (hover state, meta info)
- Tag/badge pill styles
- Loading and error states
## Must-Haves
- [ ] Typed public API client with all endpoint functions
- [ ] Landing page with search bar, typeahead, navigation cards, recently added
- [ ] Search results page with grouped results
- [ ] Technique page with all sections (header, prose, key moments, related links)
- [ ] App.tsx routes both public and admin paths
- [ ] `cd frontend && npx tsc -b` passes with zero errors
## Verification
- `cd frontend && npx tsc -b` — zero TypeScript errors
- `cd frontend && npm run build` — clean production build
- Verify files exist: `test -f frontend/src/api/public-client.ts && test -f frontend/src/pages/Home.tsx && test -f frontend/src/pages/SearchResults.tsx && test -f frontend/src/pages/TechniquePage.tsx`
- Estimate: 2h
- Files: frontend/src/api/public-client.ts, frontend/src/pages/Home.tsx, frontend/src/pages/SearchResults.tsx, frontend/src/pages/TechniquePage.tsx, frontend/src/App.tsx, frontend/src/App.css
- Verify: cd frontend && npx tsc -b && npm run build && echo 'Frontend build OK'
- [ ] **T04: Build frontend browse pages (creators, topics) and verify full build** — ## Description
Build the remaining browse pages: CreatorsBrowse (R007, R014 creator equity with randomized default sort), CreatorDetail, and TopicsBrowse (R008 two-level hierarchy). Then run final verification to confirm the full frontend builds cleanly and all requirements are covered.
## Steps
1. Create `frontend/src/pages/CreatorsBrowse.tsx` — creators browse page (R007, R014):
- Fetch creators from `/api/v1/creators?sort=random` (default) via `fetchCreators()`
- Genre filter pills at top (fetch unique genres from creator data, or hardcode from canonical_tags.yaml genres)
- Type-to-narrow input that filters displayed creators client-side by name
- Sort toggle: Random (default), Alphabetical, Views — each triggers re-fetch with `sort=random|alpha|views`
- Each creator row: name, genre tags (pills), technique_count, video_count, view_count
- Click row → navigate to `/creators/{slug}`
- All creators get equal visual weight (no featured/highlighted creators) per R014
2. Create `frontend/src/pages/CreatorDetail.tsx` — creator detail page:
- Fetch creator by slug via `fetchCreator(slug)` — shows name, genres, video_count, technique_count
- Fetch creator's technique pages via `fetchTechniques({ creator_slug: slug })`
- List technique pages with title (linked to `/techniques/{slug}`), category, tags, summary
- Loading state and 404 error state
3. Create `frontend/src/pages/TopicsBrowse.tsx` — topics browse page (R008):
- Fetch topics from `/api/v1/topics` via `fetchTopics()`
- Two-level hierarchy: 6 top-level categories (Sound design, Mixing, Synthesis, Arrangement, Workflow, Mastering)
- Each category expandable/collapsible, showing sub-topics
- Each sub-topic shows technique_count and creator_count
- Click sub-topic → navigate to `/search?q={sub_topic_name}&scope=topics` or filter technique list
- Filter input at top to narrow categories/sub-topics
4. Update `frontend/src/App.tsx` — add routes for browse pages:
- `/creators` → CreatorsBrowse
- `/creators/:slug` → CreatorDetail
- `/topics` → TopicsBrowse
- Import new page components
5. Add CSS to `frontend/src/App.css` for browse pages:
- Creator list styles (rows, genre pills, counts)
- Creator detail page layout
- Topics hierarchy styles (collapsible sections, sub-topic rows, counts)
- Filter input styles
- Sort toggle button group
6. Final verification:
- `cd frontend && npx tsc -b` — zero errors
- `cd frontend && npm run build` — clean build
- Verify all 6 page files exist
## Must-Haves
- [ ] Creators browse page with randomized default sort (R014), genre filter, type-to-narrow, sort toggle (R007)
- [ ] Creator detail page showing creator info and their technique pages
- [ ] Topics browse page with two-level hierarchy, counts, clickable sub-topics (R008)
- [ ] All routes registered in App.tsx
- [ ] `cd frontend && npx tsc -b` passes with zero errors
- [ ] `cd frontend && npm run build` succeeds
## Verification
- `cd frontend && npx tsc -b && npm run build && echo 'Build OK'`
- `test -f frontend/src/pages/CreatorsBrowse.tsx && test -f frontend/src/pages/CreatorDetail.tsx && test -f frontend/src/pages/TopicsBrowse.tsx && echo 'All pages exist'`
- Estimate: 1.5h
- Files: frontend/src/pages/CreatorsBrowse.tsx, frontend/src/pages/CreatorDetail.tsx, frontend/src/pages/TopicsBrowse.tsx, frontend/src/App.tsx, frontend/src/App.css
- Verify: cd frontend && npx tsc -b && npm run build && test -f src/pages/CreatorsBrowse.tsx && test -f src/pages/CreatorDetail.tsx && test -f src/pages/TopicsBrowse.tsx && echo 'All browse pages built OK'

@@ -0,0 +1,101 @@
# S05 — Search-First Web UI — Research
**Date:** 2026-03-29
## Summary
S05 builds the public-facing web UI for Chrysopedia — the search-first landing page, technique page display, creators browse, and topics browse. This is a medium-complexity frontend + backend slice that wires together existing infrastructure (Qdrant vectors, PostgreSQL models, React+Vite+TypeScript frontend) into new pages and API endpoints. The backend needs a search endpoint that embeds query text via the embedding API, queries Qdrant for semantic results, and enriches them with DB metadata. The frontend needs 6 new pages/views and a new API client module.
The riskiest piece is the search endpoint: it requires an async embedding call (to convert query text to a vector) followed by an async Qdrant search, both of which must complete within 500ms (R015). The existing `EmbeddingClient` is sync (designed for Celery), so the search endpoint needs an async variant using `openai.AsyncOpenAI` and `AsyncQdrantClient`. Everything else — technique page display, creators browse, topics browse — is standard CRUD over existing DB models.
The frontend is a React 18 + Vite + TypeScript SPA. S04 established the pattern: typed API client with `fetch()`, React Router v6, plain CSS (no Tailwind/UI library). S05 extends this with new routes and pages. The existing `App.tsx` routes everything to `/admin/*` — S05 adds the public UI at root paths (`/`, `/search`, `/techniques/:slug`, `/creators`, `/creators/:slug`, `/topics`).
## Recommendation
Build backend-first: start with the search API endpoint (highest risk — requires async embedding + Qdrant integration), then the read-only data endpoints (technique pages, creators list with counts, topics hierarchy), then the frontend pages. The search endpoint is the critical path — everything else is well-understood CRUD.
Use `AsyncQdrantClient` from `qdrant-client` (already in requirements.txt, same package provides both sync and async) and `openai.AsyncOpenAI` (already in requirements.txt) for the search endpoint. Create a new `backend/routers/search.py` for the search API and new Pydantic response schemas. Keep the existing sync `EmbeddingClient` and `QdrantManager` for Celery — the async search service is a separate, thin layer for the FastAPI request path.
For keyword fallback (R005 says "semantic where possible, with keyword fallback"), use PostgreSQL `ILIKE` queries on technique_page.title, key_moment.title, and creator.name. This handles the case where the embedding service is unavailable or returns empty results.
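A sketch of the keyword fallback under stated assumptions: SQLite's `LIKE` (case-insensitive for ASCII by default) stands in for PostgreSQL `ILIKE`, and the table and column names mirror the ones named above. User input is escaped so that `%` and `_` in a query match literally instead of acting as wildcards; that escaping step is an addition, not something the plan specifies. With SQLAlchemy against PostgreSQL the equivalent filter would be along the lines of `TechniquePage.title.ilike(pattern, escape="\\")`.

```python
import sqlite3


def like_pattern(query: str) -> str:
    """Wrap the query in % wildcards after escaping backslash, % and _."""
    escaped = query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"


# One UNION ALL over the three searchable columns; LIMIT applies to the combined result.
KEYWORD_SQL = """
    SELECT 'technique_page', title FROM technique_pages WHERE title LIKE :p ESCAPE '\\'
    UNION ALL
    SELECT 'key_moment', title FROM key_moments WHERE title LIKE :p ESCAPE '\\'
    UNION ALL
    SELECT 'creator', name FROM creators WHERE name LIKE :p ESCAPE '\\'
    LIMIT :limit
"""


def keyword_search(conn: sqlite3.Connection, query: str, limit: int = 20) -> list:
    return conn.execute(KEYWORD_SQL, {"p": like_pattern(query), "limit": limit}).fetchall()


def make_demo_db() -> sqlite3.Connection:
    """In-memory demo data (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE technique_pages (title TEXT);
        CREATE TABLE key_moments (title TEXT);
        CREATE TABLE creators (name TEXT);
        INSERT INTO technique_pages VALUES ('Snare Layering'), ('100% Wet Reverb Sends');
        INSERT INTO key_moments VALUES ('Tuning the snare');
        INSERT INTO creators VALUES ('Snarebot');
    """)
    return conn
```

Without the escaping, a query of `%` would match every row; with it, only titles containing a literal `%` come back.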
## Implementation Landscape
### Key Files
**Backend — existing (read, extend):**
- `backend/config.py` — Settings with `qdrant_url`, `qdrant_collection`, `embedding_api_url`, `embedding_model`, `embedding_dimensions`. No changes needed.
- `backend/models.py` — All 7 models already exist: Creator, SourceVideo, TranscriptSegment, KeyMoment, TechniquePage, RelatedTechniqueLink, Tag. TechniquePage has `body_sections` (JSONB), `signal_chains` (JSONB), `topic_tags` (ARRAY), `topic_category`, `plugins` (ARRAY). No schema changes needed.
- `backend/schemas.py` — Has `CreatorRead`, `TechniquePageRead`, `KeyMomentRead`. Needs new response schemas for search results and enriched technique page detail.
- `backend/database.py` — Async engine + `get_session` dependency. Used as-is.
- `backend/main.py` — Mount new routers here (search, techniques, topics).
- `backend/routers/creators.py` — Has `list_creators` (alphabetical order) and `get_creator` (by slug). Needs extension: randomized sort, genre filter, technique count, video count per creator.
- `backend/pipeline/qdrant_client.py` — Sync `QdrantManager` for Celery pipeline write-path. Read-path (search) needs new async code.
- `backend/pipeline/embedding_client.py` — Sync `EmbeddingClient`. Search needs async variant.
**Backend — new files to create:**
- `backend/routers/search.py` — `GET /api/v1/search?q=...&scope=all|topics|creators&limit=20` — orchestrates async embedding + Qdrant search + keyword fallback + DB enrichment.
- `backend/routers/techniques.py` — `GET /api/v1/techniques` (list, filterable by category/creator), `GET /api/v1/techniques/{slug}` (full detail with key_moments, related links, creator info).
- `backend/routers/topics.py` — `GET /api/v1/topics` (hierarchy: top-level categories with sub-topics, counts of technique pages and creators per sub-topic). Reads from `canonical_tags.yaml` + DB aggregation.
- `backend/search_service.py` — Async search service class: `AsyncOpenAI` for embedding, `AsyncQdrantClient` for vector search. Thin wrapper, initialized from Settings.
**Frontend — existing (modify):**
- `frontend/src/App.tsx` — Add new routes for public pages (`/`, `/search`, `/techniques/:slug`, `/creators`, `/creators/:slug`, `/topics`). Keep admin routes as-is.
- `frontend/src/App.css` — Extend with styles for new pages (search bar, technique page, browse lists).
- `frontend/src/api/client.ts` — Extend (or create parallel `search-client.ts`) with typed functions for search, techniques, creators, topics endpoints.
**Frontend — new files to create:**
- `frontend/src/pages/Home.tsx` — Landing page: search bar, Topics card, Creators card, recently added section.
- `frontend/src/pages/SearchResults.tsx` — Full search results page after Enter or "See all results".
- `frontend/src/pages/TechniquePage.tsx` — Full technique page display (R006): header, study guide prose, key moments index, related techniques, plugins.
- `frontend/src/pages/CreatorsBrowse.tsx` — Creators list (R007): genre filter pills, type-to-narrow, sort toggle (random/alpha/views), creator rows with counts.
- `frontend/src/pages/CreatorDetail.tsx` — Creator's technique page list.
- `frontend/src/pages/TopicsBrowse.tsx` — Two-level topic hierarchy (R008): top-level categories → sub-topics with counts.
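The `topics.py` aggregation can be illustrated in memory. This sketch assumes `canonical_tags.yaml` parses (for example with PyYAML's `yaml.safe_load`) into a mapping of category name to a list of sub-topic names, and that a page counts toward a sub-topic when its `topic_category` equals the category and one of its `topic_tags` matches the sub-topic case-insensitively. Both the YAML shape and the matching rule are assumptions; the real endpoint does the counting in SQL.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TopicSubTopic:
    name: str
    technique_count: int = 0
    creator_count: int = 0


@dataclass
class TopicCategory:
    name: str
    description: str = ""
    sub_topics: list[TopicSubTopic] = field(default_factory=list)


def build_topic_hierarchy(taxonomy: dict[str, list[str]], pages: list[dict]) -> list[TopicCategory]:
    """Count technique pages and distinct creators per (category, sub-topic)."""
    hierarchy = []
    for category, sub_topic_names in taxonomy.items():
        in_category = [p for p in pages if p["topic_category"] == category]
        sub_topics = []
        for sub in sub_topic_names:
            # Case-insensitive tag match (an assumption about tag normalization).
            matches = [p for p in in_category if sub.lower() in {t.lower() for t in p["topic_tags"]}]
            creators = {p["creator_id"] for p in matches}
            sub_topics.append(TopicSubTopic(sub, len(matches), len(creators)))
        hierarchy.append(TopicCategory(category, sub_topics=sub_topics))
    return hierarchy
```

Sub-topics with no pages still appear with zero counts, which is the boundary case T02 lists for the topics endpoint.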
### Build Order
1. **Search service + search endpoint** (highest risk, unblocks frontend search). Create `backend/search_service.py` with async embedding + Qdrant query. Create `backend/routers/search.py`. This proves the semantic search path works end-to-end. Verify with `curl`.
2. **Technique page + topics + creators API endpoints** (unblocks all frontend pages). Create `backend/routers/techniques.py`, `backend/routers/topics.py`, extend `backend/routers/creators.py`. Add enriched response schemas to `backend/schemas.py`. Mount all new routers in `main.py`.
3. **Frontend: Landing page + search + navigation shell** (proves the primary interaction). Build `Home.tsx` with search bar and live typeahead, `SearchResults.tsx` for full results. This is the highest-value UI — it's the R015 30-second retrieval path.
4. **Frontend: Technique page display** (R006 — the core content unit). Build `TechniquePage.tsx` rendering all sections: header with tags, study guide prose from `body_sections` JSONB, key moments index, related links, plugins.
5. **Frontend: Browse pages** (R007, R008, R014). Build `CreatorsBrowse.tsx` (randomized default sort for R014), `CreatorDetail.tsx`, `TopicsBrowse.tsx`.
### Verification Approach
**Backend verification:**
- `cd backend && python -m pytest tests/ -v` — all existing 40 tests still pass (regression).
- New integration tests for search endpoint: mock the embedding API and Qdrant, verify response shape and <500ms timing.
- New integration tests for techniques/topics/creators endpoints: verify data shape, pagination, filtering.
- `curl` smoke tests against running API: `GET /api/v1/search?q=snare`, `GET /api/v1/techniques`, `GET /api/v1/topics`, `GET /api/v1/creators?sort=random`.
**Frontend verification:**
- `cd frontend && npx tsc -b` — zero TypeScript errors.
- `cd frontend && npm run build` — clean production build.
- Manual browser verification: navigate to each page, confirm data renders, search returns results, technique page displays all sections.
## Constraints
- **Existing frontend uses plain CSS, no UI library.** S04 established this pattern — continue with plain CSS. Adding Tailwind or shadcn/ui would be a divergence. The existing `App.css` has ~350 lines of well-structured styles.
- **EmbeddingClient is sync (Celery).** The search endpoint runs in async FastAPI — cannot reuse `EmbeddingClient` directly. Must create async embedding using `openai.AsyncOpenAI`.
- **QdrantManager is sync (Celery).** Same issue — use `AsyncQdrantClient` for the search read path.
- **Qdrant payloads have limited metadata.** Technique page payloads contain: `type`, `page_id`, `creator_id`, `title`, `topic_category`, `topic_tags`, `summary`. Key moment payloads contain: `type`, `moment_id`, `source_video_id`, `title`, `start_time`, `end_time`, `content_type`. The search endpoint must enrich results with DB data (creator names, slugs, etc.) after Qdrant returns IDs.
- **Stage 4 classification in Redis, not DB.** The `KeyMoment` model lacks `topic_tags`/`topic_category` columns. However, `TechniquePage` has both. For search, technique pages are the primary entity — this is fine.
- **No existing single-resource endpoints for technique pages or key moments.** These need to be created.
- **TypeScript strict mode** — `noUnusedLocals`, `noUnusedParameters`, `noUncheckedIndexedAccess` are all enabled. Frontend code must be clean.
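Because the Qdrant payloads lack creator names and slugs, search hits need a DB enrichment pass. The sketch below shows one way to shape it, assuming hits arrive as `{"score": ..., "payload": {...}}` dicts and that the three lookup dicts come from batched DB queries (one per ID set). Key moment payloads carry `source_video_id` rather than `creator_id`, so their creator is resolved through the video. Leaving `slug` empty for key moments is an assumption, since the plan does not say where a key-moment hit should link.

```python
from __future__ import annotations


def ids_to_fetch(hits: list[dict]) -> tuple[set, set]:
    """Collect the IDs needing DB lookups so enrichment can use batched queries."""
    page_ids = {h["payload"]["page_id"] for h in hits if h["payload"]["type"] == "technique_page"}
    video_ids = {h["payload"]["source_video_id"] for h in hits if h["payload"]["type"] == "key_moment"}
    return page_ids, video_ids


def enrich_hits(hits: list[dict], page_slugs: dict, video_creators: dict, creators: dict) -> list[dict]:
    """
    page_slugs:     page_id -> technique page slug
    video_creators: source_video_id -> creator_id
    creators:       creator_id -> (name, slug)
    """
    items = []
    for hit in hits:
        payload = hit["payload"]
        if payload["type"] == "technique_page":
            creator_id = payload.get("creator_id")
            slug = page_slugs.get(payload.get("page_id"))
        else:
            # Key moment payloads have no creator_id; resolve it through the video.
            creator_id = video_creators.get(payload.get("source_video_id"))
            slug = None  # assumption: no standalone key-moment page to link to
        creator_name, creator_slug = creators.get(creator_id, (None, None))
        items.append({
            "title": payload.get("title", ""),
            "slug": slug,
            "type": payload["type"],
            "score": hit["score"],
            "summary": payload.get("summary"),
            "creator_name": creator_name,
            "creator_slug": creator_slug,
            "topic_category": payload.get("topic_category"),
            "topic_tags": payload.get("topic_tags", []),
        })
    return items
```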
## Common Pitfalls
- **Async embedding timeout in search path.** The embedding API call adds latency to every search request. If the embedding service is slow (>200ms), the 500ms target becomes tight. Mitigation: set a short timeout (300ms) on the embedding call and fall back to keyword search on timeout.
- **Qdrant connection failure in search.** Unlike the pipeline (where embedding/Qdrant failures are non-blocking side-effects), search failures are user-visible. The search endpoint must gracefully degrade to keyword-only search when Qdrant is unavailable.
- **Random sort on creators page (R014) with pagination.** `ORDER BY random()` in SQL gives different results per page load, making offset-based pagination inconsistent. Since the creators list is likely <100 entries, fetch all and shuffle server-side, or use a seed-based random sort (`ORDER BY md5(id::text || seed)`) with the seed passed from the client.
- **body_sections JSONB structure is undefined.** The `TechniquePage.body_sections` column stores the study guide prose as JSONB, but the exact schema depends on what stage 5 (synthesis) produces. The frontend must handle variable structures gracefully. Check `stage5_synthesis.txt` prompt to understand the expected format.
- **Vite dev proxy only covers `/api`.** The existing proxy in `vite.config.ts` proxies `/api` to `http://localhost:8001`. This is sufficient for all new endpoints since they're all under `/api/v1/`.
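The seed-based alternative from the random-sort pitfall can be sketched in Python to show why it keeps offset pagination consistent: ordering by a hash of `id + seed` is stable for a given seed, so consecutive pages never overlap. The `page_of` helper and the idea of a client-generated `seed` query parameter are hypothetical; in PostgreSQL the same ordering would come from `ORDER BY md5(id::text || seed)`.

```python
from __future__ import annotations

import hashlib


def seeded_order(creator_ids: list[str], seed: str) -> list[str]:
    """Deterministic pseudo-random order: sort by md5(id + seed)."""
    return sorted(creator_ids, key=lambda cid: hashlib.md5((cid + seed).encode("utf-8")).hexdigest())


def page_of(creator_ids: list[str], seed: str, offset: int, limit: int) -> list[str]:
    """One page of the seeded order; the same seed yields stable, non-overlapping pages."""
    return seeded_order(creator_ids, seed)[offset:offset + limit]
```

A new seed per visit still gives every creator an equal chance of appearing first (R014), while a fixed seed within one visit keeps pages consistent.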
## Open Risks
- **body_sections JSONB format is unknown until real pipeline output is examined.** The synthesis prompt (stage 5) defines the structure, but no real pipeline run has been done. The technique page frontend component must be flexible enough to handle whatever JSONB shape the LLM produces. Fallback: render as plain text if structure is unrecognized.
- **Qdrant collection may be empty** if no pipeline runs have completed embedding. The search endpoint must handle empty results gracefully and fall back to keyword search.
- **Embedding API latency** is unknown — it depends on the deployment (local Ollama vs DGX Sparks). The 500ms search target may not be achievable with slow embedding. Client-side debounce (300ms) helps, but server-side latency must be measured.

@@ -0,0 +1,123 @@
---
estimated_steps: 68
estimated_files: 7
skills_used: []
---
# T01: Build async search service and all public API endpoints
## Description
Create the backend API surface for S05: the async search service (embedding + Qdrant), search endpoint, technique pages CRUD, topics hierarchy, and enhanced creators endpoint. This is the highest-risk task because it introduces async embedding/Qdrant clients for the FastAPI request path (existing ones are sync for Celery).
## Failure Modes
| Dependency | On error | On timeout | On malformed response |
|------------|----------|-----------|----------------------|
| Embedding API (AsyncOpenAI) | Fall back to keyword-only search | 300ms timeout → keyword fallback | Return empty vectors → keyword fallback |
| Qdrant (AsyncQdrantClient) | Fall back to keyword-only search | 300ms timeout → keyword fallback | Log warning, return empty results → keyword fallback |
| PostgreSQL | Return 500 (standard FastAPI error handling) | Connection pool timeout → 500 | N/A (SQLAlchemy typed) |
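
The timeout column maps onto `asyncio.wait_for`, which the T01 summary reports using for both the embedding and Qdrant calls. A minimal sketch of a shared guard, assuming each dependency call is passed in as an awaitable; `slow_embedding`, `fast_embedding`, and `unreachable_qdrant` are hypothetical stand-ins for the real `AsyncOpenAI` and `AsyncQdrantClient` calls.

```python
from __future__ import annotations

import asyncio
import logging
from typing import Awaitable, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger("search")


async def call_with_timeout(aw: Awaitable[T], label: str, timeout_s: float = 0.3) -> Optional[T]:
    """Await `aw` for at most timeout_s seconds; return None on timeout or error."""
    try:
        return await asyncio.wait_for(aw, timeout=timeout_s)
    except asyncio.TimeoutError:
        logger.warning("%s timed out after %.0f ms; falling back to keyword search", label, timeout_s * 1000)
    except Exception as exc:  # connection refused, HTTP errors, malformed payloads, ...
        logger.warning("%s failed (%s); falling back to keyword search", label, exc)
    return None


# Hypothetical stand-ins for the real embedding and Qdrant calls.
async def slow_embedding(text: str) -> list[float]:
    await asyncio.sleep(1.0)
    return [0.0, 0.0]


async def fast_embedding(text: str) -> list[float]:
    return [0.1, 0.2]


async def unreachable_qdrant() -> list[dict]:
    raise ConnectionError("Qdrant at qdrant_url is unreachable")
```

Catching `Exception` rather than `BaseException` lets `asyncio.CancelledError` (a `BaseException` subclass since Python 3.8) propagate if the request itself is cancelled.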
## Load Profile
- **Shared resources**: AsyncQdrantClient connection pool, AsyncOpenAI HTTP pool, SQLAlchemy async session pool
- **Per-operation cost**: Search = 1 embedding API call + 1 Qdrant query + 1-3 SQL queries for enrichment. Read endpoints = 1-2 SQL queries each.
- **10x breakpoint**: Embedding API rate limiting (external dependency). Mitigated by client-side debounce (300ms) reducing request rate.
## Negative Tests
- **Malformed inputs**: Empty search query → return empty results. Query > 500 chars → truncate to 500. Invalid scope parameter → default to 'all'.
- **Error paths**: Embedding API unreachable → keyword fallback. Qdrant unreachable → keyword fallback. Invalid slug → 404.
- **Boundary conditions**: Empty Qdrant collection → keyword-only results. Zero matching techniques/creators → empty list.
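The malformed-input rules are simple enough to centralize in one helper. A sketch, assuming the router calls it before touching the embedding API; stripping surrounding whitespace and reporting `fallback_used=False` for an empty query are choices made here, not requirements stated in the plan.

```python
from __future__ import annotations

from typing import Optional

VALID_SCOPES = {"all", "topics", "creators"}
MAX_QUERY_CHARS = 500


def normalize_search_params(q: Optional[str], scope: Optional[str]) -> tuple[str, str]:
    """Strip and truncate the query to 500 chars; default unknown scopes to 'all'."""
    query = (q or "").strip()[:MAX_QUERY_CHARS]
    return query, (scope if scope in VALID_SCOPES else "all")


def empty_response(query: str) -> dict:
    """Short-circuit for an empty query: no embedding call, no DB query."""
    return {"items": [], "total": 0, "query": query, "fallback_used": False}
```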
## Steps
1. Create `backend/search_service.py` with `SearchService` class:
- `__init__` takes Settings, creates `openai.AsyncOpenAI` client and `qdrant_client.AsyncQdrantClient`
- `async embed_query(text: str) -> list[float] | None` — embeds query text with 300ms timeout, returns None on failure
- `async search_qdrant(vector: list[float], limit: int, type_filter: str | None) -> list[dict]` — queries Qdrant with optional payload type filter, returns scored results with payloads
- `async keyword_search(query: str, scope: str, limit: int, db: AsyncSession) -> list[dict]` — ILIKE search on technique_pages.title, key_moments.title, creators.name
- `async search(query: str, scope: str, limit: int, db: AsyncSession) -> dict` — orchestrates: embed → Qdrant → enrich with DB metadata → fallback to keyword if needed
2. Add new Pydantic response schemas to `backend/schemas.py`:
- `SearchResultItem(title, slug, type, score, summary, creator_name, creator_slug, topic_category, topic_tags)`
- `SearchResponse(items: list[SearchResultItem], total: int, query: str, fallback_used: bool)`
- `TechniquePageDetail` (extends TechniquePageRead with nested key_moments, creator info, related links)
- `TopicCategory(name, description, sub_topics: list[TopicSubTopic])` and `TopicSubTopic(name, technique_count, creator_count)`
- `CreatorBrowseItem` (extends CreatorRead with technique_count, video_count)
3. Create `backend/routers/search.py`:
- `GET /search?q=...&scope=all|topics|creators&limit=20`
- Instantiate SearchService from get_settings(), call search(), return SearchResponse
- Log query, latency_ms, result_count, fallback_used at INFO level
4. Create `backend/routers/techniques.py`:
- `GET /techniques` — list technique pages with optional `category`, `creator_slug` query filters, pagination
- `GET /techniques/{slug}` — full detail with eager-loaded key_moments (ordered by start_time), creator info, outgoing+incoming related links
- Return 404 for unknown slug
5. Create `backend/routers/topics.py`:
- `GET /topics` — load `canonical_tags.yaml`, for each category aggregate technique_count and creator_count per sub_topic from DB
- `GET /topics/{category_slug}` — return technique pages filtered by topic_category
6. Extend `backend/routers/creators.py`:
- Add `sort` query param: `random` (default), `alpha`, `views`
- Add `genre` query param for filtering by genre
- Add technique_count and video_count subqueries to list endpoint
- For `sort=random`, use `func.random()` ORDER BY (dataset is small, <100 creators)
7. Mount all new routers in `backend/main.py`:
- `from routers import search, techniques, topics`
- `app.include_router(search.router, prefix="/api/v1")`
- `app.include_router(techniques.router, prefix="/api/v1")`
- `app.include_router(topics.router, prefix="/api/v1")`
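Step 1's `search()` orchestration, together with step 3's INFO log line, can be sketched with the three dependencies injected as async callables. `run_search`, the `demo_*` stubs, and the `scope` to payload-`type` mapping are hypothetical: the plan maps `scope=topics` to technique-page results (see T02) but does not say how `scope=creators` is served, so this sketch answers it from keyword search alone.

```python
from __future__ import annotations

import asyncio
import logging
import time
from typing import Awaitable, Callable, Optional

logger = logging.getLogger("search")

# Hypothetical mapping from `scope` to a Qdrant payload `type` filter (None = no filter).
SCOPE_TYPE_FILTER = {"all": None, "topics": "technique_page"}

EmbedFn = Callable[[str], Awaitable[Optional[list]]]
VectorFn = Callable[[list, int, Optional[str]], Awaitable[list]]
KeywordFn = Callable[[str, str, int], Awaitable[list]]


async def run_search(query: str, scope: str, limit: int,
                     embed: EmbedFn, vector_search: VectorFn, keyword_search: KeywordFn) -> dict:
    start = time.perf_counter()
    items: list = []
    semantic_attempted = scope in SCOPE_TYPE_FILTER
    if semantic_attempted:
        vector = await embed(query)  # None or empty means embedding failed or timed out
        if vector:
            items = await vector_search(vector, limit, SCOPE_TYPE_FILTER[scope])
    # Degraded only if the semantic path was expected to work but produced nothing.
    fallback_used = semantic_attempted and not items
    if not items:
        items = await keyword_search(query, scope, limit)
    latency_ms = (time.perf_counter() - start) * 1000
    logger.info(
        "search query=%r scope=%s latency_ms=%.1f result_count=%d fallback_used=%s",
        query, scope, latency_ms, len(items), fallback_used,
    )
    return {"items": items, "total": len(items), "query": query, "fallback_used": fallback_used}


# Hypothetical stand-ins for the embedding API, Qdrant, and the ILIKE query.
async def demo_embed(q: str) -> list:
    return [0.1, 0.2]


async def demo_embed_down(q: str) -> None:
    return None


async def demo_vectors(vector: list, limit: int, type_filter: Optional[str]) -> list:
    return [{"title": "Snare Layering", "type": type_filter or "technique_page"}]


async def demo_keywords(q: str, scope: str, limit: int) -> list:
    return [{"title": f"Keyword match: {q}", "type": "creator"}]


if __name__ == "__main__":
    print(asyncio.run(run_search("snare", "topics", 20, demo_embed, demo_vectors, demo_keywords)))
```

`fallback_used` is true only when the semantic path was attempted and produced nothing, which matches its role as the degradation signal described under Observability Impact.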
## Must-Haves
- [ ] SearchService with async embedding + Qdrant + keyword fallback
- [ ] GET /api/v1/search returns SearchResponse with enriched results
- [ ] GET /api/v1/techniques and GET /api/v1/techniques/{slug} with full detail
- [ ] GET /api/v1/topics returns category hierarchy with counts
- [ ] GET /api/v1/creators supports sort=random (default), genre filter, technique/video counts
- [ ] All new routers mounted in main.py
- [ ] Embedding/Qdrant failures gracefully degrade to keyword search
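For step 6, the count columns and filters can be sketched in SQL. This runnable stand-in uses SQLite: correlated subqueries provide `technique_count` and `video_count` (the approach the T01 summary records), `RANDOM()` plays the role of `func.random()`, and only whitelisted `ORDER BY` fragments are interpolated into the query. The `creator_genres` table, the `view_count` column, and `source_videos.creator_id` are assumptions about the schema; if genres live in a PostgreSQL `ARRAY` column instead, the filter would look more like `:genre = ANY(c.genres)`.

```python
import sqlite3

# Whitelisted ORDER BY clauses; unknown values fall back to random (the R014 default).
SORT_SQL = {"random": "RANDOM()", "alpha": "c.name COLLATE NOCASE", "views": "c.view_count DESC"}

LIST_SQL = """
    SELECT c.name,
           (SELECT COUNT(*) FROM technique_pages t WHERE t.creator_id = c.id) AS technique_count,
           (SELECT COUNT(*) FROM source_videos v WHERE v.creator_id = c.id) AS video_count
    FROM creators c
    WHERE :genre IS NULL
       OR EXISTS (SELECT 1 FROM creator_genres g WHERE g.creator_id = c.id AND g.genre = :genre)
    ORDER BY {order_by}
"""


def list_creators(conn: sqlite3.Connection, sort: str = "random", genre=None) -> list:
    order_by = SORT_SQL.get(sort, SORT_SQL["random"])  # never format raw user input into SQL
    return conn.execute(LIST_SQL.format(order_by=order_by), {"genre": genre}).fetchall()


def make_demo_db() -> sqlite3.Connection:
    """In-memory demo data (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE creators (id INTEGER PRIMARY KEY, name TEXT, view_count INTEGER);
        CREATE TABLE creator_genres (creator_id INTEGER, genre TEXT);
        CREATE TABLE technique_pages (id INTEGER PRIMARY KEY, creator_id INTEGER);
        CREATE TABLE source_videos (id INTEGER PRIMARY KEY, creator_id INTEGER);
        INSERT INTO creators VALUES (1, 'Zed', 100), (2, 'alpha', 50), (3, 'Mid', 500);
        INSERT INTO creator_genres VALUES (1, 'Bass music'), (3, 'Bass music'), (2, 'Techno');
        INSERT INTO technique_pages (creator_id) VALUES (1), (1), (3);
        INSERT INTO source_videos (creator_id) VALUES (1), (2);
    """)
    return conn
```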
## Verification
- `cd backend && python -c "from search_service import SearchService; print('OK')"` — imports clean
- `cd backend && python -c "from routers.search import router; print(router.routes)"` — search router has routes
- `cd backend && python -c "from routers.techniques import router; print(router.routes)"` — techniques router has routes
- `cd backend && python -c "from routers.topics import router; print(router.routes)"` — topics router has routes
- `cd backend && python -c "from main import app; routes = [r.path for r in app.routes]; assert '/api/v1/search' in str(routes) or any('search' in str(r.path) for r in app.routes); print('Mounted')"` — routers mounted
## Observability Impact
- Signals added: INFO log per search query with latency_ms, result_count, fallback_used. WARNING on embedding/Qdrant failure with error details.
- How a future agent inspects this: `curl localhost:8001/api/v1/search?q=test` returns structured JSON with timing data
- Failure state exposed: fallback_used=true in search response indicates Qdrant/embedding degradation
## Inputs
- `backend/models.py` — all 7 ORM models (Creator, SourceVideo, TranscriptSegment, KeyMoment, TechniquePage, RelatedTechniqueLink, Tag)
- `backend/database.py` — async engine, get_session dependency
- `backend/config.py` — Settings with embedding_api_url, embedding_model, embedding_dimensions, qdrant_url, qdrant_collection
- `backend/schemas.py` — existing Pydantic schemas to extend
- `backend/routers/creators.py` — existing creators router to enhance
- `backend/main.py` — existing router mounting to extend
- `backend/pipeline/embedding_client.py` — reference for sync embedding pattern (async variant needed)
- `backend/pipeline/qdrant_client.py` — reference for sync Qdrant pattern (async variant needed)
- `config/canonical_tags.yaml` — tag taxonomy for topics endpoint
## Expected Output
- `backend/search_service.py` — async SearchService with embed_query, search_qdrant, keyword_search, search methods
- `backend/schemas.py` — extended with SearchResultItem, SearchResponse, TechniquePageDetail, TopicCategory, TopicSubTopic, CreatorBrowseItem
- `backend/routers/search.py` — GET /search endpoint with semantic + keyword fallback
- `backend/routers/techniques.py` — GET /techniques and GET /techniques/{slug} endpoints
- `backend/routers/topics.py` — GET /topics endpoint with category hierarchy
- `backend/routers/creators.py` — enhanced with sort=random, genre filter, counts
- `backend/main.py` — all new routers mounted
## Verification
cd backend && python -c "from search_service import SearchService; from routers.search import router as sr; from routers.techniques import router as tr; from routers.topics import router as tpr; print('All imports OK')" && python -c "from main import app; print([r.path for r in app.routes])"

@@ -0,0 +1,94 @@
---
id: T01
parent: S05
milestone: M001
provides: []
requires: []
affects: []
key_files: ["backend/search_service.py", "backend/schemas.py", "backend/routers/search.py", "backend/routers/techniques.py", "backend/routers/topics.py", "backend/routers/creators.py", "backend/main.py"]
key_decisions: ["Used asyncio.wait_for with 300ms timeout on both embedding and Qdrant calls for graceful degradation", "Qdrant query uses query_points() API with Filter for type-based scoping", "Topics endpoint loads canonical_tags.yaml at request time and counts tag matches from DB", "Creator list returns CreatorBrowseItem with correlated subqueries for technique/video counts"]
patterns_established: []
drill_down_paths: []
observability_surfaces: []
duration: ""
verification_result: "All five slice verification checks pass: SearchService imports clean, search/techniques/topics routers have routes, all routers mounted in main.py with correct paths. Full existing test suite (40 tests) passes."
completed_at: 2026-03-29T23:55:42.018Z
blocker_discovered: false
---
# T01: Created async search service with embedding+Qdrant+keyword fallback and all public API endpoints (search, techniques, topics, enhanced creators) mounted at /api/v1
> Created async search service with embedding+Qdrant+keyword fallback and all public API endpoints (search, techniques, topics, enhanced creators) mounted at /api/v1
## What Happened
Built the complete backend API surface for S05: SearchService with async embedding (300ms timeout) + Qdrant vector search + keyword ILIKE fallback, search/techniques/topics routers, enhanced creators router with sort=random (R014), genre filter, and technique/video counts. All mounted in main.py. Input validation handles empty queries, long queries (truncated to 500), and invalid scope (defaults to "all"). All 40 existing tests pass with zero regressions.
## Verification
All five slice verification checks pass: SearchService imports clean, search/techniques/topics routers have routes, all routers mounted in main.py with correct paths. Full existing test suite (40 tests) passes.
## Verification Evidence
| # | Command | Exit Code | Verdict | Duration |
|---|---------|-----------|---------|----------|
| 1 | `cd backend && python -c "from search_service import SearchService; print('OK')"` | 0 | ✅ pass | 500ms |
| 2 | `cd backend && python -c "from routers.search import router; print(router.routes)"` | 0 | ✅ pass | 500ms |
| 3 | `cd backend && python -c "from routers.techniques import router; print(router.routes)"` | 0 | ✅ pass | 500ms |
| 4 | `cd backend && python -c "from routers.topics import router; print(router.routes)"` | 0 | ✅ pass | 500ms |
| 5 | `cd backend && python -c "from main import app; routes = [r.path for r in app.routes]; assert '/api/v1/search' in str(routes); print('Mounted')"` | 0 | ✅ pass | 500ms |
| 6 | `cd backend && python -m pytest tests/ -v` | 0 | ✅ pass (40/40) | 132000ms |
## Deviations
None.
## Known Issues
None.
## Files Created/Modified
- `backend/search_service.py`
- `backend/schemas.py`
- `backend/routers/search.py`
- `backend/routers/techniques.py`
- `backend/routers/topics.py`
- `backend/routers/creators.py`
- `backend/main.py`

@@ -0,0 +1,72 @@
---
estimated_steps: 31
estimated_files: 3
skills_used: []
---
# T02: Add integration tests for search and public API endpoints
## Description
Write integration tests for all new S05 backend endpoints: search (with mocked embedding API and Qdrant), techniques list/detail, topics hierarchy, and enhanced creators (randomized sort, genre filter, counts). Tests run against real PostgreSQL with the existing conftest.py fixtures. All 40 existing tests must continue to pass.
## Negative Tests
- **Malformed inputs**: Empty search query returns empty results. Invalid technique slug returns 404. Invalid topic category returns empty list.
- **Error paths**: Search with mocked embedding failure → keyword fallback results returned. Search with mocked Qdrant failure → keyword fallback.
- **Boundary conditions**: Search with no matching results → empty items list. Topics with no technique pages → zero counts. Creators list with no creators → empty list.
## Steps
1. Create `backend/tests/test_search.py`:
- Fixture: seed DB with 2 creators, 3 technique pages (different categories/tags), 5 key moments
- Test search endpoint with mocked SearchService that returns canned results → verify response shape (items, total, query, fallback_used)
- Test search with empty query → returns empty results or validation error
- Test search keyword fallback: mock embedding to return None → verify keyword results returned and fallback_used=true
- Test search scope filtering (scope=topics returns only technique_page type results)
2. Create `backend/tests/test_public_api.py`:
- Test GET /api/v1/techniques — returns list of technique pages, supports category filter
- Test GET /api/v1/techniques/{slug} — returns full detail with key_moments, creator info, related links
- Test GET /api/v1/techniques/{slug} with invalid slug → 404
- Test GET /api/v1/topics — returns category hierarchy with counts matching seeded data
- Test GET /api/v1/creators?sort=random — returns creators (verify all returned, order may vary)
- Test GET /api/v1/creators?sort=alpha — returns creators in alphabetical order
- Test GET /api/v1/creators?genre=Bass+music — returns only matching creators
- Test GET /api/v1/creators/{slug} — returns detail with technique_count, video_count
3. Run full test suite: `cd backend && python -m pytest tests/ -v` — all 40 existing + new tests pass
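The keyword-fallback case in step 1 needs the embedding call to fail deterministically. Below is a sketch of that mocking pattern using `unittest.mock.AsyncMock`. `TinySearchService` is a hypothetical stand-in with the same embed-then-fallback shape. The real `SearchService` and the `conftest.py` DB/client fixtures are not reproduced here.

```python
from __future__ import annotations

import asyncio
from unittest.mock import AsyncMock


class TinySearchService:
    """Hypothetical stand-in mirroring the embed-then-fallback flow."""

    async def embed_query(self, text: str) -> list[float] | None:
        raise NotImplementedError  # always replaced by a mock in these tests

    async def keyword_search(self, text: str) -> list[dict]:
        return [{"title": "Reese bass layering", "type": "technique_page"}]

    async def search(self, query: str) -> dict:
        if not query.strip():
            # Empty query: short-circuit without touching external services.
            return {"items": [], "total": 0, "query": query, "fallback_used": False}
        vector = await self.embed_query(query)
        if vector is None:
            items = await self.keyword_search(query)
            return {"items": items, "total": len(items), "query": query, "fallback_used": True}
        return {"items": [], "total": 0, "query": query, "fallback_used": False}


def test_keyword_fallback_when_embedding_fails() -> None:
    svc = TinySearchService()
    # Simulate a timeout/connection error that embed_query collapses to None.
    svc.embed_query = AsyncMock(return_value=None)
    result = asyncio.run(svc.search("bass"))
    assert result["fallback_used"] is True
    assert result["total"] == 1
    svc.embed_query.assert_awaited_once_with("bass")


def test_empty_query_returns_empty_results() -> None:
    svc = TinySearchService()
    svc.embed_query = AsyncMock(return_value=[0.1, 0.2])
    result = asyncio.run(svc.search("   "))
    assert result["items"] == []
    svc.embed_query.assert_not_awaited()
```

Against the real service, the same `AsyncMock` would typically be applied with `unittest.mock.patch.object` on `SearchService.embed_query` while the endpoint is exercised through the test client.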
## Must-Haves
- [ ] test_search.py with ≥4 tests covering happy path, empty query, keyword fallback, scope filter
- [ ] test_public_api.py with ≥8 tests covering techniques list/detail/404, topics hierarchy, creators sort/filter/detail
- [ ] All 40 existing tests still pass (regression)
- [ ] Tests use real PostgreSQL with seeded data (not mocked DB)
## Verification
- `cd backend && python -m pytest tests/test_search.py tests/test_public_api.py -v` — all new tests pass
- `cd backend && python -m pytest tests/ -v` — all tests pass (40 existing + new)
## Inputs
- `backend/search_service.py`: SearchService class to mock for search tests
- `backend/routers/search.py`: search endpoint under test
- `backend/routers/techniques.py`: techniques endpoints under test
- `backend/routers/topics.py`: topics endpoint under test
- `backend/routers/creators.py`: enhanced creators endpoint under test
- `backend/schemas.py`: response schemas for assertion shapes
- `backend/models.py`: ORM models for seeding test data
- `backend/tests/conftest.py`: existing test fixtures (db_engine, client)
- `config/canonical_tags.yaml`: expected tag structure for topics test assertions
## Expected Output
- `backend/tests/test_search.py`: ≥4 integration tests for the search endpoint
- `backend/tests/test_public_api.py`: ≥8 integration tests for techniques, topics, and creators endpoints
- `backend/tests/conftest.py`: possibly extended with shared seed fixtures for S05 tests
## Verification
cd backend && python -m pytest tests/test_search.py tests/test_public_api.py -v && python -m pytest tests/ -v

@@ -0,0 +1,98 @@
---
estimated_steps: 54
estimated_files: 6
skills_used: []
---
# T03: Build frontend search flow: landing page, search results, and technique page
## Description
Build the primary user flow: landing page with search bar → search results page → technique page detail. This is the R005/R006/R015 critical path. Includes the new typed API client for public endpoints, App.tsx routing with both admin and public routes, and 3 new page components with CSS.
The frontend uses React 18 + Vite + TypeScript with strict mode (`noUnusedLocals`, `noUnusedParameters`, `noUncheckedIndexedAccess`). Existing pattern: plain CSS in `App.css`, typed `fetch()` API client, React Router v6.
## Steps
1. Create `frontend/src/api/public-client.ts` — typed API client for public endpoints:
- Types: `SearchResultItem`, `SearchResponse`, `TechniquePageDetail`, `KeyMomentSummary`, `CreatorInfo`, `RelatedLink`, `TopicCategory`, `TopicSubTopic`, `CreatorBrowseItem`
- Functions: `searchApi(q, scope?, limit?)`, `fetchTechnique(slug)`, `fetchTechniques(params?)`, `fetchTopics()`, `fetchCreators(params?)`, `fetchCreator(slug)`
- Reuse the `request<T>` helper pattern from existing `client.ts` (or extract shared helper)
2. Create `frontend/src/pages/Home.tsx` — landing page:
- Prominent search bar (auto-focus on mount) with debounce (300ms)
- Live typeahead: after 2+ chars, show top 5 results in dropdown below search bar
- On Enter or "See all results" link, navigate to `/search?q=...`
- Two navigation cards: "Topics" (links to `/topics`) and "Creators" (links to `/creators`)
- "Recently Added" section showing last 5 technique pages (fetch from `/api/v1/techniques?limit=5`)
3. Create `frontend/src/pages/SearchResults.tsx` — full search results page:
- Read `q` from URL search params
- Display results grouped by type (technique_pages first, then key_moments)
- Each result: title (linked to technique page), summary snippet, creator name, category/tags
- Show "No results found" for empty results, "Showing keyword results" when fallback_used=true
- Search bar at top for refining query
4. Create `frontend/src/pages/TechniquePage.tsx` — technique page display (R006):
- Fetch technique by slug from URL params via `fetchTechnique(slug)`
- Header: title, topic_category badge, topic_tags pills, creator name (linked to `/creators/{slug}`), source_quality indicator
- Amber banner if source_quality === 'unstructured' (livestream-sourced)
- Study guide prose: render `body_sections` JSONB — iterate Object.entries, render each section as `<h2>` + paragraph. Handle both string and object values gracefully.
- Key moments index: ordered list with title, start_time→end_time, content_type badge, summary
- Signal chains section (if present): render each chain as name + ordered steps
- Plugins referenced (if present): pill list
- Related techniques (if present): linked list
- Loading state and 404 error state
5. Update `frontend/src/App.tsx` routing:
- Import new pages
- Add public routes: `/` → Home, `/search` → SearchResults, `/techniques/:slug` → TechniquePage
- Keep admin routes at `/admin/*`
- Update header: "Chrysopedia" title (not "Chrysopedia Admin"), nav links to Home, Topics, Creators, and Admin
6. Add CSS to `frontend/src/App.css` for new pages:
- Search bar styles (large, centered on home, inline on results page)
- Typeahead dropdown styles
- Navigation cards (grid layout)
- Technique page layout (readable prose width, section spacing)
- Search result items (hover state, meta info)
- Tag/badge pill styles
- Loading and error states
## Must-Haves
- [ ] Typed public API client with all endpoint functions
- [ ] Landing page with search bar, typeahead, navigation cards, recently added
- [ ] Search results page with grouped results
- [ ] Technique page with all sections (header, prose, key moments, related links)
- [ ] App.tsx routes both public and admin paths
- [ ] `cd frontend && npx tsc -b` passes with zero errors
## Verification
- `cd frontend && npx tsc -b` — zero TypeScript errors
- `cd frontend && npm run build` — clean production build
- Verify files exist: `test -f frontend/src/api/public-client.ts && test -f frontend/src/pages/Home.tsx && test -f frontend/src/pages/SearchResults.tsx && test -f frontend/src/pages/TechniquePage.tsx`
## Inputs
- `frontend/src/api/client.ts`: existing API client pattern (request helper, typed functions)
- `frontend/src/App.tsx`: existing routing structure to extend
- `frontend/src/App.css`: existing styles to extend (620 lines)
- `frontend/src/main.tsx`: entry point (BrowserRouter already configured)
- `frontend/tsconfig.app.json`: strict TS config (noUnusedLocals, noUnusedParameters, noUncheckedIndexedAccess)
- `frontend/package.json`: dependencies (react 18, react-router-dom 6, vite 6)
- `backend/schemas.py`: response schemas defining API contract shapes (SearchResponse, TechniquePageDetail, etc.)
## Expected Output
- `frontend/src/api/public-client.ts`: typed API client for search, techniques, topics, creators endpoints
- `frontend/src/pages/Home.tsx`: landing page with search bar, typeahead, nav cards, recently added
- `frontend/src/pages/SearchResults.tsx`: search results page with grouped results
- `frontend/src/pages/TechniquePage.tsx`: full technique page display with all sections
- `frontend/src/App.tsx`: updated with public + admin routes and navigation
- `frontend/src/App.css`: extended with styles for all new components
## Verification
cd frontend && npx tsc -b && npm run build && echo 'Frontend build OK'

@@ -0,0 +1,88 @@
---
estimated_steps: 48
estimated_files: 5
skills_used: []
---
# T04: Build frontend browse pages (creators, topics) and verify full build
## Description
Build the remaining browse pages: CreatorsBrowse (R007, R014 creator equity with randomized default sort), CreatorDetail, and TopicsBrowse (R008 two-level hierarchy). Then run final verification to confirm the full frontend builds cleanly and all requirements are covered.
## Steps
1. Create `frontend/src/pages/CreatorsBrowse.tsx` — creators browse page (R007, R014):
- Fetch creators from `/api/v1/creators?sort=random` (default) via `fetchCreators()`
- Genre filter pills at top (fetch unique genres from creator data, or hardcode from canonical_tags.yaml genres)
- Type-to-narrow input that filters displayed creators client-side by name
- Sort toggle: Random (default), Alphabetical, Views — each triggers re-fetch with `sort=random|alpha|views`
- Each creator row: name, genre tags (pills), technique_count, video_count, view_count
- Click row → navigate to `/creators/{slug}`
- All creators get equal visual weight (no featured/highlighted creators) per R014
2. Create `frontend/src/pages/CreatorDetail.tsx` — creator detail page:
- Fetch creator by slug via `fetchCreator(slug)` — shows name, genres, video_count, technique_count
- Fetch creator's technique pages via `fetchTechniques({ creator_slug: slug })`
- List technique pages with title (linked to `/techniques/{slug}`), category, tags, summary
- Loading state and 404 error state
3. Create `frontend/src/pages/TopicsBrowse.tsx` — topics browse page (R008):
- Fetch topics from `/api/v1/topics` via `fetchTopics()`
- Two-level hierarchy: 6 top-level categories (Sound design, Mixing, Synthesis, Arrangement, Workflow, Mastering)
- Each category expandable/collapsible, showing sub-topics
- Each sub-topic shows technique_count and creator_count
- Click sub-topic → navigate to `/search?q={sub_topic_name}&scope=topics` or filter technique list
- Filter input at top to narrow categories/sub-topics
4. Update `frontend/src/App.tsx` — add routes for browse pages:
- `/creators` → CreatorsBrowse
- `/creators/:slug` → CreatorDetail
- `/topics` → TopicsBrowse
- Import new page components
5. Add CSS to `frontend/src/App.css` for browse pages:
- Creator list styles (rows, genre pills, counts)
- Creator detail page layout
- Topics hierarchy styles (collapsible sections, sub-topic rows, counts)
- Filter input styles
- Sort toggle button group
6. Final verification:
- `cd frontend && npx tsc -b` — zero errors
- `cd frontend && npm run build` — clean build
- Verify all 6 page files exist
## Must-Haves
- [ ] Creators browse page with randomized default sort (R014), genre filter, type-to-narrow, sort toggle (R007)
- [ ] Creator detail page showing creator info and their technique pages
- [ ] Topics browse page with two-level hierarchy, counts, clickable sub-topics (R008)
- [ ] All routes registered in App.tsx
- [ ] `cd frontend && npx tsc -b` passes with zero errors
- [ ] `cd frontend && npm run build` succeeds
## Verification
- `cd frontend && npx tsc -b && npm run build && echo 'Build OK'`
- `test -f frontend/src/pages/CreatorsBrowse.tsx && test -f frontend/src/pages/CreatorDetail.tsx && test -f frontend/src/pages/TopicsBrowse.tsx && echo 'All pages exist'`
## Inputs
- `frontend/src/api/public-client.ts`: typed API client with fetchCreators, fetchCreator, fetchTopics, fetchTechniques
- `frontend/src/App.tsx`: routing structure from T03 to extend with browse routes
- `frontend/src/App.css`: styles from T03 to extend with browse page styles
- `frontend/src/pages/Home.tsx`: reference for component patterns established in T03
- `config/canonical_tags.yaml`: genre list for creator filter pills (Bass music, Drum & bass, etc.)
## Expected Output
- `frontend/src/pages/CreatorsBrowse.tsx`: creators browse page with randomized sort, genre filter, type-to-narrow
- `frontend/src/pages/CreatorDetail.tsx`: creator detail page with technique page list
- `frontend/src/pages/TopicsBrowse.tsx`: two-level topic hierarchy with counts
- `frontend/src/App.tsx`: all 9 routes registered (3 public + 3 browse + 2 admin + fallback)
- `frontend/src/App.css`: complete CSS for all S05 pages
## Verification
cd frontend && npx tsc -b && npm run build && test -f src/pages/CreatorsBrowse.tsx && test -f src/pages/CreatorDetail.tsx && test -f src/pages/TopicsBrowse.tsx && echo 'All browse pages built OK'

@@ -12,7 +12,7 @@ from fastapi import FastAPI
 from fastapi.middleware.cors import CORSMiddleware
 from config import get_settings
-from routers import creators, health, ingest, pipeline, review, videos
+from routers import creators, health, ingest, pipeline, review, search, techniques, topics, videos
 def _setup_logging() -> None:
@@ -82,6 +82,9 @@ app.include_router(creators.router, prefix="/api/v1")
 app.include_router(ingest.router, prefix="/api/v1")
 app.include_router(pipeline.router, prefix="/api/v1")
 app.include_router(review.router, prefix="/api/v1")
+app.include_router(search.router, prefix="/api/v1")
+app.include_router(techniques.router, prefix="/api/v1")
+app.include_router(topics.router, prefix="/api/v1")
 app.include_router(videos.router, prefix="/api/v1")

@@ -1,4 +1,8 @@
-"""Creator endpoints for Chrysopedia API."""
+"""Creator endpoints for Chrysopedia API.
+Enhanced with sort (random default per R014), genre filter, and
+technique/video counts for browse pages.
+"""
 import logging
 from typing import Annotated
@@ -8,26 +12,79 @@ from sqlalchemy import func, select
 from sqlalchemy.ext.asyncio import AsyncSession
 from database import get_session
-from models import Creator, SourceVideo
-from schemas import CreatorDetail, CreatorRead
+from models import Creator, SourceVideo, TechniquePage
+from schemas import CreatorBrowseItem, CreatorDetail, CreatorRead
 logger = logging.getLogger("chrysopedia.creators")
 router = APIRouter(prefix="/creators", tags=["creators"])
-@router.get("", response_model=list[CreatorRead])
+@router.get("", response_model=list[CreatorBrowseItem])
 async def list_creators(
+    sort: Annotated[str, Query()] = "random",
+    genre: Annotated[str | None, Query()] = None,
     offset: Annotated[int, Query(ge=0)] = 0,
     limit: Annotated[int, Query(ge=1, le=100)] = 50,
     db: AsyncSession = Depends(get_session),
-) -> list[CreatorRead]:
-    """List all creators with pagination."""
-    stmt = select(Creator).order_by(Creator.name).offset(offset).limit(limit)
+) -> list[CreatorBrowseItem]:
+    """List creators with sort, genre filter, and technique/video counts.
+
+    - **sort**: ``random`` (default, R014 creator equity), ``alpha``, ``views``
+    - **genre**: filter by genre (matches against ARRAY column)
+    """
+    # Subqueries for counts
+    technique_count_sq = (
+        select(func.count())
+        .where(TechniquePage.creator_id == Creator.id)
+        .correlate(Creator)
+        .scalar_subquery()
+    )
+    video_count_sq = (
+        select(func.count())
+        .where(SourceVideo.creator_id == Creator.id)
+        .correlate(Creator)
+        .scalar_subquery()
+    )
+    stmt = select(
+        Creator,
+        technique_count_sq.label("technique_count"),
+        video_count_sq.label("video_count"),
+    )
+    # Genre filter
+    if genre:
+        stmt = stmt.where(Creator.genres.any(genre))
+    # Sorting
+    if sort == "alpha":
+        stmt = stmt.order_by(Creator.name)
+    elif sort == "views":
+        stmt = stmt.order_by(Creator.view_count.desc())
+    else:
+        # Default: random (small dataset <100, func.random() is fine)
+        stmt = stmt.order_by(func.random())
+    stmt = stmt.offset(offset).limit(limit)
     result = await db.execute(stmt)
-    creators = result.scalars().all()
-    logger.debug("Listed %d creators (offset=%d, limit=%d)", len(creators), offset, limit)
-    return [CreatorRead.model_validate(c) for c in creators]
+    rows = result.all()
+
+    items: list[CreatorBrowseItem] = []
+    for row in rows:
+        creator = row[0]
+        tc = row[1] or 0
+        vc = row[2] or 0
+        base = CreatorRead.model_validate(creator)
+        items.append(
+            CreatorBrowseItem(**base.model_dump(), technique_count=tc, video_count=vc)
+        )
+    logger.debug(
+        "Listed %d creators (sort=%s, genre=%s, offset=%d, limit=%d)",
+        len(items), sort, genre, offset, limit,
+    )
+    return items
 @router.get("/{slug}", response_model=CreatorDetail)
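The correlated scalar subqueries in `list_creators` amount to a per-row `SELECT count(*)` expression for each creator. The stdlib `sqlite3` sketch below shows that SQL shape on a toy schema. The table and column names and the `list_creators_sql` helper are simplified, hypothetical stand-ins. The real endpoint runs on PostgreSQL through SQLAlchemy, where `Creator.genres.any(genre)` compiles to an `= ANY (genres)` comparison that SQLite does not support, so the genre filter is omitted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE creators (id INTEGER PRIMARY KEY, name TEXT, view_count INTEGER);
    CREATE TABLE technique_pages (id INTEGER PRIMARY KEY, creator_id INTEGER);
    CREATE TABLE source_videos (id INTEGER PRIMARY KEY, creator_id INTEGER);
    INSERT INTO creators VALUES (1, 'Bravo', 500), (2, 'Alpha', 10), (3, 'Charlie', 900);
    INSERT INTO technique_pages (creator_id) VALUES (1), (1), (2);
    INSERT INTO source_videos (creator_id) VALUES (1), (2), (2), (2);
    """
)

# Whitelisted ORDER BY fragments; any other value falls back to random order,
# matching the endpoint's "else" branch. Only whitelisted text reaches the SQL.
ORDER_BY = {"alpha": "c.name", "views": "c.view_count DESC"}


def list_creators_sql(sort: str = "random", limit: int = 50, offset: int = 0):
    order = ORDER_BY.get(sort, "random()")
    sql = f"""
        SELECT c.name,
               (SELECT count(*) FROM technique_pages t WHERE t.creator_id = c.id),
               (SELECT count(*) FROM source_videos v WHERE v.creator_id = c.id)
        FROM creators c
        ORDER BY {order}
        LIMIT ? OFFSET ?
    """
    return conn.execute(sql, (limit, offset)).fetchall()
```

`count(*)` in a correlated subquery yields 0 rather than NULL for a creator with no rows, so the endpoint's `row[1] or 0` acts only as a safety net. Ordering by `random()` re-sorts the full table on every request. The inline comment already treats that as acceptable for fewer than 100 creators.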

backend/routers/search.py

@@ -0,0 +1,46 @@
"""Search endpoint for semantic + keyword search with graceful fallback."""

from __future__ import annotations

import logging
from typing import Annotated

from fastapi import APIRouter, Depends, Query
from sqlalchemy.ext.asyncio import AsyncSession

from config import get_settings
from database import get_session
from schemas import SearchResponse, SearchResultItem
from search_service import SearchService

logger = logging.getLogger("chrysopedia.search.router")

router = APIRouter(prefix="/search", tags=["search"])


def _get_search_service() -> SearchService:
    """Build a SearchService from current settings."""
    return SearchService(get_settings())


@router.get("", response_model=SearchResponse)
async def search(
    q: Annotated[str, Query(max_length=500)] = "",
    scope: Annotated[str, Query()] = "all",
    limit: Annotated[int, Query(ge=1, le=100)] = 20,
    db: AsyncSession = Depends(get_session),
) -> SearchResponse:
    """Semantic search with keyword fallback.

    - **q**: Search query (max 500 chars). An empty query returns empty results.
    - **scope**: ``all`` | ``topics`` | ``creators``. Invalid values default to ``all``.
    - **limit**: Max results (1-100, default 20).
    """
    svc = _get_search_service()
    result = await svc.search(query=q, scope=scope, limit=limit, db=db)
    return SearchResponse(
        items=[SearchResultItem(**item) for item in result["items"]],
        total=result["total"],
        query=result["query"],
        fallback_used=result["fallback_used"],
    )
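The docstring promises two input rules: an empty query returns empty results, and an unknown scope falls back to `all`. The router passes `q` and `scope` through unchanged, so enforcement presumably happens inside `SearchService.search`, whose body is not shown here. A hypothetical helper applying those rules might look like the sketch below. `normalize_search_inputs`, `VALID_SCOPES`, and `MAX_QUERY_LEN` are illustrative names, not part of the codebase. With `Query(max_length=500)`, FastAPI itself rejects longer `q` values with a 422 validation error before the handler runs. A truncation step like this one therefore matters only for callers that bypass HTTP.

```python
from __future__ import annotations

VALID_SCOPES = frozenset({"all", "topics", "creators"})
MAX_QUERY_LEN = 500


def normalize_search_inputs(q: str, scope: str) -> tuple[str, str] | None:
    """Return (query, scope) ready for searching, or None if there is nothing to search.

    Hypothetical helper: trims whitespace, truncates to MAX_QUERY_LEN,
    and maps unknown scopes to "all".
    """
    query = q.strip()[:MAX_QUERY_LEN]
    if not query:
        return None  # caller would return an empty SearchResponse
    return query, (scope if scope in VALID_SCOPES else "all")
```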

@@ -0,0 +1,134 @@
"""Technique page endpoints: list and detail with eager-loaded relations."""

from __future__ import annotations

import logging
from typing import Annotated

from fastapi import APIRouter, Depends, HTTPException, Query
from sqlalchemy import func, select
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import selectinload

from database import get_session
from models import Creator, RelatedTechniqueLink, TechniquePage
from schemas import (
    CreatorInfo,
    KeyMomentSummary,
    PaginatedResponse,
    RelatedLinkItem,
    TechniquePageDetail,
    TechniquePageRead,
)

logger = logging.getLogger("chrysopedia.techniques")

router = APIRouter(prefix="/techniques", tags=["techniques"])


@router.get("", response_model=PaginatedResponse)
async def list_techniques(
    category: Annotated[str | None, Query()] = None,
    creator_slug: Annotated[str | None, Query()] = None,
    offset: Annotated[int, Query(ge=0)] = 0,
    limit: Annotated[int, Query(ge=1, le=100)] = 50,
    db: AsyncSession = Depends(get_session),
) -> PaginatedResponse:
    """List technique pages with optional category/creator filtering."""
    stmt = select(TechniquePage)

    if category:
        stmt = stmt.where(TechniquePage.topic_category == category)

    if creator_slug:
        # Join to Creator to filter by slug
        stmt = stmt.join(Creator, TechniquePage.creator_id == Creator.id).where(
            Creator.slug == creator_slug
        )

    # Count total before pagination
    count_stmt = select(func.count()).select_from(stmt.subquery())
    count_result = await db.execute(count_stmt)
    total = count_result.scalar() or 0

    stmt = stmt.order_by(TechniquePage.created_at.desc()).offset(offset).limit(limit)
    result = await db.execute(stmt)
    pages = result.scalars().all()

    return PaginatedResponse(
        items=[TechniquePageRead.model_validate(p) for p in pages],
        total=total,
        offset=offset,
        limit=limit,
    )


@router.get("/{slug}", response_model=TechniquePageDetail)
async def get_technique(
    slug: str,
    db: AsyncSession = Depends(get_session),
) -> TechniquePageDetail:
    """Get full technique page detail with key moments, creator, and related links."""
    stmt = (
        select(TechniquePage)
        .where(TechniquePage.slug == slug)
        .options(
            selectinload(TechniquePage.key_moments),
            selectinload(TechniquePage.creator),
            selectinload(TechniquePage.outgoing_links).selectinload(
                RelatedTechniqueLink.target_page
            ),
            selectinload(TechniquePage.incoming_links).selectinload(
                RelatedTechniqueLink.source_page
            ),
        )
    )
    result = await db.execute(stmt)
    page = result.scalar_one_or_none()

    if page is None:
        raise HTTPException(status_code=404, detail=f"Technique '{slug}' not found")

    # Build key moments (ordered by start_time)
    key_moments = sorted(page.key_moments, key=lambda km: km.start_time)
    key_moment_items = [KeyMomentSummary.model_validate(km) for km in key_moments]

    # Build creator info
    creator_info = None
    if page.creator:
        creator_info = CreatorInfo(
            name=page.creator.name,
            slug=page.creator.slug,
            genres=page.creator.genres,
        )

    # Build related links (outgoing + incoming)
    related_links: list[RelatedLinkItem] = []
    for link in page.outgoing_links:
        if link.target_page:
            related_links.append(
                RelatedLinkItem(
                    target_title=link.target_page.title,
                    target_slug=link.target_page.slug,
                    relationship=link.relationship.value if hasattr(link.relationship, 'value') else str(link.relationship),
                )
            )
    for link in page.incoming_links:
        if link.source_page:
            related_links.append(
                RelatedLinkItem(
                    target_title=link.source_page.title,
                    target_slug=link.source_page.slug,
                    relationship=link.relationship.value if hasattr(link.relationship, 'value') else str(link.relationship),
                )
            )

    base = TechniquePageRead.model_validate(page)
    return TechniquePageDetail(
        **base.model_dump(),
        key_moments=key_moment_items,
        creator_info=creator_info,
        related_links=related_links,
    )
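`get_technique` flattens outgoing and incoming `RelatedTechniqueLink` rows into one list. For outgoing links it reports the target page, and for incoming links it reports the source page. In both cases it coerces the relationship to a plain string, whether or not it is an enum. The sketch below isolates that logic with dataclasses. `Page`, `Link`, `Rel` (including its member values), and `build_related_links` are hypothetical stand-ins for the ORM models and schema. Like the endpoint, it does not de-duplicate two pages that link to each other in both directions.

```python
from __future__ import annotations

import enum
from dataclasses import dataclass


class Rel(enum.Enum):
    # Hypothetical relationship values; the real enum lives in models.py.
    BUILDS_ON = "builds_on"
    SAME_TECHNIQUE = "same_technique"


@dataclass
class Page:
    title: str
    slug: str


@dataclass
class Link:
    source_page: Page | None
    target_page: Page | None
    relationship: Rel | str


def _rel_str(rel: Rel | str) -> str:
    # Same coercion as the endpoint: enum -> .value, anything else -> str().
    return rel.value if hasattr(rel, "value") else str(rel)


def _item(page: Page, rel: Rel | str) -> dict[str, str]:
    return {"target_title": page.title, "target_slug": page.slug, "relationship": _rel_str(rel)}


def build_related_links(outgoing: list[Link], incoming: list[Link]) -> list[dict[str, str]]:
    links: list[dict[str, str]] = []
    for link in outgoing:  # this page -> other page: report the target
        if link.target_page:
            links.append(_item(link.target_page, link.relationship))
    for link in incoming:  # other page -> this page: report the source
        if link.source_page:
            links.append(_item(link.source_page, link.relationship))
    return links
```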

backend/routers/topics.py

@@ -0,0 +1,135 @@
"""Topics endpoint: two-level category hierarchy with aggregated counts."""

from __future__ import annotations

import logging
import os
from typing import Annotated, Any

import yaml
from fastapi import APIRouter, Depends, Query
from sqlalchemy import func, select
from sqlalchemy.ext.asyncio import AsyncSession

from database import get_session
from models import TechniquePage
from schemas import (
    PaginatedResponse,
    TechniquePageRead,
    TopicCategory,
    TopicSubTopic,
)

logger = logging.getLogger("chrysopedia.topics")

router = APIRouter(prefix="/topics", tags=["topics"])

# Path to config/canonical_tags.yaml, resolved relative to this file
# (backend/routers/ -> repository root -> config/)
_TAGS_PATH = os.path.join(os.path.dirname(__file__), "..", "..", "config", "canonical_tags.yaml")


def _load_canonical_tags() -> list[dict[str, Any]]:
    """Load the canonical tag categories from YAML."""
    path = os.path.normpath(_TAGS_PATH)
    try:
        with open(path) as f:
            # safe_load returns None for an empty file; treat that as "no categories"
            data = yaml.safe_load(f) or {}
        return data.get("categories", [])
    except FileNotFoundError:
        logger.warning("canonical_tags.yaml not found at %s", path)
        return []


@router.get("", response_model=list[TopicCategory])
async def list_topics(
    db: AsyncSession = Depends(get_session),
) -> list[TopicCategory]:
    """Return the two-level topic hierarchy with technique/creator counts per sub-topic.

    Categories come from ``canonical_tags.yaml``. Counts are computed
    from live DB data by matching ``topic_tags`` array contents.
    """
    categories = _load_canonical_tags()

    # Pre-fetch all technique pages with their tags and creator_ids for counting
    tp_stmt = select(
        TechniquePage.topic_category,
        TechniquePage.topic_tags,
        TechniquePage.creator_id,
    )
    tp_result = await db.execute(tp_stmt)
    tp_rows = tp_result.all()

    # Build per-sub-topic counts
    result: list[TopicCategory] = []
    for cat in categories:
        cat_name = cat.get("name", "")
        cat_desc = cat.get("description", "")
        sub_topic_names: list[str] = cat.get("sub_topics", [])

        sub_topics: list[TopicSubTopic] = []
        for st_name in sub_topic_names:
            technique_count = 0
            creator_ids: set[str] = set()
            for _tp_cat, tp_tags, tp_creator_id in tp_rows:
                tags = tp_tags or []
                # Match case-insensitively when the sub-topic name appears in the
                # technique's topic_tags; topic_category is not consulted here.
                if st_name.lower() in [t.lower() for t in tags]:
                    technique_count += 1
                    creator_ids.add(str(tp_creator_id))
            sub_topics.append(
                TopicSubTopic(
                    name=st_name,
                    technique_count=technique_count,
                    creator_count=len(creator_ids),
                )
            )

        result.append(
            TopicCategory(
                name=cat_name,
                description=cat_desc,
                sub_topics=sub_topics,
            )
        )

    return result


@router.get("/{category_slug}", response_model=PaginatedResponse)
async def get_topic_techniques(
    category_slug: str,
    offset: Annotated[int, Query(ge=0)] = 0,
    limit: Annotated[int, Query(ge=1, le=100)] = 50,
    db: AsyncSession = Depends(get_session),
) -> PaginatedResponse:
    """Return technique pages filtered by topic_category.

    The ``category_slug`` is matched case-insensitively against
    ``technique_pages.topic_category`` (e.g. 'sound-design' matches 'Sound design').
    """
    # Normalize slug to a category name: hyphens become spaces, then title-case.
    # ILIKE without wildcards acts as a case-insensitive equality check, so
    # "Sound Design" still matches a stored "Sound design".
    category_name = category_slug.replace("-", " ").title()
    stmt = select(TechniquePage).where(
        TechniquePage.topic_category.ilike(category_name)
    )

    count_stmt = select(func.count()).select_from(stmt.subquery())
    count_result = await db.execute(count_stmt)
    total = count_result.scalar() or 0

    stmt = stmt.order_by(TechniquePage.title).offset(offset).limit(limit)
    result = await db.execute(stmt)
    pages = result.scalars().all()

    return PaginatedResponse(
        items=[TechniquePageRead.model_validate(p) for p in pages],
        total=total,
        offset=offset,
        limit=limit,
    )
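`list_topics` loads each technique page's `topic_tags` and `creator_id` once. For each sub-topic it then counts matching pages and distinct creators in Python, using a case-insensitive tag comparison. The sketch below reproduces that counting step and the slug normalization on hypothetical sample rows. `ROWS`, `count_sub_topic`, and `slug_to_category` are illustrative names, not part of the codebase.

```python
from __future__ import annotations

# Hypothetical rows shaped like (topic_category, topic_tags, creator_id).
ROWS = [
    ("Sound design", ["Bass", "Distortion"], "c1"),
    ("Sound design", ["bass", "FM"], "c2"),
    ("Mixing", ["Compression"], "c1"),
    ("Mixing", None, "c3"),  # NULL tags are treated as an empty list
]


def count_sub_topic(name: str, rows) -> tuple[int, int]:
    """Return (technique_count, creator_count) for one sub-topic name."""
    technique_count = 0
    creator_ids: set[str] = set()
    wanted = name.lower()
    for _category, tags, creator_id in rows:
        # Case-insensitive membership test against the page's tags.
        if wanted in (t.lower() for t in (tags or [])):
            technique_count += 1
            creator_ids.add(str(creator_id))
    return technique_count, len(creator_ids)


def slug_to_category(slug: str) -> str:
    # Same normalization as get_topic_techniques: "sound-design" -> "Sound Design".
    return slug.replace("-", " ").title()
```

The scan costs O(sub-topics × pages) per request. That is fine for the current catalogue size. For a much larger catalogue, the counting could be pushed into PostgreSQL, but that is a design option, not what the endpoint does today.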

@@ -248,3 +248,90 @@ class ReviewModeResponse(BaseModel):
 class ReviewModeUpdate(BaseModel):
     """Request to update the review mode."""
     review_mode: bool
+
+
+# ── Search ───────────────────────────────────────────────────────────────────
+
+
+class SearchResultItem(BaseModel):
+    """A single search result."""
+    title: str
+    slug: str = ""
+    type: str = ""
+    score: float = 0.0
+    summary: str = ""
+    creator_name: str = ""
+    creator_slug: str = ""
+    topic_category: str = ""
+    topic_tags: list[str] = Field(default_factory=list)
+
+
+class SearchResponse(BaseModel):
+    """Top-level search response with metadata."""
+    items: list[SearchResultItem] = Field(default_factory=list)
+    total: int = 0
+    query: str = ""
+    fallback_used: bool = False
+
+
+# ── Technique Page Detail ────────────────────────────────────────────────────
+
+
+class KeyMomentSummary(BaseModel):
+    """Lightweight key moment for technique page detail."""
+    model_config = ConfigDict(from_attributes=True)
+    id: uuid.UUID
+    title: str
+    summary: str
+    start_time: float
+    end_time: float
+    content_type: str
+    plugins: list[str] | None = None
+
+
+class RelatedLinkItem(BaseModel):
+    """A related technique link with target info."""
+    model_config = ConfigDict(from_attributes=True)
+    target_title: str = ""
+    target_slug: str = ""
+    relationship: str = ""
+
+
+class CreatorInfo(BaseModel):
+    """Minimal creator info embedded in technique detail."""
+    model_config = ConfigDict(from_attributes=True)
+    name: str
+    slug: str
+    genres: list[str] | None = None
+
+
+class TechniquePageDetail(TechniquePageRead):
+    """Technique page with nested key moments, creator, and related links."""
+    key_moments: list[KeyMomentSummary] = Field(default_factory=list)
+    creator_info: CreatorInfo | None = None
+    related_links: list[RelatedLinkItem] = Field(default_factory=list)
+
+
+# ── Topics ───────────────────────────────────────────────────────────────────
+
+
+class TopicSubTopic(BaseModel):
+    """A sub-topic with aggregated counts."""
+    name: str
+    technique_count: int = 0
+    creator_count: int = 0
+
+
+class TopicCategory(BaseModel):
+    """A top-level topic category with sub-topics."""
+    name: str
+    description: str = ""
+    sub_topics: list[TopicSubTopic] = Field(default_factory=list)
+
+
+# ── Creator Browse ───────────────────────────────────────────────────────────
+
+
+class CreatorBrowseItem(CreatorRead):
+    """Creator with technique and video counts for browse pages."""
+    technique_count: int = 0
+    video_count: int = 0

backend/search_service.py

@@ -0,0 +1,337 @@
"""Async search service for the public search endpoint.

Orchestrates semantic search (embedding + Qdrant) with keyword fallback.
All external calls have timeouts and degrade gracefully: if embedding
or Qdrant fails, the service falls back to keyword-only (ILIKE) search.
"""

from __future__ import annotations

import asyncio
import logging
import time
from typing import Any

import openai
from qdrant_client import AsyncQdrantClient
from qdrant_client.http import exceptions as qdrant_exceptions
from qdrant_client.models import FieldCondition, Filter, MatchValue
from sqlalchemy import or_, select
from sqlalchemy.ext.asyncio import AsyncSession

from config import Settings
from models import Creator, KeyMoment, TechniquePage

logger = logging.getLogger("chrysopedia.search")

# Timeout for external calls (embedding API, Qdrant) in seconds
_EXTERNAL_TIMEOUT = 0.3  # 300 ms per plan


class SearchService:
    """Async search service with semantic + keyword fallback.

    Parameters
    ----------
    settings:
        Application settings containing embedding and Qdrant config.
    """

    def __init__(self, settings: Settings) -> None:
        self.settings = settings
        self._openai = openai.AsyncOpenAI(
            base_url=settings.embedding_api_url,
            api_key=settings.llm_api_key,
        )
        self._qdrant = AsyncQdrantClient(url=settings.qdrant_url)
        self._collection = settings.qdrant_collection

    # ── Embedding ────────────────────────────────────────────────────────

    async def embed_query(self, text: str) -> list[float] | None:
        """Embed a query string into a vector.

        Returns None on any failure (timeout, connection, malformed response)
        so the caller can fall back to keyword search.
        """
        try:
            response = await asyncio.wait_for(
                self._openai.embeddings.create(
                    model=self.settings.embedding_model,
                    input=text,
                ),
                timeout=_EXTERNAL_TIMEOUT,
            )
        except asyncio.TimeoutError:
            logger.warning(
                "Embedding API timeout (%.0fms limit) for query: %.50s",
                _EXTERNAL_TIMEOUT * 1000,
                text,
            )
            return None
        except (openai.APIConnectionError, openai.APITimeoutError) as exc:
            logger.warning("Embedding API connection error (%s: %s)", type(exc).__name__, exc)
            return None
        except openai.APIError as exc:
            logger.warning("Embedding API error (%s: %s)", type(exc).__name__, exc)
            return None

        if not response.data:
            logger.warning("Embedding API returned empty data for query: %.50s", text)
            return None

        vector = response.data[0].embedding
        if len(vector) != self.settings.embedding_dimensions:
            logger.warning(
                "Embedding dimension mismatch: expected %d, got %d",
                self.settings.embedding_dimensions,
                len(vector),
            )
            return None

        return vector

    # ── Qdrant vector search ─────────────────────────────────────────────

    async def search_qdrant(
        self,
        vector: list[float],
        limit: int = 20,
        type_filter: str | None = None,
    ) -> list[dict[str, Any]]:
        """Search Qdrant for nearest neighbours.

        Returns a list of dicts with 'score' and 'payload' keys.
        Returns an empty list on failure.
        """
        query_filter = None
        if type_filter:
            query_filter = Filter(
                must=[FieldCondition(key="type", match=MatchValue(value=type_filter))]
            )

        try:
            results = await asyncio.wait_for(
                self._qdrant.query_points(
                    collection_name=self._collection,
                    query=vector,
                    query_filter=query_filter,
                    limit=limit,
                    with_payload=True,
                ),
                timeout=_EXTERNAL_TIMEOUT,
            )
        except asyncio.TimeoutError:
            logger.warning("Qdrant search timeout (%.0fms limit)", _EXTERNAL_TIMEOUT * 1000)
            return []
        except qdrant_exceptions.UnexpectedResponse as exc:
            logger.warning("Qdrant search error: %s", exc)
            return []
        except Exception as exc:
            # Catch-all so transport/connection errors also degrade to keyword search
            logger.warning("Qdrant connection error (%s: %s)", type(exc).__name__, exc)
            return []

        return [
            {"score": point.score, "payload": point.payload}
            for point in results.points
        ]

    # ── Keyword fallback ─────────────────────────────────────────────────

    async def keyword_search(
        self,
        query: str,
        scope: str,
        limit: int,
        db: AsyncSession,
    ) -> list[dict[str, Any]]:
        """ILIKE keyword search across technique pages, key moments, and creators.

        Matches technique page title/summary, key moment title, and creator
        name. Returns a unified list of result dicts, capped at ``limit``.
        """
        results: list[dict[str, Any]] = []
        # Escape LIKE wildcards so input such as "100%" or "kick_01" is
        # matched literally; "/" is the escape character passed to ilike().
        escaped = query.replace("/", "//").replace("%", "/%").replace("_", "/_")
        pattern = f"%{escaped}%"

        if scope in ("all", "topics"):
            stmt = (
                select(TechniquePage)
                .where(
                    or_(
                        TechniquePage.title.ilike(pattern, escape="/"),
                        TechniquePage.summary.ilike(pattern, escape="/"),
                    )
                )
                .limit(limit)
            )
            rows = await db.execute(stmt)
            for tp in rows.scalars().all():
                results.append({
                    "type": "technique_page",
                    "title": tp.title,
                    "slug": tp.slug,
                    "summary": tp.summary or "",
                    "topic_category": tp.topic_category,
                    "topic_tags": tp.topic_tags or [],
                    "creator_id": str(tp.creator_id),
                    "score": 0.0,
                })

        if scope == "all":
            km_stmt = (
                select(KeyMoment)
                .where(KeyMoment.title.ilike(pattern, escape="/"))
                .limit(limit)
            )
            km_rows = await db.execute(km_stmt)
            for km in km_rows.scalars().all():
                results.append({
                    "type": "key_moment",
                    "title": km.title,
                    "slug": "",
                    "summary": km.summary or "",
                    "topic_category": "",
                    "topic_tags": [],
                    "creator_id": "",
                    "score": 0.0,
                })

        if scope in ("all", "creators"):
            cr_stmt = (
                select(Creator)
                .where(Creator.name.ilike(pattern, escape="/"))
                .limit(limit)
            )
            cr_rows = await db.execute(cr_stmt)
            for cr in cr_rows.scalars().all():
                results.append({
                    "type": "creator",
                    "title": cr.name,
                    "slug": cr.slug,
                    "summary": "",
                    "topic_category": "",
                    "topic_tags": cr.genres or [],
                    "creator_id": str(cr.id),
                    "score": 0.0,
                })

        return results[:limit]

    # ── Orchestrator ─────────────────────────────────────────────────────

    async def search(
        self,
        query: str,
        scope: str,
        limit: int,
        db: AsyncSession,
    ) -> dict[str, Any]:
        """Run semantic search with keyword fallback.

        Returns a dict matching the SearchResponse schema shape.
        """
        start = time.monotonic()

        # Validate / sanitize inputs
        if not query or not query.strip():
            return {"items": [], "total": 0, "query": query, "fallback_used": False}

        # Truncate long queries
        query = query.strip()[:500]

        # Normalize scope
        if scope not in ("all", "topics", "creators"):
            scope = "all"

        # Map scope to Qdrant type filter. Creators are not indexed in Qdrant,
        # so the "creators" scope skips semantic search and goes straight to
        # keyword search.
        type_filter_map = {
            "all": None,
            "topics": "technique_page",
        }

        fallback_used = False
        items: list[dict[str, Any]] = []

        # Try semantic search
        if scope != "creators":
            vector = await self.embed_query(query)
            if vector is not None:
                qdrant_results = await self.search_qdrant(
                    vector, limit=limit, type_filter=type_filter_map[scope]
                )
                if qdrant_results:
                    # Enrich Qdrant results with DB metadata
                    items = await self._enrich_results(qdrant_results, db)

        # Fall back to keyword search if semantic search failed, was skipped,
        # or returned nothing
        if not items:
            items = await self.keyword_search(query, scope, limit, db)
            fallback_used = True

        elapsed_ms = (time.monotonic() - start) * 1000
        logger.info(
            "Search query=%r scope=%s results=%d fallback=%s latency_ms=%.1f",
            query,
            scope,
            len(items),
            fallback_used,
            elapsed_ms,
        )

        return {
            "items": items,
            "total": len(items),
            "query": query,
            "fallback_used": fallback_used,
        }

    # ── Result enrichment ────────────────────────────────────────────────

    async def _enrich_results(
        self,
        qdrant_results: list[dict[str, Any]],
        db: AsyncSession,
    ) -> list[dict[str, Any]]:
        """Enrich Qdrant results with creator names and slugs from the DB."""
        import uuid as uuid_mod

        enriched: list[dict[str, Any]] = []

        # Collect creator_ids to batch-fetch (Qdrant payloads may be None)
        creator_ids: set[str] = set()
        for r in qdrant_results:
            payload = r.get("payload") or {}
            cid = payload.get("creator_id")
            if cid:
                creator_ids.add(cid)

        # Batch fetch creators in a single IN query, skipping malformed IDs
        creator_map: dict[str, dict[str, str]] = {}
        if creator_ids:
            valid_ids = []
            for cid in creator_ids:
                try:
                    valid_ids.append(uuid_mod.UUID(cid))
                except (ValueError, AttributeError, TypeError):
                    pass
            if valid_ids:
                stmt = select(Creator).where(Creator.id.in_(valid_ids))
                result = await db.execute(stmt)
                for c in result.scalars().all():
                    creator_map[str(c.id)] = {"name": c.name, "slug": c.slug}

        for r in qdrant_results:
            payload = r.get("payload") or {}
            cid = payload.get("creator_id", "")
            creator_info = creator_map.get(cid, {"name": "", "slug": ""})
            enriched.append({
                "type": payload.get("type", ""),
                "title": payload.get("title", ""),
                "slug": payload.get("slug", payload.get("title", "").lower().replace(" ", "-")),
                "summary": payload.get("summary", ""),
                "topic_category": payload.get("topic_category", ""),
                "topic_tags": payload.get("topic_tags", []),
                "creator_id": cid,
                "creator_name": creator_info["name"],
                "creator_slug": creator_info["slug"],
                "score": r.get("score", 0.0),
            })

        return enriched
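The degradation strategy in `SearchService.search()` reduces to a small pattern: bound every external call with `asyncio.wait_for`, treat a timeout or error as "no result", and run keyword search whenever the semantic path yields nothing. The stdlib-only sketch here shows that control flow with injected stand-ins for the embedding API, Qdrant, and the database. `bounded`, `attach_creators`, and `search_with_fallback` are hypothetical names, and the in-memory `creators` dict stands in for the batched `IN` query; this is an illustration under those assumptions, not the service itself.

```python
import asyncio
import uuid
from typing import Any, Awaitable, Callable

EXTERNAL_TIMEOUT = 0.3  # seconds, mirroring _EXTERNAL_TIMEOUT in the service


async def bounded(awaitable: Awaitable[Any], default: Any) -> Any:
    """Await with a hard timeout; on timeout or any error return `default`."""
    try:
        return await asyncio.wait_for(awaitable, timeout=EXTERNAL_TIMEOUT)
    except Exception:  # asyncio.TimeoutError is an Exception subclass as well
        return default


def attach_creators(hits: list[dict[str, Any]], creators: dict[str, str]) -> list[dict[str, Any]]:
    """Join vector hits to creator names, ignoring malformed creator IDs."""
    items = []
    for hit in hits:
        payload = hit.get("payload") or {}  # payloads may be None
        try:
            cid = str(uuid.UUID(payload.get("creator_id")))  # canonical form
        except (ValueError, TypeError, AttributeError):
            cid = ""
        items.append({
            "title": payload.get("title", ""),
            "creator_name": creators.get(cid, ""),
            "score": hit.get("score", 0.0),
        })
    return items


async def search_with_fallback(
    query: str,
    embed: Callable[[str], Awaitable[list[float]]],
    vector_search: Callable[[list[float]], Awaitable[list[dict[str, Any]]]],
    keyword_search: Callable[[str], list[dict[str, Any]]],
    creators: dict[str, str],
) -> dict[str, Any]:
    """Semantic search first; keyword search if that path yields nothing."""
    vector = await bounded(embed(query), None)
    hits: list[dict[str, Any]] = []
    if vector is not None:
        hits = await bounded(vector_search(vector), [])
    items = attach_creators(hits, creators)
    fallback_used = not items
    if fallback_used:
        items = keyword_search(query)
    return {"items": items, "total": len(items), "query": query, "fallback_used": fallback_used}
```

Passing the backends in as callables keeps the fallback logic testable without a live embedding endpoint or Qdrant instance: a stub that sleeps past the 300 ms budget exercises the keyword path, and the vector search is never started when embedding fails.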