Commit graph

98 commits

Author SHA1 Message Date
jlightner
0fc0df1d29 feat: Added GET /api/v1/creator/dashboard returning video_count, techni…
- "backend/routers/creator_dashboard.py"
- "backend/schemas.py"
- "backend/main.py"
- "alembic/versions/016_add_users_and_invite_codes.py"

GSD-Task: S02/T01
2026-04-04 00:09:19 +00:00
jlightner
87cb667848 test: Added GET /videos/{video_id} and GET /videos/{video_id}/transcrip…
- "backend/routers/videos.py"
- "backend/schemas.py"
- "backend/tests/test_video_detail.py"

GSD-Task: S01/T01
2026-04-03 23:42:43 +00:00
jlightner
dbc4afcf42 feat: Normalized /topics and /videos endpoints from bare lists to pagin…
- "backend/schemas.py"
- "backend/routers/topics.py"
- "backend/routers/videos.py"
- "frontend/src/api/topics.ts"
- "frontend/src/pages/TopicsBrowse.tsx"
- "frontend/src/pages/Home.tsx"

GSD-Task: S05/T03
2026-04-03 23:09:33 +00:00
jlightner
338be29e92 feat: Created reindex_lightrag.py that extracts technique pages from Po…
- "backend/scripts/reindex_lightrag.py"

GSD-Task: S04/T01
2026-04-03 22:37:30 +00:00
jlightner
bfb303860b test: Add 22 integration tests for consent endpoints covering auth, own…
- "backend/tests/test_consent.py"
- "backend/tests/conftest.py"

GSD-Task: S03/T03
2026-04-03 22:16:31 +00:00
jlightner
db135f738e feat: Added consent API router with 5 endpoints (list, get, upsert with…
- "backend/routers/consent.py"
- "backend/schemas.py"
- "backend/main.py"

GSD-Task: S03/T02
2026-04-03 22:11:36 +00:00
jlightner
8487af0282 feat: Added VideoConsent and ConsentAuditLog models with ConsentField e…
- "backend/models.py"
- "alembic/versions/017_add_consent_tables.py"

GSD-Task: S03/T01
2026-04-03 22:09:27 +00:00
jlightner
77f44b0b48 test: Implemented auth API router with register/login/me/update-profile…
- "backend/routers/auth.py"
- "backend/main.py"
- "backend/auth.py"
- "backend/requirements.txt"
- "backend/tests/conftest.py"
- "backend/tests/test_auth.py"

GSD-Task: S02/T02
2026-04-03 21:54:11 +00:00
jlightner
a06ea946b1 chore: Added User/InviteCode models, Alembic migration 016, auth utilit…
- "backend/models.py"
- "backend/auth.py"
- "backend/schemas.py"
- "backend/requirements.txt"
- "alembic/versions/016_add_users_and_invite_codes.py"

GSD-Task: S02/T01
2026-04-03 21:47:01 +00:00
jlightner
69335d8d6d chore: remove 2,367 lines of dead code — orphaned CSS, unused imports, stale files
Deleted files:
- generate_stage5_variants.py (874 lines) — superseded by pipeline.quality toolkit
- PROJECT_CONTEXT.md (461 lines) — stale, .gsd/PROJECT.md is the living doc
- CHRYSOPEDIA-ASSESSMENT.md (654 lines) — M011 triage artifact, all findings actioned

CSS cleanup (364 lines):
- 20 orphaned block groups from deleted review queue/old components
- Duplicate .btn base rule, .btn--warning, @keyframes stagePulse

Python imports:
- routers/pipeline.py: uuid, literal_column, over, text
- tests/test_pipeline.py: 9 unused imports (PropertyMock, create_engine, etc.)

Build verified: tsc --noEmit clean, npm run build clean (59 modules, 0 warnings).
2026-04-03 09:43:37 +00:00
jlightner
539274ce58 feat: Added summary, topic_tags, and key_moment_count fields to Creator…
- "backend/schemas.py"
- "backend/routers/creators.py"

GSD-Task: S03/T01
2026-04-03 09:07:34 +00:00
jlightner
cafbd0afb1 feat: Added moment_count field to CreatorDetail schema, router query (K…
- "backend/schemas.py"
- "backend/routers/creators.py"
- "frontend/src/api/public-client.ts"

GSD-Task: S02/T01
2026-04-03 08:58:05 +00:00
jlightner
906b6491fe fix: static 96k max_tokens for all pipeline stages — dynamic estimator was truncating thinking model output
The dynamic token estimator calculated max_tokens from input size × stage ratio,
which produced ~9k for stage 5 compose calls. Thinking models consume unpredictable
budget for internal reasoning, leaving 0 visible output tokens.

Changed: hard_limit 32768→96000, estimate_max_tokens now returns hard_limit directly.
2026-04-03 08:18:28 +00:00
jlightner
c3e5a8fe86 feat: add embed-status endpoint for per-video embedding/Qdrant detail
GET /admin/pipeline/embed-status/{video_id} returns technique pages
linked to the video, Qdrant vector count, and last stage 6 event —
provides data for the currently non-functional Embed tab in admin UI.
2026-04-03 06:24:58 +00:00
jlightner
7d4168c048 feat: Added .section-heading utility class to unify four heading styles…
- "frontend/src/App.css"
- "frontend/src/pages/Home.tsx"

GSD-Task: S06/T02
2026-04-03 06:22:11 +00:00
jlightner
a16559e668 fix: add /app to sys.path for Celery forked workers importing services.avatar 2026-04-03 05:58:14 +00:00
jlightner
44e5905bd7 feat: auto-avatar integration with TheAudioDB
- Added avatar_url, avatar_source, avatar_fetched_at columns to Creator
  model with Alembic migration 014
- New backend/services/avatar.py — TheAudioDB lookup with token-based
  name similarity scoring and genre overlap bonus
- New Celery task fetch_creator_avatar for background avatar fetching
- Admin endpoints: POST /creators/{id}/fetch-avatar (single) and
  POST /creators/fetch-all-avatars (batch for missing avatars)
- Wired avatar_url into CreatorRead, CreatorInfo, and CreatorBrowseItem
  schemas so all API responses include avatar data
2026-04-03 05:55:42 +00:00
jlightner
89ef2751fa feat: Added IntersectionObserver scroll-spy to ToC highlighting the act…
- "frontend/src/components/TableOfContents.tsx"
- "frontend/src/App.css"

GSD-Task: S04/T02
2026-04-03 05:54:14 +00:00
jlightner
0743d80b6a feat: Moved Table of Contents from main prose column to sidebar top; re…
- "frontend/src/pages/TechniquePage.tsx"
- "frontend/src/components/TableOfContents.tsx"
- "frontend/src/App.css"

GSD-Task: S04/T01
2026-04-03 05:52:47 +00:00
jlightner
61546bf25b perf: eliminate N+1 queries in stale-pages, add videos pagination, cache related techniques
- Rewrote stale-pages endpoint to use a single query with row_number
  window function instead of per-page queries for latest version + creator
- Added optional offset/limit/status/creator_id params to videos endpoint
  (backward compatible — defaults return all results)
- Added 1-hour Redis cache to _find_dynamic_related technique scoring
2026-04-03 05:50:53 +00:00
jlightner
094e832032 feat: Added favicon (SVG + 32px PNG), apple-touch-icon, OG social image…
- "frontend/public/favicon.svg"
- "frontend/public/favicon-32.png"
- "frontend/public/apple-touch-icon.png"
- "frontend/public/og-image.png"
- "frontend/index.html"

GSD-Task: S03/T01
2026-04-03 05:45:51 +00:00
jlightner
9f0b0922b0 feat: add GET /api/v1/stats endpoint with technique and creator counts 2026-04-03 04:24:58 +00:00
jlightner
8c81c472ea fix: pass last_technique_at through row unpacking 2026-04-03 04:15:39 +00:00
jlightner
acd0567e3c feat: add last_technique_at to creators API endpoint 2026-04-03 04:12:31 +00:00
jlightner
f64a0c1107 perf: Added SearchLog model, Alembic migration 013, Pydantic schemas, f…
- "backend/models.py"
- "backend/schemas.py"
- "backend/routers/search.py"
- "alembic/versions/013_add_search_log.py"

GSD-Task: S01/T01
2026-04-03 04:02:55 +00:00
jlightner
6940f172a3 fix: Serialize BodySection Pydantic models to dicts before JSONB storage
Stage 5 parses LLM output into list[BodySection] (Pydantic models) but
SQLAlchemy's JSONB column needs plain dicts. Added _serialize_body_sections()
helper that calls .model_dump() on each BodySection before DB write.
Fixes 'Object of type BodySection is not JSON serializable' errors.
2026-04-03 03:38:32 +00:00
jlightner
57b8705e26 feat: Added per-section embedding to stage 6 for v2 technique pages wit…
- "backend/schemas.py"
- "backend/pipeline/stages.py"
- "backend/pipeline/qdrant_client.py"
- "backend/search_service.py"
- "backend/pipeline/test_section_embedding.py"

GSD-Task: S07/T01
2026-04-03 02:12:56 +00:00
jlightner
495d1fa489 feat: Added paginated GET /admin/pipeline/technique-pages endpoint with…
- "backend/routers/pipeline.py"
- "backend/schemas.py"

GSD-Task: S06/T01
2026-04-03 01:55:35 +00:00
jlightner
7070ef3f51 test: Added 12 unit tests covering compose prompt construction, branchi…
- "backend/pipeline/test_compose_pipeline.py"

GSD-Task: S04/T02
2026-04-03 01:33:16 +00:00
jlightner
d709c9edce feat: Added _build_compose_user_prompt(), _compose_into_existing(), and…
- "backend/pipeline/stages.py"

GSD-Task: S04/T01
2026-04-03 01:29:21 +00:00
jlightner
dc18d0a543 feat: Wired source_videos and body_sections_format into technique detai…
- "backend/routers/techniques.py"

GSD-Task: S03/T02
2026-04-03 01:19:32 +00:00
jlightner
bd0dbb4df9 feat: Added body_sections_format column, technique_page_videos associat…
- "alembic/versions/012_multi_source_format.py"
- "backend/models.py"
- "backend/schemas.py"

GSD-Task: S03/T01
2026-04-03 01:16:31 +00:00
jlightner
5cd7db8938 test: 16 unit tests covering compose prompt XML structure, citation off…
- "backend/pipeline/test_harness_compose.py"
- ".gsd/milestones/M014/slices/S02/tasks/T03-SUMMARY.md"

GSD-Task: S02/T03
2026-04-03 01:08:41 +00:00
jlightner
efe6d7197c test: Added compose subcommand with build_compose_prompt(), run_compose…
- "backend/pipeline/test_harness.py"

GSD-Task: S02/T02
2026-04-03 01:05:25 +00:00
jlightner
44197f550c test: Updated test_harness.py word-count/section-count logic for list[B…
- "backend/pipeline/test_harness.py"
- "backend/pipeline/test_harness_v2_format.py"

GSD-Task: S01/T03
2026-04-03 00:54:27 +00:00
jlightner
15dcab201a test: Added BodySection/BodySubSection schema models, changed Synthesiz…
- "backend/pipeline/schemas.py"
- "backend/pipeline/citation_utils.py"
- "backend/pipeline/test_citation_utils.py"

GSD-Task: S01/T01
2026-04-03 00:50:30 +00:00
jlightner
293d1f4df4 feat: add wipe-all-output admin endpoint and UI button
Deletes all technique pages, versions, links, key moments, pipeline
events/runs, Qdrant vectors, and Redis cache while preserving creators,
videos, and transcript segments. Resets all video status to not_started.
Double-confirm dialog in the UI prevents accidental use.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 22:17:48 +00:00
jlightner
f0d0b8ac1a feat: add pipeline iteration tooling — offline test harness, stage re-runs, chunking inspector
Drops prompt iteration cycles from 20-30 min to under 5 min by enabling
stage-isolated re-runs and offline prompt testing against exported fixtures.

Phase 1: Offline prompt test harness
- export_fixture.py: export stage 5 inputs from DB to reusable JSON fixtures
- test_harness.py: run synthesis offline with any prompt, no Docker needed
- promote subcommand: deploy winning prompts with backup and optional git commit

Phase 2: Classification data persistence
- Dual-write classification to PostgreSQL + Redis (fixes 24hr TTL data loss)
- Clean retrigger now clears Redis cache keys (fixes stale data bug)
- Alembic migration 011: classification_data JSONB column + stage_rerun enum

Phase 3: Stage-isolated re-run
- run_single_stage Celery task with prerequisite validation and prompt overrides
- _load_prompt supports per-video Redis overrides for testing custom prompts
- POST /admin/pipeline/rerun-stage/{video_id}/{stage_name} endpoint
- Frontend: Re-run Stage modal with stage selector and prompt override textarea

Phase 4: Chunking inspector
- GET /admin/pipeline/chunking/{video_id} returns topic boundaries,
  classifications, and synthesis group breakdowns
- Frontend: collapsible Chunking Inspector panel per video

Phase 5: Prompt deployment & stale data cleanup
- GET /admin/pipeline/stale-pages detects pages from older prompts
- POST /admin/pipeline/bulk-resynthesize re-runs a stage on all completed videos
- Frontend: stale pages indicator badge with one-click bulk re-synth

Phase 6: Automated iteration foundation
- Quality CLI --video-id flag auto-exports fixture from DB
- POST /admin/pipeline/optimize-prompt/{stage} dispatches optimization as Celery task

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 15:47:46 +00:00
jlightner
29f6e74b4f pipeline: run stages inline instead of Celery chain dispatch
Each video now completes all stages (2→6) before the worker picks up the
next queued video. Previously, dispatching celery_chain for multiple videos
caused interleaved execution — nothing finished until everything went through
all stages. Now run_pipeline calls each stage function synchronously within
the same worker task, so videos complete linearly and efficiently.
2026-04-01 11:39:21 +00:00
jlightner
d75ec80c98 optimize: Stage 5 synthesis prompt — round 0 winner (0.95→1.0 composite)
Applied first optimization result: tighter voice preservation instructions,
improved section flow guidance, trimmed redundant metadata instructions.
13382→11123 chars (-17%).
2026-04-01 10:15:24 +00:00
jlightner
18520f7936 feat: Generalized OptimizationLoop to stages 2-5 with per-stage fixture…
- "backend/pipeline/quality/optimizer.py"
- "backend/pipeline/quality/__main__.py"
- "backend/pipeline/quality/scorer.py"
- "backend/pipeline/quality/fixtures/sample_segments.json"
- "backend/pipeline/quality/fixtures/sample_topic_group.json"
- "backend/pipeline/quality/fixtures/sample_classifications.json"

GSD-Task: S04/T02
2026-04-01 09:24:42 +00:00
jlightner
e740798f7c feat: Added STAGE_CONFIGS registry (stages 2-5) with per-stage rubrics,…
- "backend/pipeline/quality/scorer.py"
- "backend/pipeline/quality/variant_generator.py"

GSD-Task: S04/T01
2026-04-01 09:20:24 +00:00
jlightner
84e85a52b3 perf: Added optimize CLI subcommand with leaderboard table, ASCII traje…
- "backend/pipeline/quality/__main__.py"
- "backend/pipeline/quality/results/.gitkeep"

GSD-Task: S03/T02
2026-04-01 09:10:42 +00:00
jlightner
c6cbb09dd3 feat: Created PromptVariantGenerator (LLM-powered prompt mutation) and…
- "backend/pipeline/quality/variant_generator.py"
- "backend/pipeline/quality/optimizer.py"

GSD-Task: S03/T01
2026-04-01 09:08:01 +00:00
jlightner
15a7afdaff feat: Added VoiceDial class with 3-band prompt modification and ScoreRu…
- "backend/pipeline/quality/voice_dial.py"
- "backend/pipeline/quality/scorer.py"
- "backend/pipeline/quality/__main__.py"

GSD-Task: S02/T02
2026-04-01 08:57:07 +00:00
jlightner
5223772756 feat: Built ScoreRunner with 5-dimension LLM-as-judge scoring rubric, C…
- "backend/pipeline/quality/scorer.py"
- "backend/pipeline/quality/__main__.py"
- "backend/pipeline/quality/fixtures/sample_moments.json"
- "backend/pipeline/quality/fixtures/__init__.py"

GSD-Task: S02/T01
2026-04-01 08:53:40 +00:00
jlightner
c27cd77ae6 test: Built pipeline.quality package with FitnessRunner (9 tests, 4 cat…
- "backend/pipeline/quality/__init__.py"
- "backend/pipeline/quality/__main__.py"
- "backend/pipeline/quality/fitness.py"

GSD-Task: S01/T01
2026-04-01 08:45:05 +00:00
jlightner
fd1fd6c6f9 fix: Pipeline LLM audit — temperature=0, realistic token ratios, structured request_params
Audit findings & fixes:
- temperature was never set (API defaulted to 1.0) → now explicit 0.0 for deterministic JSON
- llm_max_tokens=65536 exceeded hard_limit=32768 → aligned to 32768
- Output ratio estimates were 5-30x too high (based on actual pipeline data):
  stage2: 0.6→0.05, stage3: 2.0→0.3, stage4: 0.5→0.3, stage5: 2.5→0.8
- request_params now structured as api_params (what's sent to LLM) vs pipeline_config
  (internal estimator settings) — no more ambiguous 'hard_limit' in request params
- temperature=0.0 sent on both primary and fallback endpoints
2026-04-01 07:20:09 +00:00
jlightner
d58194ff96 feat: Store LLM request params (max_tokens, model, modality) in pipeline events
- _make_llm_callback now accepts request_params dict
- All 6 LLM call sites pass max_tokens, model_override, modality, response_model, hard_limit
- request_params stored in payload JSONB on every llm_call event (always, not just debug mode)
- Frontend JSON export includes full payload + request_params at top level
- DebugPayloadViewer shows 'Request Params' section even with debug mode off
- Answers whether max_tokens is actually being sent on pipeline requests
2026-04-01 07:01:57 +00:00
jlightner
a673e641b8 fix: Parallel search with match_context, deterministic Qdrant IDs, raised embedding timeout
- Search now runs semantic + keyword in parallel, merges and deduplicates
- Keyword results always included with match_context explaining WHY matched
- Semantic results filtered by minimum score threshold (0.45)
- match_context shows 'Creator: X', 'Tag: Y', 'Title match', 'Content: ...'
- Qdrant points use deterministic uuid5 IDs (no more duplicates on reindex)
- Embedding timeout raised from 300ms to 2s (Ollama needs it)
- _enrich_qdrant_results reads creator_name from payload before DB fallback
- Frontend displays match_context as highlighted bar on search result cards
2026-04-01 06:54:34 +00:00