chrysopedia

Author	SHA1	Message	Date
jlightner	db135f738e	feat: Added consent API router with 5 endpoints (list, get, upsert with… - "backend/routers/consent.py" - "backend/schemas.py" - "backend/main.py" GSD-Task: S03/T02	2026-04-03 22:11:36 +00:00
jlightner	8487af0282	feat: Added VideoConsent and ConsentAuditLog models with ConsentField e… - "backend/models.py" - "alembic/versions/017_add_consent_tables.py" GSD-Task: S03/T01	2026-04-03 22:09:27 +00:00
jlightner	77f44b0b48	test: Implemented auth API router with register/login/me/update-profile… - "backend/routers/auth.py" - "backend/main.py" - "backend/auth.py" - "backend/requirements.txt" - "backend/tests/conftest.py" - "backend/tests/test_auth.py" GSD-Task: S02/T02	2026-04-03 21:54:11 +00:00
jlightner	a06ea946b1	chore: Added User/InviteCode models, Alembic migration 016, auth utilit… - "backend/models.py" - "backend/auth.py" - "backend/schemas.py" - "backend/requirements.txt" - "alembic/versions/016_add_users_and_invite_codes.py" GSD-Task: S02/T01	2026-04-03 21:47:01 +00:00
jlightner	69335d8d6d	chore: remove 2,367 lines of dead code — orphaned CSS, unused imports, stale files Deleted files: - generate_stage5_variants.py (874 lines) — superseded by pipeline.quality toolkit - PROJECT_CONTEXT.md (461 lines) — stale, .gsd/PROJECT.md is the living doc - CHRYSOPEDIA-ASSESSMENT.md (654 lines) — M011 triage artifact, all findings actioned CSS cleanup (364 lines): - 20 orphaned block groups from deleted review queue/old components - Duplicate .btn base rule, .btn--warning, @keyframes stagePulse Python imports: - routers/pipeline.py: uuid, literal_column, over, text - tests/test_pipeline.py: 9 unused imports (PropertyMock, create_engine, etc.) Build verified: tsc --noEmit clean, npm run build clean (59 modules, 0 warnings).	2026-04-03 09:43:37 +00:00
jlightner	539274ce58	feat: Added summary, topic_tags, and key_moment_count fields to Creator… - "backend/schemas.py" - "backend/routers/creators.py" GSD-Task: S03/T01	2026-04-03 09:07:34 +00:00
jlightner	cafbd0afb1	feat: Added moment_count field to CreatorDetail schema, router query (K… - "backend/schemas.py" - "backend/routers/creators.py" - "frontend/src/api/public-client.ts" GSD-Task: S02/T01	2026-04-03 08:58:05 +00:00
jlightner	906b6491fe	fix: static 96k max_tokens for all pipeline stages — dynamic estimator was truncating thinking model output The dynamic token estimator calculated max_tokens from input size × stage ratio, which produced ~9k for stage 5 compose calls. Thinking models consume unpredictable budget for internal reasoning, leaving 0 visible output tokens. Changed: hard_limit 32768→96000, estimate_max_tokens now returns hard_limit directly.	2026-04-03 08:18:28 +00:00
jlightner	c3e5a8fe86	feat: add embed-status endpoint for per-video embedding/Qdrant detail GET /admin/pipeline/embed-status/{video_id} returns technique pages linked to the video, Qdrant vector count, and last stage 6 event — provides data for the currently non-functional Embed tab in admin UI.	2026-04-03 06:24:58 +00:00
jlightner	7d4168c048	feat: Added .section-heading utility class to unify four heading styles… - "frontend/src/App.css" - "frontend/src/pages/Home.tsx" GSD-Task: S06/T02	2026-04-03 06:22:11 +00:00
jlightner	a16559e668	fix: add /app to sys.path for Celery forked workers importing services.avatar	2026-04-03 05:58:14 +00:00
jlightner	44e5905bd7	feat: auto-avatar integration with TheAudioDB - Added avatar_url, avatar_source, avatar_fetched_at columns to Creator model with Alembic migration 014 - New backend/services/avatar.py — TheAudioDB lookup with token-based name similarity scoring and genre overlap bonus - New Celery task fetch_creator_avatar for background avatar fetching - Admin endpoints: POST /creators/{id}/fetch-avatar (single) and POST /creators/fetch-all-avatars (batch for missing avatars) - Wired avatar_url into CreatorRead, CreatorInfo, and CreatorBrowseItem schemas so all API responses include avatar data	2026-04-03 05:55:42 +00:00
jlightner	89ef2751fa	feat: Added IntersectionObserver scroll-spy to ToC highlighting the act… - "frontend/src/components/TableOfContents.tsx" - "frontend/src/App.css" GSD-Task: S04/T02	2026-04-03 05:54:14 +00:00
jlightner	0743d80b6a	feat: Moved Table of Contents from main prose column to sidebar top; re… - "frontend/src/pages/TechniquePage.tsx" - "frontend/src/components/TableOfContents.tsx" - "frontend/src/App.css" GSD-Task: S04/T01	2026-04-03 05:52:47 +00:00
jlightner	61546bf25b	perf: eliminate N+1 queries in stale-pages, add videos pagination, cache related techniques - Rewrote stale-pages endpoint to use a single query with row_number window function instead of per-page queries for latest version + creator - Added optional offset/limit/status/creator_id params to videos endpoint (backward compatible — defaults return all results) - Added 1-hour Redis cache to _find_dynamic_related technique scoring	2026-04-03 05:50:53 +00:00
jlightner	094e832032	feat: Added favicon (SVG + 32px PNG), apple-touch-icon, OG social image… - "frontend/public/favicon.svg" - "frontend/public/favicon-32.png" - "frontend/public/apple-touch-icon.png" - "frontend/public/og-image.png" - "frontend/index.html" GSD-Task: S03/T01	2026-04-03 05:45:51 +00:00
jlightner	9f0b0922b0	feat: add GET /api/v1/stats endpoint with technique and creator counts	2026-04-03 04:24:58 +00:00
jlightner	8c81c472ea	fix: pass last_technique_at through row unpacking	2026-04-03 04:15:39 +00:00
jlightner	acd0567e3c	feat: add last_technique_at to creators API endpoint	2026-04-03 04:12:31 +00:00
jlightner	f64a0c1107	perf: Added SearchLog model, Alembic migration 013, Pydantic schemas, f… - "backend/models.py" - "backend/schemas.py" - "backend/routers/search.py" - "alembic/versions/013_add_search_log.py" GSD-Task: S01/T01	2026-04-03 04:02:55 +00:00
jlightner	6940f172a3	fix: Serialize BodySection Pydantic models to dicts before JSONB storage Stage 5 parses LLM output into list[BodySection] (Pydantic models) but SQLAlchemy's JSONB column needs plain dicts. Added _serialize_body_sections() helper that calls .model_dump() on each BodySection before DB write. Fixes 'Object of type BodySection is not JSON serializable' errors.	2026-04-03 03:38:32 +00:00
jlightner	57b8705e26	feat: Added per-section embedding to stage 6 for v2 technique pages wit… - "backend/schemas.py" - "backend/pipeline/stages.py" - "backend/pipeline/qdrant_client.py" - "backend/search_service.py" - "backend/pipeline/test_section_embedding.py" GSD-Task: S07/T01	2026-04-03 02:12:56 +00:00
jlightner	495d1fa489	feat: Added paginated GET /admin/pipeline/technique-pages endpoint with… - "backend/routers/pipeline.py" - "backend/schemas.py" GSD-Task: S06/T01	2026-04-03 01:55:35 +00:00
jlightner	7070ef3f51	test: Added 12 unit tests covering compose prompt construction, branchi… - "backend/pipeline/test_compose_pipeline.py" GSD-Task: S04/T02	2026-04-03 01:33:16 +00:00
jlightner	d709c9edce	feat: Added _build_compose_user_prompt(), _compose_into_existing(), and… - "backend/pipeline/stages.py" GSD-Task: S04/T01	2026-04-03 01:29:21 +00:00
jlightner	dc18d0a543	feat: Wired source_videos and body_sections_format into technique detai… - "backend/routers/techniques.py" GSD-Task: S03/T02	2026-04-03 01:19:32 +00:00
jlightner	bd0dbb4df9	feat: Added body_sections_format column, technique_page_videos associat… - "alembic/versions/012_multi_source_format.py" - "backend/models.py" - "backend/schemas.py" GSD-Task: S03/T01	2026-04-03 01:16:31 +00:00
jlightner	5cd7db8938	test: 16 unit tests covering compose prompt XML structure, citation off… - "backend/pipeline/test_harness_compose.py" - ".gsd/milestones/M014/slices/S02/tasks/T03-SUMMARY.md" GSD-Task: S02/T03	2026-04-03 01:08:41 +00:00
jlightner	efe6d7197c	test: Added compose subcommand with build_compose_prompt(), run_compose… - "backend/pipeline/test_harness.py" GSD-Task: S02/T02	2026-04-03 01:05:25 +00:00
jlightner	44197f550c	test: Updated test_harness.py word-count/section-count logic for list[B… - "backend/pipeline/test_harness.py" - "backend/pipeline/test_harness_v2_format.py" GSD-Task: S01/T03	2026-04-03 00:54:27 +00:00
jlightner	15dcab201a	test: Added BodySection/BodySubSection schema models, changed Synthesiz… - "backend/pipeline/schemas.py" - "backend/pipeline/citation_utils.py" - "backend/pipeline/test_citation_utils.py" GSD-Task: S01/T01	2026-04-03 00:50:30 +00:00
jlightner	293d1f4df4	feat: add wipe-all-output admin endpoint and UI button Deletes all technique pages, versions, links, key moments, pipeline events/runs, Qdrant vectors, and Redis cache while preserving creators, videos, and transcript segments. Resets all video status to not_started. Double-confirm dialog in the UI prevents accidental use. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 22:17:48 +00:00
jlightner	f0d0b8ac1a	feat: add pipeline iteration tooling — offline test harness, stage re-runs, chunking inspector Drops prompt iteration cycles from 20-30 min to under 5 min by enabling stage-isolated re-runs and offline prompt testing against exported fixtures. Phase 1: Offline prompt test harness - export_fixture.py: export stage 5 inputs from DB to reusable JSON fixtures - test_harness.py: run synthesis offline with any prompt, no Docker needed - promote subcommand: deploy winning prompts with backup and optional git commit Phase 2: Classification data persistence - Dual-write classification to PostgreSQL + Redis (fixes 24hr TTL data loss) - Clean retrigger now clears Redis cache keys (fixes stale data bug) - Alembic migration 011: classification_data JSONB column + stage_rerun enum Phase 3: Stage-isolated re-run - run_single_stage Celery task with prerequisite validation and prompt overrides - _load_prompt supports per-video Redis overrides for testing custom prompts - POST /admin/pipeline/rerun-stage/{video_id}/{stage_name} endpoint - Frontend: Re-run Stage modal with stage selector and prompt override textarea Phase 4: Chunking inspector - GET /admin/pipeline/chunking/{video_id} returns topic boundaries, classifications, and synthesis group breakdowns - Frontend: collapsible Chunking Inspector panel per video Phase 5: Prompt deployment & stale data cleanup - GET /admin/pipeline/stale-pages detects pages from older prompts - POST /admin/pipeline/bulk-resynthesize re-runs a stage on all completed videos - Frontend: stale pages indicator badge with one-click bulk re-synth Phase 6: Automated iteration foundation - Quality CLI --video-id flag auto-exports fixture from DB - POST /admin/pipeline/optimize-prompt/{stage} dispatches optimization as Celery task Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 15:47:46 +00:00
jlightner	29f6e74b4f	pipeline: run stages inline instead of Celery chain dispatch Each video now completes all stages (2→6) before the worker picks up the next queued video. Previously, dispatching celery_chain for multiple videos caused interleaved execution — nothing finished until everything went through all stages. Now run_pipeline calls each stage function synchronously within the same worker task, so videos complete linearly and efficiently.	2026-04-01 11:39:21 +00:00
jlightner	d75ec80c98	optimize: Stage 5 synthesis prompt — round 0 winner (0.95→1.0 composite) Applied first optimization result: tighter voice preservation instructions, improved section flow guidance, trimmed redundant metadata instructions. 13382→11123 chars (-17%).	2026-04-01 10:15:24 +00:00
jlightner	18520f7936	feat: Generalized OptimizationLoop to stages 2-5 with per-stage fixture… - "backend/pipeline/quality/optimizer.py" - "backend/pipeline/quality/__main__.py" - "backend/pipeline/quality/scorer.py" - "backend/pipeline/quality/fixtures/sample_segments.json" - "backend/pipeline/quality/fixtures/sample_topic_group.json" - "backend/pipeline/quality/fixtures/sample_classifications.json" GSD-Task: S04/T02	2026-04-01 09:24:42 +00:00
jlightner	e740798f7c	feat: Added STAGE_CONFIGS registry (stages 2-5) with per-stage rubrics,… - "backend/pipeline/quality/scorer.py" - "backend/pipeline/quality/variant_generator.py" GSD-Task: S04/T01	2026-04-01 09:20:24 +00:00
jlightner	84e85a52b3	perf: Added optimize CLI subcommand with leaderboard table, ASCII traje… - "backend/pipeline/quality/__main__.py" - "backend/pipeline/quality/results/.gitkeep" GSD-Task: S03/T02	2026-04-01 09:10:42 +00:00
jlightner	c6cbb09dd3	feat: Created PromptVariantGenerator (LLM-powered prompt mutation) and… - "backend/pipeline/quality/variant_generator.py" - "backend/pipeline/quality/optimizer.py" GSD-Task: S03/T01	2026-04-01 09:08:01 +00:00
jlightner	15a7afdaff	feat: Added VoiceDial class with 3-band prompt modification and ScoreRu… - "backend/pipeline/quality/voice_dial.py" - "backend/pipeline/quality/scorer.py" - "backend/pipeline/quality/__main__.py" GSD-Task: S02/T02	2026-04-01 08:57:07 +00:00
jlightner	5223772756	feat: Built ScoreRunner with 5-dimension LLM-as-judge scoring rubric, C… - "backend/pipeline/quality/scorer.py" - "backend/pipeline/quality/__main__.py" - "backend/pipeline/quality/fixtures/sample_moments.json" - "backend/pipeline/quality/fixtures/__init__.py" GSD-Task: S02/T01	2026-04-01 08:53:40 +00:00
jlightner	c27cd77ae6	test: Built pipeline.quality package with FitnessRunner (9 tests, 4 cat… - "backend/pipeline/quality/__init__.py" - "backend/pipeline/quality/__main__.py" - "backend/pipeline/quality/fitness.py" GSD-Task: S01/T01	2026-04-01 08:45:05 +00:00
jlightner	fd1fd6c6f9	fix: Pipeline LLM audit — temperature=0, realistic token ratios, structured request_params Audit findings & fixes: - temperature was never set (API defaulted to 1.0) → now explicit 0.0 for deterministic JSON - llm_max_tokens=65536 exceeded hard_limit=32768 → aligned to 32768 - Output ratio estimates were 5-30x too high (based on actual pipeline data): stage2: 0.6→0.05, stage3: 2.0→0.3, stage4: 0.5→0.3, stage5: 2.5→0.8 - request_params now structured as api_params (what's sent to LLM) vs pipeline_config (internal estimator settings) — no more ambiguous 'hard_limit' in request params - temperature=0.0 sent on both primary and fallback endpoints	2026-04-01 07:20:09 +00:00
jlightner	d58194ff96	feat: Store LLM request params (max_tokens, model, modality) in pipeline events - _make_llm_callback now accepts request_params dict - All 6 LLM call sites pass max_tokens, model_override, modality, response_model, hard_limit - request_params stored in payload JSONB on every llm_call event (always, not just debug mode) - Frontend JSON export includes full payload + request_params at top level - DebugPayloadViewer shows 'Request Params' section even with debug mode off - Answers whether max_tokens is actually being sent on pipeline requests	2026-04-01 07:01:57 +00:00
jlightner	a673e641b8	fix: Parallel search with match_context, deterministic Qdrant IDs, raised embedding timeout - Search now runs semantic + keyword in parallel, merges and deduplicates - Keyword results always included with match_context explaining WHY matched - Semantic results filtered by minimum score threshold (0.45) - match_context shows 'Creator: X', 'Tag: Y', 'Title match', 'Content: ...' - Qdrant points use deterministic uuid5 IDs (no more duplicates on reindex) - Embedding timeout raised from 300ms to 2s (Ollama needs it) - _enrich_qdrant_results reads creator_name from payload before DB fallback - Frontend displays match_context as highlighted bar on search result cards	2026-04-01 06:54:34 +00:00
jlightner	8272da430b	fix: Variable ordering bug and stage 5 truncation recovery Two fixes: 1. page_moment_indices was referenced before assignment in the page persist loop — moved assignment to top of loop body. This caused "cannot access local variable" errors on every stage 5 run. 2. Stage 5 now catches LLMTruncationError and splits the chunk in half for retry, instead of blindly retrying the same oversized prompt. This handles categories where synthesis output exceeds the model context window. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 01:51:28 -05:00
jlightner	e0c73db8ff	feat: Added sort query parameter (relevance/newest/oldest/alpha/creator… - "backend/routers/search.py" - "backend/routers/topics.py" - "backend/routers/techniques.py" - "backend/search_service.py" GSD-Task: S02/T01	2026-04-01 06:41:52 +00:00
jlightner	fa82f1079a	feat: Enriched Qdrant embedding text with creator_name/tags and added r… - "backend/pipeline/stages.py" - "backend/pipeline/qdrant_client.py" - "backend/routers/pipeline.py" GSD-Task: S01/T02	2026-04-01 06:41:52 +00:00
jlightner	84e7a9906c	feat: Refactored keyword_search to multi-token AND with cross-field mat… - "backend/search_service.py" - "backend/schemas.py" - "backend/routers/search.py" - "backend/tests/test_search.py" GSD-Task: S01/T01	2026-04-01 06:41:52 +00:00
jlightner	c344b8c670	fix: Moment-to-page linking via moment_indices in stage 5 synthesis When the LLM splits a category group into multiple technique pages, moments were blanket-linked to the last page in the loop, leaving all other pages as orphans with 0 key moments (48 out of 204 pages affected). Added moment_indices field to SynthesizedPage schema and synthesis prompt so the LLM explicitly declares which input moments each page covers. Stage 5 now uses these indices for targeted linking instead of the broken blanket approach. Tags are also computed per-page from linked moments only, fixing cross-contamination (e.g. "stereo imaging" tag appearing on gain staging pages). Deleted 48 orphan technique pages from the database. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 00:34:37 -05:00
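
Commit c344b8c670 above describes replacing blanket moment linking with explicit `moment_indices` declared per page, and computing tags only from each page's own moments. The following is a minimal sketch of that idea, not the repository's code: `Moment`, `Page`, and `link_moments` are hypothetical names, and only the `moment_indices` field comes from the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class Moment:
    # Hypothetical stand-in for a key moment extracted from a video.
    title: str
    tags: list[str]


@dataclass
class Page:
    # Hypothetical stand-in for a synthesized technique page.
    title: str
    moment_indices: list[int]  # indices into the input moment list
    moments: list[Moment] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)


def link_moments(pages: list[Page], moments: list[Moment]) -> list[Page]:
    """Attach to each page only the moments whose indices it declares."""
    for page in pages:
        # Skip indices that fall outside the input list instead of failing.
        page.moments = [
            moments[i] for i in page.moment_indices if 0 <= i < len(moments)
        ]
        # Tags are derived from this page's own moments, so tags from
        # other pages in the same group cannot leak in.
        page.tags = sorted({tag for m in page.moments for tag in m.tags})
    return pages


moments = [
    Moment("Set input gain", ["gain staging"]),
    Moment("Widen the mix", ["stereo imaging"]),
    Moment("Check headroom", ["gain staging", "metering"]),
]
pages = link_moments(
    [Page("Gain staging", [0, 2]), Page("Stereo width", [1])],
    moments,
)
```

With this shape, a page that declares no indices ends up with no moments, so it can be flagged rather than silently receiving every moment in the group.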

93 commits
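
Commit a673e641b8 in the listing mentions deterministic uuid5 IDs for Qdrant points so that reindexing does not create duplicates. A minimal sketch of how such IDs might be derived with Python's standard `uuid` module follows. The namespace URL, the `point_id` helper, and the `kind:source_id` key format are assumptions for illustration and do not come from the repository.

```python
import uuid

# Hypothetical namespace. Any fixed UUID works, as long as it never changes
# between indexing runs.
POINT_NAMESPACE = uuid.uuid5(
    uuid.NAMESPACE_URL, "https://example.invalid/chrysopedia/qdrant"
)


def point_id(kind: str, source_id: str) -> str:
    """Derive a stable point ID from the record type and its database ID.

    The same inputs always hash to the same UUID, so writing the point again
    overwrites the existing one instead of adding a duplicate.
    """
    return str(uuid.uuid5(POINT_NAMESPACE, f"{kind}:{source_id}"))


first_run = point_id("technique_page", "42")
second_run = point_id("technique_page", "42")
other_record = point_id("key_moment", "42")
```

Including the record type in the hashed key keeps a page and a moment that share a numeric ID from colliding.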