124 commits

Author SHA1 Message Date
jlightner
e8bc3fd9a2 feat(whisper): add batch_transcribe.py and document HAL0022 transcription setup
- batch_transcribe.py: recursive multi-creator transcription runner that
  walks nested subdirectories, attributes creator from top-level folder name,
  writes batch_manifest.json with timing and per-creator results
- README.md: updated with batch mode docs, HAL0022 environment details,
  transcript output location (C:\Users\jlightner\chrysopedia\transcripts),
  scheduled task usage, and transfer instructions for ub01 ingestion
2026-03-30 11:46:52 -05:00
jlightner
0484c15516 chore: auto-commit after complete-milestone
GSD-Unit: M006
2026-03-30 12:13:09 +00:00
jlightner
2e9ef20e24 feat: Updated Dockerfile.web and docker-compose.yml on ub01 to pass VIT…
- "docker/Dockerfile.web"
- "docker-compose.yml"

GSD-Task: S06/T02
2026-03-30 12:05:28 +00:00
jlightner
e6ce650487 feat: Added AppFooter component displaying app version, build date, com…
- "frontend/src/components/AppFooter.tsx"
- "frontend/vite.config.ts"
- "frontend/src/App.tsx"
- "frontend/src/App.css"
- "frontend/src/vite-env.d.ts"

GSD-Task: S06/T01
2026-03-30 12:00:58 +00:00
jlightner
75332343cb feat: Rewrote TopicsBrowse.tsx from vertical accordion to responsive 2-…
- "frontend/src/pages/TopicsBrowse.tsx"
- "frontend/src/App.css"

GSD-Task: S05/T02
2026-03-30 11:48:51 +00:00
jlightner
3f3fe065f8 feat: Added Music Theory as 7th category in canonical_tags.yaml with 8…
- "config/canonical_tags.yaml"
- "frontend/src/App.css"

GSD-Task: S05/T01
2026-03-30 11:44:18 +00:00
jlightner
61d52d719e feat: Reordered technique page sidebar (plugins first), added prominent…
- "frontend/src/pages/TechniquePage.tsx"
- "frontend/src/App.css"

GSD-Task: S04/T01
2026-03-30 11:34:14 +00:00
jlightner
b4d4caeda6 feat: Added "Commit" row to version metadata panel on TechniquePage — r…
- "frontend/src/pages/TechniquePage.tsx"

GSD-Task: S03/T02
2026-03-30 11:25:47 +00:00
jlightner
12f9fb7334 chore: Added GIT_COMMIT_SHA build arg to Dockerfile.api, compose build…
- "docker/Dockerfile.api"
- "docker-compose.yml"
- "backend/config.py"
- "backend/pipeline/stages.py"

GSD-Task: S03/T01
2026-03-30 11:24:34 +00:00
jlightner
ee24731e59 feat: Added Head/Tail segmented toggle to EventLog with order param wir…
- "frontend/src/api/public-client.ts"
- "frontend/src/pages/AdminPipeline.tsx"
- "frontend/src/App.css"

GSD-Task: S02/T02
2026-03-30 11:15:21 +00:00
jlightner
bf126f4825 feat: Added order query parameter (asc/desc, default desc) to pipelin…
- "backend/routers/pipeline.py"

GSD-Task: S02/T01
2026-03-30 11:10:44 +00:00
jlightner
05c7ba3ca2 feat: Created AdminDropdown component with click-outside/Escape close,…
- "frontend/src/components/AdminDropdown.tsx"
- "frontend/src/App.tsx"
- "frontend/src/App.css"

GSD-Task: S01/T01
2026-03-30 11:02:23 +00:00
jlightner
08d7d19d0e fix: Nginx resolver for Docker DNS — prevent stale upstream IPs
Use Docker embedded DNS (127.0.0.11) with 30s TTL and variable-based
proxy_pass so nginx re-resolves the API container IP after recreates
instead of caching the startup IP forever.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 05:55:42 -05:00
jlightner
c6f69019cf feat: Content hash dedup and prior-page versioning
- Add content_hash (SHA-256 of transcript text) to source_videos (migration 005)
- 3-tier duplicate detection at ingest: exact filename, content hash,
  then normalized filename + duration (handles yt-dlp re-downloads)
- Snapshot prior technique_page_ids to Redis before pipeline dispatch
- Stage 5 matches prior pages by creator+category before slug fallback,
  enabling version snapshots on reprocessing even when LLM generates
  different slugs
- Expose content_hash in API responses and admin pipeline dashboard

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 05:55:27 -05:00
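
The three-tier duplicate check described in c6f69019cf could look roughly like the sketch below. It is illustrative only and is not the repository's code: the `Video` record, `compute_content_hash`, `normalize_filename`, `find_duplicate`, the normalization rule, and the one-second duration tolerance are all assumptions. Only the SHA-256 content hash and the tier order (exact filename, content hash, normalized filename plus duration) come from the commit message.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Video:
    # Hypothetical stand-in for a source_videos row.
    filename: str
    content_hash: str
    duration_s: float


def compute_content_hash(transcript_text: str) -> str:
    """SHA-256 hex digest of the transcript text."""
    return hashlib.sha256(transcript_text.encode("utf-8")).hexdigest()


def normalize_filename(name: str) -> str:
    # Assumed rule: drop the extension, lowercase, keep only letters and digits.
    stem = name.rsplit(".", 1)[0].lower()
    return re.sub(r"[^a-z0-9]+", "", stem)


def find_duplicate(
    candidate: Video,
    existing: List[Video],
    duration_tolerance_s: float = 1.0,  # assumed tolerance
) -> Optional[Tuple[str, Video]]:
    """Return (tier, match) for the first tier that matches, else None."""
    # Tier 1: exact filename.
    for video in existing:
        if video.filename == candidate.filename:
            return ("filename", video)
    # Tier 2: identical transcript content.
    for video in existing:
        if video.content_hash == candidate.content_hash:
            return ("content_hash", video)
    # Tier 3: same normalized name and near-equal duration (e.g. a re-download).
    wanted = normalize_filename(candidate.filename)
    for video in existing:
        same_name = normalize_filename(video.filename) == wanted
        close = abs(video.duration_s - candidate.duration_s) <= duration_tolerance_s
        if same_name and close:
            return ("normalized", video)
    return None
```

Each tier scans the full list before the next one runs, so a cheap exact match always wins over a fuzzier one.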
jlightner
c6c15defee feat: Dynamic token estimation for per-stage max_tokens
- Add estimate_tokens() and estimate_max_tokens() to llm_client with
  stage-specific output ratios (0.3x segmentation, 1.2x extraction,
  0.15x classification, 1.5x synthesis)
- Add max_tokens override parameter to LLMClient.complete()
- Wire all 4 pipeline stages to estimate max_tokens from actual prompt
  content with 20% buffer and 2048 floor
- Add LLM_MAX_TOKENS_HARD_LIMIT=32768 config (dynamic estimator ceiling)
- Log token estimates alongside every LLM request

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 05:55:17 -05:00
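
As a rough illustration of the estimator described in c6c15defee, the sketch below applies the stage ratios, the 20% buffer, the 2048 floor, and the 32768 ceiling named in the commit message. The four-characters-per-token heuristic, the stage key names, and the function signatures are assumptions; the real `estimate_tokens()` and `estimate_max_tokens()` in `llm_client` may differ.

```python
import math

# Output-to-input ratios per stage, as listed in the commit message.
STAGE_OUTPUT_RATIOS = {
    "segmentation": 0.3,
    "extraction": 1.2,
    "classification": 0.15,
    "synthesis": 1.5,
}
BUFFER = 1.2          # 20% headroom on top of the estimate
FLOOR = 2048          # never request fewer tokens than this
HARD_LIMIT = 32768    # LLM_MAX_TOKENS_HARD_LIMIT ceiling


def estimate_tokens(text: str) -> int:
    # Assumption: about four characters per token; a real tokenizer would be more accurate.
    return math.ceil(len(text) / 4)


def estimate_max_tokens(prompt: str, stage: str) -> int:
    """Estimate a max_tokens value for one stage from the actual prompt text."""
    expected_output = estimate_tokens(prompt) * STAGE_OUTPUT_RATIOS[stage]
    budget = math.ceil(expected_output * BUFFER)
    # Clamp into [FLOOR, HARD_LIMIT].
    return max(FLOOR, min(budget, HARD_LIMIT))
```

Short classification prompts land on the floor, while very long synthesis prompts are capped at the hard limit instead of growing without bound.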
jlightner
df33d15360 feat: Pipeline events, admin dashboard, and version switcher UI
- Add pipeline_events table (migration 004) for structured stage logging
- Add PipelineEvent model with token usage tracking
- Admin pipeline dashboard with video list, event log, worker status,
  trigger/revoke controls, and collapsible JSON payload viewer
- Version switcher on technique pages — view historical snapshots
  with pipeline metadata (model names, prompt hashes)
- Frontend types for pipeline admin and version APIs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 05:55:07 -05:00
jlightner
94460faf9d chore: auto-commit after complete-milestone
GSD-Unit: M005
2026-03-30 09:01:54 +00:00
jlightner
c6efec8363 feat: Split key moment card header into standalone h3 title and flex-ro…
- "frontend/src/pages/TechniquePage.tsx"
- "frontend/src/App.css"

GSD-Task: S03/T01
2026-03-30 08:55:48 +00:00
jlightner
aa71387ad5 feat: Added CSS grid layout splitting technique page into prose (left)…
- "frontend/src/App.css"
- "frontend/src/pages/TechniquePage.tsx"

GSD-Task: S02/T01
2026-03-30 08:47:55 +00:00
jlightner
26556ba03e feat: Built AdminPipeline.tsx page at /admin/pipeline with video table,…
- "frontend/src/pages/AdminPipeline.tsx"
- "frontend/src/api/public-client.ts"
- "frontend/src/App.tsx"
- "frontend/src/App.css"

GSD-Task: S01/T03
2026-03-30 08:35:11 +00:00
jlightner
b3d405bb84 fix: All five admin pipeline endpoints respond correctly — fix was ngin…
- "backend/routers/pipeline.py"

GSD-Task: S01/T02
2026-03-30 08:30:15 +00:00
jlightner
7aa33cd17f fix: Fixed syntax errors in pipeline event instrumentation — _emit_even…
- "backend/pipeline/stages.py"

GSD-Task: S01/T01
2026-03-30 08:27:53 +00:00
jlightner
b3204bece9 feat: Version switcher on technique pages — view historical snapshots with pipeline metadata
- Version dropdown appears when version_count > 0 (hidden until first re-run)
- Selecting a historical version overlays snapshot content (title, summary, body, chains, plugins)
- Key moments and related links always show live data (not versioned)
- Pipeline metadata block shows model, capture time, and prompt file hashes (truncated)
- Cyan banner when viewing historical version with "Back to current" button
- fetchTechniqueVersion API function for single version detail
2026-03-30 03:02:31 -05:00
jlightner
324e933670 feat: Content issue reporting — submit from technique pages, manage in admin reports page
- ContentReport model with generic content_type/content_id (supports any entity)
- Alembic migration 003: content_reports table with status + content indexes
- POST /reports (public), GET/PATCH /admin/reports (admin triage)
- Report modal on technique pages with issue type dropdown + description
- Admin reports page with status filter, expand/collapse detail, triage actions
- All CSS uses var(--*) tokens, dark theme consistent
2026-03-30 02:53:56 -05:00
jlightner
e08e8d021f fix: Creators page 422 — limit=200 exceeded API max of 100, also fix error display for Pydantic validation arrays
2026-03-30 02:37:37 -05:00
jlightner
ac45ce7313 chore: auto-commit after complete-milestone
GSD-Unit: M004
2026-03-30 07:27:40 +00:00
jlightner
8fb3f199dc feat: Added TypeScript version types, fetchTechniqueVersions function,…
- "frontend/src/api/public-client.ts"
- "frontend/src/pages/TechniquePage.tsx"

GSD-Task: S04/T03
2026-03-30 07:27:40 +00:00
jlightner
44fbbf030f test: Added version list/detail API endpoints, Pydantic schemas, versio…
- "backend/schemas.py"
- "backend/routers/techniques.py"
- "backend/tests/test_public_api.py"

GSD-Task: S04/T02
2026-03-30 07:27:40 +00:00
jlightner
5c3e9b83c8 feat: Added TechniquePageVersion model, Alembic migration 002, pipeline…
- "backend/models.py"
- "alembic/versions/002_technique_page_versions.py"
- "backend/pipeline/stages.py"

GSD-Task: S04/T01
2026-03-30 07:27:40 +00:00
jlightner
37426aae77 feat: Redesigned technique page frontend: meta stats line, video filena…
- "frontend/src/api/public-client.ts"
- "frontend/src/pages/TechniquePage.tsx"
- "frontend/src/App.css"

GSD-Task: S03/T02
2026-03-30 07:27:40 +00:00
jlightner
f99ac1b8b9 prompts: Rewrite all four pipeline stage prompts for quality and domain awareness
- Stage 2: Add domain context, granularity guidance, unstructured content handling
- Stage 3: Add extract/skip framework, summary quality standards, fewer-richer directive
- Stage 4: Add production-session classification principles, ambiguity resolution examples
- Stage 5: Add voice/tone guidance, anti-generic section names, signal chain detail, anti-filler rules

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 07:27:35 +00:00
jlightner
39006ca5b6 feat: redesign technique page - meta stats, video filenames, monospace signal chains
2026-03-30 06:54:11 +00:00
jlightner
0c4162a777 feat: Added video_filename field to KeyMomentSummary schema and populat…
- "backend/schemas.py"
- "backend/routers/techniques.py"

GSD-Task: S03/T01
2026-03-30 06:50:01 +00:00
jlightner
c575e76861 fix: Added overflow-x:hidden to html/body, fixed mobile overflow on mod…
- "frontend/src/App.css"
- "frontend/index.html"

GSD-Task: S02/T02
2026-03-30 06:40:58 +00:00
jlightner
893105abd0 feat: Replaced all 193 hex colors and 24 rgba values in App.css with 77…
- "frontend/src/App.css"

GSD-Task: S02/T01
2026-03-30 06:37:08 +00:00
jlightner
76138887d2 fix: Creators endpoint returns paginated response, review queue limit raised to 1000, added GET /review/moments/{id} endpoint
- Creators: response_model changed from list to {items, total, offset, limit} matching frontend CreatorBrowseResponse
- Review queue: limit raised from 100 to 1000
- New GET /review/moments/{moment_id} endpoint for direct moment fetch
- MomentDetail uses fetchMoment instead of fetching full queue
- Merge candidates fetch uses limit=100
2026-03-30 01:26:12 -05:00
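
The `{items, total, offset, limit}` envelope mentioned in 76138887d2 can be sketched in plain Python as follows. The `paginate` helper, its defaults, and the `MAX_LIMIT` cap are hypothetical; only the four field names come from the commit message, and the actual endpoint presumably builds this from a database query rather than an in-memory list.

```python
from typing import Any, Dict, Sequence

MAX_LIMIT = 100  # assumed cap for illustration


def paginate(rows: Sequence[Any], offset: int = 0, limit: int = 50) -> Dict[str, Any]:
    """Wrap one page of rows in an {items, total, offset, limit} envelope."""
    if offset < 0:
        raise ValueError("offset must be >= 0")
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    return {
        "items": list(rows[offset:offset + limit]),
        "total": len(rows),  # count of all rows, not just this page
        "offset": offset,
        "limit": limit,
    }
```

Returning `total` alongside the page lets a client compute how many pages remain without a second request.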
jlightner
0b0ca598b4 feat: Log LLM response token usage (prompt/completion/total, content_len, finish_reason)
2026-03-30 06:15:24 +00:00
jlightner
17347da87e feat: Switch to FYN-LLM-Agent models — chat for stages 2/4, think for stages 3/5
2026-03-30 05:42:27 +00:00
jlightner
dfaf0481fe fix: Reduce Celery worker concurrency from 2 to 1 — concurrent LLM requests cause empty responses
Qwen 3.5 397B (quantized) returns empty content when handling two large-context
extraction requests simultaneously, likely due to vLLM memory pressure. Sequential
processing eliminates this failure mode.
2026-03-30 05:37:21 +00:00
jlightner
f67e676264 fix: Bump max_tokens to 65536 (model supports 94K context, extraction needs headroom)
2026-03-30 04:57:44 +00:00
jlightner
6fb497d03a chore: Bump LLM max_tokens to 32768, commit M002/M003 GSD artifacts
- max_tokens bumped from 16384 to 32768 (extraction responses still hitting limits)
- All GSD planning/completion artifacts for M002 (deployment) and M003 (DNS + LLM routing)
- KNOWLEDGE.md updated with XPLTD domain setup flow and container healthcheck patterns
- DECISIONS.md updated with D015 (subnet) and D016 (Ollama for embeddings)
2026-03-30 04:22:45 +00:00
jlightner
cf759f3739 fix: Add max_tokens=16384 to LLM requests (OpenWebUI defaults to 1000, truncating pipeline JSON)
2026-03-30 04:08:29 +00:00
jlightner
8e96fae64f fix: Set PROMPTS_PATH=/prompts in API and worker containers
2026-03-30 03:46:46 +00:00
jlightner
4aa4b08a7f feat: Per-stage LLM model routing with thinking modality and think-tag stripping
- Added 8 per-stage config fields: llm_stage{2-5}_model and llm_stage{2-5}_modality
- LLMClient.complete() accepts modality ('chat'/'thinking') and model_override
- Thinking modality: appends JSON instructions to system prompt, strips <think> tags
- strip_think_tags() handles multiline, multiple blocks, and edge cases
- Pipeline stages 2-5 read per-stage config and pass to LLM client
- Updated .env.example with per-stage model/modality documentation
- All 59 tests pass including new think-tag stripping test
2026-03-30 02:12:14 +00:00
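
A minimal sketch of the think-tag stripping described in 4aa4b08a7f, assuming the model wraps its reasoning in well-formed `<think>...</think>` blocks. This is not the project's `strip_think_tags()`; it covers multiline and multiple blocks as the commit message mentions, but which further edge cases the real function handles is not stated, so none are attempted here.

```python
import re

# Non-greedy match so that each block is removed separately;
# DOTALL lets "." span newlines inside a block.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)


def strip_think_tags(text: str) -> str:
    """Remove every closed <think>...</think> block and trim surrounding whitespace."""
    return _THINK_BLOCK.sub("", text).strip()
```

The non-greedy quantifier matters: a greedy `.*` would swallow everything between the first opening tag and the last closing tag, including any real output between two blocks.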
jlightner
9fdef3b720 docs: Added CLAUDE.md redirect to ub01 canonical path, updated README with deployment section
2026-03-30 01:28:26 +00:00
jlightner
541354d89e fix: Worker healthcheck uses celery inspect ping instead of HTTP (no web server)
2026-03-30 01:25:24 +00:00
jlightner
1b4b803f6b fix: web healthcheck uses curl instead of wget (busybox wget fails)
2026-03-30 01:24:05 +00:00
jlightner
b49326147f fix: alembic env.py sys.path includes parent dir for Docker compatibility
2026-03-30 01:22:30 +00:00
jlightner
8dc4e9137d fix: Include alembic.ini and alembic/ in API Docker image for migrations
2026-03-30 01:21:41 +00:00
jlightner
7256fe7667 fix: Qdrant healthcheck uses bash /dev/tcp (no wget/curl in image)
2026-03-30 01:20:14 +00:00