141 commits

Author SHA1 Message Date
jlightner
c1583820ea feat: Add context labels to multi-call pipeline stages
Stage 3 (extraction) LLM calls now show the topic group label (e.g.,
'Sound Design Basics') and Stage 5 (synthesis) calls show the category
name. Displayed as a cyan italic label in the event row between the
event type badge and model name. Helps admins understand why there are
multiple LLM calls per stage.
2026-03-31 17:27:40 +00:00
jlightner
c2db9aa011 feat: Pipeline runs — per-execution tracking with run-scoped events
Data model:
- New pipeline_runs table (id, video_id, run_number, trigger, status,
  started_at, finished_at, error_stage, total_tokens)
- pipeline_events gains run_id FK (nullable for backward compat)
- Alembic migration 010_add_pipeline_runs

Backend:
- run_pipeline() creates a PipelineRun, threads run_id through all stages
- _emit_event() and _make_llm_callback() accept and store run_id
- Stage 6 (final) calls _finish_run() to mark complete with token totals
- mark_pipeline_error marks run as error
- Revoke marks running runs as cancelled
- Trigger endpoints pass trigger type (manual, clean_reprocess)
- New GET /admin/pipeline/runs/{video_id} — lists runs with event counts
- GET /admin/pipeline/events supports ?run_id= filter

Frontend:
- Expanded video detail now shows RunList instead of flat EventLog
- Each run is a collapsible card showing: run number, trigger type,
  status badge, timestamps, token count, event count
- Latest run auto-expands, older runs collapsed
- Legacy events (pre-run-tracking) shown as separate collapsible section
- Run cards color-coded: cyan border for running, red for error,
  gray for cancelled
- EventLog accepts optional runId prop to scope events to a single run
2026-03-31 17:13:41 +00:00
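The run lifecycle described in this commit can be sketched roughly as below. This is a minimal in-memory illustration, not the project's model: the field names come from the commit message, but the integer types, the `start_run`/`finish_run`/`fail_run` helper names, and the "previous runs + 1" numbering rule are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class PipelineRun:
    """In-memory stand-in for one row of the pipeline_runs table."""
    id: int                                  # assumed integer key
    video_id: int                            # assumed integer key
    run_number: int
    trigger: str                             # e.g. "manual" or "clean_reprocess"
    status: str = "running"
    started_at: Optional[datetime] = None
    finished_at: Optional[datetime] = None
    error_stage: Optional[str] = None
    total_tokens: int = 0


def start_run(run_id: int, video_id: int, previous_runs: int, trigger: str) -> PipelineRun:
    """Open a new run; numbering after earlier runs is an assumption."""
    return PipelineRun(
        id=run_id,
        video_id=video_id,
        run_number=previous_runs + 1,
        trigger=trigger,
        started_at=datetime.now(timezone.utc),
    )


def finish_run(run: PipelineRun, total_tokens: int) -> PipelineRun:
    """Mark the run complete and record its token total."""
    run.status = "complete"
    run.total_tokens = total_tokens
    run.finished_at = datetime.now(timezone.utc)
    return run


def fail_run(run: PipelineRun, stage: str) -> PipelineRun:
    """Mark the run as errored and remember which stage failed."""
    run.status = "error"
    run.error_stage = stage
    run.finished_at = datetime.now(timezone.utc)
    return run
```

Events emitted during a run would then carry `run.id` as their `run_id`, which is what lets the event log be scoped to a single execution.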
jlightner
cd3b57a156 fix: Clean retrigger preserves transcript_segments (pipeline input data)
Deleting transcript_segments left the pipeline with nothing to process —
all stages would skip immediately. Segments come from the ingest step,
not from pipeline stages 2-6. Only pipeline_events and key_moments
(pipeline output) are deleted during clean reprocess.
2026-03-31 16:32:25 +00:00
jlightner
b0ad4c2dfc feat: Add real-time pipeline visibility — auto-refresh, stage timeline, activity feed, bulk log
- Backend: Video list now includes active_stage, active_stage_status, and
  stage_started_at fields via DISTINCT ON subquery
- Backend: New GET /admin/pipeline/recent-activity endpoint returns
  latest stage completions/errors with video context
- Frontend: 15-second auto-refresh with change detection — video rows
  flash when status changes
- Frontend: Stage timeline dots on processing/complete/error videos
  showing progress through stages 2-5, active stage pulses
- Frontend: Collapsible Recent Activity feed at top showing last 8
  stage completions/errors with duration and creator
- Frontend: Bulk operation scrollable log showing per-video results
  as they complete
- Frontend: Auto-refresh checkbox toggle in header
2026-03-31 16:12:57 +00:00
jlightner
e17132bd60 feat: Add bulk pipeline reprocessing — creator filter, multi-select, clean retrigger
- Backend: POST /admin/pipeline/clean-retrigger/{video_id} endpoint that
  deletes pipeline_events, key_moments, transcript_segments, and Qdrant
  vectors before retriggering the pipeline
- Backend: QdrantManager.delete_by_video_id() for vector cleanup
- Frontend: Creator filter dropdown on pipeline admin page
- Frontend: Checkbox selection column with select-all
- Frontend: Bulk toolbar with Retrigger Selected and Clean Reprocess
  actions, sequential dispatch with progress bar, cancel support
- Bulk dispatch uses 500ms delay between requests to avoid slamming API
2026-03-31 15:24:59 +00:00
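The sequential dispatch pattern mentioned above (one request at a time, a fixed pause between requests, and a cancel check) might look like the following. The real implementation lives in the frontend; this is only a Python sketch of the pattern, and `dispatch_sequentially`, `send`, and `cancelled` are hypothetical names.

```python
import time
from typing import Callable, Iterable, List, Tuple


def dispatch_sequentially(
    video_ids: Iterable[str],
    send: Callable[[str], None],
    delay_s: float = 0.5,                      # 500ms, as in the commit message
    cancelled: Callable[[], bool] = lambda: False,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[str, str]]:
    """Send one request per video, pausing between requests."""
    ids = list(video_ids)
    results: List[Tuple[str, str]] = []
    for index, video_id in enumerate(ids):
        if cancelled():
            break                              # stop before sending the next request
        try:
            send(video_id)
            results.append((video_id, "ok"))
        except Exception as exc:               # record the failure and keep going
            results.append((video_id, f"error: {exc}"))
        if index < len(ids) - 1:
            sleep(delay_s)                     # no pause needed after the last item
    return results
```

Passing `sleep` in as a parameter keeps the delay easy to stub out in tests.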
jlightner
717f6c0785 feat: Added GET /api/v1/techniques/random endpoint returning {slug}, fe…
- "backend/routers/techniques.py"
- "frontend/src/api/public-client.ts"
- "frontend/src/pages/Home.tsx"
- "frontend/src/App.css"

GSD-Task: S01/T02
2026-03-31 08:24:38 +00:00
jlightner
1254e173d4 test: Added GET /api/v1/search/suggestions endpoint returning popular t…
- "backend/schemas.py"
- "backend/routers/search.py"
- "backend/tests/test_search.py"

GSD-Task: S04/T01
2026-03-31 06:35:37 +00:00
jlightner
5d0fd05b98 feat: Added scored dynamic related-techniques query returning up to 4 r…
- "backend/schemas.py"
- "backend/routers/techniques.py"
- "backend/tests/test_public_api.py"

GSD-Task: S02/T01
2026-03-31 06:13:59 +00:00
jlightner
8661549ab1 test: Added GET /topics/{category_slug}/{subtopic_slug} endpoint filter…
- "backend/routers/topics.py"
- "backend/tests/test_public_api.py"

GSD-Task: S01/T01
2026-03-31 05:59:36 +00:00
jlightner
0b27e5752e feat: Added sort=random|recent query param to list_techniques endpoint…
- "backend/routers/techniques.py"
- "frontend/src/api/public-client.ts"

GSD-Task: S03/T01
2026-03-31 05:46:31 +00:00
jlightner
95b11ae5bc feat: Added key_moment_count correlated subquery to technique list API…
- "backend/schemas.py"
- "backend/routers/techniques.py"
- "frontend/src/api/public-client.ts"
- "frontend/src/pages/Home.tsx"
- "frontend/src/App.css"

GSD-Task: S03/T01
2026-03-31 05:23:37 +00:00
jlightner
127919565a feat: Added hidden boolean column to Creator model, migration marking T…
- "backend/models.py"
- "backend/routers/creators.py"
- "alembic/versions/009_add_creator_hidden_flag.py"

GSD-Task: S02/T01
2026-03-31 05:13:17 +00:00
jlightner
af250a6f5d feat: Added technique_page_slug to search results across Qdrant payload…
- "backend/schemas.py"
- "backend/search_service.py"
- "backend/pipeline/stages.py"
- "backend/pipeline/qdrant_client.py"
- "backend/tests/test_search.py"

GSD-Task: S01/T01
2026-03-31 05:02:48 +00:00
jlightner
720c2f501f feat: meaningful pipeline status lifecycle — Not Started → Queued → In Progress → Complete/Errored
Replace stage-level statuses (pending/transcribed/extracted/published) with
user-meaningful lifecycle states (not_started/queued/processing/error/complete).

Backend:
- ProcessingStatus enum: not_started, queued, processing, error, complete
- run_pipeline sets 'processing' before dispatching Celery chain
- stage5 sets 'complete' (was 'published')
- stage3 no longer sets intermediate status (stays 'processing')
- New mark_pipeline_error task wired as link_error on chain
- _set_error_status helper marks video on permanent failure
- Ingest sets 'queued' (was 'transcribed')
- Migration 008 renames all existing values

Frontend:
- StatusFilter shows fixed-order lifecycle tabs: Not Started | Queued | In Progress | Errored | Complete
- Per-video badges show friendly labels instead of raw enum values
- Badge colors mapped to new statuses
2026-03-31 02:43:49 +00:00
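For orientation, the lifecycle states and their friendly labels could be modelled like this. The enum values, labels, and tab order are taken from the commit message; the member names, the `STATUS_LABELS`/`TAB_ORDER` constants, and the `friendly_label` helper are assumptions, and the label mapping is a frontend concern that is shown in Python here only for illustration.

```python
from enum import Enum


class ProcessingStatus(str, Enum):
    NOT_STARTED = "not_started"
    QUEUED = "queued"
    PROCESSING = "processing"
    ERROR = "error"
    COMPLETE = "complete"


# Friendly labels shown on badges instead of raw enum values.
STATUS_LABELS = {
    ProcessingStatus.NOT_STARTED: "Not Started",
    ProcessingStatus.QUEUED: "Queued",
    ProcessingStatus.PROCESSING: "In Progress",
    ProcessingStatus.ERROR: "Errored",
    ProcessingStatus.COMPLETE: "Complete",
}

# Fixed tab order for the status filter.
TAB_ORDER = [
    ProcessingStatus.NOT_STARTED,
    ProcessingStatus.QUEUED,
    ProcessingStatus.PROCESSING,
    ProcessingStatus.ERROR,
    ProcessingStatus.COMPLETE,
]


def friendly_label(raw: str) -> str:
    """Translate a stored status value into its display label."""
    return STATUS_LABELS[ProcessingStatus(raw)]
```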
jlightner
52e7e3bbc2 feat: remove review workflow — unused gate that blocked nothing
773 key moments sat at 'pending' with 0 approved/edited/rejected.
review_status was never checked by any public-facing query — all content
was always visible regardless of review state.

Removed:
- backend/routers/review.py (10 endpoints)
- backend/tests/test_review.py
- frontend ReviewQueue, MomentDetail pages
- frontend client.ts (review-only API client)
- frontend ModeToggle, StatusBadge components
- Review link from AdminDropdown, Moments link from pipeline rows
- ReviewStatus, PageReviewStatus enums from models
- review_mode config flag
- review_status columns (migration 007)
- ~80 lines of mode-toggle CSS

Pipeline now always sets processing_status to 'published'.
Migration 007 drops columns, enums, and migrates 'reviewed' → 'published'.
2026-03-31 02:34:12 +00:00
jlightner
4b0914b12b fix: restore complete project tree from ub01 canonical state
Auto-mode commit 7aa33cd accidentally deleted 78 files (14,814 lines) during M005
execution. Subsequent commits rebuilt some frontend files but backend/, alembic/,
tests/, whisper/, docker configs, and prompts were never restored in this repo.

This commit restores the full project tree by syncing from ub01's working directory,
which has all M001-M007 features running in production containers.

Restored: backend/ (config, models, routers, database, redis, search_service, worker),
alembic/ (6 migrations), docker/ (Dockerfiles, nginx, compose), prompts/ (4 stages),
tests/, whisper/, README.md, .env.example, chrysopedia-spec.md
2026-03-31 02:10:41 +00:00
jlightner
5e408dff5a feat: Built backend/watcher.py with PollingObserver-based folder watchi…
- "backend/watcher.py"
- "backend/requirements.txt"

GSD-Task: S03/T01
2026-03-30 19:17:47 +00:00
jlightner
7aa33cd17f fix: Fixed syntax errors in pipeline event instrumentation — _emit_even…
- "backend/pipeline/stages.py"

GSD-Task: S01/T01
2026-03-30 08:27:53 +00:00
jlightner
44fbbf030f test: Added version list/detail API endpoints, Pydantic schemas, versio…
- "backend/schemas.py"
- "backend/routers/techniques.py"
- "backend/tests/test_public_api.py"

GSD-Task: S04/T02
2026-03-30 07:27:40 +00:00
jlightner
5c3e9b83c8 feat: Added TechniquePageVersion model, Alembic migration 002, pipeline…
- "backend/models.py"
- "alembic/versions/002_technique_page_versions.py"
- "backend/pipeline/stages.py"

GSD-Task: S04/T01
2026-03-30 07:27:40 +00:00
jlightner
0c4162a777 feat: Added video_filename field to KeyMomentSummary schema and populat…
- "backend/schemas.py"
- "backend/routers/techniques.py"

GSD-Task: S03/T01
2026-03-30 06:50:01 +00:00
jlightner
76138887d2 fix: Creators endpoint returns paginated response, review queue limit raised to 1000, added GET /review/moments/{id} endpoint
- Creators: response_model changed from list to {items, total, offset, limit} matching frontend CreatorBrowseResponse
- Review queue: limit raised from 100 to 1000
- New GET /review/moments/{moment_id} endpoint for direct moment fetch
- MomentDetail uses fetchMoment instead of fetching full queue
- Merge candidates fetch uses limit=100
2026-03-30 01:26:12 -05:00
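The paginated envelope named in this commit, `{items, total, offset, limit}`, can be illustrated with a small helper. This is a sketch over an in-memory sequence, not the endpoint's code; the `paginate` name and the default `limit` of 50 are assumptions.

```python
from typing import Any, Dict, Sequence


def paginate(rows: Sequence[Any], offset: int = 0, limit: int = 50) -> Dict[str, Any]:
    """Wrap one page of rows in an {items, total, offset, limit} envelope."""
    return {
        "items": list(rows[offset:offset + limit]),
        "total": len(rows),        # size of the full result set, not of the page
        "offset": offset,
        "limit": limit,
    }
```

Returning `total` alongside the page is what lets a client render page counts without fetching every row.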
jlightner
0b0ca598b4 feat: Log LLM response token usage (prompt/completion/total, content_len, finish_reason)
2026-03-30 06:15:24 +00:00
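The fields listed in this commit could be pulled from a response roughly as follows. The sketch assumes an OpenAI-style chat completion payload (`usage` plus `choices[0].message.content` and `choices[0].finish_reason`); the logger name and the `summarize_usage`/`log_usage` helpers are hypothetical.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger("pipeline.llm")  # hypothetical logger name


def summarize_usage(response: Dict[str, Any]) -> Dict[str, Any]:
    """Collect token counts, content length, and finish reason from a response."""
    usage = response.get("usage") or {}
    choice = (response.get("choices") or [{}])[0]
    content = (choice.get("message") or {}).get("content") or ""
    return {
        "prompt": usage.get("prompt_tokens"),
        "completion": usage.get("completion_tokens"),
        "total": usage.get("total_tokens"),
        "content_len": len(content),
        "finish_reason": choice.get("finish_reason"),
    }


def log_usage(response: Dict[str, Any]) -> Dict[str, Any]:
    """Log the usage summary and return it for further bookkeeping."""
    summary = summarize_usage(response)
    logger.info(
        "LLM usage prompt=%s completion=%s total=%s content_len=%s finish_reason=%s",
        summary["prompt"], summary["completion"], summary["total"],
        summary["content_len"], summary["finish_reason"],
    )
    return summary
```

In OpenAI-style APIs a `finish_reason` of `"length"` indicates the response was cut off at the token limit, which makes this log line useful when tuning `max_tokens`.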
jlightner
17347da87e feat: Switch to FYN-LLM-Agent models — chat for stages 2/4, think for stages 3/5
2026-03-30 05:42:27 +00:00
jlightner
f67e676264 fix: Bump max_tokens to 65536 (model supports 94K context, extraction needs headroom)
2026-03-30 04:57:44 +00:00
jlightner
6fb497d03a chore: Bump LLM max_tokens to 32768, commit M002/M003 GSD artifacts
- max_tokens bumped from 16384 to 32768 (extraction responses still hitting limits)
- All GSD planning/completion artifacts for M002 (deployment) and M003 (DNS + LLM routing)
- KNOWLEDGE.md updated with XPLTD domain setup flow and container healthcheck patterns
- DECISIONS.md updated with D015 (subnet) and D016 (Ollama for embeddings)
2026-03-30 04:22:45 +00:00
jlightner
cf759f3739 fix: Add max_tokens=16384 to LLM requests (OpenWebUI defaults to 1000, truncating pipeline JSON)
2026-03-30 04:08:29 +00:00
jlightner
4aa4b08a7f feat: Per-stage LLM model routing with thinking modality and think-tag stripping
- Added 8 per-stage config fields: llm_stage{2-5}_model and llm_stage{2-5}_modality
- LLMClient.complete() accepts modality ('chat'/'thinking') and model_override
- Thinking modality: appends JSON instructions to system prompt, strips <think> tags
- strip_think_tags() handles multiline, multiple blocks, and edge cases
- Pipeline stages 2-5 read per-stage config and pass to LLM client
- Updated .env.example with per-stage model/modality documentation
- All 59 tests pass including new think-tag stripping test
2026-03-30 02:12:14 +00:00
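A think-tag stripper along the lines described here can be written with a single non-greedy regular expression. This is a minimal sketch, not the project's `strip_think_tags()`: it covers multiline and multiple `<think>` blocks, but which further edge cases the real function handles (for example an unclosed tag) is not stated in the commit, so they are left out.

```python
import re

# Non-greedy match so each <think>...</think> block is removed separately;
# DOTALL lets "." span newlines inside a block.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think_tags(text: str) -> str:
    """Remove <think>...</think> blocks and trim surrounding whitespace."""
    return _THINK_BLOCK.sub("", text).strip()
```

Stripping before JSON parsing matters because a thinking-mode model may emit its reasoning ahead of the JSON payload the pipeline expects.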
jlightner
5b8be50994 test: Added 18 integration tests for search and public API endpoints (t…
- "backend/tests/test_search.py"
- "backend/tests/test_public_api.py"

GSD-Task: S05/T02
2026-03-30 00:01:32 +00:00
jlightner
c0df369018 feat: Created async search service with embedding+Qdrant+keyword fallba…
- "backend/search_service.py"
- "backend/schemas.py"
- "backend/routers/search.py"
- "backend/routers/techniques.py"
- "backend/routers/topics.py"
- "backend/routers/creators.py"
- "backend/main.py"

GSD-Task: S05/T01
2026-03-29 23:55:52 +00:00
jlightner
c2edba952c test: Built 9 review queue API endpoints (queue, stats, approve, reject…
- "backend/routers/review.py"
- "backend/schemas.py"
- "backend/redis_client.py"
- "backend/main.py"
- "backend/tests/test_review.py"

GSD-Task: S04/T01
2026-03-29 23:13:43 +00:00
jlightner
2cb10b5db8 test: Added 10 integration tests covering pipeline stages 2-6, trigger…
- "backend/tests/test_pipeline.py"
- "backend/tests/fixtures/mock_llm_responses.py"
- "backend/tests/conftest.py"

GSD-Task: S03/T05
2026-03-29 22:51:26 +00:00
jlightner
910e945d9c feat: Wired automatic run_pipeline.delay() dispatch after ingest commit…
- "backend/routers/pipeline.py"
- "backend/routers/ingest.py"
- "backend/main.py"

GSD-Task: S03/T04
2026-03-29 22:41:02 +00:00
jlightner
5c46d1e922 feat: Created sync EmbeddingClient, QdrantManager with idempotent colle…
- "backend/pipeline/embedding_client.py"
- "backend/pipeline/qdrant_client.py"
- "backend/pipeline/stages.py"

GSD-Task: S03/T03
2026-03-29 22:39:04 +00:00
jlightner
b5635a09db feat: Created 4 prompt templates and implemented 5 Celery tasks (stages…
- "prompts/stage2_segmentation.txt"
- "prompts/stage3_extraction.txt"
- "prompts/stage4_classification.txt"
- "prompts/stage5_synthesis.txt"
- "backend/pipeline/stages.py"
- "backend/requirements.txt"

GSD-Task: S03/T02
2026-03-29 22:36:06 +00:00
jlightner
12cc86aef9 chore: Extended Settings with 12 LLM/embedding/Qdrant config fields, cr…
- "backend/config.py"
- "backend/worker.py"
- "backend/pipeline/schemas.py"
- "backend/pipeline/llm_client.py"
- "backend/requirements.txt"
- "backend/pipeline/__init__.py"
- "backend/pipeline/stages.py"

GSD-Task: S03/T01
2026-03-29 22:30:31 +00:00
jlightner
bef8d95e64 test: Added 6 integration tests proving ingestion, creator auto-detecti…
- "backend/tests/conftest.py"
- "backend/tests/test_ingest.py"
- "backend/tests/fixtures/sample_transcript.json"
- "backend/pytest.ini"
- "backend/requirements.txt"
- "backend/models.py"

GSD-Task: S02/T02
2026-03-29 22:16:15 +00:00
jlightner
5bfeb50716 feat: Created POST /api/v1/ingest endpoint that accepts Whisper transcr…
- "backend/routers/ingest.py"
- "backend/schemas.py"
- "backend/requirements.txt"
- "backend/main.py"

GSD-Task: S02/T01
2026-03-29 22:09:46 +00:00
jlightner
07126138b5 chore: Built FastAPI app with DB-connected health check, Pydantic schem…
- "backend/main.py"
- "backend/config.py"
- "backend/schemas.py"
- "backend/routers/__init__.py"
- "backend/routers/health.py"
- "backend/routers/creators.py"
- "backend/routers/videos.py"

GSD-Task: S01/T03
2026-03-29 21:54:57 +00:00
jlightner
ad3bccf1f2 fix: Created SQLAlchemy models for all 7 entities, Alembic async migrat…
- "backend/models.py"
- "backend/database.py"
- "alembic/versions/001_initial.py"
- "alembic/env.py"
- "alembic.ini"
- "alembic/script.py.mako"
- "docker-compose.yml"
- ".gsd/KNOWLEDGE.md"

GSD-Task: S01/T02
2026-03-29 21:48:36 +00:00
jlightner
cd271c1a8d feat: Created full Docker Compose project (xpltd_chrysopedia) with Post…
- "docker-compose.yml"
- ".env.example"
- "docker/Dockerfile.api"
- "docker/Dockerfile.web"
- "docker/nginx.conf"
- "backend/main.py"
- "backend/requirements.txt"
- "config/canonical_tags.yaml"

GSD-Task: S01/T01
2026-03-29 21:42:56 +00:00