gsd: plan M001 (Chrysopedia Foundation) with 5 slices and S01 task breakdown
Milestone: Chrysopedia Foundation — Infrastructure, Pipeline Core, and Skeleton UI

Slices:
- S01: Docker Compose + Database + Whisper Script (5 tasks)
- S02: Transcript Ingestion API
- S03: LLM Extraction Pipeline + Qdrant Integration
- S04: Review Queue Admin UI
- S05: Search-First Web UI

Requirements: R001-R015 covering all spec sections.
Decisions: D001 (tech stack), D002 (Docker conventions), D003 (storage layer)
parent 8b506a95ca
commit e15dd97b73
15 changed files with 415 additions and 1 deletion
6  .gitignore  vendored
@@ -1,2 +1,6 @@
 .bg-shell/
-.gsd/
+.gsd/gsd.db
+.gsd/gsd.db-shm
+.gsd/gsd.db-wal
+.gsd/event-log.jsonl
+.gsd/state-manifest.json
9  .gsd/DECISIONS.md  Normal file
@@ -0,0 +1,9 @@
# Decisions Register

<!-- Append-only. Never edit or remove existing rows.
To reverse a decision, add a new row that supersedes it.
Read this file at the start of any planning or research phase. -->

| # | When | Scope | Decision | Choice | Rationale | Revisable? | Made By |
|---|------|-------|----------|--------|-----------|------------|---------|
| D001 | | architecture | Docker Compose project naming and path conventions | xpltd_chrysopedia with bind mounts at /vmPool/r/services/chrysopedia_*, compose at /vmPool/r/compose/chrysopedia/ | XPLTD lore: compose projects at /vmPool/r/compose/{name}/, service data at /vmPool/r/services/{service}_{role}/, project naming follows xpltd_{name} pattern. Network will be a dedicated bridge subnet avoiding existing 172.16-172.23 and 172.29-172.30 ranges. | Yes | agent |
91  .gsd/REQUIREMENTS.md  Normal file
@@ -0,0 +1,91 @@
# Requirements

## R001 — Whisper Transcription Pipeline

**Status:** active

**Description:** Desktop Python script that accepts video files (MP4/MKV), extracts audio via ffmpeg, runs Whisper large-v3 on an RTX 4090, and outputs timestamped transcript JSON with segment-level timestamps and word-level timing. Must be resumable.

**Validation:** Script processes a sample video and produces valid JSON with timestamped segments.

**Primary Owner:** M001/S01
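To make R001's output concrete, here is a hypothetical sketch of the transcript JSON shape. The top-level fields (source_file, creator_folder, duration, segments with words) come from the task descriptions; the exact per-word field names are assumptions, not the spec.

```python
import json

# Illustrative transcript document; field names beyond source_file,
# creator_folder, duration, and segments are assumptions.
transcript = {
    "source_file": "creator_folder/session01.mp4",
    "creator_folder": "creator_folder",
    "duration": 212.4,
    "segments": [
        {
            "start": 12.0,
            "end": 17.5,
            "text": "Pull the low mids down before you compress.",
            "words": [
                {"word": "Pull", "start": 12.0, "end": 12.3},
                {"word": "the", "start": 12.3, "end": 12.4},
            ],
        }
    ],
}

# The ingestion API (R002) would receive exactly this serialized form.
encoded = json.dumps(transcript, indent=2)
print(len(json.loads(encoded)["segments"]))  # → 1
```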

## R002 — Transcript Ingestion API

**Status:** active

**Description:** FastAPI endpoint that accepts transcript JSON uploads, creates/updates Creator and Source Video records, and stores transcript data in PostgreSQL. Handles new creator detection from folder names.

**Validation:** POST transcript JSON → 200 OK, records created in DB, file stored on filesystem.

**Primary Owner:** M001/S02

## R003 — LLM-Powered Extraction Pipeline (Stages 2-5)

**Status:** active

**Description:** Background worker pipeline: transcript segmentation → key moment extraction → classification/tagging → technique page synthesis. Uses an OpenAI-compatible API with primary (DGX Sparks Qwen) and fallback (local Ollama) endpoints. Pipeline must be resumable per-video, per-stage.

**Validation:** End-to-end: transcript JSON in → technique pages with key moments, tags, and cross-references out.

**Primary Owner:** M001/S03
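The primary/fallback endpoint strategy in R003 can be sketched as a small wrapper. The endpoint callables here are stand-ins (R003 only specifies an OpenAI-compatible primary with a local Ollama fallback); real code would make the HTTP calls inside them.

```python
from typing import Callable

def complete_with_fallback(prompt: str,
                           primary: Callable[[str], str],
                           fallback: Callable[[str], str]) -> str:
    """Try the primary endpoint; on any failure, retry on the fallback."""
    try:
        return primary(prompt)
    except Exception:
        return fallback(prompt)

def flaky_primary(prompt: str) -> str:
    raise ConnectionError("DGX endpoint unreachable")  # simulate an outage

def local_ollama(prompt: str) -> str:
    return f"[ollama] {prompt}"  # stand-in for the real HTTP call

print(complete_with_fallback("segment this transcript", flaky_primary, local_ollama))
# → [ollama] segment this transcript
```

Keeping the endpoints injectable like this also makes the per-stage retry logic trivially testable without a GPU box on the network.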

## R004 — Review Queue UI

**Status:** active

**Description:** Admin interface for reviewing extracted key moments: approve, edit+approve, split, merge, reject. Organized by source video for contextual review. Includes mode toggle (review vs auto-publish).

**Validation:** Admin can review, edit, and approve/reject moments; mode toggle controls whether new moments require review.

**Primary Owner:** M001/S04

## R005 — Search-First Web UI

**Status:** active

**Description:** Landing page with prominent search bar, live typeahead (results after 2-3 chars), scope toggle (All/Topics/Creators), and two navigation cards (Topics, Creators). Recently added section. Search powered by Qdrant semantic search with keyword fallback.

**Validation:** User types query → results appear within 500ms, grouped by type, with clickable navigation.

**Primary Owner:** M001/S05

## R006 — Technique Page Display

**Status:** active

**Description:** Core content unit: header (tags, title, creator, meta), study guide prose (organized by sub-aspects with signal chain blocks and quotes), key moments index (timestamped list), related techniques, plugins referenced. Amber banner for livestream-sourced content.

**Validation:** Technique page renders with all sections populated from synthesized data.

**Primary Owner:** M001/S05

## R007 — Creators Browse Page

**Status:** active

**Description:** Filterable creator list with genre filter pills, type-to-narrow, sort options (randomized default, alphabetical, view count). Each row: name, genre tags, technique count, video count, view count. Links to creator detail page.

**Validation:** Page loads with randomized order, genre filtering works, clicking a row navigates to creator detail.

**Primary Owner:** M001/S05

## R008 — Topics Browse Page

**Status:** active

**Description:** Two-level topic hierarchy (6 top-level categories → sub-topics). Filter input, genre filter pills. Each sub-topic shows technique count and creator count. Clicking a sub-topic shows technique pages.

**Validation:** Hierarchy renders, filtering works, sub-topic links show correct technique pages.

**Primary Owner:** M001/S05

## R009 — Qdrant Vector Search Integration

**Status:** active

**Description:** Embed key moment summaries, technique page content, and transcript segments in Qdrant using a configurable embedding model (nomic-embed-text default). Power semantic search with metadata filtering.

**Validation:** Semantic search returns relevant results for natural language queries; embeddings update when content changes.

**Primary Owner:** M001/S03
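The R009 flow (upsert vectors with payloads, then cosine search with a metadata filter) can be sketched without the Qdrant dependency. This is a minimal pure-Python stand-in: the vectors are hand-made, and real code would use qdrant-client with embeddings from the configured model.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

points = [  # (id, vector, payload) — what we'd upsert into a collection
    (1, [0.9, 0.1, 0.0], {"creator": "a", "kind": "key_moment"}),
    (2, [0.1, 0.9, 0.0], {"creator": "b", "kind": "key_moment"}),
    (3, [0.8, 0.2, 0.1], {"creator": "a", "kind": "segment"}),
]

def search(query, limit=2, must=None):
    """Cosine search with an optional payload filter, like Qdrant's must clauses."""
    candidates = [p for p in points
                  if not must or all(p[2].get(k) == v for k, v in must.items())]
    candidates.sort(key=lambda p: cosine(query, p[1]), reverse=True)
    return [p[0] for p in candidates[:limit]]

print(search([1.0, 0.0, 0.0], must={"creator": "a"}))  # → [1, 3]
```

The payload filter is what lets the UI's scope toggle (All/Topics/Creators) narrow results server-side instead of post-filtering.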

## R010 — Docker Compose Deployment

**Status:** active

**Description:** Single docker-compose.yml packaging API, web UI, PostgreSQL, and worker services. Follows XPLTD conventions: bind mounts at /vmPool/r/services/, compose at /vmPool/r/compose/chrysopedia/, xpltd_chrysopedia project name, dedicated Docker network.

**Validation:** `docker compose up -d` brings up all services; data persists across restarts.

**Primary Owner:** M001/S01

## R011 — Canonical Tag System

**Status:** active

**Description:** Editable canonical tag list (config file) with aliases. Pipeline references tags during classification. New tags can be proposed by the LLM and queued for admin approval or auto-added within existing categories.

**Validation:** Tag list is editable; pipeline uses canonical tags consistently; alias normalization works.

**Primary Owner:** M001/S03

## R012 — Incremental Content Addition

**Status:** active

**Description:** System handles ongoing content: new videos processed through the pipeline, new creators auto-detected, existing technique pages updated when new moments are added for the same creator+topic.

**Validation:** Adding a new video for an existing creator updates their technique pages; a new creator folder creates a new Creator record.

**Primary Owner:** M001/S03

## R013 — Prompt Template System

**Status:** active

**Description:** Extraction prompts (stages 2-5) stored as editable configuration files, not hardcoded. Admin can edit prompts and re-run extraction on specific or all videos for calibration.

**Validation:** Prompt files are editable; re-processing a video with updated prompts produces different output.

**Primary Owner:** M001/S03

## R014 — Creator Equity

**Status:** active

**Description:** No creator is privileged in the UI. Default sort on the Creators page is randomized on every page load. All creators get equal visual weight.

**Validation:** Refreshing the Creators page shows a different order each time; no creator gets larger/bolder display.

**Primary Owner:** M001/S05

## R015 — 30-Second Retrieval Target

**Status:** active

**Description:** A producer mid-session can find a specific technique in under 30 seconds from Alt+Tab to reading the key insight.

**Validation:** Timed test: Alt+Tab → search → read technique → under 30 seconds.

**Primary Owner:** M001/S05
18  .gsd/STATE.md  Normal file
@@ -0,0 +1,18 @@
# GSD State

**Active Milestone:** M001: Chrysopedia Foundation — Infrastructure, Pipeline Core, and Skeleton UI

**Active Slice:** S01: Docker Compose + Database + Whisper Script

**Phase:** evaluating-gates

**Requirements Status:** 0 active · 0 validated · 0 deferred · 0 out of scope

## Milestone Registry

- 🔄 **M001:** Chrysopedia Foundation — Infrastructure, Pipeline Core, and Skeleton UI

## Recent Decisions

- None recorded

## Blockers

- None

## Next Action

Evaluate 3 quality gate(s) for S01 before execution.
13  .gsd/milestones/M001/M001-ROADMAP.md  Normal file
@@ -0,0 +1,13 @@
# M001: Chrysopedia Foundation — Infrastructure, Pipeline Core, and Skeleton UI

## Vision

Stand up the complete Chrysopedia stack: Docker Compose deployment on ub01, PostgreSQL data model, FastAPI backend with transcript ingestion, Whisper transcription script for the desktop, LLM extraction pipeline (stages 2-5), review queue, Qdrant integration, and the search-first web UI with technique pages, creators, and topics browsing. By the end, a video file can be transcribed → ingested → extracted → reviewed → searched and read in the web UI.

## Slice Overview

| ID | Slice | Risk | Depends | Done | After this |
|----|-------|------|---------|------|------------|
| S01 | Docker Compose + Database + Whisper Script | low | — | ⬜ | docker compose up -d starts all services on ub01; Whisper script transcribes a sample video to JSON |
| S02 | Transcript Ingestion API | low | S01 | ⬜ | POST a transcript JSON file to the API; Creator and Source Video records appear in PostgreSQL |
| S03 | LLM Extraction Pipeline + Qdrant Integration | high | S02 | ⬜ | A transcript JSON triggers stages 2-5: segmentation → extraction → classification → synthesis. Technique pages with key moments appear in DB. Qdrant has searchable embeddings. |
| S04 | Review Queue Admin UI | medium | S03 | ⬜ | Admin views pending key moments, approves/edits/rejects them, toggles between review and auto mode |
| S05 | Search-First Web UI | medium | S03 | ⬜ | User searches for a technique, gets semantic results in <500ms, clicks through to a full technique page with study guide prose, key moments, and related links |
81  .gsd/milestones/M001/slices/S01/S01-PLAN.md  Normal file
@@ -0,0 +1,81 @@
# S01: Docker Compose + Database + Whisper Script

**Goal:** Deployable infrastructure: Docker Compose project with PostgreSQL (full schema), FastAPI skeleton, and desktop Whisper transcription script

**Demo:** After this: docker compose up -d starts all services on ub01; Whisper script transcribes a sample video to JSON

## Tasks

- [ ] **T01: Project scaffolding and Docker Compose**
  1. Create project directory structure:
     - backend/ (FastAPI app)
     - frontend/ (React app, placeholder)
     - whisper/ (desktop transcription script)
     - docker/ (Dockerfiles)
     - prompts/ (editable prompt templates)
     - config/ (canonical tags, settings)
  2. Write docker-compose.yml with services:
     - chrysopedia-api (FastAPI, Uvicorn)
     - chrysopedia-web (React, nginx)
     - chrysopedia-db (PostgreSQL 16)
     - chrysopedia-worker (Celery)
     - chrysopedia-redis (Redis for Celery broker)
  3. Follow XPLTD conventions: bind mounts, project naming xpltd_chrysopedia, dedicated bridge network
  4. Create .env.example with all required env vars
  5. Write Dockerfiles for API and web services
  - Estimate: 2-3 hours
  - Files: docker-compose.yml, .env.example, docker/Dockerfile.api, docker/Dockerfile.web, backend/main.py, backend/requirements.txt
  - Verify: docker compose config validates without errors
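The compose layout T01 describes could start like the partial sketch below. Images, the web/worker/redis services, and the subnet choice are omitted; the top-level `name:` key sets the compose project name per D001, and the bind-mount path follows the `/vmPool/r/services/chrysopedia_*` convention. Treat this as a hedged starting point, not the final file.

```yaml
name: xpltd_chrysopedia

services:
  chrysopedia-db:
    image: postgres:16
    env_file: .env
    volumes:
      - /vmPool/r/services/chrysopedia_db:/var/lib/postgresql/data
    networks: [chrysopedia]

  chrysopedia-api:
    build:
      context: ..
      dockerfile: docker/Dockerfile.api
    depends_on: [chrysopedia-db]
    env_file: .env
    networks: [chrysopedia]

networks:
  chrysopedia:
    driver: bridge
```

`docker compose config` validates a file like this in place, which is exactly the T01 verification step.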
- [ ] **T02: PostgreSQL schema and migrations**
  1. Create SQLAlchemy models for all 7 entities:
     - Creator (id, name, slug, genres, folder_name, view_count, timestamps)
     - SourceVideo (id, creator_id FK, filename, file_path, duration, content_type enum, transcript_path, processing_status enum, timestamps)
     - TranscriptSegment (id, source_video_id FK, start_time, end_time, text, segment_index, topic_label)
     - KeyMoment (id, source_video_id FK, technique_page_id FK nullable, title, summary, start/end time, content_type enum, plugins, review_status enum, raw_transcript, timestamps)
     - TechniquePage (id, creator_id FK, title, slug, topic_category, topic_tags, summary, body_sections JSONB, signal_chains JSONB, plugins, source_quality enum, view_count, review_status enum, timestamps)
     - RelatedTechniqueLink (id, source_page_id FK, target_page_id FK, relationship enum)
     - Tag (id, name, category, aliases)
  2. Set up Alembic for migrations
  3. Create initial migration
  4. Add seed data for canonical tags (6 top-level categories)
  - Estimate: 2-3 hours
  - Files: backend/models.py, backend/database.py, alembic.ini, alembic/versions/*.py, config/canonical_tags.yaml
  - Verify: alembic upgrade head succeeds; all 7 tables exist with correct columns and constraints
- [ ] **T03: FastAPI application skeleton with health checks**
  1. Set up FastAPI app with:
     - CORS middleware
     - Database session dependency
     - Health check endpoint (/health)
     - API versioning prefix (/api/v1)
  2. Create Pydantic schemas for all entities
  3. Implement basic CRUD endpoints:
     - GET /api/v1/creators
     - GET /api/v1/creators/{slug}
     - GET /api/v1/videos
     - GET /api/v1/health
  4. Add structured logging
  5. Configure environment variable loading from .env
  - Estimate: 1-2 hours
  - Files: backend/main.py, backend/schemas.py, backend/routers/__init__.py, backend/routers/health.py, backend/routers/creators.py, backend/config.py
  - Verify: curl http://localhost:8000/health returns 200; curl http://localhost:8000/api/v1/creators returns empty list
- [ ] **T04: Whisper transcription script**
  1. Create Python script whisper/transcribe.py that:
     - Accepts a video file path (or directory for batch mode)
     - Extracts audio via ffmpeg (subprocess)
     - Runs Whisper large-v3 with segment-level and word-level timestamps
     - Outputs JSON matching the spec format (source_file, creator_folder, duration, segments with words)
     - Supports resumability: checks if the output JSON already exists, skips
  2. Create whisper/requirements.txt (openai-whisper, ffmpeg-python)
  3. Write output to a configurable output directory
  4. Add CLI arguments: --input, --output-dir, --model (default large-v3), --device (default cuda)
  5. Include progress logging for long transcriptions
  - Estimate: 1-2 hours
  - Files: whisper/transcribe.py, whisper/requirements.txt, whisper/README.md
  - Verify: python whisper/transcribe.py --help shows usage; script validates ffmpeg is available
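The T04 CLI surface can be sketched as below. Flag names and defaults come straight from the task list; the actual transcription call is stubbed out so the argument handling and the ffmpeg availability check stand alone.

```python
import argparse
import shutil

def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Transcribe videos with Whisper")
    p.add_argument("--input", required=True, help="video file or directory (batch mode)")
    p.add_argument("--output-dir", default="transcripts")
    p.add_argument("--model", default="large-v3")
    p.add_argument("--device", default="cuda")
    return p

def check_ffmpeg() -> bool:
    """T04 requires validating that ffmpeg is on PATH before processing."""
    return shutil.which("ffmpeg") is not None

# Parse a sample invocation; defaults fill in the unspecified flags.
args = build_parser().parse_args(["--input", "video.mp4"])
print(args.model, args.device)  # → large-v3 cuda
```

Resumability then reduces to checking whether the output JSON for a given input already exists in `args.output_dir` before doing any work.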
- [ ] **T05: Integration verification and documentation**
  1. Write README.md with:
     - Project overview
     - Architecture diagram (text)
     - Setup instructions (Docker Compose + desktop Whisper)
     - Environment variable documentation
     - Development workflow
  2. Verify the Docker Compose stack starts with: docker compose up -d
  3. Verify the PostgreSQL schema with: alembic upgrade head
  4. Verify the API health check responds
  5. Create a sample transcript JSON for testing subsequent slices
  - Estimate: 1 hour
  - Files: README.md, tests/fixtures/sample_transcript.json
  - Verify: docker compose config validates; README covers all setup steps; sample transcript JSON is valid
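"Sample transcript JSON is valid" can be made checkable with a small validator. The required keys mirror the fields named in T04's output description; anything stricter than key presence is an assumption.

```python
import json

REQUIRED_TOP = {"source_file", "creator_folder", "duration", "segments"}

def transcript_is_valid(raw: str) -> bool:
    """True if raw parses as JSON with the top-level and per-segment keys
    the spec format names; a hedged sketch, not the full spec."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not REQUIRED_TOP <= doc.keys():
        return False
    return all({"start", "end", "text"} <= seg.keys() for seg in doc["segments"])

sample = json.dumps({
    "source_file": "demo.mp4",
    "creator_folder": "demo_creator",
    "duration": 4.2,
    "segments": [{"start": 0.0, "end": 4.2, "text": "hello", "words": []}],
})
print(transcript_is_valid(sample))  # → True
```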
40  .gsd/milestones/M001/slices/S01/tasks/T01-PLAN.md  Normal file
@@ -0,0 +1,40 @@
---
estimated_steps: 16
estimated_files: 6
skills_used: []
---

# T01: Project scaffolding and Docker Compose

1. Create project directory structure:
   - backend/ (FastAPI app)
   - frontend/ (React app, placeholder)
   - whisper/ (desktop transcription script)
   - docker/ (Dockerfiles)
   - prompts/ (editable prompt templates)
   - config/ (canonical tags, settings)
2. Write docker-compose.yml with services:
   - chrysopedia-api (FastAPI, Uvicorn)
   - chrysopedia-web (React, nginx)
   - chrysopedia-db (PostgreSQL 16)
   - chrysopedia-worker (Celery)
   - chrysopedia-redis (Redis for Celery broker)
3. Follow XPLTD conventions: bind mounts, project naming xpltd_chrysopedia, dedicated bridge network
4. Create .env.example with all required env vars
5. Write Dockerfiles for API and web services

## Inputs

- `chrysopedia-spec.md`
- `XPLTD lore conventions`

## Expected Output

- `docker-compose.yml`
- `.env.example`
- `docker/Dockerfile.api`
- `backend/main.py`

## Verification

docker compose config validates without errors
34  .gsd/milestones/M001/slices/S01/tasks/T02-PLAN.md  Normal file
@@ -0,0 +1,34 @@
---
estimated_steps: 11
estimated_files: 5
skills_used: []
---

# T02: PostgreSQL schema and migrations

1. Create SQLAlchemy models for all 7 entities:
   - Creator (id, name, slug, genres, folder_name, view_count, timestamps)
   - SourceVideo (id, creator_id FK, filename, file_path, duration, content_type enum, transcript_path, processing_status enum, timestamps)
   - TranscriptSegment (id, source_video_id FK, start_time, end_time, text, segment_index, topic_label)
   - KeyMoment (id, source_video_id FK, technique_page_id FK nullable, title, summary, start/end time, content_type enum, plugins, review_status enum, raw_transcript, timestamps)
   - TechniquePage (id, creator_id FK, title, slug, topic_category, topic_tags, summary, body_sections JSONB, signal_chains JSONB, plugins, source_quality enum, view_count, review_status enum, timestamps)
   - RelatedTechniqueLink (id, source_page_id FK, target_page_id FK, relationship enum)
   - Tag (id, name, category, aliases)
2. Set up Alembic for migrations
3. Create initial migration
4. Add seed data for canonical tags (6 top-level categories)

## Inputs

- `chrysopedia-spec.md section 6 (Data Model)`

## Expected Output

- `backend/models.py`
- `backend/database.py`
- `alembic/versions/001_initial.py`
- `config/canonical_tags.yaml`

## Verification

alembic upgrade head succeeds; all 7 tables exist with correct columns and constraints
37  .gsd/milestones/M001/slices/S01/tasks/T03-PLAN.md  Normal file
@@ -0,0 +1,37 @@
---
estimated_steps: 13
estimated_files: 6
skills_used: []
---

# T03: FastAPI application skeleton with health checks

1. Set up FastAPI app with:
   - CORS middleware
   - Database session dependency
   - Health check endpoint (/health)
   - API versioning prefix (/api/v1)
2. Create Pydantic schemas for all entities
3. Implement basic CRUD endpoints:
   - GET /api/v1/creators
   - GET /api/v1/creators/{slug}
   - GET /api/v1/videos
   - GET /api/v1/health
4. Add structured logging
5. Configure environment variable loading from .env

## Inputs

- `backend/models.py`
- `backend/database.py`

## Expected Output

- `backend/main.py`
- `backend/schemas.py`
- `backend/routers/creators.py`
- `backend/config.py`

## Verification

curl http://localhost:8000/health returns 200; curl http://localhost:8000/api/v1/creators returns empty list
32  .gsd/milestones/M001/slices/S01/tasks/T04-PLAN.md  Normal file
@@ -0,0 +1,32 @@
---
estimated_steps: 10
estimated_files: 3
skills_used: []
---

# T04: Whisper transcription script

1. Create Python script whisper/transcribe.py that:
   - Accepts a video file path (or directory for batch mode)
   - Extracts audio via ffmpeg (subprocess)
   - Runs Whisper large-v3 with segment-level and word-level timestamps
   - Outputs JSON matching the spec format (source_file, creator_folder, duration, segments with words)
   - Supports resumability: checks if the output JSON already exists, skips
2. Create whisper/requirements.txt (openai-whisper, ffmpeg-python)
3. Write output to a configurable output directory
4. Add CLI arguments: --input, --output-dir, --model (default large-v3), --device (default cuda)
5. Include progress logging for long transcriptions

## Inputs

- `chrysopedia-spec.md section 7.2 Stage 1`

## Expected Output

- `whisper/transcribe.py`
- `whisper/requirements.txt`
- `whisper/README.md`

## Verification

python whisper/transcribe.py --help shows usage; script validates ffmpeg is available
31  .gsd/milestones/M001/slices/S01/tasks/T05-PLAN.md  Normal file
@@ -0,0 +1,31 @@
---
estimated_steps: 10
estimated_files: 2
skills_used: []
---

# T05: Integration verification and documentation

1. Write README.md with:
   - Project overview
   - Architecture diagram (text)
   - Setup instructions (Docker Compose + desktop Whisper)
   - Environment variable documentation
   - Development workflow
2. Verify the Docker Compose stack starts with: docker compose up -d
3. Verify the PostgreSQL schema with: alembic upgrade head
4. Verify the API health check responds
5. Create a sample transcript JSON for testing subsequent slices

## Inputs

- `All T01-T04 outputs`

## Expected Output

- `README.md`
- `tests/fixtures/sample_transcript.json`

## Verification

docker compose config validates; README covers all setup steps; sample transcript JSON is valid
6  .gsd/milestones/M001/slices/S02/S02-PLAN.md  Normal file
@@ -0,0 +1,6 @@
# S02: Transcript Ingestion API

**Goal:** FastAPI endpoints for transcript upload, creator management, and source video tracking

**Demo:** After this: POST a transcript JSON file to the API; Creator and Source Video records appear in PostgreSQL

## Tasks
6  .gsd/milestones/M001/slices/S03/S03-PLAN.md  Normal file
@@ -0,0 +1,6 @@
# S03: LLM Extraction Pipeline + Qdrant Integration

**Goal:** Complete LLM pipeline with editable prompt templates, canonical tag system, Qdrant embedding, and resumable processing

**Demo:** After this: A transcript JSON triggers stages 2-5: segmentation → extraction → classification → synthesis. Technique pages with key moments appear in DB. Qdrant has searchable embeddings.

## Tasks
6  .gsd/milestones/M001/slices/S04/S04-PLAN.md  Normal file
@@ -0,0 +1,6 @@
# S04: Review Queue Admin UI

**Goal:** Functional review workflow for calibrating extraction quality

**Demo:** After this: Admin views pending key moments, approves/edits/rejects them, toggles between review and auto mode

## Tasks
6  .gsd/milestones/M001/slices/S05/S05-PLAN.md  Normal file
@@ -0,0 +1,6 @@
# S05: Search-First Web UI

**Goal:** Complete public-facing UI: landing page, live search, technique pages, creators browse, topics browse

**Demo:** After this: User searches for a technique, gets semantic results in <500ms, clicks through to a full technique page with study guide prose, key moments, and related links

## Tasks