mirror of
https://github.com/xpltdco/media-rip.git
synced 2026-04-03 02:53:58 -06:00
chore(M001/S01/T02): auto-commit after execute-task
This commit is contained in:
parent 9c37dbfa27, commit a850b36d49
9 changed files with 830 additions and 0 deletions
1	.bg-shell/manifest.json (new file)
@@ -0,0 +1 @@
[]
111	.gsd/milestones/M001/slices/S01/S01-PLAN.md (new file)
@@ -0,0 +1,111 @@

# S01: Foundation + Download Engine
**Goal:** Deliver the backend foundation: project scaffold, SQLite database with WAL mode, config system (defaults → YAML → env vars), Pydantic models, SSE broker data structure, yt-dlp download service with sync-to-async progress bridging, and API routes for submitting downloads and probing formats.
**Demo:** `POST /api/downloads` with a URL → yt-dlp downloads it to `/downloads` with progress events arriving in an `asyncio.Queue` via `call_soon_threadsafe`. `GET /api/formats?url=` returns available qualities. Config loads from YAML + env vars. SQLite with WAL mode stores jobs. Proven via pytest running API tests and a real yt-dlp download.
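The sync-to-async bridge named in the demo can be sketched in a few lines. This is a minimal stand-in for the real `SSEBroker` (the class and method names come from this plan; the dict-shaped event and the single-session setup are simplifications), showing a worker thread delivering an event into an `asyncio.Queue` via `call_soon_threadsafe`:

```python
import asyncio
import threading

class SSEBroker:
    """Minimal per-session broker; publish() is safe to call from worker threads."""
    def __init__(self, loop: asyncio.AbstractEventLoop) -> None:
        self.loop = loop  # captured at startup, on the event-loop thread
        self.queues: dict[str, list[asyncio.Queue]] = {}

    def subscribe(self, session_id: str) -> asyncio.Queue:
        q: asyncio.Queue = asyncio.Queue()
        self.queues.setdefault(session_id, []).append(q)
        return q

    def unsubscribe(self, session_id: str, q: asyncio.Queue) -> None:
        self.queues.get(session_id, []).remove(q)

    def publish(self, session_id: str, event: dict) -> None:
        # Called from a yt-dlp worker thread: hop onto the loop thread first.
        for q in self.queues.get(session_id, []):
            self.loop.call_soon_threadsafe(q.put_nowait, event)

async def main() -> dict:
    broker = SSEBroker(asyncio.get_running_loop())
    q = broker.subscribe("s1")
    # Simulate a progress hook firing on a worker thread.
    t = threading.Thread(target=broker.publish,
                         args=("s1", {"status": "downloading", "percent": 42.0}))
    t.start()
    t.join()
    return await asyncio.wait_for(q.get(), timeout=1)

event = asyncio.run(main())
print(event["status"])  # downloading
```

The real service wires the same `publish` call into yt-dlp's `progress_hooks`.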
## Must-Haves

- Project scaffold with `pyproject.toml`, pinned dependencies, and `backend/app/` package structure matching the boundary map
- Pydantic models: `Job`, `JobStatus`, `JobCreate`, `ProgressEvent` (with `from_yt_dlp` normalizer handling `total_bytes: None`), `Session`, `FormatInfo`
- Config via `pydantic-settings[yaml]`: `AppConfig` with env prefix `MEDIARIP`, nested delimiter `__`, YAML source, zero-config defaults
- SQLite database via `aiosqlite`: WAL mode + `busy_timeout=5000` + `synchronous=NORMAL` as first PRAGMAs, schema for `sessions`/`jobs`/`config`/`unsupported_urls` tables, async CRUD functions
- `SSEBroker`: per-session queue map with `subscribe`/`unsubscribe`/`publish`, thread-safe via `call_soon_threadsafe`
- `DownloadService`: `ThreadPoolExecutor`, fresh `YoutubeDL` per job, progress hook → broker publish, `enqueue()` and `get_formats()` methods
- Output template resolver: per-domain template lookup with fallback to `*` default
- `POST /api/downloads`, `GET /api/downloads`, `DELETE /api/downloads/{id}`, `GET /api/formats?url=`
- Stub session ID dependency (reads `X-Session-ID` header, falls back to default UUID) replaceable by S02 middleware
- Real yt-dlp integration test proving progress events flow through the sync-to-async bridge
## Proof Level

- This slice proves: integration (sync-to-async bridge, DB concurrency, full API vertical)
- Real runtime required: yes (yt-dlp must download a real file)
- Human/UAT required: no
## Verification
All tests run from `backend/`:

- `cd backend && python -m pytest tests/test_models.py -v` — model construction, `ProgressEvent.from_yt_dlp` normalization, edge cases
- `cd backend && python -m pytest tests/test_config.py -v` — env var override, YAML loading, zero-config defaults
- `cd backend && python -m pytest tests/test_database.py -v` — CRUD, WAL mode verification, concurrent writes
- `cd backend && python -m pytest tests/test_sse_broker.py -v` — subscribe/unsubscribe, thread-safe publish
- `cd backend && python -m pytest tests/test_download_service.py -v` — real yt-dlp download with progress events, format extraction
- `cd backend && python -m pytest tests/test_api.py -v` — all four API endpoints via httpx AsyncClient
- `cd backend && python -m pytest tests/ -v` — full suite green, 0 failures
- Verify `PRAGMA journal_mode` returns `wal` in database test
- Verify progress events contain `status=downloading` with valid percent values in download service test
## Observability / Diagnostics
- Runtime signals: `logging.getLogger("mediarip")` structured logs on job state transitions (queued → extracting → downloading → completed/failed), download errors logged with job_id + exception
- Inspection surfaces: `jobs` table in SQLite with `status`, `error_message`, `progress_percent` columns; `PRAGMA journal_mode` query to verify WAL
- Failure visibility: `Job.error_message` stores failure reason, `Job.status = "failed"` on any download error, `ProgressEvent` includes `status` field for real-time failure detection
- Redaction constraints: none in S01 (admin credentials are S04)
## Integration Closure
- Upstream surfaces consumed: none (first slice)
- New wiring introduced: FastAPI app factory with lifespan (DB init/close), router mounting, dependency injection for DownloadService/SSEBroker/database
- What remains before the milestone is truly usable end-to-end: S02 (SSE transport + real session middleware), S03 (frontend SPA), S04 (admin auth), S05 (themes), S06 (Docker + CI/CD)
## Tasks
- [x] **T01: Scaffold project and define Pydantic models** `est:45m`
- Why: Greenfield project — no code exists. Every subsequent task imports from the models and depends on the package structure. The boundary map contract (`app/core/`, `app/services/`, `app/routers/`, `app/models/`) must be established first.
- Files: `backend/pyproject.toml`, `backend/app/__init__.py`, `backend/app/main.py`, `backend/app/models/__init__.py`, `backend/app/models/job.py`, `backend/app/models/session.py`, `backend/tests/test_models.py`
- Do: Create `backend/pyproject.toml` with all pinned deps from research. Create directory structure with `__init__.py` files for `app/`, `app/core/`, `app/services/`, `app/routers/`, `app/models/`, `app/middleware/`. Write `JobStatus` enum, `JobCreate`, `Job`, `ProgressEvent` (with `from_yt_dlp` classmethod), `FormatInfo`, `Session` models. Write `app/main.py` skeleton (empty FastAPI app, placeholder lifespan). Write model unit tests covering ProgressEvent normalization with `total_bytes: None`, `total_bytes_estimate` fallback, and all status values.
- Verify: `cd backend && pip install -e ".[dev]" && python -m pytest tests/test_models.py -v`
- Done when: `pip install -e ".[dev]"` succeeds, all model tests pass, `from app.models.job import Job, JobStatus, ProgressEvent, JobCreate, FormatInfo` works
- [ ] **T02: Build config system, database layer, and SSE broker** `est:1h`
- Why: These three infrastructure modules are the foundation everything else depends on. Config provides settings to database and download service. Database stores all job state. SSE broker is the thread-safe event distribution mechanism. All three are pure infrastructure with well-defined interfaces.
- Files: `backend/app/core/config.py`, `backend/app/core/database.py`, `backend/app/core/sse_broker.py`, `backend/tests/conftest.py`, `backend/tests/test_config.py`, `backend/tests/test_database.py`, `backend/tests/test_sse_broker.py`
- Do: Build `AppConfig` via pydantic-settings with env prefix `MEDIARIP`, nested delimiter `__`, YAML source (handle missing file gracefully), and `settings_customise_sources` for priority ordering. Build database module with aiosqlite: singleton connection pattern for lifespan, WAL + busy_timeout + synchronous PRAGMAs first, schema creation (sessions, jobs, config, unsupported_urls tables with indexes), async CRUD functions. Build SSEBroker with per-session queue map, subscribe/unsubscribe, and `publish` using `loop.call_soon_threadsafe`. Create `conftest.py` with shared fixtures (temp DB, test config). Write tests: config env override + YAML + zero-config defaults; DB CRUD + WAL verification + concurrent write test; broker subscribe/publish-from-thread/unsubscribe.
- Verify: `cd backend && python -m pytest tests/test_config.py tests/test_database.py tests/test_sse_broker.py -v`
- Done when: All three test files pass. `PRAGMA journal_mode` returns `wal`. Concurrent writes (3 simultaneous) complete without `SQLITE_BUSY`. Broker publish from a thread delivers event to subscriber queue.
- [ ] **T03: Implement download service with sync-to-async bridge** `est:1h`
- Why: This is the highest-risk component in the slice — the sync-to-async bridge between yt-dlp worker threads and asyncio queues. It must be built and proven separately before API routes wire it up. The output template resolver is a direct dependency. This task retires the primary risk identified in the roadmap: "proving yt-dlp progress events arrive in an asyncio.Queue via call_soon_threadsafe."
- Files: `backend/app/services/download.py`, `backend/app/services/output_template.py`, `backend/app/services/__init__.py`, `backend/tests/test_download_service.py`, `backend/tests/test_output_template.py`
- Do: Build `resolve_template(url, user_override, config)` — extract domain, lookup in `source_templates` config map, fallback to `*`. Build `DownloadService` class: accepts config, database, SSE broker, event loop in constructor. `ThreadPoolExecutor(max_workers=config.downloads.max_concurrent)`. `enqueue(job_create, session_id)` creates DB row then submits `_run_download` to executor. `_run_download` creates fresh `YoutubeDL` per job (never shared), registers progress hook that calls `loop.call_soon_threadsafe(broker.publish, session_id, ProgressEvent.from_yt_dlp(...))`, updates DB on completion/failure. `get_formats(url)` runs `extract_info(url, download=False)` in executor, returns list of `FormatInfo`. `cancel(job_id)` sets status=failed in DB. Handle `total_bytes: None` in progress hook. Throttle DB progress writes (≥1% change or status change). Write integration test: real yt-dlp download of a short Creative Commons video, assert progress events arrive in broker queue with `status=downloading` and valid percent. Write format extraction test. Write output template unit tests.
- Verify: `cd backend && python -m pytest tests/test_download_service.py tests/test_output_template.py -v`
- Done when: Real download test passes — file appears in output dir AND progress events with `status=downloading` were received in the broker queue. Format extraction returns non-empty list with `format_id` and `ext` fields. Output template resolves domain-specific and fallback templates correctly.
- [ ] **T04: Wire API routes and FastAPI app factory** `est:45m`
- Why: The API routes are the HTTP surface that S02 and S03 consume. The app factory lifespan wires database init/close and service construction. The stub session dependency provides `session_id` for testing until S02 delivers real middleware. This task proves the full vertical: HTTP request → router → service → yt-dlp → DB + SSE broker.
- Files: `backend/app/main.py`, `backend/app/routers/downloads.py`, `backend/app/routers/formats.py`, `backend/app/routers/__init__.py`, `backend/app/dependencies.py`, `backend/tests/test_api.py`, `backend/tests/conftest.py`
- Do: Create `app/dependencies.py` with stub `get_session_id` dependency (reads `X-Session-ID` header, falls back to a default UUID — clearly documented as S02-replaceable). Update `app/main.py` lifespan: init aiosqlite connection with WAL PRAGMAs, create schema, instantiate AppConfig + SSEBroker + DownloadService, store on `app.state`, close DB on shutdown. Mount download and format routers under `/api`. Build `POST /api/downloads` (accepts `JobCreate` body + session_id dep, delegates to `DownloadService.enqueue`, returns `Job`), `GET /api/downloads` (returns jobs for session from DB), `DELETE /api/downloads/{id}` (cancels job), `GET /api/formats?url=` (delegates to `DownloadService.get_formats`). Write API tests via `httpx.AsyncClient` + `ASGITransport`: POST valid URL → 200 + Job JSON, GET downloads → list, DELETE → 200, GET formats → format list, POST invalid URL → error response.
- Verify: `cd backend && python -m pytest tests/test_api.py -v && python -m pytest tests/ -v`
- Done when: All four API endpoints return correct responses. Full test suite (`python -m pytest tests/ -v`) passes with 0 failures. The app starts via lifespan without errors.
## Files Likely Touched

- `backend/pyproject.toml`
- `backend/app/__init__.py`
- `backend/app/main.py`
- `backend/app/models/__init__.py`
- `backend/app/models/job.py`
- `backend/app/models/session.py`
- `backend/app/core/__init__.py`
- `backend/app/core/config.py`
- `backend/app/core/database.py`
- `backend/app/core/sse_broker.py`
- `backend/app/services/__init__.py`
- `backend/app/services/download.py`
- `backend/app/services/output_template.py`
- `backend/app/routers/__init__.py`
- `backend/app/routers/downloads.py`
- `backend/app/routers/formats.py`
- `backend/app/dependencies.py`
- `backend/app/middleware/__init__.py`
- `backend/tests/__init__.py`
- `backend/tests/conftest.py`
- `backend/tests/test_models.py`
- `backend/tests/test_config.py`
- `backend/tests/test_database.py`
- `backend/tests/test_sse_broker.py`
- `backend/tests/test_download_service.py`
- `backend/tests/test_output_template.py`
- `backend/tests/test_api.py`

157	.gsd/milestones/M001/slices/S01/S01-RESEARCH.md (new file)
@@ -0,0 +1,157 @@

# S01: Foundation + Download Engine — Research
**Date:** 2026-03-17
**Depth:** Deep research — high-risk slice, sync-to-async bridge, greenfield project with no existing code
## Summary
S01 is the foundation slice for a greenfield project. No source code exists yet — everything must be built from scratch using the comprehensive planning docs (PROJECT.md, ARCHITECTURE.md, STACK.md, PITFALLS.md) as specifications. The slice must deliver: project scaffolding with dependency management, SQLite database with WAL mode, a three-layer config system (defaults → YAML → env vars), Pydantic models for jobs/sessions/events, an SSE broker data structure for per-session queues, a download service wrapping yt-dlp in a ThreadPoolExecutor with `call_soon_threadsafe` progress bridging, and API routes for submitting downloads and probing formats.
The primary risk is the sync-to-async bridge: yt-dlp is synchronous, FastAPI is async, and progress events must flow from worker threads to asyncio queues without blocking the event loop or losing events. This is a well-documented pattern (`ThreadPoolExecutor` + `loop.call_soon_threadsafe`), but getting the event loop capture and hook wiring wrong produces silent event loss. The slice must prove this works with a real download test.
Secondary risks are SQLite write contention under concurrent downloads (solved by WAL mode + busy_timeout, but must be enabled before any schema work) and the config system's fourth layer (SQLite admin writes, which S04 builds on top of the pydantic-settings layers delivered here).
## Recommendation
Build bottom-up: project scaffold → database → config → models → SSE broker → download service → API routes → tests. Prove the sync-to-async bridge as early as possible by writing an integration test that runs a real yt-dlp download and asserts progress events arrive in an asyncio.Queue.
**Key architectural choices to follow** (from DECISIONS.md):

- D001: Python 3.12 + FastAPI
- D004: SQLite via aiosqlite with WAL mode
- D005: yt-dlp as library import, not subprocess
- D006: ThreadPoolExecutor + loop.call_soon_threadsafe
- D007: Opaque UUID in httpOnly cookie (session model only; middleware is S02)
- D008: HTTPBasic + bcrypt 5.0.0 direct (admin auth is S04, but the model should accommodate it)
- D009: Defaults → config.yaml → env vars → SQLite admin writes
**Naming convention:** Follow the boundary map in the roadmap (`app/core/`, `app/services/`, `app/routers/`, `app/models/`, `app/middleware/`), not the PROJECT.md structure (which uses `app/api/` and `app/core/` for everything). The roadmap boundary map is the contract S02 depends on.
## Implementation Landscape
### Key Files
All paths relative to `backend/` within the repo root.
- `backend/pyproject.toml` — Python project config with pinned dependencies (fastapi 0.135.1, uvicorn 0.42.0, yt-dlp 2026.3.17, aiosqlite 0.22.1, apscheduler 3.11.2, pydantic 2.12.5, pydantic-settings[yaml] 2.13.1, sse-starlette 3.3.3, bcrypt 5.0.0, python-multipart 0.0.22, PyYAML 6.0.2). Dev deps: httpx 0.28.1, pytest 9.0.2, anyio, ruff.
- `backend/app/__init__.py` — Package marker
- `backend/app/main.py` — FastAPI app factory with lifespan context manager (DB init/close, future scheduler start). Mounts routers. SPA fallback for frontend (future). **S01 delivers the skeleton only** — lifespan starts DB, mounts download + format routers.
- `backend/app/core/__init__.py` — Package marker
- `backend/app/core/database.py` — Singleton aiosqlite connection managed in lifespan. Must set `PRAGMA journal_mode=WAL`, `PRAGMA synchronous=NORMAL`, `PRAGMA busy_timeout=5000` before schema creation. Schema: `sessions`, `jobs`, `config`, `unsupported_urls` tables. Provides async functions for job CRUD (create, get_by_id, get_by_session, update_status, update_progress, delete). Uses `aiosqlite.Row` row_factory for dict-like access. Indexes on `jobs(session_id, status)`, `jobs(completed_at)`, `sessions(last_seen)`.
- `backend/app/core/config.py` — `AppConfig` via pydantic-settings with `env_prefix="MEDIARIP"`, `env_nested_delimiter="__"`, `yaml_file` path. Nested models: `ServerConfig`, `DownloadsConfig`, `SessionConfig`, `PurgeConfig`, `UIConfig`, `ReportingConfig`, `AdminConfig`. `settings_customise_sources` override to order: env vars → YAML → init → defaults. This covers layers 1-3 of the config hierarchy. Layer 4 (SQLite admin writes) is S04's responsibility — S01 just reads config, never writes to SQLite config table.
- `backend/app/models/__init__.py` — Package marker
- `backend/app/models/job.py` — `JobStatus` enum (queued, extracting, downloading, completed, failed, expired), `JobCreate` (url, format_id, quality, output_template — all optional except url), `Job` Pydantic model matching the DB schema, `ProgressEvent` model (job_id, status, percent, speed, eta, downloaded_bytes, total_bytes, filename). ProgressEvent has a `from_yt_dlp(job_id, d)` classmethod that normalizes raw yt-dlp progress hook dicts.
- `backend/app/models/session.py` — `Session` Pydantic model (id, created_at, last_seen, job_count). Lightweight — S02 adds middleware that actually creates sessions.
- `backend/app/core/sse_broker.py` — `SSEBroker` class. Holds `dict[str, list[asyncio.Queue]]` mapping session_id → list of subscriber queues. Methods: `subscribe(session_id) → Queue`, `unsubscribe(session_id, queue)`, `publish(session_id, event)`. The `publish` method uses `loop.call_soon_threadsafe(queue.put_nowait, event)` — this is the thread-safe bridge. Must store a reference to the event loop captured at app startup. **S01 builds this data structure; S02 wires it to the SSE endpoint.**
- `backend/app/services/__init__.py` — Package marker
- `backend/app/services/download.py` — `DownloadService` class. Owns a `ThreadPoolExecutor(max_workers=config.downloads.max_concurrent)`. Methods: `enqueue(job_create, session_id) → Job` (creates DB row, submits to executor), `cancel(job_id)` (sets status=failed, relies on yt-dlp's internal cancellation — no reliable mid-stream abort exists), `get_formats(url) → list[FormatInfo]` (runs `extract_info(url, download=False)` in executor). The worker function `_run_download(job_id, url, opts)` creates a **fresh YoutubeDL instance per job** (never shared — Pitfall #1), registers a progress hook that calls `loop.call_soon_threadsafe(broker.publish, session_id, event)`, and handles errors by updating DB status to `failed`. The output template is resolved per-source domain using the `source_templates` config map (R019).
- `backend/app/services/output_template.py` — `resolve_template(url, user_override, config) → str`. Extracts domain from URL, looks up in `config.downloads.source_templates`, falls back to `*` default. If user provided an override in the job submission, use that instead. Simple utility, no I/O.
- `backend/app/routers/__init__.py` — Package marker
- `backend/app/routers/downloads.py` — `POST /api/downloads` (accepts JobCreate body + session_id from request state, delegates to DownloadService.enqueue), `GET /api/downloads` (returns jobs for current session from DB), `DELETE /api/downloads/{id}` (delegates to DownloadService.cancel). Session_id comes from `request.state.session_id` — **in S01, this must be a temporary dependency** since session middleware is S02. Use a header or query param fallback for testing, or a stub middleware.
- `backend/app/routers/formats.py` — `GET /api/formats?url={url}` (delegates to DownloadService.get_formats). Returns normalized format list with resolution, codec, ext, filesize estimate, format_id. Must handle `filesize: null` gracefully (common — R002 notes this).
- `backend/tests/` — Test directory with conftest.py (httpx AsyncClient + ASGITransport), test files for database, config, download service, and API routes.
### Build Order
The build order is strictly dependency-driven:
1. **Project scaffold** — `pyproject.toml`, directory structure, `__init__.py` files, `backend/app/main.py` skeleton with empty lifespan. This unblocks everything else.
2. **Pydantic models** (`app/models/`) — Job, Session, ProgressEvent, JobCreate, FormatInfo models. These are pure data classes with no dependencies. Every other module imports from here.
3. **Config system** (`app/core/config.py`) — AppConfig with pydantic-settings. Depends on nothing except pydantic. Creates the typed config that database, download service, and routes all need. Must be testable standalone: verify env var override works, verify YAML loading works, verify defaults are sane.
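The layered precedence (defaults → YAML → env vars) is handled natively by pydantic-settings; the sketch below only illustrates the intended merge order with stdlib code. The `MEDIARIP__SECTION__KEY` parsing mirrors the plan's env naming; the defaults and the `deep_merge` helper are illustrative (and unlike pydantic, this sketch leaves env values as strings rather than coercing types):

```python
import os

DEFAULTS = {"downloads": {"max_concurrent": 3}, "server": {"port": 8080}}  # illustrative defaults

def env_overrides(prefix: str = "MEDIARIP__", delim: str = "__") -> dict:
    # MEDIARIP__DOWNLOADS__MAX_CONCURRENT=5 -> {"downloads": {"max_concurrent": "5"}}
    out: dict = {}
    for key, value in os.environ.items():
        if not key.startswith(prefix):
            continue
        node, parts = out, key[len(prefix):].lower().split(delim)
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return out

def deep_merge(base: dict, override: dict) -> dict:
    # Later layers win; nested dicts merge key-by-key.
    merged = dict(base)
    for k, v in override.items():
        merged[k] = deep_merge(merged[k], v) if isinstance(v, dict) and isinstance(merged.get(k), dict) else v
    return merged

os.environ["MEDIARIP__DOWNLOADS__MAX_CONCURRENT"] = "5"
yaml_layer = {"server": {"port": 9090}}  # stands in for parsed config.yaml contents
config = deep_merge(deep_merge(DEFAULTS, yaml_layer), env_overrides())
print(config["downloads"]["max_concurrent"], config["server"]["port"])  # 5 9090
```

In the real module, `settings_customise_sources` expresses this same ordering declaratively.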
4. **Database** (`app/core/database.py`) — aiosqlite connection singleton, schema creation, WAL mode setup, job/session CRUD functions. Depends on models (for type hints) and config (for DB path). **Critical: WAL + busy_timeout must be the first PRAGMAs executed.** Test with concurrent writes to verify no SQLITE_BUSY errors.
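The PRAGMA-first initialization and the concurrent-write property can be sketched with stdlib `sqlite3` (aiosqlite wraps this same API, so the PRAGMA ordering carries over unchanged; the schema here is a one-table stand-in):

```python
import os
import sqlite3
import tempfile
import threading

db_path = os.path.join(tempfile.mkdtemp(), "jobs.db")

def connect(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path, timeout=5)
    # PRAGMAs first, before any schema work.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA busy_timeout=5000")
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn

conn = connect(db_path)
conn.execute("CREATE TABLE IF NOT EXISTS jobs (id TEXT PRIMARY KEY, status TEXT)")
conn.commit()
mode = conn.execute("PRAGMA journal_mode").fetchone()[0]

def writer(n: int) -> None:
    # Each thread gets its own connection; WAL + busy_timeout absorb contention.
    c = connect(db_path)
    c.execute("INSERT INTO jobs VALUES (?, ?)", (f"job-{n}", "queued"))
    c.commit()
    c.close()

threads = [threading.Thread(target=writer, args=(i,)) for i in range(3)]
for t in threads: t.start()
for t in threads: t.join()
count = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
print(mode, count)  # wal 3
```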
5. **SSE Broker** (`app/core/sse_broker.py`) — Pure asyncio data structure. Depends only on the event loop reference. Test in isolation: create broker, subscribe, publish from a thread, verify event arrives in queue.
6. **Output template resolver** (`app/services/output_template.py`) — Pure function, depends only on config. Quick to build and test.
7. **Download service** (`app/services/download.py`) — The critical integration point. Depends on database, config, SSE broker, models, output_template. This is where the sync-to-async bridge lives. **Build and test this before API routes** — proving the bridge works is the slice's primary risk retirement.
8. **API routes** (`app/routers/downloads.py`, `app/routers/formats.py`) — Thin HTTP layer over the download service. Depends on everything above. Need a stub session_id mechanism for testing (S02 provides real middleware).
9. **Integration tests** — Real yt-dlp download test that proves events flow through the bridge. Format extraction test against a known URL. Concurrent download test (3 simultaneous) that proves WAL mode handles contention.
### Verification Approach
**Unit tests** (fast, no network):

- Config: env var override, YAML loading, defaults
- Models: ProgressEvent.from_yt_dlp with various yt-dlp dict shapes (including `total_bytes: None`)
- Database: CRUD operations, WAL mode verification (`PRAGMA journal_mode` returns `wal`), concurrent write test
- SSE Broker: subscribe/unsubscribe, publish from thread via call_soon_threadsafe
- Output template: domain matching, fallback to `*`, user override priority
**Integration tests** (require yt-dlp, may need network):

- `test_real_download` — Submit a short public-domain video URL → verify file appears in output dir, verify ProgressEvents were emitted with status=downloading and status=finished
- `test_format_extraction` — Call `get_formats` on a known URL → verify formats list is non-empty, each has format_id + ext
- `test_concurrent_downloads` — Start 3 downloads simultaneously → verify all complete without SQLITE_BUSY errors or progress cross-contamination
**API tests** (httpx AsyncClient):

- `POST /api/downloads` with valid URL → 200 + Job response
- `GET /api/downloads` → list of jobs
- `DELETE /api/downloads/{id}` → 200
- `GET /api/formats?url=...` → format list
- `POST /api/downloads` with invalid URL → appropriate error
**Smoke command:** `cd backend && python -m pytest tests/ -v`
## Don't Hand-Roll
| Problem | Existing Solution | Why Use It |
|---------|------------------|------------|
| Config loading from YAML + env vars with nested delimiter | `pydantic-settings[yaml]` with `YamlConfigSettingsSource` | Handles `MEDIARIP__SECTION__KEY` → nested model natively via `env_nested_delimiter="__"`. Custom source priority via `settings_customise_sources`. No manual parsing needed. |
| Progress hook normalization | yt-dlp's built-in `progress_hooks` callback | Fires with structured dict containing `status`, `downloaded_bytes`, `total_bytes`, `speed`, `eta`, `filename`. Just normalize into Pydantic model. |
| Thread-safe event loop bridging | `asyncio.AbstractEventLoop.call_soon_threadsafe` | stdlib solution. The ONLY safe way to push data from a sync thread to an asyncio Queue. |
| SQLite async access | `aiosqlite` | asyncio bridge over stdlib sqlite3. Context manager pattern for connection lifecycle. |
| HTTP test client | `httpx.AsyncClient` with `ASGITransport` | FastAPI's recommended testing pattern. No real server needed. |
## Constraints
- **Python 3.12 only** — passlib breaks on 3.13; pinned in Dockerfile (D001)
- **yt-dlp as library, not subprocess** — structured progress hooks, no shell injection (D005)
- **Fresh YoutubeDL instance per job** — never shared across threads. YoutubeDL contains mutable state (cookies, temp files, logger) that corrupts under concurrent access (Pitfall #1)
- **ThreadPoolExecutor only** — YoutubeDL is not picklable, rules out ProcessPoolExecutor (D006, yt-dlp issue #9487)
- **WAL mode + busy_timeout BEFORE any schema work** — first PRAGMAs on DB init. Without this, 3+ concurrent downloads cause SQLITE_BUSY (Pitfall #7)
- **Event loop captured at startup** — call `asyncio.get_running_loop()` inside the lifespan (a loop is guaranteed to be running there; `get_event_loop()` is deprecated for this use), and store the reference on SSEBroker/DownloadService. Neither function can fetch the loop from inside a worker thread — no loop runs there.
- **yt-dlp >= 2023.07.06** — CVE-2023-35934 cookie leak via redirect. Pin version in dependencies.
- **pydantic-settings env prefix** — use `env_prefix="MEDIARIP__"` including the trailing delimiter: pydantic-settings prepends the prefix verbatim and does not insert a separator between prefix and field name. Double-underscore `__` for nesting: `MEDIARIP__DOWNLOADS__MAX_CONCURRENT`.
- **No automatic outbound network requests** — R020 hard constraint. No telemetry, no CDN, no update checks.
- **Session middleware is S02** — S01 routes need a temporary session_id mechanism. Use a dependency that reads `X-Session-ID` header or generates a default UUID for testing. S02 replaces this with real cookie middleware.
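The S02-replaceable stub can be sketched as a pure function over a headers mapping (in the real app this would be a FastAPI dependency reading the request; the fixed-UUID fallback shown here is one possible choice, not a specified value):

```python
import uuid

# Deterministic default so tests are reproducible; illustrative, not a spec value.
DEFAULT_SESSION_ID = str(uuid.uuid5(uuid.NAMESPACE_DNS, "mediarip.test-session"))

def get_session_id(headers: dict) -> str:
    # S01 stub: trust an X-Session-ID header, else fall back to the default.
    # S02 replaces this with real cookie-based session middleware.
    return headers.get("X-Session-ID", DEFAULT_SESSION_ID)

print(get_session_id({"X-Session-ID": "abc-123"}))  # abc-123
print(get_session_id({}) == DEFAULT_SESSION_ID)  # True
```

Keeping the stub in one module (`app/dependencies.py`) means S02 swaps a single import, not every route.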
## Common Pitfalls
- **Shared YoutubeDL instance** — Progress percentages jump between jobs, `TypeError` on `None` fields. Create fresh instance per job inside the worker function. Never pass YoutubeDL across thread boundaries. (Pitfall #1)
- **Calling asyncio primitives from progress hook** — calling `asyncio.Queue.put_nowait()` directly from the worker thread is not thread-safe: depending on the Python version it raises a `RuntimeError` or silently fails to wake waiting consumers. Must use `loop.call_soon_threadsafe(queue.put_nowait, data)`. (Pitfall #2)
- **`total_bytes` is frequently None** — yt-dlp returns `None` for subtitle downloads, live streams, and some sites. The `ProgressEvent.from_yt_dlp` normalizer must handle this: use `total_bytes_estimate` as fallback, calculate percent as 0 if both are None. (R002 notes, Pitfall checklist)
- **aiosqlite connection not closed properly** — Always use `async with aiosqlite.connect()` context manager. Unclosed connections in test teardown cause "database is locked" errors in subsequent tests.
- **pydantic-settings YAML file missing** — If `config.yaml` doesn't exist (zero-config mode), pydantic-settings must not crash. Set `yaml_file` only if the file exists, or handle `FileNotFoundError` in the custom source.
- **Progress hook throttling** — yt-dlp fires the hook very frequently (every few KB on fast connections). Writing every event to DB causes write contention. Throttle DB writes: update only when percent changes by ≥1% or status changes. SSE broker gets all events (they're cheap in-memory), but DB gets throttled writes.
- **Format extraction timeout** — `extract_info(url, download=False)` can take 3-10+ seconds for some sites. Must run in executor (not on event loop). Consider a timeout wrapper so a bad URL doesn't block a thread pool slot forever.
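The timeout-wrapper idea from the last pitfall can be sketched with `run_in_executor` plus `asyncio.wait_for`. The `slow_probe` function stands in for `YoutubeDL().extract_info(url, download=False)`; everything else is stdlib:

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def slow_probe(url: str) -> list[str]:
    # Stand-in for YoutubeDL().extract_info(url, download=False).
    time.sleep(0.05)
    return ["137+140", "22"]

async def get_formats(url: str, executor: ThreadPoolExecutor, timeout: float) -> list[str]:
    loop = asyncio.get_running_loop()
    # Run the blocking probe off the event loop, and bound how long we wait for it.
    return await asyncio.wait_for(loop.run_in_executor(executor, slow_probe, url), timeout)

async def main() -> list[str]:
    with ThreadPoolExecutor(max_workers=2) as pool:
        return await get_formats("https://example.org/v", pool, timeout=2.0)

formats = asyncio.run(main())
print(formats)
```

One caveat: on timeout, `wait_for` cancels the *await*, but the already-running thread keeps executing until the probe returns — which is exactly the "blocks a thread pool slot" risk the pitfall describes, so the timeout protects the caller, not the pool.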
## Open Risks
- **Session ID mechanism for S01 testing** — S01 produces download/format routes that need `session_id`, but session middleware is S02. The stub mechanism (header-based fallback) must be cleanly replaceable. Risk: if the stub leaks into production code or makes assumptions S02 breaks.
- **yt-dlp version drift** — Pinning to 2026.3.17 ensures reproducibility, but site extractors break as YouTube/Vimeo update APIs. Users will report "can't download X" before a new image is published. Acceptable for v1.0 but needs an update strategy for v1.x.
- **Large playlist memory pressure** — A 200-video playlist creates 201 DB rows and 201 SSE events on reconnect replay. S01 should design the schema to handle this but cannot fully test it without the SSE endpoint (S02).
- **Config YAML missing vs. malformed** — Missing file = zero-config (expected). Malformed YAML = crash at startup. Need graceful error handling with clear error message pointing to the syntax problem.
|
||||||
|
|
||||||
|
## Skills Discovered
|
||||||
|
|
||||||
|
| Technology | Skill | Status |
|
||||||
|
|------------|-------|--------|
|
||||||
|
| FastAPI | `wshobson/agents@fastapi-templates` (7.3K installs) | available — most popular; general FastAPI templates |
|
||||||
|
| FastAPI | `fastapi/fastapi@fastapi` (509 installs) | available — official repo skill |
|
||||||
|
| yt-dlp | `lwmxiaobei/yt-dlp-skill@yt-dlp` (559 installs) | available — yt-dlp specific |
|
||||||
|
|
||||||
|
None are critical for this work — the planning docs + library docs provide sufficient implementation guidance. Consider installing the FastAPI templates skill if future slices need more boilerplate generation.
|
||||||
|
|
||||||
|
## Sources
|
||||||
|
|
||||||
|
- yt-dlp progress hooks and extract_info API (source: [yt-dlp embedding docs](https://github.com/yt-dlp/yt-dlp#embedding-yt-dlp))
|
||||||
|
- pydantic-settings YAML + env nested delimiter (source: [pydantic-settings docs](https://docs.pydantic.dev/latest/concepts/pydantic_settings/))
|
||||||
|
- sse-starlette disconnect handling with CancelledError (source: [sse-starlette README](https://github.com/sysid/sse-starlette))
|
||||||
|
- aiosqlite async context manager pattern (source: [aiosqlite README](https://github.com/omnilib/aiosqlite))
|
||||||
|
- yt-dlp YoutubeDL not picklable — ThreadPoolExecutor required (source: [yt-dlp issue #9487](https://github.com/yt-dlp/yt-dlp/issues/9487))
|
||||||
|
- CVE-2023-35934 cookie leak via redirect (source: [GHSA-v8mc-9377-rwjj](https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-v8mc-9377-rwjj))
|
||||||
|
- SQLite WAL mode for concurrent write access (source: [SQLite WAL docs](https://www.sqlite.org/wal.html))
|
||||||
|
- APScheduler CronTrigger.from_crontab for cron string parsing (source: [APScheduler 3.x docs](https://apscheduler.readthedocs.io/en/3.x/))
|
||||||
96  .gsd/milestones/M001/slices/S01/tasks/T01-PLAN.md  Normal file

@ -0,0 +1,96 @@
---
estimated_steps: 5
estimated_files: 7
---

# T01: Scaffold project and define Pydantic models

**Slice:** S01 — Foundation + Download Engine
**Milestone:** M001

## Description

Create the entire `backend/` project from scratch. This is a greenfield project — no source code exists yet. Establish `pyproject.toml` with all pinned dependencies, the package directory structure matching the boundary map (`app/core/`, `app/services/`, `app/routers/`, `app/models/`, `app/middleware/`), and all Pydantic models that every subsequent task imports from.

The models are pure data classes with no I/O dependencies. The critical implementation detail is `ProgressEvent.from_yt_dlp(job_id, d)` — a classmethod that normalizes raw yt-dlp progress hook dictionaries into a typed model. It must handle `total_bytes: None` (common for subtitles, live streams, and some sites) by falling back to `total_bytes_estimate`, and calculating percent as 0 if both are `None`.

## Steps

1. Create `backend/pyproject.toml` with:
   - `[project]` section: name `media-rip`, python `>=3.12,<3.13`, pinned dependencies: `fastapi==0.135.1`, `uvicorn[standard]==0.42.0`, `yt-dlp==2026.3.17`, `aiosqlite==0.22.1`, `apscheduler==3.11.2`, `pydantic==2.12.5`, `pydantic-settings[yaml]==2.13.1`, `sse-starlette==3.3.3`, `bcrypt==5.0.0`, `python-multipart==0.0.22`, `PyYAML==6.0.2`
   - `[project.optional-dependencies]` dev: `httpx==0.28.1`, `pytest==9.0.2`, `anyio[trio]`, `pytest-asyncio`, `ruff`
   - `[tool.pytest.ini_options]` asyncio_mode = "auto"
   - `[tool.ruff]` target-version = "py312"

2. Create directory structure with `__init__.py` files:
   - `backend/app/__init__.py`
   - `backend/app/core/__init__.py`
   - `backend/app/models/__init__.py`
   - `backend/app/services/__init__.py`
   - `backend/app/routers/__init__.py`
   - `backend/app/middleware/__init__.py`
   - `backend/tests/__init__.py`

3. Create `backend/app/models/job.py` with:
   - `JobStatus` — string enum: `queued`, `extracting`, `downloading`, `completed`, `failed`, `expired`
   - `JobCreate` — `url: str`, optional `format_id: str | None`, `quality: str | None`, `output_template: str | None`
   - `Job` — full model matching DB schema: `id: str` (UUID4), `session_id: str`, `url: str`, `status: JobStatus`, `format_id`, `quality`, `output_template`, `filename: str | None`, `filesize: int | None`, `progress_percent: float` (default 0), `speed: str | None`, `eta: str | None`, `error_message: str | None`, `created_at: str`, `started_at: str | None`, `completed_at: str | None`
   - `ProgressEvent` — `job_id: str`, `status: str`, `percent: float`, `speed: str | None`, `eta: str | None`, `downloaded_bytes: int | None`, `total_bytes: int | None`, `filename: str | None`. Has a `from_yt_dlp(cls, job_id: str, d: dict) -> ProgressEvent` classmethod that normalizes yt-dlp's progress hook dict. Key logic: `total_bytes = d.get("total_bytes") or d.get("total_bytes_estimate")`, percent = `(downloaded / total * 100)` if both exist else `0.0`, speed formatted from bytes/sec, eta from seconds.
   - `FormatInfo` — `format_id: str`, `ext: str`, `resolution: str | None`, `codec: str | None`, `filesize: int | None`, `format_note: str | None`, `vcodec: str | None`, `acodec: str | None`

4. Create `backend/app/models/session.py` with:
   - `Session` — `id: str`, `created_at: str`, `last_seen: str`, `job_count: int` (default 0)

5. Create `backend/app/main.py` — minimal FastAPI app skeleton:
   - `from fastapi import FastAPI`
   - `@asynccontextmanager async def lifespan(app): yield` (placeholder — T04 fills it in)
   - `app = FastAPI(title="media.rip()", lifespan=lifespan)`

6. Create `backend/tests/test_models.py`:
   - Test `JobStatus` enum values
   - Test `JobCreate` with minimal fields (just url)
   - Test `Job` construction with all fields
   - Test `ProgressEvent.from_yt_dlp` with complete dict (total_bytes present)
   - Test `ProgressEvent.from_yt_dlp` with `total_bytes: None, total_bytes_estimate: 5000`
   - Test `ProgressEvent.from_yt_dlp` with both `None` → percent = 0.0
   - Test `ProgressEvent.from_yt_dlp` with `status: "finished"` dict shape
   - Test `FormatInfo` construction
   - Test `Session` construction with defaults

7. Install and run tests: `cd backend && pip install -e ".[dev]" && python -m pytest tests/test_models.py -v`

## Must-Haves

- [ ] `pyproject.toml` has all pinned deps from research (exact versions)
- [ ] Directory structure matches boundary map: `app/core/`, `app/services/`, `app/routers/`, `app/models/`, `app/middleware/`
- [ ] `ProgressEvent.from_yt_dlp` handles `total_bytes: None` gracefully (falls back to `total_bytes_estimate`, then 0.0)
- [ ] `JobStatus` is a string enum with all 6 values
- [ ] All model tests pass
- [ ] `pip install -e ".[dev]"` succeeds without errors

## Verification

- `cd backend && pip install -e ".[dev]"` — installs without errors
- `cd backend && python -m pytest tests/test_models.py -v` — all tests pass
- `cd backend && python -c "from app.models.job import Job, JobStatus, ProgressEvent, JobCreate, FormatInfo; from app.models.session import Session; print('OK')"` — prints OK

## Observability Impact

- **Signals changed:** None at runtime — this task creates pure data models with no I/O. No logs, no DB, no network.
- **Inspection surfaces:** A future agent can verify the scaffold by importing models: `python -c "from app.models.job import Job, JobStatus, ProgressEvent; print('OK')"`. Package structure is inspectable via `find backend/app -name '*.py'`.
- **Failure visibility:** `ProgressEvent.from_yt_dlp` normalizes yt-dlp hook dicts — malformed inputs (missing `total_bytes`, missing `total_bytes_estimate`) produce `percent=0.0` rather than exceptions, which is the designed graceful-degradation path. Model validation errors from Pydantic raise `ValidationError` with field-level detail.

## Inputs

- No prior code exists — this is the first task
- Research doc specifies all dependency versions, model fields, and directory structure

## Expected Output

- `backend/pyproject.toml` — complete project config with pinned dependencies
- `backend/app/__init__.py` and all sub-package `__init__.py` files — package structure
- `backend/app/main.py` — minimal FastAPI skeleton
- `backend/app/models/job.py` — Job, JobStatus, JobCreate, ProgressEvent, FormatInfo models
- `backend/app/models/session.py` — Session model
- `backend/tests/__init__.py` — test package marker
- `backend/tests/test_models.py` — model unit tests (8+ test cases)
105  .gsd/milestones/M001/slices/S01/tasks/T01-SUMMARY.md  Normal file

@ -0,0 +1,105 @@
---
id: T01
parent: S01
milestone: M001
provides:
  - Python package structure (backend/app/ with core, models, services, routers, middleware subpackages)
  - "Pydantic models: Job, JobStatus, JobCreate, ProgressEvent (with from_yt_dlp normalizer), FormatInfo, Session"
  - pyproject.toml with all pinned dependencies
  - Minimal FastAPI app skeleton (backend/app/main.py)
  - Model unit tests (16 test cases)
key_files:
  - backend/pyproject.toml
  - backend/app/models/job.py
  - backend/app/models/session.py
  - backend/app/main.py
  - backend/tests/test_models.py
key_decisions:
  - Used Python 3.12 venv (py -3.12) since system default is 3.14 but pyproject.toml requires >=3.12,<3.13
  - Fixed build-backend from setuptools.backends._legacy:_Backend to setuptools.build_meta for compatibility with pip 24.0's bundled setuptools
patterns_established:
  - "ProgressEvent.from_yt_dlp normalizes yt-dlp hook dicts: total_bytes fallback chain (total_bytes → total_bytes_estimate → None), percent=0.0 when both None"
  - "Speed formatting: B/s → KiB/s → MiB/s → GiB/s with human-readable output"
  - "ETA formatting: seconds → Xs / XmYYs / XhYYmZZs"
observability_surfaces:
  - Model validation errors raise Pydantic ValidationError with field-level detail
  - ProgressEvent.from_yt_dlp gracefully degrades (percent=0.0) instead of raising on missing total_bytes
duration: 12m
verification_result: passed
completed_at: 2026-03-17T22:24:00-05:00
blocker_discovered: false
---

# T01: Scaffold project and define Pydantic models

**Created the backend/ project scaffold with pyproject.toml (all pinned deps), package structure matching the boundary map, Pydantic models (Job, JobStatus, JobCreate, ProgressEvent with from_yt_dlp normalizer, FormatInfo, Session), a FastAPI skeleton, and 16 passing model tests.**

## What Happened

Built the entire `backend/` project from scratch as the first task in the greenfield project. Created `pyproject.toml` with all 11 pinned runtime dependencies and 5 dev dependencies. Established the package directory structure with `__init__.py` files for `app/core/`, `app/models/`, `app/services/`, `app/routers/`, and `app/middleware/`.

Implemented all Pydantic models in `app/models/job.py` and `app/models/session.py`. The critical `ProgressEvent.from_yt_dlp` classmethod normalizes raw yt-dlp progress hook dictionaries with the specified fallback chain: `total_bytes → total_bytes_estimate → None`, with `percent=0.0` when no total is available. Speed and ETA are formatted into human-readable strings.

Created a minimal FastAPI app in `app/main.py` with a placeholder lifespan context manager (T04 will wire DB and services).

Wrote 16 model unit tests covering all models, enum values, the complete ProgressEvent normalization path (complete data, fallback to estimate, both None, finished status, minimal dict), and edge cases.

Had to fix the build-backend in `pyproject.toml` from `setuptools.backends._legacy:_Backend` to `setuptools.build_meta` because the Python 3.12 venv's setuptools doesn't provide that legacy backend module.

## Verification

All three task-level verification commands pass:

1. `pip install -e ".[dev]"` — installed successfully with all dependencies
2. `python -m pytest tests/test_models.py -v` — 16/16 tests pass
3. `python -c "from app.models.job import Job, JobStatus, ProgressEvent, JobCreate, FormatInfo; from app.models.session import Session; print('OK')"` — prints OK

## Verification Evidence

| # | Command | Exit Code | Verdict | Duration |
|---|---------|-----------|---------|----------|
| 1 | `pip install -e ".[dev]"` | 0 | ✅ pass | 43.8s |
| 2 | `python -m pytest tests/test_models.py -v` | 0 | ✅ pass | 0.12s |
| 3 | `python -c "from app.models.job import ...;print('OK')"` | 0 | ✅ pass | <1s |
| 4 | `python -m pytest tests/ -v` (full suite) | 0 | ✅ pass | 0.07s |

### Slice-level verification (T01 scope):

| # | Slice Check | Status | Notes |
|---|-------------|--------|-------|
| 1 | `pytest tests/test_models.py -v` | ✅ pass | 16/16 tests |
| 2 | `pytest tests/test_config.py -v` | ⏳ pending | T02 |
| 3 | `pytest tests/test_database.py -v` | ⏳ pending | T02 |
| 4 | `pytest tests/test_sse_broker.py -v` | ⏳ pending | T02 |
| 5 | `pytest tests/test_download_service.py -v` | ⏳ pending | T03 |
| 6 | `pytest tests/test_api.py -v` | ⏳ pending | T04 |
| 7 | `pytest tests/ -v` (full suite) | ⏳ partial | Only test_models.py exists |

## Diagnostics

- Import check: `python -c "from app.models.job import Job, JobStatus, ProgressEvent, JobCreate, FormatInfo; from app.models.session import Session; print('OK')"`
- Structure check: `find backend/app -name '*.py' | grep -v .venv | sort`
- Venv activation: `source backend/.venv/Scripts/activate` (Python 3.12.4)

## Deviations

- Changed `pyproject.toml` build-backend from `setuptools.backends._legacy:_Backend` to `setuptools.build_meta` because the legacy backend module doesn't exist in the setuptools bundled with Python 3.12.4's pip. This is a minor tooling fix, not an architectural change.

## Known Issues

- None

## Files Created/Modified

- `backend/pyproject.toml` — project config with all pinned dependencies
- `backend/app/__init__.py` — package root
- `backend/app/core/__init__.py` — core subpackage marker
- `backend/app/models/__init__.py` — models subpackage marker
- `backend/app/services/__init__.py` — services subpackage marker
- `backend/app/routers/__init__.py` — routers subpackage marker
- `backend/app/middleware/__init__.py` — middleware subpackage marker
- `backend/app/main.py` — minimal FastAPI app skeleton with placeholder lifespan
- `backend/app/models/job.py` — JobStatus, JobCreate, Job, ProgressEvent, FormatInfo models
- `backend/app/models/session.py` — Session model
- `backend/tests/__init__.py` — test package marker
- `backend/tests/test_models.py` — 16 model unit tests
18  .gsd/milestones/M001/slices/S01/tasks/T01-VERIFY.json  Normal file

@ -0,0 +1,18 @@
{
  "schemaVersion": 1,
  "taskId": "T01",
  "unitId": "M001/S01/T01",
  "timestamp": 1773804770955,
  "passed": false,
  "discoverySource": "task-plan",
  "checks": [
    {
      "command": "pip install -e \".[dev]\"",
      "exitCode": 1,
      "durationMs": 650,
      "verdict": "fail"
    }
  ],
  "retryAttempt": 2,
  "maxRetries": 2
}
111  .gsd/milestones/M001/slices/S01/tasks/T02-PLAN.md  Normal file

@ -0,0 +1,111 @@
---
estimated_steps: 7
estimated_files: 7
---

# T02: Build config system, database layer, and SSE broker

**Slice:** S01 — Foundation + Download Engine
**Milestone:** M001

## Description

Build the three infrastructure modules that the download service and API routes depend on: the pydantic-settings config system, the aiosqlite database layer with WAL mode, and the SSE broker for thread-safe per-session event distribution. Also establish the shared test fixtures in `conftest.py`.

The config system uses `pydantic-settings[yaml]` with env prefix `MEDIARIP` and nested delimiter `__`. It must handle a missing `config.yaml` gracefully (zero-config mode). The database must execute WAL + busy_timeout + synchronous PRAGMAs before any schema creation — this is critical for concurrent download writes. The SSE broker stores a reference to the event loop captured at init time and uses `loop.call_soon_threadsafe(queue.put_nowait, event)` for thread-safe publishing.

## Steps

1. Create `backend/app/core/config.py`:
   - Import `pydantic_settings.BaseSettings`, `pydantic.BaseModel`
   - Define nested config models: `ServerConfig` (host, port, log_level, db_path defaulting to `"mediarip.db"`), `DownloadsConfig` (output_dir, max_concurrent, source_templates dict, default_template), `SessionConfig` (mode, timeout_hours), `PurgeConfig` (enabled, max_age_hours, cron), `UIConfig` (default_theme), `AdminConfig` (enabled, username, password_hash)
   - `AppConfig(BaseSettings)` with `model_config = SettingsConfigDict(env_prefix="MEDIARIP", env_nested_delimiter="__", yaml_file=None)`. Nested models with sensible defaults: `server: ServerConfig = ServerConfig()`, `downloads: DownloadsConfig = DownloadsConfig()`, etc.
   - Override `settings_customise_sources` to order: `env_settings` → `YamlConfigSettingsSource` → `init_settings` → `dotenv_settings`. Wrap the YAML source to handle a missing file gracefully (return an empty dict if the file doesn't exist or `yaml_file` is None).
   - Defaults: `downloads.output_dir="/downloads"`, `downloads.max_concurrent=3`, `downloads.source_templates={"youtube.com": "%(uploader)s/%(title)s.%(ext)s", "soundcloud.com": "%(uploader)s/%(title)s.%(ext)s", "*": "%(title)s.%(ext)s"}`, `session.mode="isolated"`, `session.timeout_hours=72`, `admin.enabled=False`
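   The layered precedence (defaults → YAML → env, with `__` as the nesting delimiter) behaves roughly like this stdlib sketch. The helper names are hypothetical — the real implementation expresses the same ordering through pydantic-settings' `settings_customise_sources`:

   ```python
   import copy

   def deep_merge(base: dict, override: dict) -> dict:
       """Later layers win; nested dicts merge key by key."""
       out = copy.deepcopy(base)
       for key, value in override.items():
           if isinstance(value, dict) and isinstance(out.get(key), dict):
               out[key] = deep_merge(out[key], value)
           else:
               out[key] = value
       return out

   def env_overrides(env: dict, prefix: str = "MEDIARIP", delim: str = "__") -> dict:
       """MEDIARIP__DOWNLOADS__MAX_CONCURRENT=5 -> {'downloads': {'max_concurrent': '5'}}."""
       tree: dict = {}
       for key, value in env.items():
           if not key.startswith(prefix + delim):
               continue
           parts = key[len(prefix + delim):].lower().split(delim)
           node = tree
           for part in parts[:-1]:
               node = node.setdefault(part, {})
           node[parts[-1]] = value
       return tree
   ```

   Applying `deep_merge(deep_merge(defaults, yaml_values), env_overrides(os.environ))` gives the same "env beats YAML beats defaults" result the settings class produces (with pydantic additionally coercing the string env values to their typed fields).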

2. Create `backend/app/core/database.py`:
   - Async functions: `init_db(db_path: str) -> aiosqlite.Connection` — opens connection, sets `row_factory = aiosqlite.Row`, executes PRAGMAs in this exact order: `PRAGMA busy_timeout=5000`, `PRAGMA journal_mode=WAL`, `PRAGMA synchronous=NORMAL`. Then creates tables.
   - Schema: `sessions` (id TEXT PRIMARY KEY, created_at TEXT, last_seen TEXT), `jobs` (id TEXT PRIMARY KEY, session_id TEXT, url TEXT, status TEXT, format_id TEXT, quality TEXT, output_template TEXT, filename TEXT, filesize INTEGER, progress_percent REAL DEFAULT 0, speed TEXT, eta TEXT, error_message TEXT, created_at TEXT, started_at TEXT, completed_at TEXT), `config` (key TEXT PRIMARY KEY, value TEXT, updated_at TEXT), `unsupported_urls` (id INTEGER PRIMARY KEY AUTOINCREMENT, url TEXT, session_id TEXT, error TEXT, created_at TEXT)
   - Indexes: `CREATE INDEX IF NOT EXISTS idx_jobs_session_status ON jobs(session_id, status)`, `CREATE INDEX IF NOT EXISTS idx_jobs_completed ON jobs(completed_at)`, `CREATE INDEX IF NOT EXISTS idx_sessions_last_seen ON sessions(last_seen)`
   - CRUD functions: `create_job(db, job: Job) -> Job`, `get_job(db, job_id: str) -> Job | None`, `get_jobs_by_session(db, session_id: str) -> list[Job]`, `update_job_status(db, job_id: str, status: str, error_message: str | None = None)`, `update_job_progress(db, job_id: str, progress_percent: float, speed: str | None, eta: str | None, filename: str | None)`, `delete_job(db, job_id: str)`, `close_db(db)` — calls `db.close()`
   - All write operations use `await db.commit()` after execution
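   The PRAGMA-first initialization can be sketched as follows (shown with the synchronous stdlib `sqlite3` driver for brevity — the real module uses `aiosqlite`, whose API mirrors it — and with a trimmed-down `jobs` table):

   ```python
   import sqlite3

   def init_db(db_path: str) -> sqlite3.Connection:
       """Open the DB and apply PRAGMAs *before* any schema work."""
       db = sqlite3.connect(db_path)
       db.row_factory = sqlite3.Row
       # Order matters: busy_timeout first so the WAL switch itself can wait.
       db.execute("PRAGMA busy_timeout=5000")
       db.execute("PRAGMA journal_mode=WAL")
       db.execute("PRAGMA synchronous=NORMAL")
       db.execute(
           """CREATE TABLE IF NOT EXISTS jobs (
                  id TEXT PRIMARY KEY,
                  session_id TEXT,
                  url TEXT,
                  status TEXT,
                  progress_percent REAL DEFAULT 0,
                  created_at TEXT
              )"""
       )
       db.execute(
           "CREATE INDEX IF NOT EXISTS idx_jobs_session_status"
           " ON jobs(session_id, status)"
       )
       db.commit()
       return db
   ```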

3. Create `backend/app/core/sse_broker.py`:
   - `SSEBroker` class with `__init__(self, loop: asyncio.AbstractEventLoop)`
   - Internal state: `self._subscribers: dict[str, list[asyncio.Queue]] = {}`, `self._loop = loop`
   - `subscribe(session_id: str) -> asyncio.Queue` — creates queue, appends to session's list, returns queue
   - `unsubscribe(session_id: str, queue: asyncio.Queue)` — removes queue from list, removes session key if list empty
   - `publish(session_id: str, event)` — uses `self._loop.call_soon_threadsafe(self.publish_sync, session_id, event)`
   - `publish_sync(session_id: str, event)` — the actual sync method, called on the event loop thread; iterates all queues for that session and calls `queue.put_nowait(event)` (catches `asyncio.QueueFull` and logs a warning)
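   A compact sketch of the broker as specified above (the queue `maxsize` is an illustrative choice, not from the plan):

   ```python
   import asyncio
   import logging

   log = logging.getLogger(__name__)

   class SSEBroker:
       """Per-session fan-out of events, safe to publish from worker threads."""

       def __init__(self, loop: asyncio.AbstractEventLoop):
           self._loop = loop
           self._subscribers: dict[str, list[asyncio.Queue]] = {}

       def subscribe(self, session_id: str) -> asyncio.Queue:
           queue: asyncio.Queue = asyncio.Queue(maxsize=256)
           self._subscribers.setdefault(session_id, []).append(queue)
           return queue

       def unsubscribe(self, session_id: str, queue: asyncio.Queue) -> None:
           queues = self._subscribers.get(session_id, [])
           if queue in queues:
               queues.remove(queue)
           if not queues:
               self._subscribers.pop(session_id, None)

       def publish(self, session_id: str, event) -> None:
           # Called from worker threads: hop onto the event loop first.
           self._loop.call_soon_threadsafe(self.publish_sync, session_id, event)

       def publish_sync(self, session_id: str, event) -> None:
           # Runs on the event loop thread; plain dict/queue access is safe here.
           for queue in self._subscribers.get(session_id, []):
               try:
                   queue.put_nowait(event)
               except asyncio.QueueFull:
                   log.warning(
                       "SSE queue full for session %s; dropping event", session_id
                   )
   ```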

4. Create `backend/tests/conftest.py`:
   - `tmp_db_path` fixture: returns a temp file path for the test database, cleans up after
   - `test_config` fixture: returns `AppConfig` with `downloads.output_dir` set to a temp dir
   - `db` async fixture: calls `init_db(tmp_db_path)`, yields connection, calls `close_db`
   - `broker` fixture: creates SSEBroker with the current event loop
   - Mark all async fixtures with appropriate scope

5. Create `backend/tests/test_config.py`:
   - Test zero-config: `AppConfig()` loads with all defaults, no crash
   - Test env var override: set `MEDIARIP__DOWNLOADS__MAX_CONCURRENT=5` in env, verify `config.downloads.max_concurrent == 5`
   - Test YAML loading: write a temp YAML file, set `yaml_file` path, verify values load
   - Test missing YAML file: set `yaml_file` to a nonexistent path, verify no crash (zero-config)
   - Test default source_templates contains youtube.com, soundcloud.com, and `*` entries

6. Create `backend/tests/test_database.py`:
   - Test `init_db` creates all tables (query `sqlite_master`)
   - Test WAL mode: `PRAGMA journal_mode` returns `wal`
   - Test `create_job` + `get_job` roundtrip
   - Test `get_jobs_by_session` returns correct subset
   - Test `update_job_status` changes status field
   - Test `update_job_progress` updates progress fields
   - Test `delete_job` removes the row
   - Test concurrent writes: launch 3 simultaneous `create_job` calls via `asyncio.gather`, verify all succeed without `SQLITE_BUSY`
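   The real concurrency check above drives aiosqlite through `asyncio.gather`; an equivalent thread-based smoke test of the same WAL + busy_timeout guarantee looks like this (helper name hypothetical, table schema reduced to two columns):

   ```python
   import sqlite3
   import threading
   import uuid

   def concurrent_insert_smoke(db_path: str, n: int = 3) -> int:
       """N threads write simultaneously; WAL + busy_timeout should let all succeed."""
       def worker() -> None:
           # Each thread gets its own connection (sqlite3 connections are thread-bound).
           db = sqlite3.connect(db_path)
           db.execute("PRAGMA busy_timeout=5000")
           db.execute("PRAGMA journal_mode=WAL")
           db.execute(
               "INSERT INTO jobs (id, status) VALUES (?, 'queued')",
               (str(uuid.uuid4()),),
           )
           db.commit()
           db.close()

       threads = [threading.Thread(target=worker) for _ in range(n)]
       for t in threads:
           t.start()
       for t in threads:
           t.join()

       db = sqlite3.connect(db_path)
       count = db.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
       db.close()
       return count
   ```

   Without the `busy_timeout`, overlapping writers would intermittently fail with `SQLITE_BUSY`; with it, they simply queue.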

7. Create `backend/tests/test_sse_broker.py`:
   - Test subscribe creates a queue and returns it
   - Test publish delivers event to subscribed queue
   - Test publish from a thread (simulating a yt-dlp worker): start a `threading.Thread` that calls `broker.publish(session_id, event)`, verify the event arrives in the queue within 1 second
   - Test unsubscribe removes queue, subsequent publish doesn't deliver
   - Test multiple subscribers to the same session all receive the event
   - Test publish to a non-existent session doesn't raise

## Must-Haves

- [ ] Config: zero-config mode works (no YAML, no env vars → all defaults)
- [ ] Config: env var with `MEDIARIP__` prefix and `__` nesting overrides config
- [ ] Database: WAL mode verified via `PRAGMA journal_mode` query returning `wal`
- [ ] Database: `busy_timeout=5000` set before schema creation
- [ ] Database: All four tables created with correct schema
- [ ] Database: 3 concurrent writes succeed without `SQLITE_BUSY`
- [ ] SSE Broker: publish from a separate thread delivers event to subscriber queue
- [ ] SSE Broker: unsubscribe removes queue from distribution
- [ ] All tests pass

## Verification

- `cd backend && python -m pytest tests/test_config.py -v` — all config tests pass
- `cd backend && python -m pytest tests/test_database.py -v` — all DB tests pass including WAL verification and concurrent writes
- `cd backend && python -m pytest tests/test_sse_broker.py -v` — all broker tests pass including thread-safe publish

## Observability Impact

- Database module logs table creation and PRAGMA results at startup (INFO level)
- SSEBroker logs `QueueFull` warnings if a subscriber queue is backed up
- Job status transitions visible via `jobs` table `status` column

## Inputs

- `backend/app/models/job.py` — Job, JobStatus models for database type hints
- `backend/app/models/session.py` — Session model
- `backend/pyproject.toml` — dependencies already installed from T01

## Expected Output

- `backend/app/core/config.py` — AppConfig with nested models, pydantic-settings integration
- `backend/app/core/database.py` — init_db, CRUD functions, WAL mode setup
- `backend/app/core/sse_broker.py` — SSEBroker with thread-safe publish
- `backend/tests/conftest.py` — shared test fixtures (db, config, broker)
- `backend/tests/test_config.py` — config test suite
- `backend/tests/test_database.py` — database test suite with concurrency test
- `backend/tests/test_sse_broker.py` — broker test suite with thread-safety test
130  .gsd/milestones/M001/slices/S01/tasks/T03-PLAN.md  Normal file

@ -0,0 +1,130 @@
---
estimated_steps: 5
estimated_files: 5
---

# T03: Implement download service with sync-to-async bridge

**Slice:** S01 — Foundation + Download Engine
**Milestone:** M001

## Description

Build the download service — the highest-risk component in S01. This is where yt-dlp (synchronous, thread-bound) meets FastAPI (async, event-loop-bound). The service wraps yt-dlp in a `ThreadPoolExecutor` and bridges progress events to the async world via `loop.call_soon_threadsafe`. Also build the output template resolver utility.

This task retires the primary risk identified in the M001 roadmap: **"proving yt-dlp progress events arrive in an asyncio.Queue via call_soon_threadsafe, with a test that runs a real download and asserts events were received."**

**Critical implementation constraints:**
- **Fresh YoutubeDL instance per job** — never shared across threads. YoutubeDL has mutable state (cookies, temp files, logger) that corrupts under concurrent access.
- **Event loop captured at construction** — `asyncio.get_running_loop()` in `__init__`, stored as `self._loop`. The loop cannot be obtained from inside a worker thread.
- **Progress hook throttling** — Write to the DB only when percent changes by ≥1% or status changes. The SSE broker gets all events (cheap in-memory), the DB gets throttled writes.
- **`total_bytes` is frequently None** — Already handled in `ProgressEvent.from_yt_dlp` from T01, but the hook must not crash when the dict is sparse.

## Steps

1. Create `backend/app/services/output_template.py`:
   - `resolve_template(url: str, user_override: str | None, config: AppConfig) -> str`
   - Extract the domain from the URL using `urllib.parse.urlparse`. Strip the `www.` prefix.
   - If `user_override` is not None, return it directly (R025 per-download override)
   - Look up the domain in `config.downloads.source_templates`. If found, return it.
   - Fall back to `config.downloads.source_templates.get("*", "%(title)s.%(ext)s")`
   - Handle malformed URLs gracefully (return the default template)
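   The resolution order above can be sketched as follows (the templates dict is passed directly here instead of the full `AppConfig`, to keep the sketch self-contained):

   ```python
   from urllib.parse import urlparse

   DEFAULT_TEMPLATE = "%(title)s.%(ext)s"

   def resolve_template(
       url: str, user_override: str | None, source_templates: dict
   ) -> str:
       """Pick an output template: per-download override > per-domain > '*' fallback."""
       if user_override:
           return user_override
       try:
           domain = urlparse(url).netloc.lower()
       except ValueError:
           domain = ""
       domain = domain.removeprefix("www.")
       if domain in source_templates:
           return source_templates[domain]
       return source_templates.get("*", DEFAULT_TEMPLATE)
   ```

   Note that `urlparse` rarely raises — a malformed URL usually just yields an empty `netloc`, which falls through to the `*` entry.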

2. Create `backend/app/services/download.py`:
   - `DownloadService` class. Constructor takes `config: AppConfig`, `db: aiosqlite.Connection`, `broker: SSEBroker`, `loop: asyncio.AbstractEventLoop`.
   - `self._executor = ThreadPoolExecutor(max_workers=config.downloads.max_concurrent)`
   - `async def enqueue(self, job_create: JobCreate, session_id: str) -> Job`:
     - Generate UUID4 for job_id, resolve output template via `resolve_template`
     - Create Job model, persist via `create_job(self._db, job)` (from the database module)
     - Submit `self._run_download` to the executor via `self._loop.run_in_executor(self._executor, self._run_download, job.id, job.url, opts, session_id)`
     - Return the Job
   - `def _run_download(self, job_id: str, url: str, opts: dict, session_id: str)`:
     - This runs in a worker thread. **Create a fresh YoutubeDL instance** with opts.
     - Register a `progress_hooks` callback that:
       - Creates `ProgressEvent.from_yt_dlp(job_id, d)` from the hook dict
       - Calls `self._loop.call_soon_threadsafe(self._broker.publish_sync, session_id, event)` (NOT `publish` — call the sync method directly since we're already scheduling on the event loop)
       - Throttles DB writes: track `_last_db_percent` per job, only write when `abs(new - last) >= 1.0` or status changed
       - DB writes from the thread use `asyncio.run_coroutine_threadsafe(update_job_progress(...), self._loop).result()` — blocks the worker thread until the async DB write completes
     - Call `ydl.download([url])`
     - On success: update status to `completed`, set `completed_at`
     - On exception: update status to `failed`, set `error_message` to str(e), log the error
   - `async def get_formats(self, url: str) -> list[FormatInfo]`:
     - Run in executor: `ydl.extract_info(url, download=False)`
     - Parse result `formats` list into `FormatInfo` models
     - Handle `filesize: None` gracefully
     - Return list sorted by resolution (best first)
   - `async def cancel(self, job_id: str)`:
     - Update job status to `failed` with error_message "Cancelled by user" in DB
     - Note: yt-dlp has no reliable mid-stream abort. The thread continues but the job is marked failed.
   - `def shutdown(self)`:
     - `self._executor.shutdown(wait=False)`
|
||||||
|
|
||||||
|
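The two-way bridge described above can be sketched without yt-dlp: a stub blocking function stands in for `ydl.download`, and progress is handed from the worker thread to an `asyncio.Queue` via `call_soon_threadsafe`. The event dicts and job id are illustrative, not the plan's real models.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def blocking_download(loop, queue, job_id):
    # Stands in for yt-dlp's blocking download loop; runs in a worker thread.
    for percent in (0.0, 50.0, 100.0):
        event = {"job_id": job_id, "status": "downloading", "percent": percent}
        # Never touch asyncio objects from a worker thread directly: hand the
        # put over to the event loop, which executes it on its own thread.
        loop.call_soon_threadsafe(queue.put_nowait, event)
    loop.call_soon_threadsafe(queue.put_nowait, {"job_id": job_id, "status": "finished"})

async def main():
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()
    with ThreadPoolExecutor(max_workers=2) as pool:
        # run_in_executor bridges the other direction: await a blocking call.
        await loop.run_in_executor(pool, blocking_download, loop, queue, "job-1")
    events = []
    while not queue.empty():
        events.append(queue.get_nowait())
    return events

events = asyncio.run(main())
```

Because `call_soon_threadsafe` callbacks run in FIFO order, all progress events land in the queue before the `run_in_executor` await resumes.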
3. Create `backend/tests/test_output_template.py`:
   - Test YouTube URL → youtube.com template
   - Test SoundCloud URL → soundcloud.com template
   - Test unknown domain → fallback `*` template
   - Test `www.` prefix stripping (www.youtube.com → youtube.com lookup)
   - Test that a user override takes priority over a domain match
   - Test malformed URL → fallback template
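A minimal `resolve_template` satisfying the cases above might look like this. The template table contents and the exact signature are assumptions; the `*` fallback key, `www.` stripping, and override priority come from the plan.

```python
from urllib.parse import urlparse

# Hypothetical template table; the real one would come from AppConfig.
TEMPLATES = {
    "youtube.com": "%(uploader)s/%(title)s.%(ext)s",
    "soundcloud.com": "%(uploader)s/%(title)s.%(ext)s",
    "*": "%(title)s.%(ext)s",
}

def resolve_template(url: str, templates: dict, override: str = None) -> str:
    """Pick an output template: user override > domain match > '*' fallback."""
    if override:
        return override
    # urlparse never raises on junk input; hostname is just None, which
    # routes malformed URLs to the fallback template.
    host = (urlparse(url).hostname or "").removeprefix("www.")
    return templates.get(host, templates["*"])
```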
4. Create `backend/tests/test_download_service.py`:
   - **Integration test — real download** (mark with `@pytest.mark.integration` or `@pytest.mark.slow`):
     - Set up: create a temp output dir, init the DB, create an SSEBroker, create a DownloadService
     - Subscribe to the broker queue for the test session
     - Call `service.enqueue(JobCreate(url="https://www.youtube.com/watch?v=BaW_jenozKc"), session_id="test-session")` — this is a 10-second Creative Commons video commonly used in yt-dlp tests. If this URL stops working, any short public video works.
     - Collect events from the broker queue with a timeout (10–30 seconds depending on network)
     - Assert: at least one event has `status == "downloading"` with `percent > 0`
     - Assert: the final event has `status == "finished"` (this is yt-dlp's hook status, not JobStatus)
     - Assert: the output file exists in the temp dir
     - Assert: the DB job status is `completed`
   - **Format extraction test** (also integration — needs network):
     - Call `service.get_formats("https://www.youtube.com/watch?v=BaW_jenozKc")`
     - Assert: the result is a non-empty list
     - Assert: each FormatInfo has `format_id` and `ext` populated
   - **Cancel test** (unit — no network):
     - Create a job in the DB with status `downloading`
     - Call `service.cancel(job_id)`
     - Assert: the DB job status is now `failed` with error_message "Cancelled by user"
   - **Concurrent enqueue test** (integration — light):
     - Enqueue 2 downloads simultaneously via `asyncio.gather`
     - Verify both complete without errors (proves ThreadPoolExecutor + WAL work together)

5. Run all tests: `cd backend && python -m pytest tests/test_output_template.py tests/test_download_service.py -v`
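The "collect events with a timeout" step in the integration test is easy to get wrong (a bare `queue.get()` hangs forever on failure). One sketch of a deadline-bounded collector, with the event shape and stop condition assumed for illustration:

```python
import asyncio

async def collect_events(queue, deadline, stop_status="finished"):
    """Drain events from the broker queue until stop_status or the deadline."""
    events = []
    loop = asyncio.get_running_loop()
    end = loop.time() + deadline
    while (remaining := end - loop.time()) > 0:
        try:
            # Bound every get by the time left, so a stalled download
            # fails the test instead of hanging it.
            event = await asyncio.wait_for(queue.get(), timeout=remaining)
        except asyncio.TimeoutError:
            break
        events.append(event)
        if event.get("status") == stop_status:
            break
    return events

async def _demo():
    # A pre-filled queue stands in for the SSE broker subscription.
    q: asyncio.Queue = asyncio.Queue()
    q.put_nowait({"status": "downloading", "percent": 42.0})
    q.put_nowait({"status": "finished"})
    return await collect_events(q, deadline=1.0)

demo_events = asyncio.run(_demo())
```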
## Must-Haves

- [ ] Fresh YoutubeDL instance created per job inside the worker thread (never shared)
- [ ] Progress events bridge from the worker thread to the SSE broker via `call_soon_threadsafe`
- [ ] Real download integration test passes — file appears in the output dir AND progress events are received
- [ ] Format extraction returns a non-empty list with `format_id` and `ext`
- [ ] DB progress writes throttled (≥1% change or status change)
- [ ] Output template resolves domain-specific and fallback templates correctly
- [ ] `total_bytes: None` doesn't crash the progress hook
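The `total_bytes: None` must-have is worth pinning down: yt-dlp's progress hooks can report `total_bytes=None` (with `total_bytes_estimate` sometimes available instead). A sketch of the `from_yt_dlp` normalizer, using a plain dataclass rather than the plan's Pydantic model and a field set assumed for illustration:

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class ProgressEvent:
    job_id: str
    status: str
    percent: float | None
    total_bytes: int | None

    @classmethod
    def from_yt_dlp(cls, job_id: str, d: dict) -> ProgressEvent:
        """Normalize a yt-dlp progress-hook dict; sizes may be missing."""
        # Fall back to yt-dlp's estimate, and skip the percent computation
        # entirely when no total is known, rather than dividing by None.
        total = d.get("total_bytes") or d.get("total_bytes_estimate")
        downloaded = d.get("downloaded_bytes") or 0
        percent = (downloaded / total * 100.0) if total else None
        return cls(job_id=job_id, status=d.get("status", "downloading"),
                   percent=percent, total_bytes=total)
```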
## Verification

- `cd backend && python -m pytest tests/test_output_template.py -v` — all template tests pass
- `cd backend && python -m pytest tests/test_download_service.py -v` — all service tests pass, including the real download
- `cd backend && python -m pytest tests/test_download_service.py -v -k "real_download"` — specifically verify the risk-retirement test
## Observability Impact

- Download worker logs job_id + status transitions at INFO level
- Download errors logged at ERROR level with job_id + exception traceback
- Progress hook logs throttling decisions at DEBUG level
- `jobs` table `error_message` column populated on failure
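The throttling decision logged above is the ≥1%-or-status-change rule from `_run_download`. As a stand-alone helper it is easy to unit test; the class name and method are hypothetical, the rule itself is the plan's:

```python
class ProgressThrottle:
    """Allow a DB write only on >=1% progress delta or a status change."""

    def __init__(self, min_delta: float = 1.0):
        self._min_delta = min_delta
        self._last_percent: dict = {}
        self._last_status: dict = {}

    def should_write(self, job_id: str, percent: float, status: str) -> bool:
        last = self._last_percent.get(job_id)
        changed = (
            self._last_status.get(job_id) != status
            or last is None
            or abs(percent - last) >= self._min_delta
        )
        if changed:
            # Only record values we actually wrote, so small deltas
            # accumulate against the last *persisted* percent.
            self._last_percent[job_id] = percent
            self._last_status[job_id] = status
        return changed
```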
## Inputs

- `backend/app/models/job.py` — Job, JobCreate, ProgressEvent, FormatInfo, JobStatus
- `backend/app/core/config.py` — AppConfig with downloads settings
- `backend/app/core/database.py` — init_db, CRUD functions
- `backend/app/core/sse_broker.py` — SSEBroker with publish/subscribe
- `backend/tests/conftest.py` — shared fixtures (db, config, broker)

## Expected Output

- `backend/app/services/output_template.py` — resolve_template utility
- `backend/app/services/download.py` — DownloadService with enqueue, get_formats, cancel
- `backend/tests/test_output_template.py` — template resolution tests
- `backend/tests/test_download_service.py` — integration tests proving the sync-to-async bridge works
101
.gsd/milestones/M001/slices/S01/tasks/T04-PLAN.md
Normal file

@ -0,0 +1,101 @@
---
estimated_steps: 5
estimated_files: 7
---
# T04: Wire API routes and FastAPI app factory

**Slice:** S01 — Foundation + Download Engine
**Milestone:** M001

## Description

Build the HTTP layer that ties everything together: the FastAPI app factory with lifespan (DB init/close, service construction), API routers for downloads and format extraction, a stub session dependency for testing, and API-level tests via httpx. This is the composition task — it proves the full vertical from HTTP request through to yt-dlp and back.

The stub session dependency reads `X-Session-ID` from request headers, falling back to a default UUID. This is explicitly documented as S02-replaceable — S02 delivers real cookie-based session middleware that replaces this dependency entirely.

**Important:** The API tests use `httpx.AsyncClient` with `ASGITransport` — no real server is started. This is FastAPI's recommended testing pattern.

## Steps
1. Create `backend/app/dependencies.py`:
   - `get_session_id(request: Request) -> str` dependency function
   - Reads the `X-Session-ID` header from the request. If present, return it.
   - If not present, return a default UUID string (e.g., `"00000000-0000-0000-0000-000000000000"`)
   - Add a docstring clearly marking this as a stub: `"""Stub session ID dependency. S02 replaces this with cookie-based session middleware."""`
2. Update `backend/app/main.py` — full app factory with lifespan:
   - `@asynccontextmanager async def lifespan(app: FastAPI)`:
     - Load config: `config = AppConfig(yaml_file="config.yaml")` if the file exists, else `AppConfig()`
     - Init DB: `db = await init_db(config.server.db_path)`
     - Capture the event loop: `loop = asyncio.get_running_loop()` (preferred over the deprecated `get_event_loop()` inside a coroutine)
     - Create the SSEBroker: `broker = SSEBroker(loop)`
     - Create the DownloadService: `download_service = DownloadService(config, db, broker, loop)`
     - Store on `app.state`: `app.state.config = config`, `app.state.db = db`, `app.state.broker = broker`, `app.state.download_service = download_service`
     - `yield`
     - Teardown: `download_service.shutdown()`, `await close_db(db)`
   - Include routers: `app.include_router(downloads_router, prefix="/api")`, `app.include_router(formats_router, prefix="/api")`
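The wiring and teardown order can be sketched with tiny stand-ins for the real components (the actual `AppConfig`, `init_db`, `SSEBroker`, and `DownloadService` come from earlier tasks; everything below is simplified scaffolding):

```python
import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace

# Minimal stand-ins so the wiring order is visible without FastAPI installed.
async def init_db(path): return {"path": path, "open": True}
async def close_db(db): db["open"] = False

class SSEBroker:
    def __init__(self, loop): self.loop = loop

class DownloadService:
    def __init__(self, db, broker, loop): self.db, self.broker, self.loop = db, broker, loop
    def shutdown(self): self.stopped = True

@asynccontextmanager
async def lifespan(app):
    loop = asyncio.get_running_loop()      # capture the loop worker threads will target
    app.state.db = await init_db("media.db")
    app.state.broker = SSEBroker(loop)
    app.state.download_service = DownloadService(app.state.db, app.state.broker, loop)
    yield                                   # the app serves requests here
    app.state.download_service.shutdown()   # stop workers first...
    await close_db(app.state.db)            # ...then close the DB they write to

async def demo():
    # SimpleNamespace mimics FastAPI's app.state attribute bag.
    app = SimpleNamespace(state=SimpleNamespace())
    async with lifespan(app):
        assert app.state.db["open"]
    return app

app = asyncio.run(demo())
```

The teardown order matters: shutting down the executor before closing the DB avoids worker threads writing to a closed connection.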
3. Create `backend/app/routers/downloads.py`:
   - `router = APIRouter(tags=["downloads"])`
   - `POST /downloads` — accepts a `JobCreate` body, gets `session_id` from `Depends(get_session_id)`, gets `download_service` from `request.app.state.download_service`. Calls `await download_service.enqueue(job_create, session_id)`. Returns the Job as JSON with status 201.
   - `GET /downloads` — gets the session_id, queries the DB via `get_jobs_by_session(request.app.state.db, session_id)`. Returns the list of Jobs.
   - `DELETE /downloads/{job_id}` — calls `await download_service.cancel(job_id)`. Returns `{"status": "cancelled"}`.
4. Create `backend/app/routers/formats.py`:
   - `router = APIRouter(tags=["formats"])`
   - `GET /formats` — accepts a `url: str` query param. Gets the download_service from app.state. Calls `await download_service.get_formats(url)`. Returns the list of FormatInfo.
   - Handle errors gracefully: if extraction fails, return 400 with an error message.
5. Create/update `backend/tests/test_api.py` and update `backend/tests/conftest.py`:
   - Add a `client` async fixture to conftest: creates `httpx.AsyncClient` with `ASGITransport(app=app)` and base_url `http://test`
   - The app fixture needs a fresh lifespan — use a temp DB path and a temp output dir
   - Tests:
     - `test_post_download` — POST `/api/downloads` with `{"url": "https://www.youtube.com/watch?v=BaW_jenozKc"}` and an `X-Session-ID: test-session` header → 201 + response has `id`, `status == "queued"`, and a matching `url`
     - `test_get_downloads_empty` — GET `/api/downloads` with `X-Session-ID: new-session` → 200 + empty list
     - `test_get_downloads_after_post` — POST a download, then GET → the list contains the job
     - `test_delete_download` — POST a download, then DELETE → 200 + status cancelled; GET confirms the status changed
     - `test_get_formats` — GET `/api/formats?url=https://www.youtube.com/watch?v=BaW_jenozKc` → 200 + non-empty list with format_id fields (integration — needs network)
     - `test_post_download_invalid_url` — POST with `{"url": "not-a-url"}` → an appropriate error response
   - Run the full suite: `cd backend && python -m pytest tests/ -v`
## Must-Haves

- [ ] App starts without errors via lifespan (DB initialized, services created)
- [ ] POST /api/downloads creates a job and returns it with status 201
- [ ] GET /api/downloads returns jobs filtered by session_id
- [ ] DELETE /api/downloads/{id} marks the job as cancelled/failed
- [ ] GET /api/formats?url= returns the format list from yt-dlp extraction
- [ ] Stub session_id dependency reads the X-Session-ID header with a fallback
- [ ] Full test suite (`python -m pytest tests/ -v`) passes with 0 failures
## Verification

- `cd backend && python -m pytest tests/test_api.py -v` — all API tests pass
- `cd backend && python -m pytest tests/ -v` — the FULL suite (models + config + db + broker + download + template + api) passes with 0 failures
- `python -c "from app.main import app; print(app.title)"` — prints "media.rip()"
## Observability Impact

- App lifespan logs the config source (YAML/env/defaults) and DB path at startup (INFO level)
- API routes log incoming requests with session_id at DEBUG level
- Error responses include structured error messages (not stack traces)
## Inputs

- `backend/app/models/job.py` — Job, JobCreate, FormatInfo models
- `backend/app/core/config.py` — AppConfig
- `backend/app/core/database.py` — init_db, close_db, CRUD functions
- `backend/app/core/sse_broker.py` — SSEBroker
- `backend/app/services/download.py` — DownloadService
- `backend/tests/conftest.py` — shared fixtures from T02

## Expected Output

- `backend/app/dependencies.py` — stub session_id dependency
- `backend/app/main.py` — complete app factory with lifespan, router mounting
- `backend/app/routers/downloads.py` — POST/GET/DELETE download endpoints
- `backend/app/routers/formats.py` — GET formats endpoint
- `backend/tests/test_api.py` — API test suite (6+ test cases)
- `backend/tests/conftest.py` — updated with the httpx client fixture
- All prior test files still passing (full regression)