docs: define v1 requirements

2026-04-11 01:40:49 -05:00 · 93bd57d386
commit 93bd57d386
parent b51cf7cd6b
1 changed file with 113 additions and 0 deletions
--- a/.planning/REQUIREMENTS.md
+++ b/.planning/REQUIREMENTS.md
@@ -0,0 +1,113 @@
+# Requirements: AI Music Pipeline
+
+**Defined:** 2026-04-11
+**Core Value:** A hummed melody input must produce instrument-specific output that audibly follows the melody's contour and rhythm
+
+## v1 Requirements
+
+Requirements for the initial release. Each maps to a roadmap phase.
+
+### Melody Fidelity
+
+- [ ] **MEL-01**: Output audio audibly follows the pitch contour of the hummed input melody
+- [ ] **MEL-02**: Output audio preserves the rhythmic timing and phrasing of the hummed input
+- [ ] **MEL-03**: User can control the fidelity-versus-creativity tradeoff via the `cover_strength` parameter
+- [ ] **MEL-04**: Output is musically coherent — sounds like a real instrument performance, not garbled audio
+
+### Instrument Selection
+
+- [ ] **INST-01**: User can specify target instrument via text prompt (piano, guitar, saxophone, etc.)
+- [ ] **INST-02**: Different instrument prompts produce audibly different timbres in the output
+- [ ] **INST-03**: Instrument selection works across at least 5 distinct instrument types
+
+### Input Handling
+
+- [ ] **INP-01**: Pipeline accepts raw humming WAV audio as input with no manual preprocessing
+- [ ] **INP-02**: Pipeline auto-detects input audio duration and configures output duration appropriately for quality
+- [ ] **INP-03**: Input audio at common sample rates (16 kHz, 44.1 kHz, 48 kHz) is handled without errors
+
+### Output Quality
+
+- [ ] **OUT-01**: Output audio has a sample rate of at least 44.1 kHz (CD quality)
+- [ ] **OUT-02**: Output is saved as a WAV file to a user-specified or default output directory
+- [ ] **OUT-03**: Output filenames include instrument name and timestamp for easy identification
+
+### Reproducibility
+
+- [ ] **REPR-01**: User can set a seed value to reproduce identical output from the same input + prompt
+- [ ] **REPR-02**: Different seeds with the same input + prompt produce meaningfully different outputs
+
+### Pipeline Usability
+
+- [ ] **PIPE-01**: A single CLI command or script invocation goes from humming WAV to instrument output
+- [ ] **PIPE-02**: Configuration via TOML file or CLI arguments for instrument, strength, duration, seed
+- [ ] **PIPE-03**: Clear error messages when input file is missing, corrupted, or in unsupported format
+
+## v2 Requirements
+
+Deferred to a future release. Tracked but not in the current roadmap.
+
+### Multi-Instrument Rendering
+
+- **MULTI-01**: Hum once, generate multiple instrument versions automatically (batch over captions)
+- **MULTI-02**: User specifies list of target instruments, pipeline renders all in one run
+
+### Evaluation & Metrics
+
+- **EVAL-01**: Pitch contour comparison score between input humming and output audio
+- **EVAL-02**: Per-output fidelity report showing how closely melody was followed
+
+### Input Flexibility
+
+- **FLEX-01**: Accept MIDI files as input (render via FluidSynth before generation)
+- **FLEX-02**: Auto-detect BPM from humming input
+- **FLEX-03**: Accept whistling and played-instrument recordings as input
+
+### Style Control
+
+- **STYLE-01**: Caption templates for genre/mood beyond instrument (jazz, rock, classical, etc.)
+- **STYLE-02**: Named parameter presets (faithful / creative / loose interpretation)
+
+## Out of Scope
+
+| Feature | Reason |
+|---------|--------|
+| MusicGen melody debugging via HF transformers | Root cause identified (missing cfg_coef_beta, no Demucs, broken null conditioning). ACE-Step is the better path. |
+| Real-time / streaming generation | Batch inference (~3 s on an RTX 4090) is fast enough for a creative workflow |
+| Model training or fine-tuning | Pretrained models only — no training budget or data |
+| Web/mobile deployment | Local CLI execution on user's machine |
+| Multi-track arrangement generation | Different problem domain; the pipeline produces single-instrument tracks that the user layers in a DAW |
+| Polyphonic input handling | Monophonic humming is assumed; the model makes a best-effort attempt on more complex input |
+| Singing synthesis / lyrics | Instrumental rendition only |
+
+## Traceability
+
+| Requirement | Phase | Status |
+|-------------|-------|--------|
+| MEL-01 | — | Pending |
+| MEL-02 | — | Pending |
+| MEL-03 | — | Pending |
+| MEL-04 | — | Pending |
+| INST-01 | — | Pending |
+| INST-02 | — | Pending |
+| INST-03 | — | Pending |
+| INP-01 | — | Pending |
+| INP-02 | — | Pending |
+| INP-03 | — | Pending |
+| OUT-01 | — | Pending |
+| OUT-02 | — | Pending |
+| OUT-03 | — | Pending |
+| REPR-01 | — | Pending |
+| REPR-02 | — | Pending |
+| PIPE-01 | — | Pending |
+| PIPE-02 | — | Pending |
+| PIPE-03 | — | Pending |
+
+**Coverage:**
+- v1 requirements: 18 total
+- Mapped to phases: 0
+- Unmapped: 18
+
+---
+*Requirements defined: 2026-04-11*
+*Last updated: 2026-04-11 after initial definition*