docs: define v1 requirements

John Lightner 2026-04-11 01:40:49 -05:00
parent b51cf7cd6b
commit 93bd57d386

.planning/REQUIREMENTS.md Normal file

@@ -0,0 +1,113 @@
# Requirements: AI Music Pipeline
**Defined:** 2026-04-11
**Core Value:** A hummed melody input must produce instrument-specific output that audibly follows the melody's contour and rhythm
## v1 Requirements
Requirements for the initial release. Each maps to a roadmap phase.
### Melody Fidelity
- [ ] **MEL-01**: Output audio audibly follows the pitch contour of the hummed input melody
- [ ] **MEL-02**: Output audio preserves the rhythmic timing and phrasing of the hummed input
- [ ] **MEL-03**: User can control fidelity vs creativity tradeoff via cover_strength parameter
- [ ] **MEL-04**: Output is musically coherent — sounds like a real instrument performance, not garbled audio
### Instrument Selection
- [ ] **INST-01**: User can specify target instrument via text prompt (piano, guitar, saxophone, etc.)
- [ ] **INST-02**: Different instrument prompts produce audibly different timbres in the output
- [ ] **INST-03**: Instrument selection works across at least 5 distinct instrument types
### Input Handling
- [ ] **INP-01**: Pipeline accepts raw humming WAV audio as input with no manual preprocessing
- [ ] **INP-02**: Pipeline auto-detects input audio duration and configures output duration appropriately for quality
- [ ] **INP-03**: Input audio at common sample rates (44.1 kHz, 48 kHz, 16 kHz) is handled without errors
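
The input requirements above could be checked before any model call. The following is a minimal sketch, assuming PCM WAV input that Python's standard `wave` module can read; the names `inspect_wav`, `WavInfo`, and `COMMON_RATES` are hypothetical and not part of the pipeline.

```python
import wave
from dataclasses import dataclass
from pathlib import Path

# Sample rates named in INP-03; other rates are reported, not rejected.
COMMON_RATES = {16_000, 44_100, 48_000}


@dataclass
class WavInfo:
    sample_rate: int
    channels: int
    duration_s: float
    is_common_rate: bool


def inspect_wav(path):
    """Return basic properties of a WAV file, or raise ValueError with a readable message."""
    p = Path(path)
    if not p.is_file():
        raise ValueError(f"input file not found: {p}")
    try:
        with wave.open(str(p), "rb") as wf:
            rate = wf.getframerate()
            channels = wf.getnchannels()
            frames = wf.getnframes()
    except (wave.Error, EOFError) as exc:
        # wave.Error covers non-RIFF data and unsupported encodings.
        raise ValueError(f"not a readable PCM WAV file: {p} ({exc})") from exc
    if rate <= 0 or frames <= 0:
        raise ValueError(f"WAV file contains no audio: {p}")
    # Duration in seconds follows directly from frame count and sample rate.
    return WavInfo(rate, channels, frames / rate, rate in COMMON_RATES)
```

The detected `duration_s` is the value INP-02 would feed into the output-duration setting; how that mapping is chosen is left open here.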
### Output Quality
- [ ] **OUT-01**: Output audio has a sample rate of at least 44.1 kHz (CD quality)
- [ ] **OUT-02**: Output is saved as WAV file to a user-specified or default output directory
- [ ] **OUT-03**: Output filenames include instrument name and timestamp for easy identification
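
One way OUT-02 and OUT-03 might be satisfied is sketched below. The default directory `outputs`, the `instrument_timestamp.wav` layout, and the function name `output_path` are assumptions made for illustration.

```python
import re
from datetime import datetime
from pathlib import Path


def output_path(instrument, out_dir="outputs", now=None):
    """Build a WAV path whose name contains the instrument and a timestamp."""
    now = now or datetime.now()
    # Reduce the instrument prompt to a filesystem-safe slug.
    slug = re.sub(r"[^a-z0-9]+", "-", instrument.lower()).strip("-") or "instrument"
    stamp = now.strftime("%Y%m%d-%H%M%S")
    return Path(out_dir) / f"{slug}_{stamp}.wav"
```

Passing `now` explicitly keeps the function deterministic in tests; in normal use it falls back to the current local time.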
### Reproducibility
- [ ] **REPR-01**: User can set a seed value to reproduce identical output from the same input + prompt
- [ ] **REPR-02**: Different seeds with the same input + prompt produce meaningfully different outputs
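
The two reproducibility requirements could be smoke-tested by hashing generated audio bytes. This is only a sketch: `fake_generate` is a stand-in for the real model call (which this document does not specify), and a byte-level difference is a much weaker check than the "meaningfully different" wording in REPR-02.

```python
import hashlib
import random


def digest(samples: bytes) -> str:
    """Fingerprint raw audio bytes so two runs can be compared cheaply."""
    return hashlib.sha256(samples).hexdigest()


def fake_generate(seed: int, n_bytes: int = 64) -> bytes:
    """Hypothetical stand-in for the model: deterministic given the seed."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n_bytes))


def is_reproducible(generate, seed: int) -> bool:
    """REPR-01: two runs with the same seed give byte-identical output."""
    return digest(generate(seed)) == digest(generate(seed))


def seeds_differ(generate, seed_a: int, seed_b: int) -> bool:
    """REPR-02 (weak form): different seeds give different bytes."""
    return digest(generate(seed_a)) != digest(generate(seed_b))
```

In a real run, `generate` would wrap the pipeline with a fixed input and prompt, and bit-exact equality may additionally depend on hardware and library determinism settings.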
### Pipeline Usability
- [ ] **PIPE-01**: A single CLI command or script invocation goes from humming WAV to instrument output
- [ ] **PIPE-02**: Instrument, strength, duration, and seed are configurable via TOML file or CLI arguments
- [ ] **PIPE-03**: Clear error messages are shown when the input file is missing, corrupted, or in an unsupported format
## v2 Requirements
Deferred to a future release. Tracked but not in the current roadmap.
### Multi-Instrument Rendering
- **MULTI-01**: Hum once, generate multiple instrument versions automatically (batch over captions)
- **MULTI-02**: User specifies list of target instruments, pipeline renders all in one run
### Evaluation & Metrics
- **EVAL-01**: Pitch contour comparison score between input humming and output audio
- **EVAL-02**: Per-output fidelity report showing how closely melody was followed
### Input Flexibility
- **FLEX-01**: Accept MIDI files as input (render via FluidSynth before generation)
- **FLEX-02**: Auto-detect BPM from humming input
- **FLEX-03**: Accept whistling and played-instrument recordings as input
### Style Control
- **STYLE-01**: Caption templates for genre/mood beyond instrument (jazz, rock, classical, etc.)
- **STYLE-02**: Named parameter presets (faithful / creative / loose interpretation)
## Out of Scope
| Feature | Reason |
|---------|--------|
| MusicGen melody debugging via HF transformers | Root cause identified (missing cfg_coef_beta, no Demucs, broken null conditioning). ACE-Step is the better path. |
| Real-time / streaming generation | Batch inference (~3 s on an RTX 4090) is fast enough for a creative workflow |
| Model training or fine-tuning | Pretrained models only — no training budget or data |
| Web/mobile deployment | Local CLI execution on user's machine |
| Multi-track arrangement generation | Different problem domain — single-instrument tracks, user layers in DAW |
| Polyphonic input handling | Assume monophonic humming. Model does best-effort on complex input |
| Singing synthesis / lyrics | Instrumental rendition only |
## Traceability
| Requirement | Phase | Status |
|-------------|-------|--------|
| MEL-01 | — | Pending |
| MEL-02 | — | Pending |
| MEL-03 | — | Pending |
| MEL-04 | — | Pending |
| INST-01 | — | Pending |
| INST-02 | — | Pending |
| INST-03 | — | Pending |
| INP-01 | — | Pending |
| INP-02 | — | Pending |
| INP-03 | — | Pending |
| OUT-01 | — | Pending |
| OUT-02 | — | Pending |
| OUT-03 | — | Pending |
| REPR-01 | — | Pending |
| REPR-02 | — | Pending |
| PIPE-01 | — | Pending |
| PIPE-02 | — | Pending |
| PIPE-03 | — | Pending |
**Coverage:**
- v1 requirements: 18 total
- Mapped to phases: 0
- Unmapped: 18
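
The coverage counts above can be derived from the traceability table rather than maintained by hand. The sketch below assumes the three-column layout used in this file and treats an empty cell, a hyphen, or a long dash in the Phase column as "unmapped"; the function name `coverage` is hypothetical.

```python
# Phase-column values that mean "not yet mapped" (\u2014 is the long dash used in the table).
UNMAPPED = {"", "-", "\u2014"}


def coverage(table_md):
    """Count total, mapped, and unmapped requirements in a Markdown pipe table."""
    total = mapped = 0
    for line in table_md.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 3:
            continue  # not a row of the Requirement/Phase/Status table
        req, phase, _status = cells
        # Skip the header row and the |---|---|---| separator row.
        if req == "Requirement" or set(req) <= set("-: "):
            continue
        total += 1
        if phase not in UNMAPPED:
            mapped += 1
    return {"total": total, "mapped": mapped, "unmapped": total - mapped}
```

Run against the table in this file, every row has a placeholder in the Phase column, which is consistent with the 18 total / 0 mapped / 18 unmapped figures listed above.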
---
*Requirements defined: 2026-04-11*
*Last updated: 2026-04-11 after initial definition*