docs: define v1 requirements
Commit 93bd57d386 (parent b51cf7cd6b)
1 changed file with 113 additions and 0 deletions
.planning/REQUIREMENTS.md (normal file, @@ -0,0 +1,113 @@)

# Requirements: AI Music Pipeline

**Defined:** 2026-04-11
**Core Value:** A hummed melody input must produce instrument-specific output that audibly follows the melody's contour and rhythm

## v1 Requirements

Requirements for the initial release. Each requirement maps to a roadmap phase.

### Melody Fidelity

- [ ] **MEL-01**: Output audio audibly follows the pitch contour of the hummed input melody
- [ ] **MEL-02**: Output audio preserves the rhythmic timing and phrasing of the hummed input
- [ ] **MEL-03**: User can control the fidelity-versus-creativity tradeoff via the `cover_strength` parameter
- [ ] **MEL-04**: Output is musically coherent: it sounds like a real instrument performance, not garbled audio

### Instrument Selection

- [ ] **INST-01**: User can specify the target instrument via a text prompt (piano, guitar, saxophone, etc.)
- [ ] **INST-02**: Different instrument prompts produce audibly different timbres in the output
- [ ] **INST-03**: Instrument selection works across at least 5 distinct instrument types

### Input Handling

- [ ] **INP-01**: Pipeline accepts raw humming WAV audio as input with no manual preprocessing
- [ ] **INP-02**: Pipeline auto-detects the input audio duration and configures an output duration appropriate for quality
- [ ] **INP-03**: Input audio at common sample rates (44.1 kHz, 48 kHz, 16 kHz) is handled without errors
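
One way INP-02 and INP-03 could be approached is to read the sample rate and duration straight from the WAV header before generation. The sketch below uses only Python's standard `wave` module, which handles PCM-encoded files; the function name `probe_wav` is a hypothetical helper, not part of any existing pipeline code.

```python
import tempfile
import wave
from pathlib import Path


def probe_wav(path):
    """Return (sample_rate_hz, duration_seconds) read from a PCM WAV header."""
    with wave.open(str(path), "rb") as wav:
        sample_rate = wav.getframerate()
        n_frames = wav.getnframes()
    return sample_rate, n_frames / sample_rate


# Demo: write one second of 16 kHz mono silence, then probe it.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "hum.wav"
    with wave.open(str(demo_path), "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 16-bit samples
        wav.setframerate(16000)   # 16 kHz
        wav.writeframes(b"\x00\x00" * 16000)
    rate, seconds = probe_wav(demo_path)

print(rate, seconds)  # prints "16000 1.0"
```

The detected duration could then drive the output duration, and the detected sample rate could decide whether resampling is needed before the model sees the audio.
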
### Output Quality

- [ ] **OUT-01**: Output audio has a sample rate of at least 44.1 kHz (CD quality)
- [ ] **OUT-02**: Output is saved as a WAV file to a user-specified or default output directory
- [ ] **OUT-03**: Output filenames include the instrument name and a timestamp for easy identification
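
As an illustration of OUT-02 and OUT-03, the sketch below builds an output path from an instrument name and a timestamp. The naming pattern `<instrument>_<YYYYMMDD-HHMMSS>.wav`, the default directory `outputs`, and the helper name `build_output_path` are all assumptions made for this example.

```python
import re
from datetime import datetime
from pathlib import Path


def build_output_path(instrument, output_dir="outputs", now=None):
    """Return a Path like outputs/electric-guitar_20260411-093005.wav."""
    now = now or datetime.now()
    # Lowercase the instrument and collapse anything non-alphanumeric to "-".
    slug = re.sub(r"[^a-z0-9]+", "-", instrument.lower()).strip("-") or "instrument"
    stamp = now.strftime("%Y%m%d-%H%M%S")
    return Path(output_dir) / f"{slug}_{stamp}.wav"


example = build_output_path("Electric Guitar", "out", datetime(2026, 4, 11, 9, 30, 5))
print(example.name)  # prints "electric-guitar_20260411-093005.wav"
```

Passing `now` explicitly keeps the function deterministic in tests; in normal use it falls back to the current time.
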
### Reproducibility

- [ ] **REPR-01**: User can set a seed value to reproduce identical output from the same input + prompt
- [ ] **REPR-02**: Different seeds with the same input + prompt produce meaningfully different outputs
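
The two seed requirements can be illustrated with a toy stand-in for the generator's random sampling. This is only a sketch of the seeding pattern using Python's standard `random` module; `sample_noise` is hypothetical and says nothing about how the actual generation model consumes its seed.

```python
import random


def sample_noise(seed, n=8):
    """Draw n Gaussian samples from a generator private to this call."""
    rng = random.Random(seed)  # no shared global state, so calls cannot interfere
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


same_a = sample_noise(42)
same_b = sample_noise(42)
other = sample_noise(43)

print(same_a == same_b)  # REPR-01: identical seed, identical samples
print(same_a != other)   # REPR-02: different seed, different samples
```

Using a per-call generator instead of seeding global state is a design choice that keeps one run's seed from leaking into another.
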
### Pipeline Usability

- [ ] **PIPE-01**: A single CLI command or script invocation goes from humming WAV to instrument output
- [ ] **PIPE-02**: Configuration via a TOML file or CLI arguments for instrument, strength, duration, and seed
- [ ] **PIPE-03**: Clear error messages when the input file is missing, corrupted, or in an unsupported format
## v2 Requirements

Deferred to a future release. Tracked, but not in the current roadmap.

### Multi-Instrument Rendering

- **MULTI-01**: Hum once, generate multiple instrument versions automatically (batch over captions)
- **MULTI-02**: User specifies a list of target instruments; the pipeline renders all of them in one run

### Evaluation & Metrics

- **EVAL-01**: Pitch contour comparison score between input humming and output audio
- **EVAL-02**: Per-output fidelity report showing how closely melody was followed
### Input Flexibility

- **FLEX-01**: Accept MIDI files as input (render via FluidSynth before generation)
- **FLEX-02**: Auto-detect BPM from humming input
- **FLEX-03**: Accept whistling and played-instrument recordings as input

### Style Control

- **STYLE-01**: Caption templates for genre/mood beyond instrument (jazz, rock, classical, etc.)
- **STYLE-02**: Named parameter presets (faithful / creative / loose interpretation)

## Out of Scope

| Feature | Reason |
|---------|--------|
| MusicGen melody debugging via HF transformers | Root cause identified (missing `cfg_coef_beta`, no Demucs, broken null conditioning). ACE-Step is the better path. |
| Real-time / streaming generation | Batch inference (~3 s on an RTX 4090) is fast enough for a creative workflow |
| Model training or fine-tuning | Pretrained models only; there is no training budget or data |
| Web/mobile deployment | Local CLI execution on the user's machine |
| Multi-track arrangement generation | Different problem domain: the pipeline produces single-instrument tracks and the user layers them in a DAW |
| Polyphonic input handling | Monophonic humming is assumed. The model does best-effort on complex input |
| Singing synthesis / lyrics | Instrumental rendition only |

## Traceability

| Requirement | Phase | Status |
|-------------|-------|--------|
| MEL-01 | — | Pending |
| MEL-02 | — | Pending |
| MEL-03 | — | Pending |
| MEL-04 | — | Pending |
| INST-01 | — | Pending |
| INST-02 | — | Pending |
| INST-03 | — | Pending |
| INP-01 | — | Pending |
| INP-02 | — | Pending |
| INP-03 | — | Pending |
| OUT-01 | — | Pending |
| OUT-02 | — | Pending |
| OUT-03 | — | Pending |
| REPR-01 | — | Pending |
| REPR-02 | — | Pending |
| PIPE-01 | — | Pending |
| PIPE-02 | — | Pending |
| PIPE-03 | — | Pending |

**Coverage:**
- v1 requirements: 18 total
- Mapped to phases: 0
- Unmapped: 18
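
The coverage counts could be recomputed from the Markdown itself rather than maintained by hand. The sketch below assumes the checkbox and table-row layouts used in this file and treats a dash or empty Phase cell as unmapped; the function name `coverage` and both regular expressions are illustrative only.

```python
import re

# "- [ ] **MEL-01**: ..." style v1 checklist lines.
CHECKBOX = re.compile(r"^- \[[ x]\] \*\*([A-Z]+-\d+)\*\*", re.MULTILINE)
# "| MEL-01 | <phase> | <status> |" style traceability rows.
TRACE_ROW = re.compile(r"^\| ([A-Z]+-\d+) \| (.+?) \| (.+?) \|$", re.MULTILINE)

UNMAPPED_MARKERS = {"", "-", "\u2014"}  # "\u2014" is the dash used in the Phase column


def coverage(markdown):
    """Count v1 requirements and how many have a phase in the traceability table."""
    v1_ids = CHECKBOX.findall(markdown)
    phases = {req: phase.strip() for req, phase, _status in TRACE_ROW.findall(markdown)}
    mapped = [r for r in v1_ids if phases.get(r, "") not in UNMAPPED_MARKERS]
    return {"total": len(v1_ids), "mapped": len(mapped), "unmapped": len(v1_ids) - len(mapped)}


sample = (
    "- [ ] **MEL-01**: first requirement\n"
    "- [x] **MEL-02**: second requirement\n"
    "\n"
    "| Requirement | Phase | Status |\n"
    "|-------------|-------|--------|\n"
    "| MEL-01 | \u2014 | Pending |\n"
    "| MEL-02 | Phase 1 | Done |\n"
)
result = coverage(sample)
print(result)  # prints "{'total': 2, 'mapped': 1, 'unmapped': 1}"
```
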

---
*Requirements defined: 2026-04-11*
*Last updated: 2026-04-11 after initial definition*