From 93bd57d386e66b736dc60ca5dcc28ea17c1bbe51 Mon Sep 17 00:00:00 2001
From: John Lightner
Date: Sat, 11 Apr 2026 01:40:49 -0500
Subject: [PATCH] docs: define v1 requirements

---
 .planning/REQUIREMENTS.md | 113 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 113 insertions(+)
 create mode 100644 .planning/REQUIREMENTS.md

diff --git a/.planning/REQUIREMENTS.md b/.planning/REQUIREMENTS.md
new file mode 100644
index 0000000..9d2d1c0
--- /dev/null
+++ b/.planning/REQUIREMENTS.md
@@ -0,0 +1,113 @@
+# Requirements: AI Music Pipeline
+
+**Defined:** 2026-04-11
+**Core Value:** A hummed melody input must produce instrument-specific output that audibly follows the melody's contour and rhythm
+
+## v1 Requirements
+
+Requirements for initial release. Each maps to roadmap phases.
+
+### Melody Fidelity
+
+- [ ] **MEL-01**: Output audio audibly follows the pitch contour of the hummed input melody
+- [ ] **MEL-02**: Output audio preserves the rhythmic timing and phrasing of the hummed input
+- [ ] **MEL-03**: User can control fidelity vs creativity tradeoff via cover_strength parameter
+- [ ] **MEL-04**: Output is musically coherent — sounds like a real instrument performance, not garbled audio
+
+### Instrument Selection
+
+- [ ] **INST-01**: User can specify target instrument via text prompt (piano, guitar, saxophone, etc.)
+- [ ] **INST-02**: Different instrument prompts produce audibly different timbres in the output
+- [ ] **INST-03**: Instrument selection works across at least 5 distinct instrument types
+
+### Input Handling
+
+- [ ] **INP-01**: Pipeline accepts raw humming WAV audio as input with no manual preprocessing
+- [ ] **INP-02**: Pipeline auto-detects input audio duration and configures output duration appropriately for quality
+- [ ] **INP-03**: Input audio at common sample rates (44.1kHz, 48kHz, 16kHz) is handled without errors
+
+### Output Quality
+
+- [ ] **OUT-01**: Output audio is at least 44.1kHz sample rate (CD quality)
+- [ ] **OUT-02**: Output is saved as WAV file to a user-specified or default output directory
+- [ ] **OUT-03**: Output filenames include instrument name and timestamp for easy identification
+
+### Reproducibility
+
+- [ ] **REPR-01**: User can set a seed value to reproduce identical output from the same input + prompt
+- [ ] **REPR-02**: Different seeds with the same input + prompt produce meaningfully different outputs
+
+### Pipeline Usability
+
+- [ ] **PIPE-01**: Single CLI command or script invocation to go from humming WAV to instrument output
+- [ ] **PIPE-02**: Configuration via TOML file or CLI arguments for instrument, strength, duration, seed
+- [ ] **PIPE-03**: Clear error messages when input file is missing, corrupted, or in unsupported format
+
+## v2 Requirements
+
+Deferred to future release. Tracked but not in current roadmap.
+
+### Multi-Instrument Rendering
+
+- **MULTI-01**: Hum once, generate multiple instrument versions automatically (batch over captions)
+- **MULTI-02**: User specifies list of target instruments, pipeline renders all in one run
+
+### Evaluation & Metrics
+
+- **EVAL-01**: Pitch contour comparison score between input humming and output audio
+- **EVAL-02**: Per-output fidelity report showing how closely melody was followed
+
+### Input Flexibility
+
+- **FLEX-01**: Accept MIDI files as input (render via FluidSynth before generation)
+- **FLEX-02**: Auto-detect BPM from humming input
+- **FLEX-03**: Accept whistling and played-instrument recordings as input
+
+### Style Control
+
+- **STYLE-01**: Caption templates for genre/mood beyond instrument (jazz, rock, classical, etc.)
+- **STYLE-02**: Named parameter presets (faithful / creative / loose interpretation)
+
+## Out of Scope
+
+| Feature | Reason |
+|---------|--------|
+| MusicGen melody debugging via HF transformers | Root cause identified (missing cfg_coef_beta, no Demucs, broken null conditioning). ACE-Step is the better path. |
+| Real-time / streaming generation | Batch inference (~3s on RTX 4090) is fast enough for creative workflow |
+| Model training or fine-tuning | Pretrained models only — no training budget or data |
+| Web/mobile deployment | Local CLI execution on user's machine |
+| Multi-track arrangement generation | Different problem domain — single-instrument tracks, user layers in DAW |
+| Polyphonic input handling | Assume monophonic humming. Model does best-effort on complex input |
+| Singing synthesis / lyrics | Instrumental rendition only |
+
+## Traceability
+
+| Requirement | Phase | Status |
+|-------------|-------|--------|
+| MEL-01 | — | Pending |
+| MEL-02 | — | Pending |
+| MEL-03 | — | Pending |
+| MEL-04 | — | Pending |
+| INST-01 | — | Pending |
+| INST-02 | — | Pending |
+| INST-03 | — | Pending |
+| INP-01 | — | Pending |
+| INP-02 | — | Pending |
+| INP-03 | — | Pending |
+| OUT-01 | — | Pending |
+| OUT-02 | — | Pending |
+| OUT-03 | — | Pending |
+| REPR-01 | — | Pending |
+| REPR-02 | — | Pending |
+| PIPE-01 | — | Pending |
+| PIPE-02 | — | Pending |
+| PIPE-03 | — | Pending |
+
+**Coverage:**
+- v1 requirements: 18 total
+- Mapped to phases: 0
+- Unmapped: 18
+
+---
+*Requirements defined: 2026-04-11*
+*Last updated: 2026-04-11 after initial definition*
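
Reviewer note, not part of the patch: INP-02 (auto-detect input duration) and OUT-03 (instrument name and timestamp in the output filename) are concrete enough to sketch. The following is a minimal illustration using only the Python standard library. The function names, the `outputs` default directory, and the `<instrument>_<YYYYMMDD-HHMMSS>.wav` pattern are all assumptions made for this example; the requirements do not specify them.

```python
import wave
from datetime import datetime
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of an uncompressed PCM WAV file in seconds (INP-02).

    Hypothetical helper: duration is frame count divided by sample rate,
    so it works the same for 16 kHz, 44.1 kHz, and 48 kHz input.
    """
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def output_filename(instrument, when, out_dir="outputs"):
    """Build an output path that embeds instrument and timestamp (OUT-03).

    Hypothetical naming scheme: lowercase the instrument, join words with
    underscores, then append a sortable timestamp.
    """
    slug = "_".join(instrument.lower().split())
    stamp = when.strftime("%Y%m%d-%H%M%S")
    return Path(out_dir) / f"{slug}_{stamp}.wav"
```

Under these assumptions, `output_filename("Electric Guitar", datetime(2026, 4, 11, 1, 40, 49))` would yield a file named `electric_guitar_20260411-014049.wav` in `outputs`. The standard-library `wave` module reads only uncompressed PCM, so other encodings would need a different reader and could be reported through the PIPE-03 error path.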