diff --git a/.planning/REQUIREMENTS.md b/.planning/REQUIREMENTS.md
index 9d2d1c0..7d60ddb 100644
--- a/.planning/REQUIREMENTS.md
+++ b/.planning/REQUIREMENTS.md
@@ -84,30 +84,30 @@ Deferred to future release. Tracked but not in current roadmap.
 
 | Requirement | Phase | Status |
 |-------------|-------|--------|
-| MEL-01 | — | Pending |
-| MEL-02 | — | Pending |
-| MEL-03 | — | Pending |
-| MEL-04 | — | Pending |
-| INST-01 | — | Pending |
-| INST-02 | — | Pending |
-| INST-03 | — | Pending |
-| INP-01 | — | Pending |
-| INP-02 | — | Pending |
-| INP-03 | — | Pending |
-| OUT-01 | — | Pending |
-| OUT-02 | — | Pending |
-| OUT-03 | — | Pending |
-| REPR-01 | — | Pending |
-| REPR-02 | — | Pending |
-| PIPE-01 | — | Pending |
-| PIPE-02 | — | Pending |
-| PIPE-03 | — | Pending |
+| MEL-01 | Phase 1 | Pending |
+| MEL-02 | Phase 1 | Pending |
+| MEL-03 | Phase 2 | Pending |
+| MEL-04 | Phase 1 | Pending |
+| INST-01 | Phase 1 | Pending |
+| INST-02 | Phase 2 | Pending |
+| INST-03 | Phase 2 | Pending |
+| INP-01 | Phase 1 | Pending |
+| INP-02 | Phase 3 | Pending |
+| INP-03 | Phase 3 | Pending |
+| OUT-01 | Phase 3 | Pending |
+| OUT-02 | Phase 1 | Pending |
+| OUT-03 | Phase 3 | Pending |
+| REPR-01 | Phase 4 | Pending |
+| REPR-02 | Phase 4 | Pending |
+| PIPE-01 | Phase 1 | Pending |
+| PIPE-02 | Phase 4 | Pending |
+| PIPE-03 | Phase 3 | Pending |
 
 **Coverage:**
 - v1 requirements: 18 total
-- Mapped to phases: 0
-- Unmapped: 18
+- Mapped to phases: 18
+- Unmapped: 0
 
 ---
 *Requirements defined: 2026-04-11*
-*Last updated: 2026-04-11 after initial definition*
+*Last updated: 2026-04-11 after roadmap creation*
diff --git a/.planning/ROADMAP.md b/.planning/ROADMAP.md
new file mode 100644
index 0000000..4851e31
--- /dev/null
+++ b/.planning/ROADMAP.md
@@ -0,0 +1,92 @@
+# Roadmap: AI Music Pipeline
+
+## Overview
+
+This roadmap delivers a voice-to-instrument pipeline built on ACE-Step 1.5 XL-SFT cover mode.
Phase 1 establishes the core end-to-end flow (hum in, instrument out), Phase 2 validates instrument variety and exposes fidelity control, Phase 3 hardens input/output handling, and Phase 4 adds configuration file support and reproducibility via seed control. The result is a single CLI tool that takes a humming WAV and produces high-quality instrument renditions that faithfully follow the input melody.
+
+## Phases
+
+**Phase Numbering:**
+- Integer phases (1, 2, 3, 4): Planned milestone work
+- Decimal phases (e.g., 2.1): Urgent insertions (marked with INSERTED)
+
+- [ ] **Phase 1: Core Pipeline** - End-to-end humming WAV to instrument output via ACE-Step cover mode
+- [ ] **Phase 2: Instrument Variety & Fidelity Control** - Multiple distinct instruments and cover_strength tuning
+- [ ] **Phase 3: Input & Output Robustness** - Sample rate handling, duration detection, CD quality output, error messages
+- [ ] **Phase 4: Configuration & Reproducibility** - TOML config support and seed control for reproducible outputs
+
+## Phase Details
+
+### Phase 1: Core Pipeline
+**Goal**: User can hum a melody, run one command, and get an instrument rendition that audibly follows the melody
+**Depends on**: Nothing (first phase)
+**Requirements**: MEL-01, MEL-02, MEL-04, INST-01, INP-01, OUT-02, PIPE-01
+**Success Criteria** (what must be TRUE):
+  1. User can run a single script/command with a humming WAV file and get instrument audio output
+  2. Output audio audibly follows the pitch contour of the input humming
+  3. Output audio preserves the rhythmic timing of the input humming
+  4. Output sounds like a coherent instrument performance, not garbled noise
+  5.
User can specify the target instrument (e.g., piano, guitar) and the output reflects that instrument
+**Plans**: TBD
+
+Plans:
+- [ ] 01-01: TBD
+- [ ] 01-02: TBD
+- [ ] 01-03: TBD
+
+### Phase 2: Instrument Variety & Fidelity Control
+**Goal**: User can choose from multiple instruments that sound distinctly different, and control how closely the output follows the input melody
+**Depends on**: Phase 1
+**Requirements**: INST-02, INST-03, MEL-03
+**Success Criteria** (what must be TRUE):
+  1. Different instrument prompts (piano, guitar, saxophone, violin, flute) produce audibly different timbres from the same input
+  2. At least 5 distinct instrument types produce usable output
+  3. User can adjust cover_strength parameter and hear the difference -- higher values follow the melody more closely, lower values allow more creative interpretation
+**Plans**: TBD
+
+Plans:
+- [ ] 02-01: TBD
+- [ ] 02-02: TBD
+
+### Phase 3: Input & Output Robustness
+**Goal**: Pipeline handles real-world input files gracefully and produces properly named CD-quality output
+**Depends on**: Phase 1
+**Requirements**: INP-02, INP-03, OUT-01, OUT-03, PIPE-03
+**Success Criteria** (what must be TRUE):
+  1. Input WAV files at 44.1kHz, 48kHz, and 16kHz sample rates all work without errors
+  2. Pipeline auto-detects input audio duration and configures generation duration appropriately
+  3. Output audio is at least 44.1kHz sample rate
+  4. Output filenames include the instrument name and a timestamp (e.g., piano_20260411_143022.wav)
+  5. Clear error message shown when input file is missing, corrupted, or in an unsupported format
+**Plans**: TBD
+
+Plans:
+- [ ] 03-01: TBD
+- [ ] 03-02: TBD
+- [ ] 03-03: TBD
+
+### Phase 4: Configuration & Reproducibility
+**Goal**: User can configure the pipeline via TOML file and reproduce or vary outputs using seed control
+**Depends on**: Phase 1
+**Requirements**: PIPE-02, REPR-01, REPR-02
+**Success Criteria** (what must be TRUE):
+  1.
User can specify instrument, cover_strength, duration, and seed via a TOML config file instead of CLI arguments
+  2. Running the pipeline twice with the same seed, input, and prompt produces identical output
+  3. Running with different seeds produces meaningfully different outputs from the same input and prompt
+**Plans**: TBD
+
+Plans:
+- [ ] 04-01: TBD
+- [ ] 04-02: TBD
+
+## Progress
+
+**Execution Order:**
+Phases execute in numeric order. Phases 2, 3, and 4 all depend on Phase 1 but are independent of each other.
+
+| Phase | Plans Complete | Status | Completed |
+|-------|----------------|--------|-----------|
+| 1. Core Pipeline | 0/3 | Not started | - |
+| 2. Instrument Variety & Fidelity Control | 0/2 | Not started | - |
+| 3. Input & Output Robustness | 0/3 | Not started | - |
+| 4. Configuration & Reproducibility | 0/2 | Not started | - |
diff --git a/.planning/STATE.md b/.planning/STATE.md
new file mode 100644
index 0000000..9fae491
--- /dev/null
+++ b/.planning/STATE.md
@@ -0,0 +1,60 @@
+# Project State
+
+## Project Reference
+
+See: .planning/PROJECT.md (updated 2026-04-11)
+
+**Core value:** A hummed melody input must produce instrument-specific output that audibly follows the melody's contour and rhythm
+**Current focus:** Phase 1: Core Pipeline
+
+## Current Position
+
+Phase: 1 of 4 (Core Pipeline)
+Plan: 0 of 3 in current phase
+Status: Ready to plan
+Last activity: 2026-04-11 -- Roadmap created
+
+Progress: [..........] 0%
+
+## Performance Metrics
+
+**Velocity:**
+- Total plans completed: 0
+- Average duration: -
+- Total execution time: 0 hours
+
+**By Phase:**
+
+| Phase | Plans | Total | Avg/Plan |
+|-------|-------|-------|----------|
+| - | - | - | - |
+
+**Recent Trend:**
+- Last 5 plans: -
+- Trend: -
+
+*Updated after each plan completion*
+
+## Accumulated Context
+
+### Decisions
+
+Decisions are logged in PROJECT.md Key Decisions table.
+Recent decisions affecting current work:
+
+- [Roadmap]: ACE-Step 1.5 XL-SFT cover mode is the sole generation engine for v1. No MusicGen/AudioCraft.
+- [Roadmap]: Phases 2-4 are independent after Phase 1; can be executed in any order.
+
+### Pending Todos
+
+None yet.
+
+### Blockers/Concerns
+
+None yet.
+
+## Session Continuity
+
+Last session: 2026-04-11
+Stopped at: Roadmap created, ready to plan Phase 1
+Resume file: None
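
Note on the Phase 3 success criteria in ROADMAP.md: the items on duration auto-detection, instrument-plus-timestamp output filenames, and clear errors for missing or corrupted input might be approached roughly as sketched below. This is only an illustration under assumptions: the helper names `output_filename` and `wav_duration_seconds` are hypothetical and do not appear in this diff, the input is assumed to be an uncompressed PCM WAV readable by Python's standard `wave` module, and nothing here touches ACE-Step itself.

```python
import wave
from datetime import datetime
from pathlib import Path


def output_filename(instrument: str, now: datetime) -> str:
    """Build a name like piano_20260411_143022.wav (hypothetical helper)."""
    # Normalise the instrument label so it is safe to use in a filename.
    safe = instrument.strip().lower().replace(" ", "_")
    return f"{safe}_{now:%Y%m%d_%H%M%S}.wav"


def wav_duration_seconds(path: Path) -> float:
    """Return the duration of a PCM WAV file in seconds (hypothetical helper)."""
    if not path.is_file():
        raise FileNotFoundError(f"Input file not found: {path}")
    try:
        with wave.open(str(path), "rb") as wav:
            # Duration is the frame count divided by the sample rate.
            return wav.getnframes() / wav.getframerate()
    except (wave.Error, EOFError) as exc:
        # wave.Error covers non-WAV or unsupported data; EOFError covers truncation.
        raise ValueError(f"Not a readable WAV file: {path}") from exc
```

With these assumptions, `output_filename("piano", datetime(2026, 4, 11, 14, 30, 22))` yields the example name given in the roadmap, and the detected duration could be passed on as the generation duration. Raising distinct exceptions for a missing file versus an unreadable one keeps the error messages specific, which is what the fifth Phase 3 criterion asks for.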