docs(01-core-pipeline): create phase plan

2026-04-11 02:07:27 -05:00 · 2026-04-11 02:07:27 -05:00 · b9cc6121f3
commit b9cc6121f3
parent b9932b08b4
3 changed files with 344 additions and 4 deletions
--- a/.planning/ROADMAP.md
+++ b/.planning/ROADMAP.md
@ -27,12 +27,11 @@ This roadmap delivers a voice-to-instrument pipeline built on ACE-Step 1.5 XL-SF
  3. Output audio preserves the rhythmic timing of the input humming
  4. Output sounds like a coherent instrument performance, not garbled noise
  5. User can specify the target instrument (e.g., piano, guitar) and the output reflects that instrument
-**Plans**: TBD
+**Plans**: 2 plans
 Plans:
- [ ] 01-01: TBD
+- [ ] 01-01-PLAN.md — Build hum2inst.py CLI pipeline and archive old scripts
- [ ] 01-02: TBD
+- [ ] 01-02-PLAN.md — End-to-end test and human verification of melody fidelity
 - [ ] 01-03: TBD
 ### Phase 2: Instrument Variety & Fidelity Control
 **Goal**: User can choose from multiple instruments that sound distinctly different, and control how closely the output follows the input melody
--- a/.planning/phases/01-core-pipeline/01-01-PLAN.md
+++ b/.planning/phases/01-core-pipeline/01-01-PLAN.md
@ -0,0 +1,197 @@
 ---
 phase: 01-core-pipeline
 plan: 01
 type: execute
 wave: 1
 depends_on: []
 files_modified:
  - hum2inst.py
  - archive/midi_to_audio.py
  - archive/musicgen_melody.py
 autonomous: true
 requirements:
  - MEL-01
  - MEL-02
  - MEL-04
  - INST-01
  - INP-01
  - OUT-02
  - PIPE-01
 must_haves:
  truths:
    - "User can run `python hum2inst.py input.wav --instrument piano` and get a WAV file in ./output/"
    - "Script reads input WAV duration and passes it to ACE-Step so output length matches input"
    - "Script builds a caption from the instrument name that guides timbre selection"
    - "Script checks for CUDA GPU and exits with clear error if unavailable"
    - "Script detects silent/failed output and exits with non-zero code"
    - "Output WAV filename includes instrument name and timestamp"
    - "Old experimental scripts are moved to /archive and no longer in project root"
  artifacts:
    - path: "hum2inst.py"
      provides: "Complete CLI pipeline: argparse, CUDA check, ACE-Step init, generation, output rename, error handling"
      min_lines: 100
    - path: "archive/midi_to_audio.py"
      provides: "Archived experimental script"
    - path: "archive/musicgen_melody.py"
      provides: "Archived experimental script"
  key_links:
    - from: "hum2inst.py"
      to: "acestep.inference.generate_music"
      via: "direct Python import with sys.path insert"
      pattern: "from acestep\\.inference import"
    - from: "hum2inst.py"
      to: "acestep.handler.AceStepHandler"
      via: "handler initialization with XL-SFT config"
      pattern: "initialize_service.*acestep-v15-xl-sft"
    - from: "hum2inst.py"
      to: "torchaudio.info"
      via: "duration detection from input WAV"
      pattern: "torchaudio\\.info"
 ---
 <objective>
 Build the complete hum-to-instrument CLI pipeline as a single Python script, and archive old experimental scripts.
 Purpose: Deliver the core end-to-end functionality -- a user hums into a WAV file, runs one command, and gets an instrument rendition that follows their melody.
 Output: Working `hum2inst.py` script at project root, old scripts moved to `archive/`.
 </objective>
 <execution_context>
@C:/Users/jlightner/.claude/get-shit-done/workflows/execute-plan.md
@C:/Users/jlightner/.claude/get-shit-done/templates/summary.md
 </execution_context>
 <context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/01-core-pipeline/01-CONTEXT.md
@.planning/phases/01-core-pipeline/01-RESEARCH.md
 </context>
 <tasks>
 <task type="auto">
  <name>Task 1: Archive experimental scripts</name>
  <files>archive/midi_to_audio.py, archive/musicgen_melody.py</files>
  <action>
 Create an `archive/` directory at the project root. Move `midi_to_audio.py` and `musicgen_melody.py` from the project root into `archive/`. Use git mv if the repo tracks these files, otherwise use regular file move.
 Verify both files exist at their new locations and are removed from the project root.
  </action>
  <verify>
    <automated>ls archive/midi_to_audio.py archive/musicgen_melody.py && test ! -f midi_to_audio.py && test ! -f musicgen_melody.py && echo "PASS"</automated>
  </verify>
  <done>Both experimental scripts exist in archive/ and are gone from project root</done>
 </task>
 <task type="auto">
  <name>Task 2: Create hum2inst.py CLI pipeline script</name>
  <files>hum2inst.py</files>
  <action>
 Create `hum2inst.py` at the project root. This is a single-file CLI script that wraps ACE-Step XL-SFT cover mode. The script must implement ALL of the following (per user decisions in CONTEXT.md):
 **CLI interface (argparse):**
 - Positional argument: input WAV file path
 - `--instrument` (required): target instrument name (e.g., piano, guitar, saxophone)
 - `--output` (optional): output directory, defaults to `./output/`
 - `--strength` (optional): audio_cover_strength float, defaults to 0.9 (high fidelity per user preference, within locked range 0.8-1.0)
 - `--duration` (optional): override output duration in seconds; if not provided, auto-detect from input WAV
 **Startup checks:**
 - Validate input WAV file exists and is readable; exit 1 with clear message if not
 - Check `torch.cuda.is_available()`; exit 1 with message "ERROR: CUDA GPU required. No CUDA-capable GPU detected." if false
 - Create output directory if it doesn't exist
 **ACE-Step initialization (per RESEARCH.md Pattern 1):**
 - Add ace-step directory to sys.path: `sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), "ace-step"))`
 - Import from acestep: `AceStepHandler`, `LLMHandler`, `GenerationParams`, `GenerationConfig`, `generate_music`, `get_gpu_config`, `set_global_gpu_config`
 - Call `get_gpu_config()` and `set_global_gpu_config()`
 - Initialize `AceStepHandler` and call `initialize_service()` with `project_root` pointing to the ace-step directory, `config_path="acestep-v15-xl-sft"`, `device="cuda"`, and `use_flash_attention` from handler detection
 - Initialize `LLMHandler()` (instantiate only, no model load -- cover mode is in skip_lm_tasks)
 **Duration detection (per RESEARCH.md Pattern 3):**
 - Use `torchaudio.info(input_path)` to get `num_frames` and `sample_rate`
 - Compute duration as `num_frames / sample_rate`
 - Round to nearest integer for the `duration` parameter
 - Print detected duration: "Input duration: {duration}s"
 **Caption construction (per RESEARCH.md Pattern 4 + code examples):**
 - Include a dictionary of curated caption templates for common instruments: piano, guitar, saxophone, violin, flute (use the exact templates from RESEARCH.md code examples section)
 - For instruments not in the dictionary, use generic fallback: `f"solo {instrument}, clear and expressive melody, warm tone"`
 - Print the caption being used: "Caption: {caption}"
 **Generation (per RESEARCH.md Pattern 2):**
 - Create `GenerationParams` with:
  - `task_type="cover"`
  - `src_audio` = absolute path to input WAV
  - `caption` = built caption string
  - `lyrics=""`
  - `instrumental=True`
  - `duration` = detected or overridden duration (rounded int)
  - `audio_cover_strength` = user's --strength value (default 0.9)
  - `inference_steps=50` (XL-SFT model requirement -- NOT 8 which is turbo)
  - `guidance_scale=5.0`
  - `thinking=False` (no LLM needed for cover)
  - `bpm=120` (default, not critical for cover mode)
 - Create `GenerationConfig` with `batch_size=1`, `use_random_seed=True`, `audio_format="wav"`
 - Call `generate_music(dit_handler, llm_handler, params, config, save_dir=temp_output_dir)` where temp_output_dir is a temporary subdirectory to isolate ACE-Step's UUID-named output
 - Print progress: "Generating {instrument} cover..." before the call
 **Output handling (per RESEARCH.md Pitfall 2 + user decisions):**
 - After generation, check `result.success` and `result.audios` -- if failed, print error to stderr and exit 1
 - Get the generated file path from the result
 - Rename/copy to user-friendly filename: `{instrument}_{YYYYMMDD}_{HHMMSS}.wav` in the user's output directory
 - Use `datetime.now().strftime("%Y%m%d_%H%M%S")` for the timestamp
 - Print the final output path: "Output saved: {path}"
 **Silence detection (per RESEARCH.md Pitfall 4 + user decisions):**
 - After generation, load the output audio with torchaudio and compute RMS energy
 - If RMS is below a conservative threshold (e.g., -60 dBFS), print warning to stderr: "WARNING: Output audio appears to be near-silent. The generation may have failed."
 - Do NOT hard-fail on borderline cases -- just warn (per research recommendation)
 **Error handling (per user decisions):**
 - Wrap the main execution in try/except
 - On any exception during generation: print clear error message to stderr and exit 1
 - No silent failures -- every error path must print something useful
 **Progress feedback (Claude's discretion):**
 - Print status messages at key stages: loading model, detecting duration, generating, saving output
 - Keep it simple -- print statements, no progress bar library needed
  </action>
  <verify>
    <automated>python -c "import ast; ast.parse(open('hum2inst.py').read()); print('SYNTAX OK')" && grep -q "argparse" hum2inst.py && grep -q "generate_music" hum2inst.py && grep -q "cover" hum2inst.py && grep -q "instrument" hum2inst.py && grep -q "torchaudio" hum2inst.py && echo "PASS"</automated>
    <manual>Review script structure: argparse setup, CUDA check, ACE-Step init, generation call, output rename, error handling all present</manual>
  </verify>
  <done>hum2inst.py exists at project root with valid Python syntax, contains argparse CLI with --instrument/--output/--strength/--duration flags, imports ACE-Step API, uses cover mode with XL-SFT config, detects input duration, builds caption from instrument name, renames output with instrument+timestamp, checks for CUDA, detects silence, and handles errors with non-zero exit codes</done>
 </task>
 </tasks>
 <verification>
 1. `hum2inst.py` exists at project root and passes Python syntax check
 2. Script imports from acestep package (generate_music, AceStepHandler, etc.)
 3. Script uses argparse with required --instrument flag and optional --output, --strength, --duration
 4. Script checks CUDA availability before model loading
 5. Old scripts archived in archive/ directory
 6. Script contains cover mode configuration (task_type="cover", inference_steps=50)
 7. Script contains duration detection via torchaudio
 8. Script contains output file renaming with instrument + timestamp pattern
 </verification>
 <success_criteria>
 - hum2inst.py is syntactically valid Python
 - All required CLI flags are present (--instrument, --output, --strength, --duration)
 - ACE-Step cover mode is configured correctly (XL-SFT, 50 steps, cover task type)
 - Input duration auto-detection is implemented
 - Output naming includes instrument and timestamp
 - CUDA check is present
 - Silence detection is present
 - Error handling covers generation failure, missing input, no GPU
 - Old scripts moved to archive/
 </success_criteria>
 <output>
 After completion, create `.planning/phases/01-core-pipeline/01-01-SUMMARY.md`
 </output>
--- a/.planning/phases/01-core-pipeline/01-02-PLAN.md
+++ b/.planning/phases/01-core-pipeline/01-02-PLAN.md
@ -0,0 +1,144 @@
 ---
 phase: 01-core-pipeline
 plan: 02
 type: execute
 wave: 2
 depends_on:
  - 01-01
 files_modified:
  - hum2inst.py
 autonomous: false
 requirements:
  - MEL-01
  - MEL-02
  - MEL-04
  - INST-01
  - INP-01
  - OUT-02
  - PIPE-01
 must_haves:
  truths:
    - "Running hum2inst.py with a real humming WAV produces an output WAV file"
    - "Output audio audibly follows the pitch contour of the input humming"
    - "Output audio preserves the rhythmic timing of the input humming"
    - "Output sounds like the specified instrument, not garbled noise"
    - "Output filename contains the instrument name and a timestamp"
  artifacts:
    - path: "output/"
      provides: "Generated instrument WAV file from test run"
  key_links:
    - from: "hum2inst.py"
      to: "output/*.wav"
      via: "end-to-end generation from humming input"
      pattern: "Output saved:"
 ---
 <objective>
 Run the complete pipeline end-to-end with a real humming WAV file and verify the output meets quality requirements.
 Purpose: Validate that the script actually produces instrument audio that follows the hummed melody -- the core value proposition of the project.
 Output: A verified instrument rendition WAV file and confirmation that the pipeline works.
 </objective>
 <execution_context>
@C:/Users/jlightner/.claude/get-shit-done/workflows/execute-plan.md
@C:/Users/jlightner/.claude/get-shit-done/templates/summary.md
 </execution_context>
 <context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/01-core-pipeline/01-CONTEXT.md
@.planning/phases/01-core-pipeline/01-01-SUMMARY.md
 </context>
 <tasks>
 <task type="auto">
  <name>Task 1: Run end-to-end pipeline test</name>
  <files>hum2inst.py</files>
  <action>
 Run the hum2inst.py script with a real humming WAV file from the input/ directory. Use the ace-step venv Python interpreter.
 Execute this command:
 ```
 ace-step/.venv/Scripts/python.exe hum2inst.py "input/humming_step_down_jl [2026-04-11 004915].wav" --instrument piano
 ```
 Verify:
 1. The script runs without errors
 2. An output WAV file is created in ./output/ with a filename containing "piano" and a timestamp
 3. The script prints status messages (loading model, detecting duration, generating, output path)
 4. Exit code is 0
 If the script fails:
 - Read the error message carefully
 - Fix the issue in hum2inst.py (common issues: import paths, API parameter names, missing attributes on result object)
 - Re-run until it succeeds
 - Document any fixes made
 After successful run, also test the --help flag to verify argparse setup:
 ```
 ace-step/.venv/Scripts/python.exe hum2inst.py --help
 ```
  </action>
  <verify>
    <automated>ls output/piano_*.wav 2>/dev/null | head -1 | xargs -I{} test -f "{}" && echo "PASS" || echo "FAIL: no piano output WAV found"</automated>
    <manual>Check script console output shows: duration detection, caption, generation progress, output path</manual>
  </verify>
  <done>Pipeline produces a piano WAV output file from real humming input without errors</done>
 </task>
 <task type="checkpoint:human-verify" gate="blocking">
  <name>Task 2: Human verification of melody fidelity and instrument quality</name>
  <files>output/piano_*.wav</files>
  <action>
 Present the generated output to the user for listening verification. The user needs to compare the input humming with the generated instrument output.
 What was built: Complete hum-to-instrument pipeline -- hum2inst.py takes a humming WAV and produces an instrument rendition via ACE-Step XL-SFT cover mode.
 How to verify:
 1. Listen to the INPUT humming file: `input/humming_step_down_jl [2026-04-11 004915].wav`
 2. Listen to the OUTPUT piano file in `output/` (the latest piano_*.wav file)
 3. Verify these qualities:
   - Does the output follow the same melody (pitch contour) as the humming? (MEL-01)
   - Does the output preserve the timing/rhythm of the humming? (MEL-02)
   - Does the output sound like a coherent piano performance, not garbled noise? (MEL-04)
   - Is the output audibly a piano (not generic synth or noise)? (INST-01)
 4. Optionally, try a second instrument:
   `ace-step/.venv/Scripts/python.exe hum2inst.py "input/humming_step_down_jl [2026-04-11 004915].wav" --instrument guitar`
   Listen and verify it sounds distinctly different from the piano output.
 Resume signal: Type "approved" if the output sounds good, or describe any issues with melody fidelity, instrument quality, or other problems.
  </action>
  <verify>
    <automated>ls output/piano_*.wav 2>/dev/null && echo "Output file exists"</automated>
    <manual>Human listens and confirms melody fidelity, rhythm preservation, instrument quality, and musical coherence</manual>
  </verify>
  <done>Human confirms output audio follows the hummed melody's pitch contour and rhythm, sounds like the specified instrument, and is musically coherent</done>
 </task>
 </tasks>
 <verification>
 1. hum2inst.py runs successfully with real humming input
 2. Output WAV file exists with correct naming convention
 3. Human confirms output follows input melody contour
 4. Human confirms output preserves rhythmic timing
 5. Human confirms output sounds like specified instrument
 6. Human confirms output is musically coherent
 </verification>
 <success_criteria>
 - Pipeline runs end-to-end without errors on real humming input
 - Output WAV file produced with instrument+timestamp filename
 - Human verifies melody fidelity (pitch contour and rhythm preserved)
 - Human verifies instrument quality (sounds like specified instrument)
 - Human verifies musical coherence (not garbled noise)
 </success_criteria>
 <output>
 After completion, create `.planning/phases/01-core-pipeline/01-02-SUMMARY.md`
 </output>