From 5a04fc34989d0215dac4a90ec996d74a74a9dcf0 Mon Sep 17 00:00:00 2001 From: John Lightner Date: Sat, 11 Apr 2026 01:58:51 -0500 Subject: [PATCH] docs(01): capture phase context --- .../phases/01-core-pipeline/01-CONTEXT.md | 67 +++++++++++++++++++ 1 file changed, 67 insertions(+) create mode 100644 .planning/phases/01-core-pipeline/01-CONTEXT.md diff --git a/.planning/phases/01-core-pipeline/01-CONTEXT.md b/.planning/phases/01-core-pipeline/01-CONTEXT.md new file mode 100644 index 0000000..9840df5 --- /dev/null +++ b/.planning/phases/01-core-pipeline/01-CONTEXT.md @@ -0,0 +1,67 @@ +# Phase 1: Core Pipeline - Context + +**Gathered:** 2026-04-11 +**Status:** Ready for planning + + +## Phase Boundary + +Single CLI command that takes a humming WAV file and an instrument name, and produces an instrument rendition via ACE-Step XL-SFT cover mode. Output audibly follows the pitch contour and rhythmic timing of the input. No multi-instrument batch, no config files, no advanced error handling — just the core end-to-end flow. + + + + +## Implementation Decisions + +### CLI invocation design +- Invoked as `python hum2inst.py input.wav --instrument piano` +- Python script directly, no installation step +- `--instrument` as a named CLI flag (not positional) +- `--output` flag optional, defaults to `./output/` directory +- Use Python argparse for argument parsing (gives --help for free) + +### ACE-Step generation parameters +- Default cover_strength in the high fidelity range (0.8-1.0) +- `--strength` flag exposed in Phase 1 so users can experiment immediately +- Duration matches input WAV length by default; `--duration` flag to override +- Caption auto-built from instrument name (e.g., "piano cover of a melody") — no custom prompt flag in Phase 1 + +### Output behavior +- Output filename includes instrument and timestamp (exact format at Claude's discretion) +- On generation failure or silence: print clear error message, exit with non-zero code +- No auto-play — just save and print the output path +- No silent failures + +### Pipeline architecture +- Single `hum2inst.py` script — no module splitting in Phase 1 +- Assume CUDA GPU is available; fail with clear message if no GPU detected +- Move existing experimental scripts (midi_to_audio.py, musicgen_melody.py) to an `/archive` folder + +### Claude's Discretion +- ACE-Step invocation method (import Python API vs subprocess call — choose based on what ACE-Step exposes) +- Progress/feedback during generation (print statements, progress bar, or similar — pick what's appropriate) +- Exact output filename format (instrument + timestamp pattern) +- Exact cover_strength default value within the 0.8-1.0 range + + + + +## Specific Ideas + +- ACE-Step XL-SFT cover mode is the generation backend — this is established from prior experimentation +- The `ace-step/` directory already exists in the project root with the model code +- User wants high melodic fidelity as the default — the pipeline should prioritize faithful melody reproduction over creative interpretation + + + + +## Deferred Ideas + +None — discussion stayed within phase scope + + + +--- + +*Phase: 01-core-pipeline* +*Context gathered: 2026-04-11*