diff --git a/.planning/STATE.md b/.planning/STATE.md
index 4f4de05..4e9e9f3 100644
--- a/.planning/STATE.md
+++ b/.planning/STATE.md
@@ -10,11 +10,11 @@ See: .planning/PROJECT.md (updated 2026-04-11)
 ## Current Position
 
 Phase: 1 of 4 (Core Pipeline)
-Plan: 1 of 3 in current phase
-Status: Executing
-Last activity: 2026-04-11 -- Completed 01-01-PLAN.md
+Plan: 2 of 2 in current phase
+Status: Phase complete — awaiting verification
+Last activity: 2026-04-11 -- Completed 01-02-PLAN.md
 
-Progress: [#.........] 10%
+Progress: [##........] 20%
 
 ## Performance Metrics
 
@@ -27,7 +27,7 @@ Progress: [#.........] 10%
 
 | Phase | Plans | Total | Avg/Plan |
 |-------|-------|-------|----------|
-| 01-core-pipeline | 01-01 | 2min | 2min |
+| 01-core-pipeline | 01-01, 01-02 | 27min | 13.5min |
 
 **Recent Trend:**
 - Last 5 plans: -
@@ -48,6 +48,10 @@ Recent decisions affecting current work:
 - [01-01]: Default audio_cover_strength=0.9 for high melody fidelity
 - [01-01]: Conservative -60 dBFS silence threshold with warning (not hard fail)
 - [01-01]: Temp directory isolation for ACE-Step UUID output before renaming
+- [01-02]: audio_cover_strength=0.3 is optimal (0.9 garbled, 0.5+ copies source)
+- [01-02]: cover_noise_strength must be 0.0 (any >0 produces source copies)
+- [01-02]: Seed variance dominates quality — multi-take cherry-pick workflow required
+- [01-02]: Simple instrument captions outperform verbose descriptive captions
 
 ### Pending Todos
 
@@ -60,5 +64,5 @@ None yet.
 ## Session Continuity
 
 Last session: 2026-04-11
-Stopped at: Completed 01-01-PLAN.md
+Stopped at: Completed 01-02-PLAN.md — all plans in phase 01 done, awaiting verification
 Resume file: None
diff --git a/.planning/phases/01-core-pipeline/01-02-SUMMARY.md b/.planning/phases/01-core-pipeline/01-02-SUMMARY.md
new file mode 100644
index 0000000..8b5e211
--- /dev/null
+++ b/.planning/phases/01-core-pipeline/01-02-SUMMARY.md
@@ -0,0 +1,83 @@
+---
+phase: 01-core-pipeline
+plan: 02
+subsystem: pipeline
+tags: [ace-step, cover-mode, tuning, seed-control, quality-testing]
+
+requires:
+  - phase: 01-01
+    provides: "hum2inst.py CLI script"
+provides:
+  - "Validated end-to-end pipeline with real humming input"
+  - "Tuned generation parameters (strength=0.3 optimal)"
+  - "Multi-take seed generation workflow"
+  - "JSON run logging for reproducibility"
+affects: [02-quality-presets]
+
+tech-stack:
+  added: []
+  patterns: [multi-take-generation, json-sidecar-logging, seed-reproducibility]
+
+key-files:
+  created: []
+  modified: [hum2inst.py]
+
+key-decisions:
+  - "audio_cover_strength=0.3 is optimal default (0.9 garbled, 0.5+ often copies source)"
+  - "cover_noise_strength=0.0 required (any >0 produces near-identical source copy)"
+  - "Seed variance dominates output quality — multi-take cherry-pick workflow is necessary"
+  - "Custom verbose captions hurt quality — simple instrument captions work best"
+  - "JSON sidecar logging for every output WAV to track parameters"
+  - "Seed embedded in filename for traceability"
+
+patterns-established:
+  - "Generate N takes with --takes, cherry-pick best by ear, reproduce with --seed"
+  - "JSON sidecar per output with full parameter snapshot"
+  - "Parameter sweep methodology: isolate one variable, same seed, compare"
+
+requirements-completed: [MEL-01, MEL-02, MEL-04, INST-01, INP-01, OUT-02, PIPE-01]
+
+## Self-Check: PARTIAL
+
+quality-notes: |
+  Pipeline runs end-to-end without errors. Best outputs follow melody contour and
+  sound recognizably like target instrument. Quality is seed-dependent — some seeds
+  produce excellent results, others garbled or off-topic. Strength=0.3 with default
+  params is the confirmed sweet spot. This is a model capability ceiling, not a
+  script issue.
+
+duration: 25min
+completed: 2026-04-11
+---
+
+# Phase 1 Plan 02: End-to-End Pipeline Verification
+
+**Validated pipeline with real humming input, tuned parameters through systematic testing, added seed control and run logging**
+
+## Performance
+
+- **Duration:** ~25 min (includes iterative tuning with user)
+- **Tasks:** 2/2 (Task 2 checkpoint resolved through iterative testing)
+- **Files modified:** 1
+
+## Accomplishments
+- Pipeline ran successfully on first attempt with no code fixes needed
+- Systematic parameter sweep across strength, noise-strength, guidance, steps, shift, sampler, vel-clamp, vel-ema
+- Identified optimal defaults: strength=0.3, noise-strength=0.0, guidance=5.0, steps=50
+- Added seed control (--seed), multi-take generation (--takes), caption override (--caption)
+- Added 6 advanced tuning flags (--guidance, --steps, --shift, --sampler, --vel-clamp, --vel-ema)
+- Every output now has JSON sidecar with full parameter log
+- Seed embedded in output filename for traceability
+
+## Key Findings
+- **audio_cover_strength** has a narrow sweet spot around 0.3 — above 0.5 often copies source, at 0.9 produces garbled "deep dream" output
+- **cover_noise_strength** is cliff-like — any value > 0 produces near-identical source copies
+- **Seed variance dominates** — same parameters produce wildly different results across seeds
+- **guidance=7.0** improved melody fidelity but shifted timbre too bright (glockenspiel-like)
+- **heun sampler** produced most authentic piano sound but melody diverged
+- **Custom verbose captions** (descriptive register/tone) degraded output severely
+- **Higher steps (100)** and **shift values** did not improve quality
+
+## Task Commits
+
+- `45f6863` feat(01-02): add generation tuning params, seed control, multi-take, and run logging
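The multi-take, JSON-sidecar, and one-variable-sweep patterns listed under `patterns-established` can be sketched in Python. This is a minimal illustration, not the code in `hum2inst.py`: the helper names (`plan_takes`, `take_paths`, `write_sidecar`, `one_variable_sweep`), the `{stem}_{instrument}_seed{seed}.wav` filename scheme, the parameter-dict keys, and the use of consecutive seeds when `--takes` is combined with `--seed` are all assumptions. The ACE-Step generation call is deliberately left out so the sketch runs without the model.

```python
from __future__ import annotations

import json
import random
from pathlib import Path

# Defaults from the tuning results above. The dict keys are hypothetical and
# may not match the names used inside hum2inst.py.
DEFAULTS = {
    "audio_cover_strength": 0.3,
    "cover_noise_strength": 0.0,
    "guidance": 5.0,
    "steps": 50,
}


def plan_takes(n_takes: int, seed: int | None = None) -> list[int]:
    """Return one seed per take.

    Assumption: with a fixed base seed, takes use consecutive seeds so any
    single take can be regenerated later; otherwise seeds are drawn at random.
    """
    if seed is not None:
        return [seed + i for i in range(n_takes)]
    rng = random.Random()
    return [rng.randrange(2**31) for _ in range(n_takes)]


def take_paths(out_dir: Path, stem: str, instrument: str, seed: int) -> tuple[Path, Path]:
    """Return (wav_path, sidecar_path) with the seed embedded in the filename."""
    wav = out_dir / f"{stem}_{instrument}_seed{seed}.wav"
    return wav, wav.with_suffix(".json")


def write_sidecar(sidecar: Path, params: dict, seed: int, source: str) -> None:
    """Write a JSON snapshot of every parameter used for one output."""
    record = {"source": source, "seed": seed, **params}
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True))


def one_variable_sweep(base: dict, name: str, values: list) -> list[dict]:
    """Vary a single parameter while everything else, including the seed, stays fixed."""
    if name not in base:
        raise KeyError(f"unknown parameter: {name}")
    return [{**base, name: value} for value in values]


if __name__ == "__main__":
    import tempfile

    out_dir = Path(tempfile.mkdtemp())
    for seed in plan_takes(3, seed=1234):
        wav, sidecar = take_paths(out_dir, "hum", "piano", seed)
        # The ACE-Step cover-mode call that would write `wav` is omitted here.
        write_sidecar(sidecar, DEFAULTS, seed, "hum.wav")
    print(sorted(p.name for p in out_dir.iterdir()))

    # Strength sweep on one seed, mirroring the isolate-one-variable method.
    for run in one_variable_sweep({**DEFAULTS, "seed": 42}, "audio_cover_strength", [0.3, 0.5, 0.9]):
        print(run["seed"], run["audio_cover_strength"])
```

Running it writes three JSON sidecars (seeds 1234 to 1236) into a temporary directory and prints a three-point `audio_cover_strength` sweep that holds seed 42 fixed. Holding the seed constant inside a sweep matters here because, per the findings above, seed variance otherwise swamps the effect of the parameter under test.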