docs(01-02): complete end-to-end verification plan with tuning findings

2026-04-11 03:25:42 -05:00 · 38331ca59a
commit 38331ca59a
parent 45f6863131
2 changed files with 93 additions and 6 deletions
--- a/.planning/STATE.md
+++ b/.planning/STATE.md
@@ -10,11 +10,11 @@ See: .planning/PROJECT.md (updated 2026-04-11)
 ## Current Position
 Phase: 1 of 4 (Core Pipeline)
-Plan: 1 of 3 in current phase
+Plan: 2 of 2 in current phase
-Status: Executing
+Status: Phase complete — awaiting verification
-Last activity: 2026-04-11 -- Completed 01-01-PLAN.md
+Last activity: 2026-04-11 -- Completed 01-02-PLAN.md
-Progress: [#.........] 10%
+Progress: [##........] 20%
 ## Performance Metrics
@@ -27,7 +27,7 @@ Progress: [#.........] 10%
 | Phase | Plans | Total | Avg/Plan |
 |-------|-------|-------|----------|
-| 01-core-pipeline | 01-01 | 2min | 2min |
+| 01-core-pipeline | 01-01, 01-02 | 27min | 13.5min |
 **Recent Trend:**
 - Last 5 plans: -
@@ -48,6 +48,10 @@ Recent decisions affecting current work:
 - [01-01]: Default audio_cover_strength=0.9 for high melody fidelity
 - [01-01]: Conservative -60 dBFS silence threshold with warning (not hard fail)
 - [01-01]: Temp directory isolation for ACE-Step UUID output before renaming
+- [01-02]: audio_cover_strength=0.3 is optimal (0.9 garbled, 0.5+ copies source)
+- [01-02]: cover_noise_strength must be 0.0 (any >0 produces source copies)
+- [01-02]: Seed variance dominates quality — multi-take cherry-pick workflow required
+- [01-02]: Simple instrument captions outperform verbose descriptive captions
 ### Pending Todos
@@ -60,5 +64,5 @@ None yet.
 ## Session Continuity
 Last session: 2026-04-11
-Stopped at: Completed 01-01-PLAN.md
+Stopped at: Completed 01-02-PLAN.md — all plans in phase 01 done, awaiting verification
 Resume file: None
--- a/.planning/phases/01-core-pipeline/01-02-SUMMARY.md
+++ b/.planning/phases/01-core-pipeline/01-02-SUMMARY.md
@@ -0,0 +1,83 @@
 ---
 phase: 01-core-pipeline
 plan: 02
 subsystem: pipeline
 tags: [ace-step, cover-mode, tuning, seed-control, quality-testing]
 requires:
  - phase: 01-01
    provides: "hum2inst.py CLI script"
 provides:
  - "Validated end-to-end pipeline with real humming input"
  - "Tuned generation parameters (strength=0.3 optimal)"
  - "Multi-take seed generation workflow"
  - "JSON run logging for reproducibility"
 affects: [02-quality-presets]
 tech-stack:
  added: []
  patterns: [multi-take-generation, json-sidecar-logging, seed-reproducibility]
 key-files:
  created: []
  modified: [hum2inst.py]
 key-decisions:
  - "audio_cover_strength=0.3 is optimal default (0.9 garbled, 0.5+ often copies source)"
  - "cover_noise_strength=0.0 required (any >0 produces near-identical source copy)"
  - "Seed variance dominates output quality — multi-take cherry-pick workflow is necessary"
  - "Custom verbose captions hurt quality — simple instrument captions work best"
  - "JSON sidecar logging for every output WAV to track parameters"
  - "Seed embedded in filename for traceability"
 patterns-established:
  - "Generate N takes with --takes, cherry-pick best by ear, reproduce with --seed"
  - "JSON sidecar per output with full parameter snapshot"
  - "Parameter sweep methodology: isolate one variable, same seed, compare"
 requirements-completed: [MEL-01, MEL-02, MEL-04, INST-01, INP-01, OUT-02, PIPE-01]
 ## Self-Check: PARTIAL
 quality-notes: |
  Pipeline runs end-to-end without errors. Best outputs follow melody contour and
  sound recognizably like target instrument. Quality is seed-dependent — some seeds
  produce excellent results, others garbled or off-topic. Strength=0.3 with default
  params is the confirmed sweet spot. This is a model capability ceiling, not a
  script issue.
 duration: 25min
 completed: 2026-04-11
 ---
 # Phase 1 Plan 02: End-to-End Pipeline Verification
 **Validated the pipeline with real humming input, tuned parameters through systematic testing, and added seed control and run logging**
 ## Performance
 - **Duration:** ~25 min (includes iterative tuning with user)
 - **Tasks:** 2/2 (Task 2 checkpoint resolved through iterative testing)
 - **Files modified:** 1
 ## Accomplishments
 - Pipeline ran successfully on first attempt with no code fixes needed
 - Systematic parameter sweep across strength, noise-strength, guidance, steps, shift, sampler, vel-clamp, vel-ema
 - Identified optimal defaults: strength=0.3, noise-strength=0.0, guidance=5.0, steps=50
 - Added seed control (--seed), multi-take generation (--takes), caption override (--caption)
 - Added 6 advanced tuning flags (--guidance, --steps, --shift, --sampler, --vel-clamp, --vel-ema)
 - Every output now has JSON sidecar with full parameter log
 - Seed embedded in output filename for traceability
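
A minimal sketch of how the multi-take, seed-in-filename, and JSON sidecar conventions listed above could fit together. The helper names (`take_path`, `write_sidecar`, `plan_takes`), the `_seed<N>` filename pattern, and the seed range are assumptions for illustration, not the actual code in hum2inst.py. No audio is generated here; only the bookkeeping is shown.

```python
import json
import random
import tempfile
from pathlib import Path


def take_path(out_dir: Path, stem: str, seed: int) -> Path:
    """Hypothetical naming scheme: embed the seed in the WAV filename."""
    return out_dir / f"{stem}_seed{seed}.wav"


def write_sidecar(wav_path: Path, params: dict) -> Path:
    """Write a JSON file next to the WAV holding the full parameter snapshot."""
    sidecar = wav_path.with_suffix(".json")
    sidecar.write_text(json.dumps(params, indent=2, sort_keys=True))
    return sidecar


def plan_takes(out_dir: Path, stem: str, base_params: dict, takes: int, seed=None) -> list:
    """Choose one seed per take and record a sidecar for each planned output."""
    # A fixed seed reproduces a single chosen take; otherwise draw fresh seeds.
    if seed is not None:
        seeds = [seed]
    else:
        seeds = [random.randrange(2**31) for _ in range(takes)]
    planned = []
    for s in seeds:
        wav = take_path(out_dir, stem, s)
        write_sidecar(wav, {**base_params, "seed": s})
        planned.append(wav)
    return planned


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        params = {"audio_cover_strength": 0.3, "cover_noise_strength": 0.0}
        for wav in plan_takes(Path(tmp), "piano", params, takes=3):
            print(wav.name, "->", wav.with_suffix(".json").name)
```

With this layout, the seed of a take picked by ear can be read straight from its filename or sidecar and passed back in to reproduce it.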
 ## Key Findings
 - **audio_cover_strength** has a narrow sweet spot around 0.3: values of 0.5 and above often copy the source, and 0.9 produces garbled "deep dream" output
 - **cover_noise_strength** behaves like a cliff: any value above 0 produces near-identical copies of the source
 - **Seed variance dominates**: the same parameters produce wildly different results across seeds
 - **guidance=7.0** improved melody fidelity but shifted timbre too bright (glockenspiel-like)
 - **heun sampler** produced the most authentic piano sound, but the melody diverged
 - **Custom verbose captions** (descriptive register/tone) degraded output severely
 - **Higher steps (100)** and **shift values** did not improve quality
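
The sweep methodology behind these findings (isolate one variable, hold the seed fixed, compare) can be sketched as a run-list builder. This is an illustrative sketch only: the function name `one_at_a_time` and the dictionary layout are assumptions, and the candidate values are limited to those mentioned in this summary.

```python
# Defaults identified in this plan; the key names mirror the parameter names used above.
DEFAULTS = {
    "audio_cover_strength": 0.3,
    "cover_noise_strength": 0.0,
    "guidance": 5.0,
    "steps": 50,
}

# Values to try for each parameter, taken from the findings in this summary.
CANDIDATES = {
    "audio_cover_strength": [0.3, 0.5, 0.9],
    "guidance": [5.0, 7.0],
    "steps": [50, 100],
}


def one_at_a_time(defaults: dict, candidates: dict, seed: int) -> list:
    """Build runs that each change at most one parameter, all sharing one seed."""
    runs = [{**defaults, "seed": seed}]  # baseline run with every default
    for name, values in candidates.items():
        for value in values:
            if value == defaults[name]:
                continue  # already covered by the baseline
            runs.append({**defaults, name: value, "seed": seed})
    return runs


if __name__ == "__main__":
    for run in one_at_a_time(DEFAULTS, CANDIDATES, seed=1234):
        changed = {k: v for k, v in run.items() if k != "seed" and DEFAULTS[k] != v}
        print(changed or "baseline", "seed", run["seed"])
```

Because seed variance dominates the results, keeping the seed constant within a sweep is what makes differences between runs attributable to the parameter being changed.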
 ## Task Commits
 - `45f6863` feat(01-02): add generation tuning params, seed control, multi-take, and run logging