docs(01-02): complete end-to-end verification plan with tuning findings

John Lightner 2026-04-11 03:25:42 -05:00
parent 45f6863131
commit 38331ca59a
2 changed files with 93 additions and 6 deletions

@@ -10,11 +10,11 @@ See: .planning/PROJECT.md (updated 2026-04-11)
 ## Current Position
 Phase: 1 of 4 (Core Pipeline)
-Plan: 1 of 3 in current phase
-Status: Executing
-Last activity: 2026-04-11 -- Completed 01-01-PLAN.md
-Progress: [#.........] 10%
+Plan: 2 of 2 in current phase
+Status: Phase complete -- awaiting verification
+Last activity: 2026-04-11 -- Completed 01-02-PLAN.md
+Progress: [##........] 20%
 ## Performance Metrics
@@ -27,7 +27,7 @@ Progress: [#.........] 10%
 | Phase | Plans | Total | Avg/Plan |
 |-------|-------|-------|----------|
-| 01-core-pipeline | 01-01 | 2min | 2min |
+| 01-core-pipeline | 01-01, 01-02 | 27min | 13.5min |
 **Recent Trend:**
 - Last 5 plans: -
@@ -48,6 +48,10 @@ Recent decisions affecting current work:
 - [01-01]: Default audio_cover_strength=0.9 for high melody fidelity
 - [01-01]: Conservative -60 dBFS silence threshold with warning (not hard fail)
 - [01-01]: Temp directory isolation for ACE-Step UUID output before renaming
+- [01-02]: audio_cover_strength=0.3 is optimal (0.9 garbled, 0.5+ copies source)
+- [01-02]: cover_noise_strength must be 0.0 (any >0 produces source copies)
+- [01-02]: Seed variance dominates quality -- multi-take cherry-pick workflow required
+- [01-02]: Simple instrument captions outperform verbose descriptive captions
 ### Pending Todos
@@ -60,5 +64,5 @@ None yet.
 ## Session Continuity
 Last session: 2026-04-11
-Stopped at: Completed 01-01-PLAN.md
+Stopped at: Completed 01-02-PLAN.md -- all plans in phase 01 done, awaiting verification
 Resume file: None

@@ -0,0 +1,83 @@
---
phase: 01-core-pipeline
plan: 02
subsystem: pipeline
tags: [ace-step, cover-mode, tuning, seed-control, quality-testing]
requires:
  - phase: 01-01
    provides: "hum2inst.py CLI script"
provides:
- "Validated end-to-end pipeline with real humming input"
- "Tuned generation parameters (strength=0.3 optimal)"
- "Multi-take seed generation workflow"
- "JSON run logging for reproducibility"
affects: [02-quality-presets]
tech-stack:
  added: []
  patterns: [multi-take-generation, json-sidecar-logging, seed-reproducibility]
key-files:
  created: []
  modified: [hum2inst.py]
key-decisions:
- "audio_cover_strength=0.3 is optimal default (0.9 garbled, 0.5+ often copies source)"
- "cover_noise_strength=0.0 required (any >0 produces near-identical source copy)"
- "Seed variance dominates output quality — multi-take cherry-pick workflow is necessary"
- "Custom verbose captions hurt quality — simple instrument captions work best"
- "JSON sidecar logging for every output WAV to track parameters"
- "Seed embedded in filename for traceability"
patterns-established:
- "Generate N takes with --takes, cherry-pick best by ear, reproduce with --seed"
- "JSON sidecar per output with full parameter snapshot"
- "Parameter sweep methodology: isolate one variable, same seed, compare"
requirements-completed: [MEL-01, MEL-02, MEL-04, INST-01, INP-01, OUT-02, PIPE-01]
## Self-Check: PARTIAL
quality-notes: |
  Pipeline runs end-to-end without errors. The best outputs follow the melody
  contour and sound recognizably like the target instrument. Quality is
  seed-dependent: some seeds produce excellent results, others produce garbled or
  off-topic output. Strength=0.3 with default params is the confirmed sweet spot.
  The seed dependence is a model capability ceiling, not a script issue.
duration: 25min
completed: 2026-04-11
---
# Phase 1 Plan 02: End-to-End Pipeline Verification
**Validated pipeline with real humming input, tuned parameters through systematic testing, added seed control and run logging**
## Performance
- **Duration:** ~25 min (includes iterative tuning with user)
- **Tasks:** 2/2 (Task 2 checkpoint resolved through iterative testing)
- **Files modified:** 1
## Accomplishments
- Pipeline ran successfully on the first attempt with no code fixes needed
- Ran a systematic parameter sweep across strength, noise-strength, guidance, steps, shift, sampler, vel-clamp, vel-ema
- Identified optimal defaults: strength=0.3, noise-strength=0.0, guidance=5.0, steps=50
- Added seed control (--seed), multi-take generation (--takes), caption override (--caption)
- Added 6 advanced tuning flags (--guidance, --steps, --shift, --sampler, --vel-clamp, --vel-ema)
- Every output now has a JSON sidecar with a full parameter log
- Seed is embedded in the output filename for traceability
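The sidecar logging and seed-in-filename conventions listed above could look roughly like the sketch below. It is an illustration only, not the code in hum2inst.py: the helper names `seeded_output_path` and `write_sidecar`, the `_seed<N>` filename pattern, and the parameter keys in the snapshot are all assumptions.

```python
import json
from pathlib import Path


def seeded_output_path(out_dir: Path, stem: str, seed: int) -> Path:
    """Build a WAV path that carries the seed in its name (pattern is hypothetical)."""
    return out_dir / f"{stem}_seed{seed}.wav"


def write_sidecar(wav_path: Path, params: dict) -> Path:
    """Write a JSON file next to the WAV holding the full parameter snapshot."""
    sidecar = wav_path.with_suffix(".json")
    # sort_keys keeps sidecars from different runs easy to diff
    sidecar.write_text(json.dumps(params, indent=2, sort_keys=True), encoding="utf-8")
    return sidecar
```

Keeping the snapshot in a separate file, rather than only in the filename, means every parameter can be recorded without making the WAV name unwieldy, while the seed in the name still lets a take be identified at a glance.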
## Key Findings
- **audio_cover_strength** has a narrow sweet spot around 0.3: values above 0.5 often copy the source, and 0.9 produces garbled "deep dream" output
- **cover_noise_strength** behaves like a cliff: any value > 0 produces near-identical copies of the source
- **Seed variance dominates**: the same parameters produce wildly different results across seeds
- **guidance=7.0** improved melody fidelity but shifted the timbre too bright (glockenspiel-like)
- **heun sampler** produced the most authentic piano sound, but the melody diverged
- **Custom verbose captions** (descriptive register/tone) degraded output severely
- **More steps (100)** and different **shift values** did not improve quality
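Because seed variance dominates, findings like these are only comparable when the seed is pinned and a single parameter changes per run. A minimal sketch of that sweep methodology follows; the function name `one_variable_sweep`, the dictionary keys, and the seed value 42 are illustrative assumptions, while the baseline values are the defaults reported above.

```python
# Baseline taken from the defaults identified in this plan; key names are assumed.
BASELINE = {
    "audio_cover_strength": 0.3,
    "cover_noise_strength": 0.0,
    "guidance": 5.0,
    "steps": 50,
}


def one_variable_sweep(baseline: dict, name: str, values: list, seed: int) -> list:
    """Return one config per value, changing only `name` and pinning the seed."""
    if name not in baseline:
        raise KeyError(f"unknown parameter: {name}")
    # Copy the baseline each time so the runs differ in exactly one parameter.
    return [{**baseline, name: value, "seed": seed} for value in values]


# Example: sweep strength across the values discussed above with a fixed seed.
configs = one_variable_sweep(BASELINE, "audio_cover_strength", [0.3, 0.5, 0.9], seed=42)
```

Each resulting config would be passed to one generation run, so any audible difference between outputs can be attributed to the swept parameter rather than to the seed.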
## Task Commits
- `45f6863` feat(01-02): add generation tuning params, seed control, multi-take, and run logging
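The seed control and multi-take behaviour named in this commit might choose seeds along the lines below. This is a hedged sketch: the source does not say how `--seed` and `--takes` interact, so the function name `take_seeds`, the 32-bit seed range, and the rule that an explicit seed yields a single reproducible take are assumptions.

```python
import random
from typing import List, Optional


def take_seeds(takes: int, seed: Optional[int] = None) -> List[int]:
    """Pick the seeds for a batch of takes (assumed behaviour, not hum2inst.py's)."""
    if takes < 1:
        raise ValueError("takes must be at least 1")
    if seed is not None:
        # Reproduction mode: an explicit seed replays exactly one known take.
        return [seed]
    # Exploration mode: draw distinct seeds so every take is a different sample.
    rng = random.SystemRandom()
    return rng.sample(range(2**32), takes)
```

In the cherry-pick workflow, a first run without a seed produces several candidates, and the seed of the preferred take is then passed back in to regenerate it.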