docs(01-02): complete end-to-end verification plan with tuning findings
parent 45f6863131
commit 38331ca59a
2 changed files with 93 additions and 6 deletions

@@ -10,11 +10,11 @@ See: .planning/PROJECT.md (updated 2026-04-11)
 ## Current Position

 Phase: 1 of 4 (Core Pipeline)
-Plan: 1 of 3 in current phase
-Status: Executing
-Last activity: 2026-04-11 -- Completed 01-01-PLAN.md
+Plan: 2 of 2 in current phase
+Status: Phase complete — awaiting verification
+Last activity: 2026-04-11 -- Completed 01-02-PLAN.md

-Progress: [#.........] 10%
+Progress: [##........] 20%

 ## Performance Metrics

@@ -27,7 +27,7 @@ Progress: [#.........] 10%

 | Phase | Plans | Total | Avg/Plan |
 |-------|-------|-------|----------|
-| 01-core-pipeline | 01-01 | 2min | 2min |
+| 01-core-pipeline | 01-01, 01-02 | 27min | 13.5min |

 **Recent Trend:**
 - Last 5 plans: -

@@ -48,6 +48,10 @@ Recent decisions affecting current work:
 - [01-01]: Default audio_cover_strength=0.9 for high melody fidelity
 - [01-01]: Conservative -60 dBFS silence threshold with warning (not hard fail)
 - [01-01]: Temp directory isolation for ACE-Step UUID output before renaming
+- [01-02]: audio_cover_strength=0.3 is optimal (0.9 garbled, 0.5+ copies source)
+- [01-02]: cover_noise_strength must be 0.0 (any >0 produces source copies)
+- [01-02]: Seed variance dominates quality — multi-take cherry-pick workflow required
+- [01-02]: Simple instrument captions outperform verbose descriptive captions

 ### Pending Todos

@@ -60,5 +64,5 @@ None yet.
 ## Session Continuity

 Last session: 2026-04-11
-Stopped at: Completed 01-01-PLAN.md
+Stopped at: Completed 01-02-PLAN.md — all plans in phase 01 done, awaiting verification
 Resume file: None

83
.planning/phases/01-core-pipeline/01-02-SUMMARY.md
Normal file

@@ -0,0 +1,83 @@
---
phase: 01-core-pipeline
plan: 02
subsystem: pipeline
tags: [ace-step, cover-mode, tuning, seed-control, quality-testing]

requires:
  - phase: 01-01
    provides: "hum2inst.py CLI script"
provides:
  - "Validated end-to-end pipeline with real humming input"
  - "Tuned generation parameters (strength=0.3 optimal)"
  - "Multi-take seed generation workflow"
  - "JSON run logging for reproducibility"
affects: [02-quality-presets]

tech-stack:
  added: []
  patterns: [multi-take-generation, json-sidecar-logging, seed-reproducibility]

key-files:
  created: []
  modified: [hum2inst.py]

key-decisions:
  - "audio_cover_strength=0.3 is optimal default (0.9 garbled, 0.5+ often copies source)"
  - "cover_noise_strength=0.0 required (any >0 produces near-identical source copy)"
  - "Seed variance dominates output quality — multi-take cherry-pick workflow is necessary"
  - "Custom verbose captions hurt quality — simple instrument captions work best"
  - "JSON sidecar logging for every output WAV to track parameters"
  - "Seed embedded in filename for traceability"

patterns-established:
  - "Generate N takes with --takes, cherry-pick best by ear, reproduce with --seed"
  - "JSON sidecar per output with full parameter snapshot"
  - "Parameter sweep methodology: isolate one variable, same seed, compare"

requirements-completed: [MEL-01, MEL-02, MEL-04, INST-01, INP-01, OUT-02, PIPE-01]

## Self-Check: PARTIAL

quality-notes: |
  Pipeline runs end-to-end without errors. Best outputs follow melody contour and
  sound recognizably like target instrument. Quality is seed-dependent — some seeds
  produce excellent results, others garbled or off-topic. Strength=0.3 with default
  params is the confirmed sweet spot. This is a model capability ceiling, not a
  script issue.

duration: 25min
completed: 2026-04-11
---

# Phase 1 Plan 02: End-to-End Pipeline Verification

**Validated pipeline with real humming input, tuned parameters through systematic testing, added seed control and run logging**

## Performance

- **Duration:** ~25 min (includes iterative tuning with user)
- **Tasks:** 2/2 (Task 2 checkpoint resolved through iterative testing)
- **Files modified:** 1

## Accomplishments
- Pipeline ran successfully on first attempt with no code fixes needed
- Systematic parameter sweep across strength, noise-strength, guidance, steps, shift, sampler, vel-clamp, vel-ema
- Identified optimal defaults: strength=0.3, noise-strength=0.0, guidance=5.0, steps=50
- Added seed control (--seed), multi-take generation (--takes), caption override (--caption)
- Added 6 advanced tuning flags (--guidance, --steps, --shift, --sampler, --vel-clamp, --vel-ema)
- Every output now has JSON sidecar with full parameter log
- Seed embedded in output filename for traceability
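The multi-take workflow, seed-in-filename naming, and JSON sidecar listed above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the code of hum2inst.py (which this commit does not show): `run_takes`, `silent_stub`, the `<stem>_seed<N>.wav` naming pattern, the sample rate, and the sidecar field names are all hypothetical, and the stub writes one second of silence where the real script would call ACE-Step.

```python
import json
import random
import wave
from pathlib import Path
from typing import Callable, List, Optional

# Defaults reported in this summary; the dictionary key names are assumptions.
DEFAULTS = {
    "audio_cover_strength": 0.3,
    "cover_noise_strength": 0.0,
    "guidance": 5.0,
    "steps": 50,
}
SAMPLE_RATE = 44100  # hypothetical output sample rate


def silent_stub(params: dict, seed: int) -> bytes:
    """Hypothetical stand-in for the ACE-Step call: 1 s of 16-bit mono silence."""
    return b"\x00\x00" * SAMPLE_RATE


def run_takes(stem: str, out_dir: Path, takes: int, seed: Optional[int] = None,
              generate: Callable[[dict, int], bytes] = silent_stub,
              **overrides) -> List[Path]:
    """Write one WAV plus a JSON sidecar per take and return the WAV paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    params = {**DEFAULTS, **overrides}
    # Assumption: an explicit seed reproduces a single chosen take; otherwise
    # draw a fresh random seed for each requested take.
    seeds = [seed] if seed is not None else [random.randrange(2**31) for _ in range(takes)]
    written = []
    for s in seeds:
        wav_path = out_dir / f"{stem}_seed{s}.wav"  # seed embedded in the filename
        with wave.open(str(wav_path), "wb") as wf:
            wf.setnchannels(1)
            wf.setsampwidth(2)  # 16-bit samples
            wf.setframerate(SAMPLE_RATE)
            wf.writeframes(generate(params, s))
        # Sidecar next to the WAV with the full parameter snapshot.
        sidecar = {"seed": s, "output": wav_path.name, **params}
        wav_path.with_suffix(".json").write_text(json.dumps(sidecar, indent=2))
        written.append(wav_path)
    return written
```

A call such as `run_takes("hum", Path("out"), takes=4)` would leave four WAV/JSON pairs to cherry-pick by ear; passing `seed=` regenerates only that take, and keyword overrides (for example `guidance=7.0`) are recorded in its sidecar.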

## Key Findings
- **audio_cover_strength** has a narrow sweet spot around 0.3 — above 0.5 often copies source, at 0.9 produces garbled "deep dream" output
- **cover_noise_strength** is cliff-like — any value > 0 produces near-identical source copies
- **Seed variance dominates** — same parameters produce wildly different results across seeds
- **guidance=7.0** improved melody fidelity but shifted timbre too bright (glockenspiel-like)
- **heun sampler** produced most authentic piano sound but melody diverged
- **Custom verbose captions** (descriptive register/tone) degraded output severely
- **Higher steps (100)** and **shift values** did not improve quality
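The sweep behind these findings follows the "isolate one variable, same seed, compare" pattern: every run differs from a base configuration only in the parameter under test. A minimal sketch, assuming the parameter names from this summary as dictionary keys; `one_variable_sweep` and `BASE` are hypothetical names, and the base values are the defaults reported in the Accomplishments list.

```python
from typing import Any, Dict, List

# Base configuration from the reported defaults (key names are assumptions).
BASE: Dict[str, Any] = {
    "audio_cover_strength": 0.3,
    "cover_noise_strength": 0.0,
    "guidance": 5.0,
    "steps": 50,
}


def one_variable_sweep(base: Dict[str, Any], name: str, values: List[Any],
                       seed: int) -> List[Dict[str, Any]]:
    """Return one config per value, changing only `name` and pinning the seed."""
    if name not in base:
        raise KeyError(f"unknown parameter: {name}")
    runs = []
    for value in values:
        cfg = dict(base)    # shallow copy is enough: all values are scalars
        cfg[name] = value
        cfg["seed"] = seed  # same seed for every run, so only `name` varies
        runs.append(cfg)
    return runs


if __name__ == "__main__":
    # e.g. the strength comparison described in the findings
    for cfg in one_variable_sweep(BASE, "audio_cover_strength", [0.3, 0.5, 0.9], seed=42):
        print(cfg)
```

Pinning the seed is the point of the design: since seed variance dominates output quality, comparing runs with different seeds would mostly measure the seed rather than the swept parameter.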

## Task Commits

- `45f6863` feat(01-02): add generation tuning params, seed control, multi-take, and run logging