diff --git a/.planning/phases/01-core-pipeline/01-RESEARCH.md b/.planning/phases/01-core-pipeline/01-RESEARCH.md
new file mode 100644
index 0000000..400c065
--- /dev/null
+++ b/.planning/phases/01-core-pipeline/01-RESEARCH.md
@@ -0,0 +1,374 @@
+# Phase 1: Core Pipeline - Research
+
+**Researched:** 2026-04-11
+**Domain:** ACE-Step cover mode CLI wrapper, audio I/O, Python scripting
+**Confidence:** HIGH
+
+## Summary
+
+Phase 1 wraps the existing ACE-Step 1.5 XL-SFT cover mode into a single `hum2inst.py` script. The user's prior experimentation has already validated the entire generation pipeline end-to-end: raw humming WAV goes into ACE-Step cover mode, a caption describes the target instrument, and the output preserves melody contour and rhythm. The technical challenge is purely integration and UX: reading the input WAV duration, constructing the right `GenerationParams`, initializing the handlers, calling `generate_music()`, and saving the output with a meaningful filename.
+
+The ACE-Step codebase exposes a clean Python API via `acestep.inference.generate_music()`, `acestep.handler.AceStepHandler`, and `acestep.llm_inference.LLMHandler`. Cover mode does NOT require the LLM handler (it's in the `skip_lm_tasks` set), which simplifies initialization. The script needs only the DiT handler with the XL-SFT model.
+
+**Primary recommendation:** Import ACE-Step's Python API directly (not subprocess). Initialize `AceStepHandler` + dummy `LLMHandler`, configure `GenerationParams` with `task_type="cover"`, and call `generate_music()`. Use `torchaudio` (already in the ACE-Step venv) to read input WAV duration.
+
+
+## User Constraints (from CONTEXT.md)
+
+### Locked Decisions
+- Invoked as `python hum2inst.py input.wav --instrument piano`
+- Python script directly, no installation step
+- `--instrument` as a named CLI flag (not positional)
+- `--output` flag optional, defaults to `./output/` directory
+- Use Python argparse for argument parsing (gives --help for free)
+- Default cover_strength in the high fidelity range (0.8-1.0)
+- `--strength` flag exposed in Phase 1 so users can experiment immediately
+- Duration matches input WAV length by default; `--duration` flag to override
+- Caption auto-built from instrument name (e.g., "piano cover of a melody") -- no custom prompt flag in Phase 1
+- Output filename includes instrument and timestamp (exact format at Claude's discretion)
+- On generation failure or silence: print clear error message, exit with non-zero code
+- No auto-play -- just save and print the output path
+- No silent failures
+- Single `hum2inst.py` script -- no module splitting in Phase 1
+- Assume CUDA GPU is available; fail with clear message if no GPU detected
+- Move existing experimental scripts (midi_to_audio.py, musicgen_melody.py) to an `/archive` folder
+
+### Claude's Discretion
+- ACE-Step invocation method (import Python API vs subprocess call -- choose based on what ACE-Step exposes)
+- Progress/feedback during generation (print statements, progress bar, or similar -- pick what's appropriate)
+- Exact output filename format (instrument + timestamp pattern)
+- Exact cover_strength default value within the 0.8-1.0 range
+
+### Deferred Ideas (OUT OF SCOPE)
+None -- discussion stayed within phase scope
+
+
+
+## Phase Requirements
+
+| ID | Description | Research Support |
+|----|-------------|-----------------|
+| MEL-01 | Output audio audibly follows the pitch contour of the hummed input melody | ACE-Step cover mode with audio_cover_strength 0.8-1.0 preserves pitch contour from source. Validated in prior testing (File 10 comparison). |
+| MEL-02 | Output audio preserves the rhythmic timing and phrasing of the hummed input | Cover mode encodes source into VAE latents, preserving temporal structure. Duration matching ensures timing alignment. |
+| MEL-04 | Output is musically coherent -- sounds like a real instrument performance | XL-SFT model (50 inference steps) produces coherent instrument audio. Caption guides timbre. |
+| INST-01 | User can specify target instrument via text prompt | `--instrument` flag maps to caption string (e.g., "solo acoustic piano, gentle melody"). |
+| INP-01 | Pipeline accepts raw humming WAV audio as input with no manual preprocessing | Cover mode takes raw audio directly via `src_audio` parameter. No MIDI extraction needed. |
+| OUT-02 | Output is saved as WAV file to a user-specified or default output directory | `generate_music()` saves to `save_dir`. Script renames output file with instrument+timestamp. |
+| PIPE-01 | Single CLI command or script invocation to go from humming WAV to instrument output | Single `python hum2inst.py input.wav --instrument piano` command. |
+
+
+## Standard Stack
+
+### Core
+| Library | Version | Purpose | Why Standard |
+|---------|---------|---------|--------------|
+| ACE-Step (acestep) | 1.5 XL-SFT | Music generation via cover mode | Already installed, validated for this exact use case |
+| torch | 2.7.1+cu128 | GPU compute, model inference | Already in ace-step/.venv |
+| torchaudio | 2.7.1+cu128 | WAV file reading (duration detection) | Already in ace-step/.venv, native torch integration |
+| argparse | stdlib | CLI argument parsing | User-specified, gives --help free |
+
+### Supporting
+| Library | Version | Purpose | When to Use |
+|---------|---------|---------|-------------|
+| datetime | stdlib | Timestamp generation for output filenames | Always (output naming) |
+| pathlib/os | stdlib | Path manipulation, directory creation | Always (file I/O) |
+| sys | stdlib | Exit codes, stderr output | Error handling |
+
+### Alternatives Considered
+| Instead of | Could Use | Tradeoff |
+|------------|-----------|----------|
+| Direct Python API import | subprocess calling `cli.py` | Subprocess adds overhead, harder error handling, but avoids import complexity. **Recommendation: use direct import** -- ACE-Step exposes clean Python API via `acestep.inference.generate_music()` |
+| torchaudio for duration | wave stdlib module | wave is simpler but torchaudio is already loaded and handles more formats |
+| argparse | click/typer | User specifically chose argparse for simplicity and --help |
+
+**Installation:**
+No new packages needed. Everything runs inside the existing `ace-step/.venv` environment.
+
+## Architecture Patterns
+
+### Recommended Project Structure
+```
+AiMusicPipeline/
+├── hum2inst.py # Single CLI entry point (Phase 1)
+├── ace-step/ # ACE-Step installation (existing)
+│ ├── acestep/ # ACE-Step Python package
+│ ├── checkpoints/ # Model weights (XL-SFT, VAE)
+│ └── .venv/ # Python environment
+├── input/ # User's humming WAV files
+├── output/ # Generated instrument audio
+└── archive/ # Moved experimental scripts
+ ├── midi_to_audio.py
+ └── musicgen_melody.py
+```
+
+### Pattern 1: ACE-Step Python API Direct Import
+**What:** Import `AceStepHandler`, `LLMHandler`, `GenerationParams`, `GenerationConfig`, and `generate_music` directly from the `acestep` package.
+**When to use:** Always -- this is the recommended invocation method.
+**Key finding:** Cover mode is in the `skip_lm_tasks` set, meaning the LLM handler is instantiated but NOT loaded with a model. This simplifies initialization significantly.
+
+**Initialization sequence (from cli.py analysis):**
+```python
+import sys
+import os
+
+# Add ace-step to path so acestep package is importable
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "ace-step"))
+
+from acestep.handler import AceStepHandler
+from acestep.llm_inference import LLMHandler
+from acestep.inference import GenerationParams, GenerationConfig, generate_music
+from acestep.gpu_config import get_gpu_config, set_global_gpu_config
+
+# GPU setup
+gpu_config = get_gpu_config()
+set_global_gpu_config(gpu_config)
+
+# Initialize handlers
+dit_handler = AceStepHandler()
+llm_handler = LLMHandler() # Not loaded for cover mode -- just instantiated
+
+# Initialize DiT with XL-SFT model
+dit_handler.initialize_service(
+ project_root=os.path.join(os.path.dirname(__file__), "ace-step"),
+ config_path="acestep-v15-xl-sft",
+ device="cuda",
+ use_flash_attention=dit_handler.is_flash_attention_available("cuda"),
+)
+```
+
+### Pattern 2: Cover Mode Generation Parameters
+**What:** Configure `GenerationParams` for cover mode with raw humming input.
+**Validated configuration (from prior testing):**
+```python
+params = GenerationParams(
+ task_type="cover",
+ src_audio="path/to/humming.wav",
+ caption="solo acoustic piano, gentle melody, warm tone",
+ lyrics="",
+ instrumental=True,
+ duration=10, # Match input WAV length
+ bpm=120, # Default; not critical for cover mode
+ audio_cover_strength=0.9, # High fidelity to source melody
+ inference_steps=50, # XL-SFT model uses 50 steps
+ guidance_scale=5.0,
+ thinking=False, # No LLM needed for cover
+)
+
+config = GenerationConfig(
+ batch_size=1,
+ use_random_seed=True,
+ audio_format="wav",
+)
+
+result = generate_music(
+ dit_handler, llm_handler, params, config,
+ save_dir="./output"
+)
+```
+
+### Pattern 3: Input WAV Duration Detection
+**What:** Read input WAV to determine output duration automatically.
+```python
+import torchaudio
+
+def get_wav_duration(wav_path: str) -> float:
+ info = torchaudio.info(wav_path)
+ return info.num_frames / info.sample_rate
+```
+
+### Pattern 4: Caption Construction from Instrument Name
+**What:** Build the ACE-Step caption from a simple instrument name.
+```python
+def build_caption(instrument: str) -> str:
+ return f"solo {instrument}, clear and expressive melody, warm tone"
+```
+**Note:** The caption strongly influences output timbre. Keep it focused on the instrument. Adding genre/mood modifiers is deferred to later phases.
+
+### Anti-Patterns to Avoid
+- **Subprocess invocation of cli.py:** cli.py has interactive wizard prompts that will hang in non-interactive mode. Use the Python API directly.
+- **Loading LLM model for cover mode:** Cover and repaint tasks are in `skip_lm_tasks`. The LLM handler must be instantiated (generate_music expects it) but should NOT have its model loaded -- this saves ~5GB VRAM and ~10s startup time.
+- **Setting `thinking=True` for cover mode:** This triggers LLM inference which is skipped anyway for cover, but may cause errors if no LLM model is loaded.
+- **Hardcoding `inference_steps=8`:** That's the turbo model default. XL-SFT needs 50 steps for quality output.
+
+## Don't Hand-Roll
+
+| Problem | Don't Build | Use Instead | Why |
+|---------|-------------|-------------|-----|
+| Audio file saving | Custom WAV writer | `generate_music()` with `save_dir` parameter | ACE-Step handles format, sample rate, normalization internally |
+| Model loading/initialization | Manual weight loading | `AceStepHandler.initialize_service()` | Handles config paths, checkpoints, device placement, flash attention |
+| Audio duration detection | Manual WAV header parsing | `torchaudio.info()` | Handles all formats, already in dependencies |
+| GPU detection | Custom CUDA checks | `acestep.gpu_config.get_gpu_config()` | Already handles CUDA/MPS/CPU detection with memory tier logic |
+
+**Key insight:** ACE-Step's existing Python API handles all the complex audio/ML plumbing. The wrapper script's job is purely argument parsing, caption construction, and output file management.
+
+## Common Pitfalls
+
+### Pitfall 1: sys.path and Package Import Order
+**What goes wrong:** `acestep` package is not on Python's path since hum2inst.py lives outside the ace-step directory.
+**Why it happens:** ACE-Step is installed as editable in its own venv, but hum2inst.py is at the project root.
+**How to avoid:** Add `sys.path.insert(0, "ace-step")` before importing from `acestep`. Alternatively, ensure the script is run from within the ace-step venv which has acestep installed as editable package.
+**Warning signs:** `ModuleNotFoundError: No module named 'acestep'`
+
+### Pitfall 2: Output File Renaming Race
+**What goes wrong:** `generate_music()` saves output with a UUID-based filename. The script needs to rename it to include instrument+timestamp.
+**Why it happens:** The `generate_music` API uses `generate_uuid_from_params()` for filenames, not user-friendly names.
+**How to avoid:** After `generate_music()` returns, read `result.audios[0]["path"]` to get the saved path, then rename/copy to the desired filename. Or pass a custom `save_dir` per-run and rename afterward.
+**Warning signs:** Output files named like `a3f2b8c1...wav` instead of `piano_20260411_143022.wav`
+
+### Pitfall 3: Duration Mismatch
+**What goes wrong:** Output audio is much longer or shorter than the input humming, causing melody to stretch or truncate.
+**Why it happens:** Not setting `duration` parameter, or setting it to -1 (auto), lets the model choose its own duration.
+**How to avoid:** Always measure input WAV duration with `torchaudio.info()` and pass it as the `duration` parameter. Round to nearest integer (ACE-Step duration is in seconds).
+**Warning signs:** 10-second humming produces 30-second output, or vice versa.
+
+### Pitfall 4: Silence Detection
+**What goes wrong:** Model occasionally generates near-silence, especially with very short inputs or unusual timbres.
+**Why it happens:** Cover mode's noise injection at high audio_cover_strength sometimes produces degenerate outputs.
+**How to avoid:** After generation, check if the output audio tensor (from `result.audios[0]["tensor"]`) has RMS energy above a threshold. If too quiet, report error.
+**Warning signs:** Output WAV file exists but plays as silence or very faint noise.
+
+### Pitfall 5: CUDA Not Available
+**What goes wrong:** Script crashes with CUDA errors on a machine without GPU.
+**Why it happens:** User decision is to assume CUDA and fail clearly.
+**How to avoid:** Check `torch.cuda.is_available()` early and exit with a clear message before any model loading.
+**Warning signs:** RuntimeError about CUDA device.
+
+## Code Examples
+
+### Complete Invocation Flow (verified from cli.py source)
+
+```python
+# 1. Check CUDA
+import torch
+if not torch.cuda.is_available():
+ print("ERROR: CUDA GPU required. No CUDA-capable GPU detected.", file=sys.stderr)
+ sys.exit(1)
+
+# 2. Get input duration
+import torchaudio
+info = torchaudio.info(input_wav_path)
+duration = info.num_frames / info.sample_rate
+
+# 3. Initialize handlers (from cli.py lines 1318-1319, 1411-1419)
+from acestep.gpu_config import get_gpu_config, set_global_gpu_config
+gpu_config = get_gpu_config()
+set_global_gpu_config(gpu_config)
+
+dit_handler = AceStepHandler()
+llm_handler = LLMHandler()
+
+dit_handler.initialize_service(
+ project_root="./ace-step",
+ config_path="acestep-v15-xl-sft",
+ device="cuda",
+ use_flash_attention=dit_handler.is_flash_attention_available("cuda"),
+)
+
+# 4. Configure generation (from prior testing + cli.py lines 1629-1676)
+params = GenerationParams(
+ task_type="cover",
+ src_audio=str(input_wav_path),
+ caption=f"solo {instrument}, clear and expressive melody",
+ lyrics="",
+ instrumental=True,
+ duration=round(duration),
+ audio_cover_strength=0.9,
+ inference_steps=50,
+ guidance_scale=5.0,
+ thinking=False,
+)
+
+config = GenerationConfig(
+ batch_size=1,
+ use_random_seed=True,
+ audio_format="wav",
+)
+
+# 5. Generate
+result = generate_music(dit_handler, llm_handler, params, config, save_dir=output_dir)
+
+# 6. Check result
+if not result.success or not result.audios:
+ print(f"ERROR: Generation failed: {result.error or 'no audio produced'}", file=sys.stderr)
+ sys.exit(1)
+
+output_path = result.audios[0]["path"]
+```
+
+### Caption Construction Examples
+```python
+# Simple instrument mapping -- keep captions focused on instrument
+CAPTION_TEMPLATES = {
+ "piano": "solo acoustic piano, gentle melody, warm tone, clear and expressive",
+ "guitar": "solo acoustic guitar, fingerpicked melody, warm and intimate",
+ "saxophone": "solo saxophone, smooth jazz melody, soulful and expressive",
+ "violin": "solo violin, classical melody, rich and emotional",
+ "flute": "solo flute, gentle melody, airy and delicate",
+}
+
+def build_caption(instrument: str) -> str:
+ instrument_lower = instrument.lower().strip()
+ if instrument_lower in CAPTION_TEMPLATES:
+ return CAPTION_TEMPLATES[instrument_lower]
+ # Generic fallback for any instrument name
+ return f"solo {instrument_lower}, clear and expressive melody, warm tone"
+```
+
+## State of the Art
+
+| Old Approach | Current Approach | When Changed | Impact |
+|--------------|------------------|--------------|--------|
+| MusicGen melody conditioning | ACE-Step XL-SFT cover mode | 2026-04-11 testing | MusicGen chromagram conditioning too lossy; ACE-Step preserves melody+rhythm |
+| MIDI extraction pipeline | Direct raw audio input | 2026-04-11 testing | No intermediate steps needed; raw humming goes directly into cover mode |
+| Turbo model (8 steps) | XL-SFT model (50 steps) | 2026-04-11 testing | Better quality for cover mode; ~3s generation on RTX 4090 |
+
+**Deprecated/outdated:**
+- `musicgen_melody.py`: MusicGen approach abandoned. Moving to `/archive`.
+- `midi_to_audio.py`: MIDI synthesis no longer needed. Moving to `/archive`.
+
+## Open Questions
+
+1. **Optimal audio_cover_strength default**
+ - What we know: 0.8 worked well in testing. Range 0.8-1.0 specified by user.
+ - What's unclear: Whether 0.85 or 0.9 might be more universally good across different instruments.
+ - Recommendation: Default to 0.9 (high fidelity bias per user preference). Expose `--strength` flag so users can tune.
+
+2. **Caption quality impact on different instruments**
+ - What we know: Piano captions worked well in testing.
+ - What's unclear: How well generic captions work for uncommon instruments (e.g., "solo theremin").
+ - Recommendation: Include a small set of curated caption templates for common instruments, with a generic fallback for anything else.
+
+3. **Output silence detection threshold**
+ - What we know: User wants clear error on silence/failure.
+ - What's unclear: What RMS threshold constitutes "silence" vs "very quiet music."
+ - Recommendation: Use a conservative threshold (e.g., RMS < -60 dBFS). Log a warning rather than hard-failing for borderline cases.
+
+4. **Project root path resolution**
+ - What we know: `hum2inst.py` lives at project root, `ace-step/` is a subdirectory.
+ - What's unclear: Whether `initialize_service(project_root=...)` needs an absolute path.
+ - Recommendation: Use `os.path.abspath()` to resolve the ace-step directory path relative to the script location.
+
+## Sources
+
+### Primary (HIGH confidence)
+- `ace-step/cli.py` -- Full CLI implementation showing exact initialization sequence, parameter defaults, and cover mode handling
+- `ace-step/acestep/inference.py` -- `GenerationParams`, `GenerationConfig`, `GenerationResult` dataclasses and `generate_music()` function
+- `ace-step/acestep/handler.py` -- `AceStepHandler` class with `initialize_service()` method
+- `ace-step/acestep/constants.py` -- `TASK_INSTRUCTIONS` dictionary showing cover mode instruction
+- User's prior testing notes (`pipeline_hum_to_instrument.md`) -- Validated configuration and results
+
+### Secondary (MEDIUM confidence)
+- `ace-step/requirements.txt` -- Dependency versions for the project environment
+
+### Tertiary (LOW confidence)
+- None -- all findings verified against actual source code
+
+## Metadata
+
+**Confidence breakdown:**
+- Standard stack: HIGH -- all libraries already installed and validated in prior testing
+- Architecture: HIGH -- direct API import path fully traced through cli.py source code
+- Pitfalls: HIGH -- identified from actual code analysis (not speculation)
+
+**Research date:** 2026-04-11
+**Valid until:** 2026-05-11 (stable -- ACE-Step codebase is local and pinned)