commit c6c15defee
Author: jlightner
Date:   2026-03-30 05:55:17 -05:00

feat: Dynamic token estimation for per-stage max_tokens

- Add estimate_tokens() and estimate_max_tokens() to llm_client with
  stage-specific output ratios (0.3x segmentation, 1.2x extraction,
  0.15x classification, 1.5x synthesis)
- Add a max_tokens override parameter to LLMClient.complete()
- Wire all four pipeline stages to estimate max_tokens from the actual
  prompt content, with a 20% buffer and a 2048-token floor
- Add LLM_MAX_TOKENS_HARD_LIMIT=32768 config (ceiling for the dynamic estimator)
- Log token estimates alongside every LLM request

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
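
The commit pins down the ratios, the 20% buffer, the 2048 floor, and the 32768 ceiling, but not how tokens are counted. A minimal sketch of how the two helpers could fit together, assuming a rough 4-characters-per-token heuristic and hypothetical stage names:

```python
# Sketch of the estimator described above. The ratios, 20% buffer,
# 2048 floor, and 32768 ceiling come from the commit message; the
# ~4 chars/token heuristic and the stage names are assumptions.

LLM_MAX_TOKENS_HARD_LIMIT = 32768

STAGE_OUTPUT_RATIOS = {
    "segmentation": 0.3,     # output much shorter than the input
    "extraction": 1.2,       # output slightly longer than the input
    "classification": 0.15,  # tiny label-sized outputs
    "synthesis": 1.5,        # output expands on the input
}

def estimate_tokens(text: str) -> int:
    """Cheap token estimate: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def estimate_max_tokens(prompt: str, stage: str) -> int:
    """Scale the prompt estimate by the stage's output ratio, add a
    20% buffer, then clamp to the [2048, hard-limit] range."""
    expected = estimate_tokens(prompt) * STAGE_OUTPUT_RATIOS[stage]
    buffered = int(expected * 1.2)
    return min(max(buffered, 2048), LLM_MAX_TOKENS_HARD_LIMIT)
```
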
commit 0b0ca598b4
Author: jlightner
Date:   2026-03-30 06:15:24 +00:00

feat: Log LLM response token usage (prompt/completion/total, content_len, finish_reason)
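
A sketch of what that logging might look like against an OpenAI-compatible response body; the response schema, logger name, and function name are assumptions, while the logged fields are the ones named in the subject line:

```python
import logging

logger = logging.getLogger("llm_client")

def log_response_usage(response: dict) -> None:
    """Log the fields named in the commit: prompt/completion/total
    token counts, content length, and finish reason."""
    usage = response.get("usage", {})
    choice = response.get("choices", [{}])[0]
    content = (choice.get("message") or {}).get("content") or ""
    logger.info(
        "LLM response: prompt_tokens=%s completion_tokens=%s "
        "total_tokens=%s content_len=%d finish_reason=%s",
        usage.get("prompt_tokens"),
        usage.get("completion_tokens"),
        usage.get("total_tokens"),
        len(content),
        choice.get("finish_reason"),
    )
```
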
commit cf759f3739
Author: jlightner
Date:   2026-03-30 04:08:29 +00:00

fix: Add max_tokens=16384 to LLM requests (OpenWebUI defaults to 1000, truncating pipeline JSON)
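
The fix presumably amounts to an explicit field in the request payload; a hedged sketch, assuming an OpenAI-compatible chat-completions shape (the function name is hypothetical):

```python
def build_request(model: str, messages: list[dict]) -> dict:
    # Without an explicit max_tokens, OpenWebUI falls back to its
    # 1000-token default, which truncated the pipeline's JSON output.
    return {
        "model": model,
        "messages": messages,
        "max_tokens": 16384,
    }
```
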
commit 4aa4b08a7f
Author: jlightner
Date:   2026-03-30 02:12:14 +00:00

feat: Per-stage LLM model routing with thinking modality and think-tag stripping

- Add 8 per-stage config fields: llm_stage{2-5}_model and llm_stage{2-5}_modality
- LLMClient.complete() accepts modality ('chat'/'thinking') and model_override
- Thinking modality: appends JSON instructions to the system prompt and strips <think> tags
- strip_think_tags() handles multiline blocks, multiple blocks, and edge cases
- Pipeline stages 2-5 read their per-stage config and pass it to the LLM client
- Update .env.example with per-stage model/modality documentation
- All 59 tests pass, including a new think-tag stripping test
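
The commit confirms that strip_think_tags() exists and copes with multiline and repeated blocks; the regex below is an assumed implementation rather than the project's actual code:

```python
import re

# Non-greedy match so each <think>...</think> block is removed
# individually; DOTALL lets blocks span multiple lines.
_THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL | re.IGNORECASE)

def strip_think_tags(text: str) -> str:
    """Drop every <think>...</think> block, then trim whitespace so
    only the model's final answer remains."""
    cleaned = _THINK_BLOCK.sub("", text)
    # Edge case: a dangling <think> with no closing tag drops the rest.
    cleaned = re.sub(r"<think>.*\Z", "", cleaned,
                     flags=re.DOTALL | re.IGNORECASE)
    return cleaned.strip()
```
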
commit 12cc86aef9
Author: jlightner
Date:   2026-03-29 22:30:31 +00:00

chore: Extended Settings with 12 LLM/embedding/Qdrant config fields, cr…

Files changed:
- backend/config.py
- backend/worker.py
- backend/pipeline/schemas.py
- backend/pipeline/llm_client.py
- backend/requirements.txt
- backend/pipeline/__init__.py
- backend/pipeline/stages.py

GSD-Task: S03/T01
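
Since backend/config.py heads the file list, a sketch of what the extended Settings class might look like, assuming pydantic-settings (the library choice is an assumption); only the per-stage fields and the hard limit quoted in the commits above are grounded, as this commit's 12 fields are not enumerated in the message:

```python
# Hypothetical shape of the Settings class in backend/config.py.
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    # Per-stage routing fields named in commit 4aa4b08a7f
    llm_stage2_model: str | None = None
    llm_stage2_modality: str = "chat"  # 'chat' or 'thinking'
    llm_stage3_model: str | None = None
    llm_stage3_modality: str = "chat"
    llm_stage4_model: str | None = None
    llm_stage4_modality: str = "chat"
    llm_stage5_model: str | None = None
    llm_stage5_modality: str = "chat"
    # Ceiling for the dynamic token estimator (commit c6c15defee)
    llm_max_tokens_hard_limit: int = 32768

# BaseSettings reads overrides from the environment, which matches
# the .env.example documentation mentioned in commit 4aa4b08a7f.
settings = Settings()
```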