Audit findings & fixes:

- `temperature` was never set (the API defaulted to 1.0) → now explicitly 0.0 for deterministic JSON output
- `llm_max_tokens=65536` exceeded `hard_limit=32768` → aligned to 32768
- Output ratio estimates were 5-30x too high (based on actual pipeline data): stage2 0.6 → 0.05, stage3 2.0 → 0.3, stage4 0.5 → 0.3, stage5 2.5 → 0.8
- `request_params` now structured as `api_params` (what's sent to the LLM) vs `pipeline_config` (internal estimator settings) — no more ambiguous `hard_limit` in request params
- `temperature=0.0` sent on both primary and fallback endpoints
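A minimal sketch of the restructured config described above. The names `API_PARAMS`, `PIPELINE_CONFIG`, and `estimate_output_tokens` are illustrative assumptions; only the values (temperature 0.0, max tokens 32768, per-stage output ratios) come from the findings.

```python
# Hypothetical layout of the api_params / pipeline_config split.
# Values are taken from the audit findings; names are assumptions.

API_PARAMS = {            # sent verbatim to the LLM endpoint
    "temperature": 0.0,   # explicit for deterministic JSON (API default was 1.0)
    "max_tokens": 32768,  # aligned to the hard limit (was 65536)
}

PIPELINE_CONFIG = {       # internal estimator settings, never sent to the API
    "output_ratios": {    # expected output/input token ratio per stage
        "stage2": 0.05,   # was 0.6
        "stage3": 0.3,    # was 2.0
        "stage4": 0.3,    # was 0.5
        "stage5": 0.8,    # was 2.5
    },
}

def estimate_output_tokens(stage: str, input_tokens: int) -> int:
    """Estimate a stage's output tokens, capped at the API max."""
    ratio = PIPELINE_CONFIG["output_ratios"][stage]
    return min(int(input_tokens * ratio), API_PARAMS["max_tokens"])
```

Keeping estimator-only settings out of `API_PARAMS` guarantees that nothing ambiguous like `hard_limit` can leak into the request payload.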
Files:

- `__init__.py`
- `embedding_client.py`
- `llm_client.py`
- `qdrant_client.py`
- `schemas.py`
- `stages.py`