fix: Reduce Celery worker concurrency from 2 to 1 — concurrent LLM requests cause empty responses

Qwen 3.5 397B (quantized) returns empty content when handling two large-context
extraction requests simultaneously, likely due to vLLM memory pressure. Sequential
processing eliminates this failure mode.
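The failure shows up as empty `content` rather than as an error, so it can pass silently into downstream parsing. Below is a minimal sketch of a guard that turns the empty response into an explicit task error. It assumes the extraction tasks call vLLM's OpenAI-compatible chat completions endpoint and receive the response as a parsed dict. `extract_content` and `EmptyLLMResponseError` are hypothetical names and are not part of this codebase.

```python
from collections.abc import Mapping
from typing import Any


class EmptyLLMResponseError(RuntimeError):
    """Hypothetical error: an OpenAI-compatible chat completion carried no text."""


def extract_content(response: Mapping[str, Any]) -> str:
    """Return the first choice's message content, or raise if it is empty.

    Assumes the OpenAI-compatible response shape served by vLLM:
    {"choices": [{"message": {"content": "..."}, "finish_reason": "..."}]}
    """
    choices = response.get("choices") or []
    if not choices:
        raise EmptyLLMResponseError("response has no choices")
    first = choices[0]
    content = (first.get("message") or {}).get("content")
    # Treat a missing, non-string, empty or whitespace-only body as the
    # empty-response failure mode described in this commit.
    if not isinstance(content, str) or not content.strip():
        finish = first.get("finish_reason")
        raise EmptyLLMResponseError(f"empty content (finish_reason={finish!r})")
    return content
```

Raising here means a recurrence is recorded as a failed task, which can be logged or retried, instead of empty extraction output being stored. It complements the concurrency change and does not replace it.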
jlightner 2026-03-30 05:37:21 +00:00
parent f67e676264
commit dfaf0481fe

@@ -125,7 +125,7 @@ services:
       QDRANT_URL: http://chrysopedia-qdrant:6333
       EMBEDDING_API_URL: http://chrysopedia-ollama:11434/v1
       PROMPTS_PATH: /prompts
-    command: ["celery", "-A", "worker", "worker", "--loglevel=info", "--concurrency=2"]
+    command: ["celery", "-A", "worker", "worker", "--loglevel=info", "--concurrency=1"]
     healthcheck:
       test: ["CMD-SHELL", "celery -A worker inspect ping --timeout=5 2>/dev/null | grep -q pong || exit 1"]
       interval: 30s