fix: Reduce Celery worker concurrency from 2 to 1 — concurrent LLM requests cause empty responses
Qwen 3.5 397B (quantized) returns empty content when handling two large-context extraction requests simultaneously, likely due to vLLM memory pressure. Sequential processing eliminates this failure mode.
parent f67e676264
commit dfaf0481fe
1 changed file with 1 addition and 1 deletion
@@ -125,7 +125,7 @@ services:
 QDRANT_URL: http://chrysopedia-qdrant:6333
 EMBEDDING_API_URL: http://chrysopedia-ollama:11434/v1
 PROMPTS_PATH: /prompts
-command: ["celery", "-A", "worker", "worker", "--loglevel=info", "--concurrency=2"]
+command: ["celery", "-A", "worker", "worker", "--loglevel=info", "--concurrency=1"]
 healthcheck:
 test: ["CMD-SHELL", "celery -A worker inspect ping --timeout=5 2>/dev/null | grep -q pong || exit 1"]
 interval: 30s
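
Note: the effect of the concurrency change can be illustrated with a small stand-alone sketch. It is not code from this repository. The helper `peak_in_flight` and the fake request function are hypothetical, and a thread pool stands in for the Celery worker pool (Celery's default pool is process-based). The only point shown is that a pool of size 1 never has more than one request in flight, which is the property this commit relies on to avoid overlapping large-context requests.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_in_flight(concurrency: int, num_tasks: int) -> int:
    """Run num_tasks fake requests on a pool of `concurrency` workers and
    return the largest number that were ever in flight at the same time."""
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def fake_extraction_request(_: int) -> None:
        # Hypothetical stand-in for one LLM extraction call.
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(0.05)  # simulate a slow backend response
        with lock:
            in_flight -= 1

    # The pool size plays the role of --concurrency in this sketch.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(fake_extraction_request, range(num_tasks)))
    return peak


if __name__ == "__main__":
    print("concurrency=1 peak:", peak_in_flight(1, 4))
    print("concurrency=2 peak:", peak_in_flight(2, 4))
```

With a pool size of 1 the tasks run strictly one after another, so throughput drops in exchange for never sending two requests to the model at once.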