fix: Reduce Celery worker concurrency from 2 to 1 — concurrent LLM requests cause empty responses
Qwen 3.5 397B (quantized) returns empty content when handling two large-context extraction requests simultaneously, likely due to vLLM memory pressure. Sequential processing eliminates this failure mode.
parent f67e676264
commit dfaf0481fe
1 changed file with 1 addition and 1 deletion
@@ -125,7 +125,7 @@ services:
       QDRANT_URL: http://chrysopedia-qdrant:6333
       EMBEDDING_API_URL: http://chrysopedia-ollama:11434/v1
       PROMPTS_PATH: /prompts
-    command: ["celery", "-A", "worker", "worker", "--loglevel=info", "--concurrency=2"]
+    command: ["celery", "-A", "worker", "worker", "--loglevel=info", "--concurrency=1"]
     healthcheck:
       test: ["CMD-SHELL", "celery -A worker inspect ping --timeout=5 2>/dev/null | grep -q pong || exit 1"]
       interval: 30s
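
The failure described in the commit message (a response that arrives but carries empty content) can also be caught on the worker side, so that a task fails loudly instead of storing an empty extraction. The sketch below is illustrative only and is not part of this commit. It assumes the LLM is reached through an OpenAI-compatible chat completions endpoint, so the parsed response has the shape `choices[0].message.content`. The names `EmptyLLMResponseError` and `extract_content` are hypothetical.

```python
class EmptyLLMResponseError(RuntimeError):
    """Raised when the LLM response contains no usable text (hypothetical name)."""


def extract_content(response: dict) -> str:
    """Return the assistant text from a parsed chat-completion response.

    Assumes the OpenAI-compatible shape:
        {"choices": [{"message": {"content": "..."}}]}
    Raises EmptyLLMResponseError when the content is missing or blank,
    so the calling task can fail or retry instead of saving an empty result.
    """
    choices = response.get("choices") or []
    if not choices:
        raise EmptyLLMResponseError("response has no choices")

    message = choices[0].get("message") or {}
    content = message.get("content")
    if not isinstance(content, str) or not content.strip():
        raise EmptyLLMResponseError("response content is empty")

    return content
```

A Celery task could call `extract_content` on each parsed response and retry on `EmptyLLMResponseError`. This complements the `--concurrency=1` setting rather than replacing it.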