fix: Reduce Celery worker concurrency from 2 to 1 — concurrent LLM requests cause empty responses

Qwen 3.5 397B (quantized) returns empty content when handling two large-context extraction requests simultaneously, likely due to vLLM memory pressure. Sequential processing eliminates this failure mode.
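
For illustration only, the effect of sequential processing can be sketched in-process with a semaphore that admits one request at a time. This is an analogy under stated assumptions, not the mechanism Celery uses (`--concurrency=1` limits the worker's pool size). `call_llm`, `SerializedClient`, and `run_batch` are hypothetical names that do not exist in this repository, and the stub does not contact vLLM.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for the real extraction request."""
    time.sleep(0.01)  # simulate inference latency
    return f"extracted:{prompt}"


class SerializedClient:
    """Admit at most `limit` in-flight requests (1 mirrors sequential processing)."""

    def __init__(self, limit: int = 1) -> None:
        self._slots = threading.Semaphore(limit)
        self._lock = threading.Lock()
        self._in_flight = 0
        self.peak_in_flight = 0  # highest number of simultaneous requests seen

    def request(self, prompt: str) -> str:
        with self._slots:  # blocks while `limit` requests are already running
            with self._lock:
                self._in_flight += 1
                self.peak_in_flight = max(self.peak_in_flight, self._in_flight)
            try:
                return call_llm(prompt)
            finally:
                with self._lock:
                    self._in_flight -= 1


def run_batch(prompts, limit=1):
    """Submit prompts from several threads; return results and observed peak."""
    client = SerializedClient(limit)
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(client.request, prompts))
    return results, client.peak_in_flight
```

With `limit=1` the observed peak stays at one even though four threads submit work, which is the property the concurrency change relies on to keep the model from seeing two large-context requests at once.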
2026-03-30 05:37:21 +00:00
commit dfaf0481fe
parent f67e676264
1 changed file with 1 addition and 1 deletion
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -125,7 +125,7 @@ services:
      QDRANT_URL: http://chrysopedia-qdrant:6333
      EMBEDDING_API_URL: http://chrysopedia-ollama:11434/v1
      PROMPTS_PATH: /prompts
-    command: ["celery", "-A", "worker", "worker", "--loglevel=info", "--concurrency=2"]
+    command: ["celery", "-A", "worker", "worker", "--loglevel=info", "--concurrency=1"]
    healthcheck:
      test: ["CMD-SHELL", "celery -A worker inspect ping --timeout=5 2>/dev/null | grep -q pong || exit 1"]
      interval: 30s