feat: Deployed reindex script to ub01 via image rebuild, started full 9…

- "backend/scripts/reindex_lightrag.py"

GSD-Task: S04/T02
jlightner 2026-04-03 22:53:18 +00:00
parent 338be29e92
commit 9e0006ea6a
3 changed files with 98 additions and 1 deletions

@@ -73,7 +73,7 @@ Create `backend/scripts/reindex_lightrag.py` — a standalone script that:
- Estimate: 1.5h
- Files: backend/scripts/reindex_lightrag.py, backend/pipeline/stages.py, backend/models.py, backend/config.py
- Verify: ssh ub01 'docker exec chrysopedia-api python3 /app/scripts/reindex_lightrag.py --dry-run --limit 3' exits 0 and prints formatted technique page text
- [ ] **T02: Run full reindex on ub01 and verify graph quality** — ## Description
- [x] **T02: Deployed reindex script to ub01 via image rebuild, started full 90-page corpus reindex — 8 pages submitted with 168 entities extracted including creators, plugins, and technique concepts** — ## Description
Deploy the reindex script to ub01, start the full 90-page reindex in a background session, and verify graph quality once pages are processed. The full run takes 3-6 hours (serial LightRAG processing with LLM entity extraction per page). Start it backgrounded and verify on whatever has completed.

@@ -0,0 +1,18 @@
{
"schemaVersion": 1,
"taskId": "T01",
"unitId": "M019/S04/T01",
"timestamp": 1775255850488,
"passed": false,
"discoverySource": "task-plan",
"checks": [
{
"command": "ssh ub01 'docker exec chrysopedia-api python3 /app/scripts/reindex_lightrag.py --dry-run --limit 3' exits 0 and prints formatted technique page text",
"exitCode": 2,
"durationMs": 715,
"verdict": "fail"
}
],
"retryAttempt": 1,
"maxRetries": 2
}
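
A report like the one above can be summarized programmatically. The following is a minimal Python sketch, assuming the field names shown in the JSON (`checks`, `exitCode`, `retryAttempt`, `maxRetries`) and assuming that `retryAttempt` counts attempts already used. `summarize_verification` is a hypothetical helper, not part of the repository.

```python
import json


def summarize_verification(report_text: str) -> dict:
    """Summarize a verification report shaped like the JSON above.

    Assumption: a check failed when its exitCode is non-zero, and
    retryAttempt is the number of attempts already consumed.
    """
    report = json.loads(report_text)
    # Collect the commands whose exit code signals failure.
    failed = [c["command"] for c in report["checks"] if c["exitCode"] != 0]
    retries_left = max(report["maxRetries"] - report["retryAttempt"], 0)
    return {
        "unit": report["unitId"],
        "passed": report["passed"],
        "failed_commands": failed,
        "retries_left": retries_left,
    }
```

For the T01 report shown here, this would list the single dry-run check as failed with one retry remaining, under the assumptions stated above.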

@@ -0,0 +1,79 @@
---
id: T02
parent: S04
milestone: M019
provides: []
requires: []
affects: []
key_files: ["backend/scripts/reindex_lightrag.py"]
key_decisions: ["Deployed via image rebuild (not docker cp) so script persists across container restarts", "Used docker exec -d for background execution inside existing API container"]
patterns_established: []
drill_down_paths: []
observability_surfaces: []
duration: ""
verification_result: "Dry-run exits 0 with formatted output for 3 pages. Processed documents rose from 4 to 6, with 8 submitted in total (2 still processing). All 10 chrysopedia containers healthy. Graph label list shows 168 entities with proper creator/plugin/concept extraction. Query endpoint timed out during active indexing (expected LLM contention)."
completed_at: 2026-04-03T22:52:09.251Z
blocker_discovered: false
---
# T02: Deployed reindex script to ub01 via image rebuild, started full 90-page corpus reindex — 8 pages submitted with 168 entities extracted including creators, plugins, and technique concepts
> Deployed reindex script to ub01 via image rebuild, started full 90-page corpus reindex — 8 pages submitted with 168 entities extracted including creators, plugins, and technique concepts
## What Happened
Copied reindex_lightrag.py to the ub01 repo, rebuilt the chrysopedia-api image to bake the script in permanently, and restarted the container. Verified that the dry run passes (slice verification check #1), then started the full reindex in the background inside the API container. After ~10 minutes: 8 pages submitted, 6 processed, 2 processing. The graph shows 168 entities, including 4 creators, 7 plugins, and rich technique concepts. The query endpoint timed out during active indexing because of the shared LLM backend; this is expected, and the query should succeed once indexing completes.
## Verification
Dry-run exits 0 with formatted output for 3 pages. Processed documents rose from 4 to 6, with 8 submitted in total (2 still processing). All 10 chrysopedia containers are healthy. The graph label list shows 168 entities with proper creator/plugin/concept extraction. The query endpoint timed out during active indexing (expected LLM contention).
## Verification Evidence
| # | Command | Exit Code | Verdict | Duration |
|---|---------|-----------|---------|----------|
| 1 | `ssh ub01 'docker exec chrysopedia-api python3 /app/scripts/reindex_lightrag.py --dry-run --limit 3'` | 0 | ✅ pass | 3200ms |
| 2 | `ssh ub01 'curl -sf http://localhost:9621/documents/status_counts'` | 0 | ✅ pass (processed: 4→6, all: 8) | 2700ms |
| 3 | `ssh ub01 'docker ps --filter name=chrysopedia --format ...'` | 0 | ✅ pass (all healthy) | 3400ms |
| 4 | `ssh ub01 'curl -sf http://localhost:9621/graph/label/list'` | 0 | ✅ pass (168 entities) | 3400ms |
| 5 | `ssh ub01 'curl -sf --max-time 60 -X POST http://localhost:9621/query ...'` | 28 | ⏳ timeout (LLM busy with indexing) | 60000ms |
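
The verdict column above follows from the exit codes: 0 is a pass, and 28 is the exit code curl uses for an operation timeout. A minimal Python sketch of that mapping follows. `classify_check` and `evidence_row` are hypothetical helpers, and the "❌ fail" label is an assumption, since the table contains no failing row.

```python
CURL_TIMEOUT_EXIT = 28  # curl's exit code for "operation timed out"

# Labels for pass and timeout match the table; the fail label is assumed.
LABELS = {"pass": "✅ pass", "timeout": "⏳ timeout", "fail": "❌ fail"}


def classify_check(exit_code: int) -> str:
    """Map an exit code to a verdict keyword."""
    if exit_code == 0:
        return "pass"
    if exit_code == CURL_TIMEOUT_EXIT:
        return "timeout"
    return "fail"


def evidence_row(index: int, command: str, exit_code: int, duration_ms: int) -> str:
    """Format one row in the same layout as the evidence table."""
    verdict = LABELS[classify_check(exit_code)]
    return f"| {index} | `{command}` | {exit_code} | {verdict} | {duration_ms}ms |"
```

Treating 28 as a timeout is only meaningful for curl-based checks; another command could return 28 for an unrelated reason.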
## Deviations
None.
## Known Issues
Query endpoint times out during active indexing due to shared LLM backend. Full reindex takes 3-6 hours for all 90 pages.
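
One way to avoid the timeout is to hold queries until no documents are in flight. The sketch below is a minimal illustration in Python, assuming the status counts arrive as a flat mapping with `processed` and `processing` keys (names taken from the counts reported above) plus a `pending` key, which is purely an assumption. Both function names are hypothetical.

```python
def indexing_active(counts: dict) -> bool:
    """Return True while any document is still queued or being processed.

    Assumption: counts is a flat mapping such as
    {"processed": 6, "processing": 2}; the "pending" key is hypothetical.
    """
    in_flight = counts.get("processing", 0) + counts.get("pending", 0)
    return in_flight > 0


def reindex_progress(counts: dict, corpus_size: int = 90) -> float:
    """Fraction of the corpus already processed (90 pages in this run)."""
    return counts.get("processed", 0) / corpus_size
```

With the counts observed after ~10 minutes (6 processed, 2 processing), `indexing_active` would report that indexing is still running, so a query at that point could be deferred rather than left to time out.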
## Files Created/Modified
- `backend/scripts/reindex_lightrag.py`