Table of Contents

Search & Retrieval

Overview
LightRAG Integration

How It Works
Configuration
Fallback Behavior

4-Tier Creator-Scoped Cascade

Cascade Details
Response Fields

Observability
Key Decisions
Key Files

Search & Retrieval

LightRAG-first search with automatic Qdrant fallback, plus a 4-tier creator-scoped retrieval cascade. Added in M021/S01–S02.

Overview

Search went through a major upgrade in M021: LightRAG replaced Qdrant as the primary search engine, with Qdrant retained as an automatic fallback. A 4-tier creator-scoped retrieval cascade was added for context-aware search when querying within a creator's content.

LightRAG Integration

LightRAG is a graph-based RAG engine running as a standalone service on port 9621. It replaced Qdrant as the primary search path for GET /api/v1/search.

How It Works

Query — SearchService._lightrag_search() POSTs to LightRAG /query/data with mode: "hybrid"
Parse — Response contains chunks (text passages with file_source metadata) and entities (graph nodes)
Extract — Technique slugs parsed from file_source paths using format technique:{slug}:creator:{uuid}
Lookup — Batch PostgreSQL query maps slugs to full TechniquePage records
Score — Position-based scoring (1.0 → 0.5 descending) since /query/data has no numeric relevance score (D039)
Supplement — Entity names matched against technique page titles as supplementary results
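
The Extract and Score steps above can be sketched as follows. This is a minimal illustration, not the code in SearchService: the regex, the helper names `extract_slugs` and `position_scores`, and the linear interpolation between 1.0 and 0.5 are assumptions; the source only states the `file_source` format and the score range.

```python
import re

# Assumed pattern for the documented format: technique:{slug}:creator:{uuid}
FILE_SOURCE_RE = re.compile(
    r"technique:(?P<slug>[^:]+):creator:(?P<creator_id>[0-9a-fA-F-]+)"
)


def extract_slugs(chunks):
    """Return unique technique slugs in first-seen (rank) order."""
    slugs = []
    for chunk in chunks:
        match = FILE_SOURCE_RE.search(chunk.get("file_source", ""))
        if match and match.group("slug") not in slugs:
            slugs.append(match.group("slug"))
    return slugs


def position_scores(slugs, top=1.0, bottom=0.5):
    """Map each slug to a score that descends with rank.

    Linear spacing is an assumption; the source only gives the 1.0 -> 0.5 range.
    """
    if len(slugs) <= 1:
        return {slug: top for slug in slugs}
    step = (top - bottom) / (len(slugs) - 1)
    return {slug: round(top - i * step, 4) for i, slug in enumerate(slugs)}
```

Chunks whose `file_source` does not match the pattern are skipped, and duplicate slugs keep their first (highest) position.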

Configuration

Field	Default	Purpose
`lightrag_url`	`http://chrysopedia-lightrag:9621`	LightRAG service URL
`lightrag_search_timeout`	`2.0` (seconds)	Request timeout
`lightrag_min_query_length`	`3` (characters)	Queries shorter than this skip LightRAG
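
As a rough sketch of how these fields might be grouped and used, the snippet below models them as a frozen dataclass. The class name `LightRAGSettings`, the helper `should_query_lightrag`, and the whitespace stripping are assumptions for illustration; backend/config.py may define the fields differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LightRAGSettings:
    """Hypothetical container mirroring the defaults in the table above."""

    lightrag_url: str = "http://chrysopedia-lightrag:9621"
    lightrag_search_timeout: float = 2.0  # seconds
    lightrag_min_query_length: int = 3  # characters


def should_query_lightrag(query: str, settings: LightRAGSettings) -> bool:
    """Return False for queries too short to send to LightRAG.

    Stripping whitespace before measuring is an assumption.
    """
    return len(query.strip()) >= settings.lightrag_min_query_length
```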

Fallback Behavior

LightRAG failures trigger automatic fallback to the existing Qdrant + keyword search path:

Timeout → fallback + WARNING log with reason=timeout
Connection error → fallback + WARNING log with reason=connection_error
HTTP error (e.g. 500) → fallback + WARNING log with reason=http_error
Empty results → fallback + WARNING log with reason=empty_results
Parse error → fallback + WARNING log with reason=parse_error
Short query (<3 chars) → skips LightRAG entirely, uses Qdrant directly

The fallback_used field in the search response is true when Qdrant, rather than LightRAG, served the results.
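
The fallback rules above can be sketched as a single wrapper. This is an illustration under stated assumptions: `search_with_fallback`, the injected `lightrag_search` and `qdrant_search` callables, and `UpstreamHTTPError` are hypothetical names, and the mapping from Python exception types to reason tags is a guess at how the real client errors are classified. Whether a skipped short query sets fallback_used to true is also an assumption.

```python
import logging

logger = logging.getLogger("chrysopedia.search")  # logger name is an assumption


class UpstreamHTTPError(Exception):
    """Hypothetical stand-in for an HTTP client's non-2xx error."""


def search_with_fallback(query, lightrag_search, qdrant_search, min_query_length=3):
    """Return (results, fallback_used) following the rules listed above."""
    if len(query) < min_query_length:
        # Short queries skip LightRAG entirely (no warning is logged).
        return qdrant_search(query), True

    reason = None
    results = []
    try:
        results = lightrag_search(query)
        if not results:
            reason = "empty_results"
    except TimeoutError:
        reason = "timeout"
    except ConnectionError:
        reason = "connection_error"
    except UpstreamHTTPError:
        reason = "http_error"
    except (KeyError, TypeError, ValueError):
        reason = "parse_error"

    if reason is None:
        return results, False

    logger.warning("lightrag_fallback reason=%s query=%r", reason, query)
    return qdrant_search(query), True
```

Passing the two search functions in as arguments keeps the sketch independent of any particular HTTP or Qdrant client.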

4-Tier Creator-Scoped Cascade

When a ?creator= parameter is provided (e.g., from a creator profile page or the chat engine), search runs a progressive cascade that widens scope until results are found. Added in M021/S02 (D040).

Tier 1: Creator-scoped
  └─ LightRAG with ll_keywords=[creator_name], post-filter by creator_id (3× oversampling)
        │ empty?
        ▼
Tier 2: Domain-scoped
  └─ LightRAG with ll_keywords=[dominant_category] (requires ≥2 pages in category)
        │ empty?
        ▼
Tier 3: Global
  └─ Standard LightRAG search (no scoping)
        │ empty?
        ▼
Tier 4: None
  └─ cascade_tier="none" — no results from any tier

Cascade Details

Tier	Method	Scoping	Post-Filter
Creator	`_creator_scoped_search()`	`ll_keywords: [creator_name]`	Yes — filter by `creator_id`, request 3× `top_k`
Domain	`_domain_scoped_search()`	`ll_keywords: [domain]`	No — any creator in domain qualifies
Global	`_lightrag_search()`	None	No
None	—	—	— (empty result)

Domain detection: SQL aggregation finds the dominant topic_category across a creator's technique pages. Requires ≥2 pages in the category to declare a domain — fewer means insufficient signal.
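
The domain rule can be illustrated in memory. The real implementation is described as a SQL aggregation; the sketch below swaps in `collections.Counter` to show the same threshold logic, and the names `dominant_category` and `MIN_PAGES_FOR_DOMAIN` are hypothetical.

```python
from collections import Counter

MIN_PAGES_FOR_DOMAIN = 2  # threshold stated above


def dominant_category(topic_categories):
    """Return the most common category, or None if it has too few pages."""
    counts = Counter(c for c in topic_categories if c)
    if not counts:
        return None
    category, page_count = counts.most_common(1)[0]
    return category if page_count >= MIN_PAGES_FOR_DOMAIN else None
```

A creator with pages in `["mixing", "mixing", "mastering"]` resolves to `"mixing"`, while one page each in two categories yields no domain, so the cascade would skip the domain tier.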

Post-filtering with oversampling: Creator tier requests 3× the desired result count from LightRAG, then filters locally by creator_id. This compensates for LightRAG not supporting native creator filtering.
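
Putting the tiers together, the cascade with 3× oversampling might look like the sketch below. It is not the SearchService code: `lightrag_query` is a hypothetical injected callable taking `ll_keywords` and `top_k`, results are assumed to be dicts carrying a `creator_id` key, and the function names are illustrative.

```python
OVERSAMPLE_FACTOR = 3  # creator tier requests 3x top_k, as described above


def creator_scoped(query, creator_name, creator_id, top_k, lightrag_query):
    """Tier 1: oversample, then filter locally by creator_id."""
    candidates = lightrag_query(
        query, ll_keywords=[creator_name], top_k=top_k * OVERSAMPLE_FACTOR
    )
    return [r for r in candidates if r["creator_id"] == creator_id][:top_k]


def cascade(query, creator_name, creator_id, domain, top_k, lightrag_query):
    """Return (results, cascade_tier), widening scope until results appear."""
    results = creator_scoped(query, creator_name, creator_id, top_k, lightrag_query)
    if results:
        return results, "creator"

    if domain is not None:  # domain is None when no dominant category exists
        results = lightrag_query(query, ll_keywords=[domain], top_k=top_k)
        if results:
            return results, "domain"

    results = lightrag_query(query, ll_keywords=[], top_k=top_k)
    if results:
        return results, "global"

    return [], "none"
```

Only the creator tier filters after the query; the domain and global tiers return whatever LightRAG ranks highest.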

Response Fields

Field	Type	Description
`cascade_tier`	string	Which tier served: `"creator"`, `"domain"`, `"global"`, `"none"`, or `""` (no cascade)
`fallback_used`	boolean	`true` if Qdrant fallback was used instead of LightRAG

Observability

logger.info per LightRAG search: query, latency_ms, result_count
logger.info per cascade tier: query, creator, tier, latency_ms, result_count
logger.warning on any failure path with structured reason= tag
cascade_tier and fallback_used in API response for downstream consumers
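
A minimal sketch of the per-search info log, assuming a standard-library logger: the helper `timed_search`, the logger name, and the exact message layout are hypothetical, and only the logged fields (query, latency_ms, result_count) come from the list above.

```python
import logging
import time

logger = logging.getLogger("chrysopedia.search")  # logger name is an assumption


def timed_search(query, search_fn):
    """Run search_fn(query) and log query, latency_ms, and result_count."""
    start = time.perf_counter()
    results = search_fn(query)
    latency_ms = (time.perf_counter() - start) * 1000
    logger.info(
        "lightrag_search query=%r latency_ms=%.1f result_count=%d",
        query,
        latency_ms,
        len(results),
    )
    return results
```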

Key Decisions

#	Decision	Choice	Rationale
D039	LightRAG scoring	Position-based (1.0 → 0.5)	`/query/data` has no numeric relevance score; sequential fallback to Qdrant
D040	Creator-scoped strategy	4-tier cascade (creator → domain → global → none)	Progressive widening ensures results while preferring creator context

Key Files

backend/search_service.py — SearchService with LightRAG integration and cascade methods
backend/config.py — LightRAG configuration fields
backend/schemas.py — cascade_tier in SearchResponse
backend/routers/search.py — ?creator= query parameter
