GraphRAG on pgvector: production engineering breakdown
How a SQL-first GraphRAG stack works with hybrid retrieval, Neo4j, and quality diagnostics. Built for real enterprise scenarios.
Executive Summary
This is not a demo RAG page, but an applied GraphRAG backend: semantic + keyword/fuzzy retrieval, graph enrichment, reranking, and observable diagnostics.
The system is Docker-deployable, predictable to tune with flags and thresholds, and reduces hallucination risk through a retrieval-first approach.
AI Overview
Architecture
FastAPI + PostgreSQL/pgvector + Neo4j + LLM.
Retrieval
Hybrid search with optional query correction and reranking.
GraphRAG
Graph links enrich context and improve recall.
Diagnostics
retrieval_query, corrected_query, chunks, graph_context, and latency fields.
Architecture Overview
User Question
β
Query Rewrite
β
Hybrid Retrieval
β
Graph Enrichment
β
Reranker
β
Context Builder
β
LLM Generation
β
Answer + Citations
Pipeline behavior is controlled by flags: hybrid_enabled, graph_enabled, reranker_enabled, correction_enabled, score_threshold, and graph_score_threshold.
Components and roles
How POST /ask works
1) Input parameters
question, top_k, graph_top_k, graph_hops, retrieval/graph/rerank/citations flags, optional system_prompt and chat_history.
2) History normalization and rewrite
Conversation history is handled separately, then converted into a retrieval-friendly query.
3) Hybrid retrieval + graph
Semantic and keyword/fuzzy channels are combined, then graph context is added using graph_top_k/graph_hops limits.
4) Generation and diagnostics
LLM answers from assembled context, while API returns answer plus diagnostic fields for quality control.
retrieval_query vs corrected_query
retrieval_query is the actual query sent into the retrieval pipeline (including history rewrite and retrieval-branch correction).
corrected_query reflects correction of the original user question. This split is useful for debugging and product analytics.
Anti-hallucination approach
Retrieval-first generation
Answers are grounded in retrieved context, not free-form generation.
Citations + structured filtering
Sources and metadata keep responses in verifiable boundaries.
Graph enrichment + reranking
Additional links and candidate reordering improve precision.
Thresholds + fallback
Weak candidates are filtered out; if context is insufficient, the system responds transparently.
Docker in production
The stack scales well and remains observable in containerized infrastructure.
Typical set: api, postgres+pgvector, neo4j, sometimes redis/queue for ingestion.
Config flags let you quickly switch between speed and quality profiles.
Retrieval-stage diagnostics simplify operations and incident response.
Consistent dev/stage/prod environments reduce regression risk.
Common RAG failure reasons
Weak chunking
Poor chunk boundaries/size hurt recall and final answer quality.
No hybrid channel
Pure embeddings often miss exact names, codes, and key tokens.
No transparent diagnostics
Without retrieval_query/chunks it is hard to understand why output became irrelevant.
No fallback strategy
The system must gracefully indicate when context is not enough.
Approach comparison
GraphRAG on pgvector
Great fit for SQL-first environments and tight integration with existing BI/data stacks.
Qdrant / Weaviate
Strong specialized vector DB options, convenient for rapid retrieval-layer rollout.
Practical conclusion
Best choice depends on your current stack, observability needs, and ops maturity.
Key point: DB brand is secondary; pipeline quality is primary: chunking, hybrid retrieval, reranking, filters, monitoring, and fallback.
FAQ
Can it run locally without external APIs?
Yes. The GraphRAG pipeline, vector/graph storage, and most retrieval stages can run in local infrastructure.
When is graph actually needed?
When entity relationships and multi-hop indirect dependencies matter and vector search alone is not enough.
What impacts quality more: model or retrieval?
In applied systems, retrieval quality is usually more critical: data, chunking, filters, ranking, and diagnostics.
How quickly can we build a pilot?
With a working Docker stack and prepared documents, a pilot is usually delivered in a few iterations.
Practical takeaway
GraphRAG on pgvector with hybrid retrieval and graph enrichment is a controllable path to enterprise AI search: transparent, reproducible, and quality-driven.
Contact via Telegram β