🧠 GraphRAG · pgvector

GraphRAG on pgvector: production engineering breakdown

How a SQL-first GraphRAG stack works with hybrid retrieval, Neo4j, and quality diagnostics. Built for real enterprise scenarios.

Executive Summary

This is not a demo RAG page, but an applied GraphRAG backend: semantic + keyword/fuzzy retrieval, graph enrichment, reranking, and observable diagnostics.

The system is Docker-deployable, predictable to tune with flags and thresholds, and reduces hallucination risk through a retrieval-first approach.

AI Overview

Architecture

FastAPI + PostgreSQL/pgvector + Neo4j + LLM.

Retrieval

Hybrid search with optional query correction and reranking.

GraphRAG

Graph links enrich context and improve recall.

Diagnostics

retrieval_query, corrected_query, chunks, graph_context, and latency fields.

Architecture Overview

User Question
    ↓
Query Rewrite
    ↓
Hybrid Retrieval
    ↓
Graph Enrichment
    ↓
Reranker
    ↓
Context Builder
    ↓
LLM Generation
    ↓
Answer + Citations

Pipeline behavior is controlled by flags: hybrid_enabled, graph_enabled, reranker_enabled, correction_enabled, score_threshold, and graph_score_threshold.

Components and roles

Component	Purpose
FastAPI	API layer: /ask, ingestion, and debug endpoints.
PostgreSQL + pgvector	Embeddings/chunks/metadata storage and semantic retrieval.
Neo4j	Graph links between entities for enrichment.
Reranker	Post-retrieval relevance optimization before generation.
Docker / Compose	Reproducible and controlled production environment.

How POST /ask works

1) Input parameters

question, top_k, graph_top_k, graph_hops, retrieval/graph/rerank/citations flags, optional system_prompt and chat_history.

2) History normalization and rewrite

Conversation history is handled separately, then converted into a retrieval-friendly query.

3) Hybrid retrieval + graph

Semantic and keyword/fuzzy channels are combined, then graph context is added using graph_top_k/graph_hops limits.

4) Generation and diagnostics

LLM answers from assembled context, while API returns answer plus diagnostic fields for quality control.

retrieval_query vs corrected_query

retrieval_query is the actual query sent into the retrieval pipeline (including history rewrite and retrieval-branch correction).

corrected_query reflects correction of the original user question. This split is useful for debugging and product analytics.

Anti-hallucination approach

Retrieval-first generation

Answers are grounded in retrieved context, not free-form generation.

Citations + structured filtering

Sources and metadata keep responses in verifiable boundaries.

Graph enrichment + reranking

Additional links and candidate reordering improve precision.

Thresholds + fallback

Weak candidates are filtered out; if context is insufficient, the system responds transparently.

Docker in production

The stack scales well and remains observable in containerized infrastructure.

Services

Typical set: api, postgres+pgvector, neo4j, sometimes redis/queue for ingestion.

Control

Config flags let you quickly switch between speed and quality profiles.

Observability

Retrieval-stage diagnostics simplify operations and incident response.

Reproducibility

Consistent dev/stage/prod environments reduce regression risk.

Common RAG failure reasons

Weak chunking

Poor chunk boundaries/size hurt recall and final answer quality.

No hybrid channel

Pure embeddings often miss exact names, codes, and key tokens.

No transparent diagnostics

Without retrieval_query/chunks it is hard to understand why output became irrelevant.

No fallback strategy

The system must gracefully indicate when context is not enough.

Approach comparison

GraphRAG on pgvector

Great fit for SQL-first environments and tight integration with existing BI/data stacks.

Qdrant / Weaviate

Strong specialized vector DB options, convenient for rapid retrieval-layer rollout.

Practical conclusion

Best choice depends on your current stack, observability needs, and ops maturity.

Key point: DB brand is secondary; pipeline quality is primary: chunking, hybrid retrieval, reranking, filters, monitoring, and fallback.

FAQ

Can it run locally without external APIs?

Yes. The GraphRAG pipeline, vector/graph storage, and most retrieval stages can run in local infrastructure.

When is graph actually needed?

When entity relationships and multi-hop indirect dependencies matter and vector search alone is not enough.

What impacts quality more: model or retrieval?

In applied systems, retrieval quality is usually more critical: data, chunking, filters, ranking, and diagnostics.

How quickly can we build a pilot?

With a working Docker stack and prepared documents, a pilot is usually delivered in a few iterations.

Practical takeaway

GraphRAG on pgvector with hybrid retrieval and graph enrichment is a controllable path to enterprise AI search: transparent, reproducible, and quality-driven.

Contact via Telegram →