GitHub - JayLZhou/GraphRAG: In-depth study of the graphrag

👾 DIGIMON: Deep Analysis of Graph-Based Retrieval-Augmented Generation (RAG) Systems

GraphRAG is a popular 🔥🔥🔥 and powerful 💪💪💪 RAG system! 🚀💡 Inspired by systems like Microsoft's, graph-based RAG is unlocking endless possibilities in AI.

Our project focuses on modularizing and decoupling these methods 🧩 to unveil the mystery 🕵️‍♂️🔍✨ behind them and share fun and valuable insights! 🤩💫 Our project🔨 is included in Awesome Graph-based RAG.

If you find our work helpful, please kindly cite our paper.
Download the datasets: GraphRAG-dataset

Quick Start 🚀

From Source

# Clone the repository from GitHub
git clone https://github.com/JayLZhou/GraphRAG.git
cd GraphRAG

Run a Method

You can run different GraphRAG methods by specifying the corresponding configuration file (.yaml).

Example: Running RAPTOR

python main.py -opt Option/Method/RAPTOR.yaml -dataset_name your_dataset

Available Methods:

The following methods are available, and each can be run using the same command format:

python main.py -opt Option/Method/<METHOD>.yaml -dataset_name your_dataset

Replace <METHOD> with one of the following:

Dalk
GR
LGraphRAG (Local search in GraphRAG)
GGraphRAG (Global search in GraphRAG)
HippoRAG
KGP
LightRAG
RAPTOR
ToG

For example, to run local search in GraphRAG:

python main.py -opt Option/Method/LGraphRAG.yaml -dataset_name your_dataset
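
To run several methods back to back, the command above can be generated programmatically. The following is a minimal sketch and not part of the repository: `METHODS` and `build_command` are hypothetical names, and the method names are copied from the list above on the assumption that each one has a matching `.yaml` file under `Option/Method/`.

```python
# Method names copied from the list above; assumed to match the .yaml file names.
METHODS = ["Dalk", "GR", "LGraphRAG", "GGraphRAG", "HippoRAG",
           "KGP", "LightRAG", "RAPTOR", "ToG"]


def build_command(method: str, dataset_name: str) -> list[str]:
    """Build the argument list for one run of main.py (hypothetical helper)."""
    if method not in METHODS:
        raise ValueError(f"Unknown method: {method!r}")
    return [
        "python", "main.py",
        "-opt", f"Option/Method/{method}.yaml",
        "-dataset_name", dataset_name,
    ]


if __name__ == "__main__":
    # Print one command per method. To execute them instead, pass each list
    # to subprocess.run(cmd, check=True) from the repository root.
    for method in METHODS:
        print(" ".join(build_command(method, "your_dataset")))
```

Validating the method name before building the command surfaces a typo immediately rather than as a missing-file error halfway through a batch.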

Dependencies

Ensure the required dependencies are installed (the default experiment name is digimon):

conda env create -f experiment.yml -n your_experiment_name

Supported LLM Backends

GraphRAG supports both cloud-based and local deployment of LLMs:

Cloud-based models: OpenAI (e.g., gpt-4, gpt-3.5-turbo)
Locally deployed models: Ollama and LlamaFactory

To use a local model, set api_type to open_llm in the configuration file.

Example Configuration (`config.yaml`):

llm:
  api_type: "openai"  # Options: "openai" or "open_llm" (for Ollama and LlamaFactory)
  model: "YOUR_LOCAL_MODEL_NAME"
  base_url: "YOUR_LOCAL_URL"  # Change this for local models
  api_key: "YOUR_API_KEY"  # Not required for local models

For `LlamaFactory` or `Ollama`, ensure the model is correctly installed and running in your local environment.

For setup details, refer to the LlamaFactory README. A configuration for a local model looks like this:

llm:
  api_type: "open_llm"  # Options: "openai" or "open_llm" (For Ollama and LlamaFactory) 
  model: "YOUR_LOCAL_MODEL_NAME"
  base_url: "YOUR_LOCAL_URL"  # Change this for local models
  api_key: "ANY_THING_IS_OKAY"  # Not required for local models
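
A quick sanity check of the `llm` section can catch configuration mistakes before a long run starts. This is a minimal sketch, not code from the repository: `check_llm_config` is a hypothetical helper that takes the section as a plain dictionary, and the rule that `base_url` is mandatory only for `open_llm` is an assumption drawn from the comments in the configuration above.

```python
# The two api_type options named in the configuration comments above.
ALLOWED_API_TYPES = {"openai", "open_llm"}


def check_llm_config(llm: dict) -> list[str]:
    """Return a list of problems; an empty list means the section looks usable."""
    problems = []
    api_type = llm.get("api_type")
    if api_type not in ALLOWED_API_TYPES:
        problems.append(
            f"api_type must be one of {sorted(ALLOWED_API_TYPES)}, got {api_type!r}"
        )
    if not llm.get("model"):
        problems.append("model is missing or empty")
    # Assumption: a local deployment always needs an explicit endpoint.
    if api_type == "open_llm" and not llm.get("base_url"):
        problems.append("base_url is required when api_type is 'open_llm'")
    # The key is described as not required for local models, so check it only for OpenAI.
    if api_type == "openai" and not llm.get("api_key"):
        problems.append("api_key is required when api_type is 'openai'")
    return problems
```

The dictionary passed in would typically be the `llm` mapping obtained after loading the YAML file with whichever YAML loader the project already uses.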

Representative Methods

We select the following Graph RAG methods:

Method	Venue / Source	Graph Type
RAPTOR	ICLR 2024	Tree
KGP	AAAI 2024	Passage Graph
DALK	EMNLP 2024	KG
HippoRAG	NeurIPS 2024	KG
G-retriever	NeurIPS 2024	KG
ToG	ICLR 2024	KG
MS GraphRAG	Microsoft Project	TKG
FastGraphRAG	CircleMind Project	TKG
LightRAG	High Star Project	RKG

Graph Types

Based on the entity and relation information they store, we categorize the graphs into the following types:

Chunk Tree: A tree structure formed by document content and summary.
Passage Graph: A relational network composed of passages, tables, and other elements within documents.
KG: A knowledge graph (KG) is constructed by extracting entities and relationships from each chunk. It contains only entities and relations, and is commonly represented as triples.
TKG: A textual knowledge graph (TKG) is a specialized KG (following the same construction step as KG), which enriches entities with detailed descriptions and type information.
RKG: A rich knowledge graph (RKG) further incorporates keywords associated with relations.

The criteria for the classification of graph types are as follows:

Graph Attributes	Chunk Tree	Passage Graph	KG	TKG	RKG
Original Content	✅	✅	❌	❌	❌
Entity Name	❌	❌	✅	✅	✅
Entity Type	❌	❌	❌	✅	✅
Entity Description	❌	❌	❌	✅	✅
Relation Name	❌	❌	✅	❌	✅
Relation Keyword	❌	❌	❌	❌	✅
Relation Description	❌	❌	❌	✅	✅
Edge Weight	❌	❌	✅	✅	✅
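
The classification table can also be written down as data, which makes it easy to ask which graph types provide the attributes a retrieval operator needs. This is a small sketch that transcribes the table above; `GRAPH_ATTRIBUTES` and `graph_types_with` are illustrative names, not identifiers from the repository.

```python
# Attributes marked with a check mark in the table above, per graph type.
GRAPH_ATTRIBUTES = {
    "Chunk Tree": {"Original Content"},
    "Passage Graph": {"Original Content"},
    "KG": {"Entity Name", "Relation Name", "Edge Weight"},
    "TKG": {
        "Entity Name", "Entity Type", "Entity Description",
        "Relation Description", "Edge Weight",
    },
    "RKG": {
        "Entity Name", "Entity Type", "Entity Description",
        "Relation Name", "Relation Keyword", "Relation Description",
        "Edge Weight",
    },
}


def graph_types_with(required: set[str]) -> list[str]:
    """Return the graph types that store every attribute in `required`."""
    return [name for name, attrs in GRAPH_ATTRIBUTES.items() if required <= attrs]


print(graph_types_with({"Entity Description"}))  # graph types usable by description-based operators
```

Note that Chunk Tree and Passage Graph have identical rows in the table, so attributes alone do not distinguish them; they differ in structure (tree versus relational network).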

Operators in the Retrieve Stage

The retrieval stage plays the key role ‼️ in the entire GraphRAG process. ✨ The goal is to identify query-relevant content that supports the generation phase, enabling the LLM to provide more accurate responses.

💡💡💡 After thoroughly reviewing all implementations, we've distilled them into a set of 16 operators 🧩🧩. Each method then constructs its retrieval module by combining one or more of these operators 🧩.

Five Types of Operators

We classify the operators into five categories, each offering a different way to retrieve and structure relevant information from graph-based data.

⭕️ Entity Operators

Retrieve entities (e.g., people, places, organizations) that are most relevant to the given query.

Name	Description	Example Methods
VDB	Select top-k nodes from the vector database	G-retriever, RAPTOR, KGP
RelNode	Extract nodes from given relationships	LightRAG
PPR	Run PPR on the graph, return top-k nodes with PPR scores	FastGraphRAG
Agent	Utilizes LLM to find the useful entities	ToG
Onehop	Selects the one-hop neighbor entities of the given entities	LightRAG
Link	Return top-1 similar entity for each given entity	HippoRAG
TF-IDF	Rank entities based on the TF-IDF matrix	KGP
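
To make two of the entity operators concrete, the sketch below implements a VDB-style top-k lookup with cosine similarity and a PPR operator with plain power iteration. It is an illustration under simplifying assumptions, not the repository's implementation: embeddings are given as plain lists, the graph is an adjacency dictionary in which every neighbor also appears as a key, and all function names are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vdb_top_k(query_vec: list[float], entity_vecs: dict[str, list[float]], k: int) -> list[str]:
    """VDB operator: the k entities whose embeddings are most similar to the query."""
    ranked = sorted(entity_vecs, key=lambda name: cosine(query_vec, entity_vecs[name]), reverse=True)
    return ranked[:k]


def personalized_pagerank(adj: dict[str, list[str]], seeds: set[str],
                          alpha: float = 0.85, iters: int = 50) -> dict[str, float]:
    """PPR operator: random walk that restarts at the seed entities with probability 1 - alpha."""
    nodes = list(adj)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(restart)
    for _ in range(iters):
        nxt = {n: (1.0 - alpha) * restart[n] for n in nodes}
        for n in nodes:
            if adj[n]:
                share = alpha * score[n] / len(adj[n])
                for m in adj[n]:
                    nxt[m] += share
            else:
                # Dangling node: return its mass to the seed distribution.
                for m in nodes:
                    nxt[m] += alpha * score[n] * restart[m]
        score = nxt
    return score


def ppr_top_k(adj: dict[str, list[str]], seeds: set[str], k: int) -> list[tuple[str, float]]:
    """Return the top-k (entity, score) pairs by PPR score."""
    scores = personalized_pagerank(adj, seeds)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
```

In a typical pipeline the seeds for PPR are the entities linked to the query, so the scores favor nodes that are close to the query entities in the graph.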

➡️ Relationship Operators

Extract the relationships that are most useful for the given query.

Name	Description	Example Methods
VDB	Retrieve relationships by vector-database	LightRAG, G-retriever
Onehop	Selects relationships linked by one-hop neighbors of the given selected entities	Local Search for MS GraphRAG
Aggregator	Compute relationship scores from entity PPR matrix, return top-k	FastGraphRAG
Agent	Utilizes LLM to find the useful relationships	ToG
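
As an illustration of the Onehop relationship operator, the sketch below selects every relationship that touches one of the given entities. It reflects one reading of the description in the table, and the triple layout `(source, relation, target)` and the function name are assumptions made for the example.

```python
Triple = tuple[str, str, str]  # (source entity, relation name, target entity)


def onehop_relationships(triples: list[Triple], entities: set[str]) -> list[Triple]:
    """Return the relationships that have at least one endpoint in `entities`."""
    return [t for t in triples if t[0] in entities or t[2] in entities]


triples = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Marie Curie", "won", "Nobel Prize"),
    ("Warsaw", "capital_of", "Poland"),
    ("Paris", "capital_of", "France"),
]
selected = onehop_relationships(triples, {"Marie Curie"})
```

Here `selected` holds the two relationships whose source is "Marie Curie"; the relationship between Warsaw and Poland is two hops away and is therefore left out.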

📄 Chunk Operators

Retrieve the most relevant text segments (chunks) related to the query.

Name	Description	Example Methods
Aggregator	Use the relationship scores and the relationship-chunk interactions to select the top-k chunks	HippoRAG
FromRel	Return chunks containing given relationships	LightRAG
Occurrence	Rank top-k chunks based on occurrence of both entities in relationships	Local Search for MS GraphRAG
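
The Occurrence operator can be sketched as a counting problem: a chunk scores one point for every selected relationship whose two endpoint entities both appear in it. This is an interpretation of the table description rather than the repository's code; the input layout (a mapping from chunk id to the set of entities it mentions) and the function name are assumptions.

```python
def occurrence_top_k(chunk_entities: dict[str, set[str]],
                     relationships: list[tuple[str, str]],
                     k: int) -> list[str]:
    """Rank chunks by how many relationships have both endpoints mentioned in the chunk."""
    scores = {
        chunk_id: sum(1 for src, dst in relationships if src in ents and dst in ents)
        for chunk_id, ents in chunk_entities.items()
    }
    # Keep only chunks that support at least one relationship; sorted() is stable,
    # so ties keep their original order.
    ranked = sorted((c for c in scores if scores[c] > 0), key=lambda c: scores[c], reverse=True)
    return ranked[:k]


chunks = {"c1": {"A", "B", "C"}, "c2": {"A", "B"}, "c3": {"D"}}
rels = [("A", "B"), ("B", "C")]
top_chunks = occurrence_top_k(chunks, rels, k=2)
```

Chunk `c1` contains both endpoints of both relationships, `c2` of only one, and `c3` of none, so the ranking is `c1` followed by `c2`.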

📈 Subgraph Operators

Extract a relevant subgraph for the given query.

Name	Description	Example Methods
KhopPath	Find k-hop paths with start and endpoints in the given entity set	DALK
Steiner	Compute Steiner tree based on given entities and relationships	G-retriever
AgentPath	Identify the most relevant k-hop paths to a given question, by using LLM to filter out the irrelevant paths	ToG
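
The KhopPath operator can be illustrated with a depth-limited search that enumerates simple paths of at most k edges whose first and last nodes both belong to the given entity set. This is a minimal sketch under assumptions: the graph is an adjacency dictionary, paths are reported once per direction, and `khop_paths` is a hypothetical name.

```python
def khop_paths(adj: dict[str, list[str]], entities: set[str], k: int) -> list[list[str]]:
    """Enumerate simple paths with at most k edges that start and end in `entities`."""
    paths: list[list[str]] = []

    def extend(path: list[str]) -> None:
        node = path[-1]
        # Record the path when it has at least one edge and ends at a given entity.
        if len(path) > 1 and node in entities:
            paths.append(list(path))
        if len(path) - 1 == k:  # hop budget used up
            return
        for nxt in adj.get(node, []):
            if nxt not in path:  # keep paths simple (no repeated nodes)
                path.append(nxt)
                extend(path)
                path.pop()

    for start in sorted(entities):  # sorted for a deterministic output order
        extend([start])
    return paths


adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
found = khop_paths(adj, {"a", "c"}, k=2)
```

With `k=2` the two entities are connected through `b`, so the path appears once in each direction; with `k=1` no path is returned because `a` and `c` are not direct neighbors.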

🔗 Community Operators

Identify high-level information. These operators are used only by MS GraphRAG.

Name	Description	Example Methods
Entity	Detects communities containing specified entities	Local Search for MS GraphRAG
Layer	Returns all communities below a required layer	Global Search for MS GraphRAG
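
Both community operators reduce to simple filters once communities are available as records. The sketch below assumes each community carries an id, a layer number, and its member entities, and it treats "below a required layer" as inclusive (`layer <= required`); both the record layout and that reading are assumptions, not taken from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Community:
    id: str
    layer: int            # depth in the community hierarchy (assumed: 0 is the top layer)
    entities: frozenset   # names of the member entities


def communities_with_entities(communities: list[Community], entities: set[str]) -> list[Community]:
    """Entity operator: communities that contain at least one of the given entities."""
    wanted = set(entities)
    return [c for c in communities if c.entities & wanted]


def communities_below_layer(communities: list[Community], layer: int) -> list[Community]:
    """Layer operator: communities whose layer does not exceed `layer` (inclusive, by assumption)."""
    return [c for c in communities if c.layer <= layer]


example = [
    Community("c0", 0, frozenset({"A", "B"})),
    Community("c1", 1, frozenset({"C"})),
    Community("c2", 2, frozenset({"A", "C"})),
]
local_hits = communities_with_entities(example, {"A"})
global_hits = communities_below_layer(example, 1)
```

The first filter matches how a local search narrows to communities around the query entities, while the second matches a global search that reads every community report up to a chosen level of detail.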

You can freely 🪽 combine these operators 🧩 to create more and more GraphRAG methods.

🌰 Examples

Below, we present some examples illustrating how existing algorithms leverage these operators.

Name	Operators
HippoRAG	Chunk (Aggregator)
LightRAG	Chunk (FromRel) + Entity (RelNode) + Relationship (VDB)
FastGraphRAG	Chunk (Aggregator) + Entity (PPR) + Relationship (Aggregator)
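
The LightRAG row can be read as a small pipeline: pick relationships with the Relationship (VDB) operator, derive entities from them with Entity (RelNode), and collect their source chunks with Chunk (FromRel). The sketch below shows that composition only; it is not LightRAG's implementation, the `score` field stands in for the similarity a real vector database would return, and every name in it is hypothetical.

```python
from typing import NamedTuple


class Relationship(NamedTuple):
    src: str
    dst: str
    chunk_id: str   # chunk the relationship was extracted from
    score: float    # stand-in for query similarity from a vector database


def relationship_vdb(rels: list[Relationship], k: int) -> list[Relationship]:
    """Relationship (VDB): the k relationships most similar to the query."""
    return sorted(rels, key=lambda r: r.score, reverse=True)[:k]


def entity_relnode(rels: list[Relationship]) -> list[str]:
    """Entity (RelNode): endpoint entities of the given relationships, without duplicates."""
    return list(dict.fromkeys(name for r in rels for name in (r.src, r.dst)))


def chunk_fromrel(rels: list[Relationship]) -> list[str]:
    """Chunk (FromRel): chunks that contain the given relationships, without duplicates."""
    return list(dict.fromkeys(r.chunk_id for r in rels))


def lightrag_style_retrieve(rels: list[Relationship], k: int) -> dict:
    """Compose the three operators in the order listed in the table."""
    top = relationship_vdb(rels, k)
    return {
        "relationships": top,
        "entities": entity_relnode(top),
        "chunks": chunk_fromrel(top),
    }


rels = [
    Relationship("A", "B", "c1", 0.9),
    Relationship("B", "C", "c2", 0.2),
    Relationship("C", "D", "c1", 0.7),
]
context = lightrag_style_retrieve(rels, k=2)
```

Because each operator takes and returns plain collections, swapping one of them (for example, replacing the VDB step with an Onehop step) yields a different retrieval method without touching the rest of the pipeline.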

🏹 Our future plans

Detailed readme
Support RoG, PathRAG, etc.
Provide a docker image for easy deployment.
Support more LLM backends, such as Azure.

🧭 Cite Our Paper

If you find this work useful, please consider citing our paper:

In-depth Analysis of Graph-based RAG in a Unified Framework

@article{zhou2025depth,
  title={In-depth Analysis of Graph-based RAG in a Unified Framework},
  author={Zhou, Yingli and Su, Yaodong and Sun, Youran and Wang, Shu and Wang, Taotao and He, Runyuan and Zhang, Yongwei and Liang, Sicong and Liu, Xilin and Ma, Yuchi and others},
  journal={arXiv preprint arXiv:2503.04338},
  year={2025}
}
