Framework for RAG experimentation and RAG Error Classification (RAGEC), accompanying the paper "Classifying and Addressing the Diversity of Errors in Retrieval-Augmented Generation Systems".
- Create a virtual environment using venv: `python3.10 -m venv ./venv310`
- Activate the virtual environment: `source ./venv310/bin/activate`
- Install dependencies: `pip install -e .`
- Set up your OpenAI key. Look at `.envrc.example` and rename it, or add an `.envrc` file.
- Download and preprocess Dragonball and CLAPnq. Note that the shell script contains Python calls; edit the shell script if necessary: `bash ./download_data.sh`
- Run Dragonball: `python -m scripts.dragonball_run`
- Run CLAPnq: `python -m scripts.clapnq_run`

The config is specified in `./conf/`. We use Hydra to parse the config. See here for a description of the config.
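As a minimal sketch, the Hydra config can also be composed and overridden programmatically. The config name `config` and the override value below are assumptions; check `./conf/` for the actual file names and keys (`artifact_path` is described further below).

```python
# Minimal sketch, assuming the main config in ./conf/ is named config.yaml.
# Check ./conf/ for the real file name and the available keys.
from hydra import compose, initialize

with initialize(version_base=None, config_path="conf"):
    cfg = compose(
        config_name="config",                                  # assumed file name
        overrides=["artifact_path=./outputs/my_experiment"],   # artifact_path is described below
    )
    print(cfg)
```

If the run scripts use Hydra's standard entry point, the same `key=value` overrides can also be passed on the command line when launching `scripts.dragonball_run` or `scripts.clapnq_run`.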
The code is decomposed into components. Each component takes artifacts as inputs and produces artifacts as outputs. Artifacts are data or results that can be saved to or loaded from disk; an artifact holds both the underlying data and the path where it should be saved or loaded.
A component can also be "dry run", i.e. its output artifact(s) are prepared without running the component itself. This makes it easy to resume a sequence of components partway through. A rough sketch of this pattern is shown below.
For each run, the log is saved at `./outputs/{RUN_DATE}/{RUN_TIME}/`. That directory also contains a `.hydra` folder recording the config used for the run. Artifacts are shared across runs; their path is specified as `artifact_path` in the main config and defaults to `./outputs/{DATA}/{RUN_NAME}/`. It will be changed to include the dataset names later.
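For example, the config snapshot that Hydra writes into a run's `.hydra` directory can be reloaded to inspect or reproduce a run. The timestamped path below is illustrative, and `artifact_path` is assumed to be a top-level key.

```python
# Hydra writes config.yaml (plus hydra.yaml and overrides.yaml) under the
# run's .hydra directory; the path here is illustrative.
from omegaconf import OmegaConf

cfg = OmegaConf.load("outputs/2025-01-01/12-00-00/.hydra/config.yaml")
print(cfg.artifact_path)  # assumed to be a top-level key, as described above
```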
We have released our dataset of 377 RAG errors annotated with their error stage (i.e. one of Chunking, Retrieval, Reranking, and Generation) and error type (i.e. one of the 16 error types in our taxonomy). The data is located at `annotation/RAGEC_annotations.csv`. The queries and answers come from the English DragonBall dataset under the RAGEval framework, with the `query_id` field matching the value from the original dataset to facilitate joining the data back. The erroneous RAG answers were generated by our default RAG pipeline implemented in this repo. Errors were initially identified automatically with an LLM judge; the authors of this work then manually annotated 406 potential errors, confirming 377 of them as errors and annotating their stage and type.
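A minimal sketch of loading the annotations is shown below; `query_id` is the only column name documented above, so inspect the CSV header for the full schema, and `dragonball_df` is a hypothetical DataFrame of the original DragonBall data.

```python
# Minimal sketch; query_id is the documented join key, other column names
# should be read from the CSV header.
import pandas as pd

annotations = pd.read_csv("annotation/RAGEC_annotations.csv")
print(annotations.columns.tolist())
print(annotations.head())

# query_id matches the original English DragonBall dataset, so the annotations
# can be joined back, e.g. with a hypothetical dragonball_df:
# merged = annotations.merge(dragonball_df, on="query_id", how="left")
```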
@article{leung2025,
title={Classifying and Addressing the Diversity of Errors in Retrieval-Augmented Generation Systems},
author={Kin Kwan Leung and Mouloud Belbahri and Yi Sui and Alex Labach and Xueying Zhang and Stephen Rose and Jesse C. Cresswell},
journal={arXiv:2510.13975},
year={2025},
}
This data and code are licensed under the MIT License, copyright Layer 6 AI.
