RaPID: Efficient Retrieval-Augmented Long Text Generation with Writing Planning and Information Discovery

This is the official implementation of our paper "RaPID: Efficient Retrieval-Augmented Long Text Generation with Writing Planning and Information Discovery", published in Findings of ACL 2025.

Overview

RaPID is an efficient framework for generating knowledge-intensive and comprehensive long texts, such as wiki-style articles. RaPID consists of three main modules:

1. Retrieval-augmented Preliminary Outline Generation
   - Reduces hallucinations by grounding the generation in retrieved facts
   - Ensures factual accuracy in the generated outline
2. Attribute-constrained Search
   - Enables efficient information discovery
   - Optimizes the retrieval process for relevant information
3. Plan-guided Article Generation
   - Enhances thematic coherence throughout the article
   - Maintains structural consistency in long-form content

Installation

conda create -n rapid python=3.10
conda activate rapid
pip install -r requirements.txt

Configuration

RaPID requires API keys for both the LLM service and the search engine service. Configure them in a secrets.toml file as follows.

Create a secrets.toml file in the project root directory:

touch secrets.toml

Add your keys to secrets.toml:

# LLM API Configuration
OPENAI_API_KEY = "your-llm-api-key"

# Search Engine API Configuration
GOOGLE_API_KEY = "your-search-api-key"
GOOGLE_CX = "your-search-engine-id"

Make sure to add secrets.toml to your .gitignore file to prevent accidentally committing sensitive information:

echo "secrets.toml" >> .gitignore
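
To check that the keys above are in place before starting a run, a small loader along these lines may help. This is a minimal sketch, not the loader RaPID itself uses: the function name `load_secrets` and the choice to export values as environment variables are assumptions made for illustration. Because the conda environment above uses Python 3.10, the sketch falls back to the third-party `tomli` package when the standard-library `tomllib` (Python 3.11+) is unavailable.

```python
import os
from pathlib import Path

try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    # Python 3.10: assumes `pip install tomli` has been run separately.
    import tomli as tomllib

# Key names taken from the secrets.toml example above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GOOGLE_API_KEY", "GOOGLE_CX")


def load_secrets(path="secrets.toml"):
    """Hypothetical helper: export string values from secrets.toml to
    os.environ and return the list of required keys that are missing or empty."""
    with Path(path).open("rb") as f:
        secrets = tomllib.load(f)
    for key, value in secrets.items():
        if isinstance(value, str):
            os.environ[key] = value
    return [key for key in REQUIRED_KEYS if not secrets.get(key)]
```

Calling `load_secrets()` from the project root returns an empty list when all three keys are set; any names it returns still need to be added to secrets.toml.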

Data

The data files can be downloaded from our Google Drive. After downloading, please follow these steps to set up the data directory:

Create the data directory structure:

mkdir -p wiki_dump/encode wiki_dump/original

Download and extract the files from Google Drive to the appropriate directories. The final directory structure should look like this:

wiki_dump/
- encode/
  - merged_encoded_vectors.pkl
- original/
  - combined.jsonl
- titles.csv
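
After extracting the download, the layout above can be verified with a short script. The following is a sketch under the assumption that titles.csv sits directly under wiki_dump/, as the listing suggests; `missing_data_files` is a hypothetical helper name, not part of the RaPID codebase.

```python
from pathlib import Path

# Relative paths taken from the directory listing above.
EXPECTED_FILES = (
    "encode/merged_encoded_vectors.pkl",
    "original/combined.jsonl",
    "titles.csv",
)


def missing_data_files(root="wiki_dump"):
    """Return the expected data files that are not present under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

An empty return value means every expected file was found; otherwise the returned paths show what still needs to be downloaded or moved.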

Usage

Console example:

python example.py --retriever google \
        --output-dir ./results \
        --max-thread-num 3 \
        --do-clarify \
        --do-research \
        --do-generate-outline \
        --do-generate-article \
        --do-topo-generation \
        --do-polish-article \
        --interface console

Batch example:

python example.py --retriever google \
        --output-dir ./results \
        --max-thread-num 3 \
        --do-clarify \
        --do-research \
        --do-generate-outline \
        --do-generate-article \
        --do-topo-generation \
        --do-polish-article \
        --interface file \
        --input-dir ./FreshWiki/final.csv
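
The two invocations above differ only in `--interface` and the extra `--input-dir` argument. When launching runs from another Python script, the argument list could be assembled as sketched below. The flags are copied from the commands above; the helper `build_command` and its parameters are assumptions for illustration, and the sketch does not reflect how example.py parses its arguments internally.

```python
import sys

# Pipeline stage flags, copied from the example commands above.
PIPELINE_FLAGS = [
    "--do-clarify",
    "--do-research",
    "--do-generate-outline",
    "--do-generate-article",
    "--do-topo-generation",
    "--do-polish-article",
]


def build_command(interface="console", input_path=None,
                  output_dir="./results", max_threads=3):
    """Hypothetical helper: build the argv list for example.py."""
    cmd = [
        sys.executable, "example.py",
        "--retriever", "google",
        "--output-dir", output_dir,
        "--max-thread-num", str(max_threads),
        *PIPELINE_FLAGS,
        "--interface", interface,
    ]
    if interface == "file":
        # Batch mode needs the input file, as in the batch example above.
        if input_path is None:
            raise ValueError("input_path is required when interface is 'file'")
        cmd += ["--input-dir", input_path]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the project root, for example `build_command("file", "./FreshWiki/final.csv")` for a batch run.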

Citation

If you use this code in your research, please cite our paper:

@inproceedings{gu-etal-2025-rapid,
    title = "{RAPID}: Efficient Retrieval-Augmented Long Text Generation with Writing Planning and Information Discovery",
    author = "Gu, Hongchao  and Li, Dexun  and Dong, Kuicai  and Zhang, Hao  and Lv, Hang  and Wang, Hao  and Lian, Defu  and Liu, Yong  and Chen, Enhong",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.859/",
    doi = "10.18653/v1/2025.findings-acl.859",
    pages = "16742--16763",
    ISBN = "979-8-89176-256-5",
}

Acknowledgments

This codebase is primarily based on the original STORM implementation. We would like to express our gratitude to the STORM authors for their valuable contributions.
