
REACT: Reasoning for Clinical Notes via Large Language Models

This repository is the official implementation of REACT, an extension of DiReCT (our NeurIPS 2024 Datasets & Benchmarks track work on diagnostic reasoning). It includes all DiReCT functionality: code for running the baseline method and automatic evaluation, plus a dataset that assesses how well large language models align with human doctors in clinical reasoning over clinical notes. On top of that, REACT adds legal-domain data and automated construction of reasoning graphs.

Reasoning Procedure

[Figure: overview of the reasoning procedure]
Dataset

  • Medical dataset (PhysioNet): https://doi.org/10.13026/yf96-kc87
  • Legal dataset (Google Drive): https://drive.google.com/drive/folders/12KPQzw0ztk-XoSIJbDb9VFNUL6F1K2a9?usp=sharing


Baseline Implementation

We provide examples for Llama 3 and GPT (Azure).

Llama 3 (Meta)

Use the official Llama 3 repository for environment setup and model download. The script outputs a JSON dict {o: [z, r, d], ...}, where r is the note span corresponding to observation o. A folder named predict_<model name> (e.g., predict_Meta-Llama-3-8B-Instruct) will be generated; a sketch for inspecting this output follows the example command.

Example command

torchrun --nproc_per_node 1 llama3_reasons.py \
  --ckpt_dir Meta-Llama-3.1-8B-Instruct/ \
  --tokenizer_path Meta-Llama-3.1-8B-Instruct/tokenizer.model \
  --root samples \
  --use_p False
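
For reference, the snippet below is a minimal sketch for inspecting the generated predictions. It assumes one JSON file per note inside the predict_ folder (the layout and folder name are assumptions, not documented here); only the meaning of r (the note span for o) is stated above.

import json
from pathlib import Path

# Hypothetical layout: one JSON file per note inside the output folder.
pred_dir = Path("samples/predict_Meta-Llama-3-8B-Instruct")

for pred_file in sorted(pred_dir.glob("*.json")):
    preds = json.loads(pred_file.read_text())  # {o: [z, r, d], ...}
    for o, (z, r, d) in preds.items():
        print(f"{o}: note span = {r!r}")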

GPT (Azure)

Fill in your Azure GPT API credentials. A folder named predict_<Your Model> will be generated.

from gpts_reasons import USE_GPT_API

USE_GPT_API(
    root="samples",                  # folder containing the input notes
    use_p=False,
    api_key="Your key",              # Azure OpenAI credentials
    azure_endpoint="Your endpoint",
    api_version="Your API version",
    model="Your Model",              # also used to name the output folder
)
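
To keep credentials out of source control, one option is to read them from environment variables. The variable names below are illustrative and not defined by this repo.

import os

from gpts_reasons import USE_GPT_API

# Export these in your shell first; the names are illustrative only.
USE_GPT_API(
    root="samples",
    use_p=False,
    api_key=os.environ["AZURE_OPENAI_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    model=os.environ["AZURE_OPENAI_MODEL"],
)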

Automatic Evaluation

We use Llama-3-8B-Instruct for evaluation. The evaluation prompts are in utils/data_extraction.py:

  • discriminate_similarity_observation()
  • discriminate_similarity_reason()

Run completeness & faithfulness evaluation

torchrun --nproc_per_node 1 evaluation.py \
  --ckpt_dir Meta-Llama-3-8B-Instruct/ \
  --tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model \
  --root samples \
  --pred_name predict_Meta-Llama-3-8B-Instruct

Aggregate metrics

# `statistics` here is the repo's own statistics.py, not the stdlib module.
from statistics import process

# pred_name points at the _eval folder produced by the evaluation step.
process(root="samples", pred_name="predict_Meta-Llama-3-8B-Instruct_eval")

Build Reasoning Graphs (doc2kg)

These scripts generate reasoning graphs from PDFs and flowchart templates (covering both clinical and legal use cases).

Inputs

  • Clinical:

    • docs/Diagnosis_flowchart/ — one JSON per disease (graph skeleton, contains a "knowledge" tree; see the sketch after this list).
    • docs/pdf/ — PDFs matched by disease name pattern *{disease}*.pdf.
    • docs/temple/ — prompt templates: prompt_suspected.txt, temple_0.txt, temple.txt, temple_2.txt.
  • Legal (criminal):

    • docs/Diagnosis_flowchart_law/ — one JSON per crime (judgment/grounds skeleton).
    • docs/laws/law.txt — statute lines used in prompts.
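
For orientation, here is a minimal sketch that walks the "knowledge" tree of one clinical skeleton. It assumes the value under "knowledge" is a nested dict with free-text leaves, which may not match the exact schema; the file name is hypothetical.

import json

def walk(node, depth=0):
    # Inner nodes are dicts; anything else is treated as a free-text leaf.
    if isinstance(node, dict):
        for key, child in node.items():
            print("  " * depth + key)
            walk(child, depth + 1)
    else:
        print("  " * depth + str(node))

with open("docs/Diagnosis_flowchart/<disease>.json") as f:
    skeleton = json.load(f)

walk(skeleton.get("knowledge", {}))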

How to Run

1) Local LLM

python doc2kg/generate_kg.py

2) OpenAI GPT-4o (with token cost summary)

Set your API key by editing the script (or export an env var) and run:

python doc2kg/generate_kg_gpt4o.py

3) Legal (criminal judgment/grounds)

python doc2kg/generate_kg_law.py

Outputs & Where to Find Them

  • Local LLM:

    • Final graphs: results/kgs/{disease}.json
    • Intermediates: results/temp/…
    • Logs: results/logs/log_*.txt
  • GPT-4o:

    • Final graphs: results_4o/kgs/{disease}.json
    • Intermediates: results_4o/temp/…
    • Logs: results_4o/logs/log_*.txt
    • Token cost summary: results_4o/disease_costs.txt
  • Legal:

    • Final graphs (judgment/grounds): results/kgs_law/{crime}.json

What these graphs are for: they are structured reasoning graphs (clinical and legal), automatically populated from documents and templates, intended to support downstream analysis or visualization.
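
For example, a generated graph can be flattened into edges for downstream tooling. The sketch below assumes the output JSON is a nested dict (node name -> children), which is an assumption rather than the documented schema, and the file name is hypothetical.

import json

import networkx as nx  # third-party: pip install networkx

def add_edges(graph, node, children):
    # Flatten a nested dict (node -> children) into directed edges.
    if isinstance(children, dict):
        for child, grandchildren in children.items():
            graph.add_edge(node, child)
            add_edges(graph, child, grandchildren)

with open("results/kgs/<disease>.json") as f:
    kg = json.load(f)

G = nx.DiGraph()
for root, children in kg.items():
    add_edges(G, root, children)

print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges")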


Publication

If you use this work, please cite:

@inproceedings{wangdirect,
  author    = {Wang, Bowen and Chang, Jiuyang and Qian, Yiming and Chen, Guoxin and Chen, Junhao and Jiang, Zhouqiang and Zhang, Jiahao and Nakashima, Yuta and Nagahara, Hajime},
  booktitle = {Advances in Neural Information Processing Systems},
  pages     = {74999--75011},
  title     = {DiReCT: Diagnostic Reasoning for Clinical Notes via Large Language Models},
  volume    = {37},
  year      = {2024}
}
