This repository is the official implementation of REACT, an extension of DiReCT. It includes all DiReCT functionality (code for running the baseline method and automatic evaluation, plus a dataset for assessing how well large language models align with human doctors when reasoning over clinical notes) and additionally incorporates legal-domain data and automated reasoning-graph construction.
The medical dataset is available on PhysioNet: https://doi.org/10.13026/yf96-kc87.
The legal dataset is available on Google Drive: https://drive.google.com/drive/folders/12KPQzw0ztk-XoSIJbDb9VFNUL6F1K2a9?usp=sharing
- Data annotation and structure: `utils/data_annotation`
- Data listing and loading: `utils/data_loading_analysisi`
We provide examples for Llama 3 and GPT (Azure).
Use the official Llama 3 repo for environment setup and model download. The output is a JSON dict `{o: [z, r, d], ...}`, where `r` is the note span for `o`. A folder `predict_...` will be generated.
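A minimal sketch of that output shape, with placeholder values throughout (only `r`, the note span, is spelled out above; the semantics of `z` and `d` follow the dataset annotation):

```json
{
  "observation 1": ["z_1", "verbatim note span supporting observation 1", "d_1"],
  "observation 2": ["z_2", "verbatim note span supporting observation 2", "d_2"]
}
```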
Example command:

```bash
torchrun --nproc_per_node 1 llama3_reasons.py \
    --ckpt_dir Meta-Llama-3.1-8B-Instruct/ \
    --tokenizer_path Meta-Llama-3.1-8B-Instruct/tokenizer.model \
    --root samples \
    --use_p False
```

Fill in your Azure GPT API credentials. A folder `predict_Your Model` will be generated.
```python
from gpts_reasons import USE_GPT_API

USE_GPT_API(
    root="samples",
    use_p=False,
    api_key="Your key",
    azure_endpoint="Your endpoint",
    api_version="Your API version",
    model="Your Model",
)
```

We use Llama-3-8B for evaluation. Prompts are in `utils/data_extraction.py`:
- `discriminate_similarity_observation()`
- `discriminate_similarity_reason()`
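For orientation, a schematic sketch of what such a similarity-discrimination prompt may look like; the wording and signature below are hypothetical, and the real prompts live in `utils/data_extraction.py`:

```python
# Hypothetical shape of a similarity-discrimination prompt; the actual prompt
# text and function signature are defined in utils/data_extraction.py.
def discriminate_similarity_observation(pred_obs: str, gold_obs: str) -> str:
    return (
        "Do the following two clinical observations describe the same finding? "
        "Answer Yes or No.\n"
        f"Observation A: {pred_obs}\n"
        f"Observation B: {gold_obs}"
    )
```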
Run completeness & faithfulness evaluation:

```bash
torchrun --nproc_per_node 1 evaluation.py \
    --ckpt_dir Meta-Llama-3-8B-Instruct/ \
    --tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model \
    --root samples \
    --pred_name predict_Meta-Llama-3-8B-Instruct
```

Aggregate metrics:
```python
from statistics import process

process(root="samples", pred_name="predict_Meta-Llama-3-8B-Instruct_eval")
```
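For intuition, a hedged sketch of what such an aggregation step could compute; the file layout, metric key names, and averaging are assumptions here, and the real logic lives in `statistics.py`:

```python
import glob
import json

# Hypothetical aggregation sketch: average per-sample scores into
# corpus-level metrics. All paths and key names below are placeholders.
paths = glob.glob("samples/**/*_eval/*.json", recursive=True)
records = [json.load(open(p)) for p in paths]
for metric in ("completeness", "faithfulness"):
    values = [r[metric] for r in records if metric in r]
    if values:
        print(f"{metric}: {sum(values) / len(values):.3f}")
```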
These scripts generate reasoning graphs from PDFs and flowchart templates, covering both clinical and legal use cases. Inputs:

- Clinical:
  - `docs/Diagnosis_flowchart/` - one JSON per disease (graph skeleton, contains a `"knowledge"` tree; an illustrative skeleton is sketched after this list).
  - `docs/pdf/` - PDFs matched by the disease-name pattern `*{disease}*.pdf`.
  - `docs/temple/` - prompt templates: `prompt_suspected.txt`, `temple_0.txt`, `temple.txt`, `temple_2.txt`.
- Legal (criminal):
  - `docs/Diagnosis_flowchart_law/` - one JSON per crime (judgment/grounds skeleton).
  - `docs/laws/law.txt` - statute lines used in prompts.
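An illustrative flowchart skeleton, with all disease names and the top-level `diagnostic` key assumed for illustration (only the `"knowledge"` tree is named above; check the shipped JSON files for the exact layout):

```json
{
  "diagnostic": {
    "Suspected Pneumonia": {
      "Bacterial Pneumonia": {},
      "Viral Pneumonia": {}
    }
  },
  "knowledge": {
    "Suspected Pneumonia": "typical presenting findings and criteria go here"
  }
}
```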
Run the local-LLM pipeline:

```bash
python doc2kg/generate_kg.py
```

Set your API key by editing the script (or export an env var) and run:

```bash
python doc2kg/generate_kg_gpt4o.py
python doc2kg/generate_kg_law.py
```
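For example, assuming the GPT-4o script reads its key from an environment variable (the variable name below is a guess; check the script for the one it actually uses):

```bash
# Hypothetical variable name; confirm against doc2kg/generate_kg_gpt4o.py.
export AZURE_OPENAI_API_KEY="your-key"
python doc2kg/generate_kg_gpt4o.py
```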
Outputs:

- Local LLM:
  - Final graphs: `results/kgs/{disease}.json`
  - Intermediates: `results/temp/…`
  - Logs: `results/logs/log_*.txt`
- GPT-4o:
  - Final graphs: `results_4o/kgs/{disease}.json`
  - Intermediates: `results_4o/temp/…`
  - Logs: `results_4o/logs/log_*.txt`
  - Token cost summary: `results_4o/disease_costs.txt`
- Legal:
  - Final graphs (judgment/grounds): `results/kgs_law/{crime}.json`
What these graphs are for: structured reasoning graphs (clinical and legal) are automatically filled from documents and templates to support downstream analysis or visualization.
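As a minimal downstream example, one can load a generated graph and print its node hierarchy (the file name is a placeholder; any of the `{disease}.json` / `{crime}.json` outputs above works):

```python
import json
from pathlib import Path

# Load one generated reasoning graph (placeholder file name).
graph = json.loads(Path("results/kgs/Pneumonia.json").read_text())

def walk(node, depth=0):
    """Print nested dict keys as an indented tree."""
    if isinstance(node, dict):
        for key, child in node.items():
            print("  " * depth + str(key))
            walk(child, depth + 1)

walk(graph)
```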
If you use this work, please cite:

```bibtex
@inproceedings{wangdirect,
  author    = {Wang, Bowen and Chang, Jiuyang and Qian, Yiming and Chen, Guoxin and Chen, Junhao and Jiang, Zhouqiang and Zhang, Jiahao and Nakashima, Yuta and Nagahara, Hajime},
  booktitle = {Advances in Neural Information Processing Systems},
  pages     = {74999--75011},
  title     = {DiReCT: Diagnostic Reasoning for Clinical Notes via Large Language Models},
  volume    = {37},
  year      = {2024}
}
```
