This repository is the official implementation of REACT, an extension of DiReCT. It includes all DiReCT functionality (code for running the baseline method and automatic evaluation, plus a dataset for assessing how well large language models align with human doctors when reasoning over clinical notes) and additionally incorporates legal-domain data and automated reasoning-graph construction.
The medical dataset is available on PhysioNet: https://doi.org/10.13026/yf96-kc87.
The legal dataset is available on Google Drive: https://drive.google.com/drive/folders/12KPQzw0ztk-XoSIJbDb9VFNUL6F1K2a9?usp=sharing
- Data annotation and structure: `utils/data_annotation`
- Data listing and loading: `utils/data_loading_analysisi`
We provide examples for Llama 3 and GPT (Azure).
Use the official Llama 3 repo for environment setup and model download. The output is a JSON dict `{o: [z, r, d], ...}`, where `r` is the note span for `o`. A folder `predict_...` will be generated.
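A minimal sketch of that output shape, with placeholder values throughout (only `r`, the note span, is spelled out above; the semantics of `z` and `d` follow the dataset annotation):

```json
{
  "observation 1": ["z_1", "verbatim note span supporting observation 1", "d_1"],
  "observation 2": ["z_2", "verbatim note span supporting observation 2", "d_2"]
}
```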
Example command:

```bash
torchrun --nproc_per_node 1 llama3_reasons.py \
    --ckpt_dir Meta-Llama-3.1-8B-Instruct/ \
    --tokenizer_path Meta-Llama-3.1-8B-Instruct/tokenizer.model \
    --root samples \
    --use_p False
```

Fill in your Azure GPT API credentials. A folder `predict_Your Model` will be generated.
```python
from gpts_reasons import USE_GPT_API

USE_GPT_API(
    root="samples",
    use_p=False,
    api_key="Your key",
    azure_endpoint="Your endpoint",
    api_version="Your API version",
    model="Your Model",
)
```

We use Llama-3-8B for evaluation. Prompts are in `utils/data_extraction.py`:
- `discriminate_similarity_observation()`
- `discriminate_similarity_reason()`
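For orientation, a schematic sketch of what such a similarity-discrimination prompt may look like; the wording and signature below are hypothetical, and the real prompts live in `utils/data_extraction.py`:

```python
# Hypothetical shape of a similarity-discrimination prompt; the actual prompt
# text and function signature are defined in utils/data_extraction.py.
def discriminate_similarity_observation(pred_obs: str, gold_obs: str) -> str:
    return (
        "Do the following two clinical observations describe the same finding? "
        "Answer Yes or No.\n"
        f"Observation A: {pred_obs}\n"
        f"Observation B: {gold_obs}"
    )
```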
Run completeness & faithfulness evaluation:

```bash
torchrun --nproc_per_node 1 evaluation.py \
    --ckpt_dir Meta-Llama-3-8B-Instruct/ \
    --tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model \
    --root samples \
    --pred_name predict_Meta-Llama-3-8B-Instruct
```

Aggregate metrics:
```python
from statistics import process

process(root="samples", pred_name="predict_Meta-Llama-3-8B-Instruct_eval")
```
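For intuition, a hedged sketch of what such an aggregation step could compute; the file layout, metric key names, and averaging are assumptions here, and the real logic lives in `statistics.py`:

```python
import glob
import json

# Hypothetical aggregation sketch: average per-sample scores into
# corpus-level metrics. All paths and key names below are placeholders.
paths = glob.glob("samples/**/*_eval/*.json", recursive=True)
records = [json.load(open(p)) for p in paths]
for metric in ("completeness", "faithfulness"):
    values = [r[metric] for r in records if metric in r]
    if values:
        print(f"{metric}: {sum(values) / len(values):.3f}")
```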
These scripts generate reasoning graphs from PDFs and flowchart templates, covering both clinical and legal use cases. Inputs:

- Clinical:
  - `docs/Diagnosis_flowchart/` - one JSON per disease (graph skeleton, contains a `"knowledge"` tree; an illustrative skeleton is sketched after this list).
  - `docs/pdf/` - PDFs matched by the disease-name pattern `*{disease}*.pdf`.
  - `docs/temple/` - prompt templates: `prompt_suspected.txt`, `temple_0.txt`, `temple.txt`, `temple_2.txt`.
- Legal (criminal):
  - `docs/Diagnosis_flowchart_law/` - one JSON per crime (judgment/grounds skeleton).
  - `docs/laws/law.txt` - statute lines used in prompts.
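An illustrative flowchart skeleton, with all disease names and the top-level `diagnostic` key assumed for illustration (only the `"knowledge"` tree is named above; check the shipped JSON files for the exact layout):

```json
{
  "diagnostic": {
    "Suspected Pneumonia": {
      "Bacterial Pneumonia": {},
      "Viral Pneumonia": {}
    }
  },
  "knowledge": {
    "Suspected Pneumonia": "typical presenting findings and criteria go here"
  }
}
```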
Run the local-LLM pipeline:

```bash
python doc2kg/generate_kg.py
```

Set your API key by editing the script (or export an env var) and run:

```bash
python doc2kg/generate_kg_gpt4o.py
python doc2kg/generate_kg_law.py
```
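For example, assuming the GPT-4o script reads its key from an environment variable (the variable name below is a guess; check the script for the one it actually uses):

```bash
# Hypothetical variable name; confirm against doc2kg/generate_kg_gpt4o.py.
export AZURE_OPENAI_API_KEY="your-key"
python doc2kg/generate_kg_gpt4o.py
```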
Outputs:

- Local LLM:
  - Final graphs: `results/kgs/{disease}.json`
  - Intermediates: `results/temp/…`
  - Logs: `results/logs/log_*.txt`
- GPT-4o:
  - Final graphs: `results_4o/kgs/{disease}.json`
  - Intermediates: `results_4o/temp/…`
  - Logs: `results_4o/logs/log_*.txt`
  - Token cost summary: `results_4o/disease_costs.txt`
- Legal:
  - Final graphs (judgment/grounds): `results/kgs_law/{crime}.json`
What these graphs are for: structured reasoning graphs (clinical and legal) are automatically filled from documents and templates to support downstream analysis or visualization.
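As a minimal downstream example, one can load a generated graph and print its node hierarchy (the file name is a placeholder; any of the `{disease}.json` / `{crime}.json` outputs above works):

```python
import json
from pathlib import Path

# Load one generated reasoning graph (placeholder file name).
graph = json.loads(Path("results/kgs/Pneumonia.json").read_text())

def walk(node, depth=0):
    """Print nested dict keys as an indented tree."""
    if isinstance(node, dict):
        for key, child in node.items():
            print("  " * depth + str(key))
            walk(child, depth + 1)

walk(graph)
```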
If you use this work, please cite:

```bibtex
@inproceedings{wangdirect,
  author    = {Wang, Bowen and Chang, Jiuyang and Qian, Yiming and Chen, Guoxin and Chen, Junhao and Jiang, Zhouqiang and Zhang, Jiahao and Nakashima, Yuta and Nagahara, Hajime},
  booktitle = {Advances in Neural Information Processing Systems},
  pages     = {74999--75011},
  title     = {DiReCT: Diagnostic Reasoning for Clinical Notes via Large Language Models},
  volume    = {37},
  year      = {2024}
}
```
