Skip to content

Code for The Devil is in the Prompts: De-Identification Traces Enhance Memorization Risks in Synthetic Chest X-Ray Generation

Notifications You must be signed in to change notification settings

Raman1121/diffusion_memorization

 
 

Repository files navigation

The Devil is in the Prompts: De-Identification Traces Enhance Memorization Risks in Synthetic Chest X-Ray Generation

Core Dependencies

  • PyTorch==2.5.1
  • transformers==4.48.2
  • diffusers==0.18.2
  • accelerate==0.21.0
  • datasets==2.19.0

You can create the environment using:

conda env create -f env.yaml

Memorized Prompts

The list of memorized prompts in the MIMIC-CXR dataset will be available here.

Detecting Memorization

Setting up the Dataset and DataLoaders

First, create a standard pytorch dataset (For MIMIC-CXR, see the mimic_cxr_dataset.py file.)
Add the data loading logic in the get_dataset function within the optim_utils.py file.

Using (any) Stable Diffusion Model

python detect_mem.py --model_id <pretrained_model_name_or_path> --dataset mimic

Using the RadEdit Model

NOTE: First, follow the instructions in the Setup-RadEdit.md file.

The following command will run memorization detection on the whole dataset.

python detect_mem_radedit.py --run_name memorized_prompts --dataset mimic

NOTE: Due to the large size of MIMIC-CXR, it is recommended to run the detection on smaller subsets (shards) of the dataset in parallel.

The following command will divide the dataset into 4 shards. You now need to run 4 scripts in parallel and only change the --shard parameter in the following way:

python detect_mem_radedit.py --run_name memorized_prompts --dataset mimic --num_shards 4 --shard 0
python detect_mem_radedit.py --run_name memorized_prompts --dataset mimic --num_shards 4 --shard 1
python detect_mem_radedit.py --run_name memorized_prompts --dataset mimic --num_shards 4 --shard 2 
python detect_mem_radedit.py --run_name memorized_prompts --dataset mimic --num_shards 4 --shard 3

For even faster computation, you can increase --num_shards.

About

Code for The Devil is in the Prompts: De-Identification Traces Enhance Memorization Risks in Synthetic Chest X-Ray Generation

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 85.7%
  • Python 14.3%