Using inference-only approaches, LLMs generate high-quality synthetic data by prompting with sensitive examples and aggregating the outputs under differential privacy, which makes data safer to share for development and benchmarking.


sharikalog7/Differentially-Private-Synthetic-Data-Generation

Differentially Private Synthetic Data Generation

This project demonstrates how inference-only Large Language Models (LLMs) can generate synthetic datasets from sensitive data while satisfying differential privacy (DP).

🚀 Hosted on Streamlit Cloud for free and open public access.


✨ Features

  • Generate realistic synthetic tabular data
  • Apply differential privacy (Laplace noise) during generation
  • Download synthetic datasets for safe sharing
  • Runs as a Streamlit app, used entirely from the browser
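
As an illustration of the Laplace-noise feature listed above, here is a minimal sketch of the Laplace mechanism applied to the mean of a numeric column. It is not taken from this app's code. The function name `dp_mean`, the `ages` column, and the clipping bounds are hypothetical, and the sketch assumes the dataset size is public and that neighbouring datasets differ in one record's value.

```python
import numpy as np


def dp_mean(values, lower, upper, epsilon, rng=None):
    """Return an epsilon-DP estimate of the mean of `values`.

    Assumption: the number of records is public, and neighbouring
    datasets differ in the value of a single record.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = np.random.default_rng() if rng is None else rng
    # Clip so one record can change the sum by at most (upper - lower).
    clipped = np.clip(np.asarray(values, dtype=float), lower, upper)
    # Sensitivity of the mean over n clipped records.
    sensitivity = (upper - lower) / clipped.size
    # Laplace mechanism: noise scale = sensitivity / epsilon.
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return float(clipped.mean() + noise)


ages = [23, 35, 41, 29, 52, 38, 47, 31]  # hypothetical sensitive column
rng = np.random.default_rng(seed=0)
private_mean = dp_mean(ages, lower=18, upper=90, epsilon=1.0, rng=rng)
print(f"true mean: {np.mean(ages):.2f}, private mean: {private_mean:.2f}")
```

A smaller `epsilon` gives stronger privacy but a noisier estimate. The clipping bounds should be chosen without looking at the sensitive data, since bounds derived from the data would themselves leak information.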

🛠️ Tech Stack

  • Streamlit for UI
  • Faker for mock data
  • NumPy for DP noise injection
  • Python 3.9+

▶️ Run Locally

git clone https://github.com/yourusername/dp-synth-data.git
cd dp-synth-data
pip install -r requirements.txt
streamlit run app.py
