AIDASLab/Awesome-Diffusion-LLM

Awesome-Large-Language-Diffusion-Models

A comprehensive and structured list of research papers on Large Language Diffusion Models (dLLMs).


⚙️ Framework (Taxonomy)

  1. Surveys & Useful Resources
  2. Core Methodologies
  3. Reasoning & Policy Optimization
  4. Token Ordering
  5. System Efficiency & Acceleration
  6. Multi-modal & Physical AI
  7. Theory, Guidance & Applications
  8. Seminal Diffusion Papers

1. Surveys & Useful Resources

📚 Blogs & Reports

📝 Survey Papers

Paper Title | Year | Venue | Remark
Discrete Diffusion in Large Language and Multimodal Models: A Survey | 2025 | Arxiv | -
Diffusion-based Large Language Models Survey | 2025 | Arxiv | -
A Survey on Parallel Text Generation: From Parallel Decoding to Diffusion Language Models | 2025 | Arxiv | -

2. Core Methodologies

2.1 Discrete & Masked Diffusion

Paper Title | Year | Venue | Remark
Large Language Diffusion Models (LLaDA) | 2025 | Arxiv | >7B, LLaDA-8B
Scaling up Masked Diffusion Models on Text | 2024 | ICLR | <7B, 1.1B Scaling
Simple and Effective Masked Diffusion Language Models | 2024 | NeurIPS | <7B, Masked
Simplified and Generalized Masked Diffusion for Discrete Data | 2024 | NeurIPS | -
Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution | 2024 | ICML | <7B, Discrete
Dream 7B: Diffusion Large Language Models | 2025 | Arxiv | >7B
UltraLLaDA: Scaling Context to 128K | 2025 | Arxiv | >7B, Context Scaling
Esoteric Language Models | 2025 | Arxiv | -
Next Semantic Scale Prediction via Hierarchical Diffusion Language Models | 2025 | Arxiv | -
DiffusionBERT: Improving Generative Masked Language Models | 2023 | ACL | <7B, Masked
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning | 2023 | Arxiv | >7B, Scaling
David helps Goliath: Inference-Time Collaboration Between Small and Large Diffusion LMs | 2023 | NAACL | >7B, Scale-collaboration

2.2 Continuous & Latent Space Diffusion

Paper Title | Year | Venue | Remark
Diffusion-LM Improves Controllable Text Generation | 2022 | NeurIPS | <7B, Embedding
DiffuSeq: Sequence to Sequence Text Generation | 2023 | ICLR | <7B, Embedding
Latent Diffusion for Language Generation | 2023 | NeurIPS | <7B, Latent
Likelihood-Based Diffusion Language Models | 2023 | NeurIPS | <7B, Plaid 1B
Edit Flows: Flow Matching with Edit Operations | 2025 | Arxiv | -

2.3 AR-to-Diffusion Adaptation

Paper Title | Year | Venue | Remark
SDAR: A Synergistic Diffusion-AutoRegression Paradigm | 2025 | Arxiv | >7B, Synergistic Training
From Next-Token to Next-Block: Principled Adaptation Path | 2025 | Arxiv | >7B, Adaptation Path
Scaling Diffusion Language Models via Adaptation from Autoregressive Models | 2025 | ICLR | >7B, GPT2/LLaMA2 Adaptation
TESS 2: A Large-Scale Generalist Diffusion Language Model | 2025 | ACL | >7B, Adapted from Mistral
Large Language Models to Diffusion Finetuning | 2025 | Arxiv | >7B

3. Reasoning & Policy Optimization

3.1 Reasoning & Planning

Paper Title | Year | Venue | Remark
d1: Scaling Reasoning in dLLMs via RL | 2025 | Arxiv | >7B, Reasoning scaling
d2: Improved Techniques for Training Reasoning dLLMs | 2025 | Arxiv | >7B
Diffusion of Thought: Chain-of-Thought Reasoning in dLLMs | 2024 | NeurIPS | >7B, CoT Foundation
Thinking Inside the Mask: In-Place Prompting in dLLMs | 2025 | Arxiv | >7B
Beyond Surface Reasoning: Unveiling Long CoT Capacity | 2025 | Arxiv | >7B
Reinforcing the Diffusion Chain of Lateral Thought | 2025 | Arxiv | >7B
LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning | 2025 | Arxiv | >7B
Reinforced Context Order Recovery for Adaptive Reasoning | 2025 | Arxiv | <7B, Planning
Beyond Autoregression: Discrete Diffusion for Complex Reasoning | 2025 | ICLR | <7B

3.2 Alignment & Reinforcement Learning

Paper Title | Year | Venue | Remark
DiFFPO: Training dLLMs to Reason Fast and Furious via RL | 2025 | Arxiv | >7B, Direct Preference
LLaDA 1.5: Variance-Reduced Preference Optimization | 2025 | Arxiv | >7B
MDPO: Overcoming the Training-Inference Divide | 2025 | Arxiv | >7B
wd1: Weighted Policy Optimization for Reasoning | 2025 | Arxiv | >7B
Principled and Tractable RL for Reasoning with dLLMs | 2025 | Arxiv | >7B
Improving Reasoning via Group Diffusion Policy Optimization | 2025 | Arxiv | >7B
Step-Aware Policy Optimization for Reasoning | 2025 | Arxiv | >7B
Inpainting-Guided Policy Optimization for dLLMs | 2025 | Arxiv | >7B
MRO: Enhancing Reasoning via Multi-Reward Optimization | 2025 | Arxiv | >7B
Enhancing Reasoning via Distribution Matching Policy Optimization | 2025 | Arxiv | >7B
Boundary-Guided Policy Optimization for Memory-efficient RL | 2025 | Arxiv | >7B
Taming Masked Diffusion via Consistency Trajectory RL | 2025 | Arxiv | >7B
SPG: Sandwiched Policy Gradient for Masked Diffusion | 2025 | Arxiv | >7B
TR2-D2: Tree Search Guided Trajectory-Aware Fine-Tuning | 2025 | Arxiv | >7B
Preference-Based Alignment of Discrete Diffusion Models | 2025 | Arxiv | >7B
Revolutionizing RL Framework for Diffusion Large Language Models | 2025 | Arxiv | >7B
Improving Discrete Diffusion Unmasking Policies Beyond Reference Policies | 2025 | Arxiv | >7B
Coevolutionary Continuous Discrete Diffusion: Latent Reasoner | 2025 | Arxiv | >7B

4. Token Ordering

Paper Title | Year | Venue | Remark
Train for the Worst, Plan for the Best: Understanding Token Ordering | 2025 | ICML | <7B, Ordering Analysis
Block Diffusion: Interpolating Between Autoregressive and Diffusion LMs | 2025 | ICLR | <7B, Interpolation
SSD-LM: Semi-autoregressive Simplex-based Diffusion for Modular Control | 2023 | ACL | <7B, Blockwise
AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation | 2023 | NeurIPS | <7B, AR-like noise
Any-Order Flexible Length Masked Diffusion | 2025 | Arxiv | <7B, Order Flexibility
Review, Remask, Refine (R3): Process-Guided Block Diffusion | 2025 | ICML | >7B, Block-wise
Don't Let It Fade: Preserving Edits via Token Timestep Allocation | 2025 | NeurIPS | <7B, Edit preservation

5. System Efficiency & Acceleration

5.1 Caching & Memory Strategy

Paper Title | Year | Venue | Remark
dKV-Cache: The Cache for Diffusion Language Models | 2025 | Arxiv | >7B
d^2Cache: Accelerating via Dual Adaptive Caching | 2025 | Arxiv | >7B
Accelerating dLLM Inference via Efficient KV Caching | 2025 | Arxiv | >7B
Attention Is All You Need for KV Cache in dLLMs | 2025 | Arxiv | >7B
Attention Sinks in Diffusion Language Models | 2025 | Arxiv | >7B

5.2 Decoding & Sampling

Paper Title | Year | Venue | Remark
Fast-dLLM: Training-free Acceleration via Parallel Decoding | 2025 | Arxiv | >7B, Parallel Decoding
Fast-dLLM v2: Efficient Block-Diffusion LLM | 2025 | Arxiv | >7B, Block Decoding
Spiffy: Multiplying Acceleration via Lossless Speculative Decoding | 2025 | Arxiv | >7B
DiffuSpec: Unlocking dLLMs for Speculative Decoding | 2025 | Arxiv | >7B
Saber: Efficient Sampling with Backtracking Enhanced Remasking | 2025 | Arxiv | >7B
CreditDecoding: Parallel Decoding with Trace Credits | 2025 | Arxiv | >7B
Accelerating dLLM Inference via Local Determinism Propagation | 2025 | Arxiv | >7B
Self Speculative Decoding for Diffusion Large Language Models | 2025 | Arxiv | >7B
Wide-In, Narrow-Out: Revokable Decoding for Effective dLLMs | 2025 | Arxiv | >7B
SpecDiff-2: Scaling Diffusion Drafter Alignment | 2025 | Arxiv | >7B
Fast-Decoding via Progress-Aware Confidence Schedules | 2025 | Arxiv | >7B
DLM-One: Diffusion Language Models for One-Step Generation | 2025 | Arxiv | <7B
Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall | 2025 | Arxiv | >7B
Accelerating LLMs via Adaptive Parallel Decoding | 2025 | Arxiv | >7B
Accelerating dLLMs with SlowFast Sampling | 2025 | Arxiv | >7B
AdaBlock-dLLM: Semantic-Aware Inference via Adaptive Block Size | 2025 | Arxiv | >7B
dParallel: Learnable Parallel Decoding for dLLMs | 2025 | Arxiv | >7B
Learning to Parallel: Accelerating dLLMs via Learnable Parallel Decoding | 2025 | Arxiv | >7B

5.3 Distillation, Quantization & Sparsity

Paper Title | Year | Venue | Remark
Beyond Autoregression: Fast LLMs via Self-Distillation | 2025 | ICLR | <7B, Distillation
CDLM: Consistency Diffusion Language Models For Faster Sampling | 2025 | Arxiv | >7B, Consistency
FS-DFM: Few-Step Diffusion Language Model | 2025 | Arxiv | >7B
Quantization Meets dLLMs: Post-training Quantization Study | 2025 | Arxiv | >7B, Quantization
SparseD: Sparse Attention for Diffusion Language Models | 2025 | Arxiv | >7B, Sparsity
LLaDA-MoE: A Sparse MoE Diffusion Language Model | 2025 | Arxiv | >7B, MoE

6. Multi-modal & Physical AI

6.1 Multi-modal dLLMs

Paper Title | Year | Venue | Remark
MMaDA: Multimodal Large Diffusion Language Models | 2025 | Arxiv | Native Multimodal
MMaDA-Parallel: Thinking-Aware Editing and Generation | 2025 | Arxiv | Parallel Multimodal
Show-o2: Improved Native Unified Multimodal Models | 2025 | Arxiv | Unified Generation
Lumina-DiMOO: Omni Diffusion LLM for Generation | 2025 | Arxiv | Omni-generation
DiffusionVL: Translating AR Models into Diffusion VL Models | 2025 | Arxiv | VL Adaptation
Diffuse Everything: Multimodal Diffusion on Arbitrary Spaces | 2025 | ICML | Arbitrary Spaces
LLaDA-V: Diffusion LLMs with Visual Instruction Tuning | 2025 | Arxiv | Visual Tuning
Unified Multimodal Discrete Diffusion | 2025 | Arxiv | Unified Diffusion
LaViDa: A Large Diffusion LLM for Multimodal Understanding | 2025 | Arxiv | Understanding
Dimple: Discrete Diffusion Multimodal LLM with Parallel Decoding | 2025 | Arxiv | Parallel Multimodal
Dual Diffusion for Unified Image Generation and Understanding | 2025 | Arxiv | Unified Task
Muddit: Liberating Generation Beyond Text-to-Image | 2025 | Arxiv | Multi-modal

6.2 Vision-Language-Action (VLA)

Paper Title | Year | Venue | Remark
Discrete Diffusion VLA: Action Decoding in VLA Policies | 2025 | Arxiv | VLA Action Decoding
LLaDA-VLA: Vision Language Diffusion Action Models | 2025 | Arxiv | VLA Framework
dVLA: Diffusion VLA with Multimodal Chain-of-Thought | 2025 | Arxiv | VLA Reasoning

7. Theory, Guidance & Applications

7.1 Theory & Analysis

Paper Title | Year | Venue | Remark
Time Is a Feature: Exploiting Temporal Dynamics in dLLMs | 2025 | Arxiv | Temporal focus
Theoretical Benefit and Limitation of Diffusion Language Model | 2025 | NeurIPS | Limits analysis
What Makes Diffusion Language Models Super Data Learners? | 2025 | Arxiv | Data efficiency
Why mask diffusion does not work | 2025 | Arxiv | Failure analysis
The Diffusion Duality | 2025 | ICML | <7B, Theoretical Duality
Diffusion LLMs Know the Answer Before Decoding | 2025 | Arxiv | Semantic focus
Generalized Interpolating Discrete Diffusion | 2025 | ICML | <7B
Your Absorbing Discrete Diffusion Secretly Models the Bayesian Posterior | 2025 | Arxiv | <7B

7.2 Guidance & Downstream Applications

Paper Title | Year | Venue | Remark
DINGO: Constrained Inference for Diffusion LLMs | 2025 | Arxiv | Constrained Decoding
DiffuCoder: Improving Masked Diffusion for Code Generation | 2025 | Arxiv | Code
Beyond Autoregression: Empirical Study for Code Generation | 2025 | Arxiv | Code
Seed Diffusion: Large-Scale dLLM with High-Speed Inference | 2025 | Arxiv | Code
Planning with Diffusion Models for Target-Oriented Dialogue | 2025 | ACL | Dialogue
The Devil behind the mask: An emergent safety vulnerability | 2025 | Arxiv | Safety
CtrlDiff: Boosting dLLMs with Dynamic Block Prediction | 2025 | Arxiv | Control

8. Seminal Diffusion Papers

Paper Title | Year | Venue | Remark
Deep Unsupervised Learning using Nonequilibrium Thermodynamics | 2015 | ICML | Formulation
Denoising Diffusion Probabilistic Models (DDPM) | 2020 | NeurIPS | -
Denoising Diffusion Implicit Models (DDIM) | 2021 | ICLR | -
Score-Based Generative Modeling through SDEs | 2021 | ICLR | -
High-Resolution Image Synthesis with Latent Diffusion | 2022 | CVPR | -
Scalable Diffusion Models with Transformers (DiT) | 2023 | ICCV | Scalable focus
Consistency Models | 2023 | ICML | -
Diffusion Models Beat GANs on Image Synthesis | 2021 | NeurIPS | Classifier guidance (CG)
Classifier-Free Diffusion Guidance | 2021 | NeurIPS | CFG
DPM-Solver: Fast ODE Solver for Sampling | 2022 | NeurIPS | -
Vector Quantized Diffusion Model (VQ-Diffusion) | 2022 | CVPR | VQ
Analog Bits: Generating Discrete Data using Diffusion | 2023 | ICLR | Self-conditioning
Progressive Distillation for Fast Sampling | 2022 | ICLR | Distillation
Structured Denoising Diffusion in Discrete State-Spaces | 2021 | NeurIPS | Discrete

🤝 Contact
