A comprehensive, structured list of research papers on Diffusion Large Language Models (dLLMs).
- Surveys & Useful Resources
- Core Methodologies
- Reasoning & Policy Optimization
- Token Ordering
- System Efficiency & Acceleration
- Multi-modal & Physical AI
- Theory, Guidance & Applications
- Seminal Diffusion Papers

## Surveys & Useful Resources

- Gemini Diffusion
- Dream-7B
- DreamOn
- What are Diffusion Language Models?
- Generative Modeling by Estimating Gradients

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Discrete Diffusion in Large Language and Multimodal Models: A Survey | 2025 | Arxiv | - |
| Diffusion-based Large Language Models Survey | 2025 | Arxiv | - |
| A Survey on Parallel Text Generation: From Parallel Decoding to Diffusion Language Models | 2025 | Arxiv | - |

## Core Methodologies

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Large Language Diffusion Models (LLaDA) | 2025 | Arxiv | >7B, LLaDA-8B |
| Scaling up Masked Diffusion Models on Text | 2024 | ICLR | <7B, 1.1B Scaling |
| Simple and Effective Masked Diffusion Language Models | 2024 | NeurIPS | <7B, Masked |
| Simplified and Generalized Masked Diffusion for Discrete Data | 2024 | NeurIPS | - |
| Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution | 2024 | ICML | <7B, Discrete |
| Dream 7B: Diffusion Large Language Models | 2025 | Arxiv | >7B |
| UltraLLaDA: Scaling Context to 128K | 2025 | Arxiv | >7B, Context Scaling |
| Esoteric Language Models | 2025 | Arxiv | - |
| Next Semantic Scale Prediction via Hierarchical Diffusion Language Models | 2025 | Arxiv | - |
| DiffusionBERT: Improving Generative Masked Language Models | 2023 | ACL | <7B, Masked |
| Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning | 2023 | Arxiv | >7B, Scaling |
| David helps Goliath: Inference-Time Collaboration Between Small and Large Diffusion LMs | 2023 | NAACL | >7B, Scale-collaboration |

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Diffusion-LM Improves Controllable Text Generation | 2022 | NeurIPS | <7B, Embedding |
| DiffuSeq: Sequence to Sequence Text Generation | 2023 | ICLR | <7B, Embedding |
| Latent Diffusion for Language Generation | 2023 | NeurIPS | <7B, Latent |
| Likelihood-Based Diffusion Language Models | 2023 | NeurIPS | <7B, Plaid1B |
| Edit Flows: Flow Matching with Edit Operations | 2025 | Arxiv | - |

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| SDAR: A Synergistic Diffusion-AutoRegression Paradigm | 2025 | Arxiv | >7B, Synergistic Training |
| From Next-Token to Next-Block: Principled Adaptation Path | 2025 | Arxiv | >7B, Adaptation Path |
| Scaling Diffusion Language Models via Adaptation from Autoregressive Models | 2025 | ICLR | >7B, GPT2/LLaMA2 Adaptation |
| TESS 2: A Large-Scale Generalist Diffusion Language Model | 2025 | ACL | >7B, Adapted from Mistral |
| Large Language Models to Diffusion Finetuning | 2025 | Arxiv | >7B |

## Reasoning & Policy Optimization

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| d1: Scaling Reasoning in dLLMs via RL | 2025 | Arxiv | >7B, Reasoning scaling |
| d2: Improved Techniques for Training Reasoning dLLMs | 2025 | Arxiv | >7B |
| Diffusion of Thought: Chain-of-Thought Reasoning in dLLMs | 2024 | NeurIPS | >7B, CoT Foundation |
| Thinking Inside the Mask: In-Place Prompting in dLLMs | 2025 | Arxiv | >7B |
| Beyond Surface Reasoning: Unveiling Long CoT Capacity | 2025 | Arxiv | >7B |
| Reinforcing the Diffusion Chain of Lateral Thought | 2025 | Arxiv | >7B |
| LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning | 2025 | Arxiv | >7B |
| Reinforced Context Order Recovery for Adaptive Reasoning | 2025 | Arxiv | <7B, Planning |
| Beyond Autoregression: Discrete Diffusion for Complex Reasoning | 2025 | ICLR | <7B |

## Token Ordering

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Train for the Worst, Plan for the Best: Understanding Token Ordering | 2025 | ICML | <7B, Ordering Analysis |
| Block Diffusion: Interpolating Between Autoregressive and Diffusion LMs | 2025 | ICLR | <7B, Interpolation |
| SSD-LM: Semi-autoregressive Simplex-based Diffusion for Modular Control | 2023 | ACL | <7B, Blockwise |
| AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation | 2023 | NeurIPS | <7B, AR-like noise |
| Any-Order Flexible Length Masked Diffusion | 2025 | Arxiv | <7B, Order Flexibility |
| Review, Remask, Refine (R3): Process-Guided Block Diffusion | 2025 | ICML | >7B, Block-wise |
| Don't Let It Fade: Preserving Edits via Token Timestep Allocation | 2025 | NeurIPS | <7B, Edit preservation |

## System Efficiency & Acceleration

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| dKV-Cache: The Cache for Diffusion Language Models | 2025 | Arxiv | >7B |
| d^2Cache: Accelerating via Dual Adaptive Caching | 2025 | Arxiv | >7B |
| Accelerating dLLM Inference via Efficient KV Caching | 2025 | Arxiv | >7B |
| Attention Is All You Need for KV Cache in dLLMs | 2025 | Arxiv | >7B |
| Attention Sinks in Diffusion Language Models | 2025 | Arxiv | >7B |

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Beyond Autoregression: Fast LLMs via Self-Distillation | 2025 | ICLR | <7B, Distillation |
| CDLM: Consistency Diffusion Language Models For Faster Sampling | 2025 | Arxiv | >7B, Consistency |
| FS-DFM: Few-Step Diffusion Language Model | 2025 | Arxiv | >7B |
| Quantization Meets dLLMs: Post-training Quantization Study | 2025 | Arxiv | >7B, Quantization |
| SparseD: Sparse Attention for Diffusion Language Models | 2025 | Arxiv | >7B, Sparsity |
| LLaDA-MoE: A Sparse MoE Diffusion Language Model | 2025 | Arxiv | >7B, MoE |

## Multi-modal & Physical AI

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Discrete Diffusion VLA: Action Decoding in VLA Policies | 2025 | Arxiv | VLA Action Decoding |
| LLaDA-VLA: Vision Language Diffusion Action Models | 2025 | Arxiv | VLA Framework |
| dVLA: Diffusion VLA with Multimodal Chain-of-Thought | 2025 | Arxiv | VLA Reasoning |

## Theory, Guidance & Applications

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Time Is a Feature: Exploiting Temporal Dynamics in dLLMs | 2025 | Arxiv | Temporal focus |
| Theoretical Benefit and Limitation of Diffusion Language Model | 2025 | NeurIPS | Limits analysis |
| What Makes Diffusion Language Models Super Data Learners? | 2025 | Arxiv | Data efficiency |
| Why mask diffusion does not work | 2025 | Arxiv | Failure analysis |
| The Diffusion Duality | 2025 | ICML | <7B, Theoretical Duality |
| Diffusion LLMs Know the Answer Before Decoding | 2025 | Arxiv | Semantic focus |
| Generalized Interpolating Discrete Diffusion | 2025 | ICML | <7B |
| Your Absorbing Discrete Diffusion Secretly Models the Bayesian Posterior | 2025 | Arxiv | <7B |

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| DINGO: Constrained Inference for Diffusion LLMs | 2025 | Arxiv | Constrained Decoding |
| DiffuCoder: Improving Masked Diffusion for Code Generation | 2025 | Arxiv | Code |
| Beyond Autoregression: Empirical Study for Code Generation | 2025 | Arxiv | Code |
| Seed Diffusion: Large-Scale dLLM with High-Speed Inference | 2025 | Arxiv | Code |
| Planning with Diffusion Models for Target-Oriented Dialogue | 2025 | ACL | Dialogue |
| The Devil behind the mask: An emergent safety vulnerability | 2025 | Arxiv | Safety |
| CtrlDiff: Boosting dLLMs with Dynamic Block Prediction | 2025 | Arxiv | Control |

## Seminal Diffusion Papers

| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Deep Unsupervised Learning using Nonequilibrium Thermodynamics | 2015 | ICML | Formulation |
| Denoising Diffusion Probabilistic Models (DDPM) | 2020 | NeurIPS | - |
| Denoising Diffusion Implicit Models (DDIM) | 2021 | ICLR | - |
| Score-Based Generative Modeling through SDEs | 2021 | ICLR | - |
| High-Resolution Image Synthesis with Latent Diffusion | 2022 | CVPR | - |
| Scalable Diffusion Models with Transformers (DiT) | 2023 | ICCV | Scalable focus |
| Consistency Models | 2023 | ICML | - |
| Diffusion Models Beat GANs on Image Synthesis | 2021 | NeurIPS | CG |
| Classifier-Free Diffusion Guidance | 2021 | NeurIPS | CFG |
| DPM-Solver: Fast ODE Solver for Sampling | 2022 | NeurIPS | - |
| Vector Quantized Diffusion Model (VQ-Diffusion) | 2022 | CVPR | VQ |
| Analog Bits: Generating Discrete Data using Diffusion | 2023 | ICLR | Self-conditioning |
| Progressive Distillation for Fast Sampling | 2022 | ICLR | Distillation |
| Structured Denoising Diffusion in Discrete State-Spaces | 2021 | NeurIPS | Discrete |
- Maintainers: [email protected] / [email protected] / [email protected]
- Contributions via Pull Requests are welcome!