
Dream-Coder 7B


Blog | Notion | Twitter | Hugging Face Model

Introduction

Dream-Coder 7B is a diffusion LLM for code trained exclusively on open-source data across its development stages—adaptation, supervised fine-tuning, and reinforcement learning. It achieves an impressive 21.4% pass@1 on LiveCodeBench (2410-2505), outperforming other open-source diffusion LLMs by a wide margin.

News

  • Sep 25, 2025: Released data processing, training, and evaluation scripts for the instruct model. See instruct.
  • Sep 21, 2025: Released data details and evaluation scripts for the base model. See base.
  • Sep 1, 2025: Our technical report is out.
  • July 23, 2025: Try our online demo via HF Space!
  • July 15, 2025: Released Dream-Coder checkpoints, along with our blog post and Notion page.

Features

Flexible Code Generation

We observe that Dream-Coder 7B exhibits emergent any-order generation, adaptively determining its decoding style based on the coding task. For example, Dream-Coder 7B Instruct displays patterns such as:

Figure 1: Sketch-First Generation

Figure 2: Left-to-Right Generation

Figure 3: Iterative Back-and-Forth Generation

These demos were collected using consistent sampling parameters: temperature=0.1, diffusion_steps=512, max_new_tokens=512, alg="entropy", top_p=1.0, alg_temp=0.0, eos_penalty=3.0.
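For reference, here is a minimal sketch of how these demo settings could be passed to diffusion_generate. It assumes the model and inputs are prepared as in the Quickstart below, that diffusion_steps corresponds to the steps argument, and that eos_penalty is accepted as a generation keyword argument.

# Sketch only: the demo sampling settings above applied to a diffusion_generate call.
# Assumes `model`, `input_ids`, and `attention_mask` are prepared as in the Quickstart,
# that `diffusion_steps` maps to the `steps` argument, and that `eos_penalty` is a
# supported keyword argument.
demo_sampling = dict(
    temperature=0.1,
    steps=512,
    max_new_tokens=512,
    alg="entropy",
    top_p=1.0,
    alg_temp=0.0,
    eos_penalty=3.0,
)
output = model.diffusion_generate(
    input_ids,
    attention_mask=attention_mask,
    output_history=True,          # keep intermediate states to visualize the decoding order
    return_dict_in_generate=True,
    **demo_sampling,
)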

Variable-Length Code Infilling

We also introduce an infilling variant, DreamOn-7B, that naturally adjusts the length of masked spans during generation for variable-length code infilling. For more details, please refer to our accompanying blog post.
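As a rough, hypothetical sketch only (the checkpoint name, the use of a mask token to mark the infill span, and the generation arguments below are assumptions rather than the documented DreamOn interface; see the blog post for actual usage), infilling would follow the same diffusion_generate pattern as the Quickstart, with mask tokens placed between a code prefix and suffix:

# Hypothetical illustration of variable-length infilling with DreamOn-7B.
# The checkpoint id, mask-token handling, and generation arguments are assumptions;
# refer to the DreamOn blog post for the actual interface.
import torch
from transformers import AutoModel, AutoTokenizer

model_path = "Dream-org/DreamOn-v0-7B"  # assumed checkpoint name
model = AutoModel.from_pretrained(model_path, torch_dtype=torch.bfloat16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = model.to("cuda").eval()

prefix = "def binary_search(arr, target):\n    lo, hi = 0, len(arr) - 1\n"
suffix = "    return -1\n"
# Start with a short run of mask tokens between prefix and suffix; during generation
# the model can grow or shrink this span to fit the missing code.
prompt = prefix + tokenizer.mask_token * 8 + suffix  # assumes a mask token is defined
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

output = model.diffusion_generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=256,
    steps=256,
    temperature=0.1,
    alg="entropy",
    return_dict_in_generate=True,
)
print(tokenizer.decode(output.sequences[0].tolist()))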

Quickstart

To get started, install transformers==4.46.2 and torch==2.5.1 (e.g., pip install transformers==4.46.2 torch==2.5.1). Here is an example of using Dream-Coder 7B:

import torch
from transformers import AutoModel, AutoTokenizer

# Load the model and tokenizer; trust_remote_code is required for the custom diffusion generation code.
model_path = "Dream-org/Dream-Coder-v0-Instruct-7B"
model = AutoModel.from_pretrained(model_path, torch_dtype=torch.bfloat16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = model.to("cuda").eval()

# Build and tokenize the chat prompt.
messages = [
    {"role": "user", "content": "Write a quick sort algorithm."}
]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", return_dict=True, add_generation_prompt=True
)
input_ids = inputs.input_ids.to(device="cuda")
attention_mask = inputs.attention_mask.to(device="cuda")

# Generate with the diffusion decoding loop; `steps` sets the number of diffusion steps
# and `alg` selects the token-unmasking strategy.
output = model.diffusion_generate(
    input_ids,
    attention_mask=attention_mask,
    max_new_tokens=768,
    output_history=True,
    return_dict_in_generate=True,
    steps=768,
    temperature=0.1,
    top_p=0.95,
    alg="entropy",
    alg_temp=0.,
)
# Strip the prompt tokens and decode only the newly generated portion.
generations = [
    tokenizer.decode(g[len(p) :].tolist())
    for p, g in zip(input_ids, output.sequences)
]

print(generations[0].split(tokenizer.eos_token)[0])

Acknowledgement

We gratefully acknowledge the following open-source projects, which have been instrumental to this work:

  • verl: Reinforcement learning training framework
  • SandboxFusion: Secure code execution environment
  • Fast-dLLM: Inference acceleration for dLLMs

Citation

@article{xie2025dream,
  title={Dream-coder 7b: An open diffusion language model for code},
  author={Xie, Zhihui and Ye, Jiacheng and Zheng, Lin and Gao, Jiahui and Dong, Jingwei and Wu, Zirui and Zhao, Xueliang and Gong, Shansan and Jiang, Xin and Li, Zhenguo and others},
  journal={arXiv preprint arXiv:2509.01142},
  year={2025}
}
