
Releases: google/tunix

Tunix v0.1.3 — JAX 0.8 and new Qwen / Llama3 model support

20 Oct 17:43

A maintenance and feature release focused on TPU readiness, test hardening, and model additions. Highlights include a JAX upgrade, SFT/CI improvements, new Qwen and Llama3 model variants, and multiple bugfixes across training and distillation tooling.

Highlights

  • Bumped JAX to 0.8.0 for improved compatibility and performance. JAX 0.7.2 has a compilation performance regression, so that version is skipped.
  • Added vLLM TPU support in dev mode.
  • Added support for Qwen2.5 (including 1.5B) and Llama3 (70B and 405B).
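
A quick way to confirm the upgrade took effect in your environment is to check the JAX version that was pulled in (plain Python, nothing Tunix-specific):

  import jax
  print(jax.__version__)  # expect 0.8.x with this release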

What's Changed

Full Changelog: v0.1.2...v0.1.3

Tunix v0.1.2: Expanded Model Support and Enhanced Flexibility

10 Oct 18:14

This release of Tunix introduces support for new models, enhances core functionalities for more flexible and efficient workflows, and includes several important fixes.

Highlights

  • Expanded Model Support: We've added a configuration for qwen-8b and ported the Llama3 example to the Tunix CLI. Additionally, GRPO disaggregated llama3.1-70b is now supported through MaxText, including checkpoint saving.
  • Enhanced Flexibility: Users can now specify a different data type for the rollout model and take advantage of more flexible PyTree support in the checkpoint manager. This release also introduces flexible collect modes and tokenization support, along with support for multiple EOS tokens in the vanilla sampler.
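
The multi-EOS support above follows a pattern that is easy to reproduce with plain jax.numpy; the snippet below is only an illustration of the idea (the token ids are made up and this is not Tunix's sampler API):

  import jax.numpy as jnp

  eos_ids = jnp.array([1, 106, 107])         # hypothetical EOS / end-of-turn ids
  tokens = jnp.array([5, 42, 9, 106, 0, 0])  # one sampled sequence

  is_eos = jnp.isin(tokens, eos_ids)   # True wherever any EOS id appears
  finished = jnp.any(is_eos)           # the sequence is done if any id matched
  stop_at = jnp.argmax(is_eos)         # index of the first EOS hit (0 if none)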

Other Changes

  • Downgraded the JAX version to 0.7.1 in prod mode due to a performance regression; dev mode still supports JAX v0.7.2.
  • Fixes to the front page pip install command and GRPO examples.
  • Improvements to the checkpoint manager and resharding library.
  • Added a backward compatibility test for Orbax checkpoint restoration.
  • Various code simplifications, refactoring, and documentation updates.

Full Changelog: v0.1.1...v0.1.2

Tunix v0.1.1 — Improved Stability, New Features, and TPU Optimizations

08 Oct 01:58

This release focuses on improving performance and stability across TPU and Kaggle environments, introducing new utilities for agentic RL workflows, and adding broader model and configuration support. It also includes several important bug fixes and developer experience improvements.

Run Tunix on Kaggle TPU

We’re excited to announce that Tunix can now be launched directly in Kaggle notebooks with TPU acceleration — making it easier than ever to experiment, prototype, and run reinforcement learning workflows without complex setup.

Key highlights

First-class TPU support on Kaggle – run GRPO and other RL pipelines end-to-end in a Kaggle notebook.

Pre-configured runtime – no manual dependency juggling needed; version compatibility and performance tuning are handled automatically.

Launch the notebooks here:
  • Knowledge Distillation Demo
  • QLoRA Demo
  • DPO Demo
  • GRPO Demo

New Features & Improvements

  • Model & Training Options
      ◦ Added support for Gemma-3-270M model configuration.
      ◦ Enabled setting default parameter dtype for Gemma-3 models.
      ◦ Added remat options to models to improve memory efficiency (see the sketch after this list).
      ◦ Created a new list container type to support both Flax ≤0.11.2 and ≥0.12.0 versions.
  • Pathways & TPU Performance
      ◦ Introduced experimental pre-sharding (experimental_reshard) for Pathways on Cloud TPU.
      ◦ Improved weight synchronization logic to handle KV head duplication.
      ◦ Disabled certain profiler options by default to improve stability on Pathways backend.
  • Configuration & CLI Improvements
      ◦ Enabled generic creation of optax.optimizer and optax.learning_rate_schedule directly from CLI.
      ◦ Relaxed JAX version constraints to ensure compatibility with Kaggle images.
      ◦ Added minimum resource requirements for launch scripts in the README.
  • Documentation
      ◦ Added ReadTheDocs link in README.
      ◦ Expanded external notebooks with step-by-step guidance for long-running tasks.
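
For readers unfamiliar with remat: it is JAX's rematerialization (activation checkpointing) mechanism. The sketch below shows the underlying jax.checkpoint primitive on a toy block; Tunix's own remat options are configured on the model and may be named differently:

  import jax
  import jax.numpy as jnp

  def mlp_block(x, w1, w2):
      # A small feed-forward block standing in for a transformer layer.
      return jnp.tanh(x @ w1) @ w2

  # jax.checkpoint (a.k.a. jax.remat) discards intermediate activations on the
  # forward pass and recomputes them during the backward pass, trading extra
  # compute for a smaller peak memory footprint.
  remat_block = jax.checkpoint(mlp_block)

  x = jnp.ones((8, 16))
  w1 = jnp.ones((16, 32))
  w2 = jnp.ones((32, 16))

  loss = lambda x, w1, w2: remat_block(x, w1, w2).sum()
  grads = jax.grad(loss, argnums=(1, 2))(x, w1, w2)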

Bug Fixes

  • Fixed a bug in reward function logic causing incorrect training signals.
  • Fixed a checkpoint handling issue where Colab failed to locate the final checkpoint; intermediate checkpoint directories are now cleaned up.
  • Fixed Kaggle image performance issues.
  • Fixed type errors in agents/ modules.
  • Optimized masked index lookups using jnp.where for better runtime efficiency.
  • Resharded prompt and completion tokens to the REFERENCE mesh when rollout and reference models are distributed.
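
The jnp.where optimization mentioned above is a common JAX pattern: replace masked-out entries rather than gathering them, so array shapes stay static under jit. A generic illustration (not the exact Tunix code):

  import jax.numpy as jnp

  logits = jnp.arange(12.0).reshape(3, 4)
  mask = jnp.array([[1, 0, 1, 0],
                    [0, 1, 0, 0],
                    [1, 1, 0, 1]], dtype=bool)

  # Keep the output shape fixed regardless of how many positions are selected.
  masked = jnp.where(mask, logits, -jnp.inf)
  per_row_max = masked.max(axis=-1)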

Dependency & Version Updates

  • JAX pinned to 0.7.1 and libtpu downgraded to resolve Cloud TPU performance regressions.
  • Relaxed JAX version requirement for Kaggle compatibility.

New Contributors

Full Changelog: v0.1.0...v0.1.1

Tunix v0.1.0 — First Public Release of Google’s Reinforcement Learning Library for LLM Post-Training

30 Sep 15:42

We’re thrilled to announce Tunix v0.1.0, the first public release of Google’s lightweight, JAX-native library for post-training large language models (LLMs) using both reinforcement learning (RL) and supervised fine-tuning (SFT). Tunix is built for researchers and production teams who want maximum control and scalability when aligning and improving foundation models — from data loading to distributed rollout and training on TPUs.

Highlights of v0.1.0

SFT (Supervised Fine-Tuning): Seamlessly train your LLMs with labeled datasets to bootstrap alignment before RL or as a standalone approach.

High-efficiency reinforcement learning (RL) algorithms such as GRPO, GSPO, PPO, and DPO, designed for instruction tuning and reward-based LLM alignment.

End-to-End RL Pipeline: From reward function definition to rollout and policy optimization, everything is fully integrated and composable.

Multi-Model Support: Works out of the box with leading open-weight models, including Gemma 2/3, LLaMA 3, and Qwen 2/3 — and can be extended to other Hugging Face models with minimal effort.

Seamless TPU / CPU Execution: Tunix is built on top of JAX and Flax with first-class support for multi-device and multi-host environments.

Dataset Flexibility: Use TensorFlow Datasets, Kaggle datasets, or custom Grain datasets with minimal changes.

Modular Design: Clean abstractions for samplers, reward functions, trainers, and optimizers — making it easy to extend or plug into your own workflows.
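
To make the reward-function abstraction concrete: a reward function in this style is typically just a Python callable that scores a batch of completions. The toy example below is purely illustrative; the signature is hypothetical and not Tunix's exact interface:

  def brevity_reward(prompts, completions, **kwargs):
      # Score each completion; here, shorter answers score higher.
      return [1.0 / (1.0 + len(c.split())) for c in completions]

  rewards = brevity_reward(
      prompts=["What is 2 + 2?"],
      completions=["4", "The answer is four because two plus two equals four."],
  )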

Get Started

Install Tunix from PyPI:

pip install google-tunix[prod]

We recommend starting with the GRPO demo notebook to see how reinforcement learning can be applied to real LLM training.

Tunix 0.1.0.dev1 – Development Preview

30 Sep 07:03

This is the first development release of Tunix, Google’s reinforcement learning library for language model post-training.

Note: This is a pre-release (.dev1) version meant for testing and feedback.

APIs and behavior may change before the official 0.1.0 stable release.

Use this build to validate early integrations, experiment with new features, and provide feedback.

Install this dev release:

pip install --pre google-tunix[prod]==0.1.0.dev1

Tunix 0.1.0.dev0 – Development Preview

30 Sep 04:45

This is the first development release of Tunix, Google’s reinforcement learning library for language model post-training.

Note: This is a pre-release (.dev0) version meant for testing and feedback.

APIs and behavior may change before the official 0.1.0 stable release.

Use this build to validate early integrations, experiment with new features, and provide feedback.

Install this dev release:

pip install --pre google-tunix==0.1.0.dev0