
[Proposal] Add native support for chunked (N-step) rewards and statistics wrappers in Gymnasium #1463

@Revivekirin


Proposal

Introduce a first-class mechanism for handling chunked rewards — where each env.step() may return a vector of N-step rewards instead of a scalar — through new wrappers and minimal API extensions.

This proposal suggests:

  1. Adding a new family of wrappers (ChunkedRewardWrapper, VectorChunkedRewardWrapper) that can aggregate or emit N-step reward vectors efficiently.

  2. Providing a chunk-aware statistics recorder (RecordChunkStatistics) that extends the existing RecordEpisodeStatistics to handle (num_envs, N)-shaped rewards directly (see the shape sketch after this list).

  3. Supporting this feature without breaking backward compatibility for existing scalar-reward environments.
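
For a sense of the shapes involved, here is a minimal sketch of the per-env statistics such a recorder could derive from a (num_envs, N) reward array (RecordChunkStatistics is proposed here, not an existing API; the reductions below are only one plausible choice):

import numpy as np

num_envs, n = 8, 5
reward_chunks = np.random.default_rng(0).normal(size=(num_envs, n)).astype(np.float32)

# Per-env statistics a chunk-aware recorder could log each step:
chunk_means = reward_chunks.mean(axis=1)  # (num_envs,) mean reward per chunk
chunk_vars = reward_chunks.var(axis=1)    # (num_envs,) intra-chunk variance
newest = reward_chunks[:, -1]             # (num_envs,) most recent scalar reward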

Motivation

Modern reinforcement learning research increasingly uses action and reward chunking to improve temporal consistency and efficiency. Algorithms such as Flow-Matching Policy (FQL, QC-FQL), Diffusion Policy, and Multi-step SAC depend on chunked actions and corresponding N-step rewards to compute stable gradients and reward variance.

However, Gymnasium currently assumes scalar rewards, i.e. a (num_envs,)-shaped array from VectorEnv.step.
As a result, wrappers such as RecordEpisodeStatistics fail when rewards are vectorized, as shown below:

[Screenshot: traceback from RecordEpisodeStatistics when given (num_envs, N)-shaped rewards]

(This error occurred when testing a 5-step chunked reward setup.)
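
The shape conflict can be reproduced without touching Gymnasium internals; the accumulator below is a toy stand-in for the wrapper's per-env return buffer, not its actual code:

import numpy as np

num_envs, n = 4, 5
episode_returns = np.zeros(num_envs)      # scalar-per-env accumulator
chunked_rewards = np.ones((num_envs, n))  # what a chunked env emits each step

try:
    episode_returns += chunked_rewards    # in-place add cannot change shape
except ValueError as err:
    print(f"shape mismatch: {err}")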

Because of this, using chunked rewards requires awkward workarounds:

  • Averaging rewards into a scalar, which loses intra-chunk variance information, making fine-grained reward visualization impossible.

  • Storing chunked rewards inside info, which increases memory usage and Python overhead (sketched just below).
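
For concreteness, here is both workarounds in one hypothetical wrapper (InfoChunkWorkaround is illustrative only, not a proposed API): the chunk is flattened to its mean for compatibility while a per-step copy lands in info:

import gymnasium as gym
import numpy as np

class InfoChunkWorkaround(gym.Wrapper):
    """Hypothetical stopgap: scalar mean reward + full chunk stashed in info."""
    def step(self, action):
        obs, reward_chunk, terminated, truncated, info = self.env.step(action)
        # Extra per-step copy and dict insert: exactly the overhead criticized above.
        info["reward_chunk"] = np.array(reward_chunk, dtype=np.float32)
        return obs, float(np.mean(reward_chunk)), terminated, truncated, info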

To align Gymnasium with current RL research trends, it would be highly beneficial to support chunked reward vectors natively in both the VectorEnv and statistics wrappers.

Pitch

  • Remain fully backward compatible (scalar reward by default).

  • Allow optional vector-shaped rewards of shape (num_envs, N) for N-step chunking.

  • Maintain memory efficiency: no redundant copies or per-step Python allocations.

  • Provide direct integration with Gymnasium’s logging/statistics system.
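
A minimal sketch of the proposed single-env wrapper: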


import gymnasium as gym
import numpy as np


class ChunkedRewardWrapper(gym.Wrapper):
    """
    Converts scalar rewards into N-step rolling reward chunks.
    """
    def __init__(self, env, n: int, mode: str = "sliding"):
        super().__init__(env)
        self._mode = mode  # "sliding": emit the most recent n rewards every step
        self._buf = np.zeros(n, dtype=np.float32)

    def reset(self, **kwargs):
        # Drop rewards carried over from the previous episode.
        self._buf[:] = 0.0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        # Shift the window left by one slot and append the newest scalar reward.
        self._buf = np.roll(self._buf, -1)
        self._buf[-1] = reward
        return obs, self._buf.copy(), terminated, truncated, info
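
And a quick smoke test of the sketch above (CartPole-v1 is just an arbitrary test environment):

env = ChunkedRewardWrapper(gym.make("CartPole-v1"), n=5)
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
print(reward.shape)  # (5,): the last five scalar rewards, oldest first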

Alternatives

No response

Additional context

Tested on Gymnasium 0.29.1

Checklist

  • I have checked that there is no similar issue in the repo
