Description
Proposal
Introduce a first-class mechanism for handling chunked rewards — where each env.step() may return a vector of N-step rewards instead of a scalar — through new wrappers and minimal API extensions.
This proposal suggests:
- Adding a new family of wrappers (ChunkedRewardWrapper, VectorChunkedRewardWrapper) that can aggregate or emit N-step reward vectors efficiently.
- Providing a chunk-aware statistics recorder (RecordChunkStatistics) that extends the existing RecordEpisodeStatistics to handle (num_envs, N)-shaped rewards directly (see the sketch after this list).
- Supporting this feature without breaking backward compatibility for existing scalar-reward environments.
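As a rough illustration of the second point, the accumulation logic a chunk-aware recorder would need might look like the sketch below. The class name ChunkReturnAccumulator and its interface are hypothetical and only stand in for the internals that RecordChunkStatistics would encapsulate:

import numpy as np


class ChunkReturnAccumulator:
    """Sketch of the accumulation a chunk-aware statistics recorder would need.

    Accepts rewards of shape (num_envs,) or (num_envs, N) and keeps per-env
    episode returns plus intra-chunk variance for fine-grained logging.
    """

    def __init__(self, num_envs: int):
        self.episode_returns = np.zeros(num_envs, dtype=np.float64)
        self.last_chunk_var = np.zeros(num_envs, dtype=np.float64)

    def add(self, rewards: np.ndarray) -> None:
        rewards = np.asarray(rewards)
        if rewards.ndim == 1:
            # Scalar-per-env rewards: accumulate exactly as today.
            self.episode_returns += rewards
        else:
            # Chunked rewards: sum over the chunk axis for episode returns,
            # and keep the intra-chunk variance instead of discarding it.
            self.episode_returns += rewards.sum(axis=-1)
            self.last_chunk_var = rewards.var(axis=-1)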
Motivation
Modern reinforcement learning research increasingly uses action and reward chunking to improve temporal consistency and efficiency. Algorithms such as Flow-Matching Policy (FQL, QC-FQL), Diffusion Policy, and Multi-step SAC depend on chunked actions and corresponding N-step rewards to compute stable gradients and reward variance.
However, Gymnasium currently assumes that rewards are scalar per environment ((num_envs,)-shaped arrays in VectorEnv.step).
As a result, wrappers like RecordEpisodeStatistics fail when rewards are vectorized; this failure was observed when testing a 5-step chunked reward setup.
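The exact traceback is not reproduced here, but the shape mismatch behind it can be sketched with plain NumPy. The array sizes are illustrative, and the accumulation pattern is assumed to mirror how the vector statistics wrapper adds per-step rewards into a (num_envs,)-shaped returns buffer:

import numpy as np

episode_returns = np.zeros(4)        # (num_envs,) returns buffer kept by the wrapper
chunked_rewards = np.ones((4, 5))    # (num_envs, N) chunked rewards with N = 5
episode_returns += chunked_rewards   # raises ValueError: shapes (4,) and (4,5) do not broadcast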
Because of this, using chunked rewards requires awkward workarounds (illustrated below):
- Averaging rewards into a scalar, which loses intra-chunk variance information and makes fine-grained reward visualization impossible.
- Storing chunked rewards inside info, which increases memory usage and Python overhead.
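For concreteness, the two workarounds look roughly like this; chunk_rewards is a hypothetical (N,)-shaped array returned by a chunked environment:

import numpy as np

chunk_rewards = np.array([0.1, 0.5, 0.2, 0.9, 0.0], dtype=np.float32)  # hypothetical 5-step chunk

# Workaround 1: collapse the chunk to a scalar; intra-chunk variance is lost.
scalar_reward = float(chunk_rewards.mean())

# Workaround 2: keep the full chunk in info; extra copies and dict churn every step.
info = {"chunk_rewards": chunk_rewards.copy()}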
To align Gymnasium with current RL research trends, it would be highly beneficial to support chunked reward vectors natively in both the VectorEnv and statistics wrappers.
Pitch
- Remain fully backward compatible (scalar reward by default).
- Allow optional vector-shaped rewards (num_envs, N) for N-step chunking.
- Maintain memory efficiency: no redundant copies or per-step Python allocations.
- Provide direct integration with Gymnasium’s logging/statistics system.
import gymnasium as gym
import numpy as np


class ChunkedRewardWrapper(gym.Wrapper):
    """Converts scalar rewards into N-step rolling reward chunks."""

    def __init__(self, env, n: int, mode: str = "sliding"):
        super().__init__(env)
        self.n = n
        self.mode = mode  # only the "sliding" window mode is sketched here
        self._buf = np.zeros(n, dtype=np.float32)

    def reset(self, **kwargs):
        # Clear the reward buffer at the start of each episode.
        self._buf[:] = 0.0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        # Shift the window left and append the newest scalar reward.
        self._buf = np.roll(self._buf, -1)
        self._buf[-1] = reward
        # Return a copy so callers cannot mutate the internal buffer.
        return obs, self._buf.copy(), terminated, truncated, info
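A possible usage of the sketch above (the environment id and chunk size are arbitrary):

import gymnasium as gym

env = ChunkedRewardWrapper(gym.make("CartPole-v1"), n=5)
obs, info = env.reset()
obs, reward_chunk, terminated, truncated, info = env.step(env.action_space.sample())
print(reward_chunk.shape)  # (5,): the last five scalar rewards, oldest first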
Alternatives
No response
Additional context
Tested on Gymnasium 0.29.1
Checklist
- I have checked that there is no similar issue in the repo
