Description
Proposal
Introduce a first-class mechanism for handling chunked rewards — where each env.step() may return a vector of N-step rewards instead of a scalar — through new wrappers and minimal API extensions.
This proposal suggests:
- Adding a new family of wrappers (ChunkedRewardWrapper, VectorChunkedRewardWrapper) that can aggregate or emit N-step reward vectors efficiently.
- Providing a chunk-aware statistics recorder (RecordChunkStatistics) that extends the existing RecordEpisodeStatistics to handle (num_envs, N)-shaped rewards directly (see the sketch after this list).
- Supporting this feature without breaking backward compatibility for existing scalar-reward environments.
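As a rough illustration of the second point, the accumulation logic a chunk-aware recorder would need might look like the sketch below. The class name ChunkReturnAccumulator and its interface are hypothetical and only stand in for the internals that RecordChunkStatistics would encapsulate:

import numpy as np


class ChunkReturnAccumulator:
    """Sketch of the accumulation a chunk-aware statistics recorder would need.

    Accepts rewards of shape (num_envs,) or (num_envs, N) and keeps per-env
    episode returns plus intra-chunk variance for fine-grained logging.
    """

    def __init__(self, num_envs: int):
        self.episode_returns = np.zeros(num_envs, dtype=np.float64)
        self.last_chunk_var = np.zeros(num_envs, dtype=np.float64)

    def add(self, rewards: np.ndarray) -> None:
        rewards = np.asarray(rewards)
        if rewards.ndim == 1:
            # Scalar-per-env rewards: accumulate exactly as today.
            self.episode_returns += rewards
        else:
            # Chunked rewards: sum over the chunk axis for episode returns,
            # and keep the intra-chunk variance instead of discarding it.
            self.episode_returns += rewards.sum(axis=-1)
            self.last_chunk_var = rewards.var(axis=-1)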
Motivation
Modern reinforcement learning research increasingly uses action and reward chunking to improve temporal consistency and efficiency. Algorithms such as Flow-Matching Policy (FQL, QC-FQL), Diffusion Policy, and Multi-step SAC depend on chunked actions and corresponding N-step rewards to compute stable gradients and reward variance.
However, Gymnasium currently assumes that rewards are scalar per environment ((num_envs,)-shaped arrays in VectorEnv.step).
As a result, wrappers like RecordEpisodeStatistics fail when rewards are vectorized; this failure was observed when testing a 5-step chunked reward setup.
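The exact traceback is not reproduced here, but the shape mismatch behind it can be sketched with plain NumPy. The array sizes are illustrative, and the accumulation pattern is assumed to mirror how the vector statistics wrapper adds per-step rewards into a (num_envs,)-shaped returns buffer:

import numpy as np

episode_returns = np.zeros(4)        # (num_envs,) returns buffer kept by the wrapper
chunked_rewards = np.ones((4, 5))    # (num_envs, N) chunked rewards with N = 5
episode_returns += chunked_rewards   # raises ValueError: shapes (4,) and (4,5) do not broadcast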
Because of this, using chunked rewards requires awkward workarounds (illustrated below):
- Averaging rewards into a scalar, which loses intra-chunk variance information and makes fine-grained reward visualization impossible.
- Storing chunked rewards inside info, which increases memory usage and Python overhead.
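For concreteness, the two workarounds look roughly like this; chunk_rewards is a hypothetical (N,)-shaped array returned by a chunked environment:

import numpy as np

chunk_rewards = np.array([0.1, 0.5, 0.2, 0.9, 0.0], dtype=np.float32)  # hypothetical 5-step chunk

# Workaround 1: collapse the chunk to a scalar; intra-chunk variance is lost.
scalar_reward = float(chunk_rewards.mean())

# Workaround 2: keep the full chunk in info; extra copies and dict churn every step.
info = {"chunk_rewards": chunk_rewards.copy()}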
To align Gymnasium with current RL research trends, it would be highly beneficial to support chunked reward vectors natively in both the VectorEnv and statistics wrappers.
Pitch
- Remain fully backward compatible (scalar reward by default).
- Allow optional vector-shaped rewards (num_envs, N) for N-step chunking.
- Maintain memory efficiency: no redundant copies or per-step Python allocations.
- Provide direct integration with Gymnasium’s logging/statistics system.
import gymnasium as gym
import numpy as np


class ChunkedRewardWrapper(gym.Wrapper):
    """Converts scalar rewards into N-step rolling reward chunks."""

    def __init__(self, env, n: int, mode: str = "sliding"):
        super().__init__(env)
        self.n = n
        self.mode = mode  # only the "sliding" window mode is sketched here
        self._buf = np.zeros(n, dtype=np.float32)

    def reset(self, **kwargs):
        # Clear the reward buffer at the start of each episode.
        self._buf[:] = 0.0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        # Shift the window left and append the newest scalar reward.
        self._buf = np.roll(self._buf, -1)
        self._buf[-1] = reward
        # Return a copy so callers cannot mutate the internal buffer.
        return obs, self._buf.copy(), terminated, truncated, info
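A possible usage of the sketch above (the environment id and chunk size are arbitrary):

import gymnasium as gym

env = ChunkedRewardWrapper(gym.make("CartPole-v1"), n=5)
obs, info = env.reset()
obs, reward_chunk, terminated, truncated, info = env.step(env.action_space.sample())
print(reward_chunk.shape)  # (5,): the last five scalar rewards, oldest first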
Alternatives
No response
Additional context
Tested on Gymnasium 0.29.1
Checklist
- I have checked that there is no similar issue in the repo
