[Bug Report] NormalizeReward count error during autoreset in vector environments #1451

@mballuet

Describe the bug

Hi,

I noticed a difference in the behavior of the NormalizeReward wrapper depending on whether it’s applied to an Env or a SyncVectorEnv.

When it is applied to a SyncVectorEnv (through gym.wrappers.vector.NormalizeReward), the reward returned on the autoreset step appears to be included in the running return statistics used for normalization. I don’t think this should happen. This behavior seems to have started in version 1.0.0, with the introduction of the new autoreset mechanism for vectorized environments.

In contrast, when NormalizeReward is applied directly to an Env, the reset reward is not included in the normalization calculation.

This leads to two different results with the code below, even though I would expect them to be the same. Am I right?

Code example

import gymnasium as gym

# SyncVectorEnv -> NormalizeReward -> Env
def make_env(env_id: str, seed: int):
    def thunk():
        env = gym.make(env_id)
        env = gym.wrappers.NormalizeReward(env, gamma=0.99, epsilon=1e-8)
        env.observation_space.seed(seed)
        env.action_space.seed(seed)
        return env

    return thunk

envs = gym.vector.SyncVectorEnv(
    [make_env("Hopper-v5", 123)]
)

_ = envs.reset(seed=123)

for _ in range(50):
    observation, reward, terminated, truncated, info = envs.step(envs.action_space.sample())

print("SyncVectorEnv -> NormalizeReward -> Env")
print("Return RMS count:", envs.envs[0].return_rms.count)
print("Return RMS mean:", envs.envs[0].return_rms.mean)
print("Return RMS var:", envs.envs[0].return_rms.var)

envs.close()

# NormalizeReward -> SyncVectorEnv -> Env
def make_env(env_id: str, seed: int):
    def thunk():
        env = gym.make(env_id)
        env.observation_space.seed(seed)
        env.action_space.seed(seed)
        return env

    return thunk

envs = gym.vector.SyncVectorEnv(
    [make_env("Hopper-v5", 123)]
)
envs = gym.wrappers.vector.NormalizeReward(envs, gamma=0.99, epsilon=1e-8)

_ = envs.reset(seed=123)

for _ in range(50):
    observation, reward, terminated, truncated, info = envs.step(envs.action_space.sample())

print("NormalizeReward -> SyncVectorEnv -> Env")
print("Return RMS count:", envs.return_rms.count)
print("Return RMS mean:", envs.return_rms.mean)
print("Return RMS var:", envs.return_rms.var)

envs.close()

Output:

SyncVectorEnv -> NormalizeReward -> Env
Return RMS count: 48.0001
Return RMS mean: 6.056133384415303
Return RMS var: 14.202972971858856
NormalizeReward -> SyncVectorEnv -> Env
Return RMS count: 50.0001
Return RMS mean: 5.781981257241382
Return RMS var: 15.481151179083554
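
The gap of exactly 2 between the two counts (50.0001 vs. 48.0001) would be consistent with two episodes ending within the 50 steps, each followed by one autoreset step, although I have not verified this. The behavior I would expect can be sketched as a masked update that skips sub-environments whose previous step ended an episode. This is only an illustration under that assumption: `RunningStats`, `masked_update` and `prev_done` are hypothetical names, not part of the Gymnasium API, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Toy running mean/variance, merged batch by batch."""

    mean: float = 0.0
    var: float = 1.0
    count: float = 1e-4

    def update(self, batch):
        n = len(batch)
        if n == 0:
            return  # nothing to merge, e.g. every sub-environment was resetting
        batch_mean = sum(batch) / n
        batch_var = sum((x - batch_mean) ** 2 for x in batch) / n
        delta = batch_mean - self.mean
        total = self.count + n
        # Parallel-variance formula for combining two sets of statistics.
        m2 = self.var * self.count + batch_var * n + delta**2 * self.count * n / total
        self.mean += delta * n / total
        self.var = m2 / total
        self.count = total


def masked_update(stats, returns, prev_done):
    """Merge only the returns of sub-environments that took a real step."""
    stats.update([r for r, was_done in zip(returns, prev_done) if not was_done])


stats = RunningStats()
# Hypothetical batch of 3 sub-environments. The second one finished its episode
# on the previous call, so its entry here comes from a reset, not from a step.
masked_update(stats, returns=[2.0, 0.0, 4.0], prev_done=[False, True, False])
print(stats.count, stats.mean)
```

With this mask, only the two real steps are merged, so the count grows by 2 rather than 3 and the zero from the reset does not affect the mean.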

System info

Gymnasium v1.2.0 installed with pip (Python 3.12.3) on Ubuntu 24.04 WSL2 (Windows 11).

Checklist

  • I have checked that there is no similar issue in the repo
