Description
Describe the bug
Hi,
I noticed a difference in the behavior of the NormalizeReward wrapper depending on whether it is applied to a single Env or to a SyncVectorEnv.
When applied to a SyncVectorEnv, the reward from the autoreset step appears to be included in the normalization statistics. I don't think this should happen. This behavior seems to have started in version 1.0.0, with the introduction of the new autoreset mechanism for vectorized environments.
In contrast, when NormalizeReward is applied directly to an Env, the reset-step reward is not included in the normalization statistics.
This leads to two different results with the code below, even though I would expect them to be the same. Am I right?
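For illustration, the effect of counting one extra (autoreset) step can be reproduced with a toy running-statistics update. This is only a sketch: it assumes the wrapper maintains Welford-style parallel mean/variance statistics (as Gymnasium's RunningMeanStd does), and the reward values are made up.

```python
import numpy as np

class RunningMeanStd:
    """Minimal Welford-style running mean/variance, seeded with a tiny count."""
    def __init__(self, epsilon=1e-4):
        self.mean, self.var, self.count = 0.0, 1.0, epsilon

    def update(self, x):
        batch_mean, batch_var, batch_count = np.mean(x), np.var(x), len(x)
        delta = batch_mean - self.mean
        tot = self.count + batch_count
        new_mean = self.mean + delta * batch_count / tot
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        m2 = m_a + m_b + delta**2 * self.count * batch_count / tot
        self.mean, self.var, self.count = new_mean, m2 / tot, tot

rewards = [1.0, 1.2, 0.9]  # made-up per-step rewards before an episode ends

# Per-env wrapper: the autoreset step's reward is skipped.
rms_env = RunningMeanStd()
for r in rewards:
    rms_env.update([r])

# Vector wrapper: the autoreset step's 0.0 reward is also counted.
rms_vec = RunningMeanStd()
for r in rewards + [0.0]:
    rms_vec.update([r])

print(rms_vec.count - rms_env.count)  # one extra counted step
print(rms_env.mean, rms_vec.mean)     # the extra 0.0 pulls the mean down
```

This mirrors the count discrepancy in the output below (48 vs. 50 after 50 steps, i.e., two autoreset steps either excluded or included).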
Code example
import gymnasium as gym

# SyncVectorEnv -> NormalizeReward -> Env
def make_env(env_id: str, seed: int):
    def thunk():
        env = gym.make(env_id)
        env = gym.wrappers.NormalizeReward(env, gamma=0.99, epsilon=1e-8)
        env.observation_space.seed(seed)
        env.action_space.seed(seed)
        return env

    return thunk

envs = gym.vector.SyncVectorEnv([make_env("Hopper-v5", 123)])
_ = envs.reset(seed=123)
for _ in range(50):
    observation, reward, terminated, truncated, info = envs.step(envs.action_space.sample())

print("SyncVectorEnv -> NormalizeReward -> Env")
print("Return RMS count:", envs.envs[0].return_rms.count)
print("Return RMS mean:", envs.envs[0].return_rms.mean)
print("Return RMS var:", envs.envs[0].return_rms.var)
envs.close()
# NormalizeReward -> SyncVectorEnv -> Env
def make_env(env_id: str, seed: int):
    def thunk():
        env = gym.make(env_id)
        env.observation_space.seed(seed)
        env.action_space.seed(seed)
        return env

    return thunk

envs = gym.vector.SyncVectorEnv([make_env("Hopper-v5", 123)])
envs = gym.wrappers.vector.NormalizeReward(envs, gamma=0.99, epsilon=1e-8)
_ = envs.reset(seed=123)
for _ in range(50):
    observation, reward, terminated, truncated, info = envs.step(envs.action_space.sample())

print("NormalizeReward -> SyncVectorEnv -> Env")
print("Return RMS count:", envs.return_rms.count)
print("Return RMS mean:", envs.return_rms.mean)
print("Return RMS var:", envs.return_rms.var)
envs.close()
Output:
SyncVectorEnv -> NormalizeReward -> Env
Return RMS count: 48.0001
Return RMS mean: 6.056133384415303
Return RMS var: 14.202972971858856
NormalizeReward -> SyncVectorEnv -> Env
Return RMS count: 50.0001
Return RMS mean: 5.781981257241382
Return RMS var: 15.481151179083554
System info
Gymnasium v1.2.0 installed with pip (Python 3.12.3) on Ubuntu 24.04 WSL2 (Windows 11).
Additional context
No response
Checklist
- I have checked that there is no similar issue in the repo