Fixes device decoupling of RL Games training vs. sim device #3997

kellyguo11 · 2025-11-11T01:02:15Z

Description

Allows RL Games to use separate devices for simulation and training. This should allow running the simulation on CPU and training on GPU.

Type of change

Bug fix (non-breaking change which fixes an issue)

Checklist

I have read and understood the contribution guidelines
I have run the pre-commit checks with ./isaaclab.sh --format
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
I have updated the changelog and the corresponding version in the extension's config/extension.toml file
I have added my name to the CONTRIBUTORS.md or my name already exists there

greptile-apps · 2025-11-11T01:08:05Z

Greptile Overview

Greptile Summary

This PR enables device decoupling for RL Games, allowing simulation to run on one device (e.g., CPU) while training runs on another (e.g., GPU). The key changes are:

Core Implementation:

Added device transfer logic in RlGamesVecEnvWrapper._process_obs() to move observations from sim_device to rl_device
Existing step() method already handled action transfers (rl_device → sim_device) and reward/done transfers (sim_device → rl_device)
Removed the forced device coupling in train.py and play.py that previously overrode agent config to match sim device

How It Works:

The wrapper now reads rl_device from the agent config's params.config.device field
Actions generated on rl_device are transferred to sim_device before env.step()
Observations, rewards, dones, and extras are transferred from sim_device to rl_device after each step
The implementation correctly handles the case where devices are the same (no-op transfers)
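The transfer pattern described above can be sketched as follows. This is a minimal illustration, not the actual `RlGamesVecEnvWrapper` code: `FakeTensor`, `ToyEnv`, and `DecoupledWrapper` are hypothetical stand-ins so the example runs without PyTorch or a GPU, and extras are omitted for brevity. The only details taken from the summary are the `params.config.device` lookup, the `_process_obs` name, and the direction of each transfer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeTensor:
    """Stand-in for torch.Tensor so the sketch runs without PyTorch."""

    value: float
    device: str

    def to(self, device: str) -> "FakeTensor":
        # Like torch.Tensor.to, return self when already on the target device.
        return self if device == self.device else FakeTensor(self.value, device)


class ToyEnv:
    """Hypothetical environment that only accepts data on its own device."""

    def __init__(self, device: str):
        self.device = device

    def reset(self) -> dict:
        return {"policy": FakeTensor(0.0, self.device)}

    def step(self, actions: FakeTensor):
        if actions.device != self.device:
            raise ValueError("actions must be on the simulation device")
        obs = {"policy": FakeTensor(actions.value + 1.0, self.device)}
        rew = FakeTensor(1.0, self.device)
        done = FakeTensor(0.0, self.device)
        return obs, rew, done


class DecoupledWrapper:
    """Sketch of the transfer pattern (assumed structure, for illustration only)."""

    def __init__(self, env: ToyEnv, agent_cfg: dict):
        self.env = env
        self.sim_device = env.device
        # Per the summary, the RL device comes from params.config.device.
        self.rl_device = agent_cfg["params"]["config"]["device"]

    def _process_obs(self, obs_dict: dict) -> dict:
        # sim_device -> rl_device for every observation group.
        return {k: v.to(device=self.rl_device) for k, v in obs_dict.items()}

    def reset(self) -> dict:
        return self._process_obs(self.env.reset())

    def step(self, actions: FakeTensor):
        # rl_device -> sim_device before stepping the simulation.
        obs, rew, done = self.env.step(actions.to(device=self.sim_device))
        # sim_device -> rl_device for everything handed back to the policy.
        return (
            self._process_obs(obs),
            rew.to(device=self.rl_device),
            done.to(device=self.rl_device),
        )


agent_cfg = {"params": {"config": {"device": "cuda:0"}}}
wrapper = DecoupledWrapper(ToyEnv(device="cpu"), agent_cfg)
first_obs = wrapper.reset()
obs, rew, done = wrapper.step(FakeTensor(0.5, device="cuda:0"))
```

When both devices are the same, every `.to(device=...)` call in this sketch returns the original object, which matches the no-op behavior noted above.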

Testing:

Comprehensive test suite added covering RL Games, RSL-RL, SB3, and skrl
Tests verify GPU→GPU, GPU→CPU, and CPU→GPU device combinations
All device transfers are validated to ensure data arrives on the correct device
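One way such device checks could be written is a small recursive helper that walks whatever a wrapper returns and asserts that each leaf sits on the expected device. This is a hedged sketch, not the code in the PR's test file: `assert_on_device` and `DEVICE_PAIRS` are hypothetical names, the reading of each pair as (sim device, RL device) is an assumption, and `SimpleNamespace` objects stand in for tensors so the example runs without PyTorch.

```python
from types import SimpleNamespace


def assert_on_device(tree, device: str, path: str = "output") -> None:
    """Recursively check that every leaf with a .device attribute is on `device`."""
    if isinstance(tree, dict):
        for key, value in tree.items():
            assert_on_device(value, device, f"{path}[{key!r}]")
    elif isinstance(tree, (list, tuple)):
        for index, value in enumerate(tree):
            assert_on_device(value, device, f"{path}[{index}]")
    elif hasattr(tree, "device"):
        # str() lets the same check work for torch.device objects and plain strings.
        assert str(tree.device) == device, f"{path} is on {tree.device}, expected {device}"


# Assumed (sim_device, rl_device) pairs mirroring the combinations listed above.
DEVICE_PAIRS = [("cuda:0", "cuda:0"), ("cuda:0", "cpu"), ("cpu", "cuda:0")]

checked = 0
for sim_device, rl_device in DEVICE_PAIRS:
    # Stand-in for a wrapper's step() output; a real test would build an environment.
    step_output = (
        {"obs": SimpleNamespace(device=rl_device)},
        SimpleNamespace(device=rl_device),
    )
    assert_on_device(step_output, rl_device)
    checked += 1
```

In a pytest suite, the loop would more naturally become a `pytest.mark.parametrize` over the device pairs, so each combination reports as its own test case.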

The implementation is clean, well-documented, and follows the existing architecture. Device transfers use .to(device=...) with proper cloning where needed to avoid in-place modification issues.

Confidence Score: 5/5

This PR is safe to merge with minimal risk - implementation is straightforward and well-tested
The changes are minimal, focused, and well-architected. The core logic simply adds observation device transfers to match existing action/reward transfer patterns. The removed code that forced device coupling was the actual bug being fixed. Comprehensive tests validate all device combinations across multiple RL libraries. No breaking changes to APIs or existing functionality.
No files require special attention

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| source/isaaclab_rl/isaaclab_rl/rl_games/rl_games.py | 5/5 | Added device transfer logic in `_process_obs` to move observations from sim device to RL device when they differ |
| scripts/reinforcement_learning/rl_games/train.py | 5/5 | Removed manual device override logic that forced agent device to match sim device, enabling proper device decoupling |
| scripts/reinforcement_learning/rl_games/play.py | 5/5 | Removed manual device override logic that forced agent device to match sim device during inference |

Sequence Diagram

sequenceDiagram
    participant Policy as RL Policy<br/>(rl_device)
    participant Wrapper as RlGamesVecEnvWrapper
    participant Env as Isaac Environment<br/>(sim_device)
    
    Note over Policy,Env: Training Step Flow
    
    Policy->>Wrapper: step(actions)<br/>[on rl_device]
    Wrapper->>Wrapper: actions.to(sim_device)<br/>Transfer to simulation
    Wrapper->>Env: step(actions)<br/>[on sim_device]
    Env-->>Wrapper: obs, rew, term, trunc<br/>[on sim_device]
    
    Wrapper->>Wrapper: _process_obs(obs_dict)<br/>Transfer obs to rl_device
    Wrapper->>Wrapper: rew.to(rl_device)<br/>dones.to(rl_device)<br/>extras.to(rl_device)
    
    Wrapper-->>Policy: obs, rew, dones, extras<br/>[on rl_device]
    
    Note over Policy,Env: Reset Flow
    
    Policy->>Wrapper: reset()
    Wrapper->>Env: reset()
    Env-->>Wrapper: obs_dict<br/>[on sim_device]
    Wrapper->>Wrapper: _process_obs(obs_dict)<br/>Transfer to rl_device
    Wrapper-->>Policy: obs<br/>[on rl_device]

greptile-apps

6 files reviewed, no comments

ooctipus

The test is a bit long to read and feels like it could be refactored to be a bit nicer. But everything else looks pretty good.

ooctipus · 2025-11-11T01:40:16Z

source/isaaclab_tasks/test/test_rl_device_separation.py

It feels like this test file could be refactored a bit :D. That way steps 1-5 may be clearer and easier to follow in the future.

greptile-apps

6 files reviewed, no comments

kellyguo11 added 4 commits November 10, 2025 16:44

Adds RL device for allowing CPU sim and RL device

6588f85

add test case

ee5a07e

isolate pick_and_place change

b40845e

update

c176374

kellyguo11 requested review from ClemensSchwarke, Mayankm96, Toni-SM and ooctipus as code owners November 11, 2025 01:02

github-actions bot added the bug (Something isn't working) and isaac-lab (Related to Isaac Lab team) labels Nov 11, 2025

format

cd1764c

kellyguo11 added this to Isaac Lab Nov 11, 2025

kellyguo11 moved this to In review in Isaac Lab Nov 11, 2025

greptile-apps bot reviewed Nov 11, 2025

ooctipus approved these changes Nov 11, 2025

kellyguo11 added 2 commits November 10, 2025 19:41

fix test

422f8e1

format

ec417d4

greptile-apps bot reviewed Nov 11, 2025
