Skip to content

Conversation

@kellyguo11
Copy link
Contributor

Description

Allows rlgames to decouple devices for simulation and training. This should allow running simulation on CPU and training on GPU

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@github-actions github-actions bot added bug Something isn't working isaac-lab Related to Isaac Lab team labels Nov 11, 2025
@kellyguo11 kellyguo11 moved this to In review in Isaac Lab Nov 11, 2025
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Nov 11, 2025

Greptile Overview

Greptile Summary

This PR enables device decoupling for RL Games, allowing simulation to run on one device (e.g., CPU) while training runs on another (e.g., GPU). The key changes are:

Core Implementation:

  • Added device transfer logic in RlGamesVecEnvWrapper._process_obs() to move observations from sim_device to rl_device
  • Existing step() method already handled action transfers (rl_device → sim_device) and reward/done transfers (sim_device → rl_device)
  • Removed the forced device coupling in train.py and play.py that previously overrode agent config to match sim device

How It Works:

  • The wrapper now reads rl_device from the agent config's params.config.device field
  • Actions generated on rl_device are transferred to sim_device before env.step()
  • Observations, rewards, dones, and extras are transferred from sim_device to rl_device after each step
  • The implementation correctly handles the case where devices are the same (no-op transfers)

Testing:

  • Comprehensive test suite added covering RL Games, RSL-RL, SB3, and skrl
  • Tests verify GPU→GPU, GPU→CPU, and CPU→GPU device combinations
  • All device transfers are validated to ensure data arrives on the correct device

The implementation is clean, well-documented, and follows the existing architecture. Device transfers use .to(device=...) with proper cloning where needed to avoid in-place modification issues.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk - implementation is straightforward and well-tested
  • The changes are minimal, focused, and well-architected. The core logic simply adds observation device transfers to match existing action/reward transfer patterns. The removed code that forced device coupling was the actual bug being fixed. Comprehensive tests validate all device combinations across multiple RL libraries. No breaking changes to APIs or existing functionality.
  • No files require special attention

Important Files Changed

File Analysis

Filename Score Overview
source/isaaclab_rl/isaaclab_rl/rl_games/rl_games.py 5/5 Added device transfer logic in _process_obs to move observations from sim device to RL device when they differ
scripts/reinforcement_learning/rl_games/train.py 5/5 Removed manual device override logic that forced agent device to match sim device, enabling proper device decoupling
scripts/reinforcement_learning/rl_games/play.py 5/5 Removed manual device override logic that forced agent device to match sim device during inference

Sequence Diagram

sequenceDiagram
    participant Policy as RL Policy<br/>(rl_device)
    participant Wrapper as RlGamesVecEnvWrapper
    participant Env as Isaac Environment<br/>(sim_device)
    
    Note over Policy,Env: Training Step Flow
    
    Policy->>Wrapper: step(actions)<br/>[on rl_device]
    Wrapper->>Wrapper: actions.to(sim_device)<br/>Transfer to simulation
    Wrapper->>Env: step(actions)<br/>[on sim_device]
    Env-->>Wrapper: obs, rew, term, trunc<br/>[on sim_device]
    
    Wrapper->>Wrapper: _process_obs(obs_dict)<br/>Transfer obs to rl_device
    Wrapper->>Wrapper: rew.to(rl_device)<br/>dones.to(rl_device)<br/>extras.to(rl_device)
    
    Wrapper-->>Policy: obs, rew, dones, extras<br/>[on rl_device]
    
    Note over Policy,Env: Reset Flow
    
    Policy->>Wrapper: reset()
    Wrapper->>Env: reset()
    Env-->>Wrapper: obs_dict<br/>[on sim_device]
    Wrapper->>Wrapper: _process_obs(obs_dict)<br/>Transfer to rl_device
    Wrapper-->>Policy: obs<br/>[on rl_device]
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

Copy link
Collaborator

@ooctipus ooctipus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test is a bit long to read, and feels like could be refactored to make a bit nicer. But everything else looks pretty good

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like this test file can be refactored a bit :D, that way the step 1 - 5 maybe more clear, easier for future.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working isaac-lab Related to Isaac Lab team

Projects

Status: In review

Development

Successfully merging this pull request may close these issues.

2 participants