
Ego-Vision World Model for Humanoid Contact Planning


Overview

🌐 Project Page | 📄 arXiv | HuggingFace Dataset | ⚙️ Checkpoint

This repository provides the official implementation of the paper "Ego-Vision World Model for Humanoid Contact Planning". It contains the complete workflow for learning a humanoid ego-vision world model and planning for contact in Isaac Lab.

Maintainer: Hang Liu
Contact: [email protected]

Table of Contents

  1. Installation
  2. Quick Start
  3. Datasets and Checkpoints
  4. Data Collection
  5. Offline World Model Training
  6. Planning with the World Model
  7. Troubleshooting
  8. References and Acknowledgements

Installation

  1. Install Isaac Lab by following the official guide. Using the conda-based workflow is recommended to simplify Python environment management.
  2. Clone this repository outside the Isaac Lab directory:
    git clone git@github.com:HybridRobotics/Ego-VCP.git
  3. Install the Python packages using an Isaac Lab-enabled interpreter:
     cd Ego-VCP
     python -m pip install -e ./rsl_rl -e .

Quick Start

Verify that your setup can launch the Isaac Lab environments and stream sensor data:

# With desktop rendering
python ego_vcp/scripts/collect.py --task=g1_ball --enable_cameras --num_envs=16

# Headless execution
python ego_vcp/scripts/collect.py --task=g1_wall --enable_cameras --num_envs=16 --headless

Successful execution should spawn a set of simulated robots performing random interactions.

# We also offer trained checkpoints so you can try planning quickly
python ego_vcp/scripts/play_wm.py --task=g1_ball --enable_cameras --num_envs=1 --device=cpu --test --model_path='wm_logs/all/world_model.pt'
python ego_vcp/scripts/play_wm.py --task=g1_wall --enable_cameras --num_envs=1 --device=cpu --test --model_path='wm_logs/all/world_model.pt'
python ego_vcp/scripts/play_wm.py --task=g1_tunnel --enable_cameras --num_envs=1 --device=cpu --test --model_path='wm_logs/all/world_model.pt'

Datasets and Checkpoints

All datasets and checkpoints are available from this repository and from HuggingFace; see the links at the top of this README.

The dataset contains depth imagery and can take time to download or regenerate. Ensure sufficient storage and bandwidth before starting collection jobs.

Data Collection

Data Preparation

We recommend cloning our collected dataset directly from EgoVCP_Dataset:

cd Ego-VCP
mkdir dataset
git clone https://huggingface.co/datasets/Hang917/EgoVCP_Dataset.git dataset
Collect Data on Your Own (Optional)

If you prefer to collect data yourself, we also provide a collection script that samples actions from a uniform distribution in the normalized action space (a minimal sketch of this sampling appears after the commands below).

Use the provided script to gather demonstration-free rollouts for each task:

# Task 1: Support the wall
python ego_vcp/scripts/collect.py --task=g1_wall --enable_cameras --num_envs=512 --headless

# Task 2: Block the ball
python ego_vcp/scripts/collect.py --task=g1_ball --enable_cameras --num_envs=512 --headless

# Task 3: Traverse the tunnel
python ego_vcp/scripts/collect.py --task=g1_tunnel --enable_cameras --num_envs=512 --headless

Each job logs depth and RGB observations; expect long runtimes when collecting full coverage datasets.
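The action sampling itself is straightforward. Below is a minimal sketch of the uniform sampling described above; it assumes actions are normalized to [-1, 1] per dimension, and the sizes are illustrative placeholders rather than the repository's actual values.

# Minimal sketch: sample actions uniformly in a normalized action space (assumed [-1, 1]).
import torch

num_envs, action_dim = 512, 23  # illustrative placeholders
actions = torch.rand(num_envs, action_dim) * 2.0 - 1.0  # Uniform(-1, 1) per dimension
# collect.py then steps the simulated environments with such actions and logs the
# resulting ego-vision observations.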

Offline World Model Training

Launch offline training on the collected dataset:

python ego_vcp/scripts/train_wm.py

Training parameters are controlled via the configuration file ego_vcp/scripts/train_wm_config.yaml.

Dataset Configuration:

  • Single-task training: Specify the path to individual task datasets:
    • Wall manipulation: dataset/dataset/wall
    • Ball blocking: dataset/dataset/ball
    • Tunnel traversal: dataset/dataset/tunnel
  • Multi-task training: Manually combine all three task datasets into a unified directory:
    • Combined dataset: dataset/dataset/all

Update the data.data_dir parameter in the config file to point to your desired dataset path.
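For reference, a hypothetical excerpt of ego_vcp/scripts/train_wm_config.yaml is shown below; only the data.data_dir parameter is documented in this README, so the surrounding key names are illustrative assumptions.

# Hypothetical excerpt of train_wm_config.yaml -- only data.data_dir is documented here.
data:
  data_dir: dataset/dataset/all  # or dataset/dataset/wall, dataset/dataset/ball, dataset/dataset/tunnel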

Training for 20–25 epochs usually offers the best balance between convergence and overfitting; see the Troubleshooting section for details.

Trained models are saved under wm_logs/.
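Before launching the planner, you can quickly verify that a saved checkpoint loads. This is only a sanity-check sketch; the exact contents of world_model.pt depend on the training script.

# Sanity-check a trained checkpoint (sketch; contents depend on train_wm.py).
import torch

ckpt = torch.load("wm_logs/all/world_model.pt", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    # Print a few entries; for a state_dict these are parameter names and tensor shapes.
    for name, value in list(ckpt.items())[:5]:
        print(name, getattr(value, "shape", type(value)))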

Planning with the World Model

Planning with a trained world model:

# Multi-task checkpoint (trained on all three tasks)
python ego_vcp/scripts/play_wm.py --task=g1_ball --enable_cameras --num_envs=1 --device=cpu --test --model_path='wm_logs/all/world_model.pt'
python ego_vcp/scripts/play_wm.py --task=g1_wall --enable_cameras --num_envs=1 --device=cpu --test --model_path='wm_logs/all/world_model.pt'
python ego_vcp/scripts/play_wm.py --task=g1_tunnel --enable_cameras --num_envs=1 --device=cpu --test --model_path='wm_logs/all/world_model.pt'
# Single-task checkpoints
python ego_vcp/scripts/play_wm.py --task=g1_ball --enable_cameras --num_envs=1 --device=cpu --test --model_path='wm_logs/ball/world_model.pt'
python ego_vcp/scripts/play_wm.py --task=g1_wall --enable_cameras --num_envs=1 --device=cpu --test --model_path='wm_logs/wall/world_model.pt'
python ego_vcp/scripts/play_wm.py --task=g1_tunnel --enable_cameras --num_envs=1 --device=cpu --test --model_path='wm_logs/tunnel/world_model.pt'
# Custom checkpoint (configure the model path via play_wm_config.yaml)
python ego_vcp/scripts/play_wm.py --task=g1_wall --enable_cameras --num_envs=1 --device=cpu --test

Planning parameters are controlled via the configuration file ego_vcp/scripts/play_wm_config.yaml.

Troubleshooting

Ball Contact Model (Compliant vs. Rigid)

In our real-world experiments, we use a yoga ball to block motion. By default, the ball is set to compliant contact, which makes it bounce more realistically. However, compliant contact can reduce the accuracy of contact sensing in Isaac Lab and affect reward calculations. Recommendation: For data collection and evaluation, switch the ball to rigid contact.

physics_material=sim_utils.RigidBodyMaterialCfg(
    static_friction=0.4,
    dynamic_friction=0.4,
    restitution=0.1,
    # Compliant-contact settings, left disabled here so the ball uses rigid contact:
    # restitution_combine_mode="max",
    # compliant_contact_stiffness=1000,
    # compliant_contact_damping=2,
),

Redundant Network Components

Our network retains some components that are not used during planning, such as the reward-prediction head and Q_target; we keep them so that ablation experiments can be compared fairly.

Training Epochs

We recommend playing and evaluating with checkpoints from 20–25 training epochs, or tuning latent_pretrain_epoch to pretrain the latents z_t and h_t before full training. The reason is that the data-collection distribution (uniform random actions) differs substantially from the optimal action distribution, and under this distribution shift the validation dataset is not a reliable indicator of value overfitting.
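A hypothetical configuration fragment illustrating this schedule is shown below; only latent_pretrain_epoch is named in this README, and the other key and the values are illustrative assumptions.

# Hypothetical fragment of train_wm_config.yaml (values are illustrative).
latent_pretrain_epoch: 5  # pretrain the latents z_t and h_t for a few epochs first
num_epochs: 25            # then evaluate checkpoints from roughly epoch 20-25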

References and Acknowledgements

This project builds upon extensive prior work.
