🌐 Project Page | 📄 arXiv | Dataset | ⚙️ Checkpoint
This repository is the official implementation of the paper "Ego-Vision World Model for Humanoid Contact Planning"; it provides a complete workflow for learning a humanoid ego-vision world model and planning for contact in Isaac Lab.
Maintainer: Hang Liu
Contact: [email protected]
- Installation
- Quick Start
- Datasets and Checkpoints
- Data Collection
- Offline World Model Training
- Playing with the World Model
- References and Acknowledgements
- Install Isaac Lab by following the official guide. Using the conda-based workflow is recommended to simplify Python environment management.
- Clone this repository outside the Isaac Lab directory:
git clone [email protected]:HybridRobotics/Ego-VCP.git
- Install the Python packages using an Isaac Lab-enabled interpreter:
cd Ego-VCP
python -m pip install -e ./rsl_rl -e .
Verify that your setup can launch the Isaac Lab environments and stream sensor data:
# With desktop rendering
python ego_vcp/scripts/collect.py --task=g1_ball --enable_cameras --num_envs=16
# Headless execution
python ego_vcp/scripts/collect.py --task=g1_wall --enable_cameras --num_envs=16 --headless
Successful execution should spawn a set of simulated robots performing random interactions.
# We also provide trained checkpoints so you can play quickly
python ego_vcp/scripts/play_wm.py --task=g1_ball --enable_cameras --num_envs=1 --device=cpu --test --model_path='wm_logs/all/world_model.pt'
python ego_vcp/scripts/play_wm.py --task=g1_wall --enable_cameras --num_envs=1 --device=cpu --test --model_path='wm_logs/all/world_model.pt'
python ego_vcp/scripts/play_wm.py --task=g1_tunnel --enable_cameras --num_envs=1 --device=cpu --test --model_path='wm_logs/all/world_model.pt'
All datasets and checkpoints are accessible from the repository and HuggingFace:
- Low-level controller: EgoVCP_Controller
- High-level world model checkpoint: EgoVCP_Checkpoint
- World-model training dataset: EgoVCP_Dataset
The dataset contains depth imagery and can take time to download or regenerate. Ensure sufficient storage and bandwidth before starting collection jobs.
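For a quick sanity check before starting a large download or collection job, a minimal sketch like the one below (plain Python, no project dependencies; the target path is an assumption) reports the free space on the volume that will hold the dataset.

```python
import os
import shutil

# Hypothetical target location; adjust to wherever you plan to store the dataset.
target = "dataset"
os.makedirs(target, exist_ok=True)

total, used, free = shutil.disk_usage(target)
print(f"Free space on the volume holding '{target}': {free / 1e9:.1f} GB")
```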
We recommend directly cloning our collected dataset from EgoVCP_Dataset:
cd Ego-VCP
mkdir dataset
git clone https://huggingface.co/datasets/Hang917/EgoVCP_Dataset.git dataset
Collect data on your own (Optional)
If you want to collect data on your own, we also provide a collection script that samples actions from a uniform distribution in the normalized action space.
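For intuition, here is a minimal NumPy sketch of uniform sampling in a normalized action space; it is an illustration rather than the actual collect.py logic, and num_envs / action_dim are placeholder values.

```python
import numpy as np

# Hypothetical shapes for illustration; collect.py derives these from the task.
num_envs = 512    # number of parallel environments
action_dim = 23   # placeholder per-robot action dimension

rng = np.random.default_rng(seed=0)

# Sample actions uniformly in the normalized range [-1, 1] for every environment.
actions = rng.uniform(low=-1.0, high=1.0, size=(num_envs, action_dim))

# The environment wrapper is assumed to rescale these normalized actions to the
# physical command limits before stepping the simulation.
print(actions.shape, float(actions.min()), float(actions.max()))
```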
Use the provided scripts to gather demonstration-free rollouts for each manipulation scenario:
# Task 1: Support the wall
python ego_vcp/scripts/collect.py --task=g1_wall --enable_cameras --num_envs=512 --headless
# Task 2: Block the ball
python ego_vcp/scripts/collect.py --task=g1_ball --enable_cameras --num_envs=512 --headless
# Task 3: Traverse the tunnel
python ego_vcp/scripts/collect.py --task=g1_tunnel --enable_cameras --num_envs=512 --headless
Each job logs depth and RGB observations; expect long runtimes when collecting full-coverage datasets.
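To keep an eye on how much data a collection run has produced, a small directory-size summary like the sketch below can help; the path is an assumption, so point data_root at wherever your collection output is actually written.

```python
import os

# Hypothetical output location; adjust to the directory your collection run writes to.
data_root = "dataset"

num_files = 0
total_bytes = 0
for dirpath, _, filenames in os.walk(data_root):
    for name in filenames:
        total_bytes += os.path.getsize(os.path.join(dirpath, name))
        num_files += 1

print(f"{num_files} files, {total_bytes / 1e9:.2f} GB under '{data_root}'")
```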
Launch offline training on the collected dataset:
python ego_vcp/scripts/train_wm.py
Training parameters are controlled via the configuration file ego_vcp/scripts/train_wm_config.yaml.
Dataset Configuration:
- Single-task training: specify the path to an individual task dataset:
  - Wall manipulation: dataset/dataset/wall
  - Ball blocking: dataset/dataset/ball
  - Tunnel traversal: dataset/dataset/tunnel
- Multi-task training: manually combine all three task datasets into a unified directory:
  - Combined dataset: dataset/dataset/all
Update the data.data_dir parameter in the config file to point to your desired dataset path.
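As a sketch of how the dataset path could be switched programmatically, assuming train_wm_config.yaml is plain YAML with a nested data.data_dir key as described above:

```python
import yaml  # PyYAML

config_path = "ego_vcp/scripts/train_wm_config.yaml"

with open(config_path) as f:
    cfg = yaml.safe_load(f)

# Point training at the combined multi-task dataset; single-task folders such as
# dataset/dataset/wall, dataset/dataset/ball, or dataset/dataset/tunnel also work.
cfg["data"]["data_dir"] = "dataset/dataset/all"

with open(config_path, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)
```

Round-tripping the file through PyYAML drops any comments it contains, so editing it by hand is usually just as convenient.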
Training for 20–25 epochs usually offers the best balance between convergence and overfitting.
Trained models are saved under wm_logs/.
Planning with the trained world model:
# Multi-task trained checkpoint
python ego_vcp/scripts/play_wm.py --task=g1_ball --enable_cameras --num_envs=1 --device=cpu --test --model_path='wm_logs/all/world_model.pt'
python ego_vcp/scripts/play_wm.py --task=g1_wall --enable_cameras --num_envs=1 --device=cpu --test --model_path='wm_logs/all/world_model.pt'
python ego_vcp/scripts/play_wm.py --task=g1_tunnel --enable_cameras --num_envs=1 --device=cpu --test --model_path='wm_logs/all/world_model.pt'
# Single-task trained checkpoints
python ego_vcp/scripts/play_wm.py --task=g1_ball --enable_cameras --num_envs=1 --device=cpu --test --model_path='wm_logs/ball/world_model.pt'
python ego_vcp/scripts/play_wm.py --task=g1_wall --enable_cameras --num_envs=1 --device=cpu --test --model_path='wm_logs/wall/world_model.pt'
python ego_vcp/scripts/play_wm.py --task=g1_tunnel --enable_cameras --num_envs=1 --device=cpu --test --model_path='wm_logs/tunnel/world_model.pt'
# Custom checkpoint (configured via play_wm_config.yaml)
python ego_vcp/scripts/play_wm.py --task=g1_wall --enable_cameras --num_envs=1 --device=cpu --test
Play parameters are controlled via the configuration file ego_vcp/scripts/play_wm_config.yaml.
In our real-world experiments, we use a yoga ball to block motion. By default, the ball is set to compliant contact, which makes it bounce more realistically. However, compliant contact can reduce the accuracy of contact sensing in Isaac Lab and affect reward calculations. Recommendation: For data collection and evaluation, switch the ball to rigid contact.
physics_material=sim_utils.RigidBodyMaterialCfg(
    static_friction=0.4,
    dynamic_friction=0.4,
    restitution=0.1,
    # Uncomment the lines below to restore the default compliant (bouncier) contact:
    # restitution_combine_mode="max",
    # compliant_contact_stiffness=1000,
    # compliant_contact_damping=2,
),
Our network retains some redundant components, such as reward prediction and Q_target, to allow fair ablation experiments, but they are not actually used during play.
We recommend playing and evaluating with checkpoints from 20-25 training epochs, or tuning latent_pretrain_epoch to pretrain the latents z_t and h_t before full training. The reason is that the data-collection distribution (uniform) differs substantially from the optimal distribution; under this distribution shift, val_dataset is also not a reliable indicator of value overfitting.
This project builds upon extensive prior work:

