🌐 Project Page | 📄 arXiv | Dataset | ⚙️ Checkpoint
This repository is the official implementation of the paper "Ego-Vision World Model for Humanoid Contact Planning"; it provides a complete workflow for learning a humanoid ego-vision world model and planning for contact in Isaac Lab.
Maintainer: Hang Liu
Contact: [email protected]
- Installation
- Quick Start
- Datasets and Checkpoints
- Data Collection
- Offline World Model Training
- Playing with the World Model
- References and Acknowledgements
- Install Isaac Lab by following the official guide. Using the conda-based workflow is recommended to simplify Python environment management.
- Clone this repository outside the Isaac Lab directory:
git clone [email protected]:HybridRobotics/Ego-VCP.git
- Install the Python packages using an Isaac Lab-enabled interpreter:
cd Ego-VCP
python -m pip install -e ./rsl_rl -e .
Verify that your setup can launch the Isaac Lab environments and stream sensor data:
# With desktop rendering
python ego_vcp/scripts/collect.py --task=g1_ball --enable_cameras --num_envs=16
# Headless execution
python ego_vcp/scripts/collect.py --task=g1_wall --enable_cameras --num_envs=16 --headless
Successful execution should spawn a set of simulated robots performing random interactions.
# We also provide trained checkpoints so you can play quickly
python ego_vcp/scripts/play_wm.py --task=g1_ball --enable_cameras --num_envs=1 --device=cpu --test --model_path='wm_logs/all/world_model.pt'
python ego_vcp/scripts/play_wm.py --task=g1_wall --enable_cameras --num_envs=1 --device=cpu --test --model_path='wm_logs/all/world_model.pt'
python ego_vcp/scripts/play_wm.py --task=g1_tunnel --enable_cameras --num_envs=1 --device=cpu --test --model_path='wm_logs/all/world_model.pt'
All datasets and checkpoints are accessible from the repository and HuggingFace:
- Low-level controller: EgoVCP_Controller
- High-level world model checkpoint: EgoVCP_Checkpoint
- World-model training dataset: EgoVCP_Dataset
The dataset contains depth imagery and can take time to download or regenerate. Ensure sufficient storage and bandwidth before starting collection jobs.
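For a quick sanity check before starting a large download or collection job, a minimal sketch like the one below (plain Python, no project dependencies; the target path is an assumption) reports the free space on the volume that will hold the dataset.

```python
import os
import shutil

# Hypothetical target location; adjust to wherever you plan to store the dataset.
target = "dataset"
os.makedirs(target, exist_ok=True)

total, used, free = shutil.disk_usage(target)
print(f"Free space on the volume holding '{target}': {free / 1e9:.1f} GB")
```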
We recommend directly cloning our collected dataset from EgoVCP_Dataset:
cd Ego-VCP
mkdir dataset
git clone https://huggingface.co/datasets/Hang917/EgoVCP_Dataset.git dataset
Collect data on your own (Optional)
If you want to collect data on your own, we also provide a collection script that samples actions from a uniform distribution in the normalized action space.
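For intuition, here is a minimal NumPy sketch of uniform sampling in a normalized action space; it is an illustration rather than the actual collect.py logic, and num_envs / action_dim are placeholder values.

```python
import numpy as np

# Hypothetical shapes for illustration; collect.py derives these from the task.
num_envs = 512    # number of parallel environments
action_dim = 23   # placeholder per-robot action dimension

rng = np.random.default_rng(seed=0)

# Sample actions uniformly in the normalized range [-1, 1] for every environment.
actions = rng.uniform(low=-1.0, high=1.0, size=(num_envs, action_dim))

# The environment wrapper is assumed to rescale these normalized actions to the
# physical command limits before stepping the simulation.
print(actions.shape, float(actions.min()), float(actions.max()))
```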
Use the provided scripts to gather demonstration-free rollouts for each manipulation scenario:
# Task 1: Support the wall
python ego_vcp/scripts/collect.py --task=g1_wall --enable_cameras --num_envs=512 --headless
# Task 2: Block the ball
python ego_vcp/scripts/collect.py --task=g1_ball --enable_cameras --num_envs=512 --headless
# Task 3: Traverse the tunnel
python ego_vcp/scripts/collect.py --task=g1_tunnel --enable_cameras --num_envs=512 --headless
Each job logs depth and RGB observations; expect long runtimes when collecting full-coverage datasets.
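To keep an eye on how much data a collection run has produced, a small directory-size summary like the sketch below can help; the path is an assumption, so point data_root at wherever your collection output is actually written.

```python
import os

# Hypothetical output location; adjust to the directory your collection run writes to.
data_root = "dataset"

num_files = 0
total_bytes = 0
for dirpath, _, filenames in os.walk(data_root):
    for name in filenames:
        total_bytes += os.path.getsize(os.path.join(dirpath, name))
        num_files += 1

print(f"{num_files} files, {total_bytes / 1e9:.2f} GB under '{data_root}'")
```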
Launch offline training on the collected dataset:
python ego_vcp/scripts/train_wm.py
Training parameters are controlled via the configuration file ego_vcp/scripts/train_wm_config.yaml.
Dataset Configuration:
- Single-task training: specify the path to an individual task dataset:
  - Wall manipulation: dataset/dataset/wall
  - Ball blocking: dataset/dataset/ball
  - Tunnel traversal: dataset/dataset/tunnel
- Multi-task training: manually combine all three task datasets into a unified directory:
  - Combined dataset: dataset/dataset/all
Update the data.data_dir parameter in the config file to point to your desired dataset path.
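As a sketch of how the dataset path could be switched programmatically, assuming train_wm_config.yaml is plain YAML with a nested data.data_dir key as described above:

```python
import yaml  # PyYAML

config_path = "ego_vcp/scripts/train_wm_config.yaml"

with open(config_path) as f:
    cfg = yaml.safe_load(f)

# Point training at the combined multi-task dataset; single-task folders such as
# dataset/dataset/wall, dataset/dataset/ball, or dataset/dataset/tunnel also work.
cfg["data"]["data_dir"] = "dataset/dataset/all"

with open(config_path, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)
```

Round-tripping the file through PyYAML drops any comments it contains, so editing it by hand is usually just as convenient.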
Training for 20–25 epochs usually offers the best balance between convergence and overfitting.
Trained models are saved under wm_logs/.
Planning with the trained world model:
# Multi-task trained checkpoint
python ego_vcp/scripts/play_wm.py --task=g1_ball --enable_cameras --num_envs=1 --device=cpu --test --model_path='wm_logs/all/world_model.pt'
python ego_vcp/scripts/play_wm.py --task=g1_wall --enable_cameras --num_envs=1 --device=cpu --test --model_path='wm_logs/all/world_model.pt'
python ego_vcp/scripts/play_wm.py --task=g1_tunnel --enable_cameras --num_envs=1 --device=cpu --test --model_path='wm_logs/all/world_model.pt'
# Single-task trained checkpoints
python ego_vcp/scripts/play_wm.py --task=g1_ball --enable_cameras --num_envs=1 --device=cpu --test --model_path='wm_logs/ball/world_model.pt'
python ego_vcp/scripts/play_wm.py --task=g1_wall --enable_cameras --num_envs=1 --device=cpu --test --model_path='wm_logs/wall/world_model.pt'
python ego_vcp/scripts/play_wm.py --task=g1_tunnel --enable_cameras --num_envs=1 --device=cpu --test --model_path='wm_logs/tunnel/world_model.pt'
# Custom checkpoint (configured via play_wm_config.yaml)
python ego_vcp/scripts/play_wm.py --task=g1_wall --enable_cameras --num_envs=1 --device=cpu --test
Play parameters are controlled via the configuration file ego_vcp/scripts/play_wm_config.yaml.
In our real-world experiments, we use a yoga ball to block motion. By default, the ball is set to compliant contact, which makes it bounce more realistically. However, compliant contact can reduce the accuracy of contact sensing in Isaac Lab and affect reward calculations. Recommendation: For data collection and evaluation, switch the ball to rigid contact.
physics_material=sim_utils.RigidBodyMaterialCfg(
    static_friction=0.4,
    dynamic_friction=0.4,
    restitution=0.1,
    # Uncomment the lines below to restore the default compliant (bouncier) contact:
    # restitution_combine_mode="max",
    # compliant_contact_stiffness=1000,
    # compliant_contact_damping=2,
),
Our network retains some redundant components, such as reward prediction and Q_target, to allow fair ablation experiments, but they are not actually used during play.
We recommend playing and evaluating with checkpoints from 20-25 training epochs, or tuning latent_pretrain_epoch to pretrain the latents z_t and h_t before full training. The reason is that the data-collection distribution (uniform) differs substantially from the optimal distribution; under this distribution shift, val_dataset is also not a reliable indicator of value overfitting.
This project builds upon extensive prior work:

