
MVSAnywhere: Zero Shot Multi-View Stereo

A multi-view stereo depth estimation model which works anywhere, in any scene, with any range of depths

MVSAnywhere: Zero Shot Multi-View Stereo

Sergio Izquierdo, Mohamed Sayed, Michael Firman, Guillermo Garcia-Hernando, Daniyar Turmukhambetov, Javier Civera, Oisin Mac Aodha, Gabriel Brostow and Jamie Watson.

Paper, CVPR 2025 (arXiv pdf), Project Page

(Video: dron_mtb_both.mp4)

This code is for non-commercial use; please see the license file for terms. If you do find any part of this codebase helpful, please cite our paper using the BibTeX below and link this repo. Thanks!

⚙️ Setup

We are going to create a new Mamba environment called mvsanywhere. If you don't have Mamba, the first command below installs it; the next two create and activate the environment:

make install-mamba
make create-mamba-env
mamba activate mvsanywhere

In the code directory, install the repo as a pip package:

pip install -e .

To use our Gaussian splatting regularization, also install that module:

pip install -e src/regsplatfacto/

📦 Pretrained Models

We provide two variants of our model: mvsanywhere_hero.ckpt and mvsanywhere_dot.ckpt. mvsanywhere_hero is "Ours" from the main paper, and mvsanywhere_dot is the variant without the metadata MLP.

๐Ÿƒ Running out of the box!

We include two scans that you can try out immediately with the code. You can download these scans from here.

Steps:

  1. Download weights for the hero_model into the weights directory.
  2. Download the scans and unzip them to a directory of your choosing.
  3. You should be able to run it! Something like this will work (set --scan_name to house or living_room):
CUDA_VISIBLE_DEVICES=0 python src/mvsanywhere/run_demo.py \
    --name mvsanywhere \
    --output_base_path OUTPUT_PATH \
    --config_file configs/models/mvsanywhere_model.yaml \
    --load_weights_from_checkpoint weights/mvsanywhere_hero.ckpt \
    --data_config_file configs/data/vdr/vdr_dense.yaml \
    --scan_parent_directory /path/to/vdr/ \
    --scan_name house \
    --num_workers 8 \
    --batch_size 2 \
    --fast_cost_volume \
    --run_fusion \
    --depth_fuser custom_open3d \
    --fuse_color \
    --fusion_max_depth 3.5 \
    --fusion_resolution 0.02 \
    --extended_neg_truncation \
    --dump_depth_visualization

This will output meshes, quick depth visualizations, and scores benchmarked against LiDAR depth under OUTPUT_PATH.

If you run out of GPU memory, you can try removing the --fast_cost_volume flag.

Running on recordings from your own device!

🍏 iOS

How to use NeRF Capture to record videos
  1. Download the NeRF Capture app from the App Store. Capture a recording of your favourite environment and save it.

  2. Place your recordings in a directory with the following structure (a quick layout check is sketched after the run command below):

/path/to/recordings/
│-- recording_0/
│   │-- images/
│   │   │-- image_0.png
│   │   │-- image_1.png
│   │   ...
│   │-- transforms.json
│-- recording_1/
│   ...
  3. And run the model 🚀🚀🚀
python src/mvsanywhere/run_demo.py \
    --name mvsanywhere \
    --output_base_path OUTPUT_PATH \
    --config_file configs/models/mvsanywhere_model.yaml \
    --load_weights_from_checkpoint weights/mvsanywhere_hero.ckpt \
    --data_config_file configs/data/nerfstudio/nerfstudio_empty.yaml \
    --scan_parent_directory /path/to/recordings/ \
    --scan_name recording_0 \
    --fast_cost_volume \
    --num_workers 8 \
    --batch_size 2 \
    --image_height 480 \
    --image_width 640 \
    --dump_depth_visualization \
    --rotate_images # Only if you recorded in portrait
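
Before running, it can help to sanity-check that a recording matches the layout above (the same layout applies to the Android recordings in the next section). Below is a minimal sketch using only the Python standard library; this helper is ours and not part of the repo:

from pathlib import Path

def check_recording(recording_dir):
    # Verify the layout described above: an images/ folder plus transforms.json.
    root = Path(recording_dir)
    images = sorted((root / "images").glob("image_*.png"))
    transforms = root / "transforms.json"
    assert images, f"no images found under {root / 'images'}"
    assert transforms.is_file(), f"missing {transforms}"
    print(f"{root.name}: {len(images)} images, transforms.json present")

check_recording("/path/to/recordings/recording_0")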

📱 Android

Use ARCorder to record a video on Android with camera poses
  1. Download the ARCorder app from releases. This simple app relies on Android's ARCore system, so the accuracy of the computed poses might be limited. Capture a recording of your favourite environment and save it.

  2. Place your recordings in a directory with the following structure:

/path/to/recordings/
│-- recording_0/
│   │-- images/
│   │   │-- image_0.png
│   │   │-- image_1.png
│   │   ...
│   │-- transforms.json
│-- recording_1/
│   ...
  3. And run the model 🚀🚀🚀
python src/mvsanywhere/run_demo.py \
    --name mvsanywhere \
    --output_base_path OUTPUT_PATH \
    --config_file configs/models/mvsanywhere_model.yaml \
    --load_weights_from_checkpoint weights/mvsanywhere_hero.ckpt \
    --data_config_file configs/data/nerfstudio/nerfstudio_empty.yaml \
    --scan_parent_directory /path/to/recordings/ \
    --scan_name recording_0 \
    --fast_cost_volume \
    --num_workers 8 \
    --batch_size 2 \
    --image_height 480 \
    --image_width 640 \
    --dump_depth_visualization \
    --rotate_images # Only if you recorded in portrait

📷 Custom data

Use COLMAP to obtain a sparse reconstruction

If you already have a COLMAP reconstruction, skip to step 4.

  1. Install nerfstudio
  2. Install COLMAP using conda install -c conda-forge colmap.
  3. Process your video/sequence using
ns-process-data {images, video} --data {DATA_PATH} --output-dir {PROCESSED_DATA_DIR}
  4. Your reconstructions should have the following structure:
/path/to/reconstruction/
│-- reconstruction_0/
│   │-- images/
│   │   │-- image_0.png
│   │   │-- image_1.png
│   │   ...
│   │-- colmap/
│   │   │-- database.db
│   │   │-- sparse/
│   │   │   │-- 0/
│   │   │   │   │-- cameras.bin
│   │   │   │   │-- images.bin
│   │   │   │   ...
│   │   │   │-- 1/
│   │   │   │   ...
│-- reconstruction_1/
│   ...
  5. And run the model 🚀🚀🚀 (--scan_name takes the form reconstruction_name:n, where n is the COLMAP sparse model index; a sketch for listing the available indices follows the command):
python src/mvsanywhere/run_demo.py \
    --name mvsanywhere \
    --output_base_path OUTPUT_PATH \
    --config_file configs/models/mvsanywhere_model.yaml \
    --load_weights_from_checkpoint weights/mvsanywhere_hero.ckpt \
    --data_config_file configs/data/colmap/colmap_empty.yaml \
    --scan_parent_directory /path/to/reconstruction \
    --scan_name reconstruction_0:0 \
    --fast_cost_volume \
    --num_workers 8 \
    --batch_size 2 \
    --image_height 480 \
    --image_width 640 \
    --dump_depth_visualization
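
The n in reconstruction_name:n selects one of the COLMAP sparse models under colmap/sparse/. If you are unsure which indices exist, a small sketch like this (ours, not part of the repo) lists them:

from pathlib import Path

def sparse_model_indices(reconstruction_dir):
    # List the sparse model folders (0/, 1/, ...) produced by COLMAP.
    sparse = Path(reconstruction_dir) / "colmap" / "sparse"
    return sorted(p.name for p in sparse.iterdir() if p.is_dir())

print(sparse_model_indices("/path/to/reconstruction/reconstruction_0"))
# e.g. ['0', '1'] -> use --scan_name reconstruction_0:0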

Running Gaussian splatting with MVSAnywhere regularisation!

(Video: splats_reg.mp4)

We release regsplatfacto, code to run splatting using MVSAnywhere depths as regularisation. It is heavily inspired by techniques such as DN-Splatter and VCR-Gauss.

You can use any data in the nerfstudio format - e.g. existing nerfstudio data, or data from the 3 sources listed above.

If you are using data which has camera distortion, you will need to run our script scripts/data_scripts/undistort_nerfstudio_data.py:

python3 scripts/data_scripts/undistort_nerfstudio_data.py \
    --data-dir /path/to/input/scene \
    --output-dir /path/to/output/scene

Additionally, the NeRF Capture app saves frame metadata without file extensions. To run splatting on NeRF Capture data, you will first need to run our script scripts/data_scripts/fix_nerfcapture_filenames.py.

To train a splat, you can use

ns-train regsplatfacto \
    --data path/to/data \
    --experiment-name mvsanywhere-splatting \
    --pipeline.datamanager.load_weights_from_checkpoint path/to/model \
    --pipeline.model.use-skybox False

This will first run mvsanywhere inference and save outputs to disk, and then start training your splat.

Tips:

  • If your data was captured with a phone in portrait mode, you can append the flag --pipeline.datamanager.rotate_images True.
  • If your data contains a lot of sky, you can try adding a background skybox using --pipeline.model.use-skybox True.

Once you have a splat, you can extract a mesh using TSDF fusion:

ns-render-for-meshing \
    --load-config /path/to/splat/config \
    --rescale_to_world True \
    --output_path /path/to/render/outputs
ns-meshing \
    --renders-path /path/to/render/outputs \
    --max_depth 20.0  \
    --save-name mvsanywhere_mesh  \
    --voxel_size 0.04

If you are running on a scene reconstructed without metric scale (e.g. from COLMAP), you will need to adjust max_depth and voxel_size to values sensible for your scene's scale.
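
As a rough illustration (a heuristic of ours, not from the repo), you can scale the metric defaults by the ratio between the scene's extent in model units and your estimate of its real extent in metres:

# Heuristic sketch (our assumption, not repo tooling): rescale the metric
# defaults used above when the reconstruction is not in metres.
metric_voxel_size = 0.04   # metres, default from the ns-meshing command above
metric_max_depth = 20.0    # metres, default from the ns-meshing command above

scene_extent_model_units = 8.0    # e.g. bounding-box diagonal of the camera centres
scene_extent_metres_guess = 40.0  # your rough estimate of the real-world extent

scale = scene_extent_model_units / scene_extent_metres_guess
print(f"--voxel_size {metric_voxel_size * scale:.4f} --max_depth {metric_max_depth * scale:.2f}")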

Congratulations - you now have a splat and a mesh!

📊 Testing and Evaluation

Robust Multi-View Depth Benchmark (RMVD)

We use the Robust Multi-View Depth Benchmark to evaluate MVSAnywhere's depth estimation zero-shot across multiple datasets.

To evaluate MVSAnywhere on this benchmark, first clone the benchmark code:

git clone https://github.com/lmb-freiburg/robustmvd.git

Now, download and preprocess the evaluation datasets following this guide. You should download:

  • KITTI
  • ScanNet
  • ETH3D
  • DTU
  • Tanks and Temples

Don't forget to set the paths to these datasets in rmvd/data/paths.toml. Now you are ready to evaluate MVSAnywhere by running:

export PYTHONPATH="/path/to/robustmvd/:$PYTHONPATH"

python src/mvsanywhere/test_rmvd.py \
    --name mvsanywhere \
    --output_base_path OUTPUT_PATH \
    --config_file configs/models/mvsanywhere_model.yaml \
    --load_weights_from_checkpoint weights/mvsanywhere_hero.ckpt

🔨 Training

To train MVSAnywhere:

  1. Download all the required synthetic training datasets (and the validation dataset):

    Hypersim
    • Download following the instructions from here:
    python code/python/tools/dataset_download_images.py \
      --downloads_dir path/to/download \
      --decompress_dir /path/to/hypersim/raw
    • Update configs/data/hypersim/hypersim_default_train.yaml to point to the correct location.
    • Convert distances into planar depth using the provided script in this repo:
    python ./data_scripts/generate_hypersim_planar_depths.py \
        --data_config configs/data/hypersim/hypersim_default_train.yaml \
        --num_workers 8
    TartanAir
    • Download following the instructions from here:
    python download_training.py \
      --output-dir /path/to/tartan \
      --rgb \
      --depth \
      --seg \
      --only-left \
      --unzip
    • Update configs/data/tartanair/tartanair_default_train.yaml to point to the correct location.
    BlendedMVG
    • Download following the instructions from here.
    • You should download BlendedMVS, BlendedMVS+ and BlendedMVS++, all low-res. Place them all in the same folder.
    • Update configs/data/blendedmvg/blendedmvg_default_train.yaml to point to the correct location.
    MatrixCity
    • Download following the instructions from here.
    • You should download big_city, big_city_depth, big_city_depth_float32.
    • Update configs/data/matrix_city/matrix_city_default_train.yaml to point to the correct location.
    VKITTI2
    • Download following the instructions from here.
    • You should download rgb, depth, classSegmentation and textgt.
    • Update configs/data/vkitti/vkitti_default_train.yaml to point to the correct location.
    Dynamic Replica
    • Download following the instructions from here.
    • After downloading, you can delete unused data (segmentation, optical flow and pixel trajectories) to save disk space.
    • Update configs/data/dynamic_replica/dynamic_replica_default_train.yaml to point to the correct location.
    MVSSynth
    • Download following the instructions from here.
    • You should download the 960x540 version.
    • Update configs/data/mvssynth/mvssynth_default_train.yaml to point to the correct location.
    SAIL-VOS 3D
    • Download following the instructions from here.
    • You will need to contact the authors to download the data.
    • Buy Grand Theft Auto V.
    • (optional, recommended) Play Grand Theft Auto V and relax a little bit.
    • Update configs/data/sailvos3d/sailvos3d_default_train.yaml to point to the correct location.
    ScanNet v2 (Optional, val only)
    • Follow the instructions from here.
  2. Download Depth Anything v2 base weights from here.

  3. Now you can train the model using:

python src/mvsanywhere/train.py \
  --log_dir logs/ \
  --name mvsanywhere_training \
  --config_file configs/models/mvsanywhere_model.yaml \
  --data_config configs/data/hypersim/hypersim_default_train.yaml:configs/data/tartanair/tartanair_default_train.yaml:configs/data/blendedmvg/blendedmvg_default_train.yaml:configs/data/matrix_city/matrix_city_default_train.yaml:configs/data/vkitti/vkitti_default_train.yaml:configs/data/dynamic_replica/dynamic_replica_default_train.yaml:configs/data/mvssynth/mvssynth_default_train.yaml:configs/data/sailvos3d/sailvos3d_default_train.yaml \
  --val_data_config configs/data/scannet/scannet_default_val.yaml \
  --batch_size 6 \
  --val_batch_size 6 \
  --da_weights_path /path/to/depth_anything_v2_vitb.pth \
  --gpus 2
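
The --data_config argument is a single string of dataset configs joined with colons. If you find the long command hard to maintain, a small sketch like this (a convenience of ours, not repo tooling) builds the string from a list:

# Compose the colon-separated --data_config value from the training configs
# listed in the command above (convenience sketch only).
train_configs = [
    "configs/data/hypersim/hypersim_default_train.yaml",
    "configs/data/tartanair/tartanair_default_train.yaml",
    "configs/data/blendedmvg/blendedmvg_default_train.yaml",
    "configs/data/matrix_city/matrix_city_default_train.yaml",
    "configs/data/vkitti/vkitti_default_train.yaml",
    "configs/data/dynamic_replica/dynamic_replica_default_train.yaml",
    "configs/data/mvssynth/mvssynth_default_train.yaml",
    "configs/data/sailvos3d/sailvos3d_default_train.yaml",
]
print(":".join(train_configs))  # paste the output after --data_config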

📐🧮👩‍💻 Notation for Transformation Matrices

TL;DR: world_T_cam == world_from_cam
This repo uses the notation "cam_T_world" to denote a transformation from world to camera points (extrinsics). The intention is that the coordinate frame names match on either side of the variable when multiplying from right to left:

cam_points = cam_T_world @ world_points

world_T_cam denotes the camera pose (from camera to world coordinates). ref_T_src denotes a transformation from a source to a reference view.
Finally, this notation extends to rotations and translations, e.g. world_R_cam and world_t_cam.
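
As a concrete illustration of the convention, here is a minimal numpy sketch (4x4 homogeneous matrices assumed; the example values are ours):

import numpy as np

# Extrinsics: cam_T_world maps homogeneous world points into camera coordinates.
cam_T_world = np.eye(4)
cam_T_world[:3, 3] = [0.0, 0.0, -2.0]  # camera centre sits at (0, 0, 2) in world coords

world_point = np.array([1.0, 0.0, 0.0, 1.0])  # homogeneous world point
cam_point = cam_T_world @ world_point          # frame names match, reading right to left

# Camera pose: the inverse transform maps camera points back to world coordinates.
world_T_cam = np.linalg.inv(cam_T_world)

# Chaining reads the same way: ref_T_src = ref_T_world @ world_T_src,
# and rotations / translations follow the same naming:
world_R_cam = world_T_cam[:3, :3]
world_t_cam = world_T_cam[:3, 3]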

🗺️ World Coordinate System

This repo is geared towards ScanNet, so while its functionality should allow for any coordinate system (signaled via input flags), the model weights we provide assume a ScanNet coordinate system. This is important since we include ray information as part of metadata. Other datasets used with these weights should be transformed to the ScanNet system. The dataset classes we include will perform the appropriate transforms.

🙏 Acknowledgements

The tuple generation scripts make heavy use of a modified version of DeepVideoMVS's Keyframe buffer (thanks Arda and co!).

We'd like to thank the Niantic Raptor R&D infrastructure team - Saki Shinoda, Jakub Powierza, and Stanimir Vichev - for their valuable infrastructure support.

📜 BibTeX

If you find our work useful in your research, please consider citing our paper:

@inproceedings{izquierdo2025mvsanywhere,
  title={{MVSAnywhere}: Zero Shot Multi-View Stereo},
  author={Izquierdo, Sergio and Sayed, Mohamed and Firman, Michael and Garcia-Hernando, Guillermo and Turmukhambetov, Daniyar and Civera, Javier and Mac Aodha, Oisin and Brostow, Gabriel J. and Watson, Jamie},
  booktitle={CVPR},
  year={2025}
}

👩‍⚖️ License

Copyright © Niantic, Inc. 2024. Patent Pending. All rights reserved. Please see the license file for terms.
