A multi-view stereo depth estimation model which works anywhere, in any scene, with any range of depths
MVSAnywhere: Zero Shot Multi-View Stereo
Sergio Izquierdo, Mohamed Sayed, Michael Firman, Guillermo Garcia-Hernando, Daniyar Turmukhambetov, Javier Civera, Oisin Mac Aodha, Gabriel Brostow and Jamie Watson.
dron_mtb_both.mp4
This code is for non-commercial use; please see the license file for terms. If you do find any part of this codebase helpful, please cite our paper using the BibTeX below and link this repo. Thanks!
- MVSAnywhere: Zero Shot Multi-View Stereo
- Table of Contents
- Setup
- Pretrained Models
- Running out of the box!
- Running on recordings from your own device!
- Running Gaussian splatting with MVSAnywhere regularisation!
- Testing and Evaluation
- Training
- Notation for Transformation Matrices
- World Coordinate System
- Acknowledgements
- BibTeX
- License
We are going to create a new Mamba environment called mvsanywhere. If you don't have Mamba, you can install it with:
make install-mamba
Then create and activate the environment:
make create-mamba-env
mamba activate mvsanywhere
In the code directory, install the repo as a pip package:
pip install -e .
To use our Gaussian splatting regularisation, also install that module:
pip install -e src/regsplatfacto/
We provide two variants of our model: mvsanywhere_hero.ckpt and mvsanywhere_dot.ckpt. mvsanywhere_hero is "Ours" from the main paper, and mvsanywhere_dot is our model without the metadata MLP.
We've now included two scans for people to try out immediately with the code. You can download these scans from here.
Steps:
- Download weights for the hero_model into the weights directory.
- Download the scans and unzip them to a directory of your choosing.
- You should be able to run it! Something like this will work (set --scan_name to house or living_room):
CUDA_VISIBLE_DEVICES=0 python src/mvsanywhere/run_demo.py \
--name mvsanywhere \
--output_base_path OUTPUT_PATH \
--config_file configs/models/mvsanywhere_model.yaml \
--load_weights_from_checkpoint weights/mvsanywhere_hero.ckpt \
--data_config_file configs/data/vdr/vdr_dense.yaml \
--scan_parent_directory /path/to/vdr/ \
--scan_name house \
--num_workers 8 \
--batch_size 2 \
--fast_cost_volume \
--run_fusion \
--depth_fuser custom_open3d \
--fuse_color \
--fusion_max_depth 3.5 \
--fusion_resolution 0.02 \
--extended_neg_truncation \
--dump_depth_visualization
This will output meshes, quick depth visualizations, and scores benchmarked against LiDAR depth, all under OUTPUT_PATH.
If you run out of GPU memory, you can try removing the --fast_cost_volume flag.
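If you want a quick look at the fused mesh, you can open it with Open3D. This is just a sketch: the exact mesh filename and directory layout under OUTPUT_PATH depend on your run settings, so treat the path below as a placeholder.

```python
# Minimal sketch: view a fused mesh produced by the demo with Open3D.
# The path is a placeholder; check your OUTPUT_PATH for the actual mesh file.
import open3d as o3d

mesh = o3d.io.read_triangle_mesh("OUTPUT_PATH/meshes/house.ply")  # placeholder path
mesh.compute_vertex_normals()  # enables shaded rendering
o3d.visualization.draw_geometries([mesh])
```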
How to use NeRF Capture to record videos
-
Download the NeRF Capture app from the App Store. Capture a recording of your favourite environment and save it.
-
Place your recordings in a directory with the following structure:
/path/to/recordings/
├── recording_0/
│   ├── images/
│   │   ├── image_0.png
│   │   ├── image_1.png
│   │   └── ...
│   └── transforms.json
└── recording_1/
    └── ...
- And run the model!
python src/mvsanywhere/run_demo.py \
--name mvsanywhere \
--output_base_path OUTPUT_PATH \
--config_file configs/models/mvsanywhere_model.yaml \
--load_weights_from_checkpoint weights/mvsanywhere_hero.ckpt \
--data_config_file configs/data/nerfstudio/nerfstudio_empty.yaml \
--scan_parent_directory /path/to/recordings/ \
--scan_name recording_0 \
--fast_cost_volume \
--num_workers 8 \
--batch_size 2 \
--image_height 480 \
--image_width 640 \
--dump_depth_visualization \
--rotate_images # Only if you recorded in portrait

Use ARCorder to get a video on Android with camera poses
-
Download the ARCorder app from releases. This very simple app relies on Android's ARCore system, so the accuracy of the computed poses might be limited. Capture a recording of your favourite environment and save it.
-
Place your recordings in a directory with the following structure:
/path/to/recordings/
├── recording_0/
│   ├── images/
│   │   ├── image_0.png
│   │   ├── image_1.png
│   │   └── ...
│   └── transforms.json
└── recording_1/
    └── ...
- And run the model!
python src/mvsanywhere/run_demo.py \
--name mvsanywhere \
--output_base_path OUTPUT_PATH \
--config_file configs/models/mvsanywhere_model.yaml \
--load_weights_from_checkpoint weights/mvsanywhere_hero.ckpt \
--data_config_file configs/data/nerfstudio/nerfstudio_empty.yaml \
--scan_parent_directory /path/to/recordings/ \
--scan_name recording_0 \
--fast_cost_volume \
--num_workers 8 \
--batch_size 2 \
--image_height 480 \
--image_width 640 \
--dump_depth_visualization \
--rotate_images # Only if you recorded in portrait

Use COLMAP to obtain a sparse reconstruction
If you already have a COLMAP reconstruction, skip to step 4.
- Install nerfstudio
- Install COLMAP using conda install -c conda-forge colmap.
- Process your video/sequence using ns-process-data {images, video} --data {DATA_PATH} --output-dir {PROCESSED_DATA_DIR}
- Your reconstructions should have the following structure:
/path/to/reconstruction/
├── reconstruction_0/
│   ├── images/
│   │   ├── image_0.png
│   │   ├── image_1.png
│   │   └── ...
│   └── colmap/
│       ├── database.db
│       └── sparse/
│           ├── 0/
│           │   ├── cameras.bin
│           │   ├── images.bin
│           │   └── ...
│           └── 1/
│               └── ...
└── reconstruction_1/
    └── ...
- And run the model! Set --scan_name to reconstruction_name:n, where n is the index of the COLMAP sparse model.
python src/mvsanywhere/run_demo.py \
--name mvsanywhere \
--output_base_path OUTPUT_PATH \
--config_file configs/models/mvsanywhere_model.yaml \
--load_weights_from_checkpoint weights/mvsanywhere_hero.ckpt \
--data_config_file configs/data/colmap/colmap_empty.yaml \
--scan_parent_directory /path/to/reconstruction \
--scan_name reconstruction_0:0 \
--fast_cost_volume \
--num_workers 8 \
--batch_size 2 \
--image_height 480 \
--image_width 640 \
--dump_depth_visualization
splats_reg.mp4
We release regsplatfacto, code for running Gaussian splatting with MVSAnywhere depths as regularisation. This is heavily inspired by techniques such as DN-Splatter and VCR-Gauss.
You can use any data in the nerfstudio format - e.g. existing nerfstudio data, or data from the 3 sources listed above.
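For reference, a minimal transforms.json looks roughly like the sketch below. The field names follow the usual nerfstudio conventions, but the values here are placeholders; your capture tool will fill in the real intrinsics and poses.

```python
# Sketch of a minimal nerfstudio-style transforms.json (placeholder values).
import json

transforms = {
    "camera_model": "OPENCV",
    "fl_x": 500.0, "fl_y": 500.0,  # focal lengths in pixels
    "cx": 320.0, "cy": 240.0,      # principal point
    "w": 640, "h": 480,            # image resolution
    "frames": [
        {
            "file_path": "images/image_0.png",
            # 4x4 camera-to-world pose matrix
            "transform_matrix": [
                [1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0],
                [0.0, 0.0, 0.0, 1.0],
            ],
        }
    ],
}

with open("transforms.json", "w") as f:
    json.dump(transforms, f, indent=2)
```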
If you are using data which has camera distortion, you will need to run our script scripts/data_scripts/undistort_nerfstudio_data.py:
python3 scripts/data_scripts/undistort_nerfstudio_data.py \
--data-dir /path/to/input/scene \
--output-dir /path/to/output/scene
Additionally, the NeRF Capture app saves frame metadata without a file extension. To run splatting you will need to run our script scripts/data_scripts/fix_nerfcapture_filenames.py.
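The repo script handles this for you; as a rough illustration of the idea only (not the actual script), the fix amounts to something like the sketch below, which assumes the missing extensions are in the file_path entries of transforms.json and that the images are PNGs.

```python
# Illustration only (not the repo script): add a missing .png extension to
# file_path entries in a nerfstudio-style transforms.json.
import json
from pathlib import Path

scene = Path("/path/to/recordings/recording_0")  # placeholder path
meta_path = scene / "transforms.json"

with open(meta_path) as f:
    meta = json.load(f)

for frame in meta.get("frames", []):
    if not Path(frame["file_path"]).suffix:  # no extension recorded
        frame["file_path"] += ".png"         # assumes PNG images

with open(meta_path, "w") as f:
    json.dump(meta, f, indent=2)
```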
To train a splat, you can use
ns-train regsplatfacto \
--data path/to/data \
--experiment-name mvsanywhere-splatting \
--pipeline.datamanager.load_weights_from_checkpoint path/to/model \
--pipeline.model.use-skybox False
This will first run mvsanywhere inference and save outputs to disk, and then start training your splat.
Tips:
- If your data was captured with a phone in portrait mode, you can append the flag --pipeline.datamanager.rotate_images True.
- If your data contains a lot of sky, you can try adding a background skybox using --pipeline.model.use-skybox True.
Once you have a splat, you can extract a mesh using TSDF fusion:
ns-render-for-meshing \
--load-config /path/to/splat/config \
--rescale_to_world True \
--output_path /path/to/render/outputs
ns-meshing \
--renders-path /path/to/render/outputs \
--max_depth 20.0 \
--save-name mvsanywhere_mesh \
--voxel_size 0.04
If you are running on a scene reconstructed without metric scale (e.g. COLMAP), you will need to adjust max_depth and voxel_size to values sensible for your scene's scale.
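If you are unsure what scale your scene is at, one rough way to pick values is to look at the spread of camera positions in your transforms.json. The snippet below is an illustrative heuristic only; the path and the scaling factors are assumptions, not part of the repo.

```python
# Heuristic sketch: estimate scene scale from camera positions in transforms.json
# to pick rough max_depth / voxel_size values (illustrative assumptions only).
import json
import numpy as np

with open("/path/to/data/transforms.json") as f:  # placeholder path
    meta = json.load(f)

# Camera centres are the translation column of each camera-to-world matrix.
centres = np.array([np.array(fr["transform_matrix"])[:3, 3] for fr in meta["frames"]])
extent = (centres.max(axis=0) - centres.min(axis=0)).max()
print("camera extent:", extent)
print("e.g. try max_depth ~", 2 * extent, "and voxel_size ~", extent / 250)
```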
Congratulations - you now have a splat and a mesh!
We used the Robust Multi-View Depth Benchmark to evaluate MVSAnywhere's depth estimation in a zero-shot setting across multiple datasets.
To evaluate MVSAnywhere on this benchmark, first download the benchmark code onto your system:
git clone https://github.com/lmb-freiburg/robustmvd.git
Now, download and preprocess the evaluation datasets following this guide. You should download:
- KITTI
- Scannet
- ETH3D
- DTU
- Tanks and Temples
Don't forget to set the path to these datasets in rmvd/data/paths.toml. Now you are ready to evaluate MVSAnywhere by just running:
export PYTHONPATH="/path/to/robustmvd/:$PYTHONPATH"
python src/mvsanywhere/test_rmvd.py \
--name mvsanywhere \
--output_base_path OUTPUT_PATH \
--config_file configs/models/mvsanywhere_model.yaml \
--load_weights_from_checkpoint weights/mvsanywhere_hero.ckpt
To train MVSAnywhere:
-
Download all the required synthetic datasets (and val dataset):
Hypersim
- Download following the instructions from here:
python code/python/tools/dataset_download_images.py \
--downloads_dir path/to/download \
--decompress_dir /path/to/hypersim/raw
- Update configs/data/hypersim/hypersim_default_train.yaml to point to the correct location.
- Convert distances into planar depth using the provided script in this repo:
python ./data_scripts/generate_hypersim_planar_depths.py \
--data_config configs/data/hypersim_default_train.yaml \
--num_workers 8
TartanAir
- Download following the instructions from here:
python download_training.py \
--output-dir /path/to/tartan \
--rgb \
--depth \
--seg \
--only-left \
--unzip
- Update configs/data/tartanair/tartanair_default_train.yaml to point to the correct location.
BlendedMVG
- Download following the instructions from here.
- You should download BlendedMVS, BlendedMVS+ and BlendedMVS++, all low-res. Place them all in the same folder.
- Update configs/data/blendedmvg/blendedmvg_default_train.yaml to point to the correct location.
MatrixCity
- Download following the instructions from here.
- You should download big_city, big_city_depth and big_city_depth_float32.
- Update configs/data/matrix_city/matrix_city_default_train.yaml to point to the correct location.
VKITTI2
- Download following the instructions from here.
- You should download rgb, depth, classSegmentation and textgt.
- Update configs/data/vkitti/vkitti_default_train.yaml to point to the correct location.
Dynamic Replica
- Download following the instructions from here.
- After downloading, you can remove unused data to save disk space (segmentation, optical flow and pixel trajectories).
- Update configs/data/dynamic_replica/dynamic_replica_default_train.yaml to point to the correct location.
MVSSynth
- Download following the instructions from here.
- You should download the 960x540 version.
- Update configs/data/mvssynth/mvssynth_default_train.yaml to point to the correct location.
SAIL-VOS 3D
- Download following the instructions from here.
- You will need to contact the authors to download the data.
- Buy Grand Theft Auto V.
- (optional, recommended) Play Grand Theft Auto V and relax a little bit.
- Update configs/data/sailvos3d/sailvos3d_default_train.yaml to point to the correct location.
ScanNet v2 (Optional, val only)
- Follow the instructions from here.
-
Download Depth Anything v2 base weights from here.
-
Now you can train the model using:
python src/mvsanywhere/train.py \
--log_dir logs/ \
--name mvsanywhere_training \
--config_file configs/models/mvsanywhere_model.yaml \
--data_config configs/data/hypersim/hypersim_default_train.yaml:configs/data/tartanair/tartanair_default_train.yaml:configs/data/blendedmvg/blendedmvg_default_train.yaml:configs/data/matrix_city/matrix_city_default_train.yaml:configs/data/vkitti/vkitti_default_train.yaml:configs/data/dynamic_replica/dynamic_replica_default_train.yaml:configs/data/mvssynth/mvssynth_default_train.yaml:configs/data/sailvos3d/sailvos3d_default_train.yaml \
--val_data_config configs/data/scannet/scannet_default_val.yaml \
--batch_size 6 \
--val_batch_size 6 \
--da_weights_path /path/to/depth_anything_v2_vitb.pth \
--gpus 2
TL;DR: world_T_cam == world_from_cam
This repo uses the notation "cam_T_world" to denote a transformation from world to camera points (extrinsics). The intention is that the coordinate frame names match on either side of the variable when multiplying from right to left:
cam_points = cam_T_world @ world_points
world_T_cam denotes camera pose (from cam to world coords). ref_T_src denotes a transformation from a source to a reference view.
Finally, this notation also allows rotations and translations to be represented on their own, e.g. world_R_cam and world_t_cam.
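As a quick illustration of the convention, here is a standalone sketch with placeholder values (not code from this repo):

```python
# Illustration of the naming convention with 4x4 homogeneous transforms.
import numpy as np

world_T_cam = np.eye(4)                    # camera pose: camera -> world
world_T_cam[:3, 3] = [1.0, 0.0, 2.0]       # placeholder translation
cam_T_world = np.linalg.inv(world_T_cam)   # extrinsics: world -> camera

world_points = np.array([[0.0, 1.0, 3.0, 1.0]]).T   # homogeneous world point (4x1)
cam_points = cam_T_world @ world_points             # frame names match right to left

# Composition reads the same way, e.g. mapping source-view points into a reference view:
world_T_src = np.eye(4)                    # placeholder source camera pose
ref_T_src = cam_T_world @ world_T_src      # here the reference view is the camera above

# The same pattern covers re-expressing poses in another world convention:
# scannet_T_world @ world_T_cam would give the pose in a ScanNet-style world frame.
```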
This repo is geared towards ScanNet, so while its functionality should allow for any coordinate system (signaled via input flags), the model weights we provide assume a ScanNet coordinate system. This is important since we include ray information as part of metadata. Other datasets used with these weights should be transformed to the ScanNet system. The dataset classes we include will perform the appropriate transforms.
The tuple generation scripts make heavy use of a modified version of DeepVideoMVS's Keyframe buffer (thanks Arda and co!).
We'd like to thank the Niantic Raptor R&D infrastructure team - Saki Shinoda, Jakub Powierza, and Stanimir Vichev - for their valuable infrastructure support.
If you find our work useful in your research please consider citing our paper:
@inproceedings{izquierdo2025mvsanywhere,
title={{MVSAnywhere}: Zero Shot Multi-View Stereo},
author={Izquierdo, Sergio and Sayed, Mohamed and Firman, Michael and Garcia-Hernando, Guillermo and Turmukhambetov, Daniyar and Civera, Javier and Mac Aodha, Oisin and Brostow, Gabriel J. and Watson, Jamie},
booktitle={CVPR},
year={2025}
}
Copyright © Niantic, Inc. 2024. Patent Pending. All rights reserved. Please see the license file for terms.