GC-MVSNet: Multi-View, Multi-Scale, Geometrically-Consistent Multi-View Stereo

Vibhas K. Vats¹, Sripad Joshi¹, David J. Crandall¹, Md. Alimoor Reza², Soon-heung Jung³

¹Indiana University ²Drake University ³ETRI
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024

Abstract

Traditional multi-view stereo (MVS) methods rely heavily on photometric and geometric consistency constraints, but newer machine learning-based MVS methods check geometric consistency across multiple source views only as a post-processing step. In this paper, we present a novel approach that explicitly encourages geometric consistency of reference-view depth maps across multiple source views at different scales during learning (see Fig. 1). We find that adding this geometric consistency loss significantly accelerates learning by explicitly penalizing geometrically inconsistent pixels, reducing the number of training iterations required to nearly half that of other MVS methods. Our extensive experiments show that our approach achieves new state-of-the-art results on the DTU and BlendedMVS datasets and competitive results on the Tanks and Temples benchmark. To the best of our knowledge, GC-MVSNet is the first attempt to enforce multi-view, multi-scale geometric consistency during learning.
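To make "geometric consistency across multiple source views" concrete, the sketch below shows the forward-backward reprojection test commonly used in MVSNet-style depth-map filtering: a reference pixel is lifted to 3D with its predicted depth, projected into a source view, lifted again with the source view's own depth, and projected back; the pixel counts as consistent if it lands close to where it started at a similar depth. It is a minimal NumPy sketch under stated assumptions, not GC-MVSNet's implementation: pinhole intrinsics `K`, 4x4 world-to-camera extrinsics `E`, nearest-neighbour depth lookup, and the thresholds (1 px, 1% relative depth) are all assumptions. The function `penalized_pixel_loss` is a hypothetical weighting that only illustrates "penalizing geometrically inconsistent pixels"; it is not the paper's loss.

```python
import numpy as np


def _backproject(pix, depth, K, E):
    """Lift homogeneous pixels (3xN) with depths (N,) to world points (4xN).

    Assumes K is a 3x3 pinhole intrinsic matrix and E a 4x4 world-to-camera matrix.
    """
    cam = np.linalg.inv(K) @ pix * depth                 # 3xN points in camera frame
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])
    return np.linalg.inv(E) @ cam_h                      # camera -> world


def _project(world_h, K, E):
    """Project world points (4xN) into a camera; returns (u, v, depth)."""
    cam = (E @ world_h)[:3]
    z = cam[2]
    z_safe = np.where(np.abs(z) > 1e-8, z, 1.0)          # avoid division by zero
    uvw = K @ cam
    return uvw[0] / z_safe, uvw[1] / z_safe, z


def reproject_check(depth_ref, K_ref, E_ref, depth_src, K_src, E_src,
                    pix_thresh=1.0, rel_depth_thresh=0.01):
    """HxW boolean mask of reference pixels consistent with ONE source view."""
    h, w = depth_ref.shape
    hs, ws = depth_src.shape
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float64), np.arange(h, dtype=np.float64))
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    d_ref = depth_ref.ravel()

    # Reference pixel -> 3D point -> source image coordinates.
    u, v, z = _project(_backproject(pix, d_ref, K_ref, E_ref), K_src, E_src)
    ui, vi = np.round(u).astype(int), np.round(v).astype(int)
    valid = (d_ref > 0) & (z > 1e-8) & (ui >= 0) & (ui < ws) & (vi >= 0) & (vi < hs)

    # Nearest-neighbour lookup of the source depth (indices clipped so indexing is safe).
    d_src = depth_src[np.clip(vi, 0, hs - 1), np.clip(ui, 0, ws - 1)]
    valid &= d_src > 0

    # Source pixel + its own depth -> 3D point -> back into the reference image.
    pix_src = np.stack([u, v, np.ones_like(u)])
    u2, v2, z2 = _project(_backproject(pix_src, d_src, K_src, E_src), K_ref, E_ref)
    valid &= z2 > 1e-8

    dist = np.hypot(u2 - pix[0], v2 - pix[1])            # reprojection error in pixels
    rel = np.abs(z2 - d_ref) / np.where(d_ref > 0, d_ref, 1.0)
    mask = valid & (dist < pix_thresh) & (rel < rel_depth_thresh)
    return mask.reshape(h, w)


def consistency_count(depth_ref, K_ref, E_ref, sources, **kw):
    """Per-pixel number of source views (depth, K, E) that agree with the reference."""
    counts = np.zeros(depth_ref.shape, dtype=np.int64)
    for depth_src, K_src, E_src in sources:
        counts += reproject_check(depth_ref, K_ref, E_ref, depth_src, K_src, E_src, **kw)
    return counts


def penalized_pixel_loss(pixel_loss, counts, n_src, alpha=1.0):
    """HYPOTHETICAL weighting: up-weight pixels that agree with fewer source views."""
    inconsistent_frac = 1.0 - counts / n_src
    return float(np.mean(pixel_loss * (1.0 + alpha * inconsistent_frac)))
```

A multi-scale variant could, under the same assumptions, run `consistency_count` on each stage's lower-resolution depth map with the intrinsics rescaled to that resolution. Nearest-neighbour lookup keeps the sketch short; bilinear sampling of the source depth would give smoother masks.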

WACV 2024 presentation video.

Point cloud reconstruction of Scene 1 of DTU.

Point cloud reconstruction of Scene 4 of DTU.

Point cloud reconstruction of Scene 9 of DTU.

Point cloud reconstruction of Scene 118 of DTU.

Point cloud reconstruction of Scene 110 of DTU.

Point cloud reconstruction of Scene 23 of DTU.

Point cloud reconstruction of Scene 33 of DTU.

Point cloud reconstruction of Scene Horse of Tanks and Temples.

Point cloud reconstruction of Scene Family of Tanks and Temples.

Point cloud reconstruction of Scene Museum of Tanks and Temples.

Point cloud reconstruction of Scene Palace of Tanks and Temples.

Point cloud reconstruction of Scene Courtroom of Tanks and Temples.

Point cloud reconstruction of Scene M60 of Tanks and Temples.

Point cloud reconstruction of Scene Playground of Tanks and Temples.


Poster


BibTeX

@InProceedings{Vats_2024_WACV,
    author    = {Vats, Vibhas K and Joshi, Sripad and Crandall, David and Reza, Md. and Jung, Soon-heung},
    title     = {GC-MVSNet: Multi-View, Multi-Scale, Geometrically-Consistent Multi-View Stereo},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2024},
    pages     = {3242-3252}
}

@article{Vats2025gcmvsnet++,
    author    = {Vats, Vibhas Kumar and Reza, Md. Alimoor and Crandall, David J. and Jung, Soon-heung},
    title     = {Blending 3D Geometry and Machine Learning for Multi-View Stereopsis},
    journal   = {Neurocomputing},
    month     = {August},
    year      = {2025},
    note      = {Accepted}
}