Revisiting Depth Completion from a Stereo Matching Perspective for Cross-domain Generalization (3DV 2024)
This repository contains download links to our code and trained deep stereo models for our work "Revisiting Depth Completion from a Stereo Matching Perspective for Cross-domain Generalization", 3DV 2024
by Luca Bartolomei (1, 2), Matteo Poggi (1, 2), Andrea Conti (2), Fabio Tosi (2), and Stefano Mattoccia (1, 2)
(1) Advanced Research Center on Electronic System (ARCES), (2) University of Bologna
Note: This repository is currently under development. We are actively working to add and refine features and documentation. We apologize for any inconvenience caused by incomplete or missing elements and appreciate your patience as we work towards completion.
We would also like to point you to our previous work, "Active Pattern Without Pattern Projector", from which we took inspiration for this work.
This paper proposes a new framework for depth completion that is robust against domain-shift issues. It exploits the generalization capability of modern stereo networks to tackle depth completion by processing fictitious stereo pairs obtained through a virtual pattern projection paradigm. Any stereo network or traditional stereo matcher can be seamlessly plugged into our framework, allowing for the deployment of a virtual stereo setup that is future-proof against advancements in the stereo field.
Contributions:
- We cast depth completion as a virtual stereo correspondence problem, where two appropriately patterned virtual images enable us to tackle depth completion with robust stereo-matching algorithms or networks.
- Extensive experimental results with multiple datasets and networks demonstrate that our proposal vastly outperforms the state of the art in generalization capability.
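To make the "virtual stereo" idea above more concrete, here is a minimal, dependency-free sketch. It is not the code from this repository: it only assumes the standard stereo relation d = f * b / z and paints the same random intensity at corresponding pixels of two virtual views. The function names, the single-pixel pattern, and the values for `focal_px` and `baseline_m` are illustrative assumptions.

```python
import random


def depth_to_disparity(depth_m, focal_px, baseline_m):
    """Standard stereo relation: disparity (pixels) = focal * baseline / depth."""
    return focal_px * baseline_m / depth_m


def virtual_pattern_pair(left, right, hints, focal_px, baseline_m, seed=0):
    """Return patterned copies of two grayscale images (2D lists of ints).

    hints maps (row, col) in the left view to a depth in meters.
    For every hint, the same random value is written at (row, col) in the
    left copy and at (row, col - disparity) in the right copy, so a stereo
    matcher sees an unambiguous correspondence at that point.
    """
    rng = random.Random(seed)
    left_out = [row[:] for row in left]
    right_out = [row[:] for row in right]
    width = len(left[0])
    for (y, x), depth in hints.items():
        d = round(depth_to_disparity(depth, focal_px, baseline_m))
        xr = x - d  # matching column in the virtual right view
        if 0 <= xr < width:  # skip hints that fall outside the right view
            value = rng.randint(0, 255)
            left_out[y][x] = value
            right_out[y][xr] = value
    return left_out, right_out


# Toy example: an 8x2 flat image and two sparse depth hints (hypothetical values).
image = [[100] * 8 for _ in range(2)]
hints = {(0, 6): 2.0, (1, 5): 4.0}
left, right = virtual_pattern_pair(image, image, hints, focal_px=8.0, baseline_m=1.0)
```

In this toy setup the hint at depth 2.0 maps to a disparity of 4 pixels and the hint at depth 4.0 to 2 pixels, so closer points are shifted further between the two views. Patch-based patterns, blending, and occlusion handling are intentionally left out of the sketch.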
If you find this code useful in your research, please cite:
@inproceedings{bartolomei2024revisiting,
title={Revisiting depth completion from a stereo matching perspective for cross-domain generalization},
author={Bartolomei, Luca and Poggi, Matteo and Conti, Andrea and Tosi, Fabio and Mattoccia, Stefano},
booktitle={2024 International Conference on 3D Vision (3DV)},
pages={1360--1370},
year={2024},
organization={IEEE}
}
Here, you can download the weights of the RAFT-Stereo architecture.
- Vanilla Models: these models are pretrained on Sceneflow vanilla images and Middlebury vanilla images
- RAFT-Stereo vanilla models (raft-stereo/sceneflow-raftstereo.tar)
- Fine-tuned Models: starting from vanilla models, these models (sceneflow-*-raftstereo.tar) are fine-tuned on a specific real domain.
- Models trained from scratch: these models (*-raftstereo.tar) are trained from scratch using our framework
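The naming patterns listed above can be checked programmatically. The following is a small sketch, assuming that only the file name (not the folder) is used to tell the three groups apart; the helper name `checkpoint_kind` and the example file names in the usage lines are hypothetical.

```python
from fnmatch import fnmatch


def checkpoint_kind(filename):
    """Classify a checkpoint by the naming patterns listed above."""
    if filename == "sceneflow-raftstereo.tar":
        return "vanilla"
    # Must be tested before the generic pattern, which it also matches.
    if fnmatch(filename, "sceneflow-*-raftstereo.tar"):
        return "fine-tuned"
    if fnmatch(filename, "*-raftstereo.tar"):
        return "from-scratch"
    return "unknown"


# Hypothetical file names, used only to exercise the three patterns.
for name in ["sceneflow-raftstereo.tar",
             "sceneflow-example-raftstereo.tar",
             "example-raftstereo.tar"]:
    print(name, "->", checkpoint_kind(name))
```

The order of the checks matters because the fine-tuned pattern is a special case of the from-scratch pattern.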
To use these weights, please follow these steps:
- Install GDown python package:
pip install gdown
- Download all weights from our drive:
gdown --folder https://drive.google.com/drive/folders/1AZRHzCn7K7HiPQZocfxWplYHo3WhI8lm?usp=sharing
The Test section provides scripts to evaluate depth estimation models on datasets like VOID, NYU, DDAD and KITTIDC. It helps assess the accuracy of the models and saves predicted depth maps.
Please refer to each section for detailed instructions on setup and execution.
Warning:
- Please be aware that we will not be releasing the training code for deep models. The provided code focuses on evaluation and demonstration purposes only.
- With the latest updates in PyTorch, slight variations in the quantitative results compared to the numbers reported in the paper may occur.
Ensure that you have installed all the necessary dependencies.
The list of dependencies can be found in the ./requirements.txt file.
You can also follow this script to create a virtual environment and install all the dependencies:
$ conda create -n "vppdc" python
$ conda activate vppdc
$ python -m pip install -r requirements.txt
We used the following datasets for training and evaluation: NYUv2, VOID, KITTIDC and DDAD.
We used the preprocessed NYUv2 HDF5 dataset provided by Andrea Conti.
$ cd PATH_TO_DOWNLOAD
$ wget https://github.com/andreaconti/sparsity-agnostic-depth-completion/releases/download/v0.1.0/nyu_img_gt.h5
$ wget https://github.com/andreaconti/sparsity-agnostic-depth-completion/releases/download/v0.1.0/nyu_pred_with_500.h5
After that, you will get a data structure as follows:
nyudepthv2
├── nyu_img_gt.h5
└── nyu_pred_with_500.h5
Note that the original full NYUv2 dataset is available at the official website.
You can download the VOID dataset with different numbers of sparse points (i.e., 150, 500, 1500) using this script:
$ cd PATH_TO_DOWNLOAD
$ ./download_void.sh
After that, you will get a data structure as follows:
void
├── 150
│   ├── void_150
│   │   ├── data
│   │   │   ├── birthplace_of_internet
│   │   │   └── ...
│   │   ├── test_absolute_pose.txt
│   │   └── ...
├── 500
│   ├── void_500
│   │   ├── data
│   │   │   ├── birthplace_of_internet
│   │   │   └── ...
│   │   ├── test_absolute_pose.txt
│   │   └── ...
...
Note that the script deletes the zip files after extraction. The raw VOID dataset is available at the official website.
You can download the KITTIDC validation split from the official website. You can also download it directly:
$ cd PATH_TO_DOWNLOAD
$ mkdir kitti_dc
$ cd kitti_dc
$ wget https://s3.eu-central-1.amazonaws.com/avg-kitti/data_depth_selection.zip
$ unzip data_depth_selection.zip
$ rm data_depth_selection.zip
$ ln -s depth_selection data_depth_selection
After that, you will get a data structure as follows:
kitti_dc
├── val_selection_cropped
│   ├── groundtruth_depth
│   ├── image
│   ├── intrinsics
│   └── velodyne_raw
├── test_depth_completion_anonymous
└── test_depth_prediction_anonymous
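Before running an evaluation, it can help to verify that the extracted folders match the layout shown above. The sketch below is a generic check, not part of this repository; the helper name `missing_dirs` is hypothetical, and the demo runs on a temporary directory in which one folder is deliberately left out.

```python
import tempfile
from pathlib import Path

# Sub-folders of the validation split, as listed in the tree above.
EXPECTED = [
    "val_selection_cropped/groundtruth_depth",
    "val_selection_cropped/image",
    "val_selection_cropped/intrinsics",
    "val_selection_cropped/velodyne_raw",
]


def missing_dirs(root, expected=EXPECTED):
    """Return the expected sub-folders that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]


# Dry run on a temporary directory with the last folder left out on purpose.
with tempfile.TemporaryDirectory() as tmp:
    for rel in EXPECTED[:3]:
        (Path(tmp) / rel).mkdir(parents=True)
    missing = missing_dirs(tmp)

print(missing)
```

On a real download, pass the folder that directly contains val_selection_cropped; an empty list means that every expected folder was found.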
For DDAD, first install the Dataset Governance Policy (DGP) library following the official guide.
Then, you can download and extract the full dataset:
$ cd PATH_TO_DOWNLOAD
$ wget https://tri-ml-public.s3.amazonaws.com/github/DDAD/datasets/DDAD.tar
$ tar -xvf DDAD.tar
After that, you will get a data structure as follows:
ddad_train_val
├── 000000
├── 000001
│
...
│
├── ddad.json
└── LICENSE.md
Finally, you can use our script to generate the validation data from the front camera:
python convert_ddad.py -i /path/to/ddad_train_val -o /your/output/path/sampled_ddad [--seed]
After that, you will get a data structure as follows:
sampled_ddad
└── val
    ├── gt
    │   ├── 0000000000.png
    │   ...
    ├── hints
    │   ├── 0000000000.png
    │   ...
    ├── intrinsics
    │   ├── 0000000000.txt
    │   ...
    └── rgb
        ├── 0000000000.png
        ...
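The four sub-folders above share file stems, so one sample can be assembled by pairing files that have the same stem. Here is a minimal sketch of that pairing, assuming only the folder names and extensions shown in the tree; the helper names are hypothetical and the content of the files is not interpreted.

```python
from pathlib import Path


def sample_paths(val_dir, stem):
    """Build the four paths that belong to one sample, e.g. stem '0000000000'."""
    val_dir = Path(val_dir)
    return {
        "rgb": val_dir / "rgb" / f"{stem}.png",
        "gt": val_dir / "gt" / f"{stem}.png",
        "hints": val_dir / "hints" / f"{stem}.png",
        "intrinsics": val_dir / "intrinsics" / f"{stem}.txt",
    }


def list_samples(val_dir):
    """Return the samples under val_dir for which all four files exist."""
    samples = []
    for rgb in sorted((Path(val_dir) / "rgb").glob("*.png")):
        paths = sample_paths(val_dir, rgb.stem)
        if all(p.is_file() for p in paths.values()):
            samples.append(paths)
    return samples


example = sample_paths("sampled_ddad/val", "0000000000")
print(example["intrinsics"])
```

Calling `list_samples("/your/output/path/sampled_ddad/val")` would then return one dictionary per complete sample, sorted by file name.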
This code snippet allows you to evaluate the depth maps on various datasets, including KITTIDC, NYU, VOID and DDAD. By executing the provided script, you can assess the accuracy of depth completion models on these datasets.
To run the test.py script with the correct arguments, follow the instructions below:
- Run the test:
  - Open a terminal or command prompt.
  - Navigate to the directory containing the test.py script.
- Execute the command: Run the following command, replacing the placeholders with the actual values for your images and model:
export CUDA_VISIBLE_DEVICES=0
python test.py --datapath <path_to_dataset> --dataset <dataset_type> --model <model_name> \
--loadmodel <path_to_pretrained_model> --maxdisp 192 --outdir <output_directory> \
--wsize 5 --guideperc 1 --blending 1 --interpolate --filling --leftpadding --filterlidar \
--maskocc --iscale <input_image_scale>
Replace the placeholders (<path_to_dataset>, <dataset_type>, <model_name>, etc.) with the actual values for your setup.
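If you prefer to launch the evaluation from Python, the same command can be assembled as an argument list. This is only a sketch that mirrors the flags of the command above; the helper name `build_test_command` and the example paths are hypothetical, and nothing is executed here.

```python
import shlex


def build_test_command(datapath, dataset, model, loadmodel, outdir,
                       iscale=1, maxdisp=192, wsize=5, guideperc=1, blending=1):
    """Assemble the test.py invocation shown above as a list of arguments."""
    cmd = [
        "python", "test.py",
        "--datapath", str(datapath),
        "--dataset", dataset,
        "--model", model,
        "--loadmodel", str(loadmodel),
        "--maxdisp", str(maxdisp),
        "--outdir", str(outdir),
        "--wsize", str(wsize),
        "--guideperc", str(guideperc),
        "--blending", str(blending),
    ]
    # Boolean switches used in the command above.
    cmd += ["--interpolate", "--filling", "--leftpadding", "--filterlidar", "--maskocc"]
    cmd += ["--iscale", str(iscale)]
    return cmd


# Hypothetical paths; replace them with your own.
cmd = build_test_command(
    datapath="/data/kitti_dc",
    dataset="kittidcval",
    model="raft-stereo",
    loadmodel="weights/raft-stereo/sceneflow-raftstereo.tar",
    outdir="results/kittidc",
)
print(shlex.join(cmd))
```

To actually run it, the list could be passed to `subprocess.run(cmd, check=True)` from the directory containing test.py, with `CUDA_VISIBLE_DEVICES` set in the environment as in the shell version.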
The available arguments are:
- --maxdisp: Maximum disparity range for SGM (default 256).
- --model: Stereo model type. Options: raft-stereo, sgm
- --datapath: Specify the dataset path.
- --dataset: Specify the dataset type. Options: kittidcval, nyudepthv2, void, myddad
- --outdir: Output directory to save the disparity maps.
- --loadmodel: Path to the pretrained model file.
- --iscale: Rescale input images before applying VPP and stereo matching. The original size is restored before evaluation. Example: --iscale 1 equals full scale, --iscale 2 equals half scale.
- --guideperc: Simulate depth seeds using a certain percentage of randomly sampled GT points. Valid only if raw depth seeds do not exist.
- --uniform_color: Uniform patch strategy
- --wsize: Pattern patch size (e.g., 1, 3, 5, 7, ...)
- --blending: Alpha-blending between original images and virtual pattern
- --maskocc: Use occlusion handling
- --filterlidar: Filter depth hints (for DDAD and KITTIDC only)
- --filling: Use our proposed hints filling strategy
- --leftpadding: Add a left padding to handle left border occlusions
- --interpolate: Virtual projection splatting according to the sub-pixel value.
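Two of the options above are simple enough to illustrate numerically. The sketch below is not taken from test.py: it assumes that --iscale divides both image sides by the given factor (with rounding as an extra assumption) and that --blending is a weight in [0, 1] where 1 means that the pattern fully replaces the image value.

```python
def scaled_size(height, width, iscale):
    """Image size after rescaling: iscale 1 is full scale, iscale 2 is half scale."""
    return round(height / iscale), round(width / iscale)


def alpha_blend(image_value, pattern_value, alpha):
    """Blend one pixel of the original image with the virtual pattern.

    Assumed convention: alpha = 1 keeps only the pattern, alpha = 0 only the image.
    """
    return alpha * pattern_value + (1.0 - alpha) * image_value


half = scaled_size(480, 640, iscale=2)  # example frame size, chosen for illustration
mixed = alpha_blend(100, 200, alpha=0.5)
print(half, mixed)
```

Under these assumptions, a lower blending weight keeps more of the original image texture, while a weight of 1 makes the projected pattern dominate at the hint locations.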
For more details, please refer to the test.py script.
In this section, we present illustrative examples that demonstrate the effectiveness of our proposal.
Synth-to-real generalization. Given an NYU Depth V2 frame and 500 sparse depth points (a), our framework with RAFT-Stereo trained only on the Sceneflow synthetic dataset (e) outperforms the generalization capability of state-of-the-art depth completion networks NLSPN (b), SpAgNet (c), and CompletionFormer (d), all trained on the same synthetic dataset.
From indoor to outdoor. When models are pre-trained on SceneFlow, trained on indoor data, and then run outdoors, a significant domain shift occurs. NLSPN and CompletionFormer seem unable to generalize to outdoor data, while SpAgNet can produce some meaningful depth maps, yet far from accurate ones. Finally, VPP4DC can improve the results even further thanks to the pre-training process.
From outdoor to indoor. We consider the case complementary to the previous one, i.e., with models pre-trained on SceneFlow, trained outdoors, and then tested indoors. NLSPN, CompletionFormer and SpAgNet can predict a depth map that is reasonable to some extent. Our approach instead predicts very accurate results in regions covered by depth hints, yet fails where these are absent.
For questions, please send an email to luca.bartolomei5@unibo.it
We would like to extend our sincere appreciation to the authors of the following projects for making their code available, which we have utilized in our work:
- We would like to thank the authors of RAFT-Stereo for providing their code, which has been instrumental in our depth completion experiments.
- We deeply appreciate the authors of the competing research papers for providing their code and model weights, which greatly aided accurate comparisons.