SF-Loc: A Visual Mapping and Geo-Localization System based on Sparse Visual Structure Frames
[Paper]
An important typo: In the current version of the manuscript, there is a description ``to verify the effectiveness of the proposed SSS metric, we also implement a naive multi-frame VPR model by clustering the top-K candidates of multiple queries''. Here, "top-K" should be "top-10". Sorry for this typo.
SF-Loc is vision-centered mapping and localization system, based on the map representation of visual structure frames. We use multi-sensor dense bundle adjustment (MS-DBA) to generate the visual structure frames and sparsify them through co-visibility checking, leading to lightweight map storage. On this basis, multi-frame information is utilized to achieve high-recall, accurate user-side map-based localization.
- Mapping Pipeline (deadline: 2024/12)
- Localization Pipeline (deadline: 2025/01)
The pipeline of the work is based on python, and the computation part is mainly based on Pytorch (with CUDA) and GTSAM.
Use the following command to set up the python environment.
conda create -n sfloc python=3.10.11
conda activate sfloc
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install torch-scatter==2.0.9 -f https://data.pyg.org/whl/torch-1.11.0+cu113.html
pip install gdown tqdm numpy==1.25.0 numpy-quaternion==2022.4.3 opencv-python==4.7.0.72 scipy pyparsing matplotlib h5py
As for GTSAM, we make some modifications to it to extend the python wrapper APIs, clone it from the following repository and install it under your python environment.
git clone https://github.com/yuxuanzhou97/gtsam.git
cd gtsam
mkdir build
cd build
cmake .. -DGTSAM_BUILD_PYTHON=1 -DGTSAM_PYTHON_VERSION=3.10.11
make python-install
Finally, run the following command to compile the project.
git clone --recurse-submodules https://github.com/GREAT-WHU/SF-Loc.git
cd SF-Loc
python setup.py install
We use the weights of DROID-SLAM trained on TartanAir for optical flow estimation and DBA. Download droid.pth and put it in this project.
In the mapping phase, multi-sensor data are used for dense bundle adjustment (DBA) to recover image depths and poses. Based on the global optimization results, the lightweight structure frame map is generated.
1.1 Download the WHU1023 data sequence.
1.2 Specify the data paths in launch_dba.py, then run the following command
python launch_dba.py # This would trigger demo_vio_WHU1023.py automatically.
This will launch a task for online multi-sensor DBA. Generally, 1x real-time performance is expected on a 4080 laptop, which would cost around 90 minutes to process the provided data sequence. After finished, the following files would be generated.
- poses_realtime.txt IMU poses (both in world frame and ECEF frame) estimated by online multi-sensor DBA.
- graph.pkl Serialized GTSAM factors that store the multi-sensor DBA information.
- depth_video.pkl Dense depths estimated by DBA.
1.3 Run the following command for global factor optimization (post-processing). This wouldn't cost a long time.
python sf-loc/post_optimization.py --graph results/graph.pkl --result_file results/poses_post.txt
The following file would be generated.
- poses_post.txt Estimated IMU poses after global optimization.
1.4 Run the following command to sparsify the keyframe map.
python sf-loc/sparsify_map.py --imagedir $DATASET/image_undist/cam0 --imagestamp $DATASET/stamp.txt --depth_video results/depth_video.pkl --poses_post results/poses_post.txt --calib calib/1023.txt --map_indices results/map_indices.pkl
The following file would be generated.
- map_indices.pkl Map frame indices (and timestamps), indicating a subset of all DBA keyframes.
1.5 Run the following command to (finally) generate the lightweight structure frame map.
python sf-loc/generate_sf_map.py --imagedir $DATASET/image_undist/cam0 --imagestamp $DATASET/stamp.txt --depth_video results/depth_video.pkl --poses_post results/poses_post.txt --calib calib/1023.txt --map_indices results/map_indices.pkl --map_file sf_map.pkl
The following file would be generated.
- sf_map.pkl: The structure frame map, which is all you need for re-localization.
In this step, the scripts provided by VPR-methods-evaluation would be called. Thanks for Gmberton's great work, which provides convenient interface for different VPR methods.
🎇So far, a lightweight map file (≈ 50MB) of the region is generated. To evaluate the global pose estimation performance, run the following command
python scripts/evaluate_map_poses.py
TBD (DDL: 2025/01)
DBA-Fusion is developed by GREAT (GNSS+ REsearch, Application and Teaching) Group, School of Geodesy and Geomatics, Wuhan University.
This work is based on DROID-SLAM, VPR-methods-evaluation and GTSAM.