Achieving Speed-Accuracy Balance in Vision-based 3D Occupancy Prediction via Geometric-Semantic Disentanglement (AAAI 2025)

  • Authors: Yulin He, Wei Chen, Siqi Wang, Tianci Xun, Yusong Tan
  • Paper on arXiv

Framework

GSD-OCC is a fast and accurate vision-based 3D occupancy prediction method. It decouples the learning of geometry and semantics from two perspectives, model design and learning strategy:

  • For model design, we propose a dual-branch network that decouples the representation of geometry and semantics. The voxel branch utilizes a novel re-parameterized large-kernel 3D convolution to refine geometric structure efficiently, while the BEV branch employs temporal fusion and BEV encoding for efficient semantic learning.
  • For learning strategy, we propose to separate geometric learning from semantic learning by mixing up ground-truth and predicted depths.
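The two ideas above can be illustrated with a minimal NumPy sketch. It is not the repository's implementation. It assumes that "re-parameterized" follows the common structural re-parameterization pattern, in which a parallel small-kernel branch is folded into the large kernel after training, and that the depth mixup is a simple linear blend. The function names (`conv3d_same`, `merge_kernels`, `mix_depths`) and the `gt_ratio` parameter are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv3d_same(x, kernel):
    """Naive zero-padded 3D cross-correlation with an odd-sized cubic kernel."""
    pad = kernel.shape[0] // 2
    padded = np.pad(x, pad)  # zero-pad every axis so the output keeps x's shape
    windows = sliding_window_view(padded, kernel.shape)  # (D, H, W, k, k, k)
    return np.einsum("dhwijk,ijk->dhw", windows, kernel)


def merge_kernels(large, small):
    """Fold a small kernel into a large one by zero-padding it to the large size."""
    pad = (large.shape[0] - small.shape[0]) // 2
    return large + np.pad(small, pad)


def mix_depths(gt_depth, pred_depth, gt_ratio):
    """Linear blend of depths; gt_ratio=1.0 uses only ground truth (assumed form)."""
    return gt_ratio * gt_depth + (1.0 - gt_ratio) * pred_depth


rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6, 6))
large = rng.standard_normal((5, 5, 5))
small = rng.standard_normal((3, 3, 3))

# Training-time view: two parallel branches whose outputs are summed.
two_branch = conv3d_same(x, large) + conv3d_same(x, small)
# Inference-time view: a single convolution with the merged kernel.
one_branch = conv3d_same(x, merge_kernels(large, small))
assert np.allclose(two_branch, one_branch)

# Depth mixup on a toy 2x2 depth map (values in metres are made up).
gt = np.array([[10.0, 20.0], [30.0, 40.0]])
pred = np.array([[12.0, 18.0], [33.0, 35.0]])
mixed = mix_depths(gt, pred, gt_ratio=0.5)
```

Because convolution is linear in its kernel, the merged single-branch form gives the same output as the two-branch form while running only one convolution at inference time. How `gt_ratio` is scheduled during training is not stated here, so the sketch leaves it as a free parameter.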

Performance

Extensive experiments on the Occ3D-nuScenes benchmark demonstrate the superiority of our method, which achieves 39.4 mIoU at 20.0 FPS.
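For readers unfamiliar with the two metrics, the sketch below shows how a mean intersection-over-union score is typically computed over voxel labels, and how a frame rate translates into per-frame latency. It is a generic illustration, not the benchmark's official evaluation script: it ignores any visibility masking or class exclusions the benchmark may apply, and the function names are hypothetical.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes that appear in pred or target."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both arrays, so it is skipped
        inter = np.logical_and(pred_c, target_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else float("nan")


def latency_ms(fps):
    """Per-frame latency in milliseconds for a given frame rate."""
    return 1000.0 / fps


# Toy example with four voxels and two classes.
pred = np.array([0, 0, 1, 1])
target = np.array([0, 1, 1, 1])
score = mean_iou(pred, target, num_classes=2)  # class 0: 1/2, class 1: 2/3

# 20.0 FPS corresponds to a budget of 50 ms per frame.
budget = latency_ms(20.0)
```

Benchmark tables usually report mIoU as a percentage, so a score from this function would be multiplied by 100 before comparison.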

Visualization

Getting Started

Model Zoo

Acknowledgments

This work builds on several great open-source codebases, including FB-BEV, open-mmlab, Occ3D, COTR, UniRepLKNet, OpenOccupancy, and SoloFusion. Please consider citing these works as well.

Citation

If this work is helpful for your research, please consider citing the following BibTeX entry.

@article{he2024real,
  title={Real-Time 3D Occupancy Prediction via Geometric-Semantic Disentanglement},
  author={He, Yulin and Chen, Wei and Xun, Tianci and Tan, Yusong},
  journal={arXiv preprint arXiv:2407.13155},
  year={2024}
}