Achieving Speed Accuracy Balance in Vision based 3D Occupancy Prediction via Geometric Semantic Disentanglement (AAAI 2025)

Authors: Yulin He, Wei Chen, Siqi Wang, Tianci Xun, Yusong Tan
Paper on arXiv

Framework

GSD-OCC is a fast and accurate vision-based 3D occupancy prediction method that decouples the learning of geometry and semantics from two perspectives, model design and learning strategy:

For model design, we propose a dual-branch network that decouples the representation of geometry and semantics. The voxel branch utilizes a novel re-parameterized large-kernel 3D convolution to refine geometric structure efficiently, while the BEV branch employs temporal fusion and BEV encoding for efficient semantic learning.
For learning strategy, we propose to separate geometric learning from semantic learning by mixing ground-truth and predicted depths.
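
The two ideas above can be illustrated with a minimal, framework-free sketch. It is not the repository's implementation: it works in 1D rather than 3D, the function names (`merge_kernels`, `conv1d_same`, `mix_depth`) are hypothetical, and it assumes that depth is represented as a per-pixel distribution over discrete bins and that the ground-truth share `gt_ratio` is supplied by some training schedule not shown here. The first part shows why a parallel small-kernel branch can be folded into a large kernel without changing the output (convolution is linear); the second shows a plain convex blend of ground-truth and predicted depth.

```python
def pad_kernel(small, large_size):
    """Zero-pad an odd-length kernel so it sits centred in a larger odd-length one."""
    pad = (large_size - len(small)) // 2
    return [0.0] * pad + list(small) + [0.0] * pad


def merge_kernels(large, small):
    """Fold a parallel small-kernel branch into the large kernel.

    Illustrative 1D stand-in for structural re-parameterization:
    conv(x, large) + conv(x, small) == conv(x, large + padded(small)).
    """
    return [a + b for a, b in zip(large, pad_kernel(small, len(large)))]


def conv1d_same(signal, kernel):
    """Cross-correlation with zero padding; output length equals input length."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(padded[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(len(signal))
    ]


def mix_depth(gt_depth, pred_depth, gt_ratio):
    """Convex blend of ground-truth and predicted depth distributions.

    gt_ratio is assumed to lie in [0, 1]; how it changes during training
    is an assumption left to the caller.
    """
    return [gt_ratio * g + (1.0 - gt_ratio) * p
            for g, p in zip(gt_depth, pred_depth)]


if __name__ == "__main__":
    large = [0.1, -0.2, 0.3, 0.5, 0.3, -0.2, 0.1]   # 7-tap "large" kernel
    small = [0.25, 0.5, 0.25]                        # 3-tap parallel branch
    signal = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5]

    # Training-time form: two branches summed.
    two_branch = [a + b for a, b in zip(conv1d_same(signal, large),
                                        conv1d_same(signal, small))]
    # Inference-time form: one merged kernel.
    merged = conv1d_same(signal, merge_kernels(large, small))
    print(max(abs(a - b) for a, b in zip(two_branch, merged)) < 1e-9)

    gt = [0.0, 0.0, 1.0, 0.0]      # one-hot ground-truth depth bin
    pred = [0.1, 0.2, 0.3, 0.4]    # predicted depth distribution
    print(mix_depth(gt, pred, gt_ratio=0.75))
```

Because both inputs to `mix_depth` sum to one and the weights sum to one, the blended result is still a valid distribution, so it can be passed to the same view transformation as either input.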

Performance

Extensive experiments on the Occ3D-nuScenes benchmark demonstrate the effectiveness of our method, which achieves 39.4 mIoU at 20.0 FPS.

Visualization

Getting Started

Installation
Prepare Dataset
Training, Eval, Visualization

Model Zoo

Model weights

Acknowledgments

This work builds on several great open-source code bases, including FB-BEV, open-mmlab, Occ3D, COTR, UniRepLKNet, OpenOccupancy, and SoloFusion. Please consider citing these works as well.

Citation

If this work is helpful for your research, please consider citing the following entry.

@article{he2024real,
  title={Real-Time 3D Occupancy Prediction via Geometric-Semantic Disentanglement},
  author={He, Yulin and Chen, Wei and Xun, Tianci and Tan, Yusong},
  journal={arXiv preprint arXiv:2407.13155},
  year={2024}
}
