August 2021

tl;dr: Predict the uncertainty of the 3D height and transfer it to the uncertainty of depth.

Overall impression

The multi-task learning part is quite interesting, but the depth prediction part lacks clarity and insight (it reads more like a post-hoc experiment report).

The relationship between height and depth can indeed be mined, with one subtle difference: the projected 3D height $h_{3d}$ does not necessarily match the height of the 2D bbox $h_{2d}$.

Key ideas

  • Height-guided depth prediction
    • Uncertainty prediction of the 3D height $H_{3d}$. This transfers to the uncertainty of depth: with an accurate 2D height $h_{2d}$ and a fixed focal length $f$, depth $d = f H_{3d} / h_{2d}$ is proportional to $H_{3d}$. --> The height-guided depth prediction is similar to MonoFlex and GS3D.
    • Depth is not predicted directly; the network only predicts a bias term on top of the geometric estimate. --> This bias term is critical because the 2D bbox height $h_{2d}$ is not the same as the reprojected 3D height $h_{3d}$. Alternatively, the network can predict the reprojected $h_{3d}$ directly, as in RTM3D.
    • The ablation study verifies that using the height alone to infer depth (GeP) actually lowers accuracy, so the bias term (GeU) is needed.
    • In GS3D, this gap is handled by approximating the reprojected 3D height as 93% of the 2D bbox height.
  • UnC: the regressed depth uncertainty $\sigma_d$ is mapped to $[0, 1]$ via $p_{depth} = \exp(-\sigma_d)$ and used as 3D confidence during inference: $p_{3d} = p_{2d} \cdot p_{depth}$.
  • HTL (hierarchical task learning) strategy
    • HTL is motivated by the idea that each task should start training only after its pre-tasks have been trained well: a task is switched on only once its pre-tasks are learned well enough.
    • It addresses the instability of the early training phase, when the 2D/3D height estimates are still noisy and lead to poor depth predictions.
    • DF(t) measures the mean change trend of a task's loss over the last K epochs. --> Epochs, not iterations! Per-iteration losses would be too noisy.
    • ls (learning situation) score: $ls(t) = 1 - DF(t)/DF(K)$, where DF(K) is the trend over the first K epochs. If the loss still changes as fast as it did in the first K epochs, ls is close to 0; as the loss plateaus, ls approaches 1.
    • A task starts training only when all of its pre-tasks achieve a high ls.
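
The height-to-depth transfer (GeP, then GeU with the bias term) can be sketched as below. This is a minimal illustration, not the paper's code: it assumes a pinhole camera with focal length `focal` in pixels, treats each estimate as a (mean, sigma) pair, and, as a labeled assumption, combines the geometric and bias uncertainties in quadrature. The names `Laplace`, `height_to_depth` and `add_bias` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Laplace:
    """An estimate with mean `mu` and uncertainty `sigma` (hypothetical helper type)."""
    mu: float
    sigma: float


def height_to_depth(h3d: Laplace, h2d: float, focal: float) -> Laplace:
    """GeP: pinhole projection d = f * H_3d / h_2d.

    For a fixed 2D height and focal length the mapping is linear in H_3d,
    so the mean and the uncertainty both scale by the same factor f / h_2d.
    """
    scale = focal / h2d
    return Laplace(mu=scale * h3d.mu, sigma=scale * h3d.sigma)


def add_bias(depth_geo: Laplace, bias: Laplace) -> Laplace:
    """GeU: add a learned depth bias to the geometric depth.

    Assumption: the two uncertainties are combined in quadrature,
    sqrt(sigma_p^2 + sigma_b^2); this is an approximation, since the sum of
    two Laplace variables is not itself Laplace.
    """
    return Laplace(
        mu=depth_geo.mu + bias.mu,
        sigma=math.hypot(depth_geo.sigma, bias.sigma),
    )
```

For example, $H_{3d} = 1.5 \pm 0.1$ m, $h_{2d} = 50$ px and $f = 720$ px give a geometric depth of about 21.6 m with $\sigma \approx 1.44$ m: a 10 cm height error already becomes roughly 1.4 m of depth uncertainty, which shows how strongly height errors are amplified at range.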
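
The UnC scoring rule fits in a few lines. A hedged sketch, assuming $\sigma_d \ge 0$ so that the depth confidence lies in $(0, 1]$; the function names are hypothetical.

```python
import math


def depth_confidence(sigma_d: float) -> float:
    """p_depth = exp(-sigma_d): larger depth uncertainty gives lower confidence."""
    if sigma_d < 0:
        raise ValueError("sigma_d is assumed to be non-negative")
    return math.exp(-sigma_d)


def score_3d(p_2d: float, sigma_d: float) -> float:
    """p_3d = p_2d * p_depth, used to rank 3D detections at inference."""
    return p_2d * depth_confidence(sigma_d)
```

Two boxes with the same 2D score are thus re-ranked by how certain their depth is, e.g. `score_3d(0.9, 0.2)` outranks `score_3d(0.9, 1.5)`.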
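
A sketch of the HTL learning-situation score, under explicit assumptions that the notes above do not confirm: DF is taken as the mean absolute epoch-to-epoch change of the per-epoch loss, ls is clipped to $[0, 1]$, and a task's loss weight is the product of its pre-tasks' ls scores (so it only becomes large once all pre-tasks are learned). The helper names `loss_trend`, `learning_situation` and `task_weight` are hypothetical.

```python
from typing import Sequence


def loss_trend(losses: Sequence[float], t: int, k: int) -> float:
    """DF(t): mean absolute change of the per-epoch loss over the K epochs ending at t.

    `losses[i]` is the mean loss of epoch i (assumption: one value per epoch).
    """
    if k < 1 or t < k or t >= len(losses):
        raise ValueError("need 1 <= k <= t < len(losses)")
    diffs = [abs(losses[i] - losses[i - 1]) for i in range(t - k + 1, t + 1)]
    return sum(diffs) / k


def learning_situation(losses: Sequence[float], t: int, k: int) -> float:
    """ls(t) = 1 - DF(t) / DF(K), with DF(K) the trend over the first K epochs."""
    df_first = loss_trend(losses, k, k)
    df_now = loss_trend(losses, t, k)
    if df_first == 0.0:
        return 1.0  # assumption: a flat start means the task is already learned
    ls = 1.0 - df_now / df_first
    return min(max(ls, 0.0), 1.0)  # assumption: clip to [0, 1]


def task_weight(pre_task_ls: Sequence[float]) -> float:
    """Assumed gating: product of pre-task ls scores; tasks without pre-tasks get 1."""
    weight = 1.0
    for ls in pre_task_ls:
        weight *= ls
    return weight
```

With per-epoch losses `[10.0, 8.0, 6.0, 4.0, 3.9, 3.85, 3.84]` and K = 3, ls is 0 at epoch 3 (the loss is still dropping as fast as at the start) and above 0.9 at epoch 6, once it has plateaued.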

Technical details

  • The glossary in this paper is a bit confusing; here is what I think the terms mean.
    • GeP: infer depth directly from the reprojected height
    • GeU: GeP + depth bias
    • UnC: use GeU uncertainty to guide inference
  • Architecture: two-stage detection, with CoordConv to enhance the feature map. --> If only the local yaw is predicted, CoordConv is not actually needed.
  • A Laplacian prior is assumed so that the uncertainty loss has an L1 loss as its baseline; for an L2 baseline, a Gaussian prior should be used instead.
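
To make the L1/L2 baseline point concrete, below are the two negative log-likelihoods with constants dropped and `sigma` as the standard deviation. The Laplacian form $\sqrt{2}\,|y - \mu| / \sigma + \log\sigma$ is a common choice for uncertainty-aware depth losses; treat the exact parameterization here as an assumption rather than the paper's code. With `sigma` held fixed, the first reduces to a scaled L1 loss and the second to a scaled L2 loss.

```python
import math


def laplacian_uncertainty_loss(pred: float, target: float, sigma: float) -> float:
    """Laplace NLL (std = sigma, constants dropped): sqrt(2)/sigma * |y - mu| + log(sigma)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return math.sqrt(2.0) / sigma * abs(target - pred) + math.log(sigma)


def gaussian_uncertainty_loss(pred: float, target: float, sigma: float) -> float:
    """Gaussian NLL (constants dropped): (y - mu)^2 / (2 sigma^2) + log(sigma)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (target - pred) ** 2 / (2.0 * sigma ** 2) + math.log(sigma)
```

A larger `sigma` down-weights the residual term but pays a `log(sigma)` penalty, which is what lets the network report high uncertainty on hard samples without collapsing the loss. In practice networks often regress $\log\sigma$ instead of $\sigma$ for numerical stability.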

Notes

  • Questions and notes on how to improve/revise the current work