Bird's Eye View (BEV) is a popular representation for processing 3D point clouds, and by its nature it is fundamentally sparse. Motivated by the computational limitations of mobile robot platforms, we take a fast, high-performance BEV 3D object detector, PointPillars, and modify its backbone to maintain and exploit this input sparsity, leading to decreased runtimes. We present results on KITTI, a canonical 3D detection dataset, and on Matterport-Chair, a novel chair detection dataset derived from Matterport3D scenes of real furnished homes. We evaluate runtime characteristics on a desktop GPU, an embedded ML accelerator, and a robot CPU, demonstrating that our method yields significant runtime decreases (2X or more) on embedded systems with only a modest decrease in detection quality. Our work represents a new approach for practitioners to optimize models for embedded systems: maintain and exploit input sparsity throughout the entire pipeline to reduce runtime and resource usage while preserving detection performance. All models, their weights, their experimental configurations, and the training data used are publicly available from this webpage.
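To illustrate why exploiting BEV sparsity pays off, here is a minimal sketch (not the paper's code; the grid size and point counts are made up for illustration) showing that a point cloud projected into a BEV grid occupies only a small fraction of the cells, so a backbone that stores and processes only the occupied cells does far less work than one that sweeps the full dense grid:

```python
import numpy as np

# Hypothetical BEV grid dimensions and point count, for illustration only.
H, W = 200, 200
n_points = 500
rng = np.random.default_rng(0)
rows = rng.integers(0, H, n_points)
cols = rng.integers(0, W, n_points)

# Dense representation: an H x W occupancy map, mostly zeros.
dense = np.zeros((H, W), dtype=np.float32)
dense[rows, cols] = 1.0

# Sparse representation: only the unique occupied cell coordinates.
occupied = np.unique(np.stack([rows, cols], axis=1), axis=0)

dense_cells = H * W            # a dense backbone touches every cell
sparse_cells = len(occupied)   # a sparse backbone touches only these
print(f"dense: {dense_cells} cells, sparse: {sparse_cells} cells "
      f"({100 * sparse_cells / dense_cells:.1f}% occupied)")
```

Even in this toy setting, the sparse path processes orders of magnitude fewer cells than the dense one; real indoor and driving scenes show a similar imbalance, which is what the modified backbone exploits.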
[In Submission ICRA 2022 Paper PDF]
[SNN 2021 Workshop Paper PDF]
Matterport-Chair was generated using MatterportDataSampling, our utility for generating supervised object detection datasets from Matterport3D.