Development of AI Algorithms for Real-Time Environmental Perception in Autonomous Vehicles


Real-time environmental perception is critical for safe autonomous driving. State-of-the-art perception algorithms for autonomous vehicles take 3D LiDAR point clouds, camera images, and radar returns as sensor inputs. However, most existing real-time perception algorithms rely on multi-sensor fusion or multi-task networks and cannot fully exploit the complementarity of different sensors for different tasks, resulting in low efficiency and high cost. Worse still, developing such algorithms is becoming more challenging due to strict requirements on execution efficiency, complex real-world scenarios, and limited sensor hardware lifespan. To address these challenges, we developed a Multiple-input Coordination Neural Network (MC-NN) algorithm for real-time 3D environmental perception and dynamic object detection. MC-NN not only reduces multi-sensor fusion or a multi-task network to a single task (i.e., multiple-input coordination) but also simplifies the noise-robust preprocessing and clustering of 3D input data.
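The abstract does not disclose how the noise-robust 3D input preprocessing and clustering are implemented. As a hedged illustration only, the sketch below shows two generic point-cloud cleanup steps often used for this purpose (voxel-grid downsampling and statistical outlier removal); the function names, parameters, and the choice of these particular techniques are our assumptions, not the paper's actual method.

```python
import numpy as np

def voxel_downsample(points, voxel_size=0.2):
    """Quantize an (N, 3) point cloud to a voxel grid and keep one
    centroid per occupied voxel. A common way to bound the input size
    of downstream networks (assumed here, not specified by the paper)."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    n_voxels = inverse.max() + 1
    counts = np.bincount(inverse, minlength=n_voxels).astype(float)
    centroids = np.zeros((n_voxels, 3))
    for dim in range(3):
        centroids[:, dim] = (
            np.bincount(inverse, weights=points[:, dim], minlength=n_voxels) / counts
        )
    return centroids

def remove_outliers(points, k=8, std_ratio=2.0):
    """Statistical outlier removal: drop points whose mean distance to
    their k nearest neighbors is more than std_ratio standard deviations
    above the cloud-wide mean (a generic noise-robustness step)."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    dists.sort(axis=1)
    mean_knn = dists[:, 1:k + 1].mean(axis=1)  # index 0 is the self-distance
    thresh = mean_knn.mean() + std_ratio * mean_knn.std()
    return points[mean_knn <= thresh]
```

The brute-force k-NN here is O(N²) and only suitable for small clouds; a production pipeline would use a spatial index (e.g., a KD-tree) instead.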

Moreover, MC-NN adopts heterogeneous neural networks to handle the heterogeneous LiDAR-camera-radar input data. For the first time, we propose an anchor-based radar-to-camera object relationship that refines radar detections to initialize the camera object detection and association networks. This proposal provides a framework for other researchers to exploit the relationship between radar and camera. By accounting for the sample-level annotation budget of the sensor inputs and the demand for real-time detection, we design a sample-level fusion and algorithm-level decision strategy that maximizes the utilization of the limited annotation budget. In summary, the MC-NN algorithm is well suited to real-time environmental perception in autonomous vehicles owing to its high accuracy, robustness, and hardware execution efficiency. At the algorithm level, MC-NN achieves a better balance among model complexity, accuracy, and execution efficiency. Meanwhile, MC-NN reduces the difficulty of deploying sample-level annotation and fully exploits the complementarity of LiDAR, radar, and monocular cameras to improve the real-time performance of dynamic object detection. Following the successful deployment of MC-NN in RoboTaxi, the vehicle operates unmanned from sensor installation through real-world road tests, and the RoboTaxi fleet has accumulated hundreds of operating days and tens of thousands of pick-ups and drop-offs.
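The anchor-based radar-to-camera relationship is described only at a high level. One plausible realization, sketched below under our own assumptions, projects radar detections into the image plane with a pinhole camera model and seeds depth-scaled anchor boxes at the projected locations; the intrinsics `K`, the extrinsic transform `T_radar_to_cam`, `base_size`, and the inverse-depth anchor scaling are all hypothetical illustration choices, not details taken from the paper.

```python
import numpy as np

def project_radar_to_image(radar_points, K, T_radar_to_cam):
    """Project (N, 3) radar detections into the image plane via a
    pinhole model. K is a 3x3 intrinsic matrix; T_radar_to_cam is a
    4x4 homogeneous extrinsic transform (assumed calibrated)."""
    homo = np.hstack([radar_points, np.ones((radar_points.shape[0], 1))])
    cam = (T_radar_to_cam @ homo.T).T[:, :3]
    cam = cam[cam[:, 2] > 0]          # keep points in front of the camera
    pix = (K @ cam.T).T
    return pix[:, :2] / pix[:, 2:3], cam[:, 2]

def anchors_from_radar(centers, depths, base_size=200.0,
                       aspect_ratios=(0.5, 1.0, 2.0)):
    """Generate image-plane anchor boxes (x1, y1, x2, y2) centered on
    projected radar points, scaled inversely with depth so that nearer
    objects receive larger anchors (an assumed heuristic)."""
    boxes = []
    for (u, v), z in zip(centers, depths):
        size = base_size / max(z, 1.0)
        for ar in aspect_ratios:
            w, h = size * np.sqrt(ar), size / np.sqrt(ar)
            boxes.append([u - w / 2, v - h / 2, u + w / 2, v + h / 2])
    return np.array(boxes)
```

These radar-seeded anchors could then be passed to a camera detection head as initial proposals, replacing or supplementing a dense anchor grid; how MC-NN actually consumes them is not specified in the abstract.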

Large-scale statistical experiments on public datasets demonstrate that MC-NN achieves accuracy superior to state-of-the-art dynamic object detectors and multi-sensor fusion detectors under different evaluation metrics. Comparisons of algorithm structure features and visualizations of car detection results confirm the effectiveness of the multiple-input coordination design, the noise-robust 3D input preprocessing and clustering, the sample-level fusion strategy, the refinement of radar detections and initialization of the camera detection and association task, the sample-level decision strategy, and the improved GPU execution efficiency.
