Deep learning-based 3D object detection for autonomous driving: a comprehensive review

WU Yiquan (吴一全); CAI Jiaqi (蔡佳琦)

doi:10.11992/tis.202504021

Review | Updated: 2026-05-01

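- Original title (Chinese): 自动驾驶中深度学习的三维目标检测方法综述
- English title: Deep learning-based 3D object detection for autonomous driving: a comprehensive review
- Journal: CAAI Transactions on Intelligent Systems (智能系统学报), 2026, Vol. 21, No. 2, pp. 297-320
- Funding: National Natural Science Foundation of China (61573183).
- DOI: 10.11992/tis.202504021
  CLC number: TP391.41
- Published online: 2026-03-05; print publication: 2026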
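Citation: WU Yiquan, CAI Jiaqi. Deep learning-based 3D object detection for autonomous driving: a comprehensive review[J]. CAAI Transactions on Intelligent Systems, 2026, 21(2): 297-320. DOI: 10.11992/tis.202504021.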

Abstract

The rapid development of autonomous driving technology places increasingly stringent demands on the accuracy and real-time performance of vehicle perception systems. As a core component of these systems, 3D object detection is vital for ensuring driving safety and improving the driving experience. This review first divides 3D object detection algorithms into three categories according to the type of data acquired by the sensors: vision-based algorithms, subdivided into methods based on 2D features and methods based on 3D features; LiDAR point cloud algorithms, covering grid-based point clouds, raw point clouds, and hybrid point clouds; and multi-sensor algorithms, classified by whether the network fuses the modalities serially or in parallel. On this basis, the characteristics, contributions, and limitations of specific algorithms are summarized. Typical 3D object detection datasets and their evaluation metrics are then introduced, and the performance of representative algorithms on different datasets is compared. Finally, the challenges facing current techniques are analyzed, and future development directions are discussed.

Keywords
