
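1. School of Aeronautic Science and Engineering, Beihang University, Beijing 100083
2. Discipline and Technology Center for High-Speed Water-Air Coupling Dynamics of Aircraft, China Special Vehicle Research Institute, Jingmen 448000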
Received: 28 October 2025
Revised: 14 April 2026
Online First: 20 April 2026
ZHANG Nan, LIU Hu, TIAN Yongliang, et al. Reinforcement learning-based path planning for UAV maritime search[J]. Advances in Aeronautical Science and Engineering. (in Chinese)
Maritime emergency rescue is essential to the safety of maritime activities and is a key part of improving the current rescue system. Compared with traditional search and rescue, which relies mainly on manned helicopters and ships, unmanned aerial vehicles (UAVs) offer flexible deployment, low cost, and rapid response, and can serve as an important supplement to maritime rescue forces. However, because the marine environment is dynamic, the predicted position of a target in distress is uncertain, which makes efficient maritime search difficult. To address this, a reinforcement learning-based path planning method for maritime search is proposed. First, a UAV agent model and the state-action space of the maritime search task are constructed, and a reward function is designed that combines search probability with an exploration incentive. Second, an algorithmic framework is built on the Proximal Policy Optimization (PPO) algorithm, and the policy is trained through interaction between the agent and the environment. Finally, the algorithm is verified by simulation on a typical scenario, its key parameters are tuned, and it is compared with other path planning methods. The results show that the proposed method covers high-probability target areas first in the early stage of the search, which improves overall search efficiency and yields better search paths when the target position is uncertain.
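The abstract does not give the form of the reward function, so the following is only a minimal sketch of how a reward combining search probability with an exploration incentive might look on a gridded search area. The function name `step_reward`, the weights `w_prob` and `w_explore`, the detection probability `p_detect`, and the grid representation are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def step_reward(prob_map, visited, cell, w_prob=1.0, w_explore=0.1, p_detect=0.9):
    """Hypothetical per-step reward for a UAV that moves onto `cell` (row, col).

    prob_map: 2-D float array, probability that the target is in each cell.
    visited:  2-D bool array of the same shape, True for cells already searched.
    Both arrays are updated in place.
    """
    r, c = cell
    # Search-probability term: chance of detecting the target on this visit.
    detected = p_detect * prob_map[r, c]
    # Exploration term: a fixed bonus the first time a cell is searched.
    bonus = 0.0 if visited[r, c] else 1.0
    reward = w_prob * detected + w_explore * bonus
    # The mass that would have been detected is removed, so repeated visits
    # to the same cell earn progressively less.
    prob_map[r, c] -= detected
    visited[r, c] = True
    return reward


# Illustrative 2x2 probability map (made-up numbers).
prob_map = np.array([[0.1, 0.6], [0.2, 0.1]])
visited = np.zeros_like(prob_map, dtype=bool)
first = step_reward(prob_map, visited, (0, 1))
second = step_reward(prob_map, visited, (0, 1))
print(first, second)
```

With these made-up numbers the first visit to the 0.6 cell earns about 0.64 and an immediate revisit about 0.054, so under this sketch a policy trained on the reward is pushed toward high-probability cells early and toward unvisited cells afterwards. A fuller model would also renormalize the remaining probability and let it drift with the sea state.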