Reinforcement learning-based path planning for UAV maritime search

ZHANG Nan; LIU Hu; TIAN Yongliang; JIANG Jiawei; SHEN Beining; LU Xun; WANG Zhiyao

doi:10.16615/j.cnki.1674-8190.2026.03.19


Special Column of Aviation Emergency Rescue Technology | Updated: 2026-06-16

- Advances in Aeronautical Science and Engineering, Vol. 17, Issue 3, Pages: 188-199 (2026)
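- Author affiliations:
  1. School of Aeronautic Science and Engineering, Beihang University, Beijing 100191, China
  2. Discipline and Technology Center for High-Speed Water-Air Coupling Dynamics of Aircraft, China Special Vehicle Research Institute, Jingmen 448035, China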
- DOI: 10.16615/j.cnki.1674-8190.2026.03.19
  CLC: V279; TP18; U676.8⁺3
- Received: 28 October 2025, Revised: 14 April 2026, Published: 28 June 2026
张楠, 刘虎, 田永亮, 等. 基于强化学习的无人机海上搜寻路径规划[J]. 航空工程进展, 2026, 17(3): 188-199. DOI: 10.16615/j.cnki.1674-8190.2026.03.19.

ZHANG Nan, LIU Hu, TIAN Yongliang, et al. Reinforcement learning-based path planning for UAV maritime search[J]. Advances in Aeronautical Science and Engineering, 2026, 17(3): 188-199. (in Chinese) DOI: 10.16615/j.cnki.1674-8190.2026.03.19.

Abstract

Maritime emergency rescue is essential to the safety of activities at sea and to a complete rescue system. Compared with traditional search and rescue, which relies mainly on manned helicopters and ships, UAVs offer flexible deployment, low cost, and rapid response, making them an important supplement to maritime rescue forces. However, the dynamic marine environment makes the predicted position of a target in distress uncertain, which complicates efficient search. To address this, a reinforcement learning-based path planning method for maritime search is proposed. First, a UAV agent model and the state-action space of the maritime search task are constructed, and a reward function that accounts for both search probability and an exploration incentive is designed. Second, an algorithm architecture is built on the Proximal Policy Optimization (PPO) algorithm, and the policy is trained through interaction between the agent and the environment. Finally, the algorithm is validated by simulation on a typical scenario, key parameters are tuned, and the method is compared with other path planning methods. The results show that the proposed method covers high-probability target areas first in the early stage of the search, which improves overall search efficiency and yields better search paths when the target position is uncertain.
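To make the idea of a reward that weighs search probability against an exploration incentive more concrete, the following is a minimal sketch under stated assumptions; it is not the reward function of the paper, whose exact form is not given here. It assumes the search area is discretized into a grid whose cells hold the prior probability that the target is located there. The names `SearchGrid` and `step_reward`, the weights `w_prob` and `w_explore`, and the detection probability `p_detect` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Cell = Tuple[int, int]


@dataclass
class SearchGrid:
    """Hypothetical grid world: prob[r][c] is the probability that the
    target lies in cell (r, c); visited records cells already searched."""
    prob: List[List[float]]
    visited: Set[Cell] = field(default_factory=set)


def step_reward(grid: SearchGrid, cell: Cell, w_prob: float = 1.0,
                w_explore: float = 0.1, p_detect: float = 0.8) -> float:
    """Reward for moving the UAV onto `cell`.

    Assumed form (illustrative only, not from the paper):
        w_prob * (probability mass swept in the cell)
      + w_explore * (1 if the cell has not been visited before, else 0)
    """
    r, c = cell
    # Probability mass covered by this pass over the cell.
    collected = p_detect * grid.prob[r][c]
    # Exploration incentive: paid only on the first visit.
    bonus = 0.0 if cell in grid.visited else 1.0
    # Remove the searched mass so that revisiting the cell pays less.
    grid.prob[r][c] -= collected
    grid.visited.add(cell)
    return w_prob * collected + w_explore * bonus


if __name__ == "__main__":
    grid = SearchGrid([[0.1, 0.6], [0.2, 0.1]])
    print(step_reward(grid, (0, 1)))  # first pass over the high-probability cell
    print(step_reward(grid, (0, 1)))  # repeat pass over the same cell
```

With a reward of this shape, the first pass over a high-probability cell pays more than any repeat pass, which is one simple way a trained policy could be nudged toward covering high-probability areas early. In a PPO setup, a function like this would supply the per-step reward returned by the environment.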


