Maritime emergency rescue is vital to maritime safety and to the development of rescue systems. Compared with traditional search by manned helicopters and ships, unmanned aerial vehicles (UAVs) offer flexible deployment, low cost, and rapid response, making them a key supplement to maritime rescue. However, the dynamic marine environment makes the predicted location of a target in distress uncertain, which hinders efficient search. To address this, a reinforcement learning-based UAV search path planning method is proposed. First, a UAV agent model and its state-action space are constructed, and a reward function is designed that balances search probability against exploration incentives. Second, a Proximal Policy Optimization (PPO) framework is built, and the policy is trained through agent-environment interaction. Finally, simulations of a typical scenario are used to validate the method, tune key parameters, and compare it with other methods. The results show that the method searches high-probability target areas early, which improves overall search efficiency and yields better path planning than the compared methods when the target location is uncertain.
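As an illustration of the reward idea described in the abstract, the following is a minimal sketch of a per-step reward that combines a search-probability term with an exploration bonus on a gridded search area. The grid discretization, the four-direction action set, the function name `step_reward`, and the weights `w_prob` and `w_explore` are all assumptions made for this example; they are not taken from the paper, whose actual state-action space and reward formulation may differ.

```python
import numpy as np

# Hypothetical 4-connected action set (up, down, left, right); an assumption for illustration.
ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}


def step_reward(prob_map, visited, pos, action, w_prob=1.0, w_explore=0.1):
    """Return (reward, new_pos) for one UAV move on a probability grid.

    prob_map  : 2-D array, assumed probability that the target lies in each cell
    visited   : 2-D bool array of already searched cells, updated in place
    w_prob, w_explore : illustrative weights, not values from the paper
    """
    rows, cols = prob_map.shape
    dr, dc = ACTIONS[action]
    r = min(max(pos[0] + dr, 0), rows - 1)  # clamp so the UAV stays inside the search area
    c = min(max(pos[1] + dc, 0), cols - 1)
    first_visit = not visited[r, c]
    # Probability term: credit a cell's probability mass only the first time it is searched.
    reward = w_prob * prob_map[r, c] if first_visit else 0.0
    # Exploration term: a small bonus for entering a cell that has not been searched yet.
    if first_visit:
        reward += w_explore
    visited[r, c] = True
    return float(reward), (r, c)


# Toy 3x3 probability map with most of the mass in the centre cell.
prob_map = np.array([[0.05, 0.10, 0.05],
                     [0.10, 0.40, 0.10],
                     [0.05, 0.10, 0.05]])
visited = np.zeros_like(prob_map, dtype=bool)
visited[0, 0] = True  # the UAV starts in the top-left cell

reward_first, pos = step_reward(prob_map, visited, (0, 0), 3)  # move right into a new cell
reward_repeat, _ = step_reward(prob_map, visited, pos, 2)      # move back to the start cell
```

In this sketch, revisiting a searched cell earns nothing, so a policy trained against such a reward (for example with PPO) would be pushed toward unsearched cells with high probability mass first. The relative size of `w_prob` and `w_explore` controls how strongly the agent favours likely target areas over plain coverage.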