Joint optimization of transmit parameters of active sonar driven by multiagent reinforcement learning

Xueli SHENG; Mengfei MU; Yao BI; Yuan GAO; Bingyu SHI

doi:10.11990/jheu.202507001

Article | Updated: 2026-03-31

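- Joint optimization of transmit parameters of active sonar driven by multiagent reinforcement learning
- Journal of Harbin Engineering University, 2025, Vol. 46, No. 8, pp. 1557-1565
- Author affiliations:
  
  1. National Key Laboratory of Underwater Acoustic Technology, Harbin Engineering University, Harbin 150001, Heilongjiang, China
  2. Key Laboratory of Polar Ocean Acoustics and Technology Application (Harbin Engineering University), Ministry of Education, Harbin 150001, Heilongjiang, China
  3. College of Underwater Acoustic Engineering, Harbin Engineering University, Harbin 150001, Heilongjiang, China
  4. Sanya Nanhai Innovation and Development Base, Harbin Engineering University, Sanya 572024, Hainan, China
- Author biographies:
  
  Xueli SHENG, female, professor, doctoral supervisor
  Mengfei MU, male, doctoral candidate
- Funding:
  
  National Key Research and Development Program of China (2022YFC2807804)
- DOI: 10.11990/jheu.202507001
  Chinese Library Classification number: TB566
- Received: 2025-07-01
  
  Published online: 2025-07-07
  
  Published in print: 2025-08-05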

Xueli SHENG, Mengfei MU, Yao BI, et al. Joint optimization of transmit parameters of active sonar driven by multiagent reinforcement learning[J]. Journal of Harbin Engineering University, 2025, 46(8): 1557-1565. DOI: 10.11990/jheu.202507001.

Abstract

Traditional fixed transmission strategies in active sonar systems adapt poorly to the environment, which leads to poor detection stability in underwater acoustic channels. To address this issue, this paper proposes a joint optimization method for the active sonar transmit waveform and source level based on multiagent reinforcement learning. First, a multiagent collaborative learning approach is adopted to decouple waveform optimization and source level optimization into multiple agent tasks. Then, a reward-shaping method is introduced to suppress the reward signal noise induced by multipeak channel spectra, enhancing the optimization capability of the agents while avoiding subpulse frequency conflicts. Furthermore, a double deep Q-network is employed to reduce the agents' Q-value estimation bias and improve decision stability. Finally, numerical validation is conducted in a typical deep-sea channel scenario reconstructed from sound speed gradients measured in the South China Sea. The results show that the proposed algorithm outperforms the baseline methods in both channel adaptability and echo signal-to-noise ratio control accuracy, providing a viable technical approach for building intelligent active sonar systems with environmental self-adaptation capabilities.
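
The bias reduction attributed to the double deep Q-network can be illustrated with the generic Double DQN target from the reinforcement learning literature. The sketch below is not the paper's implementation: the function names, array shapes, and discount factor are assumptions made for illustration, and the abstract does not specify the agents' state, action, or reward definitions. The point it shows is that the online network selects the next action while the target network evaluates it, whereas standard DQN does both with one network.

```python
import numpy as np


def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double DQN bootstrap targets for a batch of transitions.

    rewards:       shape (B,), immediate rewards
    next_q_online: shape (B, A), online-network Q-values at the next state
    next_q_target: shape (B, A), target-network Q-values at the next state
    dones:         shape (B,), 1.0 where the episode ended, else 0.0
    gamma:         discount factor (value chosen here is an assumption)
    """
    rewards = np.asarray(rewards, dtype=float)
    next_q_online = np.asarray(next_q_online, dtype=float)
    next_q_target = np.asarray(next_q_target, dtype=float)
    dones = np.asarray(dones, dtype=float)

    # Action selection uses the online network.
    best_actions = np.argmax(next_q_online, axis=1)
    # Action evaluation uses the target network.
    evaluated = next_q_target[np.arange(best_actions.shape[0]), best_actions]
    # Terminal transitions do not bootstrap.
    return rewards + gamma * (1.0 - dones) * evaluated


def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """Standard DQN targets: one network both selects and evaluates."""
    rewards = np.asarray(rewards, dtype=float)
    next_q_target = np.asarray(next_q_target, dtype=float)
    dones = np.asarray(dones, dtype=float)
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)


# Hypothetical two-transition batch with two actions each.
example_rewards = [1.0, 0.0]
example_next_q_online = [[0.2, 0.9], [0.5, 0.1]]
example_next_q_target = [[1.0, 0.4], [0.3, 2.0]]
example_dones = [0.0, 1.0]

ddqn = double_dqn_targets(example_rewards, example_next_q_online,
                          example_next_q_target, example_dones, gamma=0.5)
dqn = dqn_targets(example_rewards, example_next_q_target,
                  example_dones, gamma=0.5)
```

In the first example transition the two networks disagree about the best next action, so the Double DQN target is lower than the standard DQN target. Taking the maximum over noisy estimates tends to inflate standard DQN targets, and decoupling selection from evaluation is the usual way to reduce that effect.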
