Traditional deep learning models struggle to capture long-range contextual correlations in input feature maps, as well as key feature information in the channel and spatial dimensions, resulting in high error rates and unsatisfactory performance in sound event localization and detection (SELD). Building on SELDnet, the baseline model of the acoustic scene classification and sound event detection challenge, this paper proposes a feature-enhanced sound event localization and detection network (FE-SELDnet). First, to address the failure of the activation function to backpropagate gradients, which leads to dying neurons, group normalization and the SiLU activation function are adopted. Second, the convolutional block attention module (CBAM) is introduced to capture salient features in both the channel and spatial dimensions of the acoustic features, suppressing superfluous features, improving the network's sensitivity and accuracy with respect to feature information, and improving information flow. Third, a Transformer module is introduced to capture longer-range speech context correlations and combine them with local features, improving the accuracy and robustness of the model on sound event detection and localization tasks. Experimental results on the TUT Sound Events dataset show that FE-SELDnet significantly outperforms the original baseline network: the error rate decreased from 0.45 to 0.326, the SED and DOA scores decreased from 0.45 and 0.32 to 0.26 and 0.25, respectively, and the F1 score increased to 79.4%. These results demonstrate the superiority of the proposed algorithm.
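To make the attention mechanism named above concrete, the following is a minimal NumPy sketch of CBAM-style channel and spatial attention with a SiLU activation. It is an illustrative toy under stated assumptions, not the authors' implementation: the weight shapes, the reduction ratio, and the use of SiLU in the bottleneck MLP are hypothetical choices here, and the learned 7x7 convolution of real CBAM is replaced by a fixed sigmoid over pooled maps.

```python
import numpy as np

def silu(x):
    # SiLU (swish): x * sigmoid(x); smooth, avoids hard-zeroed "dead" neurons
    return x / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    # x: (C, H, W). Pool over spatial dims, pass both pooled vectors
    # through a shared two-layer bottleneck MLP, then gate with a sigmoid.
    avg = x.mean(axis=(1, 2))                    # (C,)
    mx = x.max(axis=(1, 2))                     # (C,)
    def mlp(v):
        return w2 @ silu(w1 @ v)                 # SiLU bottleneck (assumption)
    att = 1.0 / (1.0 + np.exp(-(mlp(avg) + mlp(mx))))  # (C,), in (0, 1)
    return x * att[:, None, None]

def spatial_attention(x):
    # Channel-wise mean and max maps; a fixed sigmoid of their average
    # stands in for the learned 7x7 convolution of the real CBAM.
    avg = x.mean(axis=0)                         # (H, W)
    mx = x.max(axis=0)                          # (H, W)
    att = 1.0 / (1.0 + np.exp(-(avg + mx) / 2.0))  # (H, W), in (0, 1)
    return x * att[None, :, :]

def cbam(x, w1, w2):
    # CBAM order: channel attention first, then spatial attention.
    return spatial_attention(channel_attention(x, w1, w2))

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2                          # toy sizes, reduction ratio r
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1      # reduction layer weights
w2 = rng.standard_normal((C, C // r)) * 0.1      # expansion layer weights
y = cbam(x, w1, w2)
print(y.shape)                                   # (8, 4, 4): attention only rescales
```

Because both gates lie in (0, 1), the output keeps the input's shape and every feature is attenuated rather than amplified, which is the "suppress superfluous features" behavior the module is introduced for.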