1. College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
2. Guizhou Provincial Key Laboratory of Cryptography and Blockchain Technology, Guiyang 550025, China
3. College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China
Tian Youliang (1982- ), male, born in Panzhou, Guizhou. Ph.D., professor at Guizhou University. His research interests include game theory, cryptography, and security protocols.
Jin Kunlong (2000- ), male, born in Panzhou, Guizhou. Master's student at Guizhou University. His research interests include privacy preservation, federated learning, and backdoor attacks.
Shi Lujia (2000- ), female, born in Chongqing. Master's student at Guizhou University. Her research interests include artificial intelligence and model watermarking.
Wang Shuai (2000- ), male, born in Guiyang, Guizhou. Ph.D. candidate at Guizhou University. His research interests include privacy preservation, federated learning, and secure aggregation.
Zuo Jianshuo (2002- ), male, born in Xinxiang, Henan. Master's student at Guizhou University. His research interests include artificial intelligence and image forensics.
Xiang Axin (1996- ), male, born in Zunyi, Guizhou. Ph.D., associate professor at Guizhou University. His research interests include blockchain, cryptography, and key management.
Received: 2026-01-30
Revised: 2026-03-12
Accepted: 2026-03-13
Published in print: 2026-04-20
Tian Youliang, Jin Kunlong, Shi Lujia, et al. Backdoor detection and defense method via parameter-space targeted adversarial perturbations[J]. Journal on Communications, 2026, 47(04): 163-180. DOI: 10.11959/j.issn.1000-436x.2026076.
To address the reliance of existing backdoor defenses on salient and separable backdoor features, as well as their high trigger-inversion cost, a parameter-space targeted adversarial perturbation (PTAP) framework was proposed. For each candidate target class, PTAP solved in parameter space for the minimum perturbation magnitude required to reach a predefined attack success rate, and used this magnitude as the test statistic for backdoor anomaly detection, thereby avoiding costly trigger inversion and improving detection performance. Moreover, PTAP exploited the abnormally sensitive directions revealed by the parameter perturbations to guide lightweight fine-tuning, mitigating backdoor effects while largely preserving primary-task performance and enabling an integrated detection-and-repair pipeline for third-party model scenarios. Experiments on eleven backdoor attacks covering input-space, feature-space, and dynamic-trigger settings show that PTAP achieves over 99% detection confidence for backdoor targets, significantly reduces detection overhead, and maintains stable performance across diverse attack types.
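The anomaly-detection step described in the abstract can be sketched as follows: once a minimal parameter-perturbation magnitude has been computed for each candidate target class, a backdoor target stands out as the class whose magnitude is anomalously small. This sketch flags such classes with a MAD-based (median absolute deviation) robust outlier score; the function name, threshold, and scoring rule are illustrative assumptions, not the paper's exact procedure.

```python
import statistics

def flag_backdoor_targets(min_perturbation, threshold=3.5):
    """Flag candidate backdoor target classes by robust outlier scoring.

    min_perturbation: dict mapping class id -> minimal parameter-perturbation
    norm needed to reach the preset attack success rate for that class
    (an anomalously SMALL norm suggests a backdoor target).
    Returns a list of (class_id, score) pairs for flagged classes.
    """
    classes = list(min_perturbation)
    values = [min_perturbation[c] for c in classes]

    # Robust location and scale: median and median absolute deviation (MAD).
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # degenerate case: all magnitudes (nearly) identical

    flagged = []
    for c, v in zip(classes, values):
        # Hampel-style robust z-score; 0.6745 makes MAD consistent with the
        # standard deviation under a normal distribution. Only unusually
        # small magnitudes (strongly negative scores) indicate a backdoor.
        score = 0.6745 * (v - med) / mad
        if score < -threshold:
            flagged.append((c, score))
    return flagged
```

For example, if class 3 needs a perturbation norm of 0.08 while the other nine classes all need roughly 1.0, only class 3 is flagged. The MAD estimator is used here (rather than mean/standard deviation) because the backdoor class itself would otherwise inflate the scale estimate and mask the outlier.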