Cross-Modal Semantic Alignment-Based Method for Detecting Text-Image Consistency in Telecom Fraud

Updated: 21 April 2026

- Telecommunications Science (2026)
- CLC: TP393
- Received: 26 February 2026; Revised: 18 April 2026; Accepted: 20 April 2026
Cross-Modal Semantic Alignment-Based Method for Detecting Text-Image Consistency in Telecom Fraud[J/OL]. Telecommunications Science, 2026.

Abstract

With the advancement of generative AI, telecom fraud is becoming increasingly visual and composite. Fraudsters combine forged images, such as transfer screenshots and official documents, with misleading text to form "text-image contradiction" scams, posing significant challenges to traditional detection systems. To address this issue, this paper proposes a text-image consistency detection method based on deep cross-modal semantic alignment. The method first employs a fine-tuned YOLOv8 model to localize and extract key visual elements, such as logos and seals, from images, while simultaneously parsing key entities and fraudulent intents from the text. A pre-trained CLIP model then encodes the extracted visual regions and the textual information into a shared semantic space, yielding comparable feature vectors. Finally, the semantic contradiction between text and image is quantified by computing the cosine similarity between the visual and textual feature vectors and applying an adaptive dynamic threshold. Experiments on a self-constructed dataset of telecom fraud text-image pairs show that the proposed method achieves an F1-score of 92.7%, a 19.6% improvement over the feature-concatenation baseline. The method is also robust to common perturbations such as added noise and image cropping. This study provides an efficient and interpretable solution for the automated, high-precision detection of hybrid telecom fraud involving both text and images.
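The final scoring step described above (cosine similarity between visual and textual embeddings, followed by a threshold decision, evaluated by F1) can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how its adaptive dynamic threshold is computed, so the hypothetical `select_threshold` below simply picks the cut-off that maximizes F1 on labeled validation pairs. All function names are illustrative. No YOLOv8 detection or CLIP encoding is performed: random vectors stand in for the CLIP features of the detected regions and parsed text.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two (n, d) arrays of embeddings."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)


def predict(sims: np.ndarray, threshold: float) -> np.ndarray:
    """Flag a pair as inconsistent (1) when its similarity is at or below the threshold."""
    return (sims <= threshold).astype(int)


def f1_score(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """F1 for the positive class (1 = text-image contradiction)."""
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_threshold(sims: np.ndarray, labels: np.ndarray) -> tuple[float, float]:
    """Hypothetical stand-in for the paper's adaptive dynamic threshold:
    try every observed similarity as a cut-off and keep the one with the best F1."""
    best_t, best_f1 = float(sims.min()), -1.0
    for t in np.unique(sims):
        f1 = f1_score(labels, predict(sims, t))
        if f1 > best_f1:  # strict '>' keeps the lowest threshold on ties
            best_t, best_f1 = float(t), f1
    return best_t, best_f1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 200, 512  # d = 512 matches the CLIP ViT-B/32 embedding width
    labels = rng.integers(0, 2, size=n)  # toy labels: 1 = inconsistent pair
    img = rng.normal(size=(n, d))  # stand-in for CLIP image-region embeddings
    noise = rng.normal(size=(n, d))
    # Consistent pairs: the text vector is a noisy copy of the image vector.
    # Inconsistent pairs: the text vector is unrelated to the image vector.
    txt = np.where(labels[:, None] == 0, img + 0.5 * noise, noise)
    sims = cosine_similarity(img, txt)

    t, val_f1 = select_threshold(sims[:100], labels[:100])  # tune on the first half
    test_f1 = f1_score(labels[100:], predict(sims[100:], t))  # evaluate on the second half
    print(f"threshold={t:.3f}  val F1={val_f1:.3f}  test F1={test_f1:.3f}")
```

Treating low similarity as evidence of contradiction follows the abstract's framing. In a real system the embeddings would come from CLIP's image and text encoders, and the single global cut-off used here would be replaced by whatever adaptive scheme the paper actually specifies.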
