周旭莹, 周宇飞, 林宇俊, et al. Method for identifying bad number of harassment fraud based on integrated learning and social relationship[J]. 2021, 34(12): 27-33. DOI: 10.13992/j.cnki.tetas.2021.12.006.
Method for identifying bad number of harassment fraud based on integrated learning and social relationship
Abstract
Drawing on operators' network communication data and business data, this paper explores a data mining approach built on "multi-source data aggregation and multi-technology fusion". First, the XGBoost ensemble learning algorithm classifies phone numbers into three categories: harassment and fraud numbers, food delivery and courier numbers, and normal numbers. Social features between numbers are then constructed, and these social relationships are used to further improve the accuracy of identifying harassment and fraud numbers. In validation, the fusion model combining XGBoost ensemble learning with social relationships raised the accuracy of harassment and fraud number identification to 80%. The method can be applied broadly to the governance of new forms of harmful information, helping to combat new types of telecom and network crime and to safeguard public security.
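The abstract does not say which social features are built or how they are fused with the classifier output, so the following is only a minimal sketch under stated assumptions. It uses a hypothetical social feature (the share of a number's callees who ever call back) and a hypothetical fusion rule (keep a classifier candidate only if that share is low). The names `callback_ratio` and `refine_candidates`, the thresholds, and the toy data are all illustrative, and the `scores` dictionary stands in for the fraud-class probability that a trained three-class XGBoost model would supply.

```python
from collections import defaultdict


def callback_ratio(call_records):
    """Per caller: share of distinct callees who also called the caller back.

    Hypothetical social feature; the paper's actual features are not given.
    """
    callees = defaultdict(set)
    for caller, callee in call_records:
        callees[caller].add(callee)
    ratios = {}
    for caller, out in callees.items():
        # .get avoids creating new keys while iterating over the dict
        returned = sum(1 for other in out if caller in callees.get(other, ()))
        ratios[caller] = returned / len(out)
    return ratios


def refine_candidates(fraud_scores, ratios, score_min=0.5, ratio_max=0.2):
    """Keep classifier candidates whose callback ratio is low.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    return sorted(
        number
        for number, score in fraud_scores.items()
        if score >= score_min and ratios.get(number, 0.0) <= ratio_max
    )


# Toy data: "A" calls four numbers and nobody calls back; "B" and "C" call each other.
calls = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("B", "C"), ("C", "B")]
# Stand-in for the classifier's fraud-class probability per number (assumed values).
scores = {"A": 0.9, "B": 0.7, "C": 0.1}

ratios = callback_ratio(calls)
suspects = refine_candidates(scores, ratios)
print(suspects)
```

In this toy run, "B" passes the score threshold but is dropped because its only callee calls it back, which illustrates how a relationship feature could remove false positives left by the classifier alone.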