朱壮军, 王彬. Application of massive data extraction technology in IDC bad information monitoring system[J]. 2020, 33(11): 82-87. DOI: 10.13992/j.cnki.tetas.2020.11.014.
Application of Massive Data Extraction Technology in IDC Bad Information Monitoring System
Abstract
This paper describes how a telecom operator, while building its IDC bad information monitoring system, evaluated several data extraction technologies in order to process the massive volume of data generated every day. After repeated scheme evaluation and experimental comparison, the "Hadoop script + FTP" approach was selected. It greatly improved data extraction efficiency and enabled efficient collection and processing of massive data. This ensures that the IDC bad information monitoring system can detect and handle bad information hosted in the IDC in a timely manner, supports the healthy development of IDC business, and avoids negative impact on the country and society.