吉晶, 余凤丽. A novel feature mining method based on the characters of telecom data[J]. 2019, 32(3): 61-65. DOI: 10.13992/j.cnki.tetas.2019.03.017.
A novel feature mining method based on the characters of telecom data
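Abstract (translated from the Chinese)
Telecom data features are characterized by missing values, large differences in scale, and high internal correlation, so existing feature mining techniques are not fully applicable. This paper therefore designs a new feature mining method that starts from four dimensions (missing rate, standard deviation, correlation, and importance), fits a quantitative evaluation function for each indicator, and mines features by computing a weighted composite score. Finally, using real business data, the method is compared with random forest feature mining, the approach most commonly used in the industry. The comparison shows that the proposed method weighs features across more dimensions, yields more scientific and reasonable feature mining results, saves 25% of the time cost, and is highly practical.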
Abstract
Telecom data is characterized by missing values, large differences in feature scales, and high correlation among features, so current feature mining techniques cannot be fully applied to it. A novel feature mining method was therefore designed to deal with these problems. It consists of four indicators, namely the missing data rate, standard deviation, correlation, and importance. First, an evaluation function was fitted for each indicator. Then the comprehensive score of each feature was calculated using weights and used for feature mining. Finally, the method was compared on actual telecom data with Random Forest, one of the most commonly used feature mining algorithms. The comparison shows that the novel method obtains a more scientific and reasonable result because it measures features along more dimensions. It also saves 25% of the time cost and is highly practical.