New method for the auto-extraction of new words

李亚松; 王玉龙

doi:10.13992/j.cnki.tetas.2014.12.020

Updated: 2026-04-20

- Vol. 27, Issue 12, Pages: 83-86(2014)
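- Author affiliations:
  1. State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications
  2. EBUPT Information Technology Co., Ltd. (东信北邮信息技术有限公司)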
- DOI: 10.13992/j.cnki.tetas.2014.12.020
  CLC: TP391.1
- Published：2014
[1] 李亚松, 王玉龙. New method for the auto-extraction of new words (一种新词自动提取方法)[J]. Telecom Engineering Technics and Standardization (电信工程技术与标准化), 2014, 27(12): 83-86. DOI: 10.13992/j.cnki.tetas.2014.12.020.

Abstract

New words are emerging in web text corpora in large numbers, and this has become a widespread trend. Many of them are coined by netizens or arise from trending social topics. In addition, the text produced on social networks is full of colloquialisms, abbreviations, and casual expressions. All of this hampers the accuracy of Chinese word segmentation. This paper proposes a method for automatically extracting new words. It aims to extract new words from a given corpus accurately and quickly, to generate a domain-specific dictionary, and thereby to segment web text more accurately. The method first extracts candidate words from the corpus, then computes the support and confidence of each candidate, and finally selects new words by thresholding, which allows new words to be extracted accurately and quickly from massive text collections.
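The three steps named in the abstract (candidate extraction, support and confidence scoring, threshold filtering) can be sketched as follows. The abstract does not define support or confidence, so this sketch borrows the usual association-rule definitions as an assumption: support is the raw count of a candidate character n-gram, and confidence is the weaker of count(candidate)/count(prefix) and count(candidate)/count(suffix). The function names, the `known_words` dictionary, and the threshold values are hypothetical and are not taken from the paper. Input sentences are assumed to be already split on punctuation.

```python
from collections import Counter


def count_ngrams(sentences, max_len=4):
    """Count every character n-gram of length 1..max_len in the corpus."""
    counts = Counter()
    for sentence in sentences:
        for n in range(1, max_len + 1):
            for i in range(len(sentence) - n + 1):
                counts[sentence[i:i + n]] += 1
    return counts


def extract_new_words(sentences, known_words=frozenset(), max_len=4,
                      min_support=3, min_confidence=0.8):
    """Return {candidate: (support, confidence)} for candidates passing both thresholds."""
    counts = count_ngrams(sentences, max_len)
    new_words = {}
    for cand, support in counts.items():
        # Skip single characters, words already in the dictionary,
        # and candidates that occur too rarely.
        if len(cand) < 2 or cand in known_words or support < min_support:
            continue
        # Assumed confidence: the weaker of the two association-rule
        # confidences prefix -> candidate and suffix -> candidate.
        confidence = min(support / counts[cand[:-1]],
                         support / counts[cand[1:]])
        if confidence >= min_confidence:
            new_words[cand] = (support, confidence)
    return new_words


sentences = ["给力啊", "真给力", "很给力的", "他不给钱", "力气大"]
print(extract_new_words(sentences, min_support=3, min_confidence=0.7))
# → {'给力': (3, 0.75)}
```

In this toy corpus, "给力" occurs three times while "给" and "力" each occur four times, so its confidence is 3/4. Raising `min_confidence` to 0.8 or adding "给力" to `known_words` would filter it out.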
