New words are emerging in web text corpora in large numbers. Many are coined by netizens or arise from trending social topics, and the conversational corpora generated on social networking services (SNS) also contain many colloquial expressions and abbreviations. Together, these make word segmentation difficult. This paper proposes a new-word extraction method that aims to extract new words from a given corpus, build a dictionary from them, and thereby segment Chinese text more accurately. The method first extracts candidate words from the corpus, then calculates the support and confidence of each candidate and filters the candidates to select the new words, so that new words can be extracted accurately and rapidly from very large text collections.
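
The candidate-extraction and filtering pipeline described above can be illustrated with a minimal Python sketch. The abstract does not define support or confidence, so the definitions here are assumptions borrowed from association-rule mining: support is taken to be the raw frequency of a candidate character n-gram, and confidence is that frequency divided by the frequency of the candidate's more common (n-1)-character prefix or suffix. The function name `extract_new_words`, its parameters, and the threshold values are hypothetical and chosen only for illustration.

```python
from collections import Counter


def extract_new_words(texts, known_words, max_len=4,
                      min_support=3, min_confidence=0.8):
    """Return {candidate: (support, confidence)} for likely new words.

    ASSUMPTION: support = raw n-gram frequency; confidence = frequency
    relative to the more frequent of the candidate's prefix and suffix.
    These definitions are illustrative, not taken from the paper.
    """
    # Step 1: count every character n-gram up to max_len as a candidate.
    counts = Counter()
    for text in texts:
        for n in range(1, max_len + 1):
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1

    # Step 2: score each candidate and keep those passing both thresholds.
    new_words = {}
    for cand, support in counts.items():
        if len(cand) < 2 or not cand.isalnum():
            continue  # skip single characters and spans with punctuation
        if cand in known_words or support < min_support:
            continue  # skip dictionary words and rare strings
        # A prefix or suffix that seldom occurs outside the candidate
        # suggests the characters bind together as one word.
        confidence = support / max(counts[cand[:-1]], counts[cand[1:]])
        if confidence >= min_confidence:
            new_words[cand] = (support, confidence)
    return new_words


corpus = ["我喜欢给力的电影", "这个真给力", "给力啊", "太给力了"]
dictionary = {"喜欢", "电影"}
print(extract_new_words(corpus, dictionary))
```

On this toy corpus the only candidate that survives both thresholds is "给力", which occurs four times and whose two characters never occur apart. The surviving candidates could then be added to the segmenter's dictionary; in a real corpus the thresholds would need tuning.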