Computer Engineering and Applications (计算机工程与应用), Issue (2): 10-14, 43, 6. DOI: 10.3778/j.issn.1002-8331.1307-0121
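A comparative study of semantic and syntactic networks as knowledge sources for genre classification (语义、句法网络作为语体分类知识源的对比研究)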
Comparison study of using semantic and syntactic network characteristics to do text clustering
Abstract
This study builds six dependency syntactic networks and semantic networks from syntactic and semantic treebanks of different genres and compares the overall features of the networks: the number of edges, the number of nodes, the average degree, the clustering coefficient, the average path length, the centralization, the diameter, the power-law exponent, and the coefficient of determination. Several clustering methods are then applied to these networks, with the features as variables. The results show that, although the syntactic and semantic networks all follow linguistic principles, there are clear differences between them: the network parameters carry different meanings, and the clustering results based on them differ. Combinations of the main semantic network parameters yield relatively reasonable clustering results but cannot distinguish written style from colloquial style well, whereas combinations of the main syntactic network parameters distinguish different styles of text well and yield reasonable text clustering results.
Keywords
genre / text clustering / network features
Classification
Information technology and security science
Cite this article
陈芯莹, 刘海涛. 语义、句法网络作为语体分类知识源的对比研究[J]. 计算机工程与应用, 2014, (2): 10-14, 43, 6.
Funding
Major Project of the National Social Science Fund of China (No. 11&ZD188).