Journal of Southeast University (Natural Science Edition), 2017, Vol. 47, Issue 4: 660-666, 7. DOI: 10.3969/j.issn.1001-0505.2017.04.006
Data-driven method for fine-grained property alignment between Chinese open datasets (数据驱动的细粒度中文属性对齐方法)
Abstract
To improve the performance of property alignment between heterogeneous Chinese open datasets, a data-driven method for fine-grained alignment is proposed. It exploits the extension and domain information of properties to find equivalence, subsumption and relevance relations between properties in a unified way. First, the data types of properties are determined using statistical theory, and a type-aware metric is given to calculate the similarity of properties. Based on that, property relation recognition is modeled as a multi-class classification problem, and effective features are generated to represent the different property relationships and to train a random forest classifier. The experimental results show that the proposed method reaches a precision of 94.6% in determining the data types of properties, and that the final F1 measures in recognizing equivalent, subsumptive and relevant properties are 71.3%, 57.3% and 59.9%, respectively. Compared with traditional approaches that focus only on equivalent properties, the fine-grained property alignment method improves the precision in recognizing equivalent properties and also recognizes subsumptive and relevant properties, which demonstrates its effectiveness on Chinese open datasets.
Keywords
Chinese property alignment/property data type determination/similarity of properties/heterogeneous data integration/construction of knowledge graphs
Classification
Information technology and security science
Cite this article
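黄廷磊, 张伟莉, 梁霄, 付琨. Data-driven method for fine-grained property alignment between Chinese open datasets [J]. Journal of Southeast University (Natural Science Edition), 2017, 47(4): 660-666, 7.
Funding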
Supported by the National High Technology Research and Development Program of China (863 Program) (No. 2012AA011005).