
中医方剂数据库文本挖掘数据预处理的尝试

Chinese Journal of Library and Information Science for Traditional Chinese Medicine (中国中医药图书情报杂志), 2015, Issue (3): 8-11, 4. DOI: 10.3969/j.issn.2095-5707.2015.03.003

An Attempt on Data Preprocessing for Text Mining in TCM Prescription Database

Wu Lei (吴磊) ¹, Li Shu (李舒) ²

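Author information

  • 1. School of Information Engineering, Liaoning University of Traditional Chinese Medicine, Shenyang, Liaoning 110847
  • 2. Department of Medical Informatics, China Medical University, Shenyang, Liaoning 110001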

Abstract

Objective: To propose a data preprocessing method based on data cleaning for TCM prescription databases, so that the data become more standardized, accurate, and orderly, and easier to handle in follow-up processing. Methods: The source text data were retrieved from prescription databases using bibliographic search techniques. Non-normalized data were then processed in sequence by auxiliary word group line processing, regular expression substitution, and synonym processing in order to improve data quality. Results: A total of 1758 valid records were retrieved from the TCM prescription database and 91 records from the prescription modern application database. After preprocessing, 6913 valid Chinese herbal medicine entries were obtained. These could be imported into the relevant information mining system, and prescription and herb names could be extracted from them. Conclusion: The method is applicable to text mining and knowledge discovery in TCM prescription databases. It cleans the source text data, yields noise-free data in a unified standard, and supports effective extraction of prescription information, and it can serve as a reference for research on the analysis and mining of TCM prescription text data.
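The abstract names regular expression substitution and synonym processing as cleaning steps but does not give the actual rules. The following is a minimal Python sketch of what such steps could look like on one prescription composition string. The patterns, the small synonym table, and the function name `clean_composition` are illustrative assumptions, not the authors' implementation.

```python
import re
from typing import List

# Hypothetical synonym table (alias -> canonical herb name), for illustration only.
SYNONYMS = {
    "国老": "甘草",
    "川军": "大黄",
    "元胡": "延胡索",
    "双花": "金银花",
}

# Assumed noise patterns; a real database would need its own rules.
DOSAGE = re.compile(r"\d+(?:\.\d+)?\s*(?:g|克|钱|两)")   # e.g. "9g", "1.5 克"
NOTE = re.compile(r"[((][^))]*[))]")                 # bracketed processing notes
SEPARATORS = re.compile(r"[、,,;;\s]+")               # mixed list delimiters


def clean_composition(raw: str) -> List[str]:
    """Return canonical herb names from one raw composition string."""
    # Step 1: regular expression substitution removes dosages and notes.
    text = DOSAGE.sub(" ", raw)
    text = NOTE.sub("", text)
    # Step 2: split on any delimiter and drop empty fragments.
    names = [name for name in SEPARATORS.split(text) if name]
    # Step 3: synonym processing maps known aliases to one canonical name.
    return [SYNONYMS.get(name, name) for name in names]


if __name__ == "__main__":
    print(clean_composition("麻黄(去节)9g、桂枝6g,国老3g;杏仁 9g"))
```

Keeping the synonym table as plain data, separate from the regular expressions, means new aliases can be added without touching the cleaning logic.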

Key words

TCM prescriptions (中医方剂)/prescription database (方剂数据库)/text mining (文本挖掘)/data preprocessing (数据预处理)/data cleaning (数据清洗)

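Cite this article

Wu Lei, Li Shu. An Attempt on Data Preprocessing for Text Mining in TCM Prescription Database [J]. Chinese Journal of Library and Information Science for Traditional Chinese Medicine, 2015, (3): 8-11, 4.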

Funding

Scientific research project of the Education Department of Liaoning Province ()

Chinese Journal of Library and Information Science for Traditional Chinese Medicine (中国中医药图书情报杂志)

ISSN 2095-5707
