Chinese Journal of Library and Information Science for Traditional Chinese Medicine (中国中医药图书情报杂志), Issue (3): 8-11, 4. DOI: 10.3969/j.issn.2095-5707.2015.03.003
An Attempt on Data Preprocessing for Text Mining in TCM Prescription Database
Abstract
Objective To propose a data preprocessing method based on data cleaning for TCM prescription databases, so as to make the data more standardized, accurate, and orderly, and more convenient for follow-up processing. Methods The source text data were retrieved from prescription databases using bibliographic search techniques. Non-normalized data were then processed in successive steps (auxiliary word group line processing, regular expression substitution, and synonym processing) with the aim of improving data quality. Results A total of 1758 valid records were retrieved from the TCM prescription database, and 91 records were retrieved from the prescription modern application database. After preprocessing, 6913 valid Chinese herbal medicine entries were obtained; these were successfully imported into the relevant information mining system, from which prescription and herb names could be extracted. Conclusion This method is applicable to text mining and knowledge discovery in TCM prescription databases. It cleans the source text data, yields noise-free data in a unified standard, and enables effective extraction of prescription information, which can serve as a reference for research on the analysis and mining of TCM prescription text data.
Keywords
TCM prescriptions/prescription database/text mining/data preprocessing/data cleaning
Cite this article
Wu Lei (吴磊), Li Shu (李舒). An Attempt on Data Preprocessing for Text Mining in TCM Prescription Database [J]. Chinese Journal of Library and Information Science for Traditional Chinese Medicine, 2015, (3): 8-11, 4.
Funding
Scientific Research Project of the Education Department of Liaoning Province ()
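Two of the preprocessing steps named in the abstract, regular expression substitution followed by synonym processing, might look roughly like the Python sketch below. It is only an illustration under stated assumptions: the patterns, the alias table, the sample string, and the function name `clean_composition` are hypothetical and are not taken from the article or the authors' implementation.

```python
import re

# Hypothetical alias table mapping herb aliases to a preferred name
# (illustrative only; not the thesaurus used in the article).
SYNONYMS = {"川军": "大黄", "国老": "甘草"}

# Parenthesised processing notes such as "(去皮)" or "(炙)".
NOTE_RE = re.compile(r"[((][^()()]*[))]")
# Dosage expressions such as "三两", "9g", or "七十个" (assumed unit list).
DOSE_RE = re.compile(r"[0-9一二三四五六七八九十百半]+(?:g|克|两|钱|分|枚|个|升)")
# Punctuation and whitespace that separate ingredients.
SEP_RE = re.compile(r"[,,、;;。\s]+")


def clean_composition(text):
    """Return a list of normalised herb names from a raw composition string."""
    text = NOTE_RE.sub("", text)   # step 1: drop processing notes
    text = DOSE_RE.sub("", text)   # step 2: drop dosages
    names = [n for n in SEP_RE.split(text) if n]  # step 3: split into names
    return [SYNONYMS.get(n, n) for n in names]    # step 4: map aliases


if __name__ == "__main__":
    raw = "麻黄三两(去节)、桂枝二两(去皮),国老一两(炙);杏仁七十个(去皮尖)"
    print(clean_composition(raw))
```

A real pipeline would need a curated synonym list and more careful dosage patterns; for example, herb names that themselves contain numerals (such as 三七) would be damaged by the simple dosage pattern used here.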