Computer Applications and Software, 2015, Vol. 32, Issue 5: 54-58, 5. DOI: 10.3969/j.issn.1000-386x.2015.05.013
WEB TEXT CLASSIFICATION ALGORITHM BASED ON FUZZY WEIGHTED PROXIMAL SUPPORT VECTOR MACHINE
Abstract
Web text classification is an active topic in data mining. To address the high dimensionality and class imbalance of Web text data, we propose the fuzzy weighted proximal support vector machine (FWPSVM), which introduces a fuzzy membership and a balance factor into the PSVM. First, the algorithm calculates the average density of the samples and derives the balance factor from this density together with the number of samples. This overcomes a defect of traditional weighted algorithms, which set the weights based only on the number of samples, and thus mitigates the shift of the classification hyperplane caused by imbalanced data. Then it calculates the fuzzy membership of each sample in order to eliminate the classification errors caused by noise and outliers. The PSVM is noticeably faster than the standard SVM and is better suited to classifying high-dimensional data. Experiments indicate that the proposed algorithm effectively improves the classification accuracy on imbalanced data and yields some improvement in both training speed and classification quality on Web text.
Keywords
Text classification/Proximal support vector machine (PSVM)/Fuzzy membership/Balance factor/Imbalanced data
Classification
Information technology and security science
Cite this article
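Wang Ping, Wu Jian. Web text classification based on fuzzy weighted proximal support vector machine [J]. Computer Applications and Software, 2015, 32(5): 54-58, 5.
Funding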
Jiangxi Provincial Science and Technology Support Program (2009BGB01 900)
Jiangxi Provincial Natural Science Foundation (2009JX02367)
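As an illustration of the approach summarized in the abstract, the following is a minimal sketch of a sample-weighted linear PSVM in Python with NumPy. The closed-form solve follows the standard PSVM formulation with a per-sample weight added to each squared error term. The two weighting functions are stand-ins chosen for illustration only: this record does not give the paper's density-based balance factor or its fuzzy membership formula, so inverse class frequency and distance to the class centroid are assumed here instead. All function names and the parameter `nu` are hypothetical.

```python
import numpy as np


def balance_weights(y):
    """Inverse-frequency class weights, so both classes carry equal total weight.

    Assumption: a stand-in for the paper's balance factor, which also uses
    the average sample density. Expects labels in {-1, +1}, both present.
    """
    y = np.asarray(y)
    weights = np.empty(len(y), dtype=float)
    for label in (-1, 1):
        mask = y == label
        weights[mask] = len(y) / (2.0 * mask.sum())
    return weights


def centroid_membership(X, y, delta=1e-6):
    """Fuzzy membership in (0, 1] that shrinks with distance to the class centroid.

    Assumption: a common centroid-distance heuristic, not the paper's formula.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    membership = np.empty(len(y), dtype=float)
    for label in (-1, 1):
        mask = y == label
        dist = np.linalg.norm(X[mask] - X[mask].mean(axis=0), axis=1)
        membership[mask] = 1.0 - dist / (dist.max() + delta)
    return membership


def fit_weighted_psvm(X, y, s, nu=1.0):
    """Solve (I / nu + E^T S E) z = E^T S y with E = [X, -1] and z = [w; gamma]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    s = np.asarray(s, dtype=float)
    n, d = X.shape
    E = np.hstack([X, -np.ones((n, 1))])
    lhs = np.eye(d + 1) / nu + E.T @ (s[:, None] * E)  # S = diag(s)
    rhs = E.T @ (s * y)
    z = np.linalg.solve(lhs, rhs)
    return z[:d], z[d]


def predict(X, w, gamma):
    """Classify by the sign of x . w - gamma."""
    return np.where(np.asarray(X, dtype=float) @ w - gamma >= 0, 1, -1)
```

A typical call would combine both weights, for example `s = balance_weights(y) * centroid_membership(X, y)`, followed by `w, gamma = fit_weighted_psvm(X, y, s)`. Because fitting reduces to a single linear solve rather than a quadratic program, this kind of formulation illustrates where the speed advantage of PSVM over a standard SVM comes from.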