Acta Automatica Sinica (自动化学报), 2012, Vol. 38, Issue 3: 399-411, 13. DOI: 10.3724/SP.J.1004.2012.00399
Spam Collaborative Filtering in Enron E-mail Network
Abstract
Social network analysis of the Enron corpus shows that this real e-mail network is, to some degree, both scale-free and small-world. A collaborative spam filtering method is then designed based on users' interactions. By adjusting the parameter A, each user can choose to rely on their own filtering, on that of other users, or on a trade-off between the two. The method achieves good performance even when no information about users' reading habits is available. Because the Enron corpus is unlabeled, we impose an i.i.d. assumption on the training set W and the test set T and label the corpus with an improved EM (expectation-maximization) algorithm, in the sense of minimum statistical risk over W ∪ T. Experimental results show that the collaborative filtering method is simple and effective, and that it steadily improves average accuracy compared with single-machine and ensemble filtering.
Key words
Text classification/spam filtering/e-mail network/collaborative filtering
Cite this article
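Yang Zhen, Lai Yingxu, Duan Lijuan, Li Yujian, Xu Xin. Spam Collaborative Filtering in Enron E-mail Network [J]. Acta Automatica Sinica, 2012, 38(3): 399-411, 13.
Funding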
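National Natural Science Foundation of China (61001178, 60905017, 61175115); National Soft Science Research Program (2010GXQ5D317); Beijing Natural Science Foundation (4102012, 4112009, 4102013, 4123093); General Program of the Science and Technology Development Plan of the Beijing Municipal Education Commission (KM201210005024); Key Program of the Science and Technology Development Plan of the Beijing Municipal Education Commission (KZ201210005007); "Young and Middle-aged Backbone Talent Training Program" of the Beijing Municipal Program for Strengthening Higher Education through Talent (PHR201108016); Beijing University of Technology High-level Talent Training Project; Beijing University of Technology Youth Fund.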