Modern Electronics Technique (现代电子技术), 2015, Vol. 38, Issue 16: 51-55, 5.
Research on duplicated document removal in big data archive storage of MongoDB database
Abstract
To address the current state of document storage under big data, and based on an analysis of why duplicate documents arise in storage, a MongoDB-based method for storing documents is proposed. MongoDB's GridFS is used to store documents of different types. Three collections are defined to store the uploader records, the document information records, and the contents of the chunked documents, respectively. Duplicates are removed by checking whether the MD5 checksums of two documents are identical, and the key program code for duplicate removal is implemented. A distributed storage database is used to improve the scalability of the document storage system. The experimental results show that this method removes duplicate documents effectively and improves query efficiency.
Key words
MongoDB/MD5/big data/file document duplicate removal/GridFS
Classification
Information technology and security science
Cite this article
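HE Jianying. Research on duplicated document removal in big data archive storage of MongoDB database [J]. Modern Electronics Technique, 2015, 38(16): 51-55, 5.
Funding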
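National Archives Administration of China project: Research on big-data-based models and methods for archive data deduplication (2014-X-65)
Sichuan Provincial Department of Education general project: Research on NoSQL database applications in a big data environment (14ZB0313)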
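Illustrative sketch
The deduplication scheme summarized in the abstract (three collections plus an MD5 comparison) can be sketched roughly as follows. This is not the paper's code: the class name `DedupStore`, its field names, and the 255 KiB chunk size are assumptions made for illustration, and plain in-memory dictionaries stand in for the three MongoDB collections so that the example runs without a database server.

```python
import hashlib

# Assumption: 255 KiB per chunk, mirroring the default chunk size of
# current GridFS drivers; the paper's own setting is not stated here.
CHUNK_SIZE = 255 * 1024


class DedupStore:
    """Hypothetical in-memory stand-in for the three collections in the abstract."""

    def __init__(self):
        self.uploads = []  # uploader records: one entry per upload attempt
        self.files = {}    # document information records, keyed by MD5
        self.chunks = {}   # chunked document content, keyed by (MD5, chunk index)

    def save(self, uploader, filename, data):
        """Store the content once per distinct MD5; return (md5, was_duplicate)."""
        md5 = hashlib.md5(data).hexdigest()
        duplicate = md5 in self.files
        if not duplicate:
            # First time this content is seen: record its metadata and chunks.
            self.files[md5] = {"filename": filename, "length": len(data)}
            for n, start in enumerate(range(0, len(data), CHUNK_SIZE)):
                self.chunks[(md5, n)] = data[start:start + CHUNK_SIZE]
        # Every upload is logged, even when the content already exists.
        self.uploads.append({"uploader": uploader, "filename": filename, "md5": md5})
        return md5, duplicate

    def load(self, md5):
        """Reassemble a document from its chunks, in chunk-index order."""
        parts = []
        n = 0
        while (md5, n) in self.chunks:
            parts.append(self.chunks[(md5, n)])
            n += 1
        return b"".join(parts)
```

Under these assumptions, two users uploading byte-identical files produce two uploader records but only one document information record and one set of chunks. MD5 is not collision resistant, so a production system would likely pair the checksum with a stronger hash or a byte-level comparison before treating two documents as identical.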