
Research on the Screening Method of High-quality Patent Results Based on Machine Learning Classification Algorithms

[Purpose/Significance] This study develops an automatic screening method, based on objective data, for quickly identifying the quality of patent results, providing decision support for promoting the transformation of patent results into practical use. [Methodology/Process] First, a screening index system for high-quality patent results is constructed by combining formal features of patent results, such as the number of inventors and the number of IPC codes, with semantic vector matching degree features and quality annotation results. Second, taking the field of "advanced manufacturing and automation" as an example, invention patents in this field are retrieved from the Patent Star platform as the source of patent text data; taking the needs of Hubei Province as an example, its relevant industrial development plans (macro level) and market technology demands (micro level) serve as the source of demand text data. Next, the patent texts and demand texts are processed through word segmentation, stop-word removal, and text vectorization, and organized into a training set and a test set. Finally, eight machine learning classification algorithm models are trained and evaluated, and the best-performing algorithm is put through an application test to verify the feasibility of the screening method. [Results/Conclusion] The random forest model shows the best overall performance among the eight selected models and is adopted as the core classification algorithm of the screening method. In addition, the proposed screening method proves highly feasible for identifying the quality of patent results: it can incorporate the specific patent needs of different provinces (municipalities) and quickly screen large batches of patent results, which can, to some extent, reduce the consumption of human, material, and financial resources.
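The feature-construction step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a plain term-frequency vector stands in for the Doc2vec embedding named in the keywords, the regex tokenizer only suits English text (Chinese text would need a word segmenter), and the stop-word list, the field names `n_inventors` and `n_ipc`, and the choice of taking the best match over all demand texts are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def tokenize(text):
    """Lowercase, split into word tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def vectorize(tokens):
    """Term-frequency vector; a simple stand-in for a Doc2vec embedding."""
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty text matches nothing
    return dot / (norm_u * norm_v)


def feature_row(patent, demand_texts):
    """Combine formal features with the best match against any demand text."""
    patent_vec = vectorize(tokenize(patent["text"]))
    match = max(
        (cosine(patent_vec, vectorize(tokenize(d))) for d in demand_texts),
        default=0.0,
    )
    return [patent["n_inventors"], patent["n_ipc"], match]


# Made-up example inputs, not data from the study.
demands = ["industrial robot control for automated welding lines"]
patent = {"text": "A control method for a welding robot", "n_inventors": 3, "n_ipc": 2}
row = feature_row(patent, demands)
```

Rows built this way, paired with the quality annotation labels, would form the training and test sets on which candidate classifiers such as random forest are compared.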

周一夫;谭春辉;江婷;李玥澎;毕慧婷;汪红信

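School of Information Management, Central China Normal University, Wuhan, Hubei 430079; Hubei Technology Exchange, Wuhan, Hubei 430071; School of Information Management, Wuhan University, Wuhan, Hubei 430072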

screening of patent results; high-quality patent results; machine learning; Doc2vec

Journal of Modern Information (《现代情报》), 2024 (002)

pp. 81-91 (11 pages)

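Supported by the 2022 Central China Normal University Fundamental Research Funds (Humanities and Social Sciences) interdisciplinary research project "Research on Big-Data-Based Intelligent Evaluation and Smart Service Models for Science and Education" (Project No. CCNU22JC031).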

10.3969/j.issn.1008-0821.2024.02.007
