Text Classification Method for Court Electronic Files (面向法院电子卷宗的文本分类方法研究)

Open access; indexed in the Peking University Core Journals list (北大核心) and CSTPCD

This paper addresses the main problems in classifying the text of court electronic case files and proposes a solution for each. A multi-dimensional semantic representation of case-file documents is proposed to obtain more accurate and comprehensive text features. A kernel extreme learning machine (KELM) with a Gaussian kernel is used to train the text classifier, which yields the globally optimal solution while greatly improving training efficiency. KOS-ELM, a sequential optimization model based on recursive least squares (RLS), iteratively updates the model parameters with new samples, giving the classifier the ability to learn online and reducing its dependence on the initial samples. Comparative experiments show that the Gaussian-kernel KELM classifier is 2.66 and 4.43 percentage points more accurate than the BP network model and LSSVM, respectively, while its training time is only 1/6 and 1/10 of theirs. Using the multi-dimensional semantic representation as model input raises accuracy by 8.84 and 2.33 percentage points over the text-vector and word-vector representations, respectively. When KOS-ELM is used to iteratively optimize a weak classifier, classification accuracy improves significantly after 20 iterations at each of 4 different step sizes.
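The abstract names a Gaussian-kernel KELM and an online update driven by new samples. The following is a minimal sketch of that general idea in NumPy, not the authors' implementation. The class name `KELM`, the hyperparameters `C` and `gamma`, the `partial_fit` method, and the toy data are all assumptions made for illustration. In particular, `partial_fit` uses a generic block-matrix-inverse update as a stand-in; the paper's RLS-based KOS-ELM recursion is not given in the abstract and may differ.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

class KELM:
    """Illustrative kernel ELM: beta = (K + I/C)^-1 T (hypothetical names)."""

    def __init__(self, C=10.0, gamma=1.0):
        self.C, self.gamma = C, gamma  # assumed regularization and kernel width

    def fit(self, X, y, n_classes):
        self.X = np.asarray(X, dtype=float)
        self.T = np.eye(n_classes)[y]                      # one-hot targets
        K = gaussian_kernel(self.X, self.X, self.gamma)
        self.P = np.linalg.inv(K + np.eye(len(self.X)) / self.C)
        self.beta = self.P @ self.T                        # closed-form solution
        return self

    def partial_fit(self, x, label):
        """Add one sample by extending the stored inverse (block inversion).

        A generic stand-in for an online update, not the paper's KOS-ELM.
        """
        x = np.asarray(x, dtype=float)[None, :]
        k = gaussian_kernel(self.X, x, self.gamma)         # shape (n, 1)
        d = 1.0 + 1.0 / self.C                             # k(x, x) = 1 for a Gaussian kernel
        Pk = self.P @ k
        s = d - (k.T @ Pk).item()                          # Schur complement
        n = len(self.X)
        P_new = np.empty((n + 1, n + 1))
        P_new[:n, :n] = self.P + Pk @ Pk.T / s
        P_new[:n, n:] = -Pk / s
        P_new[n:, :n] = -Pk.T / s
        P_new[n, n] = 1.0 / s
        self.P = P_new
        self.X = np.vstack([self.X, x])
        self.T = np.vstack([self.T, np.eye(self.T.shape[1])[label]])
        self.beta = self.P @ self.T
        return self

    def predict(self, X):
        K = gaussian_kernel(np.asarray(X, dtype=float), self.X, self.gamma)
        return (K @ self.beta).argmax(axis=1)

# Toy data (an assumption): two well-separated 2-D clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

batch = KELM().fit(X, y, n_classes=2)                      # train on all 40 samples
online = KELM().fit(X[:-1], y[:-1], n_classes=2)           # train on 39 samples
online.partial_fit(X[-1], int(y[-1]))                      # then add the last one
```

Because training reduces to one linear solve, there is no iterative gradient descent, which is consistent with the training-time advantage the abstract reports. In this sketch the incrementally updated model ends with the same weights as the batch model, at the cost of storing every training sample.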

Wang Xiao (王霄); Wan Yuqing (万玉晴)

Taiji Computer Co., Ltd. (太极计算机股份有限公司), Beijing 100102, China

Computer Science and Automation

Court electronic file; Text classification; Semantic representation; Kernel extreme learning machine; Recursive least squares

Computer Applications and Software (《计算机应用与软件》), 2024, No. 6

pp. 101-107, 133 (8 pages)

National Key Research and Development Program of China (2018YFC0807700).

10.3969/j.issn.1000-386x.2024.06.015