Communication and Information Technology, Issue (1): 7-12, 30, 7.
Research on the Processing and Analysis of Legal Documents for Duty Crimes Based on Natural Language Processing
Abstract
In recent years, job-related crimes have occurred frequently. Existing research is largely confined to legal texts and analyses of the constituent elements of crimes; it lacks interdisciplinary perspectives and therefore struggles to reveal the characteristics and development trends of these crimes. Applying big data, artificial intelligence, and natural language processing (NLP) to the texts of job-related crime cases can reveal criminal patterns and support efficient prevention. This paper proposes a research model and algorithms for job-related crimes based on intelligent data processing and analysis, and builds a system prototype. Customized web crawlers efficiently collect job-related crime documents from multiple platforms. In the preprocessing stage, jieba word segmentation is combined with deep-learning sequence labeling to clean the text, segment it, and extract key information. A Word2Vec model converts the text into numerical vectors, which are then combined with the K-Means clustering algorithm and the Llama3 large language model to mine key features, significantly improving the accuracy of case retrieval. Finally, crime patterns are presented through visualizations such as box plots and scatter plots. Experimental results show that, compared with traditional methods, the model improves accuracy and recall by 21% and 9%, respectively, demonstrating Llama3's strong capability in semantic understanding and feature extraction.
Keywords
Job-related crimes/Legal documents/Big data/Natural language processing/Word vector model/Clustering algorithm
Classification
Information Technology and Security Science
Cite this article
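姜志超 (Jiang Zhichao), 杨炳文 (Yang Bingwen), 高谷刚 (Gao Gugang), 李林怡 (Li Linyi). Research on the processing and analysis of legal documents for duty crimes based on natural language processing [J]. Communication and Information Technology, 2026, (1): 7-12, 30, 7.
Funding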
National Natural Science Foundation of China, Young Scientists Fund (Grant No. 72401110)
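The abstract's representation and clustering steps (Word2Vec vectors grouped with K-Means) can be illustrated with a minimal sketch. It assumes each case document has already been segmented into tokens (the paper uses jieba for this) and that word vectors are available. Here a tiny hypothetical 2-D table, `WORD_VECTORS`, stands in for a trained Word2Vec model, and each document vector is taken as the mean of its in-vocabulary word vectors. That aggregation, the value of k, and the first-k initialization are illustrative assumptions, not the authors' stated choices. The names `doc_vector`, `kmeans`, and `docs` are likewise hypothetical. The Llama3 feature-mining step is not shown.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical 2-D word vectors standing in for a trained Word2Vec model.
WORD_VECTORS: Dict[str, List[float]] = {
    "贪污": [1.0, 0.1],  # embezzlement
    "挪用": [0.9, 0.2],  # misappropriation
    "公款": [0.8, 0.0],  # public funds
    "受贿": [0.1, 1.0],  # accepting bribes
    "行贿": [0.2, 0.9],  # offering bribes
    "请托": [0.0, 0.8],  # request for favors
}


def doc_vector(tokens: Sequence[str]) -> List[float]:
    """Average the vectors of in-vocabulary tokens; unknown tokens are skipped."""
    dim = len(next(iter(WORD_VECTORS.values())))
    known = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def kmeans(points: List[List[float]], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Plain K-Means (Lloyd's algorithm) with the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable, so the algorithm has converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels or [], centroids


# Hypothetical pre-segmented case texts; "被告人" (defendant) is out of vocabulary.
docs = [
    ["被告人", "贪污", "公款"],
    ["被告人", "受贿", "请托"],
    ["挪用", "公款"],
    ["行贿", "受贿"],
]
vectors = [doc_vector(d) for d in docs]
labels, _ = kmeans(vectors, k=2)
print(labels)  # [0, 1, 0, 1]
```

The embezzlement-style cases (documents 0 and 2) land in one cluster and the bribery-style cases (documents 1 and 3) in the other. In practice the vectors would come from a Word2Vec model trained on the crawled judgments, and k would be chosen empirically.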