Railway Standard Design, 2025, Vol. 69, Issue 11: 17-25, 9. DOI: 10.13238/j.issn.1004-2954.202401280001
Intelligent Extraction of Land Transportation Infrastructure Design Specifications Based on Large Language Models
Abstract
Design specifications in the field of land transportation infrastructure are weakly structured, have little annotated data, and contain nested entities that are difficult to learn. To extract design specification knowledge intelligently and to improve the intelligent retrieval and application of the specifications, this paper, drawing on an extensive literature review, takes the design specifications for routes, subgrades, bridges, and tunnels as examples. A rule for extracting knowledge from specification provisions using a "subject + condition + measures" structure was proposed. Regular expressions were applied to clean the specification text data. Based on the format characteristics of the specifications, the key semantic information and the types of relationships to be extracted were defined. Building on the ChatGLM2-6B large language model from Tsinghua University, the LoRA fine-tuning method was adopted to conduct reinforcement learning with human feedback: the annotated dataset was used for supervised learning, a similarity algorithm was introduced to automatically score and rank the results for reward learning, and the KL divergence algorithm was adopted to construct the reinforcement learning model. The extraction results underwent quality analysis and structured data integration, and a specification knowledge graph was constructed. The results showed that the reinforcement learning model converged well, with a quality score exceeding 90 points, a 35% to 99% improvement over the original large language model. When applied to specification text knowledge extraction tasks, the model accurately extracted the key semantic information of "subject", "condition", and "measure" from specification provisions and identified logical or sequential relationships in nested text statements. The work establishes a method for governing unstructured specification knowledge and achieves intelligent, efficient extraction of design specification knowledge in the field of land transportation, laying the foundation for constructing specification knowledge graphs and developing knowledge applications.
Key words
large language model / land transportation / design specification / knowledge extraction / semantic information / similarity algorithm / reinforcement learning model
Classification
Transportation
Cite this article
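Xie Hao, Yao Hongxi, Zhong Jing, Chen Ling, Pu Jianwei. Intelligent Extraction of Land Transportation Infrastructure Design Specifications Based on Large Language Models [J]. Railway Standard Design, 2025, 69(11): 17-25, 9.
Funding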
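National Key Research and Development Program of China (2021YFB2600404)
Science and Technology Research and Development Program of China Railway Construction Corporation Limited (2022-A02)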