Computer Engineering and Applications, 2026, Vol. 62, Issue (9): 20-45, 26. DOI: 10.3778/j.issn.1002-8331.2506-0198
Survey of Large Language Model Driven Intelligent Processing of Tabular Data
Abstract
Because tabular data is structured or semi-structured, it plays an irreplaceable, pivotal role in tasks such as data warehouse construction, knowledge graph alignment, and feature engineering. In recent years, large language models have opened new technical possibilities for the "unified modeling" of intelligent table processing through extremely long context windows, zero/few-shot transfer capabilities, and pluggable tool interfaces. First, this paper systematically reviews the mainstream representation forms of tabular data and their influence on the complexity of downstream tasks, and establishes a mapping between a task taxonomy and typical datasets covering four categories of tasks: structure understanding, data understanding, reasoning, and data augmentation. It then summarizes the key technologies by which large language models enable intelligent table processing along four dimensions: pre-training strategy, text serialization and embedding, prompt engineering and reasoning chain design, and retrieval-augmented generation with dynamic feedback mechanisms. Second, organized around four modules (structure parsing, table understanding, reasoning and generation, and data governance), it analyzes the latest applications of large language models in table detection and localization, entity alignment, Text-to-SQL, and other tasks. Finally, it identifies the core bottlenecks that large language models face in tabular scenarios: insufficient structural understanding, cross-modal alignment errors, scarcity of labeled data, and poor execution reliability. Looking ahead, with the continued evolution of multi-agent collaborative decision-making frameworks, structure-symbol hybrid reasoning paradigms, and industry-specific instruction fine-tuning, large language models are expected to complete intelligent table processing tasks robustly, interpretably, and efficiently in real business scenarios.
Keywords
large language models/chain reasoning/prompt engineering/table understanding/tabular data cleaning and anomaly detection/multimodal fusion
Classification
Information Technology and Security Science
Cite this article
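WANG Xingkai, XI Xuefeng, CUI Zhiming, WANG Fei, ZHENG Qian. Survey of Large Language Model Driven Intelligent Processing of Tabular Data[J]. Computer Engineering and Applications, 2026, 62(9): 20-45, 26.
Funding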
National Natural Science Foundation of China (62176175)
Suzhou Water Conservancy and Water Affairs Science and Technology Project (2023008)