Computer Engineering and Applications (计算机工程与应用), 2026, Vol. 62, Issue 8: 176-188 (13 pages). DOI: 10.3778/j.issn.1002-8331.2507-0389
视觉语言大模型驱动的多模态工业缺陷检测与智能决策
Multimodal Industrial Defect Detection and Intelligent Decision-Making Driven by Large Vision-Language Models
Abstract
Defect detection is an important application scenario in industry. To address the low efficiency of traditional inspection, the challenges of intelligent analysis and processing of industrial data, the limited generalization of single-modal systems, and hallucination in large models, this paper proposes and implements an enhanced intelligent inspection system for industrial defects that supports collaborative processing of multi-source heterogeneous data and efficient decision-making. The system fine-tunes a multimodal large model for industrial anomaly detection. Using improved cross-modal feature alignment algorithms and prompt learning, it fuses multi-source data such as images and text at the semantic level and outputs semantic descriptions alongside the defect recognition results. It builds an industrial data knowledge base and applies retrieval-augmented generation (RAG) to suppress model hallucinations and to improve detection credibility and decision quality. Combined with Depth-Anything-V2, it generates highly consistent depth maps that support 3D quantitative analysis of defects, going beyond the limits of traditional 2D detection. A natural-language-driven Excel analysis module automatically extracts and visualizes data from quality inspection tables. An OCR inspection module integrates PaddleOCR and ErnieBot to extract text from industrial documents and interpret it semantically. Finally, an agent integrates the core functional modules to provide comprehensive decision-making advice. Tests on 5 typical industrial parts, such as metals and screws, show that the system's average recognition accuracy for defect types reaches 95.26%, with an average defect localization error of 2.9 pixels. The system raises the automation level and broadens the analysis dimensions of industrial quality inspection, providing a practical technical solution for the intelligent transformation of manufacturing.
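Key words
industrial defect detection / multimodal fusion / large vision-language model (LVLM) / depth map analysis / retrieval-augmented generation (RAG) / decision-making / agent
Classification
Information Technology and Security Science
Cite this article
DONG Yutong, HOU Huifang, GONG Mingming, CHEN Zikang. Multimodal Industrial Defect Detection and Intelligent Decision-Making Driven by Large Vision-Language Models[J]. Computer Engineering and Applications, 2026, 62(8): 176-188.
Funding
Science and Technology Research Project of Henan Province (252102221018)
Henan University of Technology University-Level Undergraduate Innovation Training Program (202510463088)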