Multimodal Industrial Defect Detection and Intelligent Decision-Making Driven by Large Vision-Language Models

Computer Engineering and Applications, 2026, Vol. 62, Issue 8: 176-188, 13. DOI: 10.3778/j.issn.1002-8331.2507-0389

DONG Yutong (董禹彤) 1, HOU Huifang (侯惠芳) 1, GONG Mingming (龚明明) 2, CHEN Zikang (陈自康) 1

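Author information

  • 1. School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450000, China
  • 2. iFLYTEK Lingzhi Talent Training Business Department, iFLYTEK, Zhengzhou 450000, China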

Abstract

Defect detection is an important application scenario in industry. To address the low efficiency of traditional inspection, the challenge of intelligently analyzing and processing industrial data, the limited generalization ability of single-modal systems, and the hallucination of large models, an enhanced intelligent industrial defect inspection system is proposed and implemented to support collaborative processing of multi-source heterogeneous data and efficient decision-making. The system fine-tunes a multimodal large model for industrial anomaly detection. Through improved cross-modal feature alignment algorithms and prompt learning, it achieves semantic fusion of multi-source data such as images and text, and outputs semantic descriptions alongside the defect recognition results. It constructs an industrial data knowledge base and uses retrieval-augmented generation (RAG) to suppress model hallucinations and improve detection credibility and decision quality. Combined with Depth-Anything-V2, it generates highly consistent depth maps to support 3D quantitative analysis of defects, overcoming the limitations of traditional 2D detection. A natural-language-driven Excel intelligent analysis module automatically extracts and visualizes data from quality inspection tables. The OCR intelligent inspection module integrates PaddleOCR and ErnieBot to extract text from industrial documents and interpret its meaning. Finally, an agent integrates the core functional modules to provide comprehensive decision-making advice. Tests on 5 typical industrial parts, such as metals and screws, show that the system's average recognition accuracy for defect types reaches 95.26%, with an average defect localization error of 2.9 pixels. The system raises the automation level and broadens the analysis dimensions of industrial quality inspection, providing a practical technical solution for the intelligent transformation of manufacturing.
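
The abstract does not say how the knowledge base is queried, so the following is only a minimal sketch of the general RAG idea it mentions: retrieve the most relevant knowledge-base entry for a defect description, then place it in the prompt so the model answers from retrieved facts rather than from memory. The knowledge-base entries, the bag-of-words cosine scoring, and the names `retrieve` and `build_prompt` are all illustrative assumptions, not the authors' implementation.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge-base entries (assumption: the paper's real
# industrial knowledge base is not described in the abstract).
KNOWLEDGE_BASE = [
    "Scratch on metal surface: polish the part and inspect the tool path for burrs.",
    "Stripped screw thread: reject the part and check die wear on the threading machine.",
    "Dent on metal housing: measure the depth and compare it with the tolerance sheet.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query.

    Assumption: simple bag-of-words scoring stands in for whatever
    retriever (for example, dense embeddings) the real system uses.
    """
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(defect_description, docs, k=1):
    """Assemble a prompt that grounds the answer in retrieved entries."""
    context = "\n".join(f"- {d}" for d in retrieve(defect_description, docs, k))
    return (
        "Use only the reference entries below to advise on the defect.\n"
        f"Reference entries:\n{context}\n"
        f"Detected defect: {defect_description}\n"
        "Advice:"
    )


if __name__ == "__main__":
    print(build_prompt("stripped thread on a screw", KNOWLEDGE_BASE))
```

Restricting the prompt to retrieved entries is the mechanism by which RAG is generally expected to reduce hallucination; a production retriever would likely use learned embeddings instead of word counts.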

Keywords

industrial defect detection; multimodal fusion; large vision-language model (LVLM); depth map analysis; retrieval-augmented generation (RAG); decision-making; agent

Classification

Information Technology and Security Science

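Cite this article

DONG Yutong, HOU Huifang, GONG Mingming, CHEN Zikang. Multimodal Industrial Defect Detection and Intelligent Decision-Making Driven by Large Vision-Language Models[J]. Computer Engineering and Applications, 2026, 62(8): 176-188, 13.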

Funding

Henan Province Science and Technology Research Project (252102221018)

Henan University of Technology University-Level Undergraduate Innovation Training Program (202510463088)

Computer Engineering and Applications

ISSN 1002-8331
