Abstract
Multi-modal prediction tasks typically require the simultaneous modeling of heterogeneous data, including text, images, and structured numerical information, to achieve robust inference and explainable decision-making in complex environments. Traditional uni-modal or weak-fusion methods struggle to consistently address semantic alignment, information complementation, and cross-source reasoning, while the inherent black-box nature of deep models limits the interpretability of their results. Meanwhile, large language models (LLMs) have demonstrated strong capabilities in semantic understanding, instruction following, and reasoning, yet gaps remain in their performance on time series modeling, cross-modal alignment, and real-time knowledge integration. To address these challenges, this paper proposes an LLM-guided multi-modal time series-semantic prediction framework. By combining variational inference-based time series modeling with LLM-driven semantic analysis, the approach establishes a collaborative "temporal-semantic-decision" mechanism: the temporal module extracts historical behavior patterns using recurrent latent variables and attention mechanisms; the semantic module distills high-level semantics and interpretations through domain-specific language models and multi-modal encoders; and the two components are jointly optimized via a learnable fusion module, which also provides uncertainty annotations and explainable reports. Experiments on the StockNet, CMIN-US, and CMIN-CN datasets demonstrate that the approach achieves an accuracy of 63.54%, an improvement of 5.31 percentage points over the best baseline, and raises the Matthews correlation coefficient (MCC) to 0.223. This study offers a unified paradigm for multi-modal time series prediction and underscores its promising applications in financial technology.
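The learnable fusion step described above can be pictured with a minimal sketch. The snippet below is purely illustrative and assumes PyTorch; the GatedFusion class, the gating design, and all dimensions are hypothetical stand-ins for the paper's fusion module, not the authors' implementation. It combines a temporal latent vector with an LLM-derived semantic embedding through a learned gate and emits a movement logit together with a log-variance term serving as an uncertainty annotation.

```python
# Minimal sketch (PyTorch assumed) of a learnable temporal-semantic fusion:
# a temporal latent and an LLM-derived semantic embedding are mixed by a
# learned gate; the head outputs a movement logit plus a per-sample
# log-variance as an uncertainty estimate. Names/dimensions are illustrative.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, d_temporal: int, d_semantic: int, d_hidden: int = 128):
        super().__init__()
        self.proj_t = nn.Linear(d_temporal, d_hidden)  # project temporal latent
        self.proj_s = nn.Linear(d_semantic, d_hidden)  # project semantic embedding
        self.gate = nn.Linear(2 * d_hidden, d_hidden)  # learnable fusion gate
        self.head = nn.Linear(d_hidden, 1)             # up/down movement logit
        self.log_var = nn.Linear(d_hidden, 1)          # uncertainty annotation

    def forward(self, h_t: torch.Tensor, h_s: torch.Tensor):
        t, s = self.proj_t(h_t), self.proj_s(h_s)
        g = torch.sigmoid(self.gate(torch.cat([t, s], dim=-1)))
        fused = g * t + (1 - g) * s                    # gated convex combination
        return self.head(fused), self.log_var(fused)

# Usage: fuse a 64-d temporal latent with a 768-d LLM sentence embedding.
model = GatedFusion(d_temporal=64, d_semantic=768)
logit, log_var = model(torch.randn(8, 64), torch.randn(8, 768))
```

The gate lets the model weight the temporal and semantic streams per sample, which is one common way to realize joint optimization of the two components under a single objective.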
Key words
multi-modal / large language model (LLM) / artificial intelligence / pre-trained model / time series prediction
Classification
Information Technology and Security Science