Acta Psychologica Sinica, 2025, Vol. 57, Issue 6: 987-1000, insert pages 11, 15. DOI: 10.3724/SP.J.1041.2025.0987
Suicidal ideation data augmentation and recognition technology based on large language models
Abstract
Suicide constitutes a significant global public health challenge, with the World Health Organization reporting substantial annual mortality rates. Traditional suicide detection methods primarily depend on self-assessment scales and clinical evaluations, which require considerable resources and rely on patients actively seeking assistance. The integrated motivational-volitional (IMV) model offers a theoretical framework for comprehending suicidal behavior progression, with suicidal ideation serving as a critical risk indicator. While text-based analysis presents a promising non-invasive approach for early identification, it encounters technical challenges due to limited annotated data and linguistic complexity. Large language models (LLMs) offer unprecedented capabilities in language understanding and generation, potentially addressing these challenges through their ability to comprehend diverse expressions of suicidal ideation and generate high-quality training data.
This research employed a two-stage design leveraging LLMs to address the challenge of limited training data for suicidal ideation recognition. In Study I, we selected ChatGLM3-6B and Qwen-7B-Chat as foundation LLMs and implemented both zero-shot and few-shot learning approaches combined with supervised learning strategies. We extracted examples from an original dataset of Weibo comments to create high-quality training data for the LLMs. Comparative experiments evaluated model performance, with human coders assessing the quality of LLM-generated texts using established suicide risk evaluation criteria. In Study II, we evaluated the impact of LLM-based data augmentation on recognition models by comparing traditional machine learning approaches with LLM-based methods trained on both original and augmented datasets, measuring performance through accuracy and true negative rate metrics.
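As an illustration of the two metrics named for Study II, the following minimal sketch computes accuracy and true negative rate from binary labels. It is not the authors' evaluation code: the function name `accuracy_and_tnr` and the label encoding (1 = text expressing suicidal ideation, 0 = text without it) are assumptions made here for illustration only.

```python
from typing import Sequence, Tuple


def accuracy_and_tnr(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, true negative rate) for binary labels.

    Assumed encoding (not taken from the paper): 1 = positive class,
    0 = negative class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")

    # Count the four cells of the binary confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Accuracy: share of all texts classified correctly.
    accuracy = (tp + tn) / (tp + tn + fp + fn)

    # True negative rate (specificity): share of actual negatives
    # that the model correctly labels as negative.
    negatives = tn + fp
    if negatives == 0:
        raise ValueError("true negative rate is undefined without negative examples")
    tnr = tn / negatives

    return accuracy, tnr


if __name__ == "__main__":
    # Toy labels, invented purely to exercise the function.
    gold = [1, 1, 0, 0, 0, 1]
    pred = [1, 0, 0, 1, 0, 1]
    acc, tnr = accuracy_and_tnr(gold, pred)
    print(f"accuracy={acc:.3f}, TNR={tnr:.3f}")
```

Reporting the true negative rate alongside accuracy shows how often texts without suicidal ideation are correctly left unflagged, which accuracy alone can obscure when the classes are imbalanced.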
In Study I, the two self-developed LLM-based models demonstrated excellent performance in suicidal ideation data augmentation, significantly outperforming baseline models according to comprehensive evaluation metrics. The success of these LLM-enhanced models highlighted the effectiveness of high-quality data construction through advanced language modeling capabilities. In Study II, all experimental models trained on LLM-augmented data significantly outperformed their corresponding baseline models in both accuracy and true negative rate. The highest-performing model utilized the ChatGLM3-6B architecture with few-shot learning, showing marked improvements compared to its baseline counterpart. These findings demonstrate the substantial impact of LLM-based data augmentation on model generalization ability, particularly in capturing diverse and subtle expressions of suicidal ideation that traditional approaches often miss.
This study validates the effectiveness of LLM-based data augmentation methods in enhancing suicidal ideation recognition while addressing data scarcity challenges. The non-invasive approach developed through LLM technology has the potential to provide timely and effective early warning of suicide risk while protecting user privacy. This research contributes to both theoretical understanding of LLMs' capabilities in complex psychological text processing and practical applications in mental health monitoring. Future research should explore cross-platform applicability of LLMs, model interpretability, and ethical considerations to further advance this promising technology in suicide prevention and broader mental health applications.
Keywords
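suicidal ideation / data augmentation / suicide text recognition / large language models / artificial intelligence
Classification
Psychology
Cite this article
章彦博, 黄峰, 莫柳铃, 刘晓倩, 朱廷劭. Suicidal ideation data augmentation and recognition technology based on large language models [J]. Acta Psychologica Sinica, 2025, 57(6): 987-1000, insert pages 11, 15.
Funding
Supported by the General Program of the National Natural Science Foundation of China (62272206) and the Beijing Natural Science Foundation (IS23088).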