"Lost in Conversation" Phenomenon of Large Language Models in Multi-Turn Dialogue Scenarios

张卫东 (Zhang Weidong)¹, 李奉芮 (Li Fengrui)¹, 胡文杰 (Hu Wenjie)¹

现代情报 (Journal of Modern Information), 2026, Vol. 46, Issue 5: 28-40, 13. DOI: 10.3969/j.issn.1008-0821.2026.05.003

"Lost in Conversation" Phenomenon of Large Language Models in Multi-Turn Dialogue Scenarios

Author information

1. School of Business and Management, Jilin University, Changchun, Jilin 130012

Abstract

[Purpose/Significance] Multi-turn dialogue has become a dominant interaction paradigm for large language models (LLMs) in complex cognitive tasks, yet mainstream evaluations rely on information-complete single-turn tasks that fail to capture the distributional risks introduced by multi-turn interactions. Although the "Lost in Conversation" phenomenon has been identified in English settings, three questions have not been systematically explored: whether it generalizes to the Chinese context, which task types are most vulnerable, and which variables drive the instability. This study aimed to provide controlled, verifiable quantitative evidence on the existence, structural characteristics, and key drivers of this phenomenon in Chinese multi-turn dialogue.

[Method/Process] This study constructed a controlled comparative framework grounded in an "information equivalence" principle, under which the single-turn and multi-turn conditions shared identical semantic content and constraints, with only the presentation mode varied. A total of 250 single-turn samples spanning five task types (reading comprehension, logical reasoning, text classification, long-text processing, and summarization) were curated from established Chinese benchmarks (CMRC 2018, CLUE, DuReader, Math23K, LCSTS), and each sample was converted into a paired multi-turn counterpart through an "instruction slicing → alignment verification → human final review" pipeline. Four LLMs with distinct architectures and parameter scales (Gemini-2.5-Pro, Claude-3.7-Sonnet, Qwen1.5-72B-Chat, Yi-1.5-34B-Chat) each ran 10 repeated trials per instance under low-entropy decoding, yielding 20,000 scored observations evaluated by an LLM-as-a-Judge protocol supplemented with human spot-checks. The study assessed output distributions through mean performance, percentile bounds, and fluctuation ranges, and verified robustness via Cohen's d, 95% confidence intervals, and two-way ANOVA. Controlled-effect experiments on 52 underperforming instances further isolated the independent contributions of temporal order, interaction frequency, and language context.

[Result/Conclusion] The results confirm that multi-turn interaction in the Chinese context consistently reduces mean performance and amplifies uncertainty, exhibiting a structural pattern of "stable upper bounds, precipitous lower-bound drops, and expanded fluctuation." Mean scores decrease by 8.05 to 18.03 points across models, with all Cohen's d values exceeding 0.93. Task structure significantly moderates degradation severity: long-text processing and summarization suffer the most severe lower-bound breakdowns (P10 dropping to 43 and 52), while text classification remains comparatively resilient. Two-way ANOVA confirms that both main effects and their interaction reach high significance (P < 0.0001). Controlled experiments reveal that interaction frequency is the dominant driver: compressing dialogue turns substantially recovers performance and narrows fluctuation across all task types. These findings point to a cascading mechanism of "cognitive load accumulation → attention dilution → state drift," and suggest that designing fewer turns with denser instructions and embedding intermediate checkpoint mechanisms are essential practical strategies for reliable human-AI collaboration in high-stakes environments.
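The distributional measures named in the abstract (mean, percentile bounds, fluctuation range, Cohen's d) can be illustrated with a minimal Python sketch. Several things here are assumptions rather than details from the paper: the scores are invented for illustration, the fluctuation range is taken to be P90 minus P10, percentiles use linear interpolation, and Cohen's d uses a pooled standard deviation. The helper names `percentile`, `summarize`, and `cohens_d` are hypothetical.

```python
import statistics


def percentile(values, p):
    """Linear-interpolation percentile (p in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def summarize(scores):
    """Mean, lower bound (P10), upper bound (P90), and their spread."""
    p10, p90 = percentile(scores, 10), percentile(scores, 90)
    # Assumption: "fluctuation range" is approximated as P90 - P10.
    return {"mean": statistics.mean(scores), "p10": p10, "p90": p90,
            "range": p90 - p10}


def cohens_d(a, b):
    """Cohen's d for two independent samples, using a pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5


# Hypothetical scores for one instance over 10 repeated trials (NOT the paper's data).
single_turn = [92, 88, 95, 90, 91, 89, 94, 93, 90, 88]
multi_turn = [90, 55, 93, 72, 88, 48, 91, 80, 67, 86]

single_stats = summarize(single_turn)
multi_stats = summarize(multi_turn)
mean_drop = single_stats["mean"] - multi_stats["mean"]
effect_size = cohens_d(single_turn, multi_turn)

# Observation count implied by the design: samples x models x trials x conditions.
n_observations = 250 * 4 * 10 * 2

print(single_stats)
print(multi_stats)
print("mean drop:", mean_drop, "Cohen's d:", effect_size)
print("observations:", n_observations)
```

In this toy data the upper bound (P90) barely moves while the lower bound (P10) falls sharply, which mirrors the "stable upper bounds, precipitous lower-bound drops, and expanded fluctuation" pattern the abstract describes. The design arithmetic also reproduces the stated 20,000 scored observations.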

Key words

large language models / human-computer interaction / multi-turn dialogue / lost in conversation phenomenon / controlled experiment / prompt engineering

Classification

Social sciences

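Cite this article

Zhang Weidong, Li Fengrui, Hu Wenjie. "Lost in Conversation" phenomenon of large language models in multi-turn dialogue scenarios [J]. 现代情报 (Journal of Modern Information), 2026, 46(5): 28-40, 13.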

现代情报 (Journal of Modern Information)

OA, CHSSCD

ISSN: 1008-0821
