智能系统学报 (CAAI Transactions on Intelligent Systems), 2025, Vol. 20, Issue (5): 1281-1293, 13. DOI: 10.11992/tis.202411001
Evaluation of Chinese multiskill dialogues (中文多技能对话评估)
Abstract
The accurate evaluation of the capabilities of a multiskilled dialogue system is important for satisfying the diverse demands of users, including social banter, profound knowledge-based discussions, role-playing conversations, and dialogue recommendations. Current benchmarks concentrate on assessing specific dialogue skills and cannot efficiently evaluate multiple dialogue skills concurrently. To facilitate the evaluation of multiskill dialogues, this study establishes a Chinese multiskill evaluation benchmark, the Multi-Skill Dialogue Evaluation Benchmark (MSDE). MSDE contains 1,781 dialogues and 21,218 utterances, covering four common dialogue tasks: chit-chat, knowledge-grounded dialogue, persona-based dialogue, and dialogue recommendation. We performed extensive experiments on MSDE and examined the correlation between automatic and human evaluation metrics. The results indicate that (1) among the four dialogue tasks, chit-chat is the most difficult to analyze, while knowledge-grounded dialogue is the easiest; (2) significant differences exist in the performance of various metrics on MSDE; (3) for human evaluation, the analysis complexity of each metric differs across dialogue tasks. Part of the data will be made available at https://github.com/IRIP-LLM/MSDE, and all data will be released after curation.

Keywords
multiskill dialogue / dialogue evaluation / chit-chat / open-domain dialogue / conversational recommendation / persona-chat / knowledge-grounded dialogue / large language model

Classification
Information Technology and Security Science

Cite this article
柳泽明, 程子豪, 刘晶晶, 杨晓, 郭园方, 王蕴红. Evaluation of Chinese multiskill dialogues (中文多技能对话评估)[J]. 智能系统学报 (CAAI Transactions on Intelligent Systems), 2025, 20(5): 1281-1293, 13.

Funding
National Key R&D Program of China (2023YFF0725600)
National Natural Science Foundation of China (62406015)