ATBench: Benchmark for evaluating analysis trajectories in end-to-end data analysis

Wang Xufei¹, Xu Huarong¹, Chen Panfeng¹, Chen Mei¹, Ma Dan¹, Chen Zhengxi¹, Tian Xu¹, Li Hui¹

Journal of East China Normal University (Natural Science), 2025, Issue (5): 43-52, 10. DOI: 10.3969/j.issn.1000-5641.2025.05.005

Author affiliation

  • 1. State Key Laboratory of Public Big Data, Guiyang 550025; College of Computer Science and Technology, Guizhou University, Guiyang 550025

Abstract

This paper introduces ATBench, a benchmark for evaluating analysis trajectories in end-to-end data analysis tasks, designed to address the limited granularity and domain coverage of current benchmarks. An analysis trajectory is the process by which an agent, through iterative interaction, poses questions, derives insights, and formulates conclusions around a specific analysis goal. Drawing on existing benchmarks and real Kaggle task data, we constructed 151 evaluation datasets spanning eight domains, using an annotation strategy that balances goal-driven and exploratory approaches. We also propose a fine-grained evaluation metric, the analysis trajectory score, to assess an agent's capacity for coherent analysis during end-to-end data analysis tasks. Experimental results show that ATBench is stable and has strong discriminative power, effectively distinguishing performance differences among models on analytical tasks. The results also reveal the limitations of current agents in coherent analysis and insight discovery, providing data-driven guidance for future improvements.
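The abstract names the parts of an analysis trajectory (questions, insights, a conclusion) but does not define how the analysis trajectory score is computed. As a rough illustration only, the Python sketch below models a trajectory as a list of question/insight steps and scores it with an F1 over exact matches against a set of annotated reference insights. The class names `Step` and `Trajectory`, the function `toy_trajectory_score`, and the F1 formulation are all hypothetical assumptions for illustration, not ATBench's data format or metric.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Step:
    """One round of analysis: a question posed and the insight derived from it (hypothetical schema)."""
    question: str
    insight: str


@dataclass
class Trajectory:
    """Hypothetical trajectory: ordered steps toward one goal, ending in a conclusion."""
    goal: str
    steps: list[Step] = field(default_factory=list)
    conclusion: str = ""


def toy_trajectory_score(traj: Trajectory, reference_insights: set[str]) -> float:
    """Illustrative step-level F1; NOT the paper's analysis trajectory score.

    Precision: share of the agent's steps whose insight appears in the reference set.
    Recall: share of reference insights reproduced by at least one step.
    Matching is exact string equality, an assumption made for simplicity.
    """
    produced = [step.insight for step in traj.steps]
    if not produced or not reference_insights:
        return 0.0
    hits = sum(1 for insight in produced if insight in reference_insights)
    precision = hits / len(produced)
    recall = len(set(produced) & reference_insights) / len(reference_insights)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical annotated reference insights for one analysis goal.
    reference = {
        "monthly plans churn more",
        "short tenure predicts churn",
        "support calls predict churn",
    }
    traj = Trajectory(
        goal="Explain customer churn",
        steps=[
            Step("Does churn differ by plan type?", "monthly plans churn more"),
            Step("Is tenure related to churn?", "short tenure predicts churn"),
            Step("How has revenue changed?", "revenue is flat"),
        ],
        conclusion="Plan type and tenure are the main churn drivers.",
    )
    print(round(toy_trajectory_score(traj, reference), 3))  # 0.667
```

In this toy example two of the three steps hit a reference insight (precision 2/3) and two of the three reference insights are found (recall 2/3), so the score is about 0.667. Exact string matching keeps the sketch self-contained; free-text insights produced by an agent would in practice need a more tolerant matching step, which this sketch does not attempt.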

Keywords

agent; data analysis; benchmark

Classification

Information technology and security science

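Cite this article

Wang Xufei, Xu Huarong, Chen Panfeng, Chen Mei, Ma Dan, Chen Zhengxi, Tian Xu, Li Hui. ATBench: Benchmark for evaluating analysis trajectories in end-to-end data analysis[J]. Journal of East China Normal University (Natural Science), 2025, (5): 43-52, 10.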

Funding

National Natural Science Foundation of China (62162010, 72161005)

National Key Research and Development Program of China (2023YFC3341202, 2023YFC3341205)

Journal of East China Normal University (Natural Science)

Open access; Peking University core journal

ISSN 1000-5641
