Journal of East China Normal University (Natural Science), Issue (5): 43-52, 10. DOI: 10.3969/j.issn.1000-5641.2025.05.005
ATBench: Benchmark for evaluating analysis trajectories in end-to-end data analysis
Abstract
This paper introduces ATBench, a benchmark for evaluating analysis trajectories in end-to-end data analysis tasks, which addresses the limited granularity and domain coverage of current benchmarks. An analysis trajectory is the process in which an agent, through iterative interactions, poses questions, derives insights, and formulates conclusions around a specific analysis goal. Drawing on both existing benchmarks and real Kaggle task data, we constructed 151 evaluation datasets spanning eight distinct domains, using an annotation strategy that balances goal-driven and exploratory approaches. We also propose a fine-grained evaluation metric, the analysis trajectory score, to assess an agent's capacity for coherent analysis during end-to-end data analysis tasks. Experimental results demonstrate that ATBench exhibits strong stability and discriminative power, effectively distinguishing performance differences among models on analytical tasks. The results also reveal limitations in agents' abilities for coherent analysis and insight discovery, thereby providing data-driven support for future improvements.
Keywords
agent / data analysis / benchmark
Classification
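Information Technology and Security Science
Cite this article
WANG Xufei, XU Huarong, CHEN Panfeng, CHEN Mei, MA Dan, CHEN Zhengxi, TIAN Xu, LI Hui. ATBench: Benchmark for evaluating analysis trajectories in end-to-end data analysis [J]. Journal of East China Normal University (Natural Science), 2025, (5): 43-52, 10.
Funding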
National Natural Science Foundation of China (62162010, 72161005)
National Key Research and Development Program of China (2023YFC3341202, 2023YFC3341205)