An End-to-End Automated Evaluation Framework for On-Device Mobile Agents (面向移动端智能体的端到端自动化评测框架)

Zhang Guanghua (张光华), Wang Yu (王宇), Zhang Siyuan (张思远), Chu Yijie (褚一杰)

Software Guide (软件导刊), 2026, Vol. 25, Issue 3: 9-18, 10. DOI: 10.11907/rjdk.251724

Zhang Guanghua 1, Wang Yu 1, Zhang Siyuan 2, Chu Yijie 2

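Author information

  • 1. School of Information Science and Engineering, Hebei University of Science and Technology
  • 2. FedUni Information Engineering Institute (澳联大信息工程学院), Hebei University of Science and Technology, Shijiazhuang, Hebei 050018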

Abstract

As mobile agents are increasingly deployed on smartphones, a unified automated framework for evaluating their functionality, performance, and safety is still lacking. To address this, this paper proposes MESA, a unified automated evaluation framework for mobile agents. First, a multi-dimensional metric system is constructed to define the evaluation standards. Then, based on this system, an end-to-end automated mechanism is built to generate user-level tasks, drive agent operations, collect interaction data, and automatically assess task completion and safety risks. Finally, a unified interface and abstraction model are designed to enable cross-platform adaptation. Comparative experiments against existing baseline evaluation methods show that MESA provides a finer-grained characterization of task completion, a higher degree of automation in the evaluation pipeline, and broader coverage of functional evaluation metrics. Using MESA, we evaluate ten mainstream domestic and international mobile agents; the average task completion rate is 30.74% and the average latency is 13.49 s. In terms of safety, the incidence of prompt-injection attacks is 15.72%, the average data leakage and privacy violation rate is 23.28%, the average unauthorized tool use and permission abuse rate is 19.64%, and the average instruction drift and task interruption rate is 37.10%. These results validate the effectiveness of the proposed framework and provide a unified tool and methodology for evaluating mobile agents.
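The abstract reports its results as averages over evaluated task runs (completion rate, latency, and safety incident rates). A minimal sketch of that kind of aggregation is shown here in Python. This is not MESA's implementation: the `Episode` record, its field names, and the `summarize` helper are all hypothetical, and the sample values are toy data, not results from the paper.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Episode:
    """One evaluated task run; every field name here is hypothetical."""
    completed: bool            # did the agent finish the user-level task?
    latency_s: float           # end-to-end latency in seconds
    injection_succeeded: bool  # did a prompt-injection probe succeed?


def summarize(episodes: list[Episode]) -> dict[str, float]:
    """Aggregate per-episode records into percentage rates and mean latency."""
    if not episodes:
        raise ValueError("need at least one episode")
    n = len(episodes)
    return {
        "task_completion_rate_pct": 100.0 * sum(e.completed for e in episodes) / n,
        "avg_latency_s": mean(e.latency_s for e in episodes),
        "prompt_injection_rate_pct": 100.0 * sum(e.injection_succeeded for e in episodes) / n,
    }


# Toy records for illustration only; these are not results from the paper.
toy = [
    Episode(True, 10.0, False),
    Episode(False, 14.0, True),
    Episode(False, 12.0, False),
    Episode(True, 16.0, False),
]
print(summarize(toy))
```

Keeping one record per task run and deriving every reported figure from the same list is one simple way to keep functional, performance, and safety metrics consistent with each other; whether the paper structures its data this way is an assumption.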

Keywords

mobile agents; automated evaluation; multimodal evaluation; safety risks; task completion rate

Classification

Information technology and safety science

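Cite this article

Zhang Guanghua, Wang Yu, Zhang Siyuan, Chu Yijie. An End-to-End Automated Evaluation Framework for On-Device Mobile Agents [J]. Software Guide, 2026, 25(3): 9-18, 10.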

Software Guide (软件导刊)

ISSN 1672-7800
