An End-to-End Automated Evaluation Framework for On-Device Mobile Agents

Zhang Guanghua (张光华)¹, Wang Yu (王宇)¹, Zhang Siyuan (张思远)², Chu Yijie (褚一杰)²

Software Guide (软件导刊), 2026, Vol. 25, Issue 3: 9-18, 10. DOI: 10.11907/rjdk.251724

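Author information

1. School of Information Science and Engineering, Hebei University of Science and Technology
2. Federation University Australia joint School of Information Engineering, Hebei University of Science and Technology (河北科技大学澳联大信息工程学院), Shijiazhuang, Hebei 050018, China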

Abstract

With the growing deployment of mobile agents on smartphones, there remains a lack of a unified automated framework for evaluating their functionality, performance, and safety. To address this, this paper proposes MESA, a unified automated evaluation framework for mobile agents. First, a multi-dimensional metric system is constructed to define evaluation standards. Then, based on this system, an end-to-end automated mechanism is built to generate user-level tasks, drive agent operations, collect interaction data, and automatically assess task completion and safety risks. Finally, a unified interface and abstraction model are designed to enable cross-platform adaptation. Comparative experiments with existing baseline evaluation methods show that MESA provides finer-grained characterization of task completion, a higher degree of automation in the evaluation pipeline, and broader coverage of functional evaluation metrics. Using MESA, we evaluate ten mainstream domestic and international mobile agents; the average task completion rate is 30.74% and the average latency is 13.49 s. In terms of safety, the incidence of prompt-injection attacks is 15.72%, the average data leakage and privacy violation rate is 23.28%, the average unauthorized tool use and permission abuse rate is 19.64%, and the average instruction drift and task interruption rate is 37.10%. These results validate the effectiveness of the proposed framework and provide a unified tool and methodology for evaluating mobile agents.

Key words

mobile agents / automated evaluation / multimodal evaluation / safety risks / task completion rate

Classification

Information technology and safety science

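Cite this article

Zhang Guanghua, Wang Yu, Zhang Siyuan, Chu Yijie. An End-to-End Automated Evaluation Framework for On-Device Mobile Agents[J]. Software Guide (软件导刊), 2026, 25(3): 9-18, 10.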

Software Guide (软件导刊)

ISSN: 1672-7800
