Abstract
With the growing deployment of mobile agents on smartphones, there is still no unified automated framework for evaluating their functionality, performance, and safety. To address this gap, this paper proposes MESA, a unified automated evaluation framework for mobile agents. First, a multi-dimensional metric system is constructed to define evaluation standards. Then, based on this system, an end-to-end automated mechanism is built that generates user-level tasks, drives agent operations, collects interaction data, and automatically assesses task completion and safety risks. Finally, a unified interface and abstraction model are designed to enable cross-platform adaptation. Comparative experiments against existing baseline evaluation methods show that MESA provides a finer-grained characterization of task completion, a higher degree of automation in the evaluation pipeline, and broader coverage of functional evaluation metrics. Using MESA, we evaluate ten mainstream mobile agents from China and abroad: the average task completion rate is 30.74% and the average latency is 13.49 s. In terms of safety, the incidence of prompt-injection attacks is 15.72%, the average data leakage and privacy violation rate is 23.28%, the average unauthorized tool use and permission abuse rate is 19.64%, and the average instruction drift and task interruption rate is 37.10%. These results validate the effectiveness of the proposed framework and provide a unified tool and methodology for evaluating mobile agents.
Key words
mobile agents/automated evaluation/multimodal evaluation/safety risks/task completion rate
Classification
Information Technology and Security Science
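The abstract reports task completion, latency, and four safety-risk rates averaged over ten agents, but it does not state how per-task outcomes are aggregated into those figures. The following Python sketch shows one plausible aggregation. It assumes that each evaluated task yields a completion verdict, an end-to-end latency, and a set of triggered risk categories, and that the reported numbers are macro-averages (the mean of per-agent rates). The record schema `EpisodeRecord`, the category names, the functions `per_agent_metrics` and `macro_average`, and the toy records are all hypothetical. They are not MESA's actual interface or data.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical risk-category identifiers mirroring the four safety metrics in the abstract.
RISK_CATEGORIES = ("prompt_injection", "data_leakage", "unauthorized_tool_use", "instruction_drift")


@dataclass
class EpisodeRecord:
    """One evaluated task for one agent (hypothetical schema, not MESA's)."""
    agent: str
    completed: bool               # judge's verdict on task completion
    latency_s: float              # end-to-end latency in seconds
    risk_flags: frozenset = frozenset()  # risk categories flagged during the episode


def per_agent_metrics(records):
    """Compute completion rate, mean latency, and per-category risk rates for each agent."""
    by_agent = {}
    for r in records:
        by_agent.setdefault(r.agent, []).append(r)
    out = {}
    for agent, rs in by_agent.items():
        m = {
            "completion_rate": sum(r.completed for r in rs) / len(rs),
            "mean_latency_s": mean(r.latency_s for r in rs),
        }
        # Fraction of this agent's tasks in which each risk category was flagged.
        for cat in RISK_CATEGORIES:
            m[cat] = sum(cat in r.risk_flags for r in rs) / len(rs)
        out[agent] = m
    return out


def macro_average(per_agent):
    """Average each metric across agents, weighting every agent equally (assumed aggregation)."""
    keys = next(iter(per_agent.values())).keys()
    return {k: mean(m[k] for m in per_agent.values()) for k in keys}


# Toy records for illustration only; these are not results from the paper.
records = [
    EpisodeRecord("agent_a", True, 10.0),
    EpisodeRecord("agent_a", False, 20.0, frozenset({"prompt_injection"})),
    EpisodeRecord("agent_b", False, 12.0, frozenset({"instruction_drift"})),
    EpisodeRecord("agent_b", False, 8.0),
]
per_agent = per_agent_metrics(records)
avg = macro_average(per_agent)
print(avg)
```

Macro-averaging weights each agent equally regardless of how many tasks it ran. If the paper instead pools all tasks before averaging (a micro-average), `per_agent_metrics` would be applied to the full record list with a single shared agent key.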