Comparative Study of Large Language Models and Student Performance in Exams (Open Access)
This study examines the potential of Qwen (通义千问), an AI chatbot driven by a large language model (LLM), in educational assessment. Using 2,190 final examination questions from the "Probability and Mathematical Statistics" course at a university from 2019 to 2023, eight teachers scored, under double-blind conditions, the answers produced by the Qwen model, by the optimized model, and by students. The results show that Qwen performs stably on multiple-choice questions but leaves considerable room for improvement on free-response questions; after Prompt Engineering optimization, its performance on free-response questions improves significantly. Teachers scored AI-generated content more strictly, and scores were significantly affected by both question type and the answering subject. The study provides empirical evidence for AI-assisted educational assessment and emphasizes the importance of updating assessment standards and exploring new assessment models.
凌达莲;冯诗颖;陈思楠;潘伟权
School of Mathematics and Statistics, Yulin Normal University, Yulin, Guangxi 537000 (all authors)
Computer Science and Automation
LLM; Qwen; educational assessment; AI-assisted learning
《现代信息科技》 (Modern Information Technology), 2025 (12)
pp. 50-57, 62 (9 pages)
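2023 Guangxi Higher Education Undergraduate Teaching Reform Project (2023JGB329); College Students' Innovation and Entrepreneurship Training Program (202110606020)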