
Analyzing the Fidelity and Bias of Large Language Models in Simulated Samples for Educational Research

杜君磊 李爽 陈靖茜 李晶

远程教育杂志 (Journal of Distance Education), 2025, Vol. 43, Issue 3, pp. 73-86 (14 pages). DOI: 10.15881/j.cnki.cn33-1304/g4.2025.03.008


Analyzing the Fidelity and Bias of Large Language Models in Simulated Samples for Educational Research: A Case Study on Online Self-Regulated Learning


Author information

  • 1. Research Center of Distance Education, Beijing Normal University, Beijing 100875


Abstract

Simulating research samples with large language models (LLMs) has emerged as a promising paradigm in educational research. However, few studies have systematically assessed the fidelity of such simulated samples by comparing them directly with real-world data. This study focuses on self-regulated learning in blended learning environments. Two types of prompts were developed: one based on students' basic demographic information and subject grades (basic model), and one that additionally includes indicators of online learning behavior (enhanced model). Using a real dataset of 173 seventh- and eighth-grade students, the study systematically evaluates how well three LLMs (GLM-4-plus, GPT-4o, and o1-preview) generate simulated samples. The fidelity of the simulated data was assessed in terms of reliability and validity, data distribution, and hypothesis-testing outcomes. The results indicate that GPT-4o and o1-preview exhibit higher fidelity in internal consistency, gender distribution, subgroup representation, and correlations with online behavioral indicators, whereas GLM-4-plus performs comparatively poorly. Moreover, enhanced prompts significantly improve structural validity and reduce gender-related bias. Nonetheless, discrepancies remain between LLM-generated and human data, particularly in diversity and inter-variable relationships. This study provides empirical insights into the "machine reasoning" of LLMs when modeling self-regulated learning competence and offers methodological guidance for using LLM-generated samples in educational research.
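Internal consistency is one of the fidelity criteria listed in the abstract. The snippet below is a minimal sketch of how one might compare that property between a real and an LLM-simulated questionnaire sample. It rests on several assumptions: Cronbach's alpha is used as the internal-consistency statistic (the abstract does not name the coefficient), responses are stored as respondents × items matrices, and `cronbach_alpha` and `alpha_gap` are hypothetical helpers, not the authors' code.

```python
from statistics import pvariance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)).
    Population variances are used throughout. The n/(n-1) correction would
    cancel in the ratio, so sample variances give the same result.
    """
    if not responses or len(responses[0]) < 2:
        raise ValueError("need at least one respondent and two items")
    k = len(responses[0])
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return k / (k - 1) * (1 - item_var_sum / total_var)


def alpha_gap(real: list[list[float]], simulated: list[list[float]]) -> float:
    """Hypothetical fidelity metric: simulated alpha minus real alpha.

    A positive value means the simulated sample looks more internally
    consistent than the human sample.
    """
    return cronbach_alpha(simulated) - cronbach_alpha(real)


if __name__ == "__main__":
    # Made-up 5-point Likert toy data (3 items). This is NOT data from the study.
    real = [[4, 3, 5], [2, 2, 3], [5, 4, 4], [3, 3, 2], [4, 5, 4]]
    simulated = [[4, 4, 4], [3, 3, 3], [5, 5, 5], [3, 3, 3], [4, 4, 4]]
    print(f"real alpha:      {cronbach_alpha(real):.3f}")
    print(f"simulated alpha: {cronbach_alpha(simulated):.3f}")
    print(f"gap:             {alpha_gap(real, simulated):+.3f}")
```

In the toy `simulated` rows, every respondent gives the same score on all items. This makes alpha equal 1, up to floating-point rounding. An inflated coefficient of this kind is one way that low response diversity can surface, so a high alpha alone does not demonstrate fidelity to a human sample.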


Keywords

Large language models; educational research methodology; simulated samples; online self-regulated learning

Classification

Social sciences

Cite this article

杜君磊,李爽,陈靖茜,李晶.大语言模型在教育研究样本模拟中的保真度与偏差分析[J].远程教育杂志,2025,43(3):73-86,14.

Funding

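This article is an output of the project "Large-Scale Cross-Stage Growth Tracking Study of Students" (Project No. 2021YFC3340800), a 2021 open-competition (揭榜挂帅) project under the "Science and Technology Support for Social Governance and Smart Society" program of the National Key Research and Development Program of China.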

远程教育杂志 (Journal of Distance Education)

Open access (OA) · Peking University Core Journal

ISSN 1672-0008
