大数据 (Big Data Research), 2025, Vol. 11, Issue (6): 95-107, 13. DOI: 10.11959/j.issn.2096-0271.2025073
TDQE: a quality evaluation method for text data in deep learning
罗春旭 1, 熊海旭 1, 叶雅珍 1, 丁滟 2, 宗世泽 2, 熊贇 1, 朱扬勇 1
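Author information
- 1. School of Computer Science, Fudan University, Shanghai 200438, China; Shanghai Key Laboratory of Data Science, Shanghai 200438, China
- 2. College of Computer, National University of Defense Technology, Changsha 410073, Hunan, China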
Abstract
Text data quality is an important factor affecting the performance of language models, and the way it is evaluated is considered decisive for model training effectiveness. To address the high computational cost and incomplete evaluation metrics of existing text data quality assessment methods, a deep learning-oriented text data quality evaluation (TDQE) method is proposed. Specifically, (1) the Dropout module of a text summarization model is used to generate multiple stochastic sub-networks, which produce embedded representations of each data sample; the semantic consistency of these representations is used to evaluate sample robustness; (2) a text similarity matching model is employed to compute the alignment between each data sample and its summary, which is used to assess sample accuracy; (3) the robustness and accuracy metrics are weighted and combined to quantify overall text data quality. Comparative experiments between TDQE and state-of-the-art methods were conducted on public datasets, and the results show that TDQE outperforms existing mainstream algorithms.
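The abstract does not give formulas, so the following is only a minimal sketch of the three-step scoring idea under stated assumptions, not the authors' implementation. It assumes that robustness is the mean pairwise cosine similarity of embeddings from several dropout passes, that accuracy is the cosine similarity between a sample embedding and its summary embedding, and that the two are combined by a convex weight. The function names (dropout_embedding, robustness, accuracy, quality), the weight alpha, and the toy feature vectors are all hypothetical; a real pipeline would obtain the embeddings from a summarization model with dropout left active.

```python
import math
import random
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def dropout_embedding(features, p, rng):
    """One stochastic pass: inverted dropout applied to a feature vector.

    Assumption: this stands in for one dropout sub-network of a real model.
    """
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in features]


def robustness(features, passes=8, p=0.1, seed=0):
    """Mean pairwise cosine similarity across several dropout passes (assumed metric)."""
    if passes < 2:
        raise ValueError("at least two passes are needed to compare embeddings")
    rng = random.Random(seed)
    embeddings = [dropout_embedding(features, p, rng) for _ in range(passes)]
    pairs = list(combinations(embeddings, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def accuracy(sample_embedding, summary_embedding):
    """Sample-to-summary alignment, here plain cosine similarity (assumed metric)."""
    return cosine(sample_embedding, summary_embedding)


def quality(rob, acc, alpha=0.5):
    """Weighted combination of the two scores; alpha is a hypothetical weight."""
    return alpha * rob + (1.0 - alpha) * acc


# Toy vectors standing in for model embeddings (illustrative values only).
sample = [0.9, 0.1, 0.4, 0.7, 0.2, 0.8]
summary = [0.8, 0.2, 0.5, 0.6, 0.1, 0.9]
score = quality(robustness(sample), accuracy(sample, summary), alpha=0.6)
print(f"quality score: {score:.3f}")
```

Under these assumptions, a sample whose embeddings barely change across dropout passes scores close to 1 on robustness, and a sample that its summary represents well scores close to 1 on accuracy. How the paper actually weights the two metrics is not stated in the abstract.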
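Keywords (Chinese, translated)
deep learning / text data / data quality / quality evaluation
Key words
deep learning / text data / data evaluation / quality evaluation
Classification
Computer science and automation
Cite this article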
罗春旭, 熊海旭, 叶雅珍, 丁滟, 宗世泽, 熊贇, 朱扬勇. TDQE: a quality evaluation method for text data in deep learning [J]. 大数据 (Big Data Research), 2025, 11(6): 95-107, 13.