
TDQE: A Quality Evaluation Method for Text Data in Deep Learning


大数据, 2025, Vol. 11, Issue 6: 95-107, 13. DOI: 10.11959/j.issn.2096-0271.2025073


罗春旭¹, 熊海旭¹, 叶雅珍¹, 丁滟², 宗世泽², 熊贇¹, 朱扬勇¹

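Author information

  • 1. School of Computer Science, Fudan University, Shanghai 200438; Shanghai Key Laboratory of Data Science, Shanghai 200438
  • 2. College of Computer, National University of Defense Technology, Changsha, Hunan 410073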

Abstract

Text data quality is an important factor affecting the performance of language models, and its evaluation methodology is considered decisive for model training effectiveness. To address the high computational cost and incomplete evaluation metrics of existing text data quality assessment methods, a deep learning-oriented text data quality evaluation (TDQE) method was proposed. Specifically, (1) the Dropout module of a text summarization model was utilized to generate multiple stochastic sub-networks, producing embedded representations of data samples to capture semantic consistency, thereby evaluating sample robustness; (2) a text similarity matching model was employed to compute the alignment between data samples and their summaries, assessing sample accuracy; (3) weighted robustness and accuracy metrics were designed to quantify overall text data quality. Comparative experiments were conducted on public datasets between TDQE and state-of-the-art methods, and the results demonstrated that TDQE outperformed existing mainstream algorithms.
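The three steps in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the toy linear encoder stands in for the text summarization model, cosine similarity stands in for the text similarity matching model, robustness is taken as the mean pairwise cosine similarity across dropout passes, and the final score is a linear blend with a hypothetical weight `alpha`. None of the function names come from the paper.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def dropout_embed(features, weights, rate, rng):
    """Toy stand-in for one stochastic sub-network: drop input units, then project."""
    keep = 1.0 - rate
    # Inverted dropout: zero a unit with probability `rate`, rescale the survivors.
    masked = [x / keep if rng.random() < keep else 0.0 for x in features]
    return [sum(w * x for w, x in zip(row, masked)) for row in weights]


def robustness(features, weights, passes=8, rate=0.1, seed=0):
    """Assumed robustness score: mean pairwise cosine similarity over dropout passes."""
    if passes < 2 or not 0.0 <= rate < 1.0:
        raise ValueError("need passes >= 2 and 0 <= rate < 1")
    rng = random.Random(seed)
    embs = [dropout_embed(features, weights, rate, rng) for _ in range(passes)]
    pairs = [(i, j) for i in range(passes) for j in range(i + 1, passes)]
    return sum(cosine(embs[i], embs[j]) for i, j in pairs) / len(pairs)


def accuracy(sample_emb, summary_emb):
    """Assumed accuracy score: alignment of a sample with its summary via cosine similarity."""
    return cosine(sample_emb, summary_emb)


def quality(rob, acc, alpha=0.5):
    """Assumed overall score: linear blend of the two metrics; alpha is hypothetical."""
    return alpha * rob + (1.0 - alpha) * acc
```

Under these assumptions, a sample whose embedding barely changes across dropout passes scores close to 1 on robustness, and a sample whose summary embedding points in the same direction as its own scores close to 1 on accuracy. Calling `quality(robustness(features, weights), accuracy(sample_emb, summary_emb))` then yields one number per sample that could be used to rank or filter a dataset.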

Key words

deep learning / text data / data quality / quality evaluation

Classification

Computer Science and Automation

Cite this article

罗春旭, 熊海旭, 叶雅珍, 丁滟, 宗世泽, 熊贇, 朱扬勇. TDQE:一种面向深度学习的文本数据质量评估方法[J]. 大数据, 2025, 11(6): 95-107, 13.

大数据, ISSN 2096-0271
