
Cross-lingual social event detection based on federated knowledge distillation

周帅帅 朱恩昌 高盛祥 余正涛 线岩团 赵子霄 陈霖

Journal of Tsinghua University (Science and Technology), 2025, Vol. 65, Issue 5: 854-866, 13. DOI: 10.16511/j.cnki.qhdxxb.2024.21.035


Author information

  • 1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China || Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China

Abstract

[Objective] Social event detection involves identifying trending events from various social event contents and has garnered widespread attention in recent years. However, in practical scenarios, the performance of social event detection suffers when data are scarce. Moreover, privacy concerns have led to regulatory restrictions that prevent organizations from sharing data without explicit user consent, which makes centralized training impractical. Current approaches primarily address data scarcity through cross-lingual knowledge transfer but often ignore the challenges associated with data privacy. Consequently, this study proposes a framework for cross-lingual social event detection based on federated knowledge distillation, referred to as FedEvent, with the goal of distilling knowledge from high-resource clients to low-resource ones.

[Methods] The framework employs parameter-efficient fine-tuning techniques and triple contrastive losses to effectively map non-English semantic spaces to the English one, and it adopts a federated distillation strategy to ensure data privacy. In addition, a four-stage lifecycle mechanism is designed to adapt to incremental scenarios. The four stages are pre-training of high-resource clients, pre-training of low-resource clients, client detection, and maintenance. In the first stage, a specific algorithm is used to train the initial model. In the second stage, the model trained in the first stage uses the central server as a medium for knowledge transfer to help low-resource clients train their initial models. In the third stage, the trained model directly detects each incoming message. In the fourth stage, the model is continuously trained on the latest message blocks, and a federated co-distillation mechanism enables online learning on each client so that the model can acquire new knowledge. Based on the large-scale public English dataset Events2012, this study supplements event samples in Chinese and Vietnamese according to its topic descriptions and constructs private datasets for two low-resource clients (a Chinese-language client and a Vietnamese-language client), thereby establishing a multi-client, cross-lingual experimental environment. The performance of the framework is evaluated on these datasets with two widely used clustering metrics: normalized mutual information (NMI) and adjusted mutual information (AMI).

[Results] 1) The experimental results demonstrate that, compared with existing methods, the proposed framework achieves effective knowledge transfer while ensuring data privacy. 2) Compared with the single-node setup, the framework shows notable gains in the multi-node environment: on the Chinese-language client, improvements range from 1.6% to 204.3% in NMI and from 2.0% to 342.9% in AMI; on the Vietnamese-language client, the respective improvements range from 2.3% to 6 200% and from 0 to 4 400%. 3) The proposed framework performs on par with the state-of-the-art centralized method cross-lingual knowledge distillation (CLKD). 4) Case analysis reveals that FedEvent's strong performance on clusters held by high-resource clients carries over to analogous clusters on low-resource clients, demonstrating that FedEvent can effectively transfer knowledge from high-resource clients to low-resource ones. Furthermore, visual analysis vividly highlights the superior clustering outcomes achieved by FedEvent.

[Conclusions] The framework employs a lifecycle mechanism to meet the needs of event detection in both online and offline scenarios, transferring knowledge effectively through process and outcome supervision. Using knowledge distillation techniques, it mitigates the challenges of low-resource languages and leverages a cross-lingual word embedding module to map semantic spaces between languages. The proposed framework achieves the expected results, notably enhancing the model's performance.
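The federated co-distillation mechanism described in the abstract lets clients exchange soft predictions through the central server instead of raw messages. A minimal, library-free sketch of the student-side distillation signal, the temperature-scaled KL term standard in knowledge distillation, is given below; the temperature `T = 2.0` and the toy logits are illustrative assumptions, and the paper's actual objective additionally involves triple contrastive losses not shown here.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in standard knowledge distillation."""
    p = softmax(teacher_logits, T)   # teacher soft targets
    q = softmax(student_logits, T)   # student soft predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return T * T * kl

# Zero when the student matches the teacher, positive otherwise.
matched  = distillation_loss([1.0, 2.0, 0.5], [1.0, 2.0, 0.5])
mismatch = distillation_loss([0.1, 0.2, 3.0], [1.0, 2.0, 0.5])
```

Because only logits (or the resulting soft labels) cross client boundaries, no private message content leaves a client, which is the property the federated strategy relies on.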
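The NMI metric used in the evaluation can be computed directly from the contingency counts of two labelings. The stdlib-only sketch below uses arithmetic-mean normalization (a common convention); the toy cluster assignments are illustrative, and AMI, which additionally corrects for chance agreement, is not implemented here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (nats) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_info(a, b):
    """Mutual information (nats) between two labelings of the same items."""
    n = len(a)
    joint, ca, cb = Counter(zip(a, b)), Counter(a), Counter(b)
    return sum((nxy / n) * math.log(nxy * n / (ca[x] * cb[y]))
               for (x, y), nxy in joint.items())

def nmi(a, b):
    """NMI with arithmetic-mean normalization: MI / ((H(a) + H(b)) / 2)."""
    h = (entropy(a) + entropy(b)) / 2
    return mutual_info(a, b) / h if h > 0 else 1.0

true_events = [0, 0, 1, 1, 2, 2]
relabeled   = [5, 5, 7, 7, 9, 9]   # same grouping under different labels
unrelated   = [0, 1, 0, 1, 0, 1]   # grouping independent of the events
```

NMI is 1 for the relabeled clustering (it is invariant to label permutation, which is why it suits event clustering, where cluster IDs are arbitrary) and 0 for the independent one.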

Key words

social event detection / low-resource / federated knowledge distillation / cross-lingual knowledge transfer

Classification

Information technology and security science

Citation

周帅帅, 朱恩昌, 高盛祥, 余正涛, 线岩团, 赵子霄, 陈霖. Cross-lingual social event detection based on federated knowledge distillation [J]. Journal of Tsinghua University (Science and Technology), 2025, 65(5): 854-866, 13.

Funding

Key Program of the Joint Funds of the National Natural Science Foundation of China (U23A20388)

General Program of the National Natural Science Foundation of China (62376111)

Regional Program of the National Natural Science Foundation of China (62266028, 62266027)

Journal of Tsinghua University (Science and Technology), ISSN 1000-0054 (open access; Peking University core journal)
