Extraction and Analysis of Chinese-Tibetan New Words from News Texts (面向新闻文本的汉藏新词抽取及分析)

Pang Xian (庞仙), Chen Bo (陈波), Zhao Xiaobing (赵小兵)

Acta Scientiarum Naturalium Universitatis Pekinensis (北京大学学报(自然科学版)), 2025, Vol. 61, Issue 1: 45-52 (8 pages). DOI: 10.13209/j.0479-8023.2025.001

Pang Xian (庞仙)¹, Chen Bo (陈波)², Zhao Xiaobing (赵小兵)²

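Author information

  • 1. Institute of Applied Linguistics, Ministry of Education, Beijing 100010; School of Literature, Capital Normal University, Beijing 100089; National Language Resources Monitoring and Research Center for Minority Languages, Minzu University of China, Beijing 100081
  • 2. National Language Resources Monitoring and Research Center for Minority Languages, Minzu University of China, Beijing 100081; School of Information Engineering, Minzu University of China, Beijing 100081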

Abstract

This paper proposes an effective unsupervised method for extracting new words from news texts. The method combines the unsupervised TopWORDS algorithm with the word segmentation tool PKUSEG, supplemented by a heuristic word extraction method, to extract the new words of a given year from Chinese and Tibetan news texts. For 2022, a total of 606 Chinese new words and 664 Tibetan new words are extracted. In terms of efficiency, the method reduces the workload of manual selection and significantly speeds up new word extraction. In terms of effect, compared with the 2022 Chinese new words published in "Language Situation in China: 2023", the new words extracted by this method show clear advantages in both quantity and language coverage. In addition, the paper aligns the Chinese and Tibetan new words and presents a case study from the perspective of how new words develop and are used.
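The abstract does not specify the heuristic extraction step, so the following is only a minimal sketch of one common filtering idea, not the authors' procedure: keep tokens that occur often enough in the current year's segmented texts and are absent from a baseline vocabulary. The function name `candidate_new_words`, the thresholds `min_count` and `min_len`, and the toy tokens are all assumptions made for illustration; the input is assumed to be already segmented (for example by a tool such as PKUSEG).

```python
from collections import Counter


def candidate_new_words(current_docs, baseline_vocab, min_count=2, min_len=2):
    """Return (word, count) pairs for tokens that appear at least
    `min_count` times in `current_docs`, have at least `min_len`
    characters, and are not in `baseline_vocab`.

    Hypothetical helper: thresholds and the baseline comparison are
    illustrative assumptions, not taken from the paper.
    """
    # Count every token across all segmented documents.
    counts = Counter(tok for doc in current_docs for tok in doc)
    candidates = [
        (word, count)
        for word, count in counts.items()
        if count >= min_count
        and len(word) >= min_len
        and word not in baseline_vocab
    ]
    # Most frequent first; ties broken alphabetically for stable output.
    return sorted(candidates, key=lambda wc: (-wc[1], wc[0]))


# Toy, already-segmented documents with placeholder tokens.
docs_2022 = [
    ["alpha", "betaword", "newterm"],
    ["newterm", "alpha", "x"],
    ["rare", "newterm"],
]
baseline = {"alpha", "betaword"}

result = candidate_new_words(docs_2022, baseline)
print(result)
```

Candidates produced this way would still need the manual selection that the abstract mentions; the filter only narrows down what a reviewer has to look at.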

Key words

news text/Chinese/Tibetan/new word extraction

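Cite this article

Pang Xian, Chen Bo, Zhao Xiaobing. Extraction and Analysis of Chinese-Tibetan New Words from News Texts [J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2025, 61(1): 45-52.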

Funding

Supported by the Major Program of the National Social Science Fund of China (22&ZD035)

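Acta Scientiarum Naturalium Universitatis Pekinensis (北京大学学报(自然科学版))

Open access; Peking University Core Journal (北大核心)

ISSN 0479-8023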
