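A conceptual reinforcement learning framework based on language-assisted tasks (Open Access; Peking University Core Journal; CSTPCD)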

Conceptual reinforcement learning for language-assisted tasks

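Abstract (translated from Chinese)

Language-based reinforcement learning tasks can promote the generalization of reinforcement learning policies; the key problem is to automatically learn a general representation of observations and language descriptions. Existing methods often learn a joint representation implicitly, which inevitably introduces spurious correlations from the training set and in turn harms the policy's generalization and training efficiency. To address this problem, this paper proposes the conceptual reinforcement learning framework (CRL). Drawing on conceptualization, the cognitive process of extracting similarities from entities to form abstract representations, CRL uses an attention-based concept encoder and restrictive loss functions to explicitly learn general, abstract conceptual representations that serve as the input to the reinforcement learning policy. CRL is validated on commonly used language-conditioned tasks and text-game tasks. The results show that the conceptual representation substantially improves the policy's training efficiency (by up to 70%) and generalization performance (by up to 30%), and also improves the policy's interpretability.

Abstract (English)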

Language-assisted tasks have been proposed to improve the generalization ability of reinforcement learning policies. The key question is how to learn a representation that generalizes across different scenarios. Existing studies often learn a joint representation implicitly, which may capture spurious correlations and consequently compromise the policy's generalization performance and training efficiency. To address this issue, a conceptual reinforcement learning framework (CRL) is proposed. Motivated by the human cognitive process of extracting similarities from numerous instances to form conceptual abstractions, CRL incorporates a multi-level attention encoder and restricted loss functions to learn a compact and invariant conceptual representation for the policy. Evaluated on challenging language-assisted tasks, CRL significantly improves the policy's training efficiency (by up to 70%) and generalization ability (by up to 30%). Additionally, the conceptual representation shows better interpretability than other representations.
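The abstracts describe the concept encoder only at a high level. As a rough illustration of the general idea of attention-based concept pooling, and not the authors' encoder, the sketch below assumes a fixed set of k learned concept-query vectors that attend over embedded observation and instruction tokens via scaled dot-product attention, so that any number of input tokens is summarized as exactly k concept vectors. The names `concept_attention`, `queries`, and `tokens`, the toy dimensions, and all values are hypothetical; the paper's multi-level structure, the training of the queries, and the restricted (mutual-information) losses are not modeled.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def concept_attention(queries: Matrix, tokens: Matrix):
    """Pool a variable-length token sequence into len(queries) concept vectors.

    queries: k hypothetical learned concept-query vectors, each of dimension d.
    tokens:  n embedded observation/instruction tokens, each of dimension d.
    Returns (concepts, weights): concepts is k x d, weights is k x n.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    concepts, weights = [], []
    for q in queries:
        # Scaled dot-product score of this concept query against every token.
        scores = [scale * sum(qi * ti for qi, ti in zip(q, t)) for t in tokens]
        w = softmax(scores)
        # The concept vector is the attention-weighted sum of the token vectors.
        c = [sum(w[j] * tokens[j][i] for j in range(len(tokens))) for i in range(d)]
        concepts.append(c)
        weights.append(w)
    return concepts, weights


# Toy usage: 2 concept queries, 3 tokens, d = 2 (values are made up).
queries = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
concepts, weights = concept_attention(queries, tokens)
print([round(x, 3) for x in weights[0]])
```

Each row of `weights` shows which tokens a concept draws on, which is one common way such pooled representations are inspected for interpretability; whether CRL exposes its concepts in exactly this way is not stated in the abstract.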

Authors: Peng Shaohui (彭少辉); Hu Xing (胡杏); Zhi Tian (支天)

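Affiliations: State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049; Cambricon Technologies Corporation Limited, Beijing 100080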

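Keywords (translated from Chinese): deep reinforcement learning (DRL); language-assisted reinforcement learning task; text game; representation learning; mutual information optimization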

Keywords (English): deep reinforcement learning (DRL); language-assisted reinforcement learning task; text game; representation learning; mutual information

Journal: Chinese High Technology Letters (《高技术通讯》), 2024, no. 6

Pages / page count: 555-566 / 12

Funding: Supported by the National Natural Science Foundation of China (62002338, U20A20227, U22A2028) and the CAS Project for Young Scientists in Basic Research (YSBR-029).

DOI: 10.3772/j.issn.1002-0470.2024.06.001
