A Deep Reinforcement Learning Watermarking Framework Based on Backdoor Technology
Deep reinforcement learning (DRL) has proven effective in a wide range of complex tasks, and its strong performance is rapidly accelerating its commercialization. Producing a DRL model requires substantial computational resources and expertise, so a well-trained DRL model has become core intellectual property of artificial intelligence applications and products. To protect DRL models against illegal plagiarism, unauthorized distribution, and copying, we propose DrlWF, a DRL watermarking framework based on backdoor technology, and introduce a new evaluation metric, the watermark action execution rate, to measure watermarking performance. We embed a watermark pattern into training states and train the model on these watermarked states, thereby embedding the watermark into the model. Embedding requires watermarking only a small fraction of the training data (0.025%) together with a reward modification that does not affect model performance. Experimental results show that the DRL model still performs well on normal states, whereas on watermarked states its performance drops sharply to less than 1% of the original and the watermark action execution rate reaches 99%. This sharp performance drop, together with the model's actions on watermarked states, verifies ownership of the model. Furthermore, the watermark is robust: after model fine-tuning and model compression, the model still recognizes the watermark, its performance still drops sharply, and the watermark action execution rate remains above 99%.
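The embedding and verification steps described in the abstract can be illustrated with a minimal sketch. It is not the DrlWF implementation: the trigger (a bright square patch in the top-left corner), the designated action `WATERMARK_ACTION`, the +1/-1 reward rewrite, and the ownership thresholds are all hypothetical assumptions, since the abstract does not specify them. Only the 0.025% embedding fraction and the 1% / 99% figures come from the text. All function names (`stamp_watermark`, `maybe_watermark_transition`, `watermark_action_execution_rate`, `verify_ownership`, `toy_policy`) are illustrative, and states are plain nested lists so the example needs only the standard library.

```python
import random
from typing import Callable, List, Sequence, Tuple

State = List[List[float]]  # a 2-D observation, e.g. a grayscale frame

# All constants below are hypothetical choices for illustration only.
PATCH = 3                # side length of the square trigger patch
WATERMARK_VALUE = 1.0    # pixel value written into the patch
WATERMARK_ACTION = 0     # action the owner wants the agent to take on a trigger
EMBED_RATE = 0.00025     # 0.025% of training samples, the fraction in the abstract


def stamp_watermark(state: State) -> State:
    """Return a copy of `state` with the trigger patch written into its top-left corner."""
    marked = [row[:] for row in state]
    for i in range(min(PATCH, len(marked))):
        for j in range(min(PATCH, len(marked[i]))):
            marked[i][j] = WATERMARK_VALUE
    return marked


def maybe_watermark_transition(
    state: State,
    action: int,
    reward: float,
    rng: random.Random,
    embed_rate: float = EMBED_RATE,
) -> Tuple[State, float, bool]:
    """With probability `embed_rate`, swap in the watermarked state and rewrite
    the reward (assumed +1 for WATERMARK_ACTION, -1 otherwise).
    Returns (state, reward, was_watermarked); other transitions pass through unchanged."""
    if rng.random() < embed_rate:
        new_reward = 1.0 if action == WATERMARK_ACTION else -1.0
        return stamp_watermark(state), new_reward, True
    return state, reward, False


def watermark_action_execution_rate(
    policy: Callable[[State], int], states: Sequence[State]
) -> float:
    """Fraction of watermarked copies of `states` on which `policy` picks WATERMARK_ACTION."""
    if not states:
        return 0.0
    hits = sum(policy(stamp_watermark(s)) == WATERMARK_ACTION for s in states)
    return hits / len(states)


def verify_ownership(
    clean_return: float,
    watermarked_return: float,
    wm_rate: float,
    max_ratio: float = 0.01,
    min_rate: float = 0.99,
) -> bool:
    """Hypothetical decision rule built from the figures reported in the abstract:
    return on watermarked states below 1% of the clean return AND an execution
    rate of at least 99%. Assumes clean_return > 0."""
    return watermarked_return < max_ratio * clean_return and wm_rate >= min_rate


def toy_policy(state: State) -> int:
    """Stand-in for a trained watermarked agent: it reacts only to the trigger."""
    triggered = all(
        state[i][j] == WATERMARK_VALUE for i in range(PATCH) for j in range(PATCH)
    )
    return WATERMARK_ACTION if triggered else 2


if __name__ == "__main__":
    clean_states = [[[0.0] * 8 for _ in range(8)] for _ in range(100)]
    print(watermark_action_execution_rate(toy_policy, clean_states))  # 1.0
```

The reward rewrite touches only the rare watermarked transitions, which is consistent with the abstract's claim that performance on normal states is preserved. In a real environment this logic would more plausibly sit in a wrapper around the environment's `step` call (and would also need to handle the next state), and `policy` would be the trained network's greedy action rather than a hand-written rule.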
Chen Yulin, Yao Zhiqiang, Jin Biao, Li Xuan, Cai Juanjuan, Xiong Jinbo
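College of Computer and Cyber Security, Fujian Normal University, Fuzhou, Fujian 350117; Fujian Provincial Engineering Research Center of Big Data Analysis and Application, Fuzhou, Fujian 350117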
Subject category: Computer and Automation
Keywords: deep reinforcement learning; intellectual property protection; backdoor attack; neural network watermarking; black-box model
Journal of Fujian Normal University (Natural Science Edition), 2024, no. 1, pp. 96-105
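Funding: National Natural Science Foundation of China (62272103, 62272102); Natural Science Foundation of Fujian Province (2023J01531); Research Project for Young and Middle-aged Teachers of the Fujian Provincial Department of Education (JAT220045)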