Supervised Advantage Actor-Critic for Recommender Systems

Xin, X., Karatzoglou, A., Arapakis, I. and Jose, J. M. (2022) Supervised Advantage Actor-Critic for Recommender Systems. In: 15th ACM International Conference on Web Search and Data Mining, Phoenix, Arizona, 21-25 Feb 2022, pp. 1186-1196. ISBN 9781450391320 (doi: 10.1145/3488560.3498494)

Text: 257042.pdf - Accepted Version (842kB)

Abstract

Casting session-based or sequential recommendation as reinforcement learning (RL) through reward signals is a promising research direction towards recommender systems (RS) that maximize cumulative profits. However, the direct use of RL algorithms in the RS setting is impractical due to challenges such as off-policy training, huge action spaces and a lack of sufficient reward signals. Recent RL approaches for RS attempt to tackle these challenges by combining RL and (self-)supervised sequential learning, but still suffer from certain limitations. For example, the estimation of Q-values tends to be biased toward positive values due to the lack of negative reward signals. Moreover, the Q-values also depend heavily on the specific timestamp of a sequence. To address the above problems, we propose a negative sampling strategy for training the RL component and combine it with supervised sequential learning. We call this method Supervised Negative Q-learning (SNQN). Based on sampled (negative) actions (items), we can calculate the "advantage" of a positive action over the average case, which can be further utilized as a normalized weight for learning the supervised sequential part. This leads to another learning framework: Supervised Advantage Actor-Critic (SA2C). We instantiate SNQN and SA2C with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets. Experimental results show that the proposed approaches achieve significantly better performance than state-of-the-art supervised methods and existing self-supervised RL methods.
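
The advantage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the uniform sampling of negatives, the choice to include the positive action in the average, and the exact form of the weighted loss are all assumptions made for illustration, since the abstract does not specify them.

```python
import random


def sample_negatives(num_items, positive, k, rng):
    """Uniformly sample k distinct item ids other than the positive item.

    Uniform sampling is an assumption; the abstract does not say how
    negative actions are drawn.
    """
    candidates = [i for i in range(num_items) if i != positive]
    return rng.sample(candidates, k)


def advantage(q_values, positive, negatives):
    """Q-value of the positive action minus an average-case baseline.

    The baseline here is the mean Q-value over the positive action and
    the sampled negatives (an assumed reading of "the average case").
    """
    sampled = [positive] + list(negatives)
    baseline = sum(q_values[a] for a in sampled) / len(sampled)
    return q_values[positive] - baseline


def weighted_supervised_loss(log_prob_positive, adv):
    """Negative log-likelihood of the positive item, scaled by the advantage.

    Hypothetical form of "advantage as a weight for the supervised part".
    """
    return -adv * log_prob_positive


# Toy usage with made-up Q-values for a catalogue of four items.
q = [0.2, 1.0, 0.4, 0.6]
negs = sample_negatives(num_items=4, positive=1, k=2, rng=random.Random(0))
adv = advantage(q, positive=1, negatives=negs)
loss = weighted_supervised_loss(log_prob_positive=-0.5, adv=adv)
```

In this sketch a positive item whose Q-value sits well above the sampled average receives a larger weight in the supervised loss, which is the intuition the abstract describes for SA2C.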

Item Type: Conference Proceedings
Additional Information: Xin Xin was supported by the Natural Science Foundation of China (62106105, 62102234, 62072279, 61902219, 61972234), the National Key R&D Program of China (2020YFB1406704), the Key Scientific and Technological Innovation Program of Shandong Province (2019JZZY010129), the Natural Science Foundation of Shandong Province (ZR2021QF129), and the Fundamental Research Funds of Shandong University.
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Jose, Professor Joemon and Xin, Xin
Authors: Xin, X., Karatzoglou, A., Arapakis, I., and Jose, J. M.
College/School: College of Science and Engineering
College of Science and Engineering > School of Computing Science
ISBN: 9781450391320
Copyright Holders: Copyright © 2022 Association for Computing Machinery
Publisher Policy: Reproduced in accordance with the publisher copyright policy