Xin, X., Karatzoglou, A., Arapakis, I. and Jose, J. M. (2022) Supervised Advantage Actor-Critic for Recommender Systems. In: 15th ACM International Conference on Web Search and Data Mining, Phoenix, Arizona, 21-25 Feb 2022, pp. 1186-1196. ISBN 9781450391320 (doi: 10.1145/3488560.3498494)
Abstract
Casting session-based or sequential recommendation as reinforcement learning (RL) through reward signals is a promising research direction towards recommender systems (RS) that maximize cumulative profits. However, the direct use of RL algorithms in the RS setting is impractical due to challenges such as off-policy training, huge action spaces and a lack of sufficient reward signals. Recent RL approaches for RS attempt to tackle these challenges by combining RL and (self-)supervised sequential learning, but still suffer from certain limitations. For example, the estimation of Q-values tends to be biased toward positive values due to the lack of negative reward signals. Moreover, the Q-values also depend heavily on the specific timestamp of a sequence. To address the above problems, we propose a negative sampling strategy for training the RL component and combine it with supervised sequential learning. We call this method Supervised Negative Q-learning (SNQN). Based on sampled (negative) actions (items), we can calculate the "advantage" of a positive action over the average case, which can be further utilized as a normalized weight for learning the supervised sequential part. This leads to another learning framework: Supervised Advantage Actor-Critic (SA2C). We instantiate SNQN and SA2C with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets. Experimental results show that the proposed approaches achieve significantly better performance than state-of-the-art supervised methods and existing self-supervised RL methods.
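The advantage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy Q-values and logits, and the choice to average over the positive item together with the sampled negatives are all assumptions made for illustration, and the normalization step mentioned in the abstract is left out because the abstract does not specify it.

```python
import math
import random

def sample_negatives(num_items, positive, k, rng):
    """Draw k distinct item ids that differ from the observed (positive) item."""
    candidates = [i for i in range(num_items) if i != positive]
    return rng.sample(candidates, k)

def advantage(q_values, positive, negatives):
    """Q-value of the positive item minus the mean Q-value of the pool.

    Assumption: the 'average case' is the mean over the positive item
    plus the sampled negatives; the abstract does not fix this detail.
    """
    pool = [positive] + list(negatives)
    baseline = sum(q_values[i] for i in pool) / len(pool)
    return q_values[positive] - baseline

def weighted_cross_entropy(logits, positive, weight):
    """Softmax cross-entropy for the positive item, scaled by `weight`."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return weight * (log_z - logits[positive])

# Toy values (hypothetical): one state, five candidate items.
rng = random.Random(0)
q_values = [0.2, 1.0, 0.4, 0.1, 0.3]   # critic output per item
logits = [0.0, 2.0, 0.5, -1.0, 0.1]    # supervised head output per item
positive = 1                            # item the user actually interacted with

negatives = sample_negatives(len(q_values), positive, k=2, rng=rng)
adv = advantage(q_values, positive, negatives)
loss = weighted_cross_entropy(logits, positive, adv)
```

In this sketch a positive item whose Q-value is well above the sampled average contributes more to the supervised loss, while one close to the average contributes little.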
Item Type: | Conference Proceedings |
---|---|
Additional Information: | Xin Xin was supported by the Natural Science Foundation of China (62106105, 62102234, 62072279, 61902219, 61972234), the National Key R&D Program of China (2020YFB1406704), the Key Scientific and Technological Innovation Program of Shandong Province (2019JZZY010129), the Natural Science Foundation of Shandong Province (ZR2021QF129), and the Fundamental Research Funds of Shandong University. |
Status: | Published |
Refereed: | Yes |
Glasgow Author(s) Enlighten ID: | Jose, Professor Joemon and Xin, Xin |
Authors: | Xin, X., Karatzoglou, A., Arapakis, I., and Jose, J. M. |
College/School: | College of Science and Engineering > School of Computing Science |
ISBN: | 9781450391320 |
Copyright Holders: | Copyright © 2022 Association for Computing Machinery |
Publisher Policy: | Reproduced in accordance with the publisher copyright policy |