Partially Observable Reinforcement Learning for Dialog-based Interactive Recommendation

Wu, Y., Macdonald, C. and Ounis, I. (2021) Partially Observable Reinforcement Learning for Dialog-based Interactive Recommendation. In: 15th ACM Conference on Recommender Systems (RecSys21), Amsterdam, The Netherlands, 27 Sep - 01 Oct 2021, pp. 241-251. (doi: 10.1145/3460231.3474256)

Text: 246701.pdf (Accepted Version)

In a dialog-based interactive recommendation task, users can express natural-language feedback when interacting with the recommender system. However, this feedback, which takes the form of natural-language critiques about the recommendation at each iteration, gives the recommender system only a partial portrayal of the users’ preferences. Such partial observations make it challenging to correctly track the users’ preferences over time, which can result in poor recommendation performance and a less effective satisfaction of the users’ information needs when the number of iterations is limited. Reinforcement learning, in the form of a partially observable Markov decision process (POMDP), can model the interactions between a partially observable environment (i.e. a user) and an agent (i.e. a recommender system). To alleviate this partial observation issue, we propose a novel dialog-based recommendation model, the Estimator-Generator-Evaluator (EGE) model, with Q-learning for POMDP, to effectively incorporate the users’ preferences over time. Specifically, we leverage an Estimator to track and estimate users’ preferences, a Generator to match the estimated preferences with the candidate items to rank the next recommendations, and an Evaluator to judge the quality of the estimated preferences considering the users’ historical feedback. Following previous work, we train our EGE model by using a user simulator which itself is trained to describe the differences between the target users’ preferences and the recommended items in natural language. Extensive experiments conducted on two recommendation datasets of fashion product images (namely dresses and shoes) demonstrate that our proposed EGE model yields significant improvements in comparison to the existing state-of-the-art baseline models.
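
The division of labour between the three components named in the abstract can be pictured with a small toy loop. The sketch below is purely illustrative and is not the authors' EGE model: it uses hand-written rules instead of learned networks and Q-learning, represents items as numeric attribute vectors instead of images, and replaces the natural-language critique with a single (attribute, value, difference) tuple. All names (simulate_user, estimator, generator, evaluator, run_dialog) are hypothetical. Partial observability is mimicked by letting each critique reveal only one attribute of the target.

```python
from typing import List, Tuple

Vector = Tuple[float, ...]
# Toy critique: (attribute index, recommended item's value, signed difference to target).
Critique = Tuple[int, float, float]


def simulate_user(target: Vector, item: Vector) -> Critique:
    """Toy user simulator: comment only on the attribute that differs most."""
    diffs = [t - x for t, x in zip(target, item)]
    dim = max(range(len(diffs)), key=lambda d: abs(diffs[d]))
    return dim, item[dim], diffs[dim]


def estimator(state: Vector, critique: Critique) -> Vector:
    """Toy Estimator: update the preference estimate for the critiqued attribute."""
    dim, value, delta = critique
    updated = list(state)
    updated[dim] = value + delta
    return tuple(updated)


def generator(state: Vector, candidates: List[Vector]) -> int:
    """Toy Generator: return the index of the candidate closest to the estimate."""
    def sq_dist(item: Vector) -> float:
        return sum((s - x) ** 2 for s, x in zip(state, item))
    return min(range(len(candidates)), key=lambda i: sq_dist(candidates[i]))


def evaluator(state: Vector, history: List[Critique]) -> float:
    """Toy Evaluator: fraction of past critiques the current estimate agrees with."""
    if not history:
        return 1.0
    agree = sum(1 for dim, value, delta in history
                if abs(state[dim] - (value + delta)) < 1e-9)
    return agree / len(history)


def run_dialog(target_index: int, candidates: List[Vector], max_turns: int = 10):
    """Run the recommend-critique-update loop; return shown item indices and final score."""
    target = candidates[target_index]
    state: Vector = tuple(0.0 for _ in target)  # assumed neutral starting estimate
    history: List[Critique] = []
    shown: List[int] = []
    for _ in range(max_turns):
        idx = generator(state, candidates)
        shown.append(idx)
        if idx == target_index:
            break
        critique = simulate_user(target, candidates[idx])
        history.append(critique)
        state = estimator(state, critique)
    return shown, evaluator(state, history)
```

For example, with the four candidates (0.0, 0.0), (1.0, 0.0), (0.0, 1.0) and (1.0, 1.0) and the last one as the target, run_dialog(3, candidates) recommends items 0, 1 and 3 in turn, because each critique reveals one more attribute of the target.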

Item Type: Conference Proceedings
Additional Information: The authors acknowledge support from EPSRC grant EP/R018634/1 entitled Closed-Loop Data Science for Complex, Computationally and Data-Intensive Analytics.
Glasgow Author(s) Enlighten ID: Macdonald, Professor Craig and Wu, Mr Yaxiong and Ounis, Professor Iadh
Authors: Wu, Y., Macdonald, C., and Ounis, I.
College/School: College of Science and Engineering > School of Computing Science
Copyright Holders: Copyright © 2021 Association for Computing Machinery
Publisher Policy: Reproduced in accordance with the copyright policy of the publisher

Project Code: 300982
Award No:
Project Name: Exploiting Closed-Loop Aspects in Computationally and Data Intensive Analytics
Principal Investigator: Roderick Murray-Smith
Funder's Name: Engineering and Physical Sciences Research Council (EPSRC)
Funder Ref: EP/R018634/1
Lead Dept: Computing Science