Enlighten Publications

Coordination as inference in multi-agent reinforcement learning

Li, Z., Wu, L., Su, K., Wu, W., Jing, Y., Wu, T., Duan, W., Yue, X., Tong, X. and Han, Y. (2024) Coordination as inference in multi-agent reinforcement learning. Neural Networks, 172, 106101. (doi: 10.1016/j.neunet.2024.106101) (PMID:38232426)

Full text not currently available from Enlighten.

Abstract

The Centralized Training and Decentralized Execution (CTDE) paradigm, in which a centralized critic may access global information during training while the learned policies are executed in a decentralized way using only local information, has achieved great progress in recent years. Despite this progress, CTDE may suffer from Centralized–Decentralized Mismatch (CDM): the suboptimality of one agent’s policy can degrade the policy learning of other agents through the centralized joint critic. In contrast to centralized learning, the cooperative model that most closely resembles the way humans cooperate in nature is fully decentralized, i.e. Independent Learning (IL). However, two issues must be addressed before agents can coordinate through IL: (1) how agents become aware of the presence of other agents, and (2) how agents coordinate with one another to improve the joint policy under IL. In this paper, we propose an inference-based coordinated MARL method: Deep Motor System (DMS). First, DMS introduces individual intention inference, which allows agents to disentangle other agents from their environment. Second, causal inference is introduced to enhance coordination by reasoning about each agent’s effect on others’ behavior. The proposed model was evaluated extensively on a series of Multi-Agent MuJoCo and StarCraft II tasks. Results show that, compared with state-of-the-art baselines including IPPO and HAPPO, the proposed method outperforms independent learning algorithms and that coordination behavior among agents can be learned even without the CTDE paradigm.

Item Type:	Articles
Status:	Published
Refereed:	Yes
Glasgow Author(s) Enlighten ID:	UNSPECIFIED
Authors:	Li, Z., Wu, L., Su, K., Wu, W., Jing, Y., Wu, T., Duan, W., Yue, X., Tong, X., and Han, Y.
College/School:	College of Social Sciences > School of Social and Political Sciences
Journal Name:	Neural Networks
Publisher:	Elsevier
ISSN:	0893-6080

Deposit and Record Details

ID Code:	316917
Depositing User:	Publications Router
Datestamp:	24 Jan 2024 16:32
Last Modified:	25 Jan 2024 02:31
Date of acceptance:	3 January 2024
Date of first online publication:	16 January 2024
Data Availability Statement:	No