SADMA: Scalable Asynchronous Distributed Multi-Agent Reinforcement Learning Training Framework
Sizhe Wang*, Long Qian*, Cairun Yi,
Fan Wu, Qian Kou,
Mingyang Li, Xingyu Chen, Xuguang Lan†
*Equal contribution †Corresponding author
Abstract
Multi-agent Reinforcement Learning (MARL) has shown significant success in solving large-scale, complex decision-making problems, but it faces the challenge of rapidly growing computational cost and training time.
MARL algorithms often require extensive environment exploration to achieve good performance, especially in complex environments, where the interaction frequency and the synchronous training scheme can severely limit overall training speed.
Most existing RL training frameworks that use distributed training for acceleration focus on simple single-agent settings and do not scale to large MARL scenarios.
To address this problem, we introduce a Scalable Asynchronous Distributed Multi-Agent RL training framework called SADMA, which modularizes the training process and executes the modules in an asynchronous and distributed manner for efficient training.
Our framework is highly scalable and provides an efficient solution for distributed training of multi-agent reinforcement learning in large-scale, complex environments.
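To make the asynchronous scheme concrete, below is a minimal Python sketch; it is not SADMA's actual API. Hypothetical rollout workers and a learner are decoupled through a bounded queue, so environment interaction never waits on gradient updates. All function and variable names here are illustrative stand-ins.

import multiprocessing as mp
import random

def rollout_worker(batch_queue, stop_event):
    # Hypothetical environment-interaction module: keeps producing
    # trajectory batches without blocking on gradient updates.
    while not stop_event.is_set():
        batch = [random.random() for _ in range(32)]  # stand-in for a trajectory batch
        batch_queue.put(batch)

def learner(batch_queue, stop_event, total_updates=100):
    # Hypothetical training module: consumes batches as they arrive,
    # independent of how fast individual environments step.
    for _ in range(total_updates):
        batch = batch_queue.get()        # blocks only while no data is ready
        loss = sum(batch) / len(batch)   # stand-in for a gradient update
    stop_event.set()

if __name__ == "__main__":
    queue = mp.Queue(maxsize=64)         # bounded queue applies back-pressure
    stop = mp.Event()
    actors = [mp.Process(target=rollout_worker, args=(queue, stop)) for _ in range(4)]
    for p in actors:
        p.start()
    learner(queue, stop)
    for p in actors:
        p.terminate()
        p.join()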
Flexible Resource Allocation
Benefiting from the modularized design and a unified data transfer interface, modules can be flexibly combined with one another and assigned to different computing nodes in the cluster, regardless of hardware restrictions.
This facilitates deployment on clusters with different resource configurations: the framework naturally adapts to the available resources and can therefore fully utilize the cluster to accelerate training.
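As an illustration only, a deployment of this kind can be described by a simple placement map. The sketch below is hypothetical and does not reflect SADMA's real configuration schema; the module names (env_workers, inference, learner) and fields are assumed.

# Hypothetical placement map: module name -> (node, device, replicas).
# The structure is illustrative, not SADMA's actual config format.
placement = {
    "env_workers": {"node": "cpu-node-01", "device": "cpu",    "replicas": 32},
    "inference":   {"node": "gpu-node-01", "device": "cuda:0", "replicas": 2},
    "learner":     {"node": "gpu-node-02", "device": "cuda:0", "replicas": 1},
}

def launch(placement):
    # Each module replica is started on its assigned node; because modules
    # communicate only through a unified data-transfer interface, the same
    # code runs whether the nodes share one machine or span a cluster.
    for name, spec in placement.items():
        for rank in range(spec["replicas"]):
            print(f"starting {name}[{rank}] on {spec['node']} ({spec['device']})")

launch(placement)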
Experiments
Throughput Comparisons
We compare SADMA with baseline frameworks under different resource configurations, in both single-machine and multi-machine settings.
Convergence Acceleration
We compare the wall-clock time each framework takes to converge the algorithm under the same resource configuration.
Scalability Evaluation
To evaluate the scalability of SADMA in large-scale multi-agent environments, we constructed an environment containing 1,225 agents based on the CityFlow environment, as well as a replenishment environment containing 1,000 agents.
Citation
@inproceedings{wang2024SADMA,
  title={{SADMA}: Scalable Asynchronous Distributed Multi-Agent Reinforcement Learning Training Framework},
  author={Sizhe Wang and Long Qian and Cairun Yi and Fan Wu and Qian Kou and Mingyang Li and Xingyu Chen and Xuguang Lan},
  booktitle={12th International Workshop on Engineering Multi-Agent Systems},
  pages={31--47},
  year={2024},
}
Wang, S., Qian, L., Yi, C., Wu, F., Kou, Q., Li, M., Chen, X., and Lan, X.
SADMA: Scalable Asynchronous Distributed Multi-Agent Reinforcement Learning Training Framework.
In Proceedings of the 12th International Workshop on Engineering Multi-Agent Systems, co-located with AAMAS 2024,
pages 31-47, Auckland, New Zealand, May 2024. URL: https://emas.in.tu-clausthal.de/2024/.