A Communication Efficient ADMM-based Distributed Algorithm Using Two-Dimensional Torus Grouping AllReduce

Abstract

Large-scale distributed training consists mainly of sub-model parallel training and parameter synchronization. As the number of training workers grows, the efficiency of parameter synchronization degrades. To tackle this problem, we first propose 2D-TGA, a grouping AllReduce method based on the two-dimensional torus topology, which synchronizes model parameters by groups and makes full use of the available bandwidth. Secondly, we propose a distributed algorithm, 2D-TGA-ADMM, which combines 2D-TGA with the alternating direction method of multipliers (ADMM); it focuses on sub-model training and reduces the waiting time among workers during synchronization. Finally, experimental results on the Tianhe-2 supercomputing platform show that, compared with MPI_Allreduce, 2D-TGA can shorten the synchronization waiting time by 33%.

Publication
In Data Science and Engineering

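To make the grouping idea concrete, below is a minimal sketch of a two-stage (row-then-column) grouped AllReduce over a 2D process grid, written with mpi4py and NumPy. The grid shape (R x C), the parameter vector size, and the final averaging step are illustrative assumptions; this only shows the generic grouping pattern, not the paper's exact 2D-TGA schedule or the 2D-TGA-ADMM update rules.

```python
# Sketch: grouped AllReduce over a 2D process grid (assumptions noted inline).
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Hypothetical grid shape for illustration: R rows x C columns (size must be even here).
R, C = 2, size // 2
row, col = rank // C, rank % C

# Split the global communicator into row groups and column groups.
row_comm = comm.Split(color=row, key=col)
col_comm = comm.Split(color=col, key=row)

# Stand-in for the local sub-model parameters produced by ADMM-style local training.
local_params = np.random.rand(1024).astype(np.float64)

# Stage 1: sum the parameters within each row group.
row_sum = np.empty_like(local_params)
row_comm.Allreduce(local_params, row_sum, op=MPI.SUM)

# Stage 2: sum the row results within each column group; every worker now holds
# the global sum, which is averaged to obtain synchronized parameters
# (the averaging step corresponds to the consensus update in ADMM-style training).
global_sum = np.empty_like(local_params)
col_comm.Allreduce(row_sum, global_sum, op=MPI.SUM)
global_avg = global_sum / size
```

Splitting one global AllReduce into row-wise and column-wise stages keeps each collective inside a smaller group, which is the bandwidth-friendly grouping behavior the abstract attributes to 2D-TGA; the paper's torus-specific scheduling details are not reproduced here.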