Liu Fei,Wang Peng,Ma Han,et al.Multicast time-expanded graph-based collective communication scheduling algorithm[J].Journal on Communications,2026,47(04):67-79.
Liu Fei,Wang Peng,Ma Han,et al.Multicast time-expanded graph-based collective communication scheduling algorithm[J].Journal on Communications,2026,47(04):67-79. DOI: 10.11959/j.issn.1000-436x.2026062.
Multicast time-expanded graph-based collective communication scheduling algorithm
Collective communication latency has become a major performance bottleneck in large-scale distributed training. To address the limitations of existing algorithms in terms of insufficient topology awareness and high computational complexity
a topology-aware collective communication scheduling algorithm based on multicast time-expanded graph was proposed. By constructing a multicast time-varying graph model
both network topology constraints and time-slot conflict constraints were explicitly captured. The collective communication scheduling problem was then transformed into a multicast path search problem on the time-varying graph
enabling joint optimization of routing and time slot planning. Simulation results demonstrate that the proposed algorithm achieves near-optimal communication completion time while significantly reducing computational complexity
making it suitable for large-scale distributed training scenarios.
Gao X , Dong B , Xiao Q , et al . Challenges and development trends of collective communication libraries in intelligent computing scenarios [J ] . Telecommunications Science , 2025 , 41 ( 4 ): 81 - 94 .
Liu T R , Hei C Y , Li F L , et al . ResCCL: resource-efficient scheduling for collective communication [C ] // Proceedings of the ACM SIGCOMM 2025 Conference . New York : ACM Press , 2025 : 55 - 70 .
Hui L H , Yang W , Wu F , et al . DirectReduce: a scalable ring AllReduce offloading architecture for torus topologies [J ] . IEEE Internet of Things Journal , 2025 , 12 ( 16 ): 32951 - 32964 .
Zhang Z X , Wen Y B , Lyu H Q , et al . AI computing systems for large language models training [J ] . Journal of Computer Science and Technology , 2025 , 40 ( 1 ): 6 - 41 .
Geng J K , Li D , Cheng Y , et al . HiPS: hierarchical parameter synchronization in large-scale distributed machine learning [C ] // Proceedings of the 2018 Workshop on Network Meets AI & ML . New York : ACM Press , 2018 : 1 - 7 .
Jiang Y H , Gu H X , Lu Y F , et al . 2D-HRA: two-dimensional hierarchical ring-based all-reduce algorithm in large-scale distributed machine learning [J ] . IEEE Access , 2020 , 8 : 183488 - 183494 .
Pjesivac G J . Towards automatic and adaptive optimizations of MPI collective operations [D ] . Knoxville : The University of Tennessee, Knoxville , 2007 .
Sanders P , Speck J , Träff J L . Two-tree algorithms for full bandwidth broadcast, reduction and scan [J ] . Parallel Computing , 2009 , 35 ( 12 ): 581 - 594 .
Weingram A , Li Y K , Qi H , et al . xCCL: a survey of industry-led collective communication libraries for deep learning [J ] . Journal of Computer Science and Technology , 2023 , 38 ( 1 ): 166 - 195 .
Cavazzoni C . EURORA: a European architecture toward exascale [C ] // Proceedings of the Future HPC Systems: the Challenges of Power-Constrained Performance . New York : ACM Press , 2012 : 1 - 4 .
Cho M , Finkler U , Serrano M , et al . BlueConnect: decomposing all-reduce for deep learning on heterogeneous network hierarchy [J ] . IBM Journal of Research and Development , 2019 , 63 ( 6 ): 1 - 11 .
Kalb J L , Lee D S . Network topology analysis [R ] . 2008 .
Cai Z X , Liu Z Y , Maleki S , et al . Synthesizing optimal collective algorithms [C ] // Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming . New York : ACM Press , 2021 : 62 - 75 .
Shah A , Chidambaram V , Cowan M , et al . TACCL: Guiding collective algorithm synthesis using communication sketches [C ] // Proceedings of the 20th NSDI-2023 USENIX Symposium on Networked Systems Design and Implementation . Berkeley : USENIX Association , 2023 : 593 - 612 .
Liu X T , Arzani B , Kakarla S K R , et al . Rethinking machine learning collective communication as a multi-commodity flow problem [C ] // Proceedings of the ACM SIGCOMM 2024 Conference . New York : ACM Press , 2024 : 16 - 37 .
Takahashi H , Matsuyama A . An approximate solution for steiner problem in graphs [J ] . Math. Japonica , 1980 , 24 ( 6 ): 573 - 577 .