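1. National Key Laboratory of Space-Air-Ground Integrated Services Networks, Xidian University, Xi'an 710071, Shaanxi, China
2. Singapore University of Technology and Design, Singapore 487372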
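Liu Fei (1996- ), male, born in Lüliang, Shanxi, is a doctoral student at Xidian University. His main research interests include computing power networks, collective communication, and time-varying graph theory.
Wang Peng (1995- ), male, born in Jiyuan, Henan, Ph.D., is a postdoctoral researcher at the Singapore University of Technology and Design. His main research interests include computing power networks, time-deterministic networks, and 6G networks.
Ma Han (1997- ), male, born in Yinchuan, Ningxia, is a doctoral student at Xidian University. His main research interests include time-varying graph theory and resource scheduling in self-organizing networks.
Li Ye (2004- ), female, born in Hunchun, Jilin, is a doctoral student at Xidian University. Her main research interests include time-varying graph theory and collective communication.
Li Hongyan (1966- ), female, born in Xi'an, Shaanxi, Ph.D., is a professor and doctoral supervisor at Xidian University. Her main research interests include time-deterministic networks, space-ground integrated networks, and next-generation wireless local area networks.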
Received: 2025-12-27
Revised: 2026-03-01
Accepted: 2026-03-02
Published in print: 2026-04-20
Liu Fei, Wang Peng, Ma Han, et al. Multicast time-expanded graph-based collective communication scheduling algorithm[J]. Journal on Communications, 2026, 47(04): 67-79. DOI: 10.11959/j.issn.1000-436x.2026062.
Collective communication latency has become a major performance bottleneck in the distributed training of large models. To address the insufficient topology awareness and high scheduling complexity of existing algorithms, a collective communication scheduling algorithm based on a multicast time-expanded graph is proposed. A multicast time-varying graph model is constructed to capture both the topological relationships among nodes and the time-slot conflict constraints. The collective communication scheduling problem is then transformed into a multicast path search problem on the time-varying graph, which enables joint planning of routing and time slots. Simulation results show that the proposed algorithm achieves a near-optimal completion time while significantly reducing computational complexity, making it suitable for large-scale distributed training scenarios.
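To make the idea of scheduling on a time-expanded graph concrete, the following minimal Python sketch may help. It is not the algorithm proposed in the paper: it assumes a simple model in which every link takes exactly one time slot and each node may send to at most one neighbor per slot, and it uses a plain greedy broadcast rather than the paper's multicast path search. The function names `build_time_expanded_graph` and `greedy_broadcast` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_time_expanded_graph(links, horizon):
    """Build a time-expanded graph over slots 0..horizon.

    Vertices are (node, slot) pairs. Assumption: each undirected link
    (u, v) can carry one transmission per slot in either direction.
    """
    nodes = {n for link in links for n in link}
    teg = defaultdict(list)
    for t in range(horizon):
        for v in nodes:
            teg[(v, t)].append((v, t + 1))  # storage edge: data waits at v
        for u, v in links:
            teg[(u, t)].append((v, t + 1))  # transmission edge u -> v
            teg[(v, t)].append((u, t + 1))  # transmission edge v -> u
    return teg


def greedy_broadcast(teg, source, nodes, horizon):
    """Greedily schedule a broadcast from `source` to all `nodes`.

    Assumed conflict rule: a node sends to at most one neighbor per slot.
    Returns (schedule, completion_slot); completion_slot is None if the
    broadcast does not finish within the horizon.
    """
    targets = set(nodes)
    informed = {source}
    schedule = []  # entries are (slot, sender, receiver)
    for t in range(horizon):
        newly = set()
        for u in sorted(informed):
            for v, _next_slot in teg[(u, t)]:
                if v != u and v not in informed and v not in newly:
                    schedule.append((t, u, v))
                    newly.add(v)
                    break  # enforce one transmission per sender per slot
        informed |= newly
        if targets <= informed:
            return schedule, t + 1
    return schedule, None


# Usage on a hypothetical 4-node ring topology.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
teg = build_time_expanded_graph(ring, horizon=4)
schedule, completion = greedy_broadcast(teg, 0, range(4), horizon=4)
print(schedule, completion)
```

Each schedule entry fixes both a route hop (sender, receiver) and the slot in which it is used, which is the sense in which routing and time-slot planning are decided jointly on such a graph.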