Speaker: Wei Bai (白巍), Microsoft Research Redmond
Host: Qun Huang (黄群)
Time: 14:00 (2:00 p.m.), May 5, 2023, GMT+8
Venue: Room 1126, Science Building No. 1 (理科一号楼1126)
Abstract:
Given the wide adoption of disaggregated storage in public clouds, networking is the key to enabling high performance and high reliability in a cloud storage service. In Azure, we chose Remote Direct Memory Access (RDMA) as our transport and aim to enable it for both storage frontend traffic (between compute virtual machines and storage clusters) and backend traffic (within a storage cluster) to fully realize its benefits. As compute and storage clusters may be located in different datacenters within an Azure region, we need to support RDMA at regional scale.
In this talk, I will present our experience in deploying intra-region RDMA to support storage workloads in Azure. The high complexity and heterogeneity of our infrastructure bring a series of new challenges, such as interoperability between different types of RDMA network interface cards. We have made several changes to our network infrastructure to address these challenges. Today, around 70% of traffic in Azure is RDMA, and intra-region RDMA is supported in all Azure public regions. RDMA helps us achieve significant disk I/O performance improvements and CPU core savings.
Biography:
Wei Bai is a senior researcher in the Networking Research Group at Microsoft Research Redmond. He received his PhD degree in computer science from Hong Kong University of Science and Technology. Wei is broadly interested in computer networking, with a special focus on data center networking. His research work has been published in many top conferences and journals, such as SIGCOMM, NSDI, and IEEE/ACM Transactions on Networking. Wei also has rich experience in developing and operating production cloud networks. Currently, he is mainly focusing on high-performance networking for storage, general compute, and AI supercomputers.
Source: School of Computer Sciences