Cloud providers use SDN to build virtual networks. This puts a burden on the host CPU, especially as NICs get faster and faster (40GbE+):

As a large public cloud provider, Azure has built its cloud network on host-based software-defined networking (SDN) technologies, using them to implement almost all virtual networking features, such as private virtual networks with customer supplied address spaces, scalable L4 load balancers, security groups and access control lists (ACLs), virtual routing tables, bandwidth metering, QoS, and more.

SDN policies change rapidly, so it’s not feasible to implement them all in NIC hardware. Thus plain SR-IOV does not work (I guess a fixed-function ASIC is unsuitable for the same reason):

Single Root I/O Virtualization (SR-IOV) [4, 5] has been proposed to reduce CPU utilization by allowing direct access to NIC hardware from the VM. However, this direct access would bypass the host SDN stack, making the NIC responsible for implementing all SDN policies. Since these policies change rapidly (weeks to months), we required a solution that could provide software-like programmability while providing hardware-like performance.

Generic Flow Tables (GFT), a match-action language, is used to make VFP’s complex policies compatible with SR-IOV. The big idea is control-plane/data-plane separation: a new flow (table miss) invokes the VFP software and a JIT flow compiler, while packets of an existing flow go straight through SR-IOV.
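
As an aside, here is a minimal Python sketch of this flow-caching idea (goal 2 below restates it). All names here (FlowKey, Action, slow_path) are hypothetical, not Azure’s API; the point is only the structure: a table miss falls back to the full software policy stack, and the resulting exact-match entry is what gets offloaded to hardware.

```python
# Hypothetical sketch of GFT-style flow caching; names are made up for illustration.
from dataclasses import dataclass

@dataclass(frozen=True)
class FlowKey:
    """Exact-match key: the TCP/UDP 5-tuple."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str

@dataclass
class Action:
    """Composed result of all VFP layers (NAT, ACL, routing, encap, ...)."""
    rewrite_dst: str
    encap_vni: int

# Stands in for the hardware Generic Flow Table reachable over SR-IOV.
flow_table: dict[FlowKey, Action] = {}

def slow_path(key: FlowKey) -> Action:
    """Run the full SDN policy stack in VFP software (control plane)."""
    # ACLs, load balancing, metering, virtual routing would all be evaluated here.
    return Action(rewrite_dst="10.0.0.5", encap_vni=42)

def process_packet(key: FlowKey) -> Action:
    """Data plane: an exact-match hit stays in hardware; a miss goes to software once."""
    action = flow_table.get(key)
    if action is None:            # new flow -> invoke VFP software, then cache
        action = slow_path(key)
        flow_table[key] = action  # subsequent packets of this flow bypass software
    return action
```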

Design Goals:

  1. Don’t burn host CPU cores: burning even one physical core for SDN could cost up to roughly $4500 over the lifetime of a server.

    At the time of writing this paper, a physical core (2 hyperthreads) sells for $0.10-0.11/hr, or a maximum potential revenue of around $900/yr, and $4500 over the lifetime of a server (servers typically last 3 to 5 years in our datacenters). Even considering that some fraction of cores are unsold at any time and that clouds typically offer customers a discount for committed capacity purchases, using even one physical core for host networking is quite expensive compared to dedicated hardware.

    This is an excellent way to show that burning host CPU is costly. It also implies that if a SmartNIC costs less than $4500, it is worth putting one on a server (see the back-of-the-envelope check after this list).

  2. Maintain host SDN programmability of VFP

    Offloading all rules is unnecessary. Most SDN policies do not change during the duration of a flow. So all policies can be enforced in VFP software on the first packet of a new TCP/UDP flow, after which the actions for that flow can be cached as an exact-match lookup.

  3. Achieve the latency, throughput, and utilization of SR-IOV hardware

    SR-IOV is fast. So let’s make use of it.

  4. Support new SDN workloads and primitives over time

    VFP keeps evolving, so this design needs to be able to keep up with it.

  5. Rollout new functionality to the entire fleet

    All servers in the fleet are going to have this thing, so new functionality has to roll out everywhere.

  6. Provide high single-connection performance

    An explicit goal of AccelNet is to allow applications to achieve near-peak bandwidths without parallelizing the network processing in their application.

  7. Have a path to scale to 100GbE+

    The design needs to be future-proof.

  8. Retain Serviceability

    Servicing needs to be transparent: SmartNIC updates should not require rebooting hosts or disrupting running VMs.
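
A quick back-of-the-envelope check of the $900/yr and $4500 figures from goal 1 (the hourly rates and server lifetime are the paper’s; the arithmetic below just multiplies them out):

```python
# Sanity check of the paper's per-core revenue figures; the rates and lifetime
# are quoted from the paper, nothing here is measured.
hours_per_year = 24 * 365                                     # 8760 hours
per_year = [rate * hours_per_year for rate in (0.10, 0.11)]   # ~$876-$964, i.e. "around $900/yr"
over_lifetime = [revenue * 5 for revenue in per_year]         # ~$4380-$4818, i.e. "~$4500"
print(per_year, over_lifetime)                                # roughly [876.0, 963.6] [4380.0, 4818.0]
```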

Hardware options:

  • ASIC-based NICs: high performance but not flexible
  • Multicore SoC-based NICs: easy to program and perform well at 10GbE, but hard to scale to 40GbE and beyond.
  • FPGAs: a balanced solution between performance and programmability. Microsoft also has prior experience deploying FPGAs at scale for Bing (Project Catapult).
  • Host cores: expensive (see goal 1) and incur high performance overhead.

FPGAs:

  1. Aren’t FPGAs much bigger than ASICs?

    Basically only the packet-processing logic needs to be programmable FPGA fabric; the parts that take up most of the die area (SRAM, transceivers, and other I/O blocks) are already hardened.

  2. Aren’t FPGAs very expensive?

    The FPGA market is competitive. Azure is able to purchase at significant volumes.

  3. Aren’t FPGAs hard to program?

    we built our own FPGA team in Azure Networking for SmartNIC. […] with the AccelNet team averaging fewer than 5 FPGA developers on the project at any given time. [Previous FPGA deployments in Microsoft] demonstrate that programming FPGAs is very tractable for production-scale cloud workloads.

  4. Can FPGAs be deployed at hyperscale?

    This project would not have been feasible without the prior Catapult work.

  5. Isn’t my code locked in to a single FPGA vendor?

    They use SystemVerilog, which is not tied to a single vendor; Project Catapult also has some relevant porting experience.

Because we don’t have to scale across multiple cores, we believe we can scale single connection performance to 50/100Gb line rate using our current architecture as we continue to increase network speeds with ever-larger VMs.

Question: but they didn’t justify how. Just as noted earlier, multicore SoC-based NICs work well at 10GbE yet are very hard to scale up.

I love the writing of this paper. It explains the problem, goals, options, and justification for their choice, step by step. It’s very easy to follow, and I learned a lot of background knowledge from it. I especially love the way they show that burning host CPU cores is costly. Although the $4500 cost per server lifetime might be overestimated, I think it is a good estimate of the amortized budget for a SmartNIC, i.e., if the SmartNIC costs less than $4500, it’s definitely worth deploying.

One thing I doubt is that they “believe” the SmartNIC will scale well to 100Gb+, but they don’t justify why. They mention earlier that multicore SoC-based NICs work very well and look very promising at 10GbE, yet that design fails to scale to 40Gb. I can see that the AccelNet design uses lots of good system design principles (like control-plane/data-plane separation and letting hardware do the heavy lifting), but that doesn’t directly translate into what they “believe”.