• Existing system optimizations targeting nanosecond-scale (e.g., memory access) and millisecond-scale (e.g., disk reads) events are inadequate for events in the microsecond range.
  • Google strongly prefers a synchronous programming model
  • Simple and consistent synchronous APIs and idioms across different languages
  • Shifting the burden of managing asynchronous events away from the programmer to the operating system or the thread library makes the code significantly simpler (a sketch of the contrast follows this list).
  • Microsecond-scale sources: fast NICs, fast flash devices, non-volatile memory, GPU/accelerator offload
  • Nice figures: Table 1 and Figure 1.
  • “A 2015 paper summarizing a multiyear longitudinal study at Google [10] showed that 20%–25% of fleetwide processor cycles are spent on low-level overheads we call the “datacenter tax.” Examples include serialization and deserialization of data, memory allocation and de-allocation, network stack costs, compression, and encryption.” Also see Atul Adya’s HotOS’19 paper.
  • One notable aspect of articles from Google is the weight they give to engineer productivity and code maintainability.
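
As a concrete illustration of the synchronous-model point above, here is a minimal Go sketch (my own, not from the article; the URL and function name are invented). The code reads top to bottom as if it were blocking, and the Go runtime plays the role the bullet assigns to the OS or thread library: when the goroutine blocks on the network, the scheduler parks it and runs other goroutines, so the programmer never writes callbacks or hand-rolled state machines.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// fetch looks fully synchronous: each statement waits for the previous
// one. Under the hood the runtime multiplexes the blocked goroutine off
// the OS thread, so this simplicity costs little in throughput.
func fetch(url string) (string, error) {
	resp, err := http.Get(url) // blocks this goroutine, not the thread
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	body, err := fetch("https://example.com") // hypothetical target
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(body), "bytes")
}
```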

This article points out that many CPU cycles are wasted on (de-)serialization and the network stack. This reminds me of Atul Adya’s HotOS’19 paper, which essentially advocates building stateful applications instead of stateless ones, so that a lot of unnecessary (de-)serialization and network communication can be avoided.
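
To make that contrast concrete, a hypothetical Go sketch (all names invented, from neither paper): the stateless handler pays a remote fetch plus JSON deserialization on every request, while the stateful one keeps already-deserialized objects in process memory and skips both costs on repeated lookups.

```go
package main

import (
	"encoding/json"
	"sync"
)

type Profile struct {
	Name  string `json:"name"`
	Email string `json:"email"`
}

// Stateless style: every call pays the "datacenter tax" of a network
// round trip plus deserialization, because no state lives in the process.
func statelessLookup(store func(key string) []byte, key string) (Profile, error) {
	var p Profile
	raw := store(key)              // remote fetch (network stack cost)
	err := json.Unmarshal(raw, &p) // deserialization cost on every call
	return p, err
}

// Stateful style: the process owns deserialized objects, so repeated
// lookups avoid the network hop and the (de-)serialization entirely.
type StatefulCache struct {
	mu       sync.RWMutex
	profiles map[string]Profile
}

func (c *StatefulCache) Lookup(key string) (Profile, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	p, ok := c.profiles[key]
	return p, ok
}
```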