ASIC Clouds are not ASIC supercomputers that scale up problem sizes for a single tightly-coupled computation; rather, ASIC Clouds target workloads consisting of many independent but similar jobs (e.g., the same function, but for many users, or many datasets), for which standalone accelerators have been shown to attain improvements for individual jobs.

I guess this makes sense. GPUs have high throughput on batch jobs but still incur long latency at small batch sizes.

Benefits of ASIC clouds:

  • Chip design
    • Replace area-intensive, energy-wasteful instruction interpreters with area-efficient, energy-efficient parallel circuits.
    • Customized I/O design.
    • Optimized packages and other electrical designs.
  • Server
    • Avoids I/O contention and cooling hotspots.
  • Datacenter
    • Rack-level and datacenter-level thermal and power-delivery optimizations that exploit the uniformity of the system.

Bitcoin ASIC Clouds required no inter-chip or inter-RCA bandwidth, but have ultra-high power density, because they have little on-chip SRAM. Litecoin ASIC Clouds are SRAM-intensive, and have lower power density. Video Transcoding ASIC Clouds require DRAMs next to each ASIC, and high off-PCB bandwidth. Finally, our DaDianNao-style [22] Convolutional Neural Network ASIC Clouds make use of on-ASIC eDRAM and HyperTransport links between ASICs to scale to large multichip CNN accelerators.

Different applications have very different needs, and thus require different hardware designs. These examples clearly show why ASIC Clouds can be more efficient than commodity hardware.

I think the ASIC Clouds paper is a very nice tutorial for people who don’t work in hardware (like me) to understand the design aspects of an ASIC cloud datacenter. The case studies cover very different application characteristics, which lead to very different design choices. These examples clearly show why ASICs are way more efficient than commodity hardware. The idea of Pareto optimality is quite interesting, and it’s amazing to see the paper find a way to model the system and locate the Pareto-optimal frontier.
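To make the Pareto idea concrete, here is a minimal sketch of how a design-space exploration tool might filter candidate server designs down to the Pareto-optimal frontier. The design points and objective names below are made up for illustration; they are not numbers from the paper, which evaluates designs on cost per performance (TCO per op/s) and energy per operation (W per op/s).

```python
# Hedged sketch: extracting a Pareto frontier from hypothetical ASIC server
# design points. Each point is (cost_per_ops, watts_per_ops); lower is
# better on both axes. The numbers are illustrative, not from the paper.

def pareto_frontier(points):
    """Return the points not dominated by any other point.

    A point q dominates p if q is <= p on both objectives and q != p
    (so q is strictly better on at least one axis when coordinates differ).
    """
    frontier = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p
            for q in points
        )
        if not dominated:
            frontier.append(p)
    return sorted(frontier)

designs = [
    (1.0, 5.0),  # cheap but power-hungry
    (2.0, 3.0),
    (4.0, 1.5),  # expensive but energy-efficient
    (3.0, 4.0),  # dominated by (2.0, 3.0)
    (5.0, 5.0),  # dominated by several points
]
print(pareto_frontier(designs))  # → [(1.0, 5.0), (2.0, 3.0), (4.0, 1.5)]
```

Each point on the resulting frontier represents a different cost/energy trade-off; which one to build depends on the operator's electricity prices and depreciation assumptions, which is exactly the kind of knob the paper's TCO model exposes.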

My biggest doubt about this ASIC Clouds idea is whether it is worth building an ASIC cloud at all (or rather, as they say, “When do we go ASIC cloud?”). The paper convinced me that building an ASIC cloud is a huge project that requires a wide range of expertise, including electrical engineering, computer architecture, semiconductors, power supply and cooling (I actually don’t know which major covers these), and economics. The development cycle seems long and costly as well. In contrast, cloud customers’ demands and algorithms change all the time. How long can an ASIC cloud last before its functionality becomes completely out of date? I would like to see the paper elaborate more on this question.