Slicer is an auto-sharding tool for running datacenter applications. The first time I read the title, auto-sharding didn’t make sense to me; in some sense, it really means load balancing. The goal of Slicer is to balance the load across application servers. Concretely, Figure 8 shows how effective it is: after turning on Slicer, all servers receive a similar number of requests.

The architecture design separates Slicer from the application data plane. A Slicer library called the Slicelet is embedded in the application server, and another called the Clerk is embedded in the application client. On the Slicer side, the service mainly consists of an Assigner and tiered Distributors. The Assigner takes load signals, health signals, and other signals, then computes an assignment. The multi-tiered Distributors then propagate the assignment to application servers and application clients.

To provide more context, consider what happens when an application client makes a request. For many datacenter applications, multiple servers can serve the same client request. Without Slicer, each application must devise its own way to decide which server a request should be sent to. This is often the source of load imbalance, because the clients lack a global view of the load. With Slicer, the client sends each request to the specific application server dictated by the Slicer sharding assignment.

On the application server side, using Slicer means that servers can manage resources more effectively. For example, a memcache server can discard all the keys that it is no longer in charge of.
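As a toy illustration of that server-side cleanup (all names here are hypothetical, not the paper’s API), a cache task might react to a new assignment like this:

```python
class CacheServer:
    """Server-side sketch: evict entries whose slice keys we no longer own.

    `owned_ranges` is a list of half-open (start, end) slice-key ranges this
    task is currently assigned; the representation is my own assumption.
    """

    def __init__(self, owned_ranges):
        self.owned = owned_ranges
        self.store = {}  # slice_key -> cached value

    def _owns(self, k: int) -> bool:
        return any(start <= k < end for start, end in self.owned)

    def on_assignment_change(self, new_ranges):
        """Called when the Slicelet delivers a new assignment."""
        self.owned = new_ranges
        # Discard everything outside the ranges we are still in charge of.
        self.store = {k: v for k, v in self.store.items() if self._owns(k)}
```

A server that keeps only its assigned keys can use its memory for keys it will actually be asked about, which is the resource-management win described above.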

That is the big picture of Slicer architecture. Now let’s look at how Slicer shards the load.

Applications use keys to distinguish requests, e.g., the language name in a translation service or the memcache key in a memcache service. Slicer hashes each application key to a 63-bit slice key, which is the atomic unit of work assignment. The big-picture idea of load balancing would be that Slicer takes the request rate of each slice key, does some planning, and assigns slice keys to application servers. But since the keyspace is too big to track individual keys, Slicer works on slice key ranges instead.
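To make the hashing and range-based lookup concrete, here is a minimal sketch. The paper doesn’t specify the hash function, so SHA-256 truncated to 63 bits is just a stand-in, and the range representation is my own:

```python
import bisect
import hashlib


def slice_key(app_key: str) -> int:
    """Hash an application key into the 63-bit slice keyspace.

    SHA-256 truncated to 63 bits is an assumption; the paper only says
    keys are hashed to a 63-bit slice key.
    """
    digest = hashlib.sha256(app_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") >> 1  # drop one bit -> 63 bits


def task_for(assignment, app_key):
    """Look up which task owns a key, given an assignment expressed as
    sorted, half-open (start, end, task_id) slice-key ranges covering
    the whole keyspace."""
    k = slice_key(app_key)
    starts = [r[0] for r in assignment]
    i = bisect.bisect_right(starts, k) - 1
    start, end, task = assignment[i]
    assert start <= k < end
    return task
```

Because the assignment is a short list of ranges rather than a per-key map, it stays small no matter how many application keys exist.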

Hashing keys means that hot keys in the application’s keyspace are likely to be uniformly distributed in the hashed keyspace. This simplifies the load-balancing algorithm but loses the locality of application keys. The authors argue that many Google applications are encouraged to do single-key operations instead of scans, which means they don’t rely on locality.

The load balancing algorithm seems like a heuristic to me. The authors set several targets for it, e.g., the number of slices per task and the percentage of key movement per adjustment. I don’t know if there is a scientific approach to designing such algorithms, but perhaps scheduling is just hard and always ends up heuristic.
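To show the flavor of such a heuristic, here is a toy of my own, not the paper’s actual algorithm: a greedy pass that moves slices off the most-loaded task, stopping when a churn budget (in the spirit of the paper’s 9% cap on key movement per adjustment) is exhausted:

```python
def rebalance(slices, churn_budget=0.09):
    """Toy rebalancing pass, NOT the paper's actual algorithm.

    `slices` maps slice_id -> (load, task_id, keyspace_fraction).
    Greedily move slices off the most-loaded task onto the least-loaded
    one, as long as each move narrows the gap and the total fraction of
    keyspace moved stays under `churn_budget`.
    """
    moved = 0.0
    moves = []
    while True:
        # Current per-task load.
        load = {}
        for l, t, _f in slices.values():
            load[t] = load.get(t, 0.0) + l
        hot = max(load, key=load.get)
        cold = min(load, key=load.get)
        gap = load[hot] - load[cold]
        # Slices on the hot task small enough that moving them helps.
        candidates = [(l, s, f) for s, (l, t, f) in slices.items()
                      if t == hot and l < gap]
        if not candidates:
            break
        l, s, f = max(candidates)  # biggest helpful slice
        if moved + f > churn_budget:
            break  # churn budget exhausted
        slices[s] = (l, cold, f)
        moved += f
        moves.append(s)
    return moves
```

Each move strictly lowers the sum of squared task loads, so the loop terminates; the churn budget additionally bounds how much of the keyspace one adjustment may disturb.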

Assignment distribution to application servers and application clients forms a two-tier distribution tree, which can be extended to more tiers if needed. I think this is reasonable because the total number of tasks could be huge. To make evolution easier, Slicer keeps the application server and client libraries very simple and puts more logic in the Distributor (e.g., identifying the preferred Assigner). This also makes sense to me. Distribution is a pull model, but I didn’t see the paper mention when subscribers should pull.
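A generation-numbered pull protocol is one plausible shape for a distribution tier; this sketch is my own guess, not the paper’s design:

```python
class Distributor:
    """Sketch of one tier of the distribution tree (all names hypothetical).

    Subscribers (lower-tier distributors, Slicelets, Clerks) pull the
    current assignment, passing the generation they already hold, so an
    unchanged assignment costs almost nothing to serve.
    """

    def __init__(self):
        self.generation = 0
        self.assignment = []  # list of (start, end, task_id) ranges

    def publish(self, assignment):
        """Install a new assignment from the tier above (the Assigner)."""
        self.generation += 1
        self.assignment = assignment

    def pull(self, have_generation):
        """Return (generation, assignment) if newer, else None."""
        if have_generation == self.generation:
            return None  # subscriber is already up to date
        return self.generation, self.assignment
```

When subscribers should call `pull` is exactly the question the paper leaves open; a timer or a cheap change-notification would both fit this interface.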

Finally, on the fault tolerance.

Slicer is an “off-path” component for applications, which means that at worst applications fall back to static sharding and can still serve client requests.

Since Slicer allows running multiple assigners and distributors, the architecture is naturally more fault-tolerant.

Slicer Assigners save their state in global storage and can recover from it.

Since the Distributors and the other parts of Slicer share a code base, Slicer runs a simple backup Distributor on a separate code base that serves a static sharding, to prevent correlated failure. Should all the Slicer services go down, applications can still work normally, except that the sharding is static.

There are two more points that I want to mention.

One is that Slicer doesn’t take into account the available memory and CPU share of a task. At Google, there are independent mechanisms deciding the number of application replicas and the CPU and memory share for each application. “Thus Slicer focuses exclusively on redistributing imbalanced load among available tasks, not on reprovisioning resources for sustained load changes.” I wonder if the two systems could fight against each other.

Another thing is that Slicer has a feature called strongly consistent assignment: it guarantees that at most one application server is in charge of a slice key at any time. But this feature confused me a lot.


I feel Slicer is a very useful piece of infrastructure for deploying enterprise applications. Figure 8 is very impressive. I have a few questions:

  1. The strong consistency mode just confuses me. I wonder whether a race condition could occur: ownership of a key could still change after isAssignedContinuously() returns true and before the result is externalized. The paper also says that reassigning a consistent assignment involves recalling the lease from the Slicelet, but I didn’t see any function in the API list that tells the application server that a recall has happened. What happens if the lease is quietly and asynchronously recalled while the application server still believes it holds it? Also, the paper says “The consistency feature is implemented, but it is not yet deployed by customers in production.”
  2. I love that Slicer is an “off-path” component for applications, meaning that at worst applications fall back to static sharding and can still serve client requests. But if Slicer is down for a longer period of time, would some application servers become extremely heavily loaded and escalate the failure? Perhaps the auto-scaler could mitigate this issue?
  3. Using Slicer and the auto-scaler at the same time seems like a mystery to me. Could they oscillate?

Hashing keys makes keys more likely to be uniformly distributed but loses locality. The paper mentions that “Many Google applications are already structured around single-key operations rather than scans, encouraged by the behavior of existing storage systems.”

The task calls getSliceKeyHandle when a request arrives, and passes the handle back to isAssignedContinuously before externalizing the result.
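Putting that pattern into code (the two method names come from the paper’s API listing; everything else here is my own hypothetical scaffolding):

```python
def handle_request(slicelet, key, compute, externalize):
    """Sketch of the consistent-assignment pattern described in the paper.

    `slicelet` is assumed to expose getSliceKeyHandle / isAssignedContinuously
    as in the paper's API; `compute` and `externalize` are hypothetical
    application callbacks.
    """
    handle = slicelet.getSliceKeyHandle(key)     # capture lease state on arrival
    result = compute(key)                        # do the actual work
    if slicelet.isAssignedContinuously(handle):  # owned the key the whole time?
        externalize(result)
        return True
    return False  # ownership moved mid-request; caller should retry or redirect
```

Note that even with this pattern, ownership could in principle still change between the isAssignedContinuously check and the externalize call, which is the race the question list above worries about.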

Question: ownership change could still happen after isAssignedContinuously returns true and before externalizing the result.

By default, Slicer load balances on request rate (req/s).

Most applications ignore this API and simply enable transparent integration with Google’s RPC system Stubby or Google’s HTTP proxy GFE (Google Front End).

A benefit of Slicer is using the same infrastructure across the whole company.

Current uses of Slicer fit three categories: in-memory cache, in-memory store, and aggregation.

Assigners are somewhat stateless, in the sense that they save their state in global storage and can recover from it after a restart. There may be brief periods when multiple Assigners are preferred.

Slicer makes assignments for one job in one datacenter at a time. Customers who run jobs in multiple datacenters use a higher-level Google load balancer to route a request to a datacenter, and then within that datacenter, use Slicer to pick one task from the job.

Since Slicer allows running multiple Assigners and Distributors, the architecture is naturally more fault-tolerant. Since the Distributors and the other parts of Slicer share a code base, Slicer runs a simple backup Distributor on a separate code base that serves a static sharding, to prevent correlated failure. Should all the Slicer services go down, applications can still work normally, except that the sharding is static. I think if this failure lasts long enough, there is a risk of failure escalation because of the high load on application tasks.

The ultimate goal of load balancing is to minimize peak load, i.e., minimize resource provisioning.

The primary goal of load balancing is to minimize the load imbalance, which we define as the ratio of the maximum task load to the mean task load.

It should also limit key churn, the fraction of the key space affected by reassignment. Key churn itself creates load and increases overhead.
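Both metrics are easy to state in code; here is a small sketch (the assignment representation is my own assumption):

```python
def load_imbalance(task_loads):
    """Ratio of maximum task load to mean task load (the paper's metric)."""
    mean = sum(task_loads) / len(task_loads)
    return max(task_loads) / mean


def key_churn(old, new):
    """Fraction of the keyspace whose owner changed between two assignments.

    `old` and `new` map slice_id -> (task_id, keyspace_fraction); slice
    boundaries are assumed unchanged between the two, for simplicity.
    """
    return sum(f for s, (t, f) in new.items()
               if s in old and old[s][0] != t)
```

A perfectly balanced job has imbalance 1.0; the rebalancer tries to push the ratio toward 1.0 while keeping churn per adjustment small.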

At Google, independent mechanisms (sometimes humans) decide when to add or remove tasks from a job, or add or remove CPU or memory from tasks in a job. Thus Slicer focuses exclusively on redistributing imbalanced load among available tasks, not on reprovisioning resources for sustained load changes.

Initially, Slicer assigns the same number of keys to each application task, assuming that key load is uniform. As load imbalances among the keys develop, Slicer adjusts the key assignments. And because the key space is large, Slicer represents key assignments using key ranges.
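A sketch of that initial uniform split over the 63-bit slice keyspace (the range representation is my own):

```python
def initial_assignment(num_tasks, keyspace=2**63):
    """Split the slice keyspace into equal half-open ranges, one per task,
    mimicking Slicer's initial uniform assignment (a sketch)."""
    step = keyspace // num_tasks
    ranges = []
    for t in range(num_tasks):
        start = t * step
        # The last range absorbs any rounding remainder.
        end = keyspace if t == num_tasks - 1 else (t + 1) * step
        ranges.append((start, end, t))
    return ranges
```

From this uniform starting point, later adjustments split and move ranges as measured per-slice load diverges.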

The constants in the algorithm (50–150 slices per task, 1% and 9% key movement per adjustment) were chosen by observing existing applications.