I believe that serverless computation has a lot of potential. From reading this paper, I feel remote storage might be a big bottleneck for serverless. I suspect a large portion of storage reads and writes in the serverless setting are temporary data. If so, why not write to a less persistent but faster store instead of S3, especially since lost data can simply be recomputed? Following this, I think serverless might be a legitimate use case for distributed shared memory. The paper mentions that ElastiCache didn’t quite work for them and they had to set up a Redis cluster on their own. I wonder why AWS doesn’t provide an S3-like memcache service.
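To make the "fast but lossy store with recompute fallback" idea concrete, here is a minimal sketch of the read-through pattern I have in mind. The dict stands in for a Redis/memcache client (in practice you would call `get`/`set` on something like redis-py); the key name and recompute function are hypothetical.

```python
# Ephemeral cache with recompute-on-miss: use fast, non-durable storage
# for temporary data instead of S3. A plain dict stands in for a
# Redis/memcache client here (hypothetical setup).
cache = {}

def cached_read(key, recompute):
    """Return cached bytes; if the cache lost them, recompute and refill."""
    value = cache.get(key)
    if value is None:                # miss: eviction or cache-node loss
        value = recompute()          # temporary data is reproducible
        cache[key] = value
    return value

# Usage: an intermediate shuffle partition keyed by (map task, reduce task).
partition = cached_read("map3-reduce7", lambda: b"recomputed partition")
```

The point is that durability moves from the storage layer into the application: a miss costs recomputation time, but the common-case read hits memory instead of S3.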

I also have some doubts.

  1. Vendor lock-in might be stronger in the serverless setting, because serverless applications must rely on the cloud provider’s storage, compute, and networking infrastructure.
  2. As the paper points out in the MapReduce example, serverless requires more network communication; e.g., it loses the chance to partially pre-shuffle data within the same host. Assuming there are N servers and each of them runs K lambdas, serverless produces (NK)^2 intermediate files during the shuffle, while a traditional cluster only needs N^2.
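The blow-up in shuffle files can be sketched with a quick calculation (the all-to-all assumption, where every worker writes one intermediate file per destination worker, is mine):

```python
def shuffle_files(n_servers, lambdas_per_server=1):
    """Count intermediate files in an all-to-all shuffle:
    every worker writes one file per destination worker."""
    workers = n_servers * lambdas_per_server
    return workers ** 2

# Traditional cluster: each of N servers shuffles as one unit -> N^2 files.
print(shuffle_files(100))        # 10000
# Serverless: every lambda shuffles independently -> (N*K)^2 files.
print(shuffle_files(100, 10))    # 1000000
```

With N = 100 and K = 10, that is a 100x (K^2) increase in intermediate files, which is exactly the pressure on remote storage that the paper observes.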