Occupy the Cloud: Distributed Computing for the 99%
I believe that serverless computation has a lot of potential. From reading this paper, I feel that remote storage might be a major bottleneck for serverless. I suspect a large portion of storage reads and writes in the serverless setting are for temporary data. Then why not write to a less durable but faster store instead of S3, especially since lost data can simply be recomputed? Following this line of thought, serverless might be a legitimate use case for distributed shared memory. The paper mentions that ElastiCache didn't quite work for the authors and they had to set up a Redis cluster on their own. I wonder why AWS doesn't provide an S3-like memcache service, i.e., one that scales elastically without provisioning a cluster.
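To make the "fast but non-durable intermediate store" idea concrete, here is a minimal sketch of a get-or-recompute pattern over Redis. Everything here is illustrative and not from the paper: the Redis host, the key names, and `compute_partition` are hypothetical placeholders.

```python
# Sketch: cache intermediate results in Redis and recompute them on a miss,
# instead of persisting everything durably to S3. Because the data is
# temporary and reproducible, losing the Redis node only costs a
# recomputation, not correctness.
import pickle

import redis

r = redis.Redis(host="redis.internal", port=6379)  # hypothetical ephemeral cluster


def compute_partition(idx):
    # Stand-in for re-running the map task that produced this partition.
    return [(idx, i) for i in range(10)]


def get_or_recompute(key, recompute_fn, ttl_seconds=3600):
    """Return the cached intermediate result, or recompute and cache it."""
    cached = r.get(key)
    if cached is not None:
        return pickle.loads(cached)
    value = recompute_fn()
    r.set(key, pickle.dumps(value), ex=ttl_seconds)
    return value


# Example: a shuffle partition that any lambda can rebuild from its inputs.
partition = get_or_recompute("shuffle/stage1/part-0042", lambda: compute_partition(42))
```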
I also have some doubts:
- Vendor lock-in might be stronger in the serverless setting, because a serverless application has to rely on the cloud provider's storage, compute, and networking infrastructure.
- As the paper points out in the MapReduce example, serverless requires more network communication; for example, it loses the opportunity to partially shuffle (pre-aggregate) data within the same host. Assuming there are N servers and each of them runs K lambdas, serverless requires (NK)^2 intermediate files to shuffle, while traditionally only N^2 are needed, as the back-of-the-envelope calculation below illustrates.
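A quick back-of-the-envelope calculation of that gap (the values of N and K are illustrative, not from the paper):

```python
# Number of intermediate shuffle files when each worker writes one file per
# reducer. With per-host combining, only the N hosts exchange data; with NK
# independent lambdas, every lambda pair does.
N = 100  # hypothetical number of servers
K = 10   # hypothetical lambdas per server

traditional = N ** 2        # 10,000 intermediate files
serverless = (N * K) ** 2   # 1,000,000 intermediate files

print(f"traditional: {traditional:,}, serverless: {serverless:,}")
print(f"blow-up factor: K^2 = {serverless // traditional}")
```

The blow-up factor is exactly K^2, so the more lambdas packed per host, the more the shuffle traffic and file count grow relative to the traditional setup.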