SplitFS: Reducing Software Overhead in File Systems for Persistent Memory

I really like the first paragraph. It depicts the characteristics of the new hardware in a few words:

Persistent Memory (PM) is a new memory technology that was recently introduced by Intel. PM will be placed on the memory bus like DRAM and will be accessed via processor loads and stores. PM has a unique performance profile: compared to DRAM, loads have 2-3.7x higher latency and 1/3rd bandwidth, while stores have the same latency but 1/6th bandwidth. A single machine can be equipped with up to 6 TB of PM. Given its large capacity and low latency, an important use case for PM will be acting as storage.

Their basic idea is to run metadata operations in the kernel and data operations in user space, whereas existing solutions run both in the kernel or both in user space.
- Why would it be better than user-space only?
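To make the split concrete for myself, here is a minimal sketch, not SplitFS code. It assumes the split looks roughly like this: metadata operations (open, resize, close) go through ordinary system calls, while reads and overwrites are served from a memory mapping with plain memory copies. The class name `SplitFile` and the helper `demo` are made up for illustration, and on a regular file the mapping goes through the page cache rather than PM, so this is only an analogy for a DAX mapping.

```python
import mmap
import os
import tempfile


class SplitFile:
    """Toy split: metadata via system calls, data via a memory mapping."""

    def __init__(self, path, size):
        # Metadata path: create and size the file through the kernel.
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        os.ftruncate(self.fd, size)
        # Map the file once; later reads and overwrites are memory copies.
        self.view = mmap.mmap(self.fd, size)

    def write(self, offset, data):
        # Data path: overwrite bytes in the mapping, no write() system call.
        self.view[offset:offset + len(data)] = data

    def read(self, offset, length):
        # Data path: copy bytes out of the mapping, no read() system call.
        return bytes(self.view[offset:offset + length])

    def close(self):
        # Metadata path again: flush the mapping and release the descriptor.
        self.view.flush()
        self.view.close()
        os.close(self.fd)


def demo():
    """Write through the mapping, then re-read through the normal file API."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.bin")
        f = SplitFile(path, 4096)
        f.write(100, b"hello")
        got = f.read(100, 5)
        f.close()
        with open(path, "rb") as check:
            check.seek(100)
            on_disk = check.read(5)
    return got, on_disk
```

Calling `demo()` returns the bytes read through the mapping and the bytes read back through the regular file interface, which should match. The sketch only covers overwrites within a fixed-size file; appends are where relink comes in.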
SplitFS optimizes file append by using relink.
Relink: a new primitive.
- When writing a block, SplitFS first writes it to a staging block. Later, on fsync(), it relinks the file's logical block to that physical block, so the block data never has to be copied.
- I think this is a very interesting primitive that PM enables. It requires random access to be as fast as sequential access, which does not hold for disks.
  - But what about SSD?
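The relink idea in the bullets above can be sketched as a toy model. This is my own simplification, not the SplitFS implementation: the class `ToyFile`, its fields, and the in-memory dictionaries standing in for physical blocks are all hypothetical. The point it illustrates is that an append lands in a staging block, and fsync only edits the logical-to-physical map without copying any data.

```python
class ToyFile:
    """Toy model of relink: fsync changes block mappings, not block data."""

    def __init__(self):
        self.physical = {}    # physical block id -> bytes (stand-in for PM blocks)
        self.block_map = []   # logical block index -> physical block id
        self.staged = []      # physical ids written but not yet part of the file
        self.next_id = 0

    def append(self, data):
        # Write the data once, into a freshly allocated staging block.
        pid = self.next_id
        self.next_id += 1
        self.physical[pid] = data
        self.staged.append(pid)
        return pid

    def fsync(self):
        # Relink: attach the staged blocks to the file by updating metadata only.
        self.block_map.extend(self.staged)
        self.staged.clear()

    def read(self):
        # Only blocks that have been linked into the file are visible.
        return b"".join(self.physical[pid] for pid in self.block_map)


f = ToyFile()
first = f.append(b"aa")
f.append(b"bb")
before = f.read()   # staged data is not yet part of the file
f.fsync()
after = f.read()    # both blocks are now linked, in append order
```

In this model the physical block id returned by `append` is the same one that ends up in `block_map` after `fsync`, which is the "no need to move block data around" property. A real file system would also have to make the map update atomic and crash-safe, which this sketch ignores.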
They also put a lot of effort into atomicity. They say that with SplitFS strict mode, databases can remove their own logging and update the database directly. I don’t quite follow the details.
The kernel-space part of SplitFS is basically ext4 DAX. They say SplitFS will gain performance and robustness as ext4 DAX matures. They intercept the application's I/O system calls (at the libc wrapper level) by using LD_PRELOAD.
Tom: There is speculation that software running on PM should use the mmap interface rather than files. If that’s the case, would the benefits of SplitFS go away?
- Author: Yes