Emmanuel Goossaert posted a series of articles, “Coding for SSDs”, based on his research for a key-value store project. In Part 6, he recommends some access patterns for SSDs.
Here we will go through his recommendations (15-30) one by one.
This is not necessary. Inside the SSD, data is buffered and aggregated to fill whole NAND pages, so a small write does not occupy an entire NAND page by itself. Still, sequential writes outperform random writes significantly, because random writes require many more FTL updates.
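The buffering described above can be sketched with a toy model. The 16 KB page size and the fill-then-program policy are illustrative assumptions, not the behavior of any specific controller:

```python
# Sketch: a controller buffers small host writes and programs NAND only in
# full pages, so small writes don't waste whole pages. Illustrative only.
NAND_PAGE = 16384                  # assumed NAND page size

class WriteBuffer:
    def __init__(self):
        self.buf = bytearray()
        self.pages_programmed = 0

    def write(self, data: bytes):
        self.buf += data
        while len(self.buf) >= NAND_PAGE:
            self.buf = self.buf[NAND_PAGE:]    # program one full page
            self.pages_programmed += 1

wb = WriteBuffer()
for _ in range(64):
    wb.write(b"x" * 4096)          # 64 small 4 KB host writes
print(wb.pages_programmed)         # 16 full pages, no partial-page waste
```

Note that the aggregation only helps latency and page utilization; it does not make the scattered FTL updates of a random workload go away.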
Pointless. First, host data may not be written to NAND flash as-is. Extra data such as a CRC may be inserted (think of DIF, the Data Integrity Field); data may also be compressed. Second, the actual NAND page size is not aligned to a KB boundary at all: a page holds both data and spare bytes, the latter used for ECC (Error Correction Codes). The nominal data size recommended by the flash vendor is aligned to a KB boundary, but an SSD controller can choose to store less data and use more spare for stronger protection, or vice versa for more usable space.
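As a worked example of the data/spare trade-off: the 16384 + 2208-byte page geometry below is a made-up but plausible figure for illustration, not taken from any specific part.

```python
# Hypothetical NAND page: data area plus spare bytes; the total is fixed by
# the die, but the controller chooses where to draw the data/spare line.
PAGE_TOTAL = 16384 + 2208          # assumed geometry, illustration only

def spare_left(data_bytes: int) -> int:
    """Spare bytes remaining for ECC/CRC when storing data_bytes per page."""
    assert 0 < data_bytes <= PAGE_TOTAL
    return PAGE_TOTAL - data_bytes

print(spare_left(16384))           # vendor-nominal split: 2208 spare bytes
print(spare_left(16384 - 1024))    # store less data: 3232 bytes for stronger ECC
```

Either way, nothing about the physical page obliges host writes to be multiples of its nominal data size.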
Isn’t this just a rephrased Item 15?
This is true.
Generally this is incorrect, though I can imagine it being true for some SSDs. For a well-designed SSD, the IOPS should rank like this: 100% read > 50%/50% mixed > 100% write. If you separate reads and writes, you lose read/write parallelism: for example, if all in-flight requests are writes, the read data path sits idle. It is true that read and write requests may conflict or compete with each other, but that is also true between requests of the same type.
No, it will hurt. If you hold back the TRIMs, the usable space is lowered unnecessarily. Low usable space puts huge pressure on garbage collection, which may have to move your obsolete (but not yet trimmed) data around to reclaim empty blocks. That leads to higher write amplification.
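The garbage-collection pressure is easy to see in a toy page-mapped FTL simulation. The block counts, the greedy victim policy, and the uniform random-overwrite workload below are all simplifying assumptions; real firmware is far more elaborate, but the trend is the same: less spare space means higher write amplification.

```python
import random

def simulate_wa(blocks=128, ppb=64, op=0.10, host_writes=100_000, seed=1):
    """Toy page-mapped FTL: random single-page overwrites plus greedy GC.
    op = over-provisioning fraction (space the host cannot address).
    Returns write amplification = NAND page writes / host page writes."""
    rng = random.Random(seed)
    logical = int(blocks * ppb * (1 - op))   # host-visible pages
    loc = [None] * logical                   # logical page -> (block, slot)
    owner = [[None] * ppb for _ in range(blocks)]
    valid = [0] * blocks                     # valid-page count per block
    free = list(range(blocks))
    nand_writes = 0
    active, fill = free.pop(), 0

    def new_active():
        nonlocal nand_writes
        if free:
            return free.pop(), 0
        # Greedy GC: erase the block with the fewest valid pages, then
        # relocate its still-valid pages back in (each costs a NAND write).
        victim = min(range(blocks), key=lambda b: valid[b])
        survivors = [lp for slot, lp in enumerate(owner[victim])
                     if lp is not None and loc[lp] == (victim, slot)]
        owner[victim] = [None] * ppb
        valid[victim] = 0
        f = 0
        for lp in survivors:
            owner[victim][f] = lp
            loc[lp] = (victim, f)
            valid[victim] += 1
            nand_writes += 1
            f += 1
        return victim, f

    def write_page(lp):
        nonlocal active, fill, nand_writes
        if fill == ppb:
            active, fill = new_active()
        old = loc[lp]
        if old is not None:                  # invalidate the stale copy
            valid[old[0]] -= 1
        owner[active][fill] = lp
        loc[lp] = (active, fill)
        valid[active] += 1
        nand_writes += 1
        fill += 1

    for lp in range(logical):                # precondition: fill the drive once
        write_page(lp)
    nand_writes = 0                          # measure steady state only
    for _ in range(host_writes):
        write_page(rng.randrange(logical))
    return nand_writes / host_writes

for op in (0.07, 0.15, 0.30):
    print(f"OP {op:.0%}: write amplification ~ {simulate_wa(op=op):.2f}")
```

Holding back TRIMs has the same effect as shrinking `op`: pages the host no longer cares about still count as valid, so GC keeps copying them around.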
No, unless your random writes are not random at all. If each write is 16 MB, they simply don't qualify as random: each one is a bunch of sequential writes. And no, it is not because 16 MB is aligned to the clustered block size. The real reason is that it is big enough to require roughly the same number of FTL updates as sequential writes.
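One way to see this is to count how many separate mapping runs a workload produces, under the simplifying assumption that a contiguous run of pages costs the FTL one batched mapping update. Real FTL bookkeeping varies; this only shows the scale of the difference.

```python
import random

def mapping_runs(writes):
    """writes: (start_page, page_count) pairs in 4 KB pages, arrival order.
    A write that starts where the previous one ended extends the current
    contiguous run; anything else starts a new run (a new batched update)."""
    runs, prev_end = 0, None
    for start, count in writes:
        if start != prev_end:
            runs += 1
        prev_end = start + count
    return runs

rng = random.Random(0)
PAGES_16MB = 16 * 2**20 // 4096            # 4096 pages per 16 MB write
# Four 16 MB writes at unrelated offsets: each is one sequential run inside.
big = [(rng.randrange(2**20), PAGES_16MB) for _ in range(4)]
# The same total volume as 4 KB random writes: nearly every write is its own run.
small = [(rng.randrange(2**20), 1) for _ in range(4 * PAGES_16MB)]
print(mapping_runs(big))
print(mapping_runs(small))
```

The 16 MB "random" writes collapse into a handful of runs, while the 4 KB writes scatter thousands of updates across the map, which is exactly why the former behave like sequential writes.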
Misleading. Sequential reads are better than random reads, period.
The same as Item 22.
Yes, it is the purpose of IO queues.
Yes, it will help.
Yes, it is true for any storage.
This is true. But keep in mind a fast interface does not necessarily mean a fast drive.
Yes, higher is better. The best way to over-provision? Keep less data on the drive. See Item 20.
Yes, see Item 20.
Yes, but align to the 4 KB boundary instead of the NAND page size. This lets the FTL avoid unaligned IOs; an unaligned write can even turn into two read-modify-writes. On the other hand, I am curious why file systems on unaligned partitions don't help to minimize unaligned IOs.
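The read-modify-write arithmetic can be illustrated with the common 4 KB mapping unit. This models only the alignment math, not any specific FTL:

```python
# How many 4 KB FTL pages does a write touch, and how many of them are
# only partially overwritten (and therefore need a read-modify-write)?
PAGE = 4096

def rmw_pages(offset: int, length: int):
    """Return (pages_touched, pages_needing_read_modify_write)."""
    first = offset // PAGE
    last = (offset + length - 1) // PAGE
    partial = set()
    if offset % PAGE:                      # head lands mid-page
        partial.add(first)
    if (offset + length) % PAGE:           # tail ends mid-page
        partial.add(last)
    return last - first + 1, len(partial)

print(rmw_pages(0, 4096))      # (1, 0): aligned 4 KB write, no RMW
print(rmw_pages(512, 4096))    # (2, 2): misaligned 4 KB write, two RMWs
```

A misaligned partition shifts every block-sized IO off the 4 KB grid, so the same 4 KB write suddenly costs two read-modify-writes instead of one clean page program.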