For example, imagine a database like Cassandra or Bigtable handling 1 million QPS across 100 servers, so each server handles 10k QPS of unique writes.
Each page on an SSD is 4 KB.
Assume each server has a 100 GB volume.
Each second you get 10k writes, and each write takes 4 KB. Without compaction, that means you have 25,000,000 (100 GB / 4 KB) writes before filling the entire volume.
This means you will fill the entire disk in roughly 42 minutes:
25,000,000 writes / 10,000 writes per second = 2,500 s ≈ 41.7 min
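A quick sanity check of that arithmetic in Python (decimal units, matching the estimate above):

    volume_bytes = 100 * 10**9        # 100 GB per server (decimal units, as above)
    page_bytes = 4 * 10**3            # 4 KB per write
    writes_per_second = 10_000        # per-server write rate

    total_writes = volume_bytes // page_bytes           # 25,000,000 pages
    seconds_to_fill = total_writes / writes_per_second  # 2,500 s
    print(f"{total_writes:,} writes fill the disk in {seconds_to_fill / 60:.1f} minutes")  # ~41.7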
If you do compaction, the throughput will go down and latency will go up.
How do SSDs handle high write rates? Does the firmware do compaction on the fly, or does the database engine have to handle compaction?
If it's handled in the database and the data requires durability, you can't batch and buffer in memory, so you must either write to blank pages, or read an existing block, append the new data, and overwrite that same block.
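For what it's worth, here is a toy model (not real firmware) of the out-of-place write strategy SSD flash translation layers use: an update never overwrites a page in place; it goes to a blank page, and the old physical page is marked stale for later garbage collection.

    class ToyFTL:
        """Toy flash translation layer: logical pages map to physical pages."""
        def __init__(self, num_pages):
            self.mapping = {}                    # logical page -> physical page
            self.free = list(range(num_pages))   # blank physical pages
            self.stale = set()                   # old copies awaiting garbage collection
            self.flash = {}                      # physical page -> data

        def write(self, logical_page, data):
            if logical_page in self.mapping:
                self.stale.add(self.mapping[logical_page])  # old copy becomes garbage
            phys = self.free.pop()               # always write to a blank page
            self.flash[phys] = data
            self.mapping[logical_page] = phys

        def read(self, logical_page):
            return self.flash[self.mapping[logical_page]]

    ftl = ToyFTL(num_pages=8)
    ftl.write(0, b"v1")
    ftl.write(0, b"v2")                  # rewrite lands on a new physical page
    print(ftl.read(0), len(ftl.stale))   # b'v2' 1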
Also, I'm guessing certain data structures are better for write amplification? For example, would an LSM tree perform better than a B-tree on an SSD?
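As a rough illustration of why LSM trees are SSD-friendly, here is a minimal, hypothetical memtable/SSTable sketch (not Cassandra or Bigtable code): writes are buffered in a sorted in-memory memtable and flushed as one sequential, immutable run, so the SSD sees a few large sequential writes instead of the many small in-place page updates a B-tree generates.

    class MemTable:
        """Buffers writes in memory, flushes sorted runs (toy SSTables)."""
        def __init__(self, flush_threshold=4):
            self.data = {}
            self.flush_threshold = flush_threshold
            self.sstables = []                   # flushed, immutable sorted runs

        def put(self, key, value):
            self.data[key] = value
            if len(self.data) >= self.flush_threshold:
                self.flush()

        def flush(self):
            # One sequential write of sorted key/value pairs.
            self.sstables.append(sorted(self.data.items()))
            self.data.clear()

    mt = MemTable()
    for i in range(8):
        mt.put(f"key{i}", i)
    print(len(mt.sstables), "sstables flushed")  # 2 sstables flushed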
Is there a resource where I can read more about this?
For Bigtable, durability is ensured by the commit log on disk. In this case, the disk is Google's distributed file system, not storage attached to the node itself, so losing a node doesn't mean you've lost the data.
If a node goes down, the log can be replayed to recreate what was in memory. You can read about memtables, SSTables, and logs in the original Bigtable paper.
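A rough sketch of that commit-log-plus-memtable pattern (the file path and helpers here are made up for illustration; in Bigtable the log lives on GFS rather than a local file):

    import json, os

    LOG_PATH = "commit.log"   # stand-in for the distributed-file-system log

    def write(key, value, memtable):
        # Append to the durable log first, then apply to the in-memory memtable.
        with open(LOG_PATH, "a") as log:
            log.write(json.dumps({"k": key, "v": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())   # make the record durable before acknowledging
        memtable[key] = value

    def recover():
        # After a crash, replay the log to rebuild the lost memtable.
        memtable = {}
        if os.path.exists(LOG_PATH):
            with open(LOG_PATH) as log:
                for line in log:
                    record = json.loads(line)
                    memtable[record["k"]] = record["v"]
        return memtable

    memtable = {}
    write("row1", "hello", memtable)
    print(recover())   # {'row1': 'hello'}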