The context
To expand my knowledge of low-level programming, I am dipping my toes for the first time into the fundamentals of persisting data to disk. I come from a (very) theoretical background, so I am trying to understand what can safely be assumed about writes that are interrupted by random accidents.
My question
Consider a setting where I have a file, n bytes in length. I seek to byte a and attempt to write (b - a) bytes (in other words, I am overwriting all bytes in the a..b range, a included, b excluded). Say that, due to some accident (e.g., my process is unexpectedly killed, or the entire system experiences a sudden power outage) my write fails to complete. Of course, I know I cannot make any assumption about what the bytes in the a..b range will turn out to be: writes to disk are not atomic! I am wondering, however, if I can at least assume that the bytes outside the a..b range (i.e., 0..a and b..n) will be unchanged.
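To make the setting concrete, here is a minimal sketch of the kind of write I have in mind, assuming a POSIX system and Python's os module. The function name overwrite_range and its parameters are made up for illustration; the question is what the file can look like if the process or machine dies anywhere before the call returns.

```python
import os


def overwrite_range(path: str, a: int, data: bytes) -> int:
    """Overwrite bytes a..b of an existing file, where b = a + len(data).

    Hypothetical helper for illustration only. Returns b.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        written = 0
        while written < len(data):
            # pwrite may write fewer bytes than requested, so keep going
            # until the whole range a..b has been handed to the kernel.
            written += os.pwrite(fd, data[written:], a + written)
        # Ask the kernel to push the data to the device. A crash before
        # this returns is the "interrupted write" I am asking about.
        os.fsync(fd)
    finally:
        os.close(fd)
    return a + len(data)
```

When nothing goes wrong, bytes 0..a and b..n are of course untouched; my question is only about the interrupted case.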
On the one hand, it seems to me quite reasonable that that would be the case: naively, I find it hard to imagine that a write would (temporarily) corrupt bytes around the range being written. I know, however, that file systems usually organize data in blocks. If a..b was not aligned with a block, maybe part of the data being overwritten might not belong to a..b, and be corrupted as a result of the interrupted write?
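To spell out the alignment worry, here is a small sketch that computes which bytes outside a..b share a block with the range being written. The 4096-byte block size is purely an assumption for the example (I do not know what my setup actually uses), and the function names are hypothetical. It assumes 0 <= a < b <= n.

```python
BLOCK_SIZE = 4096  # assumed block size, for illustration only


def covering_blocks(a: int, b: int, block_size: int = BLOCK_SIZE) -> tuple[int, int]:
    """Return the smallest block-aligned byte range (start, end) containing a..b."""
    start = (a // block_size) * block_size   # round a down to a block boundary
    end = -(-b // block_size) * block_size   # round b up to a block boundary
    return start, end


def neighbouring_ranges(a: int, b: int, n: int, block_size: int = BLOCK_SIZE):
    """Return the two byte ranges outside a..b that share a block with it.

    These are the bytes I am worried about: they are not part of my write,
    but they live in blocks that the write touches.
    """
    start, end = covering_blocks(a, b, block_size)
    end = min(end, n)  # the last block may extend past the end of the file
    return (start, a), (b, end)
```

For example, with n = 20000, a = 5000 and b = 9000, the write touches blocks covering 4096..12288, so the ranges 4096..5000 and 9000..12288 are outside a..b but share blocks with it. If a and b are both multiples of the block size, both ranges are empty.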
OS / File system / Physical device
I am aware that the answer to my question might depend on the specific operating system, file system, and physical device I am using, although my intuition suggests that most modern combinations (e.g., Linux / ext4 / SSD, or macOS / APFS / SSD) should provide similar guarantees. In answering, please keep in mind that my goal here is to learn, so if you could outline what range of behaviors I could encounter, that would be amazing. My eventual goal is to develop for servers, so if a specific configuration is required to provide an answer, I would like to know about Linux / ext4 / SSD.
Out-of-the-box solutions
Of course, I am aware that a plethora of database solutions already exist that offer all sorts of atomicity guarantees. The goal of my question is to learn about the fundamentals of persistence, so any pointers to, e.g., key-value stores that can get the job done are appreciated but beyond the scope of my question.