I did 3 experiments:
- `rb` mode
- `r` mode
- `r` mode with a seek to different byte offsets before reading
```python
with open("small_file.txt", "w") as f:
    f.seek(2)
    f.write("Content")

with open("small_file.txt", "rb") as f:
    content = f.read()
print(f"START|{content}|END")
```
`f.seek(2)` during the write is there to insert some null bytes (2 being arbitrary).

Output: `START|b'\x00\x00Content'|END`

(No surprise here.)
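A quick way to confirm what the write step actually leaves on disk is to check the file size and the raw bytes. This is a minimal sketch; the temporary directory is an assumption added only so the snippet does not touch a real `small_file.txt`. It relies on the operating system filling the gap created by seeking past the end of a file with zero bytes once data is written after it (POSIX documents this behaviour for `lseek`).

```python
import os
import tempfile

# Hypothetical scratch location so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "small_file.txt")

# Reproduce the write step: move 2 positions past the start of the new, empty file
with open(path, "w") as f:
    f.seek(2)
    f.write("Content")

with open(path, "rb") as f:
    raw = f.read()

print(os.path.getsize(path))  # 9: 2 gap bytes + 7 letters
print(raw)                    # b'\x00\x00Content'
print(list(raw[:2]))          # [0, 0]: the gap holds zero bytes
```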
If I change `rb` to `r`, I get `START|Content|END`.
Question 1: What underlying concepts explain this behaviour of not reading (or reading but not printing?) the null bytes? Is it specified in any documentation?
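One way to tell "not read" apart from "read but not printed" is to inspect the string's length and `repr()` instead of printing it directly. A minimal sketch, assuming the file holds the same 9 bytes as above (written directly in binary mode here, to a temporary path, so the snippet is self-contained) and a locale encoding such as UTF-8 in which byte 0 decodes to the character `'\x00'`:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "small_file.txt")
with open(path, "wb") as f:
    f.write(b"\x00\x00Content")  # same bytes the original write step produces

with open(path, "r") as f:
    content = f.read()

print(f"START|{content}|END")   # most terminals render the two NUL characters as nothing
print(len(content))             # 9: both null characters are still in the string
print(repr(content))            # '\x00\x00Content'
print(f"START|{content!r}|END") # START|'\x00\x00Content'|END
```

If `len(content)` is 9, the null characters were read in text mode too, and only their on-screen rendering differs from the `bytes` repr shown in `rb` mode.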
Then I added a seek before reading:

```python
with open("small_file.txt", "w") as f:
    f.seek(2)
    f.write("Content")

with open("small_file.txt", "r") as f:
    f.seek(3)
    content = f.read()
print(f"START|{content}|END")
```

which returns `START|ontent|END`.
I see that any offset <= 2 passed to `f.seek` during the read produces `START|Content|END`, with no truncation of data. I understand this threshold is directly related to the `f.seek(2)` used during the write: if the write used `f.seek(3)`, the threshold during the read would be 3.

Question 2: Why is there no truncation for offsets <= 2? Is this explained by the same answer as Question 1?
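A sketch that makes the threshold visible by printing the `repr()` of what is read after each seek offset, under the same assumptions as before (temporary copy of the 9-byte file, single-byte characters). One caveat worth knowing: the `io` documentation only defines text-mode `seek()` for 0 or a value previously returned by `tell()` and calls any other offset undefined behaviour, so integer offsets like these happen to act as byte positions here but are not formally guaranteed to.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "small_file.txt")
with open(path, "wb") as f:
    f.write(b"\x00\x00Content")  # 2 null bytes followed by 7 letters

results = {}
for offset in range(5):
    with open(path, "r") as f:
        f.seek(offset)  # arbitrary integer offset in text mode (see caveat above)
        results[offset] = f.read()
    print(offset, repr(results[offset]))
# 0 '\x00\x00Content'
# 1 '\x00Content'
# 2 'Content'
# 3 'ontent'
# 4 'ntent'
```

Offsets 0 and 1 still return leading null characters, which print as nothing, so the visible text only starts losing letters once the offset passes the 2 gap bytes.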
I also tried seeking to the end in both `rb` and `r` mode using `f.seek(9)` (9 because of the 2-byte offset during the write plus the 7 letters in `Content`) and got `START||END`.

This is expected and explained by https://docs.python.org/3.8/tutorial/inputoutput.html#methods-of-file-objects:

> If the end of the file has been reached, f.read() will return an empty string ('').

It never mentions whether this empty string applies to `rb` or `r` mode, so I assume both; please correct me if I'm wrong.
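The same tutorial section says `f.read()` returns a string in text mode and a bytes object in binary mode, so at end of file the `rb` result should be the empty bytes object `b''` rather than `''`. A minimal sketch to check both modes, again assuming a temporary copy of the 9-byte file. Note that an empty `bytes` object formatted inside an f-string prints as `b''`, not as nothing.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "small_file.txt")
with open(path, "wb") as f:
    f.write(b"\x00\x00Content")  # 9 bytes in total

with open(path, "rb") as f:
    f.seek(9)                    # position 9 is exactly the end of the file
    at_end_binary = f.read()

with open(path, "r") as f:
    f.seek(9)
    at_end_text = f.read()

print(repr(at_end_binary))           # b''
print(repr(at_end_text))             # ''
print(f"START|{at_end_binary}|END")  # START|b''|END
print(f"START|{at_end_text}|END")    # START||END
```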