I am building a binary file importer using Python. The file structure is defined in a specification and looks like this:
File
- Map Block
String, Short Integer, Long Integer, String, Short Integer, Long Integer…
- Block1
String, String, Long Integer, String, Short Integer, String, Long Integer, Long Integer…
- Block2
- Block3
etc.
I have proved the concept of collecting and decoding the binary data and I can access each block of data individually by collecting the Map Block first.
However, I can’t figure out the best way to test it.
The current tests I have are:
test_get_string
test_get_short
test_get_long
Can you suggest what suite of tests I might build next?
Would I be better off testing with an actual file, or creating long binary strings to test with?
If you work with a stream to import (as suggested by Killian), then you can test each of the functions that get individual values by mocking that stream. Mocking the stream means that you set up the next few bytes before calling the function under test, and let the test fail when it requests more than the needed number of bytes. After the function returns, all of those bytes should have been read.
Part of the correctness assertion would be that the function reads neither more nor fewer bytes than it needs to decode the value, in addition to the correctness of the value itself.
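A minimal sketch of such a strict stream, in case it helps. `ExactStream` and `assert_exhausted` are names made up for this illustration, and the `get_short` shown here assumes a little-endian unsigned 16-bit integer, which may not match your specification:

```python
import io
import struct


class ExactStream(io.BytesIO):
    """In-memory stream that fails the test on any over-read."""

    def read(self, size=-1):
        data = super().read(size)
        # A short read means the decoder asked for more than was prepared.
        if size is not None and size >= 0 and len(data) < size:
            raise AssertionError(
                f"requested {size} bytes, only {len(data)} were available"
            )
        return data

    def assert_exhausted(self):
        # Every prepared byte should have been consumed by the decoder.
        remaining = len(self.getvalue()) - self.tell()
        assert remaining == 0, f"{remaining} bytes were left unread"


def get_short(stream):
    # Hypothetical decoder: little-endian unsigned 16-bit integer (assumed).
    (value,) = struct.unpack("<H", stream.read(2))
    return value


def test_get_short_reads_exactly_two_bytes():
    stream = ExactStream(struct.pack("<H", 513))
    assert get_short(stream) == 513
    stream.assert_exhausted()
```

The same stream class can be reused for `get_long` and `get_string`, so each test only has to prepare the bytes and check the decoded value.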
You can also test getting each block individually, again ensuring that it reads exactly as many bytes as it needs and that the values read in are correct.
For testing the map block decoding you can construct one and then query the offsets of some blocks using that map block.
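A rough sketch of a map block test follows. The question does not say what the fields mean or how strings are encoded, so everything here is an assumption: strings are taken to be one length byte plus ASCII text, the short is treated as a block id, the long as the block's offset, and `read_map_block` is a hypothetical stand-in for your own decoder:

```python
import io
import struct


def pack_string(text):
    # Assumed string encoding: one length byte followed by ASCII bytes.
    raw = text.encode("ascii")
    return struct.pack("<B", len(raw)) + raw


def read_string(stream):
    (length,) = struct.unpack("<B", stream.read(1))
    return stream.read(length).decode("ascii")


def read_map_block(stream, entry_count):
    """Hypothetical decoder returning {name: (block_id, offset)}."""
    entries = {}
    for _ in range(entry_count):
        name = read_string(stream)
        # Assumed: little-endian short (2 bytes) then long (4 bytes).
        block_id, offset = struct.unpack("<HI", stream.read(6))
        entries[name] = (block_id, offset)
    return entries


def test_map_block_offsets():
    # Construct a map block by hand with two entries.
    raw = (
        pack_string("Block1") + struct.pack("<HI", 1, 64)
        + pack_string("Block2") + struct.pack("<HI", 2, 4096)
    )
    stream = io.BytesIO(raw)
    block_map = read_map_block(stream, entry_count=2)
    assert block_map["Block1"] == (1, 64)
    assert block_map["Block2"][1] == 4096
    assert stream.read() == b""  # the whole map block was consumed
```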
The purpose of your implementation is to get something decoded. It’s not to move data from disk to memory; that is just an incidental auxiliary function. Therefore, your code should be decoupled so that the importer works with a stream, and in practice it should be coupled to a standard file-reading implementation (preferably from the standard library).
Similarly, your unit tests should test the decoding logic, not the file I/O. I should hope the standard library has already tested that, and in any case it’s not your concern. Therefore, testing with handcrafted input data, without touching the file system, is both faster and better according to sound software engineering principles.
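As an illustration of that decoupling, here is a small sketch in which the decoder only ever sees a stream and the test feeds it handcrafted bytes through `io.BytesIO`. The `read_header` function, its field names, and the little-endian layout are all invented for this example and are not taken from your specification:

```python
import io
import struct


def get_short(stream):
    # Assumed layout: little-endian unsigned 16-bit integer.
    (value,) = struct.unpack("<H", stream.read(2))
    return value


def get_long(stream):
    # Assumed layout: little-endian unsigned 32-bit integer.
    (value,) = struct.unpack("<I", stream.read(4))
    return value


def read_header(stream):
    """Hypothetical decoder for a record made of a short then a long."""
    return {"version": get_short(stream), "length": get_long(stream)}


def test_read_header_from_handcrafted_bytes():
    # No file system involved: the input is built in memory.
    stream = io.BytesIO(struct.pack("<HI", 3, 100000))
    assert read_header(stream) == {"version": 3, "length": 100000}
    assert stream.read() == b""  # nothing left unread
```

In production the same `read_header` would simply be handed a file object opened with `open(path, "rb")`, so the decoding code never needs to know where the bytes come from.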