I am building an empty image using the following Dockerfile:
FROM scratch
I tried this in multiple environments. I get either the image ID 71de1148337f or 471a1b8817ee. Why is that? Why two different hashes? It is not timestamp dependent (otherwise I would see more than two IDs).
For the literal FROM scratch, the output from my buildx install for the config (which is hashed to get the image ID) looks like:
{"architecture":"amd64","config":{"Env":["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"],"WorkingDir":"/"},"created":null,"history":null,"os":"linux","rootfs":{"type":"layers","diff_ids":null}}
That happens to hash to sha256:471a1b8817eefb6569017c1a76f288e0d4e5c8476eb199485c469d0b033168bf. For the 71de11... image, you’d need to push that somewhere public and I could show the difference, but it’s most likely a minor change in the JSON formatting (e.g. omitting a null field). Because the ID is a hash of the exact bytes, any such difference between builder versions produces a different ID.
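To illustrate why a formatting difference matters, here is a minimal Python sketch, assuming the image ID is simply the SHA-256 of the serialized config bytes. The `image_id` helper is hypothetical, and the choice of dropping the `created` field is only an example of "omitting a null field"; I don't know what the other builder actually emits. Whether the first digest reproduces the ID above depends on the exact bytes the builder wrote, so no expected values are shown.

```python
import hashlib
import json


def image_id(config_bytes: bytes) -> str:
    """Hypothetical helper: digest of the raw config bytes, in sha256:<hex> form."""
    return "sha256:" + hashlib.sha256(config_bytes).hexdigest()


# Same fields as the config shown above, in the same order.
config = {
    "architecture": "amd64",
    "config": {
        "Env": ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"],
        "WorkingDir": "/",
    },
    "created": None,
    "history": None,
    "os": "linux",
    "rootfs": {"type": "layers", "diff_ids": None},
}

# Compact serialization with the null fields kept.
with_nulls = json.dumps(config, separators=(",", ":")).encode()

# Assumed variant: another builder drops the null "created" field.
trimmed = {k: v for k, v in config.items() if k != "created"}
without_created = json.dumps(trimmed, separators=(",", ":")).encode()

id_a = image_id(with_nulls)
id_b = image_id(without_created)

print(id_a)
print(id_b)
print("same ID:", id_a == id_b)
```

Both documents describe the same empty image, but since the bytes differ, so do the digests. That would be consistent with seeing exactly two IDs across environments: one per serialization style, not one per build.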
A few things to note from this extreme case:
- There’s no created timestamp. This is typically set in docker builds, but likely excluded for this scenario.
- There are no rootfs diff_ids. Those are the layers, and those layers will have timestamps on the files.
- There are no history entries, which are associated with the layers. Each history entry also has a timestamp.
- A zero-layer image is in a gray area of the OCI spec. Some registries and runtimes will reject it. Parts of the spec will not validate it, while other parts say that at least one layer “should” exist in the image, which allows for exceptions that tools may not support.
Reproducibility is a good goal, but non-trivial to achieve with container images. In my own images, I modify them after building to strip out mutating timestamps and other data like package install logs. The buildkit team is also starting to work on this, as seen with their support for SOURCE_DATE_EPOCH in some of the more recent releases.
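As a rough sketch of what stripping timestamps from the config means, the following Python pins the `created` fields to a fixed value. This is not my actual tooling: `pin_timestamps` and the sample configs are hypothetical, and a real image would also need the file timestamps inside each layer normalized and the manifest updated to reference the new config digest.

```python
import copy


def pin_timestamps(config: dict, fixed: str = "1970-01-01T00:00:00Z") -> dict:
    """Return a copy of an image config with every 'created' value pinned."""
    out = copy.deepcopy(config)  # leave the caller's config untouched
    if out.get("created") is not None:
        out["created"] = fixed
    # "history" may be missing or null, as in the zero-layer config above.
    for entry in out.get("history") or []:
        if "created" in entry:
            entry["created"] = fixed
    return out


# Two hypothetical configs from the same Dockerfile, built at different times.
build_a = {
    "created": "2024-01-02T03:04:05Z",
    "history": [{"created": "2024-01-02T03:04:05Z", "created_by": "COPY . /app"}],
}
build_b = {
    "created": "2024-06-07T08:09:10Z",
    "history": [{"created": "2024-06-07T08:09:10Z", "created_by": "COPY . /app"}],
}

print(build_a == build_b)                                   # configs differ as built
print(pin_timestamps(build_a) == pin_timestamps(build_b))   # identical once pinned
```

Once the configs are byte-identical after serialization, they hash to the same image ID, which is the property a reproducible build is after.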