We store zip files containing certain fields (metadata) in a cloud provider's storage. These files are derived from other, larger files. Every time we (re)generate them, their 'last modified' date is set to the generation time, even though the content is identical. As a result, when we recreate a file that has previously been stored online, its hash (MD5/SHA) differs from the stored one. The reason is that the zip format records the 'last modified' timestamp of each entry inside the .zip file itself.
We now have cases where we want to determine whether the data stored in the cloud is identical to a newly derived file. Simply comparing the local hash against the one provided by the cloud provider's API fails, for the reasons just stated. The naive approach would be to download the stored zip file again, unzip it, rehash the contents and then compare. This costs money.
One workaround seems to be to artificially force the timestamp on the zip entries to zero (in Java this is done with zipEntry.setTime(0)); see e.g. the related Stack Overflow post. This lets us generate .zip files with reproducible hashes, but we lose the information about when the file was generated. It might be a viable workaround, but it feels hacky and wrong.
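A minimal sketch of that workaround in Java, assuming the derived data is already in memory as a map from entry name to bytes (the names `ReproducibleZip`, `buildZip`, `sha256Hex` and `derived.txt` are hypothetical). Besides fixing the timestamp, it writes entries in sorted name order, because entry order also changes the archive bytes. Byte-identical output further assumes the same JDK/zlib version and default compression settings on every run.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class illustrating the setTime(0) workaround.
public class ReproducibleZip {

    // Builds a zip archive in memory. Entries are written in sorted name order
    // and every entry gets the same fixed timestamp, so identical inputs should
    // produce byte-identical archives on the same JDK/zlib version.
    static byte[] buildZip(Map<String, byte[]> files) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> file : new TreeMap<>(files).entrySet()) {
                ZipEntry entry = new ZipEntry(file.getKey());
                entry.setTime(0); // fixed timestamp instead of the current time
                zip.putNextEntry(entry);
                zip.write(file.getValue());
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Hex-encoded SHA-256 of the given bytes.
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, byte[]> files = Map.of(
                "derived.txt", "field1=a\nfield2=b\n".getBytes(StandardCharsets.UTF_8));
        String first = sha256Hex(buildZip(files));
        Thread.sleep(2_100); // zip timestamps have 2-second resolution
        String second = sha256Hex(buildZip(files));
        System.out.println(first.equals(second)); // prints true
    }
}
```

Without the `setTime(0)` line, `ZipOutputStream` stamps each entry with the current time, so the two hashes above would normally differ.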
So what would be an elegant way to deal with a scenario like this? Is there a smarter hash function for zip files that ignores the timestamps? Or is there a better compression format that is reproducible out of the box?
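One possible form of such a "smarter hash", sketched here as an assumption rather than an established tool: hash only the entry names and decompressed contents, ignoring timestamps and other zip metadata. Note that the cloud provider's API hashes the raw object bytes, so this would only avoid the download if the content hash is computed at upload time and stored next to the object (for example as custom object metadata, assuming the provider supports it). The class and method names (`ZipContentHash`, `contentHash`, `zipWithTime`) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class: a content-only hash of a zip archive.
public class ZipContentHash {

    private static MessageDigest sha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hashes entry names and decompressed contents only. Timestamps, extra
    // fields and compression settings do not influence the result.
    static String contentHash(InputStream zipData) {
        // Per-entry digests keyed by name, so entry order does not matter either.
        TreeMap<String, byte[]> entryDigests = new TreeMap<>();
        byte[] buffer = new byte[8192];
        try (ZipInputStream zip = new ZipInputStream(zipData)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                MessageDigest md = sha256();
                int n;
                while ((n = zip.read(buffer)) != -1) {
                    md.update(buffer, 0, n);
                }
                entryDigests.put(entry.getName(), md.digest());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Combine: name bytes, a 0 separator, then the fixed-length 32-byte digest.
        MessageDigest total = sha256();
        for (Map.Entry<String, byte[]> e : entryDigests.entrySet()) {
            total.update(e.getKey().getBytes(StandardCharsets.UTF_8));
            total.update((byte) 0);
            total.update(e.getValue());
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : total.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Demo helper: a single-entry zip with an explicit timestamp.
    static byte[] zipWithTime(String name, byte[] data, long time) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            ZipEntry entry = new ZipEntry(name);
            entry.setTime(time);
            zip.putNextEntry(entry);
            zip.write(data);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = "field1=a\nfield2=b\n".getBytes(StandardCharsets.UTF_8);
        byte[] older = zipWithTime("derived.txt", data, 0L);
        byte[] newer = zipWithTime("derived.txt", data, 1_600_000_000_000L);
        System.out.println(Arrays.equals(older, newer)); // false: timestamps differ
        System.out.println(contentHash(new ByteArrayInputStream(older))
                .equals(contentHash(new ByteArrayInputStream(newer)))); // true
    }
}
```

Because the per-entry digests are sorted by name before being combined, this hash also ignores entry order; if order is meaningful for your data, replace the `TreeMap` with an insertion-ordered map.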