I am writing a Python program that parses ZIP files (currently only zlib/DEFLATE-compressed ones) and verifies the correctness of their headers and data. One of the things I’m trying to achieve is calculating the uncompressed size of a DEFLATE-compressed file inside a ZIP archive without actually decompressing it and, obviously, without relying on the uncompressed size field found in the file record’s headers. The goal is to ensure that none of the ZIP record’s fields have been tampered with (in this case, the uncompressed size field).
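For context, the field in question can be read from the local file header roughly like this. This is only a minimal sketch, assuming the standard 30-byte fixed layout described in APPNOTE.TXT; the names `read_declared_sizes` and `LOCAL_HEADER_FORMAT` are illustrative, and the sketch ignores data descriptors and ZIP64 records.

```python
import struct

# Fixed part of a ZIP local file header (little-endian):
# signature, version needed, flags, method, mod time, mod date,
# CRC-32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER_FORMAT = "<IHHHHHIIIHH"
LOCAL_HEADER_SIZE = struct.calcsize(LOCAL_HEADER_FORMAT)  # 30 bytes
LOCAL_HEADER_SIGNATURE = 0x04034B50  # "PK\x03\x04"


def read_declared_sizes(data, offset=0):
    """Return (method, compressed_size, uncompressed_size) as declared
    in the local file header starting at `offset`.

    Note: if bit 3 of the general purpose flags is set, these size
    fields are zero in the local header and the real values follow
    the data in a data descriptor; that case is not handled here.
    """
    (signature, _version, _flags, method, _mtime, _mdate, _crc32,
     compressed_size, uncompressed_size,
     _name_len, _extra_len) = struct.unpack_from(LOCAL_HEADER_FORMAT, data, offset)
    if signature != LOCAL_HEADER_SIGNATURE:
        raise ValueError("not a local file header")
    return method, compressed_size, uncompressed_size
```

The `uncompressed_size` value returned here is exactly the number I do not want to trust; the question is how to obtain an independent value to compare it against.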
I’ve gone through the ZIP specification (https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT) over and over, but I’m drawing a blank: I don’t see any way to do this without completely parsing the Huffman trees and calculating the corresponding stream size, which is what I want to avoid. I would appreciate any idea or direction on how to do this.
To clarify, I’m not looking for a library/module to do this for me, but rather for a direction on how it can be done.
Much thanks.