I want to convert some JPEG and PNG files to PDF, but in a way that keeps the original files easily recoverable. This is easy for JPEG files: they get embedded directly in a stream object, so you can recover a bit-identical copy of them.
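To illustrate what I mean by "embedded directly", here is a minimal sketch of the JPEG case. The function names, the fixed `/DeviceRGB` color space, and the direct (non-indirect) `/Length` entry are assumptions made for illustration; a real PDF writer would also need the page tree, cross-reference table, and so on.

```python
import re


def jpeg_image_xobject(jpeg_bytes: bytes, width: int, height: int, obj_num: int = 1) -> bytes:
    """Wrap raw JPEG bytes in a PDF image XObject without re-encoding them.

    Assumption: the JPEG is 8-bit RGB, so /DeviceRGB is hard-coded here.
    """
    header = (
        f"{obj_num} 0 obj\n"
        f"<< /Type /XObject /Subtype /Image /Width {width} /Height {height} "
        f"/ColorSpace /DeviceRGB /BitsPerComponent 8 "
        f"/Filter /DCTDecode /Length {len(jpeg_bytes)} >>\n"
        "stream\n"
    ).encode("ascii")
    # The JPEG file goes into the stream byte for byte.
    return header + jpeg_bytes + b"\nendstream\nendobj\n"


def recover_stream(obj_bytes: bytes) -> bytes:
    """Slice the stream payload back out, using the /Length entry.

    Assumption: /Length is a direct integer, as written by the function above.
    """
    length = int(re.search(rb"/Length (\d+)", obj_bytes).group(1))
    start = obj_bytes.index(b"stream\n") + len(b"stream\n")
    return obj_bytes[start:start + length]
```

Because the payload is untouched, `recover_stream(jpeg_image_xobject(data, w, h))` gives back `data` exactly.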
In principle, this is impossible with PNG files, because the PDF specification defines its own lossless image compression scheme instead of embedding PNG files directly, as it does with JPEG.
However, if there were a way to convince the PDF viewer to skip the first N bytes of a stream's data, it would be possible to embed some unmodified PNG files in the image stream, and the image should still be read correctly.
In some cases (there is only one IDAT chunk, there is no alpha channel, and the image is grayscale or RGB), the compressed data that starts at byte offset 8 of the IDAT chunk, right after the 4-byte length and 4-byte chunk type fields, is compatible with the compressed image stream defined by the PDF specification. In addition, a conformant deflate decompressor ignores any trailing data after a valid compressed stream, so the remaining bytes of the PNG file wouldn’t interfere with the correct parsing of the image stream. The only problem is the leading bytes of the PNG file.
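For concreteness, this is roughly how I locate the N bytes that would need to be skipped. It is only a sketch: the function names are made up, the non-interlaced check is an extra condition I am assuming is needed, and it does not verify chunk CRCs.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def find_embeddable_idat(png: bytes):
    """Return the file offset of the zlib data inside the single IDAT chunk.

    Returns None when the file does not meet the conditions described above
    (exactly one IDAT chunk, grayscale or RGB, no alpha channel).
    Assumption: interlaced images are rejected as well.
    """
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    color_type = None
    interlace = None
    idat_offsets = []
    while pos + 8 <= len(png):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", png[pos:pos + 8])
        data_start = pos + 8
        if chunk_type == b"IHDR":
            (_width, _height, _depth, color_type,
             _compression, _filter, interlace) = struct.unpack(
                ">IIBBBBB", png[data_start:data_start + 13])
        elif chunk_type == b"IDAT":
            idat_offsets.append(data_start)
        elif chunk_type == b"IEND":
            break
        pos = data_start + length + 4  # skip the data and the CRC
    # Color type 0 is grayscale, 2 is RGB; both have no alpha channel.
    if len(idat_offsets) != 1 or color_type not in (0, 2) or interlace != 0:
        return None
    return idat_offsets[0]


def inflate_from(png: bytes, offset: int) -> bytes:
    """Inflate the zlib stream that starts at `offset`.

    Bytes after the end of the zlib stream (the IDAT CRC, the IEND chunk, ...)
    are not decoded; Python's decompress object leaves them in `unused_data`.
    """
    decompressor = zlib.decompressobj()
    return decompressor.decompress(png[offset:])
```

`find_embeddable_idat(data)` is the N from my question: if a PDF reader started decoding the stream at that offset, everything before it (signature, IHDR, IDAT header) would be skipped and everything after the zlib stream would be trailing data.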
Is there a way to skip some bytes from the beginning of a PDF stream?