I hope someone can help me.
I am trying to download the attachments of emails that carry DMARC reports for a domain, using the Gmail API. According to the API documentation, the attachment is returned as a base64-encoded string, which I appear to obtain correctly. However, when I try to decode that string with Python to recover the file, I get an error saying that the base64 string is not valid and may be corrupt.
To rule out the string being too long, I tried downloading it in parts and then joining them, but the error persists.
Below are the code I am using and the error it returns.
(The base64 string is just an example.)
import base64
import gzip
import io
# Base64 encoded gzip file
base64_gzip = "H4sIAAAAAAAAAKVTuW7cMBCt118RuJcoyl5bCzB0XKRMUrhLI1DUaJexeICkNsfXh5f2MIw0aSTOe6N584Yj8vRLzh-OYJ3Q6uMtrpvbJ3pDJoBxYPyV3mxIIWlTY4LWIOAWjLa-l-DZyDwL0IZou-8Vk0Cfvzx___a1evn8QtAJjBkgmZip0c5L5jzYT0yyP1o5cDXXkqDMx8xSX4x07PCWtVtc8YHz6v6ha6odG6AadtDhu2FiO94SdM6PX4eWoLdM7ZPshgywF4riR_zYdg_3TUNQRhIJakzUXSAjFeNYBF1VOUlcWCZGz4L_7s0yzMIdoIjr4EJRoSbtYM7GChZpNr4KSR1B-ZAgZ6aExHcEDLXwA7gnyKTYnQGXEcM9xbHbeIjAGmIwjO2-k5fYaJc29yh1T-zdacXy6EXhrbNru62dYu7ehvGeSZSHteLCnoE5UPCigYc2baXK0b1w4X6Fj3uitILg_QIpOdG4YS5YXmeQTE4FLGM4G7kSCTeR-ydiBOXFJMJarld5hFkb6Cer5fUNXFMp-wBsBPtO7iWRBN8IEbb4Q2_BLbMvyhc-_n3_abfjh8VrCbLdUxWyjuC_yq3rhN70G9PyMoSNWX_2v5P8j0QNBAAA"

# Ensure the base64 string has the correct padding
missing_padding = len(base64_gzip) % 4
if missing_padding:
    base64_gzip += '=' * (4 - missing_padding)

# Decode the base64 string
decoded_gzip = base64.b64decode(base64_gzip)

# Decompress the gzip file
with gzip.GzipFile(fileobj=io.BytesIO(decoded_gzip)) as f:
    decompressed_data = f.read().decode('utf-8')
decompressed_data

---------------------------------------------------------------------------
Error                                     Traceback (most recent call last)
Cell In[2], line 14
     11     base64_gzip += '=' * (4 - missing_padding)
     13 # Decode the base64 string
---> 14 decoded_gzip = base64.b64decode(base64_gzip)
     16 # Decompress the gzip file
     17 with gzip.GzipFile(fileobj=io.BytesIO(decoded_gzip)) as f:

File /usr/local/lib/python3.11/base64.py:88, in b64decode(s, altchars, validate)
     86     assert len(altchars) == 2, repr(altchars)
     87     s = s.translate(bytes.maketrans(altchars, b'+/'))
---> 88 return binascii.a2b_base64(s, strict_mode=validate)

Error: Incorrect padding
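
One thing that may be worth checking before assuming corruption: with the default `validate=False`, `base64.b64decode` silently discards any character outside the standard alphabet before it checks the padding, so a string that contains `-` or `_` (the URL-safe variant of base64) can end up with the wrong length and raise `Incorrect padding`. A minimal diagnostic sketch follows; the helper name `chars_outside_standard_alphabet` and the short `sample` value are made up for illustration and are not the real attachment data.

```python
import string

# Standard base64 alphabet (RFC 4648, section 4) plus the padding character.
STANDARD_ALPHABET = set(string.ascii_letters + string.digits + "+/=")


def chars_outside_standard_alphabet(encoded: str) -> set[str]:
    """Return the characters in `encoded` that standard base64 does not use."""
    return set(encoded) - STANDARD_ALPHABET


# Hypothetical short sample, not the real attachment data.
sample = "H4sI-AAA_AAA"
print(sorted(chars_outside_standard_alphabet(sample)))
```

If running this on the string returned by the API reports `-` or `_`, the data is probably URL-safe base64 rather than corrupt.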
What I have tried so far:
- I downloaded the base64 string from the API in parts, to rule out the string being corrupted in the API response.
- I used a different decoding library (binascii), to rule out errors in the base64 library.
- I encoded a smaller file manually, sent it to the mailbox I am using, downloaded it through the same API request, and decoded it successfully with the same code. This could indicate that the strings I get for the other files are being corrupted somewhere, but I don't know why.
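
For comparison, here is a sketch of the same decode step using the URL-safe alphabet, on the assumption that the Gmail API returns attachment data as base64url (which is how its reference describes the `data` field). The function name `decode_gzip_attachment` and the round-trip payload are hypothetical, and this has not been run against the real attachment.

```python
import base64
import gzip


def decode_gzip_attachment(data: str) -> str:
    """Decode a base64url string and gunzip the result to text."""
    # Restore any padding that was stripped from the encoded string.
    padded = data + "=" * (-len(data) % 4)
    # urlsafe_b64decode maps '-' to '+' and '_' to '/' before decoding.
    raw = base64.urlsafe_b64decode(padded)
    return gzip.decompress(raw).decode("utf-8")


# Round trip on made-up data (not a real DMARC report).
original = "<feedback>example</feedback>"
encoded = base64.urlsafe_b64encode(gzip.compress(original.encode("utf-8")))
stripped = encoded.decode("ascii").rstrip("=")  # simulate missing padding
assert decode_gzip_attachment(stripped) == original
```

Because `urlsafe_b64decode` also accepts strings that contain neither `-` nor `_`, a small test file whose encoding happens to avoid those characters would decode with either function, which would be consistent with the successful manual test.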