I’m using a Python script to create a man-in-the-middle (MITM) proxy for intercepting HTTPS traffic. The script captures and logs the requests and responses to a log file. While the headers of the HTTPS requests and responses are readable as plain text, the bodies are logged as byte strings, making them difficult to interpret.
Here’s the relevant part of my script that handles the HTTPS request and response:

    import socket

    def relay_data(self, s_ssl, conn_ssl, buffer_size, url):
        website_response = []
        while True:
            # Client -> server
            try:
                request = conn_ssl.recv(buffer_size)
                if not request:
                    break
                s_ssl.sendall(request)
            except socket.error:
                pass
            # Server -> client; keep a copy of every chunk for the log
            try:
                response = s_ssl.recv(buffer_size)
                if not response:
                    break
                conn_ssl.sendall(response)
                if response:
                    website_response.append(response)
            except socket.error:
                pass
        if website_response:
            log_response_data(website_response, "https")
        print(f"Request completed (HTTPS) [{url}]")

    def log_response_data(website_response, protocol):
        formatted_response, demarcation = "", "____________________________________________________________________________________________________"
        for response in website_response:
            try:
                formatted_response += response.decode('utf-8')
            except UnicodeDecodeError:
                # Chunks that are not valid UTF-8 are appended as their bytes repr (b'...')
                formatted_response += str(response)
        if formatted_response:
            with open(f"{protocol}_log_file", "a") as F:
                F.write(f"{demarcation}\n{formatted_response}\n{demarcation}\n\n")

The relay_data function collects the HTTPS responses in the website_response list and then passes them to log_response_data, which tries to decode each chunk as UTF-8. However, since HTTPS response bodies often contain binary data (e.g., images, files, encrypted content), the decoding fails, and the log ends up containing raw byte strings, as shown below (the response is truncated):
______________________________________________________________________________________________________________________
b'HTTP/1.1 200 OK
Vary: Accept-Encoding
Content-Encoding: br
Content-Type: text/css; charset=utf-8
Access-Control-Allow-Origin: *
Last-Modified: Mon, 01 Jan 2001 08:00:00 GMT
Expires: Mon, 07 Jul 2025 16:39:40 GMT
Cache-Control: public,max-age=31536000,immutable
reporting-endpoints: permissions_policy="https://www.xx.facebook.com/ajax/browser_error_reports/"
timing-allow-origin: *
document-policy: force-load-at-top
permissions-policy: accelerometer=(), attribution-reporting=(), autoplay=(), battery=(self), bluetooth=(), camera=(), ch-device-memory=(), ch-downlink=(), ch-dpr=(), ch-ect=(), ch-rtt=(), ch-save-data=(), ch-ua-arch=(), ch-ua-bitness=(), ch-viewport-height=(), ch-viewport-width=(), ch-width=(), clipboard-read=(), clipboard-write=(), compute-pressure=(), display-capture=(), encrypted-media=(), fullscreen=(self), gamepad=(), geolocation=(), gyroscope=(), hid=(), idle-detection=(), interest-cohort=(), keyboard-map=(), local-fonts=(), magnetometer=(), microphone=(), midi=(), otp-credentials=(), payment=(), picture-in-picture=(), private-state-token-issuance=(), publickey-credentials-get=(), screen-wake-lock=(), serial=(), shared-storage=(), shared-storage-select-url=(), private-state-token-redemption=(), usb=(), usb-unrestricted=(), unload=(self), window-management=(), xr-spatial-tracking=();report-to="permissions_policy"
cross-origin-resource-policy: cross-origin
X-Content-Type-Options: nosniff
report-to: {"max_age":21600,"endpoints":[{"url":"https://www.xx.facebook.com/ajax/browser_error_reports/"}],"group":"permissions_policy"}
content-md5: yBQVMFwk1cIEUVIB4cPFJw==
X-FB-Debug: FVUkfTRg3dvvwbDFhD8Xj5Bxk8qMudZ3UR/oe+x9HEIHxg+Wh2aQAJqhqUReqxiHUgi/KHRKaUjPsQ3bnnIrkg==
Date: Mon, 08 Jul 2024 11:33:02 GMT
X-FB-Connection-Quality: MODERATE; q=0.3, rtt=152, rtx=0, c=13, mss=1368, tbw=2569, tp=-1, tpl=-1, uplat=1, ullat=-1
Alt-Svc: h3=":443"; ma=86400
Connection: keep-alive
Content-Length: 10118
'b'xe2'b'x1dx96x88xa2>x04(Bx86xb9Gx7fixdfx7f~xbe,Uxa36xfbxc0xe3x1bx1bxa4jxa7x9dxe9xbcxeexf6xdaxc9x1e72xd8$Nxccx11sx84x14qxbdx99xf6xa64x00xe4xdcx99xecxe7xb2x95+"x00xcaxf9Lxa9xf1Axc26xefxd5xcdxecx02Ux0bWx05xbaxaa#xbfsxb4x92xefxd7xdd3xdc}3xa0xb0x86x12xb9xcbcx1dx16<xe3xe9dx9cMx7fx96xc8xd9TY|Axa8Rx12)HB=x86xeaxcbxdeox14t"xf2xccxb6exb8lxf4dxe2xf7Hxb0@x91x90xf0xbbpxd3xe8xe9xe0xafx050xbfzxb6x94xdcdxebxadqx92x80xxx17x86xb6Kxd7xa6`xa11rx01uUxc5+Q%xb3xf4xe9x14x81xee*Ax93xbcxa3x8cx97+xecx8e}xdaxbbgx8bvx16x04?*Ex80v3x90xb7?!xecx0b}x87"xb0x9dx83txdb{xb7x11!xe8xc23cx91^xfcxe1xc1xceCxccxa3x9dxb4xc3x10xc5-xb5xe0xd5Kxaaxcfx03xd9xff<xf0xe1xde}xdatx04%x14xc01L}x1c`xefxc3xe8b8xf2xcfxc5x02t6J/ x03x9e9tx1e/<Rxb3(xd8x12x12xd4qxf6xbb'x00x16xd6v6xbcx8dxc5xc2x8a%Yx9ax83xa5x8adxfbxd9(x16rxa6txc2^xeexa51xdexd8mx05x1d/x84DAxc4xc0Zxd6xdb xxffx1exd6xcdx06xe6x1cx8fbxa1xc2Fxccx90xf90xc3cx15}xbbxb5_xd8j0exd6xd0)}x81x9c:xf30xacx8fx01xe8xea}Sxxc8x03xb8x86X<xa3*)4xacx08xcexc2xdexa8x87x1d5I7xd7xdfxdexdcx88xf7xdfx03xc0<>x15xb0xd6e7xd2x98xb9$xfdxddxfaxc9xbe_x0fxdexech#pxd9/x15x81oxc1ix1fx01xf9x99]x01x0exf7xd5FCxecxb2xtx11x88xa5E'xbexf4wx8bxc0x83wxcdxe9Ux97xbbxfbxdfxf9xa9mx86x08xdcxc2xddxd3xdbxeexaf|xfcxcbxfexb2xfbxd1xebpxdb`xf7x8fxe0x87Kxdbxd0xdfxf2vgxf7xadx05x7f{x06xbf[xe1xb2xdbxacx0fe*x016xeenx11xbfxe4vx9bx8dx8dx1bxf0e,Lxfex9bx97xfexed7NQxcc+x16x81xbfxfdx86x91F1Fxd5+x96!xe93?'........
______________________________________________________________________________________________________________________
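To make the behaviour above reproducible without a proxy, here is a minimal sketch of the same failure. It is only an illustration under stated assumptions: the response is hand-built, and gzip stands in for the `Content-Encoding: br` shown in the log because gzip ships with Python while Brotli needs a third-party decoder. The names `raw`, `try_utf8` and the sample CSS are hypothetical.

```python
import gzip

# Hypothetical stand-in for one captured response: plain-text headers followed
# by a compressed body. gzip is used only because it is in the standard
# library; the real log shows "Content-Encoding: br" (Brotli).
css = "body { color: #333; }"
body = gzip.compress(css.encode("utf-8"))
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Encoding: gzip\r\n"
    b"Content-Type: text/css; charset=utf-8\r\n"
    b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n\r\n"
) + body

# Headers and body are separated by the first blank line.
head, _, payload = raw.partition(b"\r\n\r\n")


def try_utf8(data):
    """Return data decoded as UTF-8, or None if it is not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


headers_text = try_utf8(head)        # ASCII headers decode without trouble
compressed_text = try_utf8(payload)  # None: compressed bytes are not UTF-8
repr_text = str(payload)             # the bytes repr, which is what ends up in the log
restored_text = gzip.decompress(payload).decode("utf-8")  # readable again

print(headers_text)
print(compressed_text)
print(restored_text)
```

In this sketch the body only becomes readable after it is decompressed according to its `Content-Encoding` header; decoding the compressed bytes directly fails, just like in the log.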
My questions are:
- Why do the HTTPS response bodies appear as bytes in my log file?
- How can I decode/decrypt the byte data in my log file?
- Should I distinguish between different content types and handle them separately? If so, how can I implement this in my script?
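To make the third question concrete, this is a rough sketch of what content-type-aware logging might look like. It is not a tested solution and rests on several assumptions: the chunks have been joined into one complete HTTP/1.x response, the body is not sent with chunked transfer encoding, and the connection carried a single response. The helper names (`split_response`, `decompress`, `body_for_log`) and the `TEXT_TYPES` set are hypothetical, and the `br` branch assumes the third-party `brotli` package is installed.

```python
import gzip
import zlib

# Hypothetical list of non-"text/*" types that are still worth logging as text.
TEXT_TYPES = {"application/json", "application/javascript", "application/xml"}


def split_response(raw):
    """Split raw HTTP/1.x response bytes into (lower-cased header dict, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    headers = {}
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        headers[name.decode("latin-1").strip().lower()] = value.decode("latin-1").strip()
    return headers, body


def decompress(body, encoding):
    """Undo the Content-Encoding; unknown encodings are returned unchanged."""
    encoding = encoding.lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        return zlib.decompress(body)  # assumes the zlib-wrapped form
    if encoding == "br":
        import brotli  # third-party package, assumed to be installed
        return brotli.decompress(body)
    return body


def body_for_log(raw):
    """Return readable text for textual bodies, a short summary otherwise."""
    headers, body = split_response(raw)
    body = decompress(body, headers.get("content-encoding", ""))
    ctype = headers.get("content-type", "")
    mime = ctype.split(";")[0].strip().lower()
    if mime.startswith("text/") or mime in TEXT_TYPES:
        charset = "utf-8"
        for param in ctype.split(";")[1:]:
            key, _, val = param.partition("=")
            if key.strip().lower() == "charset":
                charset = val.strip().strip('"')
        try:
            return body.decode(charset, errors="replace")
        except LookupError:  # unknown charset name, fall back to UTF-8
            return body.decode("utf-8", errors="replace")
    return f"<{len(body)} bytes of {mime or 'unknown type'}>"
```

With something like this, `log_response_data` could write the decoded header block followed by `body_for_log(b"".join(website_response))` instead of decoding each chunk separately.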
Any guidance on improving the readability of my HTTPS response logs would be greatly appreciated!