I have a Google Cloud Run service that is triggered via Pub/Sub, grabs data from an Azure API, and writes it to a GCS bucket using smart_open:
import smart_open

# destination is the gs:// URI of the target object;
# pyarrow_adapters is our own helper module.
with smart_open.open(destination, "wb") as fout:
    schema = pyarrow_adapters.json_schema_to_pyarrow_schema(self.desired_schema)
    data_frame.to_parquet(
        path=fout,
        engine="pyarrow",
        compression="gzip",
        schema=schema,  # schema and use_compliant_nested_type pass through to pyarrow
        use_compliant_nested_type=False,
    )
This writes hundreds of thousands of files in a few hours' time. Each day there are a few failures with this error, which I cannot figure out how to fix:
    ('Connection broken: IncompleteRead(15773 bytes read, 125469 more expected)', IncompleteRead(15773 bytes read, 125469 more expected))
The bytes-read count is always 15773, but the number of bytes expected differs.
The metrics for Cloud Run look good, and I have no idea how to debug this further.
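My best guess is that the IncompleteRead happens while reading the Azure API response rather than during the GCS upload: "Connection broken: IncompleteRead" is the message urllib3/requests raise when the server closes the connection before delivering the advertised Content-Length. If that's the case, retrying the whole fetch might work around it. Here is a minimal sketch of what I'm considering (fetch_payload, the URL handling, and the retry parameters are hypothetical stand-ins for my actual client code):

import time

import requests

def fetch_payload(url: str, max_attempts: int = 3) -> bytes:
    # Retry the full request when the body is cut off mid-stream. requests
    # surfaces urllib3's ProtocolError("Connection broken: IncompleteRead...")
    # as ChunkedEncodingError or ConnectionError depending on the response.
    for attempt in range(1, max_attempts + 1):
        try:
            response = requests.get(url, timeout=60)
            response.raise_for_status()
            return response.content  # reading the body is where IncompleteRead surfaces
        except (requests.exceptions.ChunkedEncodingError,
                requests.exceptions.ConnectionError):
            if attempt == max_attempts:
                raise
            time.sleep(2 ** attempt)  # simple exponential backoff before retrying

Even if a retry like this works, I'd still like to understand why the read is consistently truncated at exactly 15773 bytes.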