I have a fairly simple function that takes an object and uploads it to an S3 bucket. Let’s say this object is approximately 1.5 GB…
import boto3
import json
from botocore.client import Config
from boto3.s3.transfer import TransferConfig
from io import BytesIO
from pympler import asizeof
def get_s3_client():
    config = Config(connect_timeout=60 * 5, retries={"max_attempts": 3})
    return boto3.client("s3", config=config, region_name="us-east-1")

def snapshot_json_large(logger, json_data, filepath):
    boto3.set_stream_logger("")
    config = TransferConfig(
        multipart_threshold=1024 * 1024 * 25,  # 25 MB
        max_concurrency=10,
        multipart_chunksize=1024 * 1024 * 25,
        use_threads=True,
    )
    s3_client = get_s3_client()
    s3_client.upload_fileobj(
        Fileobj=BytesIO(bytes(json.dumps(json_data, indent=2, default=str).encode("utf-8"))),
        Bucket=BUCKET,
        Key=filepath + ".json",
        ExtraArgs={"ContentType": "application/json"},
        Config=config,
    )
    logger.info(f"transfer size: {asizeof.asizeof(json_data)} bytes")
With the stream logger enabled, everything looks normal until, with no exception or any other logging, the script is simply killed:
[DEBUG create_endpoint:292] Setting s3 timeout as (300, 60)
2024-06-12 02:59:31,553 botocore.endpoint [DEBUG] Setting s3 timeout as (300, 60)
[DEBUG load_file:174] Loading JSON file: /usr/local/lib/python3.10/dist-packages/botocore/data/_retry.json
2024-06-12 02:59:31,554 botocore.loaders [DEBUG] Loading JSON file: /usr/local/lib/python3.10/dist-packages/botocore/data/_retry.json
[DEBUG _register_legacy_retries:165] Registering retry handlers for service: s3
2024-06-12 02:59:31,555 botocore.client [DEBUG] Registering retry handlers for service: s3
Killed
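Since there is no Python traceback at all, my working assumption is that something outside the interpreter is terminating the process (e.g. the kernel OOM killer). To see how far memory climbs before the process disappears, logging peak RSS around the upload seems like a reasonable check (a sketch; ru_maxrss is reported in kilobytes on Linux, which is where this runs):

import resource

def log_peak_rss(logger, label):
    # ru_maxrss is the peak resident set size of this process so far.
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    logger.info(f"{label}: peak RSS ~{peak_kb / 1024:.0f} MB")

# e.g. call log_peak_rss(logger, "before dumps") at the top of
# snapshot_json_large and again right before upload_fileobj.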
I’ve tried changing the S3 client config and the transfer config for the upload, but nothing seems to work.
When I limit the size of the object, the upload succeeds; I can transfer files as large as a few hundred megabytes this way.
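By "limiting the size" I just mean snapshotting a truncated subset of the data before calling snapshot_json_large, roughly like this (the slicing assumes json_data is a list of records, which is an illustration rather than the exact shape):

def truncate_for_snapshot(json_data, max_records=100_000):
    # Illustrative only: keep the first max_records entries so the serialized
    # JSON stays in the hundreds-of-megabytes range that uploads fine.
    return json_data[:max_records]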