I have a bucket with about 5’000’000 objects inside (as shown in the MinIO web interface).
I need to traverse every single object (and later on do some processing on them).
To test the functionality, I want to start by counting every individual object (which should add up to the same number as reported by MinIO).
I tried boto3 with list_objects, list_objects_v2, and the boto3 resource interface, always with pagination, to no avail.
Without a prefix (i.e. counting the whole bucket), the count always seems to stop at a little under 200k objects.
When I start defining some first-level prefixes, the count goes up.
I can’t afford to list all prefixes by hand, as the structure of the data is very messy and not all data sits on “leaf paths” (i.e. there might be some objects stored next to a “subfolder”).
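The first-level prefixes do not have to be enumerated by hand to compare counts. A minimal sketch, assuming the keys use "/" as separator and that the endpoint honours the Delimiter parameter of ListObjectsV2; first_level_breakdown is a hypothetical helper name, and client stands for any boto3 S3 client:

```python
def first_level_breakdown(client, bucket, delimiter="/"):
    """Return (number of keys directly at the bucket root, first-level prefixes).

    With a Delimiter, the listing groups everything below the first separator
    into CommonPrefixes, while keys without a separator stay in Contents.
    """
    root_keys = 0
    prefixes = []
    token = None
    while True:
        kwargs = {"Bucket": bucket, "Delimiter": delimiter}
        if token:
            kwargs["ContinuationToken"] = token
        resp = client.list_objects_v2(**kwargs)
        # Objects stored next to the "subfolders" show up here.
        root_keys += len(resp.get("Contents", []))
        prefixes.extend(p["Prefix"] for p in resp.get("CommonPrefixes", []))
        token = resp.get("NextContinuationToken")
        if not resp.get("IsTruncated") or not token:
            return root_keys, prefixes
```

Each returned prefix can then be passed as Prefix to a recursive listing, and the per-prefix totals plus the root-level count should add up to the whole bucket.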
I initialize the client / resource as follows:
import boto3
from botocore.config import Config  # Config lives in botocore, not boto3

custom_config = Config(
    retries={
        'max_attempts': 10,
        'mode': 'standard'
    },
    signature_version='s3v4'
)
minio_client = boto3.client(
    's3',
    endpoint_url=<URL>,
    aws_access_key_id=<ID>,
    aws_secret_access_key=<KEY>,
    aws_session_token=None,
    verify=False,
    config=custom_config,
)
minio_resource = boto3.resource(
    's3',
    endpoint_url=<URL>,
    aws_access_key_id=<ID>,
    aws_secret_access_key=<KEY>,
    aws_session_token=None,
    verify=False,
    config=custom_config,
)
And then I try counting the objects as such:
bucket_name = <BUCKET_NAME>
bucket = minio_resource.Bucket(bucket_name)
cnt = 0
try:
    # .pages() yields lists of ObjectSummary; iterating .all() directly yields single objects
    for page in bucket.objects.pages():  # ! Also tried bucket.objects.page_size(1000).pages()
        for obj in page:
            cnt += 1
            if cnt % 10000 == 0:
                print("-------------------------------")
                print(obj)
                print(f"Processed {cnt} objects so far")
except Exception as e:
    print(f"An error occurred: {e}")
print(f"Total objects counted: {cnt}")
# Same count through the client paginator
cnt = 0  # reset so this run does not continue from the previous total
try:
    paginator = minio_client.get_paginator('list_objects_v2')
    page_iterator = paginator.paginate(Bucket=bucket_name)  # Or with prefixes: , Prefix=pfx)
    for page in page_iterator:
        for obj in page.get('Contents', []):
            cnt += 1
            if cnt % 10000 == 0:
                print("-------------------------------")
                print(obj)
                print(f"Processed {cnt} objects so far")
except Exception as e:
    print(f"An error occurred: {e}")
print(f"Total objects counted: {cnt}")