I am evaluating MinIO and measuring the read performance of blob data. I run multiple iterations of reads from the same bucket, and I don't want caching to interfere with the runs and skew the results. The code that reads the data is given below:
from minio import Minio
import time

def read_blobs(bucket_name):
    # List every object in the bucket (list_objects returns a lazy iterator)
    objects = client.list_objects(bucket_name, prefix='', recursive=True)
    object_contents = []
    start_time = time.time()
    for obj in objects:
        # Read the object content
        try:
            data = client.get_object(bucket_name, obj.object_name)
            content = data.read()
            object_contents.append(content)
        except Exception as e:
            print(f"Error reading object '{obj.object_name}': {e}")
    elapsed_time = time.time() - start_time
    # Count only the objects that were read successfully
    num_blobs = len(object_contents)
    print(f"Total time to read {num_blobs} objects is {elapsed_time} seconds")
    return object_contents

client = Minio(
    "localhost:9000",
    access_key="minioadmin",
    secret_key="minioadmin",
    secure=False
)
read_blobs("my-bucket")
I have 1000 objects of about 4 MB each in the bucket. The first run typically takes 50 seconds to read all of the data (approximately 50 ms per object).
If I run the script again, it is much faster: about 5 ms per object, so reading every object in the bucket finishes within 5 seconds.
I am using a SLES 15.1 Linux server, running MinIO in one container and the application code in another, with both containers using the host network.
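The disk read throughput measured on the host Linux system with 4 MB blocks is close to 250 MB/s. Reading roughly 4 GB in 5 seconds works out to about 800 MB/s, well above that, so I don't think the data is being read from disk from the second run onwards. I suspect it is served from some cache, but I am not sure which one.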
dd if=/dev/zero of=/mnt/tstdrv/testfile bs=4096k count=1000 oflag=direct conv=fdatasync > dd-write-drive1.txt
1000+0 records in
1000+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 29.8557 s, 140 MB/s
dd if=/mnt/tstdrv/testfile of=/dev/null bs=4096k iflag=direct > dd-read-drive1.txt
1000+0 records in
1000+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 16.5499 s, 253 MB/s
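As a quick sanity check of the figures above, the effective throughput of each run can be compared with the dd read rate. This is only a rough sketch: it uses the approximate 4 MB object size and the rounded run times quoted in this post, and the constant and function names are made up for illustration.

```python
# Figures taken from the measurements in this post; 4 MB per object is approximate.
NUM_OBJECTS = 1000
OBJECT_SIZE_MB = 4
DISK_READ_MB_S = 253  # direct-I/O read rate reported by dd


def throughput_mb_s(total_seconds):
    """Effective read throughput for the whole bucket, in MB/s."""
    return NUM_OBJECTS * OBJECT_SIZE_MB / total_seconds


first_run = throughput_mb_s(50)   # first run, about 50 s in total
second_run = throughput_mb_s(5)   # repeated run, about 5 s in total

print(f"first run:  {first_run:.0f} MB/s")
print(f"second run: {second_run:.0f} MB/s")
print(f"second run exceeds disk rate: {second_run > DISK_READ_MB_S}")
```

The first run sits below the raw disk rate, while the second is roughly three times above it, which is why the repeated runs look like they are served from memory.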
So my question is: across multiple iterations of reading the same objects from MinIO, I want to measure performance consistently, so I need each run to be treated as a cold read of the data. How do I accomplish this in MinIO? Is there some cache setting that I can disable?
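For context, if the speed-up comes from the Linux page cache on the host rather than from MinIO itself (an assumption, not something confirmed here), one possible way to force cold reads would be to drop the page cache before each run. The sketch below uses the standard Linux `/proc/sys/vm/drop_caches` interface; the helper name `drop_page_cache` and its `control_file` parameter are hypothetical. It needs root on the host, and `/proc/sys` is typically read-only inside an unprivileged container.

```python
import os


def drop_page_cache(control_file="/proc/sys/vm/drop_caches"):
    """Flush dirty pages, then ask the kernel to drop its clean caches.

    Writing "3" drops the page cache plus dentries and inodes.
    Assumes a Linux host and root privileges for the default path.
    """
    os.sync()  # write dirty pages back first so they become droppable
    with open(control_file, "w") as f:
        f.write("3\n")
```

Calling `drop_page_cache()` on the host immediately before each `read_blobs("my-bucket")` call would then make every iteration start without cached file data, assuming the page cache is in fact what is accelerating the repeated runs.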