I have a local setup of Trino, Hive Metastore, and MinIO storage, and I have enabled and configured Alluxio caching and disk spilling on Trino. The number of requests made to the object storage is higher than I would expect, given that I am only testing on a few megabytes of Parquet files.
What could be causing this, and how can I fix it?
Here is my configuration in /etc/trino/config.properties:
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
discovery.uri=http://localhost:8080
catalog.management=${ENV:CATALOG_MANAGEMENT}
query.max-memory=2GB
query.max-memory-per-node=700MB
exchange.http-client.max-requests-queued-per-destination=999999
scheduler.http-client.max-requests-queued-per-destination=999999
exchange.http-client.request-timeout=30s
task.info-update-interval=2s
spill-enabled=true
spiller-spill-path=/tmp/spill
spiller-max-used-space-threshold=0.7
spiller-threads=16
max-spill-per-node=100GB
query-max-spill-per-node=100GB
aggregation-operator-unspill-memory-limit=15MB
spill-compression-codec=LZ4
spill-encryption-enabled=false
Here is my catalog configuration in /etc/trino/catalog/hive.properties:
connector.name=hive
hive.metastore=thrift
hive.metastore.uri=thrift://hive-metastore:9083
hive.s3.path-style-access=true
hive.s3.endpoint=http://minio:9000
hive.s3.aws-access-key=XXX
hive.s3.aws-secret-key=XXX
hive.non-managed-table-writes-enabled=true
hive.s3.ssl.enabled=false
hive.s3.max-connections=1000
hive.metastore.thrift.client.read-timeout=3000s
hive.timestamp-precision=MILLISECONDS
hive.collect-column-statistics-on-write=false
hive.storage-format=PARQUET
hive.security=allow-all
fs.cache.enabled=true
fs.cache.directories=/tmp/cache
fs.cache.max-disk-usage-percentages=70
fs.cache.ttl=32d
fs.cache.preferred-hosts-count=5
fs.cache.page-size=15MB
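To sanity-check the size-related values above, here is a minimal Python sketch. It reads the key=value lines, converts Trino data-size strings such as `15MB` to bytes, and estimates how many cache pages a file would span at the configured `fs.cache.page-size`. Three things are assumptions. The parser only handles the simple key=value subset used here, not the full Java properties format. Sizes use binary (1024-based) multipliers. `pages_per_file` is a hypothetical helper built on the assumption that a file occupies ceil(size / page size) cache pages.

```python
import math
import re


def parse_properties(text: str) -> dict[str, str]:
    """Simplified parser for key=value lines (no continuations, escapes, or ':' separators)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            # Whitespace around '=' is ignored, so "spiller-threads= 16" reads as "16"
            props[key.strip()] = value.strip()
    return props


# Assumption: data sizes like "15MB" or "100GB" use binary multipliers
_UNITS = {"B": 1, "kB": 1 << 10, "MB": 1 << 20, "GB": 1 << 30, "TB": 1 << 40, "PB": 1 << 50}


def to_bytes(size: str) -> int:
    """Convert a data-size string such as '15MB' into a byte count."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([kMGTP]?B)\s*", size)
    if not m:
        raise ValueError(f"unrecognized data size: {size!r}")
    return int(float(m.group(1)) * _UNITS[m.group(2)])


def pages_per_file(file_size: int, page_size: int) -> int:
    """Hypothetical estimate of how many cache pages a file of file_size bytes spans."""
    return max(1, math.ceil(file_size / page_size))


if __name__ == "__main__":
    catalog = parse_properties("fs.cache.page-size=15MB\nfs.cache.ttl=32d\n")
    page = to_bytes(catalog["fs.cache.page-size"])
    print(page, pages_per_file(4 * 1024 * 1024, page))
```

Under these assumptions, a 4 MB Parquet file fits in a single 15 MB page. So if the request count is still high, the extra requests are unlikely to come from splitting files into many cache pages.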
Thanks in advance.