We are using the Azure Java SDK to connect to ADL2 (Azure Data Lake Storage Gen2) storage, and occasionally get an error like “connection timed out after 10000 ms”. I don’t understand where that 10000 ms timeout comes from or how to change it. When we create the Data Lake service client builder, we specify all of these retry and timeout settings:
    DataLakeServiceClientBuilder serviceClientBuilder = new DataLakeServiceClientBuilder()
        .endpoint("https://" + account.account_name + AZURE_STORAGE_HOST_SUFFIX + "/")
        .retryOptions(new RequestRetryOptions(
            RetryPolicyType.EXPONENTIAL,
            MAX_TRIES,            // Maximum number of attempts an operation will be retried, default is 4
            TRY_TIMEOUT_SECONDS,  // Maximum time allowed before a request is cancelled and assumed failed, default is Integer.MAX_VALUE
            RETRY_DELAY_MS,       // Amount of delay to use before retrying an operation, default value is 4ms when retryPolicyType is EXPONENTIAL
            MAX_RETRY_DELAY_MS,   // Maximum delay allowed before retrying an operation, default value is 120ms
            null                  // secondaryHost - Secondary Storage account to retry requests against, default is none
        ));
The constants are defined as:
    private static final Integer MAX_TRIES = 13;
    private static final Integer TRY_TIMEOUT_SECONDS = null; // overall timeout limit imposed by retry schedule
    private static final Long RETRY_DELAY_MS = 60L;
    private static final Long MAX_RETRY_DELAY_MS = 60000L;
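To make the retry schedule implied by these constants concrete, the sketch below adds up the back-off delays across all tries. It assumes that the delay before try n is (2^(n-1) - 1) * RETRY_DELAY_MS, capped at MAX_RETRY_DELAY_MS; that formula is an assumption about how exponential back-off is computed, not something confirmed against the SDK source. `RetrySchedule` and its methods are hypothetical names, not SDK API.

```java
/** Hypothetical helper for estimating a retry schedule; not part of the Azure SDK. */
final class RetrySchedule {

    /**
     * Delay before the given 1-based try, ASSUMING the back-off is
     * (2^(tryNumber - 1) - 1) * retryDelayMs, capped at maxRetryDelayMs.
     * Intended for small try counts; large ones would overflow the shift.
     */
    static long delayBeforeTryMs(int tryNumber, long retryDelayMs, long maxRetryDelayMs) {
        long factor = (1L << (tryNumber - 1)) - 1L; // 0, 1, 3, 7, ...
        return Math.min(factor * retryDelayMs, maxRetryDelayMs);
    }

    /** Sum of the delays before every try from 1 to maxTries. */
    static long totalDelayMs(int maxTries, long retryDelayMs, long maxRetryDelayMs) {
        long total = 0L;
        for (int tryNumber = 1; tryNumber <= maxTries; tryNumber++) {
            total += delayBeforeTryMs(tryNumber, retryDelayMs, maxRetryDelayMs);
        }
        return total;
    }

    public static void main(String[] args) {
        // Values taken from MAX_TRIES, RETRY_DELAY_MS and MAX_RETRY_DELAY_MS above.
        long total = totalDelayMs(13, 60L, 60_000L);
        System.out.println("Total back-off across 13 tries: " + total + " ms");
    }
}
```

Under that assumption the waits between tries add up to 240780 ms, roughly four minutes. This counts only the delays between tries, not the time each try itself takes before it fails.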
I suspect the 10000 ms timeout comes from some other setting, or is perhaps hard-coded somewhere. Can it be changed?
The stack trace is here:
java/nio/file/Paths.get: reactor.core.Exceptions$ReactiveException: io.netty.channel.ConnectTimeoutException: connection timed out after 10000 ms: ttsdmsmoke1011storageop.blob.core.windows.net/20.209.154.134:443
reactor.core.Exceptions.propagate(Exceptions.java:410)
reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:101)
reactor.core.publisher.Flux.blockLast(Flux.java:2815)
com.azure.core.util.paging.ContinuablePagedByIteratorBase.requestPage(ContinuablePagedByIteratorBase.java:102)
com.azure.core.util.paging.ContinuablePagedByItemIterable$ContinuablePagedByItemIterator.<init>(ContinuablePagedByItemIterable.java:75)
com.azure.core.util.paging.ContinuablePagedByItemIterable.iterator(ContinuablePagedByItemIterable.java:55)
com.azure.core.util.paging.ContinuablePagedIterable.iterator(ContinuablePagedIterable.java:141)
com.redpointglobal.rg1.nio.adl2.Adl2FileSystem.listFileStores(Adl2FileSystem.java:151)
com.redpointglobal.rg1.nio.core.cloud.CloudFileSystem.initFileStores(CloudFileSystem.java:179)
com.redpointglobal.rg1.nio.core.cloud.CloudFileSystem.getFileStoreOrNull(CloudFileSystem.java:173)
com.redpointglobal.rg1.nio.core.cloud.CloudFileSystem._getPath(CloudFileSystem.java:117)
com.redpointglobal.rg1.nio.core.cloud.CloudFileSystem.getPath(CloudFileSystem.java:87)
com.redpointglobal.rg1.nio.core.FileSystemBase.getPath(FileSystemBase.java:186)
com.redpointglobal.rg1.nio.core.FileSystemProviderBase.getPath(FileSystemProviderBase.java:111)
net.redpoint.system.FileSystemProviderProxy.getPath(FileSystemProviderProxy.java:72)
java.base/java.nio.file.Path.of(Path.java:208)
java.base/java.nio.file.Paths.get(Paths.java:98)
Caused by:
io.netty.channel.ConnectTimeoutException: connection timed out after 10000 ms: ttsdmsmoke1011storageop.blob.core.windows.net/20.209.154.134:443
io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$2.run(AbstractEpollChannel.java:615)
io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:153)
io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:416)
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
java.base/java.lang.Thread.run(Thread.java:833)
I’ve tried to chase the call stack through the source code, but I get lost pretty quickly trying to figure out where the timeout comes from, because I don’t understand how the SDK relates to Netty.
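Since the exception is an `io.netty.channel.ConnectTimeoutException`, one plausible source of the 10000 ms is the connect timeout of the Netty-based HTTP client that the SDK uses underneath, which is separate from `RequestRetryOptions`. Below is a minimal sketch of overriding it, assuming the `azure-core-http-netty` module is on the classpath and that its default connect timeout is what is being hit. The class name `ConnectTimeoutSketch`, the helper method, and the 30-second value are made up for illustration.

```java
import com.azure.core.http.HttpClient;
import com.azure.core.http.netty.NettyAsyncHttpClientBuilder;
import com.azure.storage.file.datalake.DataLakeServiceClientBuilder;

import java.time.Duration;

/** Hypothetical helper; not taken from the code in the question. */
public final class ConnectTimeoutSketch {

    // Illustrative value only; choose whatever suits the network.
    private static final Duration CONNECT_TIMEOUT = Duration.ofSeconds(30);

    /** Attaches a Netty HTTP client with an explicit connect timeout to the builder. */
    static DataLakeServiceClientBuilder withConnectTimeout(DataLakeServiceClientBuilder builder) {
        HttpClient httpClient = new NettyAsyncHttpClientBuilder()
            .connectTimeout(CONNECT_TIMEOUT) // time allowed to establish the TCP connection
            .build();
        return builder.httpClient(httpClient);
    }
}
```

With something like this in place, the builder from the question could be passed through `withConnectTimeout(serviceClientBuilder)` before the client is built. The retry options would still control how many times a failed connection attempt is retried.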