I am trying to query 1M entries from an ADX (Azure Data Explorer) cluster via the Java SDK. Since there are so many entries, I am fetching them via paginated queries of 50k rows each. For my table, 50k entries correspond to about 70 MB, so as you can already tell I am getting the 'Query result set has exceeded the internal data size limit 67108864 (E_QUERY_RESULT_SET_TOO_LARGE).' error.
I have tried setting memory-consumption request properties such as maxmemoryconsumptionperiterator and max_memory_consumption_per_query_per_node to 64 GB (the maximum allowed value for my cluster), but I still get the same error. The query also times out, which should not happen: it takes at most 13 seconds to return results via the web editor, and the Java SDK's default timeout for query commands is 4 minutes, which suffices. I even applied the norequesttimeout request property, again with no result. Setting truncationmaxsize and truncationmaxrecords didn't work either, but since truncation is not wanted here, I removed those limits.
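For reference, this is roughly how the request options above are wired up. The sketch below collects them as name/value pairs in plain Java so it is self-contained; in the real Java SDK each pair would be set via ClientRequestProperties.setOption(name, value) and the properties object passed to Client.execute(database, query, properties). The 64 GB value follows the question; the class name is a placeholder.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: the request options tried in the question, as name/value pairs.
// In the actual SDK: ClientRequestProperties props = new ClientRequestProperties();
// props.setOption(name, value); client.execute(database, query, props);
public class AdxQueryOptions {
    public static Map<String, Object> buildOptions() {
        Map<String, Object> options = new LinkedHashMap<>();
        options.put("notruncation", true);       // lift the 64 MB result-set cap
        options.put("norequesttimeout", true);   // lift the default request timeout
        // 64 GB in bytes -- the maximum the question says the cluster allows
        options.put("maxmemoryconsumptionperiterator", 64L * 1024 * 1024 * 1024);
        return options;
    }
}
```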
If I reduce the page size to 10k rows (about 14 MB each), it fetches a few pages and then I get the same error again.
I even tried limits such as notruncation, with the same result.
My Kusto query just filters on a date range (shown here for the first page):
Table
| where date > (the date from)
| where date < (the date to)
| order by date desc
| extend Num = row_number()
| where Num between(1 .. 50000)
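One workaround I am considering (my own sketch, not something the docs prescribe): instead of numbering the whole sorted window with row_number() and slicing it, anchor each page on the last date seen, so every page is an independent query that returns only "take pageSize" rows. The helper below builds such a per-page KQL string; Table and the date column follow the query above, while the method and class names are hypothetical.

```java
// Sketch: cursor-based paging as an alternative to row_number() windows.
// Each page filters strictly past the previous page's last date, so the
// server never materializes the full 1M-row window.
public class PageQueryBuilder {
    public static String pageQuery(String fromExclusive, String to, int pageSize) {
        return "Table\n"
             + "| where date > datetime(" + fromExclusive + ")\n"
             + "| where date < datetime(" + to + ")\n"
             + "| order by date asc\n"
             + "| take " + pageSize;
    }
}
```

Each subsequent page would pass the maximum date of the previous page as fromExclusive. Caveat: the strict > skips rows that share a timestamp with a page's last row, so this assumes date values are unique (otherwise page on a unique column instead).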
Note: I tried a leftouter join on the table (which is not the desired method for my problem), with the following result:
With a page size of 10k, it queried all 1M rows successfully.
With a page size of 50k, it queried 100k rows and then failed with the same error.
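Whatever query shape is used, the driver loop on the Java side looks the same: keep fetching pages until one comes back empty, advancing a cursor past the last row seen. The sketch below is self-contained, with fetcher standing in for the actual SDK call (Client.execute with the page query); it is a hypothetical helper, not SDK API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Sketch of a cursor-paging driver loop. fetcher stands in for the real SDK
// call; given a cursor (the last date already fetched) it returns the next
// page's date values, oldest first, and an empty list when the range is done.
public class CursorPager {
    public static List<String> fetchAll(Function<String, List<String>> fetcher,
                                        String startExclusive) {
        List<String> all = new ArrayList<>();
        String cursor = startExclusive;
        while (true) {
            List<String> page = fetcher.apply(cursor);
            if (page.isEmpty()) {
                break;                           // range exhausted
            }
            all.addAll(page);
            cursor = page.get(page.size() - 1);  // advance past the last row seen
        }
        return all;
    }
}
```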
Any idea how I could overcome this in order to query all rows?
Thank you in advance!