We’re currently running a 3-node ADX cluster that is under extreme load: very heavy calculations push all nodes to 100% CPU for long stretches (30–60 minutes at a time).
The calculations can, and will, be optimized, but that is not the issue here.
The main issue is: what is the best way (or ways) to optimize our cluster to reduce CPU load? This is critical because ADX is our analytics backend, and when CPU sits at ~100% the dashboards fail, simple queries time out, and so on.
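For context, here’s roughly how we identify the heaviest queries during these windows (a sketch using the standard `.show queries` command; the one-hour window and the grouping are just illustrative):

```kusto
// Top CPU consumers over the last hour, grouped by user and query text
.show queries
| where StartedOn > ago(1h)
| summarize TotalCpuTime = sum(TotalCpu), Runs = count() by User, Text
| top 10 by TotalCpuTime
```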
Another important factor: we’re using ~55–60% of our cache, and to optimize towards faster analytics we’re considering increasing the hot cache window from 30 days to 60 days. We have no ingestion issues whatsoever. We’re currently on 3 Standard_L8as_v3 nodes.
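For reference, the cache change we’re considering is just the database-level caching policy (a sketch; `MyDatabase` is a placeholder for our database name):

```kusto
// Extend the hot cache window from 30 to 60 days for the whole database
.alter database MyDatabase policy caching hot = 60d

// Verify the effective policy afterwards
.show database MyDatabase policy caching
```

One thing we realize: if data volume is roughly uniform over time, doubling the window would roughly double the hot data to ~110–120% of the current cache, so this probably interacts with the SKU question.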
We can approach this from a few angles:
- Scaling up (SKU) – a stronger or more suitable SKU
- Scaling out (nodes) – more nodes, or autoscaling
- Other solutions? (e.g., precomputing the heavy aggregations; see the sketch right after this list)
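On the “other solutions” angle, one thing we’re considering is precomputing the heavy aggregations with a materialized view, so dashboards read precomputed results instead of recomputing them on every refresh. A minimal sketch (the table and column names are hypothetical, not our real schema):

```kusto
// Hypothetical: maintain a daily aggregation incrementally at ingestion time,
// so dashboard queries scan the small materialized result instead of the raw table
.create materialized-view DailyEventStats on table Events
{
    Events
    | summarize EventCount = count(), AvgDuration = avg(DurationMs) by bin(Timestamp, 1d)
}
```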
From the research I’ve done, scaling up seems more relevant, but it would be great to hear what you think and about your experience scaling your own ADX cluster resources 🙂
Another critical point is whether there is any risk to data ingestion (I assume not), downtime, or any other data-related risk during the process, and how each approach might differ in that respect.
Many thanks!
Update: we tried increasing the number of nodes to 6 for roughly 1.5–2 hours, but this had no visible effect.
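For what it’s worth, we did confirm the extra nodes actually joined the cluster during that test. One thing we’re not sure about is whether the new nodes’ hot cache had time to warm up in such a short window, which could mask any benefit.

```kusto
// Returns one row per node; we used it to confirm all 6 nodes were active
.show cluster
```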
We’re constantly optimizing our functions/syntax to be more efficient, but user volume is growing faster than our optimization rate, which is why we’re looking at adding resources.
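One more thing we’ve been experimenting with for the dashboards is the query results cache, so identical queries repeated within a short window are served from cache instead of recomputed (a sketch; `MyDashboardFunction` is a placeholder for one of our functions):

```kusto
// Serve repeated identical dashboard queries from the results cache for up
// to 5 minutes instead of recomputing them on every dashboard refresh
set query_results_cache_max_age = time(5m);
MyDashboardFunction(ago(1d))
```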