I am running a Python script from the Azure Machine Learning (AML) environment. This script queries data from an Azure Data Explorer (ADX) table, using the Kusto Query Language (KQL).
Here is an example KQL query:
QUERY = "my_adx_table | where relative_timestamp >= 1896 and relative_timestamp <= 2396 and my_file_id == 640"
Most of the time, this query works as expected, but for a handful of examples, AML returns the following error:
requests.exceptions.HTTPError: 400 Client Error: BadRequest for url: https://myclustername.myclusterregion.kusto.windows.net/v2/rest/query
According to this SO answer, “a 400 means that the request was malformed. In other words, the data stream sent by the client to the server didn’t follow the rules.” I therefore assumed that I was dealing with a data-related issue.
However, when trying to reproduce this error locally, I noticed that:
- running the aforementioned query directly in the ADX query pane succeeds
- calling the query from a Python script executed on my local computer succeeds as well (see code below).
Why am I getting this 400 Client Error when running the code from the AML environment?
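Since the same query succeeds locally but fails in AML, one thing worth ruling out is a difference in library versions between the two environments. The snippet below is a minimal sketch for that comparison, not a diagnosis: it only uses the standard library, the helper name installed_versions is hypothetical, and the package list is an assumption about what might be relevant here.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    import sys

    # Assumed list of packages of interest; adjust as needed.
    packages = ["azure-kusto-data", "requests", "pandas"]
    print("python", sys.version.split()[0])
    for name, version in installed_versions(packages).items():
        print(name, version)
```

Running this once locally and once in the AML environment gives two listings that can be compared line by line.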
APPENDIX: Example code to run the KQL query from a local computer:
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder
from azure.kusto.data.helpers import dataframe_from_result_table
from azure.kusto.data.exceptions import KustoServiceError
QUERY = "my_adx_table | where relative_timestamp >= 1896 and relative_timestamp <= 2396 and my_file_id == 640"
print("QUERY =", QUERY)
adxconn = {
    "cluster": "https://myclustername.myclusterregion.kusto.windows.net",
    "client_id": "XXX",
    "client_secret": "YYY",
    "authority_id": "ZZZ",
    "kusto_db": "mydbname",
    "kusto_ingest_uri": "https://ingest-myclustername.myclusterregion.kusto.windows.net"
}
kcsb = KustoConnectionStringBuilder.with_aad_application_key_authentication(
    adxconn['cluster'],
    adxconn['client_id'],
    adxconn['client_secret'],
    adxconn['authority_id'],
)
client = KustoClient(kcsb)
RESPONSE = client.execute_query(adxconn['kusto_db'], QUERY)
print('response', RESPONSE)
df = dataframe_from_result_table(RESPONSE.primary_results[0])
This sample code returns a pandas dataframe containing the desired data.
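To get more detail than the bare "400 Client Error" line, the response body attached to the exception can be printed when the query fails in AML. The following is a minimal sketch: http_error_details is a hypothetical helper, and it relies only on the fact that requests.exceptions.HTTPError carries a response attribute with status_code and text. Whether the body contains a useful explanation in this case is an assumption.

```python
def http_error_details(exc):
    """Extract the HTTP status code and response body from an exception, if present."""
    response = getattr(exc, "response", None)
    # Compare against None explicitly: a requests.Response for a 4xx/5xx
    # status evaluates as falsy, so "if not response" would skip it.
    if response is None:
        return {"status_code": None, "body": None}
    return {
        "status_code": getattr(response, "status_code", None),
        "body": getattr(response, "text", None),
    }


# Intended usage around the query call from the appendix (not executed here):
#
# try:
#     RESPONSE = client.execute_query(adxconn['kusto_db'], QUERY)
# except Exception as exc:
#     print(http_error_details(exc))
#     raise
```

The exception is re-raised after printing so that the script still fails in the same way; only the logged information changes.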