I have a Parquet data lake stored in Azure Data Lake Gen2. I am using Synapse to perform serverless SQL queries on this data lake – and I then visualize this via Grafana, using Microsoft SQL Server as the data source. I am using OPENROWSET rather than creating External Tables.
My Parquet files are structured as below:
device/message/yyyy/mm/dd/xyz.parquet
In Grafana, I wish to create ‘Variable’ drop downs for enabling users to dynamically select devices, messages and columns within those messages.
I am struggling to create a single query that extracts the column names as values. I need this in order to create a ‘column’ variable dropdown. I have tried with below query:
SELECT TOP 1
*
FROM
OPENROWSET(
BULK '3BA199E2/CAN2_gnssimu/*/*/*/*',
DATA_SOURCE = 'ParquetDataLake',
FORMAT = 'PARQUET'
) AS r
WHERE
CONCAT(r.filepath(1), '-', r.filepath(2), '-', r.filepath(3)) BETWEEN CONVERT(date, $__timeFrom()) AND CONVERT(date, $__timeTo())
AND
$__timeFilter(t)
This works to some extent as it returns the columns of a selected device/message folder, but it also returns the 1st row. In Grafana, the implication is that the Variable dropdown contains the 1st row values, rather than the column values.
I am looking for a modified query that will instead return the column names as the end result.