Since Spark 3.4, SparkSession.sql(…) supports the use of named parameters in SQL queries:
public Dataset<Row> sql(String sqlText, Map<String,Object> args)
Executes a SQL query substituting named parameters by the given arguments
Unfortunately, the Javadoc is silent about how the named parameters are to be written inside the SQL query, and other documentation shows a variety of different forms, e.g.:
- In the PySpark API reference: {param}
- In the Spark SQL reference docs: :param
- In answers to similar SO questions, and in other sources: ${param}
However, none of these work for me, even though spark.sql.variable.substitute is set to true in my session.
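For completeness, the session is created roughly like this (the app name and local master are placeholders for this question, not significant to the problem):

import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
        .appName("named-params-test")                      // placeholder app name
        .master("local[*]")                                // local mode, for reproduction
        .config("spark.sql.variable.substitute", "true")   // explicitly enabled
        .getOrCreate();

With that session, here is what I tried: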
// These throw syntax exceptions because the parameter is not substituted:
spark.sql("SELECT * FROM {table}", Map.of("table", "my_table"));
spark.sql("SELECT * FROM {table}", Map.of("{table}", "my_table"));
spark.sql("SELECT * FROM :table", Map.of("table", "my_table"));
spark.sql("SELECT * FROM :table", Map.of(":table", "my_table"));

// These replace the parameter with an empty string:
spark.sql("SELECT * FROM ${table}", Map.of("table", "my_table"));
spark.sql("SELECT * FROM ${table}", Map.of("${table}", "my_table"));
The ${param} form looks slightly more promising than the rest, because at least something happens to the query string.
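In other words, judging by the resulting error, the last two calls seem to end up executing something equivalent to the following (my reconstruction, not verified against the parser):

// Apparent query after ${table} has been replaced with an empty string:
spark.sql("SELECT * FROM ");   // fails with a parse error, as expected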
So, long story short: What is the definitive correct form for named parameters, and how do I pass the values for them?