I am using Spark 3.3.1.
Here is the sql_string used to query a ds-partitioned table:
SELECT
'2024-09-09' AS ds,
AVG(v1) AS avg_v1,
AVG(v2) AS avg_v2,
AVG(v3) AS avg_v3
FROM schema.t1
WHERE ds = '2024-09-09'
GROUP BY 1
If I pass the sql_string directly into spark.sql(sql_string), it executes without issue.
If I pass the same string into the Catalyst parser instead, here is the string representation of the resulting logical_plan:
Aggregate [1], [2024-09-09 AS ds#164, 'AVG('v1) AS avg_v1#165, 'AVG('v2) AS avg_v2#166, 'AVG('v3) AS avg_v3#167]
+- 'Filter ('ds = 2024-09-09)
+- 'UnresolvedRelation [schema, t1], [], false
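For reference, this is roughly how the unresolved plan above is produced (a sketch; `sqlString` holds the query above and `spark` is my active SparkSession):

```scala
import org.apache.spark.sql.SparkSession

// Assumes an already-running SparkSession named `spark`
val spark = SparkSession.builder().appName("plan-demo").getOrCreate()

// Parse the SQL text into an unresolved Catalyst logical plan
// (no analysis happens yet; relations and functions stay unresolved)
val logicalPlan = spark.sessionState.sqlParser.parsePlan(sqlString)
println(logicalPlan.treeString)
```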
I want to execute the logical plan:
import org.apache.spark.sql.catalyst.QueryPlanningTracker

val tracker = new QueryPlanningTracker()
// Analyze the logical plan
val analyzedPlan = sparkSession.sessionState.analyzer.executeAndTrack(logical_plan, tracker)
// Optimize the analyzed plan
val optimizedPlan = sparkSession.sessionState.optimizer.executeAndTrack(analyzedPlan, tracker)
This throws the following error:
[GROUP_BY_POS_OUT_OF_RANGE] GROUP BY position 0 is not in select list (valid range is [1, 4])
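For context, when spark.sql(...) runs the same query it does not call the analyzer and optimizer batches by hand; it wraps the parsed plan in a QueryExecution, which drives the full analysis pipeline (including the rule that substitutes GROUP BY ordinals like `1` for the matching select-list expression). A sketch of that path, assuming `spark` is the active SparkSession and `logicalPlan` is the parsed plan above:

```scala
// Sketch: hand the parsed plan to the full QueryExecution pipeline
// instead of invoking analyzer/optimizer directly.
val qe = spark.sessionState.executePlan(logicalPlan)

// Lazily triggers the complete analyzer batch sequence
val analyzed = qe.analyzed
// Lazily triggers the optimizer on the analyzed plan
val optimized = qe.optimizedPlan
println(optimized.treeString)
```

Is there a reason the manual analyzer.executeAndTrack / optimizer.executeAndTrack route resolves the GROUP BY ordinal differently (reporting position 0 instead of 1), while this path would not?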