I’m currently using Spark SQL without Databricks and am trying to use the identity column feature, but it is not working for me.
Can this feature be used without Databricks? I am currently using open-source Delta Lake, and as far as I know Databricks is built on Delta Lake as well: https://docs.delta.io/latest/releases.html
I’m trying to run this code:
spark.sql(f'''
    CREATE TABLE {database}.{table_name} (
        regionkey BIGINT IDENTITY (START WITH 1 INCREMENT BY 1),
        regionname STRING
    )
    USING DELTA
    LOCATION '{target_csv_delta_path}'
''')
It throws an error:
An error was encountered:
ParseException
Traceback (most recent call last):
  File "/tmp/spark-669d5c85-0c49-4963-9061-7b776140f49b/shell_wrapper.py", line 113, in exec
    self._exec_then_eval(code)
  File "/tmp/spark-669d5c85-0c49-4963-9061-7b776140f49b/shell_wrapper.py", line 106, in _exec_then_eval
    exec(compile(last, '<string>', 'single'), self.globals)
  File "<string>", line 1, in <module>
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 1440, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery, litArgs), self)
  File "/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
  File "/opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 175, in deco
    raise converted from None
pyspark.errors.exceptions.captured.ParseException:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'IDENTITY'.(line 3, pos 25)
== SQL ==
CREATE TABLE schema123.test (
regionkey BIGINT IDENTITY (START WITH 1 INCREMENT BY 1),
-------------------------^^^
regionname STRING
)
USING DELTA
LOCATION 'abfss:pathToWrite'
Spark version: 3.4.2
Expectations:
To use identity columns without Databricks.
Alternatives:
I know alternatives exist, such as row_number() and monotonically_increasing_id(), but I wanted to use the identity column feature itself.
Thanks & Regards,
Yogesh S
Please help.