I am using Pyspark JDBC to connect to an external database.
Simple queries like SELECT
and INSERT
were executed successfully. I wanted to use TRANSACTION
since I will be dealing with a number of tables with large data.
I tried to run an example from this documentation in SSMS, which obviously works as expected, but the same query returns an error when it was executed in Pyspark.
The error was
Py4JJavaError: An error occurred while calling o1198.load.: com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near the keyword ‘BEGIN’`
Here is my code.
def sql_query(db: str, query: str):
rtn = (
spark.read.format("jdbc")
.option("url", f"jdbc:sqlserver://{SERVER}:1433;database={db};")
.option("USER", USER)
.option("PASSWORD", PASSWORD)
.option("query", query)
.load()
)
return rtn
The query I used is this:
CREATE TABLE ValueTable (id INT);
BEGIN TRANSACTION;
INSERT INTO ValueTable VALUES(1);
INSERT INTO ValueTable VALUES(2);
ROLLBACK;
I tried to search for any documentation about executing T-SQL with Pyspark JDBC but so far I found none.
5