The following code throws an error locally in my IDE with Databricks-connect.
from databricks.connect import DatabricksSession
spark = DatabricksSession.builder.getOrCreate()
spark.sql("CREATE DATABASE IF NOT EXISTS sample")
spark.sql("DROP TABLE IF EXISTS sample.mvp")
spark.sql("DROP TABLE IF EXISTS sample.mvp_from_foreach_batch")
data = [("John", "Doe", 30), ("Jane", "Doe", 25), ("Mike", "Johnson", 35)]
df = spark.createDataFrame(data, ["FirstName", "LastName", "Age"])
df.write.format("delta").mode("overwrite").saveAsTable("sample.mvp")
def foreach_batch_function(df, epoch_id):
df.write.format("delta").mode("overwrite").saveAsTable(
"sample.mvp_from_foreach_batch"
)
spark.readStream.table("sample.mvp").writeStream.foreachBatch(
foreach_batch_function
).outputMode("append").trigger(availableNow=True).start().awaitTermination()
assert spark.table("sample.mvp_from_foreach_batch").count() == 3
This code only works in notebooks or directly on a cluster. It will not run locally in an IDE with Databricks Connect.
Instead error
pyspark.errors.exceptions.connect.SparkException: No PYTHON_UID found for session (some uid)
is raised
In general, Databricks Connect works fine for all other cases.
My local environment:
databricks-connect 14.3.1
databricks-sdk 0.26.0
pyspark 3.5.1
Python 3.11.4
Thomas W. is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.