I have a python method along the lines of this:
def my_big_method():
    # stuff happens before this
    df = (
        spark.read
        .format("jdbc")
        .option("url", f"jdbc:postgresql://{host}:{port}/{database}")
        .option("query", query)
        .option("user", username)
        .option("password", password)
        .option("driver", "org.postgresql.Driver")
        .load()
    )
    # stuff happens after this
    return df
I'm having trouble mocking the Spark session. Basically, in the unit test of this method I want df to be a predetermined DataFrame, like this:

spark.createDataFrame([(1, "Alice"), (2, "Bob")])
How would I go about doing that? Which method do I have to patch? I have tried:
patch("pyspark.sql.SparkSession.builder.getOrCreate")
patch("pyspark.sql.SparkSession")
And in both cases, spark still tries to make a real call to the database in the unit test.
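To make the goal concrete, here is a minimal self-contained sketch of the behavior I'm after. It passes the session in as a parameter for illustration (my real method reads `spark` from the enclosing scope, so presumably I'd have to patch that name where my module looks it up); the self-returning reader mock that stands in for the chained .format(...).option(...).load() calls is the part I'm unsure how to wire up:

```python
from unittest.mock import MagicMock

def my_big_method(spark):
    # Simplified copy of the method under test; the real one also
    # does work before and after building df.
    df = (
        spark.read
        .format("jdbc")
        .option("url", "jdbc:postgresql://host:5432/db")  # placeholder values
        .option("query", "SELECT 1")
        .load()
    )
    return df

def make_fake_spark(fake_df):
    # Every builder call returns the reader itself, so the chained
    # .format(...).option(...).load() never opens a JDBC connection;
    # .load() just hands back the predetermined frame.
    reader = MagicMock()
    reader.format.return_value = reader
    reader.option.return_value = reader
    reader.load.return_value = fake_df
    spark = MagicMock()
    spark.read = reader
    return spark

# A plain sentinel object stands in for the real
# spark.createDataFrame([(1, "Alice"), (2, "Bob")]) here.
fake_df = object()
result = my_big_method(make_fake_spark(fake_df))
assert result is fake_df
```

Is this the right general shape, and if so, what is the correct patch target so the module-level `spark` inside my_big_method resolves to the mock instead of a real SparkSession?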