I have a simple PySpark setup with a local master and no Hive installed.
I create a SparkSession like this:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.sql.legacy.createHiveTableByDefault", False)
Next I create a table:
spark.createDataFrame([('Alice', 1)], ['name', 'age']).writeTo("test").create()
This results in a folder test inside spark-warehouse, with a Parquet file in it.
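Within the session that created it, the table is (as far as I can tell) registered normally; a quick check, sketched rather than copied verbatim, looks like this:
spark.catalog.listTables()               # lists the 'test' table in the default database
spark.sql("select * from test").show()   # prints the Alice row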
When I later start a new SparkSession in the same way, it does not pick up that folder and reports that no tables exist:
spark.catalog.listTables()
returns [], and
spark.sql("select * from test")
fails with TABLE_OR_VIEW_NOT_FOUND.
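Put together, the failing second run is just this (a minimal repro, using the same defaults as above):
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
print(spark.catalog.listTables())        # prints []
spark.sql("select * from test").show()   # fails with TABLE_OR_VIEW_NOT_FOUND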
How can I make the tables load into the catalog in a new Spark session?