I tried running a script like this:
from pyspark.sql import SparkSession

def main():
    # Initialize a Spark session with Hive support
    spark = (
        SparkSession.builder
        .appName("Read Hive Data")
        .config("spark.sql.warehouse.dir", "file:///usr/local/hive/warehouse")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Read the table
    table_df = spark.sql('SELECT * FROM default.XYZ')
    table_df.show()

if __name__ == "__main__":
    main()
and it all works fine, so it seems the configuration is correct.
But when I move the same read into a separate function in a separate script (script.py):
from pyspark.sql import SparkSession

def func1(spark: SparkSession) -> None:
    # Read the table
    table_df = spark.sql('SELECT * FROM default.XYZ')
    table_df.show()
Then, in my main script, I do:
from pyspark.sql import SparkSession
from script import func1

spark = (
    SparkSession.builder
    .appName("Read Hive Data")
    .config("spark.sql.warehouse.dir", "file:///usr/local/hive/warehouse")
    .enableHiveSupport()
    .getOrCreate()
)

func1(spark)
And surprisingly, I get:
pyspark.errors.exceptions.captured.AnalysisException: [TABLE_OR_VIEW_NOT_FOUND] The table or view XYZ cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
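Following the error message's own suggestion, the first thing I can print in the failing script is the schema the session actually resolves (as far as I know, current_schema() is a built-in SQL function in the Spark versions that emit this error):

spark.sql('SELECT current_schema()').show()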
I also verified that I'm running in the same venv and the same project, so I don't understand why Spark finds the table without a problem in one script but can't see it in the other.
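For reference, here is a small helper I could call from both entry points to compare what each session actually sees (describe_session is just a name I made up; the config keys and catalog calls are standard PySpark):

from pyspark.sql import SparkSession

def describe_session(spark: SparkSession) -> None:
    # 'hive' means enableHiveSupport() took effect for this session,
    # 'in-memory' means it did not
    print(spark.conf.get('spark.sql.catalogImplementation'))
    # The warehouse directory this session resolved
    print(spark.conf.get('spark.sql.warehouse.dir'))
    # The tables this session can actually see in the default database
    for table in spark.catalog.listTables('default'):
        print(table.name)

If the two scripts print different things here, that would at least show the two sessions aren't configured the same way, even though the builder code looks identical.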
Thanks