Tag Archive for apache-spark, pyspark, apache-spark-sql, hive

How to convert a Hive process column (INPUT__FILE__NAME) to PySpark

I have a Hive process that I’m migrating to PySpark.
I’m encountering a problem that I can’t seem to solve.
I have an INSERT INTO from a staging table into a final table, where the partition value is derived from the file name.
The select statement looks something like this: