In Spark 3.5.0+, what are the performance differences between a Python UDTF and a custom Python Data Source?
Spark 3.5.0 (well, actually Databricks 15.4)
Max along with Window Function – PySpark
I have a sample dataset, shown below:
[Image: sample dataset]