
Tag Archive for apache-spark, pyspark

Spark conf parameter as spark-submit vs Spark-Session

I have a Spark job where I passed all the conf parameters through spark-submit, whereas in another job (same code and same data) I created a Spark session and added the same config in code. However, the job execution times are quite different. Does it really matter for performance where the config is set?
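One way to investigate is to compare the configuration each run actually ended up with, since a setting applied in code does not always take effect (the Spark documentation notes, for example, that in client mode `spark.driver.memory` cannot be set through `SparkConf` in the application because the driver JVM has already started). The following is a minimal sketch, not a definitive diagnosis: the helper names `effective_conf` and `diff_conf` are hypothetical, and the sample values in the usage lines are made up for illustration.

```python
def effective_conf(spark):
    """Snapshot the settings the running application actually sees.

    `spark` is an existing SparkSession; getConf().getAll() returns
    a list of (key, value) pairs.
    """
    return dict(spark.sparkContext.getConf().getAll())


def diff_conf(conf_a, conf_b):
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ.

    A key missing from one side is reported with None on that side.
    """
    keys = set(conf_a) | set(conf_b)
    return {
        k: (conf_a.get(k), conf_b.get(k))
        for k in sorted(keys)
        if conf_a.get(k) != conf_b.get(k)
    }


# Hypothetical snapshots, e.g. saved from effective_conf(spark) in each job.
submit_job = {"spark.executor.memory": "8g", "spark.driver.memory": "4g"}
session_job = {"spark.executor.memory": "8g", "spark.driver.memory": "1g"}
print(diff_conf(submit_job, session_job))
# {'spark.driver.memory': ('4g', '1g')}
```

Run-specific keys such as `spark.app.id` will normally differ between any two runs, so the interesting entries are the resource and tuning settings that were meant to be identical.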

PySpark | How to convert a column value from a String into a new DataFrame?

The existing dataframe is
| header | body |
| ------ | ---- |
| xxx | '{"name":"john","age":20,"emails":["[email protected]","[email protected]"]}' |
| xxx | '{"name":"jerry","age":30,"emails":["[email protected]","[email protected]"]}' |
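A possible approach, assuming the goal is one output column per JSON field and that `body` holds plain JSON text (without the wrapping single quotes shown in the table), is to parse the column with `from_json` and expand the resulting struct. This is only a sketch: the schema string `BODY_SCHEMA` and the function name `body_to_dataframe` are assumptions inferred from the two sample rows.

```python
# Assumed schema, inferred from the sample rows (DDL-formatted string).
BODY_SCHEMA = "name STRING, age INT, emails ARRAY<STRING>"


def body_to_dataframe(df):
    """Parse the JSON string in `body` and return a new DataFrame
    with one column per JSON field (name, age, emails)."""
    # Imported here so this module can be loaded without a Spark install.
    from pyspark.sql import functions as F

    parsed = F.from_json(F.col("body"), BODY_SCHEMA)
    # Alias the struct, then expand its fields into top-level columns.
    return df.select(parsed.alias("b")).select("b.*")


# Usage (needs a running SparkSession named `spark`):
# df = spark.createDataFrame(
#     [("xxx", '{"name":"john","age":20,"emails":[]}')], ["header", "body"]
# )
# body_to_dataframe(df).show(truncate=False)
```

If the single quotes are part of the stored value, they would need to be stripped before parsing, for example with `F.regexp_replace`; otherwise `from_json` is likely to return nulls for those rows.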