I have a DataFrame, and I want to get the count of each distinct call type (the CallType column).
When I use the Spark DataFrame transformation below, I get the data I expect:
fire_df_convrtd_grpby = (
    fire_df_convrtd
    .select("CallType")
    .where("CallType is not null")
    .groupBy("CallType")
    .count()
    .orderBy("count", ascending=False)
    .show()
)
But when I change the order of the calls, with where at the beginning and select after count, the resulting dataset is different:
fire_df_convrtd_grpby = (
    fire_df_convrtd
    .where("CallType is not null")
    .groupBy("CallType")
    .count()
    .select("CallType")
    .orderBy("count", ascending=False)
    .show()
)
Why does the behaviour change? Since select now comes last, does it override the data from count?