I want to create DataFrames using a function.
I have a list of countries as Row objects,
like below:
country_list=df_correct_countries.select('NewCountry').dropDuplicates().collect()
for i in country_list:
    print(i)
###OUTPUT
Row(NewCountry='Senegal')
Row(NewCountry='Algeria')
Row(NewCountry='Nigeria')
Row(NewCountry='Morocco')
Row(NewCountry='Ethiopia')
I pass each country to a create_df function in a for loop. The function takes 2 arguments, (original_df, country):
Below is my create_df function.
def create_df(df,cnt):
    #["NewCountry"]
    cnt=str(cnt)
    # print(cn)
    # print(tf)
    cnt=df.where(col("NewCountry")==str(cnt))
    return cnt
This is how I call my function:
for j in country_list:
    create_df(df_correct_countries,j['NewCountry'])  # NewCountry is the name of my corrected country column, whose values I collected into the list
Each time the function is called, the value of cnt is a country name.
I want to create a new DataFrame that contains only the rows belonging to the current value of cnt.
But it's not creating the DataFrame.
The function runs without error, but when I try to display a country, it throws an error.
display(EquatorialGuinea)
###Error
NameError: name 'EquatorialGuinea' is not defined
But when I create the DataFrame outside the function, with the same country, it works,
like this:
df_correct_countries.where(col("NewCountry")=='EquatorialGuinea')
The above works.
Can anyone tell me what’s the issue?
The above is everything I have tried.
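
One possible explanation, sketched here without Spark: create_df returns the filtered result, but the calling loop discards it, and Python never creates a variable named after a string value, so a name like EquatorialGuinea is never defined. The minimal sketch below uses a plain list of dicts as a stand-in for the DataFrame (the sample rows and the frames_by_country name are hypothetical, not from the original code) and keeps each returned result in a dict keyed by country:

```python
# Plain-Python stand-in: a list of dicts plays the role of the Spark DataFrame.
# The sample rows are made up for illustration.
rows = [
    {"NewCountry": "Senegal", "value": 1},
    {"NewCountry": "Algeria", "value": 2},
    {"NewCountry": "Senegal", "value": 3},
]


def create_df(df, cnt):
    # Return only the rows for the given country. Nothing here creates a
    # variable named after the country; the caller must keep the result.
    return [row for row in df if row["NewCountry"] == cnt]


# Hypothetical container: one filtered result per country, keyed by name.
frames_by_country = {}
for country in sorted({row["NewCountry"] for row in rows}):
    frames_by_country[country] = create_df(rows, country)

# Look a country up by its name instead of relying on a variable of that name.
print(frames_by_country["Senegal"])
```

With PySpark, the same pattern would presumably be to assign inside the loop, for example frames_by_country[j['NewCountry']] = create_df(df_correct_countries, j['NewCountry']), and then call display(frames_by_country['EquatorialGuinea']).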