I have a PySpark DataFrame with the following columns:
Dataframe: httpClient
[capacity: string, version: string]
I also have a list of columns declared as:
httpClient_fields = ["capacity", "`httpClient.install`", "date"]
I need to check whether the DataFrame contains every item in the list. If an item does not exist in the DataFrame, I need to add it as a column with empty values.
So the result should be:
Dataframe: httpClient
[capacity: string, version: string, `httpClient.install`: string, date: string]
This is my code now:
from pyspark.sql import functions as F

df_cols = httpClient.columns
for f in httpClient_fields:
    # add any missing column as an empty string
    if f not in df_cols:
        httpClient = httpClient.withColumn(f, F.lit(''))
httpClient = httpClient.select(*httpClient_fields).dropDuplicates().repartition(1)
httpClient = httpClient.withColumnRenamed("httpClient.install", "httpClient_install")
When I execute this, I'm getting:
cannot resolve '`httpClient.install`'
Please let me know how to solve this.
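One possible way to approach this is sketched below. It rests on an assumption: as far as I understand, `withColumn` and `withColumnRenamed` treat the column name as a literal string, while `select` parses it as an expression, so backticks are only needed in `select`. The helper names (`strip_backticks`, `quote`, `missing_columns`, `add_missing_columns`) are made up for illustration and are not part of PySpark.

```python
def strip_backticks(name):
    """Return the raw column name without surrounding backticks."""
    if len(name) >= 2 and name.startswith("`") and name.endswith("`"):
        return name[1:-1]
    return name


def quote(name):
    """Backtick-quote a raw name so select() treats dots literally.

    Names that themselves contain backticks are not handled here.
    """
    return "`" + name + "`"


def missing_columns(existing, wanted):
    """Raw names from `wanted` that are absent from `existing`, in order."""
    existing_set = set(existing)
    raw_names = [strip_backticks(w) for w in wanted]
    return [n for n in raw_names if n not in existing_set]


def add_missing_columns(df, wanted):
    """Add each missing column to a Spark DataFrame as an empty string."""
    # Imported lazily so the pure-Python helpers above work without Spark.
    from pyspark.sql import functions as F

    for name in missing_columns(df.columns, wanted):
        # Assumption: withColumn takes the raw name, so no backticks here.
        df = df.withColumn(name, F.lit(""))
    return df


# Hypothetical usage with the names from the question:
# httpClient = add_missing_columns(httpClient, httpClient_fields)
# httpClient = httpClient.select(
#     *[quote(strip_backticks(f)) for f in httpClient_fields]
# )
# httpClient = httpClient.withColumnRenamed(
#     "httpClient.install", "httpClient_install"
# )
```

Comparing raw names (without backticks) against `httpClient.columns` matters because `columns` returns the names unquoted, so a check against the backticked string never matches. Note that selecting only `httpClient_fields` drops `version`; if it should be kept, as in the expected result, it would have to be added to the selection.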