I have a PySpark DataFrame in which I need to apply "translate" to a column.
I have the code below:
from pyspark.sql import functions as F

df1 = df.withColumn("Description", F.split(F.trim(F.regexp_replace(
    F.regexp_replace(F.lower(F.col("Short_Description")), r"[/\[\]{}!-]", " "),
    " +", " ")), " "))
df2 = df1.withColumn("Description", F.translate('Description', 'ãäöüẞáäčďéěíĺľňóôŕšťúůýžÄÖÜẞÁÄČĎÉĚÍĹĽŇÓÔŔŠŤÚŮÝŽ',
'aaousaacdeeillnoorstuuyzAOUSAACDEEILLNOORSTUUYZ'))
df3 = df2.withColumn('Description', F.explode(F.col('Description')))
I'm getting a datatype mismatch error: argument 1 requires string type, however 'Description' is of array&lt;string&gt; type.
I need to handle the accented letters in the Description column. How can I fix this?