I have the DataFrame below, with a single string column holding these values:
abc,1,2,345,765,876,Kumar r,Raghvan ,04041996
abc,1,2,345,765,876,"sam Bailey,20541789 #here double quote already present after 6th comma
abc,1011,2,32,678,,,,,
I am looking for a regular expression in PySpark that adds double quotes around the text after the 6th comma and before the trailing digits.
The expected output for the values above is:
abc,1,2,345,765,876,"Kumar r,Raghvan" ,04041996
abc,1,2,345,765,876,"sam Bailey",20541789
abc,1011,2,32,678,,,,,
I tried the code below, but it did not produce the expected output:
# Use regex to add quotes if they don't already exist around the 6th column
df_with_quotes = df.withColumn("data_with_quotes",regexp_replace(col("data"), r"((?:[^,],){6})([^"].[^"$])(,[^,]+$)", r'1"2"3'))
Any code snippet would be appreciated. Thanks in advance.
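For reference, here is a minimal sketch of the transformation I am after, written in plain Python with the `re` module so the pattern can be checked without a Spark session. The function name `quote_seventh_field` and the pattern are illustrative assumptions, not working PySpark code. It assumes the part to quote starts right after the 6th comma, contains no embedded double quotes, and is followed by a final comma and a run of digits.

```python
import re

# Group 1: the first six comma-terminated fields.
# Group 2: the text to quote (lazy, so it stops before the trailing ",digits").
# Group 3: optional whitespace, the last comma, and the trailing digits.
# An already-present quote on either side of group 2 is consumed and re-added.
PATTERN = re.compile(r'^((?:[^,]*,){6})"?([^"]+?)"?(\s*,\d+)$')


def quote_seventh_field(line: str) -> str:
    """Wrap the text after the 6th comma in double quotes.

    Lines that do not end in ",<digits>" are returned unchanged.
    """
    return PATTERN.sub(r'\1"\2"\3', line)


if __name__ == "__main__":
    samples = [
        'abc,1,2,345,765,876,Kumar r,Raghvan ,04041996',
        'abc,1,2,345,765,876,"sam Bailey,20541789',
        'abc,1011,2,32,678,,,,,',
    ]
    for s in samples:
        print(quote_seventh_field(s))
```

Spark's `regexp_replace` uses Java regex syntax, where group references in the replacement are written `$1`, `$2`, `$3` rather than `\1`, `\2`, `\3`. So the same pattern would presumably be passed as `regexp_replace(col("data"), pattern, '$1"$2"$3')`, though I have not verified that against my DataFrame.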