I am loading multiple Parquet files from a directory, but the data type of one of the columns is not being inferred properly. I tried a couple of settings suggested on the internet and Stack Overflow, listed below.
Reading the Parquet files with inferSchema and timestampFormat:
val Input_Df = spark.read
  .option("inferSchema", true)
  .option("timestampFormat", "yyyy-MM-dd HH:mm:ss.SSSS")
  .parquet(Input_Path)
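As far as I understand, Parquet files carry their own schema in the file footer, so inferSchema and timestampFormat may simply be ignored here, but I am listing everything I tried. To double-check what Spark actually reads from the footer, the schema of a single file can be printed (the file name below is a placeholder, not my real path):

// Read one file and print the schema Spark takes from the Parquet footer.
// "part-00000.parquet" is a placeholder file name.
val singleDf = spark.read.parquet(s"$Input_Path/part-00000.parquet")
singleDf.printSchema() // if this already shows string, the files themselves store the column as string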
Adding the below settings to the Spark configuration:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf()
conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
conf.set("spark.sql.parquet.datetimeRebaseModeInRead", "LEGACY")
conf.set("spark.sql.parquet.datetimeRebaseModeInWrite", "LEGACY")
conf.set("spark.sql.parquet.int96RebaseModeInRead", "LEGACY")
conf.set("spark.sql.parquet.int96RebaseModeInWrite", "LEGACY")
conf.set("spark.sql.streaming.schemaInference", "true")
conf.set("spark.sql.parquet.int96AsTimestamp", "true")
conf.set("spark.sql.sources.partitionColumnTypeInference.enabled", "true")

val spark: SparkSession = SparkSession.builder().appName("Load_Oracle_Date").config(conf).getOrCreate()
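To make sure these settings actually reach the session, they can be read back from the running SparkSession (a minimal sanity check, nothing here is specific to my job):

// Sanity check: read the settings back from the active session.
println(spark.conf.get("spark.sql.parquet.int96AsTimestamp"))         // "true"
println(spark.conf.get("spark.sql.parquet.datetimeRebaseModeInRead")) // "LEGACY"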
But when I call printSchema on the DataFrame, it still shows the column type as string. I want to update the column type dynamically, which is why I am not defining the schema with a StructType or using Spark SQL to cast each column on the DataFrame. I am using Spark version 3.2.1.
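If nothing else works, the sketch below is the kind of manual casting I would fall back to, though I was hoping Spark could do this on read. The "_date" suffix convention is made up for illustration; the format string is the one from above:

import org.apache.spark.sql.functions.{col, to_timestamp}

// Sketch only: cast every string column whose name ends in "_date" (an
// assumed naming convention) to timestamp using the format shown earlier.
val casted = Input_Df.columns.foldLeft(Input_Df) { (df, name) =>
  if (name.endsWith("_date"))
    df.withColumn(name, to_timestamp(col(name), "yyyy-MM-dd HH:mm:ss.SSSS"))
  else df
}
casted.printSchema() // the "_date" columns should now show as timestamp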