Background
SparkSQL with Scala
Reading a JSON file to populate data into a Delta table.
Problem
Trying to read JSON using Scala. The format is as follows:
<code>{
  "Key" : "Value",
  "Components" : [
    {
      "id" : 1,
      "name" : "Part 1",
      "properties" : {}
    },
    {
      "id" : 2,
      "name" : "Part 2",
      "properties" : {}
    }
  ]
}
</code>
Reading the nested field as: withColumn("Properties", col("Components.properties"))
Exception: org.apache.spark.sql.AnalysisException: [FIELD_NOT_FOUND] No such struct field
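For reference, a minimal sketch of the driver code that hits this error (a reconstruction from the snippets in this post, not the exact job):
<code>import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object JsonParser {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("JsonParser").getOrCreate()

    // multiLine is needed because each JSON document spans several lines
    val jsonData = spark.read
      .option("multiLine", true)
      .json("/home/phantom/Data.json")

    // Components is an array of structs, so Components.properties selects the
    // properties field from every element; this is the line that fails with
    // [FIELD_NOT_FOUND] when the inferred schema has no properties field
    jsonData
      .withColumn("Properties", col("Components.properties"))
      .show(truncate = false)
  }
}
</code>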
Attempts / Observations
- When a dummy value is set even in just the first component's properties field, it works fine (a schema check is sketched after this list):
<code>{
  "id" : 1,
  "name" : "Part 1",
  "properties" : { "a" : "b" }
}
</code>
In this case it outputs the value for component 1 and null for component 2.
- Tried setting the dropFieldIfAllNull option while reading the file, as follows (an explicit-schema variant is sketched after this list):
<code>val jsonData = spark.read
  .option("multiLine", true)
  .option("mode", "PERMISSIVE")
  .option("dropFieldIfAllNull", false)
  .json("/home/phantom/Data.json")
  .createOrReplaceTempView("Contract")
</code>
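One way to narrow down the first observation is to check what schema Spark actually infers for the original file (a sketch, assuming the same SparkSession and path as in the code above):
<code>// If "properties" does not show up inside the Components element struct,
// col("Components.properties") has nothing to resolve against and fails
// with [FIELD_NOT_FOUND].
val inferred = spark.read
  .option("multiLine", true)
  .json("/home/phantom/Data.json")

inferred.printSchema()
</code>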
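For comparison with the reader options above, the same read with an explicit schema would look roughly like this (a sketch only, assuming the same SparkSession; the field types, in particular modelling properties as a map, are assumptions based on the sample JSON):
<code>import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

// Assumed shape of the document; properties is modelled as a map so that
// an empty object {} still has a defined type
val schema = StructType(Seq(
  StructField("Key", StringType),
  StructField("Components", ArrayType(StructType(Seq(
    StructField("id", LongType),
    StructField("name", StringType),
    StructField("properties", MapType(StringType, StringType))
  ))))
))

val jsonData = spark.read
  .option("multiLine", true)
  .schema(schema)
  .json("/home/phantom/Data.json")

jsonData.withColumn("Properties", col("Components.properties")).show(false)
</code>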
Version
Spark : 3.5.2
Scala : 2.12
OS : Ubuntu 24.04
Command : spark-submit --class JsonParser --master local[3] target/scala-2.12/spark-json-parser_2.12-0.1.0-SNAPSHOT.jar
Question
How can I get an empty properties value ({}) read as NULL or an empty string?