I am trying to read the following JSON data, which is stored in a file named data.json:
[
[
{
"haha": 5
},
{
"haha": 6
}
],
[
{
"haha": 7
},
{
"haha": 8
}
]
]
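I tried the following PySpark code:
from pyspark.sql import SparkSession, Row

# Initialize SparkSession (parentheses allow the builder chain to span lines)
spark = (
    SparkSession.builder
    .appName("Schema Inference Example")
    .getOrCreate()
)

# Read the file as a single multi-line JSON document and let Spark infer the schema
df = spark.read.option("multiLine", True).json("data.json")
df.show()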
but this outputs the following:
+----+
|haha|
+----+
|NULL|
+----+
I suspect it has to do with the array of arrays of dictionaries structure. Is there no way to load a JSON file that has this structure in PySpark? My real file will have more fields than this example and I won’t be able to provide a schema. I was hoping PySpark could do its magic, infer the schema, and give me the DataFrame.
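For reference, here is a minimal sketch of one workaround I could imagine, assuming the file is small enough to load into memory on the driver: flatten the outer array with plain Python and write one object per line (JSON Lines), which is the layout Spark's JSON reader expects by default. The function name `flatten_to_json_lines` and the output file name are hypothetical, chosen only for illustration.

```python
import json
from pathlib import Path


def flatten_to_json_lines(src_path, dst_path):
    """Flatten a top-level array of arrays of objects into a JSON Lines file.

    Returns the number of records written.
    """
    # Assumption: the whole file fits in memory.
    nested = json.loads(Path(src_path).read_text(encoding="utf-8"))

    count = 0
    with open(dst_path, "w", encoding="utf-8") as out:
        for inner in nested:        # each inner array
            for record in inner:    # each dictionary inside it
                out.write(json.dumps(record) + "\n")
                count += 1
    return count
```

After calling `flatten_to_json_lines("data.json", "data_flat.jsonl")`, the flattened file could be read with `spark.read.json("data_flat.jsonl")` without the `multiLine` option, and the schema would still be inferred. This loses the grouping into inner arrays, so it only helps if that grouping does not matter.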