This is PySpark code. I am creating two columns, 'item' and 'properties'. The 'item' column is of String type, and the 'properties' column is a dictionary (map) type whose keys are Strings but whose values are not of a fixed type. How do I implement this?
In the code below I have used IntegerType for the map values, but I need the values to be able to hold any type. What type should I use for that?
from pyspark.sql import SparkSession
from pyspark.sql.types import StructField, StructType, StringType, MapType, IntegerType

spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()

# 'properties' is declared here as a map of String -> Integer, but the data
# below also contains String values ('white', 'Blue', ...), which is the problem.
schema = StructType([
    StructField('item', StringType(), True),
    StructField('properties', MapType(StringType(), IntegerType()), True)
])

dataDictionary = [
    ('Eraser', {'cost': 1, 'color': 'white'}),
    ('Pencil', {'cost': 2, 'color': 'Blue'}),
    ('Pen',    {'cost': 3, 'color': 'black'}),
    ('Color',  {'cost': 4, 'color': 'grey'}),
    ('Paper',  {'cost': 5, 'color': ''})
]

df = spark.createDataFrame(data=dataDictionary, schema=schema)
df.printSchema()
df.show(truncate=False)
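For reference, the closest workaround I have come up with is to convert every value to a string myself and declare the map as MapType(StringType(), StringType()). A minimal sketch of that idea is below (the variable names and the hand-converted sample data are my own):

# Sketch of the workaround: every map value is stored as a string, so the
# map can be declared as String -> String.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructField, StructType, StringType, MapType

spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()

string_schema = StructType([
    StructField('item', StringType(), True),
    StructField('properties', MapType(StringType(), StringType()), True)
])

# Values are converted to strings up front, so the original types are lost.
string_data = [
    ('Eraser', {'cost': '1', 'color': 'white'}),
    ('Pencil', {'cost': '2', 'color': 'Blue'})
]

df2 = spark.createDataFrame(data=string_data, schema=string_schema)
df2.printSchema()
df2.show(truncate=False)

This runs, but it throws away the original types (cost becomes the string '1'), so I would still like to know whether there is a value type for MapType that accepts values of any type.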