https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.Row.html
I'm trying to iterate over a row's data items. According to this page, "key in row" will search through the row's keys.
However, I don't see this behavior. Am I missing something?
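For reference, this is the kind of membership check I mean, adapted from the example on that doc page (the values here are just placeholders):

from pyspark.sql import Row

row = Row(name="Alice", age=11)
print('name' in row)        # True, per the docs
print('wrong_key' in row)   # False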
data = [{"ID": ' ', "PostCode": ' A ', "Value": 121.44, "Truth": True},
{"ID": None, "PostCode": '', "Value": 300.01, "Truth": False},
{"ID": None, "PostCode": ' C', "Value": 10.99, "Truth": None},
{"ID": '', "PostCode": 'E ', "Value": 33.87, "Truth": True}
]
df = spark.createDataFrame(data)
df.show()
cols = [f"any({col} is not null AND trim({col}) != '') as {col}_any_isnot_null_or_empty" for col in df.columns]
rows = df.selectExpr(cols)
rows.show()
row = rows.collect()[0]
print(type(row))
asdict = row.asDict()
for key, val in asdict.items():
    print(key, val)
print()
print()
for thing in row:
    print(thing, row[thing])
I get these results:
<class 'pyspark.sql.types.Row'>
ID_any_isnot_null_or_empty False
PostCode_any_isnot_null_or_empty True
Truth_any_isnot_null_or_empty True
Value_any_isnot_null_or_empty True
False False
True True
True True
True True