Related Content

Tag Archive for python, pyspark, databricks

How to extract individual keys and values from a string-type dataframe column in PySpark

dataframe schema is as follows

Flag IDs that have a null value ONLY across repeat observations (pandas/pyspark)

Python/PySpark noob here. I have a dataset with an ID variable and multiple rows per ID (the number varies). Another variable, 'description', is the character variable I'm interested in. For each ID, I need to check whether description is null in all rows, non-null in all rows, or a mixture of null and non-null values. Ideally, I'd like to split the data into two datasets: one with the IDs whose descriptions are all null, and one with everything else. My first thought/hope was that the two cases were mutually exclusive, i.e. that an ID with one missing description was missing description in all of its rows. I tested that by counting the unique IDs among the null-description rows and the unique IDs among the non-null rows, hoping the two counts would add up to the total number of unique IDs. They don't, so it seems some IDs have both null and non-null descriptions. How do I tease this out? Thanks in advance!
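One possible way to approach this in pandas is sketched below. It is only an illustration under assumptions: the toy data is made up, the column names `ID` and `description` are taken from the question text, and the `null_status` column and its labels are hypothetical names introduced here. The idea is to compute, per ID, whether all or any of the description values are null, then label each row accordingly and split on the "all null" flag.

```python
import pandas as pd

# Hypothetical toy data; column names follow the question.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3, 3],
    "description": [None, None, "a", "b", None, "c"],
})

# True where description is missing on that row.
is_null = df["description"].isna()

# Broadcast per-ID results back onto every row of that ID.
all_null = is_null.groupby(df["ID"]).transform("all")  # every row of the ID is null
any_null = is_null.groupby(df["ID"]).transform("any")  # at least one row is null

# Label each row with the status of its ID (labels are illustrative).
df["null_status"] = "none_null"
df.loc[any_null, "null_status"] = "mixed"
df.loc[all_null, "null_status"] = "all_null"  # overrides "mixed" where applicable

# Split into "all null per ID" and "everything else".
all_null_df = df[all_null]
everything_else_df = df[~all_null]
```

In PySpark, a similar check could probably be built on the fact that `count` over a column ignores nulls: grouping by ID and comparing the count of `description` with zero would identify the IDs whose descriptions are all null.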
