I have a dataframe df:
num_rows = 5
num_cols = 3
data = [
    [10, 20, 30],
    [10, 50, 60],
    [70, 80, 90],
    [20, 30, 10],
    [20, 10, 20],
]
columns = [f"Column_{i+1}" for i in range(num_cols)]
df = spark.createDataFrame(data, columns)
+--------+--------+--------+
|Column_1|Column_2|Column_3|
+--------+--------+--------+
|      10|      20|      30|
|      10|      50|      60|
|      70|      80|      90|
|      20|      30|      10|
|      20|      10|      20|
+--------+--------+--------+
I want to add another column of true/false values based on the first column: the first occurrence of each value should be "true", and any later duplicate should be "false". The result would look like:
+--------+--------+--------+--------+
|Column_1|Column_2|Column_3|Column_4|
+--------+--------+--------+--------+
|      10|      20|      30|    TRUE|
|      10|      50|      60|   FALSE|
|      70|      80|      90|    TRUE|
|      20|      30|      10|    TRUE|
|      20|      10|      20|   FALSE|
+--------+--------+--------+--------+