I am new to PySpark.
I have a DataFrame like the one below. For each "Type D" row, I want to calculate the difference in days between its first_date and the last_date of the most recent preceding "Type I" row with the same ID.
How can I achieve this in PySpark?
Input:

ID   Type  first_date  last_date
A2A  I     2023/4/1    2023/4/5
A2A  D     2023/4/5    2023/4/7
A2A  D     2023/5/10   2023/5/13
BB3  I     2023/5/5    2023/5/6
BB3  I     2023/7/29   2023/8/2
BB3  D     2023/9/30   2023/10/3
5EE  I     2023/6/1    2023/6/10
5EE  D     2023/7/10   2023/7/12

Expected result:

ID   Type  first_date  last_date  Diff_between_first_date_of_D_from_last_date_of_I
A2A  I     2023/4/1    2023/4/5
A2A  D     2023/4/5    2023/4/7   0
A2A  D     2023/5/10   2023/5/13  35
BB3  I     2023/5/5    2023/5/6
BB3  I     2023/7/29   2023/8/2
BB3  D     2023/9/30   2023/10/3  59
5EE  I     2023/6/1    2023/6/10
5EE  D     2023/7/10   2023/7/12  30
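
To make the rule precise, here is a rough plain-Python sketch (not PySpark) of the logic I am after. The function name `add_diff` and the tuple layout are just placeholders I made up for illustration, and the sketch assumes the rows are already in chronological order within each ID, as in the table above.

```python
from datetime import date

# Sample rows from the table above, in their original order:
# (ID, Type, first_date, last_date)
rows = [
    ("A2A", "I", date(2023, 4, 1), date(2023, 4, 5)),
    ("A2A", "D", date(2023, 4, 5), date(2023, 4, 7)),
    ("A2A", "D", date(2023, 5, 10), date(2023, 5, 13)),
    ("BB3", "I", date(2023, 5, 5), date(2023, 5, 6)),
    ("BB3", "I", date(2023, 7, 29), date(2023, 8, 2)),
    ("BB3", "D", date(2023, 9, 30), date(2023, 10, 3)),
    ("5EE", "I", date(2023, 6, 1), date(2023, 6, 10)),
    ("5EE", "D", date(2023, 7, 10), date(2023, 7, 12)),
]


def add_diff(rows):
    """Append a day difference to every row (hypothetical helper).

    For a Type D row, the value is the number of days from the last_date
    of the most recent earlier Type I row with the same ID to this row's
    first_date. For Type I rows, or when no earlier Type I row exists,
    the value is None.
    """
    last_i_end = {}  # ID -> last_date of the latest Type I row seen so far
    out = []
    for id_, type_, first, last in rows:
        diff = None
        if type_ == "I":
            # Remember this row so later Type D rows of the same ID can use it.
            last_i_end[id_] = last
        elif type_ == "D" and id_ in last_i_end:
            diff = (first - last_i_end[id_]).days
        out.append((id_, type_, first, last, diff))
    return out


result = add_diff(rows)
for id_, type_, first, last, diff in result:
    print(id_, type_, first, last, "" if diff is None else diff)
```

This gives 0 and 35 for the two A2A "D" rows, 59 for BB3, and 30 for 5EE, which matches the expected column. My guess is that in PySpark this would map to a window partitioned by ID, using something like `last(..., ignorenulls=True)` together with `datediff`, but I have not been able to work that out myself.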
Thanks in advance.