I have a dataset with the columns id1, id2, date and value, of the form:
Now, for each combination of id1 and id2 (say id1 = 1 and id2 = 2) and for each date value, I want to pick value from the rows that lie within 1 week before and 1 week after the date in the current row, but in the previous year.
For example, for id1 = 1, id2 = 2, date = 2023-06-01, I want to fetch the rows with id1 = 1, id2 = 2 and date between 2022-05-24 and 2022-06-10, take their values from the value column, and explode them into new columns.
The data will finally look like:
If some days in this range are missing, their values will be filled with 0 later on, but I need the data in sorted order, so if a day in the middle is missing, that gap should be reflected in the resulting columns.
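To make the requirement concrete, here is a rough, brute-force pandas sketch of the result I am after. It does one left merge per day offset, so it is only meant to illustrate the output shape on small data. The function name add_prev_year_window, the prev_-7 ... prev_+7 column names, and the 7-day defaults are placeholders made up for this illustration, and it assumes each (id1, id2, date) combination appears at most once.

```python
import pandas as pd


def add_prev_year_window(df, days_before=7, days_after=7):
    """Add one column per day offset around the same date in the previous year.

    Hypothetical helper for illustration only. Assumes the columns
    id1, id2, date, value and at most one row per (id1, id2, date).
    """
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])

    # Same calendar date, one year earlier (Feb 29 falls back to Feb 28).
    anchor = out["date"] - pd.DateOffset(years=1)

    # Lookup table: which value exists for a given (id1, id2, day)?
    lookup = out[["id1", "id2", "date", "value"]].rename(
        columns={"date": "target", "value": "prev_value"}
    )

    # Offsets run in ascending order, so the new columns stay sorted by day
    # and a missing day shows up as NaN in its own column.
    for k in range(-days_before, days_after + 1):
        probe = out[["id1", "id2"]].copy()
        probe["target"] = anchor + pd.Timedelta(days=k)
        hit = probe.merge(
            lookup,
            on=["id1", "id2", "target"],
            how="left",               # keeps the row order of `probe`
            validate="many_to_one",   # raises if the uniqueness assumption fails
        )
        out[f"prev_{k:+d}"] = hit["prev_value"].to_numpy()
    return out


# Made-up sample rows, only to show how the helper is called.
sample = pd.DataFrame(
    {
        "id1": [1, 1, 1, 1],
        "id2": [2, 2, 2, 2],
        "date": ["2022-05-31", "2022-06-01", "2022-06-03", "2023-06-01"],
        "value": [10, 20, 30, 99],
    }
)
result = add_prev_year_window(sample)
```

Missing days come out as NaN, so a later fillna(0) on the prev_ columns would give the zero-filled version. I am not sure this scales, or how to express the same thing in PySpark.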
How can I do this in PySpark, or even in pandas? I thought of using window functions and joins, but couldn't figure out how.
Any help is greatly appreciated.