I’m using the polars library in Python to manipulate a dataframe, and I’m trying to do the following.
Given a dataframe like this:
Person | Fight with | On |
---|---|---|
A | B | 3 Jan |
A | C | 4 Jan |
A | D | 5 Jan |
A | E | 5 Jan |
A | B | 10 Jan |
A | B | 20 Jan |
A | C | 20 Jan |
I want to return the “distance” in days between each fight and the first fight of the same fighter pair, such that:
Person | Fight with | On | Distance |
---|---|---|---|
A | B | 3 Jan | 0 Days |
A | C | 4 Jan | 0 Days |
A | D | 5 Jan | 0 Days |
A | E | 5 Jan | 0 Days |
A | B | 10 Jan | 7 Days (i.e. 10 Jan – 3 Jan); (CurrentDate – ABFirstFight) |
A | B | 20 Jan | 17 Days (i.e. 20 Jan – 3 Jan); (CurrentDate – ABFirstFight) |
A | C | 20 Jan | 16 Days (i.e. 20 Jan – 4 Jan); (CurrentDate – ACFirstFight) |
What I’ve tried:
- polars “first” function on its own: it only returned the head of the dataframe
- polars “first” function combined with “over”, “group_by”, or “rolling”: it returned some numbers, but I can’t make sense of why the output came out that way
Does anyone have any advice on how to attempt this?
I think I might need to use some combination of “group_by” or “over”, “first”, and perhaps “sub” (to subtract two dates?), but I’m not sure how to proceed.
The hardest part for me is extracting the first entry of a given group (e.g. the first date of the A-B pair, or of the A-C pair).