I am working on developing a machine learning model to predict the score for a given team. I want to create a column tracking each team’s cumulative score for their home games up to but not including the current game (row). I can easily calculate the cumulative total, but I want to offset the cumulative total to show the cumulative up to but not including the current game below is an example of the dataset. I would ideally like to create the cumulative column
game_id | game_date | home_id | home_score | cumulative |
---|---|---|---|---|
718730 | 2023-04-03 | 145 | 3 | 0 |
718695 | 2023-04-05 | 145 | 7 | 3 |
718687 | 2023-04-06 | 145 | 6 | 10 |
718683 | 2023-04-06 | 109 | 2 | 0 |
718671 | 2023-04-07 | 109 | 6 | 2 |
718656 | 2023-04-08 | 109 | 12 | 8 |
The code below is what I have done so far to create a cumulative total
import pandas as pd
data = pd.read_csv('game_data.csv')
data['home_cumulative'] = data.groupby('home_id')['home_score'].cumsum()
I have attempted the below with results not matching what I would excpect
data['home_offset'] = data.groupby('home_id')['home_score'].shift(periods = 1).cumsum().fillna(0)