I have the following dataset and I simply want to add the monthly increments (from a column within the dataframe) to my date column (another column in my dataframe).
index monthly_increment date
0 1 2018-12-01
1 2 2018-10-01
2 3 2018-12-01
3 4 2018-01-01
4 5 2018-06-01
I’m hoping to get a new column:
index new_date
0 2019-01-01
1 2018-12-01
2 2019-03-01
3 2018-05-01
4 2018-11-01
There are two relevant posts:
- Using this post, but the timedelta does NOT take month increments
- Using this post, but DateOffset does not take Series/dataframe column.
When I try using DateOffset I get a TypeError
.
>> df['new_date'] = df['date']+ pd.DateOffset(df['monthly_increment'])
TypeError: `n` argument must be an integer, got <class 'pandas.core.series.Series'>
Is there any simple solution to this without using loops/lambda function? My hunch is pandas should have a straightforward solution to this. It is super easy in Spark.
3
Code
It looks like need a loop. In Pandas, you can solve the loop with apply
.
df['new_date'] = df.apply(
lambda x: x['date'] + pd.DateOffset(months=x['monthly_increment']),
axis=1
)
df:
monthly_increment date new_date
0 1 2018-12-01 2019-01-01
1 2 2018-10-01 2018-12-01
2 3 2018-12-01 2019-03-01
3 4 2018-01-01 2018-05-01
4 5 2018-06-01 2018-11-01
If you want to use a for
loop, see the code below.
df['new_date'] = [
date + pd.DateOffset(months=n)
for n, date in zip(df['monthly_increment'], df['date'])
]
same result
Example Code
code that generates the body’s df
as a data frame.
import pandas as pd
data = {'monthly_increment': [1, 2, 3, 4, 5], 'date': ['2018-12-01', '2018-10-01', '2018-12-01', '2018-01-01', '2018-06-01']}
df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'])
4
You can do this with some manual date manipulations:
df.date = pd.to_datetime(df.date)
year = (df.date.dt.year + (df.date.dt.month + df.monthly_increment - 1) // 12)
month = (df.date.dt.month + df.monthly_increment - 1) % 12 + 1
pd.to_datetime(dict(year=year, month=month, day=df.date.dt.day))
#0 2019-01-01
#1 2018-12-01
#2 2019-03-01
#3 2018-05-01
#4 2018-11-01
#dtype: datetime64[ns]