If I have a df like this one:
import pandas as pd

df = pd.DataFrame(
    {
        'A': range(6),
        'D': pd.date_range('20240101', periods=6, freq='D')
    }
)
df['D'] = df['D'].astype('datetime64[ns]')  # already datetime64[ns]; assign so the cast is not discarded
df.loc[2, 'D'] = pd.NaT
How can I fill the missing value in column 'D' using .interpolate(method='time')?
You were almost there:
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'A': range(6),
        'D': pd.date_range('20240101', periods=6, freq='D')
    }
)
df['D'] = df['D'].astype('int64')  # timestamps as int64 nanoseconds since the epoch
df.loc[2, 'D'] = np.nan  # the missing value; pandas upcasts the column to float64
df['D'] = df['D'].interpolate()
df['D'] = df['D'].astype('datetime64[ns]')  # back to timestamps
print(df)
Output:
A D
0 0 2024-01-01
1 1 2024-01-02
2 2 2024-01-03
3 3 2024-01-04
4 4 2024-01-05
5 5 2024-01-06
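If you need this round trip more than once, it can be wrapped in a small helper. This is a sketch of my own, not part of the answer above: interpolate_datetime is a hypothetical name, the column is assumed to be tz-naive, and instead of writing np.nan into an integer column it reads the nanoseconds with .view('int64') and masks the gaps with .where, so NaT never has to be cast to an integer by pandas.

```python
import pandas as pd


def interpolate_datetime(s: pd.Series) -> pd.Series:
    """Fill NaT gaps in a tz-naive datetime Series by position-based linear interpolation.

    Hypothetical helper for illustration, not a pandas API.
    """
    # View the timestamps as int64 nanoseconds since the epoch (NaT shows up as a sentinel value).
    ns = pd.Series(s.to_numpy(dtype='datetime64[ns]').view('int64'), index=s.index)
    # Put the gaps back as NaN; the Series becomes float64.
    ns = ns.where(s.notna())
    # Linear interpolation by row position, rounded to whole nanoseconds.
    filled = ns.interpolate().round()
    # Back to timestamps; any NaN left over (e.g. a leading gap) becomes NaT.
    return pd.to_datetime(filled, unit='ns')


df = pd.DataFrame(
    {
        'A': range(6),
        'D': pd.date_range('20240101', periods=6, freq='D'),
    }
)
df.loc[2, 'D'] = pd.NaT
df['D'] = interpolate_datetime(df['D'])
print(df)
```

Rounding before converting back keeps the result on whole nanoseconds; the float64 step can still lose sub-microsecond precision for arbitrary timestamps, which does not matter for day-aligned data like this.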
Just use interpolate; it will work out of the box. method='time' would make sense if your dates were the index and you wanted to interpolate other values based on those dates.
df['D'] = df['D'].interpolate()
Output:
A D
0 0 2024-01-01
1 1 2024-01-02
2 2 2024-01-03
3 3 2024-01-04
4 4 2024-01-05
5 5 2024-01-06
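For contrast, a small sketch of the situation where method='time' does apply: the values being filled are ordinary numbers and the dates sit in a DatetimeIndex, so the gap is weighted by elapsed time rather than by row position. The series below is made-up data for the illustration.

```python
import numpy as np
import pandas as pd

# Float values on an irregular DatetimeIndex: one day between the first two rows, three between the last two.
s = pd.Series(
    [0.0, np.nan, 40.0],
    index=pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-05']),
)

by_position = s.interpolate()           # default method='linear' ignores the index
by_time = s.interpolate(method='time')  # weights by the time elapsed between index entries

print(by_position.iloc[1])  # 20.0: halfway between the neighbours by position
print(by_time.iloc[1])      # 10.0: Jan 2 is 1 of the 4 days from Jan 1 to Jan 5
```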
If you want to interpolate based on the values in A, this is indeed not yet supported for datetime dtypes, and you temporarily need to use integers:
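s = df.set_index('A')['D']
df['D_interp'] = pd.to_datetime(s.astype('int64')        # timestamps as int64 nanoseconds
                                 .where(s.notna())        # put the gaps back as NaN
                                 .interpolate('values')   # interpolate using the numeric index (A)
                                ).values                  # assign positionally, ignoring the A index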
Output (with A[2] changed from 2 to 2.5 for the demo):
A D D_interp
0 0.0 2024-01-01 2024-01-01 00:00:00
1 1.0 2024-01-02 2024-01-02 00:00:00
2 2.5 NaT 2024-01-03 12:00:00
3 3.0 2024-01-04 2024-01-04 00:00:00
4 4.0 2024-01-05 2024-01-05 00:00:00
5 5.0 2024-01-06 2024-01-06 00:00:00
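An alternative sketch, not the method used above: numpy.interp can do the same A-based interpolation directly on the nanosecond values. It assumes the known A values are sorted in increasing order (np.interp requires this) and that D is tz-naive. Unlike pandas, np.interp repeats the first or last known date for A values outside the known range.

```python
import numpy as np
import pandas as pd

# Same frame as the demo: A[2] changed to 2.5 and D[2] missing.
df = pd.DataFrame(
    {
        'A': [0.0, 1.0, 2.5, 3.0, 4.0, 5.0],
        'D': pd.date_range('20240101', periods=6, freq='D'),
    }
)
df.loc[2, 'D'] = pd.NaT

known = df['D'].notna().to_numpy()
x = df['A'].to_numpy()
ns = df['D'].to_numpy(dtype='datetime64[ns]').view('int64')  # nanoseconds since the epoch

# Piecewise-linear function through the known (A, D) points, evaluated at every A.
interp_ns = np.interp(x, x[known], ns[known])
df['D_interp'] = pd.to_datetime(np.round(interp_ns), unit='ns')
print(df)
```

At A = 2.5 this lands three quarters of the way from 2024-01-02 (A = 1) to 2024-01-04 (A = 3), i.e. 2024-01-03 12:00, matching the output above.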