I currently have this quick example to work with:
import pandas as pd
left = pd.DataFrame({"left_val": [1, 2, 3, 6, 7]}, index=pd.to_datetime([1, 2, 3, 6, 7], unit='s'))
right = pd.DataFrame({"right_val": ["a", "b", "c"]}, index=pd.to_datetime([1, 5, 10], unit='s'))
# Filter to contain samples that are within the time interval of left
right_filtered = right[(right.index >= left.index.min()) & (right.index <= left.index.max())]
output = pd.merge_asof(left, right_filtered, left_index=True, right_index=True, direction="nearest")
The output I have is:
left_val right_val
1970-01-01 00:00:01 1 a
1970-01-01 00:00:02 2 a
1970-01-01 00:00:03 3 a
1970-01-01 00:00:06 6 b
1970-01-01 00:00:07 7 b
But I would like to have the following:
left_val right_val
1970-01-01 00:00:01 1 a
1970-01-01 00:00:02 2 Nan
1970-01-01 00:00:03 3 Nan
1970-01-01 00:00:06 6 b
1970-01-01 00:00:07 7 Nan
Main difference is that I want the right values to appear only once in the output dataframe and fill with Nan
the other values so that I can create an sparse dataframe and save some space. I would like to avoid iterating the result to set repeated values to Nan
because of:
- Speed reasons
- If I have two consecutive values within
right
, this method will remove original information
I have been searching for input arguments and methods to do something like this but I could not find it.
Thanks!