My DataFrames are:
import pandas as pd
df_1 = pd.DataFrame(
    {
        'a': [10, 12, 14, 20, 25, 30, 42, 50, 80]
    }
)
df_2 = pd.DataFrame(
    {
        'start': [9, 19],
        'end': [26, 50],
        'label': ['a', 'b']
    }
)
Expected output: adding a label column to df_1:
a label
10 a
12 a
14 a
20 a
25 a
20 b
25 b
30 b
42 b
50 b
df_2 defines the label ranges. For example, in the first row of df_2 the range starts at 9 and ends at 26. I want to slice df_1 based on start and end and assign that row's label to the slice. Note that start is exclusive and end is inclusive, and that my label ranges overlap, so a value such as 20 or 25 gets more than one label.
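To make the range rule concrete, here is a minimal sketch that spells it out with NumPy broadcasting. It is only an illustration of the rule (start < a <= end, every matching range contributes one row), not necessarily the best way to do it; the names mask, range_pos, value_pos and labeled are made up for this example.

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame({'a': [10, 12, 14, 20, 25, 30, 42, 50, 80]})
df_2 = pd.DataFrame({'start': [9, 19], 'end': [26, 50], 'label': ['a', 'b']})

a = df_1['a'].to_numpy()
start = df_2['start'].to_numpy()
end = df_2['end'].to_numpy()

# mask[i, j] is True when start[i] < a[j] <= end[i]
# (start exclusive, end inclusive), shape (len(df_2), len(df_1)).
mask = (a > start[:, None]) & (a <= end[:, None])

# Positions of every matching (range, value) pair, ordered by range first
# and then by position in df_1.
range_pos, value_pos = np.nonzero(mask)

labeled = pd.DataFrame({
    'a': a[value_pos],
    'label': df_2['label'].to_numpy()[range_pos],
})
print(labeled)
```

Values that fall in both ranges (20 and 25) appear twice, and values outside every range (80) are dropped, which is what the expected output above shows.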
These are my attempts. The first one works but I am not sure if it is the best.
# attempt_1
dfc = pd.DataFrame([])
for idx, row in df_2.iterrows():
    start = row['start']
    end = row['end']
    label = row['label']
    # start is exclusive, end is inclusive; copy so the assignment below does not write to a view
    df_slice = df_1.loc[df_1.a.between(start, end, inclusive='right')].copy()
    df_slice['label'] = label
    dfc = pd.concat([df_slice, dfc], ignore_index=True)
# attempt_2
# Does not work: get_indexer raises InvalidIndexError because the intervals overlap.
idx = pd.IntervalIndex.from_arrays(df_2['start'], df_2['end'], closed='right')
label = df_2['label'].iloc[idx.get_indexer(df_1.a)]
df_1['label'] = label.to_numpy()
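Because the ranges overlap, one value of a can need several labels, so any lookup that returns a single position per value cannot produce the expected output. One loop-free alternative is a cross join followed by a filter. This is a sketch under two assumptions: pandas 1.2 or newer (for how='cross'), and frames small enough that len(df_1) * len(df_2) intermediate rows fit in memory. The names intervals, pairs, in_range and result are only for this example.

```python
import pandas as pd

df_1 = pd.DataFrame({'a': [10, 12, 14, 20, 25, 30, 42, 50, 80]})
df_2 = pd.DataFrame({'start': [9, 19], 'end': [26, 50], 'label': ['a', 'b']})

# Confirm that the label ranges really overlap, which is why a
# one-position-per-value lookup is not enough here.
intervals = pd.IntervalIndex.from_arrays(df_2['start'], df_2['end'], closed='right')
print(intervals.is_overlapping)

# Pair every range with every value, then keep the pairs where
# start < a <= end (start exclusive, end inclusive).
pairs = df_2.merge(df_1, how='cross')
in_range = (pairs['a'] > pairs['start']) & (pairs['a'] <= pairs['end'])
result = pairs.loc[in_range, ['a', 'label']].reset_index(drop=True)
print(result)
```

Putting df_2 on the left of the merge keeps the rows grouped by label, in the same order as the expected output.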