I have a dataset with 2 columns acting as ID columns and a value column, like this
ID1 | ID2 | value | |
---|---|---|---|
0 | A | X | 10 |
1 | A | Y | 3 |
2 | B | X | 3 |
3 | B | Z | 5 |
4 | A | Z | 7 |
5 | A | X | 23 |
I also have a list of pairs of IDs that I want to select from the dataframe , like this
ID1 | ID2 |
---|---|
A | X |
B | Y |
B | Z |
I want to select all the rows in the dataframe that have both the IDs of each pair in my list of pairs.
In this example the rows of the dataframe that should be returned are
ID1 | ID2 | value | |
---|---|---|---|
0 | A | X | 10 |
3 | B | Z | 5 |
5 | A | X | 23 |
Assuming that both tables are stored as pandas.DataFrame, I know that if it was only one column I could filter using
df[df['ID1'].isin(list_of_pairs['ID1'].values)
But I can’t extend this method for 2 columns like this:
cond1 = df[df['ID1'].isin(list_of_pairs['ID1'].values)
cond2 = df[df['ID2'].isin(list_of_pairs['ID2'].values)
condition = cond1 & cond2
df[condition]
Because this will return all the pairs whose ID1 and ID2 are both present in the list of pairs, but not necessarily only the pairs that are in the list of pairs. In this example all the possible combinations of (A,B) and (X,Y,Z) will be returned.
One workaround that i found is to define
df['ID3'] = df.apply(lambda x: x['ID1'] + x['ID2'], axis=1)
and then filtering on this single column, but I don’t really like this solution.
Do you know how to select these rows using 2 columns as identifiers?