I am trying to find instances of where three columns may exceed some number that is stored in a dictionary. I know this code works, but I think there is a better more pythonic way of doing this.
l = []
for index, row in df_usage.iterrows():
item_id = row['item_id']
try:
if (
row['start_count'] > historical_orders_dict[item_id]
or row['end_count'] > historical_orders_dict[item_id]
or row['received_total'] > historical_orders_dict[item_id]
):
l.append(df_usage.loc[[index], :])
except:
pass
df_too_many_consuables = pd.concat(l).reset_index(drop=True)
df_too_many_consuables.shape
Here is an example
historical_orders_dict = {
'10346': 28644.99,
'10877': 28979.99,
'10695': 6200.0,
'70020': 1960.0,
'40265': 57300.0,
'91524': 9750.0,
'60022': 200.0,
'10210': 156.0,
'11040': 49350.0}
data = {
'item_id': ['10346', '10877', '10695', '70020', '40265', '91524',
'60022', '10210','11040'],
'start_count': [100000000, 2, 3, 4, 5, 6, 7, 8, 9],
'end_count': [10, 11, 12, 13, 14, 15, 16, 17, 18],
'received_total': [19, 20, 21, 22, 23, 24, 46, 45, 34]}
df_usage = pd.DataFrame(data=data)
7
Here’s one approach:
item_map = df_usage['item_id'].map(historical_orders_dict)
mask = (df_usage[['start_count', 'end_count', 'received_total']]
.gt(item_map, axis=0)
.any(axis=1)
)
out = df_usage[mask]
Output:
item_id start_count end_count received_total
0 10346 100000000 10 19
Equality OP’s method:
out.equals(df_too_many_consuables)
# True
Explanation
- Select all columns to compare and apply
df.gt
onaxis=0
to check if any value is larger than the mapped values for column ‘item_id’ (Series.map
). Intermediate:
df_usage[['start_count', 'end_count', 'received_total']].gt(item_map, axis=0)
start_count end_count received_total
0 True False False # True: 100000000 > 28644.99
1 False False False
2 False False False
3 False False False
4 False False False
5 False False False
6 False False False
7 False False False
8 False False False
- Chain
df.any
to check row-wise (axis=1)
for anyTrue
. - Use the resulting
mask
for boolean indexing.
Note that if a value from ‘item_id’ doesn’t have a match in historical_orders_dict
, it will get mapped as NaN
. Since NaN
values in pandas
are considered to be neither greater than nor less than any other value, such an ‘item_id’ will never show up in out
. If you do want such a value to show up, you can use Series.map
+ Series.fillna
with some threshold, e.g. 0
.
E.g., suppose we use:
historical_orders_dict = {
'10346': 28644.99,
'NOT_10877': 28979.99, # value 10877 absent
'10695': 6200.0,
'70020': 1960.0,
'40265': 57300.0,
'91524': 9750.0,
'60022': 200.0,
'10210': 156.0,
'11040': 49350.0}
out
with df_usage['item_id'].map(historical_orders_dict)
:
item_id start_count end_count received_total
0 10346 100000000 10 19
out
with df_usage['item_id'].map(historical_orders_dict).fillna(0)
:
item_id start_count end_count received_total
0 10346 100000000 10 19
1 10877 2 11 20