I want to combine two statements into one:
inspections_df["Model"] = inspections_df["Model"].mask(
inspections_df['Boiler Model'].str.contains(
product['search for'], na=False, case=False), product['name'])
inspections_df["Manufacturer"] = inspections_df["Manufacturer"].mask(
inspections_df['Boiler Model'].str.contains(
product['search for'], na=False, case=False), manufacturer['manufacturer name'])
They update values in columns Model
and Manufacturer
to constant values stored as product['name']
and manufacturer['manufacturer name']
, where column Boiler Model
matches a regular expression stored in product['search for']
. How can I write this as a single statement and why doesn’t mine below work?
inspections_df["Model", "Manufacturer"] = inspections_df["Model", "Manufacturer"].mask(
inspections_df['Boiler Model'].str.contains(
product['search for'], na=False, case=False), , manufacturer['manufacturer name']])
My attempt resulted in:
KeyError: (‘Model’, ‘Manufacturer’)
referencing inspections_df["Model", "Manufacturer"]
immediately following the =
.
1
The error KeyError: (‘Model’, ‘Manufacturer’) is a direct result of pandas DataFrame not support multiple columns indexing in the manner of df["col1", "col2"]
. Instead, you should use df[["col1", "col2"]]
.
But even after correcting the column indexing, you’ll face another issue. The method mask
cannot be used directly to update multiple columns. Instead, you have to break the process down into two steps, but you can still get it done in one line of code, using the help of numpy.where()
. Here is how you do it:
import numpy as np
mask = inspections_df['Boiler Model'].str.contains(product['search for'], na=False, case=False)
inspections_df["Model"], inspections_df["Manufacturer"] = np.where(mask, (product['name'], manufacturer['manufacturer name']), (inspections_df["Model"], inspections_df["Manufacturer"]))
Please note that this line of code only works when the size of inspections_df["Model"]
and inspections_df["Manufacturer"]
are the same.
This line of code is using the mask to find where the condition applied to the [‘Boiler Model’] column is true, and where it is true, it replaces the values of [“Model”] and [“Manufacturer”] with new values product['name']
and manufacturer['manufacturer name']
respectively, and where it’s false, it keeps the old values.
user27422100 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.