I just discovered the difference between pandas’s Series.replace() and Series.str.replace(). By default, the first assumes that the search pattern is not a regular expression while the second assumes that it is. The named argument that controls this is regex. I’ve got a mixture of them throughout my code, so I’m adding explicit specification of regex everywhere. It makes for longer lines of code, and if I want to avoid lines that take up too much width, I end up continuing logical lines onto multiple physical lines. This makes otherwise simple code messier. For example:
# Before I found out about Series.replace vs. Series.str.replace
if ifKeepInnerSpc:
    dfShipName['ShpNm'] = dfShipName.ShipName.replace(' +',' ')
else:
    dfShipName['ShpNm'] = dfShipName.ShipName.replace(' +','')

# After
if ifKeepInnerSpc:
    dfShipName['ShpNm'] = \
        dfShipName.ShipName.str.replace(' +',' ',regex=True)
else:
    dfShipName['ShpNm'] = \
        dfShipName.ShipName.str.replace(' +','',regex=True)
By itself, it’s harmless, but when surrounded by other code, streamlining has clear advantages.
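One way I could streamline this is to push the explicit argument into a single place. The following is only a minimal sketch: collapse_spaces is a hypothetical helper name, and the sample ship names are made up for illustration.

```python
import pandas as pd


def collapse_spaces(names: pd.Series, keep_inner: bool) -> pd.Series:
    """Collapse each run of spaces to one space, or remove the spaces entirely.

    Hypothetical helper: regex=True is spelled out once here, so call
    sites stay short without relying on the library default.
    """
    repl = " " if keep_inner else ""
    return names.str.replace(" +", repl, regex=True)


# Made-up sample data standing in for the real dfShipName.
dfShipName = pd.DataFrame({"ShipName": ["USS  Enterprise", "HMS   Victory"]})
ifKeepInnerSpc = True

dfShipName["ShpNm"] = collapse_spaces(dfShipName.ShipName, ifKeepInnerSpc)
print(dfShipName["ShpNm"].tolist())
```

The if/else collapses into one conditional expression, and the explicit regex=True appears only inside the helper. Whether that trade is worth an extra function is part of what I am asking about.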
I can combat the sprawl by dispensing with regex=True, but it might come back to bite me on the one day (of likely many days) when I’m asleep at the wheel.
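To make the risk concrete, here is a small sketch of how the two methods diverge when regex is spelled out, using made-up ship names. It is also worth noting that the default for Series.str.replace changed to regex=False in pandas 2.0, so code that leans on the default can behave differently across versions.

```python
import pandas as pd

# Made-up sample data; the last entry is the literal two-character string ' +'.
names = pd.Series(["USS  Enterprise", "HMS   Victory", " +"])

# Series.replace with regex=False compares whole cell values against the
# literal string ' +', so only the last entry is changed.
literal = names.replace(" +", " ", regex=False)

# Series.str.replace with regex=True treats ' +' as "one or more spaces"
# and substitutes inside each string.
collapsed = names.str.replace(" +", " ", regex=True)

print(literal.tolist())
print(collapsed.tolist())
```

Neither call raises an error, which is exactly why a wrong assumption about the default can go unnoticed.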
What are some common guidelines for when to be explicit about default arguments, knowing that one is buying clarity from explicitness, but at the cost of noisier code?