I am fairly new to Python and pandas. As part of my data cleaning, I would like to check that I performed the previous cleaning steps correctly on a string column. In particular, I want to see where the strings begin and end, so that any leading or trailing white space becomes visible.
The following is meant to bookend each string with a pair of single underscores, but it seems to generate two extra unintended underscores at the end, resulting in a total of three trailing underscores:
>>> import pandas as pd
>>> df = pd.DataFrame({'A':['DOG']})
>>> df.A.str.replace(r'(.*)',r'_\1_',regex=True)
0    _DOG___
Name: A, dtype: object
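The same result seems to be reproducible without pandas, which makes me think it comes from Python's `re` module rather than from `str.replace` itself. Here is a minimal sketch, assuming Python 3.7 or later; the variable names are only for illustration:

```python
import re

# Same pattern and replacement as above, applied to a plain string.
result = re.sub(r'(.*)', r'_\1_', 'DOG')
print(result)   # _DOG___

# Listing the matches shows a second, empty match at the end of the string.
matches = re.findall(r'(.*)', 'DOG')
print(matches)  # ['DOG', '']

# For comparison, requiring at least one character gives a single match.
result_plus = re.sub(r'(.+)', r'_\1_', 'DOG')
print(result_plus)  # _DOG_
```

So the extra underscores appear to come from a second, empty match being replaced as well, but I do not understand why that match occurs.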
I’m not entirely new to regular expressions, having used them with sed, vim, and Matlab. What is it about Python’s implementation that I’m not understanding?
I am using Python 3.9 for compatibility with other work.