Hi guys, I have a massive Excel file that I need to check for certain data quality issues using Python.
I’ve chosen the following block to highlight certain problems identified by regular expressions:
def highlight_text(s, props=''):
    return np.where(s.str.contains(r'[0-9]{5} [a-zA-Zß]*, [a-zA-Zß]* [0-9]*'), props, '')
However, in this case my regex determines whether an address is correctly formatted. In my sheet I want to highlight all cells that DON’T follow this pattern.
This is a sample of my address data:
12345 Oftersheim
23456 Heidelberg, Müllerstraße 84
95746 Weinheim
23456 Heidelberg, Haldenweg 69
Only the values with zip code, city, street and number shouldn’t be highlighted.
How can I invert the condition, i.e. use the opposite of the np.where?
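For reference, here is a minimal sketch of what I mean by "the opposite": negating the boolean mask with ~ before it goes into np.where. Two things in it are my own assumptions and not part of the block above: I swapped str.contains for str.fullmatch so the whole cell has to follow the pattern, and I added umlauts to the character classes (and used + instead of *) because otherwise "Müllerstraße" would not match. The name highlight_invalid is just a placeholder.

```python
import numpy as np
import pandas as pd

# Assumption: umlauts added and * replaced by + compared to the original pattern,
# so that "Müllerstraße 84" counts as a valid street and number.
ADDRESS_PATTERN = r'[0-9]{5} [a-zA-ZäöüÄÖÜß]+, [a-zA-ZäöüÄÖÜß]+ [0-9]+'

def highlight_invalid(s, props=''):
    # fullmatch requires the entire cell to follow the pattern;
    # na=False treats empty cells as not matching.
    is_valid = s.str.fullmatch(ADDRESS_PATTERN, na=False).astype(bool)
    # ~ inverts the mask, so props is applied where the pattern does NOT match.
    return np.where(~is_valid, props, '')

addresses = pd.Series([
    '12345 Oftersheim',
    '23456 Heidelberg, Müllerstraße 84',
    '95746 Weinheim',
    '23456 Heidelberg, Haldenweg 69',
])

styles = highlight_invalid(addresses, props='background-color: yellow')
print(list(styles))
# ['background-color: yellow', '', 'background-color: yellow', '']
```

On the real sheet this would probably be used via something like df.style.apply(highlight_invalid, props='background-color: yellow', subset=['Address']), where 'Address' is a made-up column name.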