Polars/Spark/SQL: Standardize similar company names in a table column

I have a table with a column of company names. The same company can appear under several different names (e.g. 'Ciao', 'Ciao Inc', 'Ciao Inc User').
I want to assign one shared identifier to every name variant of the same company, as in the following example (assume the arrays are columns):
['Ciao', 'Ciao Inc', 'HB', 'Ciao Inc User', 'HB lmtd'] -> [1, 1, 2, 1, 2]
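
To make the expected mapping concrete, here is a minimal plain-Python sketch of the behaviour I am after. It assumes that two names belong to the same company whenever one is a word-by-word prefix of the other, ignoring case. That matching rule and the helper name `assign_company_ids` are assumptions made only for this illustration; it is not a Polars, Spark, or SQL solution, and real data would probably need fuzzier matching.

```python
def assign_company_ids(names):
    """Give names that share a leading word sequence the same integer id.

    Assumption (illustrative only): two names refer to the same company
    when one is a word-by-word prefix of the other, ignoring case.
    Names are assumed to be non-empty strings.
    """
    representatives = []  # shortest token list seen so far for each group
    ids = []
    for name in names:
        tokens = name.lower().split()
        for index, rep in enumerate(representatives):
            shorter, longer = sorted((rep, tokens), key=len)
            if longer[:len(shorter)] == shorter:
                # Keep the shortest variant as the group's representative.
                representatives[index] = shorter
                ids.append(index + 1)
                break
        else:
            # No existing group matched, so start a new one.
            representatives.append(tokens)
            ids.append(len(representatives))
    return ids


names = ["Ciao", "Ciao Inc", "HB", "Ciao Inc User", "HB lmtd"]
company_ids = assign_company_ids(names)
print(company_ids)  # [1, 1, 2, 1, 2]
```

Identifiers are handed out in order of first appearance, so the result depends on row order. The loop compares each name against every existing group, which is fine for a small list but would not scale to a large table as written.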