I have a CSV file with a company code column. The same company code is repeated across different rows. For example, if there are 100 rows, a distinct on that column might return only 20 unique company codes. My intention is to first identify each unique company code and then either:
- Option a) apply a hash to each unique company code, so that I can save it in a separate Delta table
- Option b) maintain some kind of auto-increment number mapping:
  1 -> company_code_1
  2 -> company_code_2
  etc.
Once I have the hash or auto-increment number, I can use it as a primary key later in some Delta table. And if the mapping is already present, how do I simply get the existing hash or auto-increment value for that company code?
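
To make the two options concrete, here is a minimal sketch of the behaviour I am after, written in plain Python rather than against a Delta table. The function names `hash_key` and `assign_ids` and the sample company codes are hypothetical, chosen only for illustration, and the in-memory dict stands in for whatever mapping table would actually be persisted.

```python
import hashlib


def hash_key(company_code: str) -> str:
    """Option a: derive a deterministic key from the company code itself."""
    return hashlib.sha256(company_code.encode("utf-8")).hexdigest()


def assign_ids(codes, existing=None):
    """Option b: keep ids already assigned, give unseen codes the next integer."""
    mapping = dict(existing or {})  # copy so the caller's mapping is not mutated
    next_id = max(mapping.values(), default=0) + 1
    for code in dict.fromkeys(codes):  # distinct codes, in first-seen order
        if code not in mapping:
            mapping[code] = next_id
            next_id += 1
    return mapping


# Hypothetical sample data: 5 rows, 3 distinct company codes.
rows = ["ACME", "GLOBEX", "ACME", "INITECH", "GLOBEX"]

first_load = assign_ids(rows)
# A later load reuses the id for INITECH and assigns a new one to UMBRELLA.
second_load = assign_ids(["INITECH", "UMBRELLA"], existing=first_load)

print(first_load)
print(second_load)
print(hash_key("ACME"))
```

With the hash, the same company code always produces the same value, so no lookup of previous state is needed. With the auto-increment mapping, the existing mapping has to be read back before new codes are numbered, which is the part I am unsure how to do against a Delta table.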