Consider a scenario: I have two files. One is a metadata file and the other contains the actual data values. The metadata file lists each column name and its expected data type, like this:
+-------------------+---------------+
| dwh_column_name   | dwh_data_type |
+-------------------+---------------+
| product_id        | string        |
| phone_num         | int           |
| phone_format_code | string        |
| phone_num_type    | string        |
+-------------------+---------------+
and the data file looks like this:
+------------+-------------------+------------+-----------------+
| product_id | phone_format_code | phone_num  | phone_num_type  |
+------------+-------------------+------------+-----------------+
| 3241       | (001)706-2887     | 5829545023 | Home Cell Phone |
| 43df       | (391)210-4894     | 988nn09488 | Home Fax Number |
| 45fg       | (001)202-8944     | 37aa501564 | Business Phone  |
+------------+-------------------+------------+-----------------+
Now I need to transform the data file using the metadata file: for every row, check whether each value matches the data type declared for its column. If all values in a row are of the expected types, that row should go into a "valid" DataFrame; if any value is not of the expected type (e.g. `988nn09488` in the `phone_num` column, which is declared as `int`), the entire row should go into an "invalid" DataFrame.

The whole transformation needs to be done in PySpark.