I have a PySpark DataFrame, shown below, that comes from a data quality (DQ) results table:
+-------------------+---------------+---------+---------------+-------------+---------------------+
|table_name         |key_missing    |key_value|retailer_name  |detail       |report_generated_date|
+-------------------+---------------+---------+---------------+-------------+---------------------+
|customer           |customer_id    |118      |Apple          |Missing      |2024-06-05           |
|customer           |customer_id    |349      |Mueller        |Missing      |2024-06-05           |
|product_line       |product_id     |XX097h5  |ECOMEDIA AG    |Missing      |2024-06-05           |
|purchase_master    |purchase_id    |907      |kit_retailer_id|Duplicates   |2024-06-05           |
|activity_summary   |act_id         |1208vtt  |Media Markt    |Duplicates   |2024-06-05           |
+-------------------+---------------+---------+---------------+-------------+---------------------+
Now I would like to pick the rows for each table_name and write each group
to its own section of an Excel file, with a header row for each section, using PySpark.
How can I achieve this? Thanks in advance.
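One possible approach, sketched under a few assumptions: Spark's built-in DataFrameWriter has no Excel format, so this collects the (presumably small) DQ result to the driver with `df.toPandas()` and writes it with pandas and openpyxl, which are assumed to be installed there. "Separate sections" is read here as stacked blocks on one sheet: a title line with the table_name, a header row, that table's rows, then a blank row. The names `write_sections`, `DQ_Report` and `dq_report.xlsx` are hypothetical.

```python
import pandas as pd


def write_sections(pdf: pd.DataFrame, path: str, sheet_name: str = "DQ_Report",
                   group_col: str = "table_name", gap: int = 1) -> dict:
    """Write one block per distinct value of `group_col` into a single sheet.

    Each block is a title line (the group value), a header row, the group's
    rows, then `gap` empty rows. Returns {group value: 0-based title row}.
    """
    layout = {}
    row = 0
    # In the default write mode, repeated to_excel calls with the same
    # sheet_name write into the same sheet at the given startrow.
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        # sort=False keeps the groups in order of first appearance.
        for name, group in pdf.groupby(group_col, sort=False):
            # Section title, e.g. "customer", in column A.
            pd.DataFrame([[name]]).to_excel(
                writer, sheet_name=sheet_name, startrow=row, header=False, index=False
            )
            # Column header plus this table's rows, directly below the title.
            group.to_excel(writer, sheet_name=sheet_name, startrow=row + 1, index=False)
            layout[name] = row
            # Move past title + header + data rows + blank gap.
            row += 2 + len(group) + gap
    return layout
```

With the DataFrame above, the call would be `write_sections(df.toPandas(), "dq_report.xlsx")`. If "sections" should instead mean one sheet per table, the loop body can be reduced to `group.to_excel(writer, sheet_name=name, index=False)`, keeping in mind that Excel limits sheet names to 31 characters.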