I am copying files from FTP to ADLS into a bronze folder, under a partitioned folder hierarchy like this:
```
bronze
├── run_date=2023-10-23
│   └── file__2023-10-23.csv
└── run_date=2023-10-30
    └── file__2023-10-30.csv
```
This folder structure is created dynamically by the ADF copy activity on the sink side.
Each CSV file has 3 columns: sid, country, site.
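For illustration, a file's contents look roughly like this (values are made up; the files are semicolon-delimited with a header row, matching the table options below):

```
sid;country;site
101;DE;site-a
102;FR;site-b
```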
Now, in Databricks, I have created an external table in CSV format that reads the files at the above ADLS location; when we refresh the table, we can query the data from the external bronze table. The CREATE statement looks like this (the table name is a placeholder here):
```sql
CREATE TABLE bronze_table (
    sid INTEGER,
    country STRING,
    site STRING,
    __run_date__ DATE
)
USING CSV
OPTIONS (header = "true", delimiter = ";")
PARTITIONED BY (__run_date__)
LOCATION '{EXTERNALDATA_DATALAKE_NAME_ADLS}'
```
The table then has 4 columns: sid, country, site, and __run_date__ (which is generated automatically).
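For completeness, this is roughly how I refresh and query it (bronze_table again being the placeholder name):

```sql
REFRESH TABLE bronze_table;

SELECT sid, country, site, __run_date__
FROM bronze_table
WHERE __run_date__ = '2023-10-23';
```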
Now, when I try to partition this bronze table by adding the PARTITIONED BY clause for the __run_date__ column to the CREATE TABLE statement, the table gets created, but querying it fails to read any data and returns 0 rows.
I found out through the Spark documentation that if a partitioned table is created from existing data, the partitions are not registered automatically in the Hive metastore; the user needs to run MSCK REPAIR TABLE to register them.
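In SQL that would be something like the following (bronze_table is still a placeholder; Spark SQL also allows registering a partition manually with ALTER TABLE ... ADD PARTITION, pointing at the folder layout described above):

```sql
MSCK REPAIR TABLE bronze_table;

-- or register a single partition manually, pointing at its folder:
ALTER TABLE bronze_table ADD PARTITION (__run_date__ = '2023-10-23')
LOCATION '{EXTERNALDATA_DATALAKE_NAME_ADLS}/run_date=2023-10-23';
```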
But when I execute the MSCK REPAIR TABLE table_identifier command, I get the following error:
```
org.apache.hadoop.hive.ql.metadata.HiveException: Failure to initialize configuration for storage account [REDACTED].dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.key
```
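For reference, my understanding is that cluster-wide ADLS access is usually supplied through a property in the cluster's Spark config, roughly like the line below (a sketch assuming account-key authentication; <storage-account>, <scope>, and <key-name> are placeholders, the {{secrets/...}} reference pulls the key from a Databricks secret scope, and the spark.hadoop. prefix copies the property into the Hadoop configuration). I am not sure whether MSCK REPAIR TABLE requires the credential to be set this way rather than per session:

```
spark.hadoop.fs.azure.account.key.<storage-account>.dfs.core.windows.net {{secrets/<scope>/<key-name>}}
```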
How can I solve this problem?