I am writing a DataFrame to a Hudi table (stored in MinIO) using Spark. The table is synced with Hive. The column I chose as the partition column has values like this in some rows: “Mağaza/Dükkan”. Because there is a slash in the middle of the value, when I check in MinIO there is a folder named “Mağaza” with a folder named “Dükkan” inside it, instead of a single folder named “Mağaza/Dükkan”.
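To illustrate what I think is happening, here is a minimal sketch in plain Python (no Spark or Hudi involved). It assumes the partition value is simply appended to the table's base path; the bucket and table names are made up for the example.

```python
from pathlib import PurePosixPath

# Hypothetical base path of the table in object storage (illustration only).
base_path = PurePosixPath("mybucket/mytable")

# A value from the partition column that contains a slash.
partition_value = "Mağaza/Dükkan"

# Appending the raw value to the base path: the '/' is read as a path separator.
partition_path = base_path / partition_value

# Directory levels that end up below the table's base path.
levels = partition_path.relative_to(base_path).parts
print(levels)
```

There is one partition key but two directory levels, which seems to match the "keys [uzmanlik_alanlari, ], values [Mağaza, Dükkan, ]" mismatch in the exception.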
So Hive treats this structure as two different partition values, as shown in the exception:
ERROR HMSDDLExecutor: default.mytable add partition failed
MetaException(message:Invalid partition key & values; keys [uzmanlik_alanlari, ], values [Mağaza, Dükkan, ])
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$add_partitions_req_result$add_partitions_req_resultStandardScheme.read(ThriftHiveMetastore.java)
..
..
ERROR:main:Py4JJavaError: An error occurred while calling o291.save.
: org.apache.hudi.exception.HoodieMetaSyncException: Could not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool
..
..
I know I could replace the ‘/’ characters with something else, but that would cause data loss. How can I resolve this? Thank you.