I’m trying to write my Structured Streaming data to Apache Hudi as a non-partitioned table and then sync it with BigQuery. Even though it is a new table and I’ve set no partitioning configurations, I get this error:
<code>Caused by: com.google.cloud.bigquery.BigQueryException: Error while reading table: hudi_cow_2024_07_01_npt, error message: Table hudi_cow_2024_07_01_npt requested hive partitioning, but no partition keys were detected. This is a sign of misconfiguration. Please ensure the source uri prefix provided is the prefix immediately before partition encoding begins. For instance, 'gs://bucket/table/key1=today/key2=5/foo.txt' would have 'gs://bucket/table' as a valid source uri prefix. 'gs://dev-bucket-asia-south-1/vinayak77/cdc-poc/kafka-hudi-poc/hudi_cow_2024_07_01_npt/' was provided as the source uri prefix.
</code>
Here are my Hudi write configs:
<code>"hoodie.datasource.write.recordkey.field" -> "profileid",
"hoodie.datasource.write.precombine.field" -> "firstlaunch_date",
"hoodie.datasource.write.partitionpath.field" -> "",
"hoodie.table.name" -> "hudi_cow_2024_07_01_npt",
"hoodie.datasource.write.table.type" -> "COPY_ON_WRITE", //MERGE_ON_READ, COPY_ON_WRITE hoodie.table.type
"hoodie.metadata.enable" -> "true",
"hoodie.write.set.null.for.missing.columns" -> "true",
"hoodie.datasource.write.hive_style_partitioning" -> "false",
"hoodie.datasource.write.keygenerator.class" -> "org.apache.hudi.keygen.NonpartitionedKeyGenerator",
"hoodie.datasource.hive_sync.partition_extractor_class" -> "org.apache.hudi.hive.NonPartitionedExtractor",
"hoodie.gcp.bigquery.sync.use_bq_manifest_file" -> "true",
"hoodie.bq.manifest.enable" -> "true",
"hoodie.meta.sync.client.tool.class" -> "org.apache.hudi.gcp.bigquery.BigQuerySyncTool",
"hoodie.gcp.bigquery.sync.project_id" -> "sr-pr-jio-voot-non-prod",
"hoodie.gcp.bigquery.sync.dataset_name" -> "temp",
"hoodie.gcp.bigquery.sync.dataset_location" -> "asia-south1",
"hoodie.gcp.bigquery.sync.source_uri" -> "gs://dev-bucket-asia-south-1/vinayak77/cdc-poc/kafka-hudi-poc/hudi_cow_2024_07_01_npt/*.parquet",
"hoodie.gcp.bigquery.sync.source_uri_prefix" -> "gs://dev-bucket-asia-south-1/vinayak77/cdc-poc/kafka-hudi-poc/hudi_cow_2024_07_01_npt/",
"hoodie.gcp.bigquery.sync.base_path" -> "gs://dev-bucket-asia-south-1/vinayak77/cdc-poc/kafka-hudi-poc/hudi_cow_2024_07_01_npt/",
"hoodie.datasource.meta.sync.enable" -> "true",
"hoodie.clustering.inline" -> "false",
"hoodie.compact.inline" -> "false"
</code>
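For context, this is roughly how I pass those options to the streaming writer. It's a simplified sketch, not my exact job: the source DataFrame <code>df</code>, the <code>hudiOptions</code> map name, the checkpoint path, and the trigger interval below are placeholders.
<code>import org.apache.spark.sql.streaming.Trigger

// Sketch of the streaming write; df and hudiOptions (the Map shown above)
// are defined elsewhere, and the checkpoint path/trigger are placeholders.
df.writeStream
  .format("hudi")
  .options(hudiOptions)
  .outputMode("append")
  .option("checkpointLocation", "gs://dev-bucket-asia-south-1/vinayak77/cdc-poc/kafka-hudi-poc/checkpoints/") // placeholder path
  .trigger(Trigger.ProcessingTime("1 minute")) // placeholder interval
  .start("gs://dev-bucket-asia-south-1/vinayak77/cdc-poc/kafka-hudi-poc/hudi_cow_2024_07_01_npt/")
  .awaitTermination()
</code>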
<code>"hoodie.datasource.write.recordkey.field" -> "profileid",
"hoodie.datasource.write.precombine.field" -> "firstlaunch_date",
"hoodie.datasource.write.partitionpath.field" -> "",
"hoodie.table.name" -> "hudi_cow_2024_07_01_npt",
"hoodie.datasource.write.table.type" -> "COPY_ON_WRITE", //MERGE_ON_READ, COPY_ON_WRITE hoodie.table.type
"hoodie.metadata.enable" -> "true",
"hoodie.write.set.null.for.missing.columns" -> "true",
"hoodie.datasource.write.hive_style_partitioning" -> "false",
"hoodie.datasource.write.keygenerator.class" -> "org.apache.hudi.keygen.NonpartitionedKeyGenerator",
"hoodie.datasource.hive_sync.partition_extractor_class" -> "org.apache.hudi.hive.NonPartitionedExtractor",
"hoodie.gcp.bigquery.sync.use_bq_manifest_file" -> "true",
"hoodie.bq.manifest.enable" -> "true",
"hoodie.meta.sync.client.tool.class" -> "org.apache.hudi.gcp.bigquery.BigQuerySyncTool",
"hoodie.gcp.bigquery.sync.project_id" -> "sr-pr-jio-voot-non-prod",
"hoodie.gcp.bigquery.sync.dataset_name" -> "temp",
"hoodie.gcp.bigquery.sync.dataset_location" -> "asia-south1",
"hoodie.gcp.bigquery.sync.source_uri" -> "gs://dev-bucket-asia-south-1/vinayak77/cdc-poc/kafka-hudi-poc/hudi_cow_2024_07_01_npt/*.parquet",
"hoodie.gcp.bigquery.sync.source_uri_prefix" -> "gs://dev-bucket-asia-south-1/vinayak77/cdc-poc/kafka-hudi-poc/hudi_cow_2024_07_01_npt/",
"hoodie.gcp.bigquery.sync.base_path" -> "gs://dev-bucket-asia-south-1/vinayak77/cdc-poc/kafka-hudi-poc/hudi_cow_2024_07_01_npt/",
"hoodie.datasource.meta.sync.enable" -> "true",
"hoodie.clustering.inline" -> "false",
"hoodie.compact.inline" -> "false"
</code>
"hoodie.datasource.write.recordkey.field" -> "profileid",
"hoodie.datasource.write.precombine.field" -> "firstlaunch_date",
"hoodie.datasource.write.partitionpath.field" -> "",
"hoodie.table.name" -> "hudi_cow_2024_07_01_npt",
"hoodie.datasource.write.table.type" -> "COPY_ON_WRITE", //MERGE_ON_READ, COPY_ON_WRITE hoodie.table.type
"hoodie.metadata.enable" -> "true",
"hoodie.write.set.null.for.missing.columns" -> "true",
"hoodie.datasource.write.hive_style_partitioning" -> "false",
"hoodie.datasource.write.keygenerator.class" -> "org.apache.hudi.keygen.NonpartitionedKeyGenerator",
"hoodie.datasource.hive_sync.partition_extractor_class" -> "org.apache.hudi.hive.NonPartitionedExtractor",
"hoodie.gcp.bigquery.sync.use_bq_manifest_file" -> "true",
"hoodie.bq.manifest.enable" -> "true",
"hoodie.meta.sync.client.tool.class" -> "org.apache.hudi.gcp.bigquery.BigQuerySyncTool",
"hoodie.gcp.bigquery.sync.project_id" -> "sr-pr-jio-voot-non-prod",
"hoodie.gcp.bigquery.sync.dataset_name" -> "temp",
"hoodie.gcp.bigquery.sync.dataset_location" -> "asia-south1",
"hoodie.gcp.bigquery.sync.source_uri" -> "gs://dev-bucket-asia-south-1/vinayak77/cdc-poc/kafka-hudi-poc/hudi_cow_2024_07_01_npt/*.parquet",
"hoodie.gcp.bigquery.sync.source_uri_prefix" -> "gs://dev-bucket-asia-south-1/vinayak77/cdc-poc/kafka-hudi-poc/hudi_cow_2024_07_01_npt/",
"hoodie.gcp.bigquery.sync.base_path" -> "gs://dev-bucket-asia-south-1/vinayak77/cdc-poc/kafka-hudi-poc/hudi_cow_2024_07_01_npt/",
"hoodie.datasource.meta.sync.enable" -> "true",
"hoodie.clustering.inline" -> "false",
"hoodie.compact.inline" -> "false"