I’m able to successfully load a table on my EMR 7 cluster from the Glue Data Catalog with the default Spark catalog via:
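import org.apache.spark.sql.connector.catalog.TableCatalog
// catalog(...) returns a CatalogPlugin; cast it to TableCatalog to call loadTable
val catalog = spark.sessionState.catalogManager.catalog("spark_catalog").asInstanceOf[TableCatalog]
catalog.loadTable(some-info)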
Now I want to use Iceberg’s SparkSessionCatalog instead. I followed the instructions at https://iceberg.apache.org/docs/1.5.0/spark-configuration/#replacing-the-session-catalog
and set the following Spark configurations:
spark.sql.catalog.spark_catalog org.apache.iceberg.spark.SparkSessionCatalog
spark.sql.catalog.spark_catalog.type hive
spark.sql.catalog.spark_catalog.uri thrift://{cluster-private-dns}:9083
where {cluster-private-dns} is the primary node’s private DNS name as shown on the EMR console. I also double-checked /etc/hive/conf/hive-site.xml, which should be where the default Spark catalog gets its metastore URI as well:
<property>
<name>hive.metastore.uris</name>
<value>thrift://{cluster-private-dns}:9083</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
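This check can also be scripted rather than done by eye. Below is a minimal sketch, assuming the file follows the usual Hadoop property layout shown above; readHadoopProperty and textOf are hypothetical helper names (not part of Hive, Spark, or Iceberg) and rely only on the JDK's XML parser.

```scala
import java.io.File
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

// Text of the first descendant element with the given tag, trimmed; None if absent.
def textOf(parent: Element, tag: String): Option[String] = {
  val nodes = parent.getElementsByTagName(tag)
  if (nodes.getLength == 0) None else Some(nodes.item(0).getTextContent.trim)
}

// Hypothetical helper: <value> of the first <property> whose <name> equals `name`.
def readHadoopProperty(path: String, name: String): Option[String] = {
  val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new File(path))
  val props = doc.getElementsByTagName("property")
  (0 until props.getLength)
    .map(i => props.item(i).asInstanceOf[Element])
    .find(p => textOf(p, "name").contains(name))
    .flatMap(p => textOf(p, "value"))
}
```

Calling readHadoopProperty("/etc/hive/conf/hive-site.xml", "hive.metastore.uris") in spark-shell and comparing the result with spark.conf.get("spark.sql.catalog.spark_catalog.uri") should show whether the two URIs really match character for character.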
I compared the two and they’re the same. Running the same initial code block now fails with:
24/05/22 18:43:18 INFO metastore: Trying to connect to metastore with URI thrift://{cluster-private-dns}:9083
24/05/22 18:43:18 WARN metastore: Failed to connect to the MetaStore Server...
24/05/22 18:43:18 INFO metastore: Waiting 1 seconds before next connection attempt.
24/05/22 18:43:19 INFO metastore: Trying to connect to metastore with URI thrift://{cluster-private-dns}:9083
24/05/22 18:43:19 WARN metastore: Failed to connect to the MetaStore Server...
org.apache.iceberg.hive.RuntimeMetaException: Failed to connect to Hive Metastore
at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:84)
at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:34)
at org.apache.iceberg.ClientPoolImpl.get(ClientPoolImpl.java:125)
at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:56)
at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:51)
at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:122)
at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:158)
at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:97)
at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:80)
at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:47)
at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2406)
at java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1916)
at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2404)
at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2387)
at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108)
at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalManualCache.get(LocalManualCache.java:62)
at org.apache.iceberg.CachingCatalog.loadTable(CachingCatalog.java:166)
at org.apache.iceberg.spark.SparkCatalog.load(SparkCatalog.java:643)
at org.apache.iceberg.spark.SparkCatalog.loadTable(SparkCatalog.java:159)
at org.apache.iceberg.spark.SparkSessionCatalog.loadTable(SparkSessionCatalog.java:139)
at software.amazon.andes.spark.internal.DelegatingCatalogExtension.loadTable(DelegatingCatalogExtension.scala:49)
at software.amazon.andes.spark.AndesCatalog.loadTable(AndesCatalog.scala:58)
... 49 elided
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1742)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:87)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:137)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:108)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:101)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at org.apache.iceberg.common.DynMethods$UnboundMethod.invokeChecked(DynMethods.java:60)
at org.apache.iceberg.common.DynMethods$UnboundMethod.invoke(DynMethods.java:72)
at org.apache.iceberg.common.DynMethods$StaticMethod.invoke(DynMethods.java:185)
at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:63)
... 70 more
Caused by: java.lang.reflect.InvocationTargetException: org.apache.hadoop.hive.metastore.api.MetaException: Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
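The innermost cause is java.net.ConnectException: Connection refused, which is a TCP-level failure rather than a Thrift or authentication error. A minimal sketch for checking, from the node where the Spark driver runs, whether anything accepts connections on the metastore port; canConnect is a hypothetical helper, and the host in the commented call is the same placeholder used above:

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

// Hypothetical helper: true if a TCP connection to host:port succeeds within timeoutMs.
def canConnect(host: String, port: Int, timeoutMs: Int = 2000): Boolean = {
  val socket = new Socket()
  try Try(socket.connect(new InetSocketAddress(host, port), timeoutMs)).isSuccess
  finally socket.close()
}

// Replace the placeholder with the primary node's private DNS name before running.
// canConnect("{cluster-private-dns}", 9083)
```

If this returns false, nothing is accepting connections at that address and port, which would suggest looking at the metastore service or the address itself rather than at the Iceberg configuration.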
Why can't Iceberg's SparkSessionCatalog connect to the Hive metastore using the same URI?