I'm trying to connect from a Hadoop cluster to a Greenplum database in a Kerberized environment.
A simple Java JDBC application connects successfully:
try (Connection conn = DriverManager.getConnection(url, user, pwd)) {
    ResultSet resultSet = conn.prepareStatement("select 1").executeQuery();
    resultSet.next();
    resultSet.getInt(1);
    System.out.println("Connection successful.");
} catch (SQLException e) {
    e.printStackTrace();
}
But the Spark application fails to connect:
SparkSession ss = SparkSession.builder().getOrCreate();
ss.read().format("jdbc")
    .option("url", url)
    .option("user", user)
    .option("dbschema", schema)
    .option("dbtable", table)
    .load()
    .show();
I start the Java JDBC program like this:
java -jar -Djava.security.auth.login.config=pgjdbc.conf -Dsun.security.jgss.debug=true ./test-jdbc-connection-1.0-SNAPSHOT.jar param:url=<URL> param:user=<GP_USER>
and get the following output:
Debug is true storeKey false useTicketCache true useKeyTab false doNotPrompt true ticketCache is null isInitiator true KeyTab is null refreshKrb5Config is false principal is null tryFirstPass is false useFirstPass is false storePass is false clearPass is false
Acquire TGT from Cache
Principal is <HDP_PRINCIPAL>
Commit Succeeded
Search Subject for Kerberos V5 INIT cred (<HDP_PRINCIPAL>, sun.security.jgss.krb5.Krb5InitCredential)
Connection successful.
where <HDP_PRINCIPAL> is my correct principal.
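For reference, the first debug line above is printed by the JDK's Krb5LoginModule and reflects the options set in pgjdbc.conf. The sketch below expresses roughly the same login configuration programmatically. It is not the actual file: the entry name `pgjdbc` (which I understand to be the pgjdbc driver's default `jaasApplicationName`) and the exact option set are assumptions inferred from that debug line, and the class name `TicketCacheJaasConfig` is purely illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;

// Hypothetical in-code equivalent of a ticket-cache-only JAAS entry.
final class TicketCacheJaasConfig extends Configuration {
    // Assumption: the entry is named "pgjdbc".
    static final String ENTRY_NAME = "pgjdbc";

    @Override
    public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
        if (!ENTRY_NAME.equals(name)) {
            return null; // no entry configured under any other name
        }
        // Options matching the debug line: useTicketCache true, doNotPrompt true, Debug is true.
        Map<String, String> options = new HashMap<>();
        options.put("useTicketCache", "true");
        options.put("doNotPrompt", "true");
        options.put("debug", "true");
        return new AppConfigurationEntry[] {
            new AppConfigurationEntry(
                "com.sun.security.auth.module.Krb5LoginModule",
                LoginModuleControlFlag.REQUIRED,
                options)
        };
    }
}
```

Calling `Configuration.setConfiguration(new TicketCacheJaasConfig())` before opening the connection would install this configuration JVM-wide instead of the file named by `-Djava.security.auth.login.config`.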
The Spark application is launched with this script:
export SPARK_MAJOR_VERSION=3
spark-submit \
    --name "test_GP_connect" \
    --master yarn \
    --deploy-mode "client" \
    --queue <MY_YARN_QUEUE> \
    --conf ..... \
    ........ \
    --conf spark.driver.extraJavaOptions="-Dsun.security.jgss.native=true -Djava.security.krb5.conf=/etc/krb5.conf -Djavax.security.auth.useSubjectCredsOnly=false -Dsun.security.jgss.debug=true -Dsun.security.krb5.debug=true -Djava.security.auth.login.config=pgjdbc.conf " \
    --conf spark.executor.extraJavaOptions="-Dsun.security.jgss.native=true -Djava.security.krb5.conf=/etc/krb5.conf -Djavax.security.auth.useSubjectCredsOnly=false -Dsun.security.jgss.debug=true -Dsun.security.krb5.debug=true -Djava.security.auth.login.config=pgjdbc.conf " \
    --num-executors 1 \
    --executor-memory 20g \
    --driver-memory 5g \
    --files pgjdbc.conf \
    --jars ./gsp.jar \
    --class ru.pal.testsparkjdbcconn.Main \
    test-spark-jdbc-conn-3.0.jar \
    param:url=<URL> \
    param:user=<GP_USER> \
    param:schema=<GP_SCHEMA> \
    param:table=<GP_TABLE>
and gives the following output:
Debug is true storeKey false useTicketCache true useKeyTab false doNotPrompt true ticketCache is null isInitiator false KeyTab is null refreshKrb5Config is false principal is null tryFirstPass is false useFirstPass is false storePass is false clearPass is false
Acquire TGT from Cache
......................
get normal credential
Principal is <HDP_PRINCIPAL>
Commit Succeeded
Search Subject for Kerberos V5 INIT cred (<GP_USER>, sun.security.jgss.wrapper.GSSCredElement)
As you can see, Spark tries to authenticate as <GP_USER> instead of using <HDP_PRINCIPAL>.
The exception is:
Exception in thread "main" com.zaxxer.hikari.pool.HikariPool$PoolInitializationException: Failed to initialize pool: GSS Authentication failed
......
at ru.pal.testsparkjdbcconn.Main.main(Main.java:85)
......
Caused by: org.postgresql.util.PSQLException: GSS Authentication failed
......
... 25 more
Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism level: Can't find client principal <GP_USER>@<HDP_REALM> in cache collection)
......
... 40 more
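The two launches differ in their JVM flags: the Spark driver is started with -Dsun.security.jgss.native=true and -Djavax.security.auth.useSubjectCredsOnly=false, while the plain Java program sets neither. The credential class in the Spark log, sun.security.jgss.wrapper.GSSCredElement, comes from the JDK's native GSS wrapper, which suggests the native flag is taking effect. To confirm what each process actually sees, a small helper like the one below could be called at the start of both main methods. It is only a diagnostic sketch; the class name `KerberosSettingsDump` is made up for illustration, and including the KRB5CCNAME environment variable is my own addition, not something shown in the commands above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic helper: prints the Kerberos-related settings of the current JVM.
final class KerberosSettingsDump {
    // System properties that appear in the launch commands above.
    static final String[] KEYS = {
        "java.security.auth.login.config",
        "java.security.krb5.conf",
        "javax.security.auth.useSubjectCredsOnly",
        "sun.security.jgss.native"
    };

    // Collects the current values in a stable order; missing ones are reported as "<unset>".
    static Map<String, String> snapshot() {
        Map<String, String> values = new LinkedHashMap<>();
        for (String key : KEYS) {
            values.put(key, System.getProperty(key, "<unset>"));
        }
        // Assumption: the credential cache location may also matter, so include it.
        String ccname = System.getenv("KRB5CCNAME");
        values.put("env:KRB5CCNAME", ccname != null ? ccname : "<unset>");
        return values;
    }

    public static void main(String[] args) {
        snapshot().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Comparing the output of the standalone program with that of the Spark driver should show whether the difference in behavior lines up with the difference in flags.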