I am trying to roll back a configuration change in an HDInsight Kafka cluster, version 5.1.3000.0, running Kafka 3. As background: an attempt was made to enable SASL_PLAINTEXT (in a test cluster), and when this did not work as expected, a configuration rollback was performed using the Ambari UI. By default, HDInsight Kafka clusters run GSSAPI, and these are the relevant (?) broker config values after the rollback:
sasl.enabled.mechanisms: GSSAPI
sasl.mechanism.inter.broker.protocol: GSSAPI
security.inter.broker.protocol: PLAINTEXT
ZooKeeper never really had anything SASL-related enabled in its config files, but here is the value of one SASL-related setting:
quorum.auth.enableSasl=false
However, after the rollback the connection between the brokers and ZooKeeper fails with AUTH_FAILED. As far as I can see, nothing in the ZooKeeper configs suggests that anything should remain of the SASL attempt. The brokers and the ZooKeeper nodes have been restarted.
At this stage I assume ZooKeeper has stored something outside the config files that is not rolled back when the config is rolled back.
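To rule out the config files themselves before chasing that assumption, the rolled-back files can be scanned for leftover SASL-related settings. The following is only a minimal sketch: the function name `find_sasl_settings` and the keyword list are assumptions made for illustration, and it only understands simple `key=value` or `key: value` lines, not shell-style environment files.

```python
import re

# Assumption: these substrings mark settings worth a second look. The list is
# illustrative, not an official inventory of Kafka or ZooKeeper options.
SUSPECT_KEYWORDS = ("sasl", "jaas")


def find_sasl_settings(text):
    """Return (key, value) pairs whose key or value mentions a suspect keyword."""
    hits = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and properties-style comments.
        if not line or line.startswith("#"):
            continue
        # Accept both "key=value" and "key: value" (as in the listing above).
        match = re.match(r"([^=:\s]+)\s*[=:]\s*(.*)", line)
        if not match:
            continue
        key, value = match.group(1), match.group(2).strip()
        haystack = (key + " " + value).lower()
        if any(word in haystack for word in SUSPECT_KEYWORDS):
            hits.append((key, value))
    return hits


if __name__ == "__main__":
    # The values quoted in this post, used as sample input.
    sample = """\
sasl.enabled.mechanisms: GSSAPI
sasl.mechanism.inter.broker.protocol: GSSAPI
security.inter.broker.protocol: PLAINTEXT
quorum.auth.enableSasl=false
"""
    for key, value in find_sasl_settings(sample):
        print(f"{key} = {value}")
```

Running it against the actual broker and ZooKeeper property files (their paths differ per installation, so none are given here) would list every setting that still mentions SASL or JAAS, including a `security.inter.broker.protocol` still set to a `SASL_*` value.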
This is the ZooKeeper log from when a broker tries to connect. It seems to indicate that it still wants to use SASL?
2024-05-02 09:29:41,724 - ERROR [NIOWorkerThread-5:ZooKeeperServer@1712] - cnxn.saslServer is null: cnxn object did not initialize its saslServer properly.
2024-05-02 09:29:41,780 - WARN [NIOWorkerThread-3:NIOServerCnxn@364] - Unexpected exception
EndOfStreamException: Unable to read additional data from client, it probably closed the socket: address = /10.XX.XX.XX:57540, session = 0x2000cb2e54b03f6
at org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:163)
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:326)
at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
This is the log from the broker.
[2024-05-02 09:05:07,741] DEBUG Initializing task scheduler. (kafka.utils.KafkaScheduler)
[2024-05-02 09:05:07,747] INFO [ZooKeeperClient Kafka server] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2024-05-02 09:05:07,801] DEBUG [ZooKeeperClient Kafka server] Received event: WatchedEvent state:SyncConnected type:None path:null (kafka.zookeeper.ZooKeeperClient)
[2024-05-02 09:05:07,803] INFO [ZooKeeperClient Kafka server] Connected. (kafka.zookeeper.ZooKeeperClient)
[2024-05-02 09:05:07,811] DEBUG [ZooKeeperClient Kafka server] Received event: WatchedEvent state:AuthFailed type:None path:null (kafka.zookeeper.ZooKeeperClient)
[2024-05-02 09:05:07,812] ERROR [ZooKeeperClient Kafka server] Auth failed, initialized=true connectionState=AUTH_FAILED (kafka.zookeeper.ZooKeeperClient)
[2024-05-02 09:05:07,820] DEBUG Scheduling task auth-failed with initial delay 1000 ms and period -1 ms. (kafka.utils.KafkaScheduler)
[2024-05-02 09:05:07,885] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /consumers
at org.apache.zookeeper.KeeperException.create(KeeperException.java:130)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
at kafka.zookeeper.AsyncResponse.maybeThrow(ZooKeeperClient.scala:566)
at kafka.zk.KafkaZkClient.createRecursive(KafkaZkClient.scala:1728)
at kafka.zk.KafkaZkClient.makeSurePersistentPathExists(KafkaZkClient.scala:1626)
at kafka.zk.KafkaZkClient.$anonfun$createTopLevelPaths$1(KafkaZkClient.scala:1618)
at kafka.zk.KafkaZkClient.$anonfun$createTopLevelPaths$1$adapted(KafkaZkClient.scala:1618)
at scala.collection.immutable.List.foreach(List.scala:431)
at kafka.zk.KafkaZkClient.createTopLevelPaths(KafkaZkClient.scala:1618)
at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:501)
at kafka.server.KafkaServer.startup(KafkaServer.scala:203)
at kafka.Kafka$.main(Kafka.scala:109)
at kafka.Kafka.main(Kafka.scala)
The two logs are copied from two different attempts, which is why the timestamps differ.
Any pointers?