We have two services running on an OpenShift/Kubernetes platform, developed with the Vert.x framework. The cluster.xml is configured to use the Kubernetes API, and the cluster forms correctly (logs below).
The Vert.x event bus is set up as follows:

Service 1: `vertx.eventBus().request("child", s, res -> { /* response handler goes here */ });`

Service 2: `vertx.eventBus().consumer("child", message -> { /* message handler goes here */ });`
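For context, both snippets run on a clustered Vert.x instance. A minimal sketch of the wiring, assuming vertx-hazelcast's `HazelcastClusterManager` picks up the cluster.xml shown below (running both sides in one JVM here is purely illustrative; in reality each service starts its own clustered instance):

```java
import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;
import io.vertx.spi.cluster.hazelcast.HazelcastClusterManager;

public class ClusteredEventBusSketch {
    public static void main(String[] args) {
        // Loads cluster.xml from the classpath by default.
        VertxOptions options = new VertxOptions()
                .setClusterManager(new HazelcastClusterManager());

        Vertx.clusteredVertx(options, ar -> {
            if (ar.failed()) {
                ar.cause().printStackTrace();
                return;
            }
            Vertx vertx = ar.result();

            // Service 2 side: register a consumer on the shared address.
            vertx.eventBus().consumer("child", message -> message.reply("ok"));

            // Service 1 side: request against the same address.
            vertx.eventBus().request("child", "hello", res ->
                    System.out.println(res.succeeded() ? res.result().body() : res.cause()));
        });
    }
}
```

Both services must be started through `Vertx.clusteredVertx` with the same cluster manager and cluster name; a node that starts a plain (non-clustered) `Vertx` instance keeps its event bus local.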
Service 1 startup logs:
```
INFO: [192.168.54.48]:5701 [demo] [5.3.6] Using Discovery SPI
Apr 23, 2024 12:59:44 AM com.hazelcast.cp.CPSubsystem
WARNING: [192.168.54.48]:5701 [demo] [5.3.6] CP Subsystem is not enabled. CP data structures will operate in UNSAFE mode! Please note that UNSAFE mode will not provide strong consistency guarantees.
Apr 23, 2024 12:59:45 AM com.hazelcast.internal.diagnostics.Diagnostics
INFO: [192.168.54.48]:5701 [demo] [5.3.6] Diagnostics disabled. To enable add -Dhazelcast.diagnostics.enabled=true to the JVM arguments.
Apr 23, 2024 12:59:45 AM com.hazelcast.core.LifecycleService
INFO: [192.168.54.48]:5701 [demo] [5.3.6] [192.168.54.48]:5701 is STARTING
Apr 23, 2024 12:59:45 AM com.hazelcast.spi.discovery.integration.DiscoveryService
INFO: [192.168.54.48]:5701 [demo] [5.3.6] Cannot fetch the current zone, ZONE_AWARE feature is disabled
Apr 23, 2024 12:59:45 AM com.hazelcast.spi.discovery.integration.DiscoveryService
INFO: [192.168.54.48]:5701 [demo] [5.3.6] Kubernetes plugin discovered node name: cld-paas-d-eusw1b-3-k659z-worker-ds03-zjshb
Apr 23, 2024 12:59:46 AM com.hazelcast.kubernetes.KubernetesClient
WARNING: Cannot fetch public IPs of Hazelcast Member PODs, you won't be able to use Hazelcast Smart Client from outside of the Kubernetes network
Apr 23, 2024 12:59:51 AM com.hazelcast.internal.cluster.ClusterService
INFO: [192.168.54.48]:5701 [demo] [5.3.6]

Members {size:1, ver:1} [
    Member [192.168.54.48]:5701 - 3ee30ad6-4788-4aec-8050-7ed5603abdfa this
]
```
**Service 1 logs, after the 2nd member of the cluster is deployed:**
```
INFO: [192.168.54.48]:5701 [demo] [5.3.6] Initialized new cluster connection between /192.168.54.48:5701 and /127.0.0.6:52235
Apr 23, 2024 1:09:26 AM com.hazelcast.internal.cluster.ClusterService
INFO: [192.168.54.48]:5701 [demo] [5.3.6]

Members {size:2, ver:2} [
    Member [192.168.54.48]:5701 - 3ee30ad6-4788-4aec-8050-7ed5603abdfa this
    Member [192.168.58.91]:5701 - 096a8f32-9212-4323-8fcf-5f505c11587f
]
```
Service 2 logs:
```
INFO: [192.168.58.91]:5701 [demo] [5.3.6] Kubernetes plugin discovered node name: cld-paas-d-eusw1b-3-k659z-worker-ds04-ckrsv
Apr 23, 2024 1:09:20 AM com.hazelcast.kubernetes.KubernetesClient
WARNING: Cannot fetch public IPs of Hazelcast Member PODs, you won't be able to use Hazelcast Smart Client from outside of the Kubernetes network
Apr 23, 2024 1:09:21 AM com.hazelcast.internal.server.tcp.TcpServerConnection
INFO: [192.168.58.91]:5701 [demo] [5.3.6] Initialized new cluster connection between /192.168.58.91:38785 and /192.168.54.48:5701
Apr 23, 2024 1:09:26 AM com.hazelcast.internal.cluster.ClusterService
INFO: [192.168.58.91]:5701 [demo] [5.3.6]

Members {size:2, ver:2} [
    Member [192.168.54.48]:5701 - 3ee30ad6-4788-4aec-8050-7ed5603abdfa
    Member [192.168.58.91]:5701 - 096a8f32-9212-4323-8fcf-5f505c11587f this
]
```
Issue 1:
Sending a message from one service to the other does not work; the request fails with the errors below.
```
01:13:42.674 [vert.x-eventloop-thread-2] DEBUG io.vertx.core.eventbus.impl.clustered.ConnectionHolder - tx.id=0e24b487-2840-40a5-9103-4f56d31d74ea Not connected to server c7ee11dc-98d4-4217-969c-e2ee2ac85ce9 - starting queuing
01:13:42.774 [vert.x-eventloop-thread-2] DEBUG io.vertx.core.eventbus.impl.clustered.ConnectionHolder - tx.id=0e24b487-2840-40a5-9103-4f56d31d74ea Draining the queue for server c7ee11dc-98d4-4217-969c-e2ee2ac85ce9
01:13:42.780 [vert.x-eventloop-thread-2] DEBUG io.vertx.core.eventbus.impl.clustered.ConnectionHolder - tx.id=0e24b487-2840-40a5-9103-4f56d31d74ea Cluster connection closed for server c7ee11dc-98d4-4217-969c-e2ee2ac85ce9
```

and the request ultimately fails with:

```
(NO_HANDLERS,-1) No handlers for address child
```
To troubleshoot this, I added a plain Java socket server in Service 2 and a socket client in Service 1, and was able to send messages continuously for an hour at 500 ms intervals, so it does not look like a connectivity issue.
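The connectivity check was roughly along these lines (sketch only; here the echo server and client are collapsed into one JVM on an ephemeral port, whereas the real test ran the server in Service 2 and the client in Service 1 across pods):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class SocketProbe {
    // Sends one line to an echo server and returns what comes back.
    public static String roundTrip(String payload) throws IOException {
        try (ServerSocket server = new ServerSocket(0)) { // port 0 = ephemeral
            Thread echo = new Thread(() -> {
                try (Socket s = server.accept();
                     BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
                     PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                    out.println(in.readLine()); // echo the line back
                } catch (IOException ignored) {
                }
            });
            echo.start();
            try (Socket client = new Socket("127.0.0.1", server.getLocalPort());
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream()))) {
                out.println(payload);
                return in.readLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("ping")); // prints "ping"
    }
}
```

Since raw TCP between the two pods works reliably, the failure appears specific to the Vert.x/Hazelcast clustering layer rather than the network.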
Issue 2:
When Service 2 is deployed and Service 1 initializes a cluster connection, it connects to an unknown IP instead of the cluster IP of Service 2, even though the cluster itself forms with the correct Service 2 IP.
Service 1 ip : 192.168.54.48
Service 2 ip : 192.168.58.91
Unknown ip : 127.0.0.6:52235
The application's Hazelcast cluster.xml:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<hazelcast xsi:schemaLocation="http://www.hazelcast.com/schema/config hazelcast-config-4.0.xsd"
           xmlns="http://www.hazelcast.com/schema/config"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <properties>
        <property name="hazelcast.discovery.enabled">true</property>
        <property name="hazelcast.rest.enabled">true</property>
        <property name="hazelcast.partial.member.disconnection.resolution.heartbeat.count">5</property>
        <property name="hazelcast.partial.member.disconnection.resolution.algorithm.timeout.seconds">10</property>
    </properties>
    <cluster-name>demo</cluster-name>
    <!--<split-brain-protection enabled="true" name="probabilistic-split-brain-protection">
        <minimum-cluster-size>3</minimum-cluster-size>
        <protect-on>READ_WRITE</protect-on>
        <probabilistic-split-brain-protection acceptable-heartbeat-pause-millis="5000"
                                              max-sample-size="500" suspicion-threshold="10" />
    </split-brain-protection>
    <set name="split-brain-protected-set">
        <split-brain-protection-ref>probabilistic-split-brain-protection</split-brain-protection-ref>
    </set>-->
    <network>
        <join>
            <multicast enabled="false"/>
            <kubernetes enabled="true"/>
        </join>
        <interfaces enabled="true">
            <interface>192.168.*.*</interface>
        </interfaces>
    </network>
</hazelcast>
```