I have read a lot of articles on Google about how a Kafka cluster works, and I am trying to summarize my understanding here. Please let me know if any point is incorrect.
Producer Side
- Say a Kafka cluster has one topic with three partitions (P1, P2, P3) on three nodes.
- Each partition has two replicas, (R11, R12), (R21, R22), (R31, R32), on other nodes.
- The cluster has a ZooKeeper.
- On startup, Kafka elects a leader for each partition. Say that out of P1, R11, and R12, P1 is the leader, whereas R11 and R12 are followers.
- The producer connects to the Kafka cluster through any node and fetches all the metadata on the client side through the Kafka client library.
- The producer connects to the leader of the partition, as the leader is responsible for all reads and writes.
- The partition can either be specified explicitly or selected at run time, based on the message key and the number of partitions, or in round-robin fashion.
- The producer can choose any acknowledgement level: fire and forget, leader acknowledgement, or acknowledgement from all in-sync replicas (ISR). As explained here:
There are two common strategies for keeping replicas in sync,
primary-backup replication and quorum-based replication. In
primary-backup replication, the leader waits until the write completes
on every replica in the group before acknowledging the client. If one
of the replicas is down, the leader drops it from the current group
and continues to write to the remaining replicas. A failed replica is
allowed to rejoin the group if it comes back and catches up with the
leader. With f replicas, primary-backup replication can tolerate f-1
failures. In the quorum-based approach, the leader waits until a write
completes on a majority of the replicas. The size of the replica group
doesn’t change even when some replicas are down. If there are 2f+1
replicas, quorum-based replication can tolerate f replica failures. If
the leader fails, it needs at least f+1 replicas to elect a new
leader.
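The failure-tolerance figures in the quoted passage can be checked with a little arithmetic. The sketch below only restates the two formulas from the quote; the function names are made up for illustration and are not part of any Kafka API.

```python
def primary_backup_tolerated_failures(replicas: int) -> int:
    """With f replicas, primary-backup replication tolerates f - 1 failures."""
    if replicas < 1:
        raise ValueError("need at least one replica")
    return replicas - 1


def quorum_tolerated_failures(replicas: int) -> int:
    """With 2f + 1 replicas, quorum-based replication tolerates f failures."""
    if replicas < 1:
        raise ValueError("need at least one replica")
    # A majority must survive, so at most (n - 1) // 2 replicas may fail.
    return (replicas - 1) // 2


for n in (3, 5):
    print(n, primary_backup_tolerated_failures(n), quorum_tolerated_failures(n))
```

For the same replica count, the primary-backup style tolerates more failures, which is the trade-off the quoted passage is describing.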
- If the leader fails, ZooKeeper will select a new leader from the remaining ISR (after comparing logs, based on either primary-backup replication or quorum-based replication).
My question on point 8 is: if the leader fails under either strategy, i.e. primary-backup or quorum-based, will the Kafka system be unavailable to producers and consumers?
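To make the partition-selection point in the list above concrete, here is a minimal sketch of the three options it mentions: an explicit partition, a hash of the message key modulo the number of partitions, and round robin for messages without a key. `SketchPartitioner` is a hypothetical class, not the Kafka client's partitioner, and `zlib.crc32` is only a stand-in for the hash the real client uses (murmur2 in the Java client); keyless behaviour may also differ between client versions.

```python
import itertools
import zlib
from typing import Optional


class SketchPartitioner:
    """Illustrative only: mimics the three selection modes described above."""

    def __init__(self, num_partitions: int) -> None:
        if num_partitions < 1:
            raise ValueError("need at least one partition")
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def partition(self, key: Optional[bytes], explicit: Optional[int] = None) -> int:
        # 1. The caller named a partition explicitly.
        if explicit is not None:
            if not 0 <= explicit < self.num_partitions:
                raise ValueError("partition out of range")
            return explicit
        # 2. Keyed message: the same key always maps to the same partition.
        if key is not None:
            return zlib.crc32(key) % self.num_partitions
        # 3. No key: spread messages across partitions in turn.
        return next(self._round_robin)


partitioner = SketchPartitioner(num_partitions=3)
print(partitioner.partition(b"user-1"))              # stable for this key
print([partitioner.partition(None) for _ in range(4)])
```

Because the keyed path depends on the partition count, adding partitions to a topic changes which partition a given key maps to.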
Consumer Side
- Say I have nine consumers, all belonging to a single consumer group.
- Each consumer will be assigned one specific partition out of nine partitions.
- If any consumer or partition dies, the consumer group coordinator will trigger a rebalance, as explained here
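The consumer-side points above can be sketched as a simple assignment function that is run again whenever group membership changes. This is an assumption-laden illustration: `assign_partitions` is a hypothetical helper that deals partitions out in turn, and it is not how Kafka's built-in assignors are implemented.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Deal partitions out to the consumers of one group, one at a time."""
    members = sorted(consumers)
    assignment: Dict[str, List[int]] = {member: [] for member in members}
    if not members:
        return assignment
    for index, partition in enumerate(sorted(partitions)):
        assignment[members[index % len(members)]].append(partition)
    return assignment


partitions = list(range(9))
group = [f"c{i}" for i in range(1, 10)]          # nine consumers, c1..c9

before = assign_partitions(partitions, group)    # one partition each
group.remove("c9")                               # a consumer dies
after = assign_partitions(partitions, group)     # "rebalance": redo the assignment

print(before)
print(after)
```

After the rebalance, every partition still has exactly one owner in the group, but one of the surviving consumers now reads two partitions.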
Right. The producer writes directly to the partition leader, and the consumer reads from it in the same way, so if the leader is down, both producing and consuming are unavailable on that partition until a new leader is elected.
The producer sends data directly to the broker that is the leader for the partition without any intervening routing tier.
https://kafka.apache.org/documentation/#theproducer
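As a rough illustration of that window of unavailability, the toy model below tracks one partition's leader and its in-sync replicas. `PartitionState` and its methods are hypothetical names invented for this sketch, and picking the first remaining ISR member is a simplifying assumption rather than Kafka's actual election logic.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PartitionState:
    leader: Optional[str]
    isr: List[str] = field(default_factory=list)  # in-sync replicas, leader included

    def is_available(self) -> bool:
        # Producers and consumers can only use the partition through its leader.
        return self.leader is not None

    def fail_broker(self, broker: str) -> None:
        if broker in self.isr:
            self.isr.remove(broker)
        if self.leader == broker:
            self.leader = None  # partition is unavailable until an election

    def elect_leader(self) -> Optional[str]:
        # Simplification: promote the first replica that is still in sync.
        if self.leader is None and self.isr:
            self.leader = self.isr[0]
        return self.leader


p1 = PartitionState(leader="broker-1", isr=["broker-1", "broker-2", "broker-3"])
p1.fail_broker("broker-1")
print(p1.is_available())   # False: no leader yet
print(p1.elect_leader())   # broker-2
print(p1.is_available())   # True again
```

Only the affected partition goes through this gap; partitions led by other brokers keep serving reads and writes.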