I am trying to implement the algorithm described in the image, using MPI. It is part of a University project where we are building a distributed satellite to ground station communication system. I have made multiple iterations of it, but I can never manage to break ties when multiple candidate winners are elected.
Here is the function that handles the election process. This function is called for every single Ground station process (there are 5 in their communication group, GS 0, 1, 2, 3 and 4) and is started via a START_LELECT_GS
event from a coordinator process (rank 10).
void perform_gs_leader_election(int coordinator_rank, int rank, MPI_Comm comm) {
MPI_Status status;
int is_leader = 0;
int received_count = 0;
int neighbor_received_from[num_of_neighbors];
printf("Process %d: Starting leader electionn", rank);
// Determine if the process is a leaf node
const int is_leaf = (num_of_neighbors == 1);
printf("Process %d: Is leaf? %dn", rank, is_leaf);
// Send initial <ELECT> messages only if the process is a leaf node
if (is_leaf) {
MPI_Send(&rank, 1, MPI_INT, neighbor_gs[0], ELECT, comm);
printf("Leaf Process %d: Sent ELECT to %dn", rank, neighbor_gs[0]);
}
// Receive and process messages from neighbors
while (1) {
int sender_rank;
MPI_Recv(&sender_rank, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, comm, &status);
printf("Process %d: Received message from %d with tag %dn", rank, status.MPI_SOURCE, status.MPI_TAG);
if (status.MPI_TAG == ELECT) {
received_count++;
neighbor_received_from[status.MPI_SOURCE] = 1;
printf("Process %d: Received ELECT from %dn", rank, status.MPI_SOURCE);
// If all neighbors except one have sent ELECT messages, probe for incoming messages before sending ELECT to the remaining neighbor
if (received_count == num_of_neighbors - 1) {
int flag;
MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, comm, &flag, &status);
if (flag && status.MPI_TAG == TERMINATE_LELECT_GS) {
printf("Process %d: Received TERMINATE_LELECT_GS from %dn", rank, status.MPI_SOURCE);
if (status.MPI_SOURCE > rank) {
printf("Process %d: Surrendering leadership to process %dn", rank, status.MPI_SOURCE);
MPI_Send(&rank, 1, MPI_INT, status.MPI_SOURCE, GS_LEADER, comm);
is_leader = 0;
} else {
printf("Process %d: Declaring leadership over process %dn", rank, status.MPI_SOURCE);
is_leader = 1;
}
// send out terminate to all neighbors but the sender
for (int i = 0; i < num_of_neighbors; i++) {
if (neighbor_gs[i] != status.MPI_SOURCE) {
MPI_Send(&rank, 1, MPI_INT, neighbor_gs[i], TERMINATE_LELECT_GS, comm);
printf("Process %d: Forwarded TERMINATE_LELECT_GS to %dn", rank, neighbor_gs[i]);
}
}
break;
}
if (!flag) {
// if we got elect from all but one, send elect to remaining neighbor
for (int i = 0; i < num_of_neighbors; i++) {
if (!neighbor_received_from[neighbor_gs[i]]) {
MPI_Send(&rank, 1, MPI_INT, neighbor_gs[i], ELECT, comm);
printf("Process %d: Sent ELECT to remaining neighbor %dn", rank, neighbor_gs[i]);
break;
}
}
}
}
// If all neighbors have sent ELECT messages, this process is a candidate for the leader
if (received_count == num_of_neighbors) {
printf("Process %d: Candidate for leadern", rank);
is_leader = 1;
// send terminate to all neighbors
for (int i = 0; i < num_of_neighbors; i++) {
MPI_Send(&rank, 1, MPI_INT, neighbor_gs[i], TERMINATE_LELECT_GS, comm);
printf("Process %d: Sent TERMINATE_LELECT_GS to %dn", rank, neighbor_gs[i]);
}
break;
}
} else if (status.MPI_TAG == TERMINATE_LELECT_GS) {
printf("Process %d: Received TERMINATE_LELECT_GS from %dn", rank, status.MPI_SOURCE);
for (int i = 0; i < num_of_neighbors; i++) {
if (neighbor_gs[i] != status.MPI_SOURCE) {
MPI_Send(&rank, 1, MPI_INT, neighbor_gs[i], TERMINATE_LELECT_GS, comm);
printf("Process %d: Forwarded TERMINATE_LELECT_GS to %dn", rank, neighbor_gs[i]);
}
}
break;
} else if (status.MPI_TAG == GS_LEADER) {
printf("Process %d: Declared leader due to GS_LEADER messagen", rank);
is_leader = 1;
for (int i = 0; i < num_of_neighbors; i++) {
if (neighbor_gs[i] != status.MPI_SOURCE) {
MPI_Send(&rank, 1, MPI_INT, neighbor_gs[i], TERMINATE_LELECT_GS, comm);
printf("Process %d: Sent TERMINATE_LELECT_GS to %dn", rank, neighbor_gs[i]);
}
}
break;
}
}
MPI_Barrier(comm);
printf("Process %d: Terminatedn", rank);
if (is_leader) {
printf("Process %d: Sending LELECT_GS_DONE to coordinatorn", rank);
MPI_Send(&rank, 1, MPI_INT, coordinator_rank, LELECT_GS_DONE, MPI_COMM_WORLD);
}
}
Here is an example of a problematic output:
parsing START_LELECT_GS event
Process 0: Starting leader election
Process 0: Is leaf? 0
Process 1: Starting leader election
Process 1: Is leaf? 0
Process 4: Starting leader election
Process 4: Is leaf? 1
Leaf Process 4: Sent ELECT to 0
Process 2: Starting leader election
Process 3: Starting leader election
Process 3: Is leaf? 1
Leaf Process 3: Sent ELECT to 1
Process 2: Is leaf? 1
Leaf Process 2: Sent ELECT to 1
Process 0: Received message from 4 with tag 22
Process 0: Received ELECT from 4
Process 0: Sent ELECT to remaining neighbor 1
Process 1: Received message from 3 with tag 22
Process 1: Received ELECT from 3
Process 1: Received message from 2 with tag 22
Process 1: Received ELECT from 2
Process 1: Sent ELECT to remaining neighbor 0
Process 1: Received message from 0 with tag 22
Process 1: Received ELECT from 0
Process 1: Candidate for leader
Process 1: Sent TERMINATE_LELECT_GS to 3
Process 1: Sent TERMINATE_LELECT_GS to 0
Process 1: Sent TERMINATE_LELECT_GS to 2
Process 0: Received message from 1 with tag 22
Process 0: Received ELECT from 1
Process 0: Candidate for leader
Process 0: Sent TERMINATE_LELECT_GS to 1
Process 0: Sent TERMINATE_LELECT_GS to 4
Process 4: Received message from 0 with tag 24
Process 4: Received TERMINATE_LELECT_GS from 0
Process 3: Received message from 1 with tag 24
Process 3: Received TERMINATE_LELECT_GS from 1
Process 2: Received message from 1 with tag 24
Process 2: Received TERMINATE_LELECT_GS from 1
Process 0: Terminated
Process 0: Sending LELECT_GS_DONE to coordinator
Process 3: Terminated
Process 4: Terminated
Process 1: Terminated
Process 1: Sending LELECT_GS_DONE to coordinator
process 10 waiting at barrier
Process 2: Terminated
Two candidates arise but despite me trying to probe for an intermediate termination message so that I can break the tie early, it does not work but I cannot figure out the reason why.