epoll_ctl(2) states (emphasis is mine):
EPOLLEXCLUSIVE (since Linux 4.5)
Sets an exclusive wakeup mode for the epoll file
descriptor that is being attached to the target file
descriptor, fd. When a wakeup event occurs and multiple
epoll file descriptors are attached to the same target
file using EPOLLEXCLUSIVE, one or more of the epoll file
descriptors will receive an event with epoll_wait(2). The
default in this scenario (when EPOLLEXCLUSIVE is not set)
is for all epoll file descriptors to receive an event.
EPOLLEXCLUSIVE is thus useful for avoiding thundering herd
problems in certain scenarios.
Would you be so kind as to tell me whether this means the following:
- EPOLLEXCLUSIVE should be used in a scenario with a separate epoll fd for each thread (i.e. NOT when there is a globally shared epoll fd).
- It does NOT necessarily mean that only one of the threads (or processes) will be woken from epoll_wait(2); one or more threads could be woken.
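A minimal sketch of the first scenario (one private epoll fd per thread, each attached to the same target fd) might look like the following. The helper name make_exclusive_epfd is hypothetical and not part of any API, and the sketch assumes Linux 4.5 or later so that EPOLLEXCLUSIVE is defined.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper (name is illustrative only): create a private epoll
 * instance and attach target_fd to it in exclusive wakeup mode.
 * Returns the new epoll fd, or -1 on error with errno set. */
int make_exclusive_epfd(int target_fd) {
    /* Local event struct, so concurrent callers never share state. */
    struct epoll_event ev = {.events = EPOLLIN | EPOLLEXCLUSIVE};
    int epfd = epoll_create1(0);

    if (epfd < 0) {
        return -1;
    }
    /* The exclusive flag is requested at EPOLL_CTL_ADD time. */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, target_fd, &ev) < 0) {
        int saved = errno;  /* close() may overwrite errno */
        close(epfd);
        errno = saved;
        return -1;
    }
    return epfd;
}
```

Each thread would call make_exclusive_epfd(p[0]) once and then block in epoll_wait(2) on the fd it gets back.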
Below is a sample program with 1 event and 10 threads that uses a separate epoll fd for each thread (see how many threads are woken up on just that one event, indicated by the WOKEN UP - [...] lines):
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/epoll.h>
    #include <unistd.h>

    #define NUM_THREADS 10
    #define NUM_EVENTS 1

    int p[2];                     /* p[0]: read end, p[1]: write end */
    pthread_t threads[NUM_THREADS];
    int event_count[NUM_THREADS]; /* successful reads per thread */

    void die(const char *msg) {
        perror(msg);
        exit(EXIT_FAILURE);
    }

    void *run_func(void *ptr) {
        int ret;
        int epfd;
        char buf[sizeof(int)];
        int id = *(int *)ptr;
        /* Per-thread event struct: a shared global would be raced on. */
        struct epoll_event evt = {.events = EPOLLIN | EPOLLEXCLUSIVE};

        /* Each thread gets its own epoll instance. */
        if ((epfd = epoll_create(1)) < 0) {
            die("create");
        }

        ret = epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &evt);
        if (ret) {
            perror("epoll_ctl add error");
        }

        while (1) {
            /* maxevents must not exceed the buffer size (one event). */
            ret = epoll_wait(epfd, &evt, 1, -1);
            printf("WOKEN UP - id %d, ret %d\n", id, ret);
            fflush(stdout);

            /* Blocks if another thread has already drained the pipe. */
            ret = read(p[0], buf, sizeof(int));
            if (ret == (int)sizeof(int)) {
                event_count[id]++;
            }
        }
        return NULL;
    }

    int main(void) {
        int i, j;
        int id[NUM_THREADS];
        int total = 0;
        int nohit = 0;

        if (pipe(p) < 0) {
            die("pipe");
        }

        for (i = 0; i < NUM_THREADS; ++i) {
            id[i] = i;
            pthread_create(&threads[i], NULL, run_func, &id[i]);
        }

        for (j = 0; j < NUM_EVENTS; ++j) {
            if (write(p[1], p, sizeof(int)) != (ssize_t)sizeof(int)) {
                die("write");
            }
            usleep(100);
        }

        for (i = 0; i < NUM_THREADS; ++i) {
            /* epoll_wait() and read() are cancellation points. */
            pthread_cancel(threads[i]);
            pthread_join(threads[i], NULL);
            printf("joined: %d\n", i);
            printf("event count: %d\n", event_count[i]);
            total += event_count[i];
            if (!event_count[i]) nohit++;
        }

        printf("total events is: %d\n", total);
        printf("nohit is: %d\n", nohit);
        return 0;
    }
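A smaller, single-threaded variant can help separate "how many epoll fds receive the event" from "how the threads happen to be scheduled". The sketch below is only an illustration under stated assumptions: count_ready_epfds is a hypothetical helper name, and because no thread is blocked in epoll_wait(2) when the write happens, it is not the same experiment as the program above. It deliberately makes no claim about the exact count, which may differ between kernels; the man page text only promises "one or more".

```c
#include <stdlib.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper (illustrative name): attach one pipe to n separate
 * epoll instances with EPOLLEXCLUSIVE, write once, and return how many of
 * the epoll fds report an event. Returns -1 if setup fails. */
int count_ready_epfds(int n) {
    int p[2];
    int *epfds;
    int i, made = 0, ready = 0, ok;

    if (n <= 0 || pipe(p) < 0) {
        return -1;
    }
    epfds = malloc((size_t)n * sizeof *epfds);
    if (!epfds) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    /* One private epoll instance per would-be thread. */
    for (i = 0; i < n; ++i) {
        struct epoll_event ev = {.events = EPOLLIN | EPOLLEXCLUSIVE};
        epfds[i] = epoll_create1(0);
        if (epfds[i] < 0) break;
        made++;
        if (epoll_ctl(epfds[i], EPOLL_CTL_ADD, p[0], &ev) < 0) break;
    }
    ok = (i == n);

    /* A single write produces a single wakeup event on the pipe. */
    if (ok) {
        int token = 1;
        ok = (write(p[1], &token, sizeof token) == (ssize_t)sizeof token);
    }

    /* Timeout 0: only check for a queued event, never block. */
    if (ok) {
        for (i = 0; i < n; ++i) {
            struct epoll_event out;
            if (epoll_wait(epfds[i], &out, 1, 0) > 0) ready++;
        }
    }

    for (i = 0; i < made; ++i) close(epfds[i]);
    free(epfds);
    close(p[0]);
    close(p[1]);
    return ok ? ready : -1;
}
```

Calling count_ready_epfds(10) and printing the result gives a number between 1 and 10 that can be compared against the count of WOKEN UP lines from the threaded program.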