I am currently trying to benchmark some XDP programs. In this benchmark, I'm varying the throughput of the sender packet generator between 1 and 98 Gbps, and I'm also varying the number of cores that receive the packets, between 1 and 20.
However, I’m facing a strange phenomenon with the following program:
SEC("xdp")
int simply_drop(struct xdp_md *ctx) {
int cpu = get_and_check_cpu_id();
__u64 arrival_time = bpf_ktime_get_ns();
__u64 finish_time = bpf_ktime_get_ns();
if(!update_info(arrival_time, finish_time, cpu)) {
bpf_printk("Error while looking up timer mapn");
}
return XDP_DROP;
}
It basically takes two timestamps and saves them to a map, which is later used to compute the average latency of whatever runs between the two bpf_ktime_get_ns() calls.
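For reference, here is a minimal sketch of what such an update_info() could look like, assuming a plain array map indexed by CPU that accumulates a running sum and a sample count (latency_map, latency_info, and MAX_CPUS are illustrative names, not necessarily the exact layout I use):

/* The sketch assumes these headers, which the full program already includes. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define MAX_CPUS 20

struct latency_info {
    __u64 sum_ns;   /* accumulated (finish - arrival) deltas */
    __u64 count;    /* number of samples */
};

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, MAX_CPUS);
    __type(key, __u32);
    __type(value, struct latency_info);
} latency_map SEC(".maps");

/* Accumulate the measured delta into this CPU's slot; returns 0 when the
 * map lookup fails, which triggers the bpf_printk() in simply_drop(). */
static __always_inline int update_info(__u64 arrival_time, __u64 finish_time, int cpu)
{
    __u32 key = cpu;
    struct latency_info *info = bpf_map_lookup_elem(&latency_map, &key);

    if (!info)
        return 0;

    info->sum_ns += finish_time - arrival_time;
    info->count += 1;
    return 1;
}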
But the thing is, when I increase the throughput of the packet generator, the measured latency decreases: at 1 Gbps the latency is around 100 ns, while at 98 Gbps it is around 30 ns.
Additionally, the latency also increases as the number of cores processing the arriving packets grows.
But since this specific program doesn't do anything between the two bpf_ktime_get_ns() calls, there shouldn't be any contention between threads, should there? Could the threads be preempted between the two calls?
In any case, I made sure that all receiving cores are located on the same NUMA node.