Friends, I am measuring the performance of XDMA on the Z19-P board over PCIe Gen4 x16 (the IP core only supports Gen3 x16), and I cannot reach the theoretical speed of at least 12 GB/s; I get only 5-6 GB/s.
My system is Fedora 39, Linux Kernel 6.5.2
As a result of extensive work, I have tried absolutely all debugging options available to me, but the speed remained the same. Here are my observations and assumptions:
- The processor is not fully utilized; it peaks at about 60%. I find this strange because, for example, the command dd of=/dev/null if=/dev/zero bs=1MB count=10000 loads the processor at 100%, while dd of=/dev/null if=/dev/xdma0_c2h_0 bs=1MB count=10000, which also transfers bytes via XDMA (I checked), shows only 60% and the same 5-6 GB/s bandwidth. This detail is one of my arguments that the programs I am using (dma_from_device.c, etc.) are not what limits the speed; rather, the driver is. Even the oldest Linux command, dd, cannot drive the transfer at full rate (a minimal read-loop sketch for cross-checking this is shown after this list). Perhaps the current state of the driver is incompatible with some component of the system, for example the new Fedora 39, and XDMA simply does not deliver full performance because of some bug.
- The Hardware Numbers program (see the second graph) shows excellent results. Xilinx created this program so that we could learn the potential performance figures of our PCIe interface without any software or drivers involved. Thus, the problem is definitely not in the hardware but somewhere in the OS or in XDMA.
- When reconfiguring the XDMA IP core from Gen3 x16 to Gen3 x8, I expected to see the same 5-6 GB/s as in Gen3 x16, but I saw 2.2 GB/s. Both configurations reach only ~30% of their respective maximums, so something during operation cuts the speed by ~70% (the raw-bandwidth arithmetic behind the ~30% figure is sketched after this list).
- I noticed that the speed depends on which Linux kernel version I build and insmod xdma against: on 6.5.2 it works 10-20% faster than on 6.9.9.
- Different versions of the dma_ip_drivers repository do not make a significant difference.
- Previously I had a problem with Poll Mode: it performed even worse. With the help of #define XDMA_DEBUG 1 (Xilinx Support AR71435) I tracked the issue down and fixed it (https://github.com/gonsolo/dma_ip_drivers/commit/c008109f22dae117a748373c58b73e2c482ecceb); however, it did not help the overall performance. I did not find anything else strange in the debug log, except perhaps that a strange number of descriptors is allocated: for a 1 MB transfer, 255 descriptors are allocated but only 16 are actually used (the log also writes nents 16/256; see the page-count arithmetic after the listing below).
). These messages in dmesg are output by the following function from libxdma.c, line 3040:
#ifdef __LIBXDMA_DEBUG__
static void sgt_dump(struct sg_table *sgt)
{
	int i;
	struct scatterlist *sg = sgt->sgl;

	pr_info("sgt 0x%p, sgl 0x%p, nents %u/%u.\n", sgt, sgt->sgl, sgt->nents,
		sgt->orig_nents);

	for (i = 0; i < sgt->orig_nents; i++, sg = sg_next(sg))
		pr_info("%d, 0x%p, pg 0x%p,%u+%u, dma 0x%llx,%u.\n", i, sg,
			sg_page(sg), sg->offset, sg->length, sg_dma_address(sg),
			sg_dma_len(sg));
}
#endif
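On the nents 16/256 point: if I understand the scatter-gather mapping correctly (this is my assumption, not something the log confirms), a 1 MB buffer is pinned as 256 separate 4 KiB pages, and dma_map_sg() is then allowed to coalesce contiguous pages, leaving 16 mapped segments of 64 KiB each, which would make the 255-allocated / 16-used pattern expected behavior rather than a bug. The quick arithmetic:

/* Rough page/segment arithmetic behind the "nents 16/256" line.
 * Assumptions: 4 KiB pages and a page-aligned 1 MiB buffer; the
 * 16 coalesced segments come from the dmesg log, not from a rule. */
#include <stdio.h>

int main(void)
{
	const unsigned long transfer = 1UL << 20;     /* 1 MiB transfer */
	const unsigned long page = 4096;              /* 4 KiB pages */
	unsigned long orig_nents = transfer / page;   /* 256 sg entries allocated */
	unsigned long nents = 16;                     /* mapped segments, from the log */
	unsigned long seg = transfer / nents;         /* 64 KiB per mapped segment */

	printf("orig_nents = %lu, nents = %lu, %lu KiB per segment\n",
	       orig_nents, nents, seg / 1024);
	return 0;
}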
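For the CPU-utilization point in the first item, here is a minimal read-loop sketch (not my exact test program) that anyone can use to reproduce the measurement independently of dd and dma_from_device.c. The device node /dev/xdma0_c2h_0 and the 1 MB x 10000 pattern mirror the dd command above; the 1 MiB buffer size is an approximation (dd's bs=1MB is 10^6 bytes).

/* Minimal C2H read-throughput check, independent of dd and
 * dma_from_device.c. Sketch only: device node and transfer
 * pattern are taken from the dd example above. */
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define BUF_SIZE (1UL << 20)   /* 1 MiB per read */
#define COUNT    10000UL       /* same as count=10000 above */

int main(void)
{
	int fd = open("/dev/xdma0_c2h_0", O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	void *buf = NULL;
	if (posix_memalign(&buf, 4096, BUF_SIZE) != 0) {
		fprintf(stderr, "allocation failed\n");
		return 1;
	}

	struct timespec t0, t1;
	clock_gettime(CLOCK_MONOTONIC, &t0);

	size_t total = 0;
	for (unsigned long i = 0; i < COUNT; i++) {
		ssize_t n = read(fd, buf, BUF_SIZE);
		if (n < 0) {
			perror("read");
			return 1;
		}
		total += (size_t)n;
	}

	clock_gettime(CLOCK_MONOTONIC, &t1);
	double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
	printf("%zu bytes in %.3f s = %.2f GB/s\n", total, sec, total / sec / 1e9);

	free(buf);
	close(fd);
	return 0;
}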
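And for the x16 vs x8 comparison, this is the back-of-the-envelope arithmetic behind my "~30% of the maximum" estimate. It uses only the raw Gen3 line rate (8 GT/s per lane with 128b/130b encoding); the practical ceiling after TLP/packet overhead is lower, which is where the ~12 GB/s expectation for Gen3 x16 comes from.

/* Raw PCIe Gen3 bandwidth vs. the measured numbers above.
 * 5.5 GB/s is the midpoint of the 5-6 GB/s measured at x16. */
#include <stdio.h>

int main(void)
{
	const double gts_per_lane = 8.0;        /* Gen3: 8 GT/s per lane */
	const double encoding = 128.0 / 130.0;  /* 128b/130b line coding */
	const double lane_gbs = gts_per_lane * encoding / 8.0; /* GB/s per lane */

	double x16 = 16 * lane_gbs;  /* ~15.75 GB/s raw */
	double x8  =  8 * lane_gbs;  /* ~7.88 GB/s raw */

	printf("Gen3 x16 raw %.2f GB/s, measured 5.5 GB/s -> %.0f%%\n", x16, 100.0 * 5.5 / x16);
	printf("Gen3 x8  raw %.2f GB/s, measured 2.2 GB/s -> %.0f%%\n", x8, 100.0 * 2.2 / x8);
	return 0;
}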
The first graph shows the final results of my measurements, with honest figures.
I will be very grateful for any hint from you.
First graph
Second graph