I was working through an example from a book comparing the performance of C-contiguous and Fortran-contiguous arrays, and I observed that the Fortran-contiguous array performed better for row-wise operations than the C-contiguous array when using np.sum(axis=1), i.e. the opposite of what theory predicts. With np.cumsum(axis=1), however, the C-contiguous array performed better, as expected. What could be the reason?
The Jupyter Notebook code:
import numpy as np
arr_c=np.ones((1000,1000),order='C')
arr_f=np.ones((1000,1000),order='F')
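As a sanity check (not part of the original benchmark), the two layouts can be confirmed through the arrays' strides and flags; for float64 the C-ordered array steps 8000 bytes per row and 8 per column, and the Fortran-ordered array the reverse:

```python
import numpy as np

arr_c = np.ones((1000, 1000), order='C')
arr_f = np.ones((1000, 1000), order='F')

# Strides give the bytes to step along each axis:
# C order stores rows contiguously, F order stores columns contiguously.
print(arr_c.strides)  # (8000, 8)
print(arr_f.strides)  # (8, 8000)
print(arr_c.flags['C_CONTIGUOUS'])  # True
print(arr_f.flags['F_CONTIGUOUS'])  # True
```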
The observed results were the opposite of what theory predicts when using np.sum(axis=1):
%timeit arr_c.sum(axis=1)
605 µs ± 40.8 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%timeit arr_f.sum(axis=1)
398 µs ± 25.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
However, the results were as expected when using np.cumsum(axis=1):
%timeit arr_c.cumsum(axis=1)
4.01 ms ± 83 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit arr_f.cumsum(axis=1)
15.6 ms ± 103 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)