Performance difference between C-contiguous and Fortran-contiguous arrays when using NumPy functions
I was working through an example from a book comparing the performance of C-contiguous and Fortran-contiguous arrays, and I observed that the Fortran-contiguous array performed better than the C-contiguous array for row-wise operations when using np.sum(axis=1), i.e. the opposite of what I expected. With np.cumsum(axis=1), the C-contiguous array was faster, as expected theoretically. What could be the reason?
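
For reference, here is a minimal sketch of the kind of timing comparison described above. The array shape, the repeat counts, and the helper name `time_call` are my own assumptions for illustration, not the book's exact code, and the timings will vary by machine and NumPy version.

```python
import timeit

import numpy as np

# Assumed array shape; the book's example may use a different size.
N_ROWS, N_COLS = 1000, 1000

# Same values, two memory layouts: row-major (C) and column-major (Fortran).
c_arr = np.ones((N_ROWS, N_COLS), order="C")
f_arr = np.asfortranarray(c_arr)


def time_call(func, arr, repeats=3, number=5):
    """Return the best observed per-call time in seconds for func(arr, axis=1)."""
    timer = timeit.Timer(lambda: func(arr, axis=1))
    return min(timer.repeat(repeat=repeats, number=number)) / number


# Time both functions on both layouts.
results = {}
for func in (np.sum, np.cumsum):
    for label, arr in (("C", c_arr), ("F", f_arr)):
        results[(func.__name__, label)] = time_call(func, arr)

for (name, label), seconds in results.items():
    print(f"np.{name}(axis=1) on {label}-contiguous: {seconds * 1e3:.3f} ms")
```

Both layouts hold identical values, so any difference in the printed times comes from how each function traverses memory, not from the data itself.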