According to "Why is NumPy Fast?", vectorization makes explicit for-loops unnecessary.
For example, adding two ndarrays:
import numpy as np
x1 = np.arange(0, 16).reshape(4, 4)
x2 = np.arange(17, 33).reshape(4, 4)
y = x1 + x2
print(y)
This prints:
[[17 19 21 23]
 [25 27 29 31]
 [33 35 37 39]
 [41 43 45 47]]
So in y = x1 + x2 there is no for-loop in the Python program, and according to the document above, the operation is implemented in optimized C.
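To make the comparison concrete, this is roughly what I would have to write without vectorization. It is only a minimal sketch for illustration; add_with_loops is a hypothetical helper name of mine, not a NumPy function.

```python
import numpy as np

def add_with_loops(a, b):
    """Element-wise addition of two 2-D arrays using explicit Python loops."""
    out = np.empty_like(a)
    for i in range(a.shape[0]):        # loop over rows
        for j in range(a.shape[1]):    # loop over columns
            out[i, j] = a[i, j] + b[i, j]
    return out

x1 = np.arange(0, 16).reshape(4, 4)
x2 = np.arange(17, 33).reshape(4, 4)

# Both approaches give the same values; only the vectorized form
# has no loop visible at the Python level.
assert np.array_equal(add_with_loops(x1, x2), x1 + x2)
```

The vectorized expression produces the same array, so whatever iterates over the elements must happen somewhere below the Python level.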
My question is: is there no for-loop in the "optimized C" code either?
If there is no loop in C either, how is the vectorization implemented? I already know the basic structure of a NumPy ndarray: a single byte array, shape, strides, copy/view, etc.
I want to understand the basic idea of how vectorization can be implemented without a for-loop, rather than the implementation details of a specific piece of source code.
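For reference, my understanding of the ndarray layout (flat byte buffer plus strides) can be sketched as follows. This is a minimal sketch assuming a C-contiguous int64 array; element_at is a hypothetical helper of mine, not part of NumPy.

```python
import sys
import numpy as np

x = np.arange(16, dtype=np.int64).reshape(4, 4)

buf = x.tobytes()                   # flat copy of the data in C (row-major) order
itemsize = x.itemsize               # 8 bytes per int64 element
row_stride, col_stride = x.strides  # bytes to step per row and per column

def element_at(i, j):
    """Read element (i, j) straight from the flat byte buffer using the strides."""
    offset = i * row_stride + j * col_stride
    raw = buf[offset:offset + itemsize]
    return int.from_bytes(raw, byteorder=sys.byteorder, signed=True)

# The stride arithmetic finds the same value that normal indexing returns.
assert element_at(2, 3) == x[2, 3]
```

So locating any single element is just arithmetic on the buffer; what I do not see is how every element gets visited without some kind of loop.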