The following two functions logically do the same thing, but func2, which stores the result in a temporary variable, is slower.
def func1(a):
    return a.astype(np.float64) / 255.0

def func2(a):
    t = a.astype(np.float64)
    return t / 255.0
Benchmark:
import sys
import timeit

import numpy as np

def func1(a):
    return a.astype(np.float64) / 255.0

def func2(a):
    t = a.astype(np.float64)
    return t / 255.0

def main():
    print(f"{np.__version__=}")
    print(f"{sys.version=}")
    size = 1_000_000
    n_repeats = 100
    n_epochs = 5
    a = np.random.randint(0, 256, size=size, dtype=np.uint8)
    for f in [func1, func2] * n_epochs:
        times = timeit.repeat(lambda: f(a), number=1, repeat=n_repeats)
        print(f"{f.__name__}: {min(times)}")

main()
Result:
np.__version__='1.26.4'
sys.version='3.12.4 (main, Jul 16 2024, 19:42:31) [GCC 11.4.0]'
func1: 0.0013509448617696762
func2: 0.0035865511745214462
func1: 0.0013513723388314247
func2: 0.0034992704167962074
func1: 0.0013509979471564293
func2: 0.003565799444913864
func1: 0.0013509783893823624
func2: 0.003563949838280678
func1: 0.0013510659337043762
func2: 0.003569650463759899
For a size of 1_000_000_000:
func1: 2.503432061523199
func2: 3.3956982269883156
func1: 2.503927574492991
func2: 3.393561664968729
func1: 2.5052043283358216
func2: 3.3980945963412523
func1: 2.503149318508804
func2: 3.39398608263582
func1: 2.5073573794215918
func2: 3.396817682310939
Although the relative difference has decreased, the absolute difference is now almost 1 second.
So I assume the difference is something that scales with the array size rather than a fixed per-call overhead, but I couldn’t figure out what it is.
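One way to probe that assumption is to add a third variant that keeps the named temporary but divides in place, so that no second full-size array has to be allocated for the result. This is only a diagnostic sketch: func3 and best_time are hypothetical names that are not part of the benchmark above, and the idea that an extra allocation is what costs time is a guess to be tested, not an established cause.

```python
import timeit

import numpy as np

def func1(a):
    return a.astype(np.float64) / 255.0

def func2(a):
    t = a.astype(np.float64)
    return t / 255.0

def func3(a):
    # Hypothetical variant: keep the named temporary, but divide in place so
    # the result reuses t's buffer instead of a newly allocated array.
    t = a.astype(np.float64)
    t /= 255.0
    return t

def best_time(f, a, n_repeats=20):
    """Minimum wall time of f(a) over n_repeats single calls."""
    return min(timeit.repeat(lambda: f(a), number=1, repeat=n_repeats))

a = np.random.randint(0, 256, size=1_000_000, dtype=np.uint8)
for f in (func1, func2, func3):
    print(f"{f.__name__}: {best_time(f, a)}")
```

If func3 times close to func1 rather than func2, that would suggest the gap comes from producing a second array, not from the extra bytecode instructions. If it times close to func2, the explanation lies elsewhere.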
Here are the bytecodes for the two functions printed by the dis module.
func1:
  8           0 RESUME                   0

  9           2 LOAD_FAST                0 (a)
              4 LOAD_ATTR                1 (NULL|self + astype)
             24 LOAD_GLOBAL              2 (np)
             34 LOAD_ATTR                4 (float64)
             54 CALL                     1
             62 LOAD_CONST               1 (255.0)
             64 BINARY_OP               11 (/)
             68 RETURN_VALUE

func2:
 12           0 RESUME                   0

 13           2 LOAD_FAST                0 (a)
              4 LOAD_ATTR                1 (NULL|self + astype)
             24 LOAD_GLOBAL              2 (np)
             34 LOAD_ATTR                4 (float64)
             54 CALL                     1
             62 STORE_FAST               1 (t)

 14          64 LOAD_FAST                1 (t)
             66 LOAD_CONST               1 (255.0)
             68 BINARY_OP               11 (/)
             72 RETURN_VALUE
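For reference, listings like the ones above can be produced with a short script along these lines. It is a minimal sketch: opnames is a hypothetical helper added here for comparing the two functions, and the exact opcodes, offsets, and line numbers will differ between Python versions and source files.

```python
import dis

# Copies of the two functions. They are only disassembled here, never called,
# so `np` does not have to be imported for this snippet to run.
def func1(a):
    return a.astype(np.float64) / 255.0

def func2(a):
    t = a.astype(np.float64)
    return t / 255.0

def opnames(func):
    """Return the opcode names of func's bytecode, in order (hypothetical helper)."""
    return [ins.opname for ins in dis.get_instructions(func)]

for f in (func1, func2):
    print(f"{f.__name__}:")
    dis.dis(f)  # prints the human-readable disassembly

# Opcode names that appear in func2 but not in func1.
print(sorted(set(opnames(func2)) - set(opnames(func1))))
```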
As you can see, the only difference is the STORE_FAST and LOAD_FAST of t, which I thought would not affect the entire array. What am I misunderstanding?
Please note that I was only able to reproduce this on Ubuntu (probably GCC), and not on Windows.
I can’t tell whether it’s OS-dependent or hardware-dependent.