I did a simple performance test on python 3.12.0
against python 3.13.0b3
compiled with a --disable-gil
flag. The program executes calculations of a Fibonacci sequence using ThreadPoolExecutor
or ProcessPoolExecutor
. The docs on the PEP introducing disabled GIL says that there is a bit of overhead mostly due to biased reference counting followed by per-object locking (https://peps.python.org/pep-0703/#performance). But it says the overhead on pyperformance benchmark suit is around 5-8%. My simple benchmark shows a significant difference in the performance. Indeed, python 3.13 without GIL utilize all CPUs
with a ThreadPoolExecutor
but it is much slower than python 3.12 with GIL. Based on the CPU utilization and the elapsed time we can conclude that with python 3.13 we do multiple times more clock cycles comparing to the 3.12.
Program code:
<code>from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
from functools import partial
format='%(levelname)s: %(message)s',
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
cpus = multiprocessing.cpu_count()
pool_executor = ProcessPoolExecutor if len(sys.argv) > 1 and sys.argv[1] == '1' else ThreadPoolExecutor
python_version_str = f'{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}'
logger.info(f'Executor={pool_executor.__name__}, python={python_version_str}, cpus={cpus}')
def fibonacci(n: int) -> int:
raise ValueError("Incorrect input")
return fibonacci(n-1) + fibonacci(n-2)
start = datetime.datetime.now()
with pool_executor(8) as executor:
for task_id in range(30):
executor.submit(partial(fibonacci, 30))
executor.shutdown(wait=True)
end = datetime.datetime.now()
logger.info(f'Elapsed: {elapsed.total_seconds():.2f} seconds')
<code>from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import datetime
from functools import partial
import sys
import logging
import multiprocessing
logging.basicConfig(
format='%(levelname)s: %(message)s',
)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
cpus = multiprocessing.cpu_count()
pool_executor = ProcessPoolExecutor if len(sys.argv) > 1 and sys.argv[1] == '1' else ThreadPoolExecutor
python_version_str = f'{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}'
logger.info(f'Executor={pool_executor.__name__}, python={python_version_str}, cpus={cpus}')
def fibonacci(n: int) -> int:
if n < 0:
raise ValueError("Incorrect input")
elif n == 0:
return 0
elif n == 1 or n == 2:
return 1
else:
return fibonacci(n-1) + fibonacci(n-2)
start = datetime.datetime.now()
with pool_executor(8) as executor:
for task_id in range(30):
executor.submit(partial(fibonacci, 30))
executor.shutdown(wait=True)
end = datetime.datetime.now()
elapsed = end - start
logger.info(f'Elapsed: {elapsed.total_seconds():.2f} seconds')
</code>
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import datetime
from functools import partial
import sys
import logging
import multiprocessing
logging.basicConfig(
format='%(levelname)s: %(message)s',
)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
cpus = multiprocessing.cpu_count()
pool_executor = ProcessPoolExecutor if len(sys.argv) > 1 and sys.argv[1] == '1' else ThreadPoolExecutor
python_version_str = f'{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}'
logger.info(f'Executor={pool_executor.__name__}, python={python_version_str}, cpus={cpus}')
def fibonacci(n: int) -> int:
if n < 0:
raise ValueError("Incorrect input")
elif n == 0:
return 0
elif n == 1 or n == 2:
return 1
else:
return fibonacci(n-1) + fibonacci(n-2)
start = datetime.datetime.now()
with pool_executor(8) as executor:
for task_id in range(30):
executor.submit(partial(fibonacci, 30))
executor.shutdown(wait=True)
end = datetime.datetime.now()
elapsed = end - start
logger.info(f'Elapsed: {elapsed.total_seconds():.2f} seconds')
Test results:
<code># TEST Linux 5.15.0-58-generic, Ubuntu 20.04.6 LTS
INFO: Executor=ThreadPoolExecutor, python=3.12.0, cpus=2
INFO: Elapsed: 10.54 seconds
INFO: Executor=ProcessPoolExecutor, python=3.12.0, cpus=2
INFO: Elapsed: 4.33 seconds
INFO: Executor=ThreadPoolExecutor, python=3.13.0b3, cpus=2
INFO: Elapsed: 22.48 seconds
INFO: Executor=ProcessPoolExecutor, python=3.13.0b3, cpus=2
INFO: Elapsed: 22.03 seconds
<code># TEST Linux 5.15.0-58-generic, Ubuntu 20.04.6 LTS
INFO: Executor=ThreadPoolExecutor, python=3.12.0, cpus=2
INFO: Elapsed: 10.54 seconds
INFO: Executor=ProcessPoolExecutor, python=3.12.0, cpus=2
INFO: Elapsed: 4.33 seconds
INFO: Executor=ThreadPoolExecutor, python=3.13.0b3, cpus=2
INFO: Elapsed: 22.48 seconds
INFO: Executor=ProcessPoolExecutor, python=3.13.0b3, cpus=2
INFO: Elapsed: 22.03 seconds
</code>
# TEST Linux 5.15.0-58-generic, Ubuntu 20.04.6 LTS
INFO: Executor=ThreadPoolExecutor, python=3.12.0, cpus=2
INFO: Elapsed: 10.54 seconds
INFO: Executor=ProcessPoolExecutor, python=3.12.0, cpus=2
INFO: Elapsed: 4.33 seconds
INFO: Executor=ThreadPoolExecutor, python=3.13.0b3, cpus=2
INFO: Elapsed: 22.48 seconds
INFO: Executor=ProcessPoolExecutor, python=3.13.0b3, cpus=2
INFO: Elapsed: 22.03 seconds
Can anyone explain why do I experience such a difference when comparing the overhead to the one from pyperformance benchmark suit?