I understand that in general it’s (somewhat) faster to iterate over a list rather than over a set or dict keys. Confirming:
import timeit
list_setup = 'my_list = list(range(100))'
list_stmt = 'for n in my_list: n**2'
print(f'list : {timeit.timeit(setup=list_setup, stmt=list_stmt, number=100000)}')
set_setup = 'my_set = set(range(100))'
set_stmt = 'for n in my_set: n**2'
print(f'set : {timeit.timeit(setup=set_setup, stmt=set_stmt, number=100000)}')
dict_setup = 'my_dict = {n: None for n in range(100)}'
dict_stmt = 'for n in my_dict.keys(): n**2'
print(f'dict : {timeit.timeit(setup=dict_setup, stmt=dict_stmt, number=100000)}')
produces:
list : 5.110044667
set : 5.137189292
dict : 5.179844875000001
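(Aside: single timeit runs like these are fairly noisy. A common way to get a more stable figure is timeit.repeat, taking the minimum over several repeats; a minimal sketch for the list case, with the same setup as above:)

```python
import timeit

list_setup = 'my_list = list(range(100))'
list_stmt = 'for n in my_list: n**2'

# timeit.repeat returns one total time per repeat; the minimum is the
# least-disturbed measurement and is less sensitive to background load.
best = min(timeit.repeat(setup=list_setup, stmt=list_stmt,
                         number=100_000, repeat=5))
print(f'list best of 5: {best:.3f}s')
```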
But if the values of the dict are objects instead of None, iterating over the dict keys is significantly faster:
dict_obj_setup = '''
class MyClass(object):
    pass
my_dict_obj = {n: MyClass() for n in range(100)}
'''
dict_obj_stmt = 'for n in my_dict_obj.keys(): n**2'
print(f'dict_obj: {timeit.timeit(setup=dict_obj_setup, stmt=dict_obj_stmt, number=100000)}')
dict_obj: 1.7423309159999985
Why is this happening? And if I have a group of integers that I need to iterate over many, many times, should I create a dict whose keys are my integers and whose values are any random object?!
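(To answer the second part for a given machine, the trick can be A/B-tested directly: the sketch below times a plain list against a dict keyed by the same integers with throwaway object values. Whether the gap reproduces depends on the interpreter version and build, so measure before adopting anything like this.)

```python
import timeit

# Shared setup: the same 100 integers as a list and as dict keys
# with arbitrary object values.
setup = '''
nums = range(100)
as_list = list(nums)
as_dict = {n: object() for n in nums}
'''

t_list = timeit.timeit('for n in as_list: n**2', setup=setup, number=100_000)
t_dict = timeit.timeit('for n in as_dict: n**2', setup=setup, number=100_000)
print(f'list: {t_list:.3f}s  dict with object values: {t_dict:.3f}s')
```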
By the way, the difference goes away if I use n*n instead of n**2 in all of the loops:
list : 0.32518725
set : 0.344315166
dict : 0.3399045409999999
dict_obj: 0.3662140410000001
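(One quick check that n*n and n**2 take different paths through the interpreter, though not by itself an explanation of the timing gap, is to compare their compiled bytecode:)

```python
import dis

mul_code = compile('n * n', '<expr>', 'eval')
pow_code = compile('n ** 2', '<expr>', 'eval')

mul_ops = [(i.opname, i.argrepr) for i in dis.Bytecode(mul_code)]
pow_ops = [(i.opname, i.argrepr) for i in dis.Bytecode(pow_code)]

# On CPython <= 3.10 these use BINARY_MULTIPLY vs BINARY_POWER;
# on 3.11+ both compile to BINARY_OP with different opargs.
# Either way the instruction streams differ.
print(mul_ops)
print(pow_ops)
```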