I am curious whether I can access the values of a huge dictionary faster when the keys come from a huge list of arrays.
Here is a simple example:
import numpy as np
my_list = [np.array([ 1, 2, 3, 4, 5, 6, 8, 9, 10]), np.array([ 1, 3, 5, 6, 7, 10]), np.array([ 1, 2, 3, 4, 6, 8, 9, 10]), np.array([ 1, 3, 4, 7, 15]), np.array([ 1, 2, 4, 5, 10, 16]), np.array([6, 10, 15])]
my_dict = {1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 10: 2, 11: 2, 12: 2, 13: 2, 14: 2, 15: 3, 16: 3}
Every element of the arrays in my_list is a key of my_dict, and each key maps to a group index.
I used the following code to get the desired output list:
# lookup table: position k - 1 holds my_dict[k], so numpy fancy indexing replaces the dict lookups
numpy_arr = np.array([my_dict[i] for i in range(1, max(my_dict) + 1)])
# map each array of keys to its set of dict values, deduplicating repeats via frozenset
output = {frozenset(np.unique(numpy_arr[l - 1])) for l in my_list}
# render each distinct set of values as a space-separated string
res = [" ".join(map(str, s)) for s in output]
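For the toy example above, the first three arrays in my_list all map to the same group set {0, 1, 2}, so the set comprehension deduplicates them and res holds four strings, one per distinct set. Since sets iterate in arbitrary order, both the list order and the digit order within each string may vary:

print(res)
# e.g. ['0 1 2', '0 1 3', '0 1 2 3', '1 2 3']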
For a more realistic test, larger versions of my_list and my_dict can be downloaded here: https://gitlab.com/Schrodinger168/practice/-/tree/master/practice_dictionary
Here is my benchmark code:
import ast
import numpy as np
from timeit import timeit

file_list = "list_array.txt"
file_dictionary = "dictionary_example.txt"

# the dictionary file stores a Python dict literal
with open(file_dictionary, "r") as file_dict:
    my_dict = ast.literal_eval(file_dict.read())

# each line of the list file stores one array as space-separated integers
my_list = []
with open(file_list, "r") as file:
    for line in file:
        my_list.append(np.array(list(map(int, line.split()))))
def benchmark(my_list, my_dict):
    numpy_arr = np.array([my_dict[i] for i in range(1, max(my_dict) + 1)])
    output = {frozenset(np.unique(numpy_arr[l - 1])) for l in my_list}
    return [" ".join(map(str, s)) for s in output]
t1 = timeit("benchmark(my_list, my_dict)", number=1, globals=globals())
print(t1)
Output time:
2.538002511000059
This approach is already quite good, but in my practical work my_list and my_dict are at least 5 to 10 times bigger than in this example, and at that scale it becomes too slow.
Are there any alternatives that can make this run even faster, in under a second or ideally in milliseconds?
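One direction that might be worth benchmarking (a minimal sketch, not a guaranteed speedup): encode each row's set of group ids as a single integer bitmask, so deduplication hashes one int per row instead of building a frozenset per row, and only the distinct rows pay for np.unique and the string formatting. This assumes the group ids (the dict values) are non-negative and smaller than 63 so the shifts fit in an int64; the function name benchmark_bitmask is mine:

import numpy as np

def benchmark_bitmask(my_list, my_dict):
    # same lookup table as before: position k - 1 holds my_dict[k]
    lut = np.array([my_dict[i] for i in range(1, max(my_dict) + 1)], dtype=np.int64)
    seen = {}
    for l in my_list:
        groups = lut[l - 1]
        # fold this row's group ids into a single int64 bitmask
        # (assumes every group id is < 63, otherwise the shift overflows)
        mask = int(np.bitwise_or.reduce(np.int64(1) << groups))
        if mask not in seen:
            # only distinct group sets reach np.unique and the join
            seen[mask] = " ".join(map(str, np.unique(groups)))
    return list(seen.values())

It yields the same distinct group sets as benchmark; the only cosmetic difference is that the digits inside each string come out sorted (from np.unique) rather than in frozenset iteration order. Whether it actually reaches the millisecond range on the full files would have to be measured with the same timeit harness.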