The Question:
Here is a simple function that works with numpy but not numba:
# @numba.jit(nopython=True, fastmath=False, parallel=False)
def testgetvalue(tgvarray, tgvindex):
    tgvalue = tgvarray[tuple(tgvindex)]
    return tgvalue
How do I make a version of this function that works in numba?
I tried:
@numba.jit(nopython=True, fastmath=False, parallel=False)
def testgetvalue2(tgvarray, tgvindex):
    tgvalue = tgvarray[tuple(tgvindex)]
    currentdex = tgvindex[0]
    tgvtemp = tgvarray[currentdex]
    for idx in range(1, len(tgvindex)):
        currentdex = tgvindex[idx]
        tgvtemp = tgvtemp[currentdex]
    return tgvalue
but this also fails in numba
I found this question in which an answer says that it is possible to:
More generally, you cannot generate an N-ary tuple where N is variable in a Numba function. However, you can instead generate and compile a function for a specific N. If N is very small (e.g. <15)
This would solve my problem, but the answer does not explain how to generate and then compile a function for a specific N, unless the suggestion is that I write a script to generate a .py file that defines a function with the jit decorator, which I can then import and call. I guess that could work given how infrequently the number of dimensions changes. I'm not sure whether this is within best practices, but I will start working on a script to generate .py files until someone answers otherwise.
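One way to read the quoted suggestion, without writing .py files to disk, is to build the source for a given number of dimensions as a string and `exec` it. This is only a sketch of what the answer may have meant, not something it states. `make_getter` is a hypothetical helper name. The generated function uses a fixed number of scalar indices, so wrapping the returned function with `numba.njit` may be possible, but that step is an assumption and is not exercised here.

```python
import numpy as np

def make_getter(ndim):
    """Build a getter specialised for arrays with exactly `ndim` dimensions."""
    # For ndim=3 this produces: arr[idx[0], idx[1], idx[2]]
    index_expr = ", ".join(f"idx[{k}]" for k in range(ndim))
    source = f"def getter(arr, idx):\n    return arr[{index_expr}]\n"
    namespace = {}
    exec(source, namespace)  # define the generated function in `namespace`
    return namespace["getter"]

get3 = make_getter(3)
arr = np.arange(24).reshape(2, 3, 4)
value = get3(arr, np.array([1, 2, 3]))  # same element as arr[1, 2, 3]
```

Since the number of dimensions changes rarely, the generated function could be rebuilt only when it changes and reused for the repeated operations in between.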
Please continue reading before suggesting that I am duplicating something like this question, because my real problem does not necessarily involve tuples (see the version of the question in parentheses).
Why I am asking the question:
I have code where the number of dimensions of an array object can occasionally vary, ranging between 1 and 15. But once a change occurs, there will be tens of thousands of repeated operations on that multidimensional array. For many of those operations I would like to be able to use an array of indices to modify the value at a location in the multidimensional array.
This leads into the alternative question in parentheses:
In an earlier version of my code I transformed the multidimensional indices into one dimensional indices by taking the array of sizes in each dimension and doing:
multipliers = np.cumprod(array_of_sizes_in_each_dimension)
multipliers = np.roll(multipliers, 1)
multipliers[0] = 1
I can then multiply each value in a multidimensional index by the corresponding value in multipliers and sum the products to get my one dimensional index. This works fine when I'm going from the multidimensional index to the one dimensional index. However, I am unable to think of a fast function for cases in which I need to go from a one dimensional index to the multidimensional index. Currently, my fastest version is to construct a lookup table, i.e. a two_dimensional_array of size np.prod(array_of_sizes_in_each_dimension) x len(array_of_sizes_in_each_dimension) that lists all of the multidimensional indices, such that two_dimensional_array[one_dimensional_index] returns the corresponding multidimensional index. This works fine when my multidimensional array happens to have only a few dimensions and is short in each of those dimensions. However, as the number of dimensions grows, two_dimensional_array becomes so large that I have a memory bottleneck and the code slows down by a couple of orders of magnitude (like 8 minutes for a 3 dimensional array and 8 days for an 11 dimensional array). So if anyone has an idea for a fast function to replace the two_dimensional_array lookup table, that would also solve my problem.
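The multipliers built above are the strides of a column-major (first index fastest) ordering, so the inverse mapping can be computed with one integer division and one modulo per dimension instead of a lookup table. The following is a minimal sketch under that assumption. The function names `make_multipliers`, `multi_to_flat` and `flat_to_multi` are hypothetical. The loops use only integer arithmetic on arrays, which is the kind of code numba generally compiles, but the decorator is left off and this has not been tested under numba.

```python
import numpy as np

def make_multipliers(sizes):
    # Same construction as in the question: [1, s0, s0*s1, ...]
    multipliers = np.cumprod(sizes)
    multipliers = np.roll(multipliers, 1)
    multipliers[0] = 1
    return multipliers

def multi_to_flat(index, multipliers):
    # Sum of index[k] * multipliers[k] over all dimensions
    flat = 0
    for k in range(len(index)):
        flat += index[k] * multipliers[k]
    return flat

def flat_to_multi(flat, sizes, multipliers):
    # Invert the mapping without a lookup table
    index = np.empty(len(sizes), dtype=np.int64)
    for k in range(len(sizes)):
        index[k] = (flat // multipliers[k]) % sizes[k]
    return index

sizes = np.array([3, 4, 5], dtype=np.int64)
multipliers = make_multipliers(sizes)
index = flat_to_multi(37, sizes, multipliers)
```

Memory use is proportional to the number of dimensions rather than to the total number of elements. In plain NumPy, `np.unravel_index` with `order='F'` computes the same mapping.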
It is actually possible to create a tuple inside numba if its size is known at compile time, using numba.np.unsafe.ndarray.to_fixed_tuple. This function is labelled as unsafe and I could not find any trace of it in the documentation, so I do not know how reliable it is in the long term, but I found it to work very well for numba 0.58 to 0.60. You can, for instance, use the result as a shape to initialize an N-dimensional array.
import numpy as np
import numba
@numba.njit
def numba_make_tuple(old_tuple):
    CONST = len(old_tuple)
    a = np.empty((CONST,), dtype=np.int64)
    for i in range(CONST):
        a[i] = i
    new_tuple = numba.np.unsafe.ndarray.to_fixed_tuple(a, CONST)
    return new_tuple

t1 = (1,)
nt1 = numba_make_tuple(t1)
print(nt1, type(nt1))

t2 = (2, 3, 1)
nt2 = numba_make_tuple(t2)
print(nt2, type(nt2))
Note that I use a tuple as input to enforce compile-time knowledge of CONST. I could not find anything in the official documentation, but it is discussed e.g. at https://github.com/numba/numba/issues/8812
Try this:
import numba
import numpy as np
@numba.njit
def testgetvalue(tgvarray, tgvindex):
    tgvtemp = tgvarray
    for idx in range(len(tgvindex)):
        tgvtemp = tgvtemp[tgvindex[idx]]
    return tgvtemp