Suppose I have an array like this:
x = np.array([["Hello", 0, 1], ["World", 1, 2]], dtype=np.object_)
I’d like to do better in terms of how I define the dtype for this array. More specifically, I want the array to allocate for each element as many bytes as the largest element requires.
This is what I think of as a “union” of types.
The answer to the question “c-style union with numpy dtypes?” says that “union dtypes” are allowed in numpy 1.7 and later.
This is the relevant section in the numpy documentation. The “union types” documentation refers to the documentation for (base_dtype, new_dtype) in the section on data types titled Specifying and constructing data types.
The documentation for (base_dtype, new_dtype) says something interesting (emphasis mine):
This form also makes it possible to specify struct dtypes with overlapping fields, functioning like the ‘union’ type in C. This usage is discouraged, however, and the union mechanism is preferred.
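For reference, here is a minimal sketch of what I understand the (base_dtype, new_dtype) form to do: the same bytes are read either as the base type or through the fields of the new type, and both must have the same total size. The field names real and imag and the explicit little-endian type codes are my own choices for illustration.

```python
import numpy as np

# A 4-byte little-endian integer whose two halves are also exposed as
# 16-bit fields. The field names "real" and "imag" are illustrative only.
dt = np.dtype(("<i4", {"real": ("<i2", 0), "imag": ("<i2", 2)}))

a = np.zeros(1, dtype=dt)
a["real"] = 1  # writes bytes 0-1
a["imag"] = 2  # writes bytes 2-3

# Reinterpret the same 4 bytes as a single 32-bit integer.
packed = a.view("<i4")
print(dt.itemsize, dt.names, packed)
```

Both views share one 4-byte item, which is the union-like behaviour the quoted passage describes.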
What precisely is the “union mechanism”? Searching for it in the numpy docs doesn’t lead me to anything that stands out clearly as the mechanism being referred to. More importantly, I cannot write something like (np.object_, (np.uint32, np.uint32)) in the (base_dtype, new_dtype) format, because np.object_ cannot be used as a base_dtype.
My best guess at the moment: the numpy documentation states that structured array fields can be allowed to overlap, which would give union-like behaviour? After all, the structured arrays docs state (emphasis mine):
For these purposes they support specialized features such as subarrays, nested datatypes, and unions, and allow control over the memory layout of the structure.
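To make my guess concrete, here is a small sketch of a structured dtype whose two fields both start at offset 0, so they occupy the same bytes. The field names as_uint32 and as_bytes are hypothetical, and this only covers fixed-size numeric fields, not np.object_.

```python
import numpy as np

# Two fields placed at the same offset, so they overlap like a C union.
# Field names are illustrative; "<u4" pins the byte order to little-endian.
union_like = np.dtype({
    "names": ["as_uint32", "as_bytes"],
    "formats": ["<u4", ("u1", 4)],
    "offsets": [0, 0],
    "itemsize": 4,
})

arr = np.zeros(1, dtype=union_like)
arr["as_uint32"] = 0x01020304  # write through one field...

# ...and read the same 4 bytes back through the other field.
byte_view = arr["as_bytes"][0].tolist()
print(union_like.itemsize, byte_view)
```

Writing to one field changes what the other field reads, since both refer to the same memory.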
The docs have left me confused about what the preferred mechanism is for making a “union type”.