I would like to add type hints to a simple function. Since it internally uses only numpy calls, it is very flexible with its inputs. Basically, it accepts all array-like objects, for which there is the numpy.typing.ArrayLike type.
Defining the return type is not as straightforward, however. For some input types, like lists, the numpy functions cast the return value to a numpy array, which means I could use -> npt.NDArray.
Other input types, like pandas.DataFrames, which I use heavily in my code, come back as the same type they went in as. This would usually be a good reason to use a TypeVar bound to the input type.
How do I keep the flexibility of numpy while also providing meaningful type hints, for example for mypy?
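The runtime behaviour described above can be seen in a small sketch. The variable names are illustrative, the function is left untyped on purpose, and it assumes a reasonably recent pandas in which DataFrames take part in NumPy's ufunc protocol:

```python
import numpy as np
import pandas as pd


def get_decibels(p2):
    # Untyped on purpose: only the runtime return type is of interest here.
    return 10 * np.log10(np.divide(p2, 4e-10))


from_list = get_decibels([4.0, 5.0, 6.0])
from_frame = get_decibels(pd.DataFrame([[4, 5, 6], [7, 8, 9]]))

print(type(from_list).__name__)   # ndarray
print(type(from_frame).__name__)  # DataFrame
```

The same function body therefore yields two different return types, which is exactly what a single annotation has trouble expressing.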
Note: The example function calculates sound pressure levels.
Both code snippets run without problems; the functions differ only in their type hints.
They cause different errors with static type checkers, demonstrating the limits of each approach.
Version 1:
import numpy as np
import numpy.typing as npt
import pandas as pd

def get_decibels1(p2: npt.ArrayLike) -> npt.NDArray:
    return 10 * np.log10(np.divide(p2, 4e-10))

df = pd.DataFrame([[4, 5, 6], [7, 8, 9]])
get_decibels1(df).columns
# --- Causes mypy error:
# error: "ndarray[Any, dtype[Any]]" has no attribute "columns" [attr-defined]
Version 2:
from typing import TypeVar

T = TypeVar('T', bound=npt.ArrayLike)

def get_decibels2(p2: T) -> T:
    # Same body as version 1; only the type hints differ.
    return 10 * np.log10(np.divide(p2, 4e-10))

ls = [4.0, 5, 6]
get_decibels2(ls).shape
# --- Causes mypy error:
# error: "list[float]" has no attribute "shape" [attr-defined]
How do I sensibly combine the two approaches?
Update:
I figured that I could maybe approach this with @overload. But this does not seem to work either, as the signatures overlap.
from typing import TypeVar, Union, overload

T = TypeVar('T', bound=Union[pd.DataFrame, pd.Series])

@overload
def get_decibels(p2: T) -> T: ...
@overload
def get_decibels(p2: npt.ArrayLike) -> npt.NDArray: ...
def get_decibels(p2: npt.ArrayLike):
    return 10 * np.log10(np.divide(p2, 4e-10))
# --- Causes mypy error:
# error: Overloaded function signatures 1 and 2 overlap with incompatible return types [overload-overlap]
I was under the impression that mypy just chooses the first signature that matches, which would have solved it. Any ideas on how to resolve this?
Personally, I'd rather have the function return a single type (NDArray) in all cases by applying numpy.array or numpy.asarray. Moreover, I believe np.divide(a, b) and a/b are equivalent (at least once the input is a numpy array). Finally, why overwrite a pandas.DataFrame when you can just extend the table? (If you have memory issues, then I understand, and you might want to check out the polars library.) Also, do you not need clip_db?
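The equivalence mentioned above can be checked with a short sketch. It assumes the input is first converted with np.asarray: a plain Python list does not support the / operator on its own, whereas np.divide converts it internally. The variable names are illustrative.

```python
import numpy as np

values = [4.0, 5.0, 6.0]   # plain list: array-like, but not an array
arr = np.asarray(values)   # explicit conversion to an ndarray

# For ndarrays, the operator and the ufunc give the same result.
same = np.array_equal(np.divide(arr, 4e-10), arr / 4e-10)

# np.divide accepts the raw list, whereas list / float raises TypeError.
try:
    values / 4e-10
    list_supports_division = True
except TypeError:
    list_supports_division = False

print(same, list_supports_division)  # True False
```

This is why converting with np.asarray at the top of the function makes the plain operator form safe to use.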
Although it is a matter of personal taste, here is my suggestion: a function that always returns a numpy array.
import numpy as np
from numpy import typing as npt
import pandas as pd

def get_decibels(p2: npt.ArrayLike) -> npt.NDArray:
    p2 = np.asarray(p2)
    return 10 * np.log10(p2 / 4e-10)

def get_decibels_clip(p2: npt.ArrayLike, clip_db: float = 0) -> npt.NDArray:
    p2 = np.asarray(p2)
    return np.maximum(10 * np.log10(p2 / 4e-10), clip_db)  # is this what was asked?

df = pd.DataFrame([[4, 5, 6], [7, 8, 9]])
df[["db1", "db2", "db3"]] = get_decibels(df)
print(df)
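As a quick sanity check of the formula, the sketch below feeds in a few values whose levels are easy to work out by hand. It assumes that 4e-10 is the squared reference pressure (20 µPa)² in Pa², which the post does not state explicitly. The name REF_P2 is borrowed from the question and levels is illustrative.

```python
import numpy as np
import numpy.typing as npt

REF_P2 = 4e-10  # assumed squared reference pressure, (20e-6 Pa) ** 2


def get_decibels(p2: npt.ArrayLike) -> npt.NDArray:
    # Normalize any array-like input to an ndarray before dividing.
    return 10 * np.log10(np.asarray(p2) / REF_P2)


# Each factor of 100 in squared pressure adds 20 dB.
levels = get_decibels([4e-10, 4e-8, 4e-6])
print(levels)  # approximately 0, 20 and 40 dB
```

Because the input is converted up front, the result is an ndarray whether a list, a tuple or a DataFrame is passed in.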