Python is a dynamically typed language: types are only resolved at runtime, and Python relies on duck typing.
What this means is that for any object x we can do
x.some_function()
x.some_property = y
without knowing until runtime whether or not x has the attribute some_property or the method some_function.
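To make this concrete, here is a minimal sketch of duck typing. The class names Duck, Robot and Rock and the method speak are made up for illustration; the point is only that the missing method is discovered when the call is attempted, not before.

```python
class Duck:
    def speak(self) -> str:
        return "Quack"


class Robot:
    def speak(self) -> str:
        return "Beep"


class Rock:
    # Deliberately has no speak() method.
    pass


def make_it_speak(x) -> str:
    # No type check here: any object with a speak() method is accepted.
    return x.speak()


print(make_it_speak(Duck()))
print(make_it_speak(Robot()))

try:
    make_it_speak(Rock())
except AttributeError as exc:
    # The problem only surfaces at runtime, when the attribute lookup fails.
    print(f"Failed at runtime: {exc}")
```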
Python has class inheritance. It also has the concept of “abstract base classes”, provided by the abc module.
This module provides a decorator, @abstractmethod. For classes that use the ABCMeta metaclass (for example, by inheriting from ABC), the decorator prevents subclasses from being instantiated at runtime unless they provide implementations for every method marked with @abstractmethod.
This is just a piece of protection logic which is triggered at runtime when a class is instantiated. It behaves approximately as if the check for required methods occurred in the __init__ function (strictly speaking, it happens during object creation, before __init__ runs).
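A minimal sketch of that behavior, assuming a hypothetical abstract method called payload (the class IncompleteMessage is also invented for this illustration):

```python
from abc import ABC, abstractmethod


class AbstractMessage(ABC):
    @abstractmethod
    def payload(self) -> str:
        """Subclasses must override this (hypothetical method name)."""


class IncompleteMessage(AbstractMessage):
    # Does not implement payload(), so the class is still abstract.
    pass


class GetterMessage(AbstractMessage):
    def payload(self) -> str:
        return "get"


# Defining IncompleteMessage was fine; instantiating it is what fails.
try:
    IncompleteMessage()
except TypeError as exc:
    print(f"Cannot instantiate: {exc}")

# A subclass that implements every abstract method can be created normally.
print(GetterMessage().payload())
```

Note that the error is raised at instantiation time, not when the incomplete subclass is defined.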
This question is about both of these concepts, but particularly in the context of type hints. Type hints let static analyzers provide aids to the developer in IDEs. An obvious example is the Python language server and Python extension for VS Code.
Type hints have other purposes and uses too, but in this context the interaction with static analysis tools is what I am interested in.
Consider an example inheritance hierarchy.
class AbstractMessage():
    pass

class GetterMessage(AbstractMessage):
    pass

class SetterMessage(AbstractMessage):
    pass
In this example I have defined two message types which inherit from an abstract base class. However, there is no reason for this ABC to exist, and code which uses GetterMessage and SetterMessage would work no differently without it.
Here are two possible ways to implement a function which can return either type.
def example_function() -> GetterMessage | SetterMessage:
    pass

def example_function() -> AbstractMessage:
    pass

def example_code_block():
    return_value = example_function()
    return_value. # <- static analyzer kicks in here
In this example, I have shown two possible ways to specify the return type:
AbstractMessage
GetterMessage | SetterMessage
What, if any, are the differences between either choice? Is there a reason to prefer one over the other? Is there a reason why one must be chosen over the other?
It would seem to me that either would work just as well. However, in some cases, there may be multiple return types which must be specified using the or (|) syntax, because the types which are returned are not related, and so cannot be combined into an inheritance hierarchy.
Some simple tests
The following simple tests show the attribute suggestions for my_object_1 or my_object_2 made by the VS Code/Pylance static analyzer.
(This test is specific to VS Code with Pylance, using whichever versions of that software happened to be installed on the machine I ran these tests on, and does not necessarily give a general answer.)
What is potentially interesting is that the or syntax Type1 | Type2 gives the full set of possible options, whereas the inheritance hierarchy (ABC) return type does not.
class Type1():
    def common_function(self):
        pass
    def function_1(self):
        pass

class Type2():
    def common_function(self):
        pass
    def function_2(self):
        pass

class AbstractBaseType():
    def abstract_base_type_function(self):
        pass

class ConcreteType1(AbstractBaseType):
    def common_function(self):
        pass
    def function_1(self):
        pass

class ConcreteType2(AbstractBaseType):
    def common_function(self):
        pass
    def function_2(self):
        pass

def example_function_1(input: str) -> Type1 | Type2:
    if input == 'Type1':
        return Type1()
    else:
        return Type2()

def example_function_2(input: str) -> AbstractBaseType:
    if input == 'Type1':
        return ConcreteType1()
    else:
        return ConcreteType2()

def test_function():
    # The argument only selects which concrete type is created
    my_object_1 = example_function_1('Type1')
    my_object_2 = example_function_2('Type1')
    my_object_1. # <- suggestions include `common_function()`, `function_1()`, `function_2()`
    my_object_2. # <- suggestions include `abstract_base_type_function()`
AbstractMessage
GetterMessage | SetterMessage
What, if any, are the differences between either choice? Is there a reason to prefer one over the other? Is there a reason why one must be chosen over the other?
One obvious difference when using the abstract base class is that it gives your code room to anticipate future subclasses of AbstractMessage that do not yet exist (consider further that such subclasses may, in some cases, be user-defined and not necessarily within your code). By comparison, the second approach can make it harder for users to subclass (or to create a new message type altogether) while still benefiting from the type hints you’ve written.
Consider further that this has different potential implications for parameter type hints as opposed to return value type hints. Suppose the following options for annotating a function argument:
from typing import Any

def example_1(arg: AbstractMessage) -> Any:
    # ... assume this function only makes use of common method(s)
    arg.common_method()

def example_2(arg: GetterMessage | SetterMessage) -> Any:
    ...
Further suppose a user wants to define their own message (where we assume it is not necessary/prudent to be a subtype of either GetterMessage or SetterMessage), but leverage those existing functions you wrote:
class MyMessage(AbstractMessage):
    ...

msg = MyMessage()
example_1(msg) # passes type checks
example_2(msg) # fails type checks, with limited ability to work around
A possible third option that preserves support for user-defined types without requiring a common base class would be to use a Protocol instead. The same preceding ideas still apply here. Note, though, that this has the same effect on autocompletion that you observed in VS Code when hinting with the base class.
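A rough sketch of the Protocol approach, assuming the shared method is called common_method and returns a str (the protocol name SupportsCommonMethod, the function example_3 and the return values are invented for illustration):

```python
from typing import Protocol


class SupportsCommonMethod(Protocol):
    # Any class with a matching common_method() satisfies this protocol;
    # no inheritance relationship is required.
    def common_method(self) -> str: ...


class GetterMessage:
    def common_method(self) -> str:
        return "getter"


class MyMessage:
    # A user-defined type, unrelated to GetterMessage by inheritance.
    def common_method(self) -> str:
        return "mine"


def example_3(arg: SupportsCommonMethod) -> str:
    return arg.common_method()


print(example_3(GetterMessage()))
print(example_3(MyMessage()))
```

Both calls should satisfy a static type checker, because matching is structural: the checker compares method signatures rather than base classes.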
Of course, if you consider your system a ‘closed subject’ in the sense that users will never need to derive their own substitute types (or you want to intentionally preclude/discourage users from doing this), then you can probably safely use the union type approach without any substantial consequences.
Regarding autocomplete options, I’m not sure what to say about the language server’s behavior other than that a language server doesn’t have to be implemented this way. One could be made to autocomplete methods that belong to known subclasses of the type you’re working with, but that would probably be counter-productive in a lot of cases. So, as you observe, this is a potential advantage of being more specific about the types that can be returned, at least in cases where you can be reasonably sure that specificity makes sense.
Something worth noting here, however, is that even in the case where you are using a union of the concrete types as the return type annotation, you would still have to narrow the union type down to pass type checking when using any of the methods that are exclusive to either type:
my_object_1.function_1()
my_object_1.function_2()
t.py:56: error: Item "Type2" of "Type1 | Type2" has no attribute "function_1" [union-attr]
t.py:57: error: Item "Type1" of "Type1 | Type2" has no attribute "function_2" [union-attr]
And when you perform this type narrowing to fix this typing error, the LSP also catches on, even when you annotate the base class as the return type:
my_object_2 = example_function_2('Type2')
if isinstance(my_object_2, ConcreteType2):
    # my_object_2. <--- now provides autocomplete for ConcreteType2 methods
    my_object_2.function_2() # passes mypy type checks