Unlike standard classes in Python, fields in dataclasses must have a type annotation. But I don’t understand what the purpose of these type annotations really is. One can create a dataclass like this:
from dataclasses import dataclass, fields

@dataclass
class Broken:
    field1: str = "default_string1",
    field2: "" = "default_string2",
    field3 = "default_string3"
And this class will be accepted by the Python interpreter (and your IDE/static code checker might also not see anything wrong with it). However, if you use the class
b = Broken()
print("Wannabe type for field1:", fields(b)[0].type)
print("Real type for field1:", type(b.field1))
print("Wannabe type for field2:", fields(b)[1].type)
print("Real type for field2:", type(b.field2))
try:
    print("Wannabe type for field3:", fields(b)[2].type)
except IndexError:
    # fields(b) only has two entries, so index 2 does not exist
    print("field3 is not part of fields(b)")
print("Real type for field3:", type(b.field3))
you will get some surprising output:
Wannabe type for field1: <class 'str'>
Real type for field1: <class 'tuple'>
Did you notice the trailing comma after the default value of field1? The type annotation is not used to check that the default value has the correct type, so the real type is tuple instead of the type str used in the annotation.
Wannabe type for field2:
Real type for field2: <class 'tuple'>
You can even use an empty string as type annotation and Python won’t raise an eyebrow.
field3 is not part of fields(b)
Real type for field3: <class 'str'>
If you leave out the type annotation completely, the field will not be listed when calling the fields() function. However, it can still be accessed via the object.
So what is the purpose of the type annotations for dataclasses? Are they really just used to decide whether a field should be listed by the fields() function? Or did the Python maintainers anticipate future functionality for types in dataclasses that hasn’t been implemented yet? Why do I need to add type annotations to dataclasses in Python?
There is nothing too mysterious going on here. The documentation clearly says:
The member variables […] are defined using PEP 526 type annotations.
[…] A field is defined as a class variable that has a type annotation.
To review your example point by point:
- field1 is defined to be of type str, but has a default value of type tuple. The @dataclass decorator (or the underlying module as a whole) simply makes no assumptions and gives no guarantees about whether the type of the default value (("default_string1",)) matches the declared type of the field, as the type annotation is almost completely ignored:
  With two exceptions described below, nothing in @dataclass examines the type specified in the variable annotation.
  Those exceptions deal with class variables and init-only variables; other than a warning from a third-party type checker, it’s up to Python’s duck-typing mechanism.
- The annotation in field2: "" is an object of <class 'str'> (an empty string), unlike field2: str, where the annotation would correctly be an object of <class 'type'>. As with field1, while this makes no sense to any type checker, the dataclasses module does not require the type annotation to make sense; it’s as simple as that.
- field3 has no type annotation, so it is ignored by the dataclass mechanism. But this does not mean the field is ignored by the class itself. field3 is simply a normal class variable that can be accessed on any instance of Broken or on the class itself.
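To make the two exceptions concrete, here is a minimal sketch of how ClassVar and InitVar annotations are treated differently from ordinary ones. The class Example and its attribute names are made up for illustration; only ClassVar, InitVar, dataclass and fields come from the standard library.

```python
from dataclasses import InitVar, dataclass, fields
from typing import ClassVar


@dataclass
class Example:
    # Ordinary annotated field: the annotation is stored but never checked,
    # so this deliberately mismatched default raises no error.
    name: str = 42
    # ClassVar: @dataclass inspects this annotation and does not treat
    # the attribute as a field.
    counter: ClassVar[int] = 0
    # InitVar: becomes an __init__ parameter that is handed to
    # __post_init__, but it is not a field either.
    scale: InitVar[int] = 1

    def __post_init__(self, scale):
        # Hypothetical use of the init-only value.
        self.scaled = scale * 10


e = Example(scale=3)
field_names = [f.name for f in fields(e)]
print(field_names)   # ['name']
print(type(e.name))  # <class 'int'>
print(e.scaled)      # 30
```

Only name shows up in fields(), even though all three attributes carry an annotation.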
What every field declared with a PEP 526 type annotation gives you (unless @dataclass is explicitly told to ignore it) is a place in the generated methods. In your example above, only field3 is ignored by @dataclass. Accordingly, this field is not part of the provided __init__() (you can’t instantiate a Broken(field3="foo")), and it’s not part of the provided __repr__(), __hash__() or any other method.
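A quick way to check this yourself, as a sketch that reuses a shortened version of the Broken class from the question (field2 is left out for brevity; the variable init_accepts_field3 is just a helper name for this example):

```python
from dataclasses import dataclass


@dataclass
class Broken:
    field1: str = "default_string1",  # trailing comma kept from the question
    field3 = "default_string3"        # no annotation: plain class attribute


b = Broken()
print(repr(b))        # field3 does not appear in the generated __repr__
print(Broken.field3)  # still reachable on the class
print(b.field3)       # and on the instance

try:
    Broken(field3="foo")
    init_accepts_field3 = True
except TypeError:
    # The generated __init__ has no parameter called field3.
    init_accepts_field3 = False
print(init_accepts_field3)
```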
In other words, the type annotation is simply a marriage of convenience: The @dataclass decorator only cares about type-annotated class variables, and uses type annotations for its own purposes while doing so (see the KW_ONLY pseudo-type). Beyond that, normal Python duck-typing rules apply: You are free to lie about your duck as long as it quacks properly.
So, one key thing to understand: a “data class” isn’t a different kind of class. The dataclasses.dataclass decorator is a code generator that produces various boilerplate methods; the resulting type definition is the same as one you could write manually without the decorator. The whole point is to avoid boilerplate, and the annotation syntax was chosen because it is expressive and succinct. That’s it.
If you don’t want to use annotations, you can just write a normal class definition statement with no annotations, and it will function exactly the same as a class generated by the dataclasses.dataclass code generator. Of course, you would be writing a bunch of boilerplate.
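For comparison, here is a rough sketch of what that boilerplate looks like. The classes Generated and HandWritten are made-up examples, and the hand-written version only approximates the default output of the decorator (__init__, __repr__ and __eq__); it is not a copy of the code the decorator actually emits.

```python
from dataclasses import dataclass


@dataclass
class Generated:
    name: str
    count: int = 0


class HandWritten:
    # No annotations needed: this is just an ordinary class.
    def __init__(self, name, count=0):
        self.name = name
        self.count = count

    def __repr__(self):
        return f"HandWritten(name={self.name!r}, count={self.count!r})"

    def __eq__(self, other):
        # Compare field by field, but only against the same class.
        if other.__class__ is self.__class__:
            return (self.name, self.count) == (other.name, other.count)
        return NotImplemented


g = Generated("a", 1)
h = HandWritten("a", 1)
print(g)
print(h)  # HandWritten(name='a', count=1)
print(g == Generated("a", 1), h == HandWritten("a", 1))  # True True
```

Both classes behave the same way for construction, printing and comparison; the decorator just saves you from typing the second one.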