Unfortunately I have to load a dictionary containing an invalid name (which I can’t change):
dict = {..., "invalid-name": 0, ...}
I would like to cast this dictionary into a dataclass object, but I can’t define an attribute with this name.
from dataclasses import dataclass

@dataclass
class Dict:
    ...
    invalid-name: int  # can't do this
    ...
The only solution I could find is to change the dictionary key into a valid one right before casting it into a dataclass object:
dict["valid_name"] = dict.pop("invalid-name")
But I would like to avoid using string literals…
Is there any better solution to this?
2
One solution would be to use dict-to-dataclass. As described in its documentation, it offers two relevant features:
1. Passing dictionary keys
It’s probably quite common that your dataclass fields have the same names as the dictionary keys they map to, but in case they don’t, you can pass the dictionary key as the first argument (or the dict_key keyword argument) to field_from_dict:
from dataclasses import dataclass
from dict_to_dataclass import DataclassFromDict, field_from_dict

@dataclass
class MyDataclass(DataclassFromDict):
    name_in_dataclass: str = field_from_dict("nameInDictionary")

origin_dict = {
    "nameInDictionary": "field value"
}
dataclass_instance = MyDataclass.from_dict(origin_dict)

>>> dataclass_instance.name_in_dataclass
"field value"
2. Custom converters
If you need to convert a dictionary value that isn’t covered by the defaults, you can pass in a converter function using field_from_dict’s converter parameter:
def yes_no_to_bool(yes_no: str) -> bool:
    return yes_no == "yes"

@dataclass
class MyDataclass(DataclassFromDict):
    is_yes: bool = field_from_dict(converter=yes_no_to_bool)

dataclass_instance = MyDataclass.from_dict({"is_yes": "yes"})

>>> dataclass_instance.is_yes
True
1
The following code filters out the keys that do not correspond to a field of the dataclass:
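import dataclasses

@dataclasses.dataclass
class ClassDict:
    valid_name0: str
    valid_name1: int
    ...

dict = {..., "invalid-name": 0, ...}
# dataclasses.fields() returns a tuple of Field objects (it has no .keys()),
# so collect the field names directly.
field_names = {f.name for f in dataclasses.fields(ClassDict)}
dict = {k: v for k, v in dict.items() if k in field_names}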
However, I’m sure there should be a better way to do it since this is a bit hacky.
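A possibly less hacky variant of the same idea, using only the standard library: store the original dictionary key in each field's metadata and let one generic helper do both the renaming and the filtering. This is only a sketch. The "key" metadata entry and the from_dict helper are names made up for this illustration, not part of dataclasses itself.

```python
import dataclasses


@dataclasses.dataclass
class ClassDict:
    # "key" is a hypothetical metadata entry naming the source dictionary key.
    valid_name: int = dataclasses.field(metadata={"key": "invalid-name"})
    other: str = ""


def from_dict(cls, data):
    """Build `cls` from `data`, renaming keys via metadata and dropping unknown keys."""
    kwargs = {}
    for f in dataclasses.fields(cls):
        # Fall back to the field's own name when no "key" metadata is given.
        key = f.metadata.get("key", f.name)
        if key in data:
            kwargs[f.name] = data[key]
    return cls(**kwargs)


obj = from_dict(ClassDict, {"invalid-name": 0, "other": "x", "unrelated": 1})
print(obj)  # ClassDict(valid_name=0, other='x')
```

The string literal still appears once, but only next to the field it belongs to, and the input dictionary is never modified.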
2
I would define a from_dict class method anyway, which would be a natural place to make the change.
from dataclasses import dataclass

@dataclass
class MyDict:
    ...
    valid_name: int
    ...

    @classmethod
    def from_dict(cls, d):
        d['valid_name'] = d.pop('invalid-name')
        return cls(**d)

md = MyDict.from_dict({'invalid-name': 3, ...})
Whether you should modify d in place or do something to avoid unnecessary copies is another matter.
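If mutating the caller's dictionary is a concern, one option is to rename the key on a shallow copy. The sketch below assumes a single-field MyDict for brevity and accepts the cost of one extra copy per call:

```python
from dataclasses import dataclass


@dataclass
class MyDict:
    valid_name: int

    @classmethod
    def from_dict(cls, d):
        # Work on a shallow copy so the caller's dictionary is left untouched.
        d = dict(d)
        d["valid_name"] = d.pop("invalid-name")
        return cls(**d)


source = {"invalid-name": 3}
md = MyDict.from_dict(source)
print(md)      # MyDict(valid_name=3)
print(source)  # {'invalid-name': 3}
```

With the in-place version, calling from_dict twice on the same dictionary would raise a KeyError on the second call, because the original key has already been popped.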
Another option could be to use the dataclass-wizard library, which is likewise a de/serialization library built on top of dataclasses. It should similarly support custom key mappings, as needed in this case.
I’ve also timed it with the builtin timeit module, and found it to be (on average) about 5x faster than a solution with dict_to_dataclass. I’ve added the code I used for comparison below.
from dataclasses import dataclass
from timeit import timeit

from typing_extensions import Annotated  # Note: in Python 3.9+, can import this from `typing` instead

from dataclass_wizard import JSONWizard, json_key
from dict_to_dataclass import DataclassFromDict, field_from_dict

@dataclass
class ClassDictWiz(JSONWizard):
    valid_name: Annotated[int, json_key('invalid-name')]

@dataclass
class ClassDict(DataclassFromDict):
    valid_name: int = field_from_dict('invalid-name')

my_dict = {"invalid-name": 0}

n = 100_000

print('dict-to-dataclass: ', round(timeit('ClassDict.from_dict(my_dict)', globals=globals(), number=n), 3))
print('dataclass-wizard: ', round(timeit('ClassDictWiz.from_dict(my_dict)', globals=globals(), number=n), 3))

i1, i2 = ClassDict.from_dict(my_dict), ClassDictWiz.from_dict(my_dict)
# assert we get the same result with both approaches
assert i1.__dict__ == i2.__dict__
Results (in seconds, for 100,000 calls each) on my Mac OS X laptop:
dict-to-dataclass: 0.594
dataclass-wizard: 0.098