I have a hierarchy of frozen dataclasses, and I need to implement hashing for these dataclasses such that the hashes of every unique instance across the whole hierarchy are unique. I am defining “unique” here to mean that either the fields or the type of two instances differ. However, since the default dataclass __hash__
is a function only of the dataclass fields and not of the type, instances of different dataclass types which share the same fields hash to the same value by default.
Simplified Example
Below is a 3-level nested dataclass.
from dataclasses import dataclass
import abc
@dataclass(frozen=True)
class Element(abc.ABC):
pass
@dataclass(frozen=True)
class StepType(Element, abc.ABC):
@classmethod
def name(cls):
return cls.__name__
class Skip(StepType): pass
class Hop(StepType): pass
@dataclass(frozen=True)
class Stepper(Element, abc.ABC):
step_type: StepType = Skip()
foo: int = 1
@abc.abstractmethod
def step(self):
pass
@dataclass(frozen=True)
class Single(Stepper):
def step(self):
return self.step_type.name() + " once"
@dataclass(frozen=True)
class Double(Stepper):
def step(self):
return self.step_type.name() + " twice"
@dataclass(frozen=True)
class Speed(Element, abc.ABC):
@abc.abstractmethod
def how_fast(self):
pass
@dataclass(frozen=True)
class Slow(Speed):
def how_fast(self):
return "real slow"
@dataclass(frozen=True)
class Fast(Speed):
def how_fast(self):
return "quickly"
@dataclass(frozen=True)
class Walker:
speed: Speed = Slow()
stepper: Stepper = Single()
def walk(self):
return " ".join([self.stepper.step(), self.speed.how_fast()])
Inheritance Structure
Here’s summary of the inheritance structure of Element
.
Element
|- StepType
| |- Skip
| |- Hop
|
|- Stepper
| |- Single
| |- Double
|
|- Speed
|- Slow
|- Fast
Nesting Structure
Here’s summary of the nested fields structure of Walker
.
Walker
|- stepper: Stepper
| |- step_type: StepType
|
|- speed: Speed
Tested Behavior
Different instances of Walker
and its Element
instances hash to the same value.
a = Walker(speed=Slow(), stepper=Single())
b = Walker(speed=Fast(), stepper=Double(step_type=Hop()))
print(f"a: {hash(a)}nb: {hash(b)}")
>> a: -5704360693866892300
b: -5704360693866892300
Desired Behavior
I would like the hashes of unique Walker
instances to be reliably different and maintain functionality for various dataclass features.
Constraints
- This need not be done via overriding
__hash__
, it could be via creating new methods inWalker
and/orElement
. But it must be recursive sinceElement
instances can contain otherElement
s to arbitrary depths. Ideally, I’d like to leverage the reliable default dataclass__hash__
rather than overriding it completely. - The hashes will be saved to and read from disk, so they must be consistent across runs. E.g., they can’t depend on
id(self)
likeobject.__hash__
does. - Everything except
Walker
must inherit fromElement
. I cannot refactor any of the dataclasses into regular classes, Enums, etc. I must use the dataclass framework.
Ideas
I think it would work well to add repr(type(self))
to the hash function as well as the fields.
Idea 1
My first idea was something like:
class Element:
def hash(self):
return hash(self) ^ hash(repr(type(self)))
But this doesn’t recurse down to nested Element
s.
Idea 2
This is the best idea I have at the time of posting.
class Element:
def __hash__(self):
return hash(
hash(repr(type(self))) ^
(hash(
hash((key, val)) for key, val in self.__dict__.items()
)**3
)
)
This does recurse, and it passes my unit tests, but it overrides the built-in dataclass __hash__
. I’m worried this could create some broken corner cases. Any input on a more robust method, pointing out where this would break, or a simple thumbs up is appreciated.