Suppose I have a class that looks vaguely like this:
class Foo(object):
_value = None # XXX: see below
def __init__(self, value):
self._value = value
@property
def value(self):
"""The value of this Foo."""
return self._value
def __hash__(self):
return hash(self.value)
def __eq__(self, other):
return type(self) is type(other) and self.value == other.value
(I know the _value = None
line is wrong; in my actual code, _value
is a data descriptor with a default value. It never raises AttributeError
even if it hasn’t been set yet, and I’m just trying to emulate that behavior in as few lines of code as possible. I could rip the behavior out, but it would significantly complicate things in other areas of my codebase.)
Have I violated the rules? Should I be setting _value
in Foo.__new__()
instead? Clearly it needs to be set somewhere; there will be a point in time at which the hash()
of our instance changes. When should that happen, exactly?
If there is not anything wrong with the above code, then why is the following code wrong? Logically, this combination is illegal, but I’m not sure which class has broken the rules:
bar = set()
class Baz(object):
def __init__(self, *args, **kwargs):
bar.add(self)
super(Baz, self).__init__(*args, **kwargs) # play nice with MI
class Qux(Baz, Foo):
pass
assert Qux('quuux') in bar # assertion fires because the hash() changed
1
I would say Baz
has broken the rules. By using self
as hashable before even invoking other __init__
methods in the MRO, it assumes that those methods do nothing to influence the hash. That is a wrong assumption. Although technically it is safer to construct immutable classes so that __new__
sets up all state and makes it completely, utterly, immutable (like tuple
and namedtuple
), this is often very clunky and unnecessary.
Setting some private variables that are read-only for the outside world and never mutated in any methods (other than __init__
where they are first set) is a perfectly valid and common way to implement immutable classes.
This ties into a more general point: Doing anything at all with an object while it’s still being initialized is very risky and need to be thought out very well. Many invariants may be violated before __init__
has finished, so most code can’t meaningfully interact with a partially-initialized object. Luckily, a sane __new__
implementation takes care that the new object is first passed to __init__
(and, of course, any code that __init__
invokes itself — which, again, needs to be carefully chosen).
The instance should be immutable at the moment it is used as a key in a dict. Before that, it can mutate. But preferably it should be incapable of mutation after the moment of construction, just to avoid programming errors.
The builder pattern fits this well: the final .build()
step returns an immutable object. You can additionally prevent inadvertent attribute modification by setting instance’s __setattr__
to something raising an exception.
Unfortunately, you did not provide a minimal runnable example of the problem in the second code fragment. Next time please do. I only can suspect that somewhere in the real problematic code you conflate self
of the original object with something else. (The idea of updating a global hash table from a constructor does not strike me as a particularly elegant one, though memoizing decorators usually do this reliably.)
2