I was reading through the SymPy 1.13.0 release notes when an entry caught my attention (emphasis mine):
The hash function for Floats and expressions involving Float now respects the hash invariant that if a == b then hash(a) == hash(b). This ensures that it is possible to use such expressions in a guaranteed deterministic way in Python’s core set and dict data structures. It is however not recommended to mix up sympy Basic objects with non-sympy number types such as core Python’s float or int in the same set/dict.
I mixed up Python’s numbers with SymPy’s numbers in a set, and I can’t explain what’s going on (sympy 1.13.0).
I thought elements of a set all have different hashes. For example, in the following code block there are four strings, a, b, c1, c2. They are distinct objects in memory. However, c1 is equal to c2, so they have the same hash. Hence, set([a, b, c1, c2]) only contains three elements, as expected:
a, b = "a", "b"
c1 = "This is a test"
c2 = "This is a test"
t = [a, b, c1, c2]
print("ids:", ", ".join([str(id(e)) for e in t]))
print("hashes:", ", ".join([str(hash(e)) for e in t]))
s = set(t)
print(s)
# ids: 11791568, 11792624, 135187170594160, 135187170598448
# hashes: -8635426522778860779, 3774980000107278733, 3487163462126586929, 3487163462126586929
# {'This is a test', 'a', 'b'}
In the following example there are three different objects in memory, n1, n2, n3. They all generate the same hash. I expected the resulting set to contain only one element. Instead, it contains two elements, apparently sharing the same hash.
from sympy import Float
n1 = 2.0
n2 = Float(2.0)
n3 = Float(2.0, precision=80)
t = [n1, n2, n3]
print("ids:", ", ".join([str(id(e)) for e in t]))
print("hashes:", ", ".join([str(hash(e)) for e in t]))
s = set(t)
print(s)
# ids: 135913432385776, 135912654307472, 135912654307632
# hashes: 2, 2, 2
# {2.0, 2.0000000000000000000000}
What is going on?
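One thing that might help narrow this down: a set treats two elements as duplicates only when their hashes match and they also compare equal with ==, so equal hashes alone do not merge elements. Below is a minimal pure-Python sketch of that behaviour. The class SameHash is hypothetical and made up for illustration; it is not part of SymPy and says nothing about how Float is implemented.

```python
class SameHash:
    """Hypothetical class: every instance hashes to 2; equality compares labels."""

    def __init__(self, label):
        self.label = label

    def __hash__(self):
        return 2  # deliberately constant, so all instances collide

    def __eq__(self, other):
        return isinstance(other, SameHash) and self.label == other.label

    def __repr__(self):
        return f"SameHash({self.label!r})"


x, y, z = SameHash("x"), SameHash("y"), SameHash("x")
s = set([x, y, z])
print("hashes:", hash(x), hash(y), hash(z))
print("x == y:", x == y, "| x == z:", x == z)
print("set size:", len(s))
# hashes: 2 2 2
# x == y: False | x == z: True
# set size: 2
```

If something similar applies to n1, n2, n3, printing n1 == n2, n2 == n3 and n1 == n3 should show which pair compares unequal despite the shared hash.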