I’ve got a use case for what I’ll call an “invertible dictionary” — because I don’t know a better term for it. I’ve got a set of data where the values for some properties are repeated all over the place… think about a database where there are a million records but they share maybe a few dozen unique values for cities or states or countries or whatever.
The optimization assumes that you have a much smaller number of unique values, and that you’re doing lookups on the key string far more often than you’re doing inserts or modifications.
What I came up with was what I call a string dictionary, which I hacked up below. It basically takes a dict, then builds a list of the unique entries so we can use the index of that list to very quickly get the string. The indirection allows me to store just the index for each record instead of the whole string.
Anyway… this works fine, but it seems like python always has some builtin or well-maintained version of everything I write like this, so I figured I’d ask if such a thing already exists…
class StrMap(dict):
""" provides a fast mapping of strings to integer indices in a list.
all keys of type int have values of type str
all keys of type str have values of type int
examples: d['ace'] : 0, d['bob'] : 1, d[0] = 'ace', d[1] = 'bob' and so on
"""
def __init__(self):
dict.__init__(self)
self.L = []
def add(self, newval : str) -> int:
assert isinstance(newval, str)
if newval not in self:
newidx = len(self.L)
self.L.append(newval)
self[newval] = newidx
return newidx
def retr(self, multikey, addOnMiss=True):
if isinstance(multikey, int):
return self.L[multikey]
elif isinstance(multikey, str):
try:
return self[multikey]
except KeyError:
if addOnMiss:
return self.add(multikey)
else:
print(f"addOnMiss is not set and string key {multikey} is not in dict")
raise KeyError
7
The bidict
library provides bidirectional mapping, you can efficiently look up keys by values and vice versa.
from bidict import bidict
my_dict = bidict({'ace': 0, 'bob': 1})
print(my_dict['ace']) # 0
print(my_dict.inverse[1]) # 'bob'
You can read more about it in the bidict.readthedocs.io