I have been reading up on distributed hashing. I learnt that consistent hashing is used for distributing keys among cache machines. I also learnt that a key is duplicated on multiple caches to handle failure of cache hosts. But what I have come across on memcached doesn't seem to align with this. I read that all cache nodes are independent of each other and that if a cache goes down, requests go to the DB. There's no mention of a cache miss on a host resulting in that host directing the request to another host which could either be holding the key or be nearer to the key.
Can you please tell me how these two fit together? Is memcached a very preliminary form of distributed hashing which doesn't have much sophistication?
There are two parts to this story: memcached servers and libmemcached clients.
Memcached servers are indeed independent of each other. A memcached server is just an efficient key-value store implemented as an in-memory hash table. What makes memcached distributed is the client, which in most implementations can connect to a pool of servers. Typical implementations use consistent hashing, which means that when you add a server to or remove a server from a pool of N servers, only roughly 1/N of the keys have to be remapped. Keys are typically not duplicated across hosts, because memcached is not meant to be a persistent store and gives no guarantee that your value will persist (for example, when running out of assigned memory, a memcached server drops the least recently used (LRU) elements). It is therefore assumed that your application can handle missing keys.
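To make the client-side part concrete, here is a minimal sketch of a consistent-hashing ring such as a memcached client might use to pick a server. This is not the code of any particular client library; the class name, server strings, and the choice of MD5 and virtual nodes are just illustrative assumptions.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    # Stable hash so key placement does not change between runs or processes
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Client-side hash ring: maps keys to servers so that removing one of
    N servers remaps only the keys that lived on that server."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas   # virtual nodes per server, smooths the distribution
        self._ring = []            # sorted list of (ring point, server)
        for server in servers:
            self.add_server(server)

    def add_server(self, server):
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            bisect.insort(self._ring, (point, server))

    def remove_server(self, server):
        self._ring = [(p, s) for p, s in self._ring if s != server]

    def get_server(self, key):
        # Walk clockwise from the key's point to the next server on the ring
        point = _hash(key)
        idx = bisect.bisect(self._ring, (point, "")) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["cache1:11211", "cache2:11211", "cache3:11211"])
print(ring.get_server("user:42"))
ring.remove_server("cache2:11211")   # only keys that lived on cache2 move
print(ring.get_server("user:42"))
```

With this scheme, removing cache2:11211 only remaps the keys whose ring points fell on cache2's virtual nodes; keys owned by the other servers keep hashing to the same place, which is exactly the 1/N property mentioned above.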
if a cache goes down, requests go to DB
That only applies if you happen to use memcached to cache the results of DB queries, which is just one of many possible uses. Memcached is a generic key-value store and can be used for other purposes as well.
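As an illustration of that point, here is a sketch of the cache-aside pattern an application typically wraps around memcached. FakeCache and load_user_from_db are made-up stand-ins for a real memcached client and a real database query; the point is that the application, not memcached, handles the miss and falls back to the DB.

```python
# Cache-aside sketch: on a miss (or an unreachable cache node) the application
# goes to the database and repopulates the cache for later requests.
class FakeCache:
    """Stand-in for a memcached client; a dict with get/set."""
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)          # None means cache miss
    def set(self, key, value, expire=0):
        self._data[key] = value

def load_user_from_db(user_id):
    # Pretend this runs a SQL query
    return {"id": user_id, "name": "example"}

def get_user(cache, user_id):
    key = f"user:{user_id}"
    user = cache.get(key)
    if user is None:                        # miss, or the cache node is down
        user = load_user_from_db(user_id)   # fall back to the database
        cache.set(key, user, expire=300)    # repopulate the cache
    return user

cache = FakeCache()
print(get_user(cache, 42))   # first call hits the DB
print(get_user(cache, 42))   # second call is served from the cache
```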
The memcached servers do not communicate with each other. It is the responsibility of the client to choose which memcached server to use. This avoids any coordination overhead between servers, and the client does not need to look up a key on every server: it computes which server owns the key and talks only to that one. That, together with keeping everything in RAM, is why memcached is so fast. Although many cached objects will be missed after servers are rearranged or one fails, a cached object can still be found by chance. For example, suppose you have 3 memcached servers, each holding one of the keys key1, key2 and key3. When the server holding key3 goes down, the client redistributes the keys over the remaining two servers (say, by odd and even). A request for key3 now goes to the server holding key1 and misses, but a request for key1 still lands on the same server and is a hit.
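To make that scenario concrete, here is a small sketch that maps three keys onto three servers with a simple hash-modulo rule, removes one server, and checks which keys are still found. The server and key names are invented, and note that this naive modulo scheme remaps more keys than the consistent hashing described in the first answer would.

```python
import zlib

def pick_server(key, servers):
    # Naive client-side placement: hash the key, take it modulo the number
    # of live servers in the client's list.
    return servers[zlib.crc32(key.encode()) % len(servers)]

servers = ["cache1", "cache2", "cache3"]
keys = ["key1", "key2", "key3"]

before = {k: pick_server(k, servers) for k in keys}
servers.remove("cache3")                          # cache3 fails; client drops it
after = {k: pick_server(k, servers) for k in keys}

for k in keys:
    # A key is still found only if the client happens to ask the same
    # server that held it before the failure.
    status = "still found" if before[k] == after[k] else "cache miss"
    print(f"{k}: {before[k]} -> {after[k]} ({status})")
```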