McRouter loses key on scale up

7/2/2018

I'm running a k8s cluster in GKE and followed their walkthrough for putting together a McRouter setup with memcached. Initially we were using Consul key stores, but our cache is too large and causes Consul to use too much memory, so we decided to test memcached in its place. I spun up the mcrouter DaemonSet with a single memcached pod and everything works just fine. I then added some test keys; get and delete both work. The issue comes when I leave keys in place and scale.

I scale up the memcached StatefulSet and add the second server name to the ConfigMap for mcrouter. Once I see the new server in the output of stats servers, I run a get and one of the keys is no longer there. I've telnetted to port 11211 on the original memcached pod, run a get there, and can retrieve the same key just fine. The config provided in the ConfigMap is below:

  {
    "pools": {
      "A": {
        "servers": [
          "memcached-0.memcached.default.svc.cluster.local:11211",
          "memcached-1.memcached.default.svc.cluster.local:11211"
        ]
      }
    },
    "route": "PoolRoute|A"
  }

I've also moved mcrouter to a StatefulSet to limit it to a single pod, and switched to the official Docker image rather than the one in the k8s example Helm chart, with no luck. No matter what I do, I keep getting a "not found" from a get through mcrouter on at least one key after scaling, while other keys are still found fine. Help?

-- Alex Liffick
google-kubernetes-engine
high-availability
memcached

1 Answer

7/13/2018

With PoolRoute, the destination server is chosen by hashing the key. The hash of a given key does not change, but when you scale up the memcached StatefulSet and add a new server to the pool, the mapping from hash to server does. Some keys now route to the new, empty server, which is why mcrouter cannot find your key even though it is still present on the original pod.
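To make the remapping concrete, here is a minimal Python sketch. It is an illustration only: `pick_server` and `moved_keys` are hypothetical helpers, and the MD5-plus-modulo scheme is a stand-in, not mcrouter's actual hash function (which is designed to move fewer keys when the pool changes). The point it shows is the same in either case: once a server is added, some existing keys are routed to a server that never stored them.

```python
import hashlib

def pick_server(key, servers):
    """Pick a server by hashing the key (illustrative modulo scheme, not mcrouter's)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

def moved_keys(keys, old_servers, new_servers):
    """Return the keys whose destination differs between the two pools."""
    return [
        k for k in keys
        if pick_server(k, old_servers) != pick_server(k, new_servers)
    ]

# Server names taken from the question's config.
OLD_POOL = ["memcached-0.memcached.default.svc.cluster.local:11211"]
NEW_POOL = OLD_POOL + ["memcached-1.memcached.default.svc.cluster.local:11211"]

# Hypothetical test keys.
keys = [f"test-key-{i}" for i in range(1000)]
moved = moved_keys(keys, OLD_POOL, NEW_POOL)
print(f"{len(moved)} of {len(keys)} keys now route to a different server")
```

Every key in `moved` was written to memcached-0 but is now looked up on memcached-1, so a get through the router misses while a direct get on the original pod still succeeds.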

Have a look at the list of route handles in the wiki: https://github.com/facebook/mcrouter/wiki/List-of-Route-Handles

A configuration along these lines, with a separate fallback pool, may help:

{
  "pools": {
    "A": {
      "servers": [
        "memcached-0.memcached.default.svc.cluster.local:11211",
        "memcached-1.memcached.default.svc.cluster.local:11211"
      ]
    },
    "fallback": {
      "servers": [
        "memcached-fallback.memcached.default.svc.cluster.local:11211"
      ]
    }
  },
  "route": {
    "type": "FailoverRoute",
    "children": [
      "PoolRoute|A",
      "PoolRoute|fallback"
    ]
  }
}
-- Luc Charpentier
Source: StackOverflow