Velero Backups: Restoring a StatefulSet (e.g. Couchbase) leads to "Multi-Attach error for volume"

11/10/2019

I'm backing up Couchbase on Kubernetes using Velero. Backups work fine, as do restores of the namespace, PV, and PVC. However, the Couchbase node(s) fail to come up following a restore, due to what appears to be some kind of race condition that results in two PVs being created and attempting to mount to the Couchbase pod, with one PV on the wrong Kubernetes node.
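For what it's worth, the node mismatch shows up when comparing where the scheduler placed the restored pod with where the volume is attached. A minimal check, using the namespace from the events further down (the VolumeAttachment listing only applies if the cluster runs a CSI driver; with the in-tree Azure disk driver the equivalent signal is the FailedAttachVolume event shown below):

  # Node the restored pod was scheduled onto
  kubectl get pod couchbase-0 -n lolcorp-uat-az1-test-cbdeploy -o wide

  # Node(s) the volume is actually attached to (CSI clusters only)
  kubectl get volumeattachments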

Restoring and recovering non-StatefulSet applications is successful, but with a StatefulSet only backup and restore succeed, not recovery, using the following procedure (example commands follow the list):

  1. Back up a namespace containing a single StatefulSet (Couchbase) running 1 replica only.
  2. Delete the namespace completely.
  3. Restore from the backup of the namespace
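A minimal sketch of the commands behind those three steps, assuming a backup named cb-backup (the backup name is illustrative; the namespace is the one from the events below):

  # 1. Back up the namespace containing the StatefulSet
  velero backup create cb-backup --include-namespaces lolcorp-uat-az1-test-cbdeploy

  # 2. Delete the namespace completely
  kubectl delete namespace lolcorp-uat-az1-test-cbdeploy

  # 3. Restore from the backup
  velero restore create --from-backup cb-backup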

Result:

  • The single pod remains in ContainerCreating
  • Two PVs are created, both associated with the pod
  • A single PVC is created in the "Lost" state
  • The following error is seen in the event log:
23m         Warning   FailedAttachVolume       pod/couchbase-0                             Multi-Attach error for volume "pvc-1b9860c6-0208-11ea-b826-5a269cbf3473" Volume is already exclusively attached to one node and can't be attached to another
14m         Warning   FailedMount              pod/couchbase-0                             Unable to mount volumes for pod "couchbase-0_lolcorp-uat-az1-test-cbdeploy(f7092210-0231-11ea-b826-5a269cbf3473)": timeout expired waiting for volumes to attach or mount for pod "lolcorp-uat-az1-test-cbdeploy"/"couchbase-0". list of unmounted volumes=[datadir]. list of unattached volumes=[datadir default-token-xdl76]
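The state above can be confirmed directly; both PVs reference the same claim, and the claim itself reports phase "Lost":

  # Both PVs list the same namespace/claim in the CLAIM column
  kubectl get pv

  # The claim is stuck in the Lost phase
  kubectl get pvc -n lolcorp-uat-az1-test-cbdeploy

  # Full event trail for the stuck pod, including the Multi-Attach error
  kubectl describe pod couchbase-0 -n lolcorp-uat-az1-test-cbdeploy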

My question is:

  • What is the correct way to configure StatefulSet (specifically Couchbase) restores to avoid this situation?

Ideally, this should happen (a quick check for this end state follows the list):

  • PVC, PV, and Pod would be restored along with the namespace
  • The single PV would mount successfully
  • Only one claim would be associated with the PV and pod
  • The StatefulSet pod would transition to "Running"
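A quick acceptance check for that end state, assuming the volumeClaimTemplate is named datadir (per the unmounted-volume list in the events above), so the claim would be datadir-couchbase-0:

  # The pod should reach Ready within a sane timeout
  kubectl wait --for=condition=Ready pod/couchbase-0 \
    -n lolcorp-uat-az1-test-cbdeploy --timeout=300s

  # Exactly one Bound claim should back it
  kubectl get pvc datadir-couchbase-0 -n lolcorp-uat-az1-test-cbdeploy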

Details of the setup:

  • Velero version (use velero version):
0.1.1
  • Velero features (use velero client config get features):
features: <NOT SET>
  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:44:30Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.8", GitCommit:"211047e9a1922595eaa3a1127ed365e9299a6c23", GitTreeState:"clean", BuildDate:"2019-10-15T12:02:12Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes installer & version:

Azure Kubernetes Service (AKS)

  • Cloud provider or hardware configuration:

Azure

  • OS (e.g. from /etc/os-release):

Ubuntu 18.

-- Traiano Welcome
azure-aks
azure-kubernetes
kubernetes
stateful
velero

0 Answers