How does kubernetes provide HA for stateful applications with volumes attached?

4/28/2020

I am unable to configure my stateful application to be resilient to the failure of the Kubernetes worker node on which my application pod is running.

$ kk get pod -owide
NAME                                READY   STATUS    RESTARTS   AGE     IP                NODE               NOMINATED NODE   READINESS GATES
example-openebs-97767f45f-xbwp6     1/1     Running   0          6m21s   192.168.207.233   new-kube-worker1   <none>           <none>

Once I take the worker down, kubernetes notices that the pod is not responding and schedules it to a different worker.

marek649@new-kube-master:~$ kk get pod -owide
NAME                                READY   STATUS              RESTARTS   AGE   IP                NODE               NOMINATED NODE   READINESS GATES
example-openebs-97767f45f-gct5b     0/1     ContainerCreating   0          22s   <none>            new-kube-worker2   <none>           <none>
example-openebs-97767f45f-xbwp6     1/1     Terminating         0          13m   192.168.207.233   new-kube-worker1   <none>           <none>

This is great, but the new pod is not able to start because it tries to attach the same PVC that the old pod was using, and Kubernetes does not release the attachment from the old (unresponsive) node.

$ kk describe pod example-openebs-97767f45f-gct5b
Annotations:    <none>
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  ReplicaSet/example-openebs-97767f45f
Containers:
  example-openebs:
    Container ID:   
    Image:          nginx
    Image ID:       
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /usr/share/nginx/html from demo-claim (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4xmvf (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  demo-claim:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  example-pvc
    ReadOnly:   false
  default-token-4xmvf:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-4xmvf
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason              Age   From                       Message
  ----     ------              ----  ----                       -------
  Normal   Scheduled           2m9s  default-scheduler          Successfully assigned default/example-openebs-97767f45f-gct5b to new-kube-worker2
  Warning  FailedAttachVolume  2m9s  attachdetach-controller    Multi-Attach error for volume "pvc-911f94a9-b43a-4cac-be94-838b0e7376e8" Volume is already used by pod(s) example-openebs-97767f45f-xbwp6
  Warning  FailedMount         6s    kubelet, new-kube-worker2  Unable to attach or mount volumes: unmounted volumes=[demo-claim], unattached volumes=[demo-claim default-token-4xmvf]: timed out waiting for the condition
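
For reference, the workload is roughly the following Deployment plus a ReadWriteOnce PVC. This is a reconstruction from the pod spec above, not my exact manifests; the storage class name and requested size are assumptions based on the OpenEBS Jiva defaults.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
    - ReadWriteOnce                           # single-node attach; this is what triggers the Multi-Attach error
  storageClassName: openebs-jiva-default      # assumption; any Jiva storage class behaves the same way
  resources:
    requests:
      storage: 5Gi                            # assumption
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-openebs
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-openebs
  template:
    metadata:
      labels:
        app: example-openebs
    spec:
      containers:
        - name: example-openebs
          image: nginx
          ports:
            - containerPort: 80
          volumeMounts:
            - name: demo-claim
              mountPath: /usr/share/nginx/html
      volumes:
        - name: demo-claim
          persistentVolumeClaim:
            claimName: example-pvc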

I am able to resolve this situation by manually force-deleting the pods, unbinding the PV, and recreating the pods, but this is far from the high availability I am expecting.
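
For completeness, the manual recovery I perform looks roughly like this. The VolumeAttachment step is an assumption and only applies if the volume is attached through a CSI driver; the exact object names will differ.

# Force-delete the pod that is stuck in Terminating on the dead node
$ kubectl delete pod example-openebs-97767f45f-xbwp6 --force --grace-period=0

# If the volume is managed by a CSI driver, remove the stale attachment record
# so the volume can be attached on another node (name is a placeholder)
$ kubectl get volumeattachments
$ kubectl delete volumeattachment <attachment-referencing-the-pvc>

# The ReplicaSet then recreates the pod, which can now mount the PVC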

I am using OpenEBS Jiva volumes, and after manual intervention I am able to restore the container with the correct data on the PV, which means the data is replicated to the other nodes correctly.

Can someone please explain what I am doing wrong and how to achieve fault tolerance for k8s applications with volumes attached?

I found this related issue, but I don't see any suggestions on how to overcome it: https://github.com/openebs/openebs/issues/2536

-- marek
high-availability
kubernetes

3 Answers

4/28/2020

To deploy stateful applications, Kubernetes has the StatefulSet object, which might help you in this case.

StatefulSets are valuable for applications that require one or more of the following (a minimal sketch follows the list):

  • Stable, unique network identifiers.
  • Stable, persistent storage.
  • Ordered, graceful deployment and scaling.
  • Ordered, automated rolling updates.
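
A minimal sketch of a StatefulSet with a per-pod volumeClaimTemplate; all names and the storage class here are placeholders, not taken from the question.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example-openebs
spec:
  serviceName: example-openebs          # headless Service providing stable network identities
  replicas: 1
  selector:
    matchLabels:
      app: example-openebs
  template:
    metadata:
      labels:
        app: example-openebs
    spec:
      containers:
        - name: example-openebs
          image: nginx
          volumeMounts:
            - name: demo-claim
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates:                 # each replica gets its own PVC (demo-claim-example-openebs-0, ...)
    - metadata:
        name: demo-claim
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: openebs-jiva-default   # placeholder
        resources:
          requests:
            storage: 5Gi
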
-- hoque
Source: StackOverflow

4/30/2020

For unmanaged Kubernetes Clusters, this is a hard problem that applies to all types of RWO volumes.

There have been several discussions around this in the Kubernetes community, summarized in the upstream issues tracking this problem.

The current thought process is to use NodeTolerations to build a solution and implement it via the CSI driver.

At OpenEBS, when we looked at how the cloud providers handle this case, we found that when a node is shut down, its corresponding node object is deleted from the cluster. There is no harm in this operation, since the node object is recreated when the node comes back online.
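
If this fits your environment, the equivalent manual step is simply removing the node object for the failed worker, which lets the attach/detach controller release the volume so the replacement pod can mount it; the node object is recreated when the kubelet on that node rejoins the cluster.

$ kubectl delete node new-kube-worker1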

-- Kiran Mova
Source: StackOverflow

4/28/2020

It will eventually release the volume; the usual limiting factor is the network storage system being slow to detect that the volume is unmounted. But you are correct that it's a limitation. The usual fix is to use a volume type that supports multiple mounts instead, such as NFS or CephFS.
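
For illustration, a PVC backed by an RWX-capable storage class can request ReadWriteMany so the volume may be mounted on more than one node at a time; the storageClassName below is a placeholder for whatever NFS or CephFS provisioner is installed.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc-rwx
spec:
  accessModes:
    - ReadWriteMany              # allows the volume to be mounted on multiple nodes
  storageClassName: nfs-client   # placeholder; use your NFS/CephFS storage class
  resources:
    requests:
      storage: 5Gi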

-- coderanger
Source: StackOverflow