GKE Regional Disk for PVC and StorageClass failover

2/4/2021

I have one pod that requires a persistent disk. The pod currently runs in us-central1-a, and if that zone goes down I want to migrate it to another zone in the region (us-central1-*) without data loss.

Is it possible to migrate the pod to another zone (where I know the disk exists) and have it use the regional disk in the new zone?

Approach 1

With the StorageClass below, my pod can never bind a claim and never starts. My understanding was that a regional disk configured with all zones would make the disk available in every zone in case of a zone failure. I do not understand why the claim never binds.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: regionalpd-storageclass
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  replication-type: regional-pd
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: topology.kubernetes.io/zone
    values:
    - us-central1-a
    - us-central1-b
    - us-central1-c
    - us-central1-f

Error: My PVC status is always Pending, and the pod events show:

  Normal   NotTriggerScaleUp  106s                cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added):
  Warning  FailedScheduling   62s (x2 over 108s)  default-scheduler   0/8 nodes are available: 8 node(s) didn't find available persistent volumes to bind.

Approach 2

This storage config lets me run my pod in 2 of the 4 zones: the initial zone plus one other zone chosen at random by the provisioner. When I intentionally drain the initial zone and move the pod out of it, I get the errors below unless I'm lucky enough to have landed on the other, randomly provisioned zone. Is this behaviour intentional because Google assumes a very low chance of two zones failing? If one of the two zones does fail, wouldn't I have to provision another disk in yet another zone just in case?

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: regionalpd-storageclass
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  replication-type: regional-pd
volumeBindingMode: WaitForFirstConsumer

Errors:

Normal   NotTriggerScaleUp  4m49s                  cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added):
  Warning  FailedScheduling   103s (x13 over 4m51s)  default-scheduler   0/4 nodes are available: 2 node(s) had volume node affinity conflict, 2 node(s) were unschedulable.
  Warning  FailedScheduling   43s (x2 over 43s)      default-scheduler   0/3 nodes are available: 1 node(s) were unschedulable, 2 node(s) had volume node affinity conflict.
  Warning  FailedScheduling   18s (x3 over 41s)      default-scheduler   0/2 nodes are available: 2 node(s) had volume node affinity conflict.
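The volume node affinity conflicts come from the node affinity the provisioner sets on the PersistentVolume backing the regional disk, which pins it to its two replica zones. A dynamically provisioned PV looks roughly like the sketch below (the PV name, disk name, and zone values are illustrative, and the zone label key can differ by cluster version), so only nodes in those two zones can mount it:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-example                   # illustrative name
spec:
  storageClassName: regionalpd-storageclass
  capacity:
    storage: 200Gi
  accessModes:
    - ReadWriteOnce
  gcePersistentDisk:
    pdName: example-regional-disk     # illustrative disk name
    fsType: ext4
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values:
          # the two zones holding the regional disk's replicas
          - us-central1-a
          - us-central1-c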

My PVC

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: my-pvc
  namespace: mynamespace
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi
  storageClassName: regionalpd-storageclass

My Pod volume

volumes:
  - name: console-persistent-volume
    persistentVolumeClaim:
      claimName: my-pvc
-- rubio
gce-persistent-disk
google-cloud-platform
google-kubernetes-engine
kubernetes
persistent-volumes

1 Answer

2/4/2021

A regional Persistent Disk on Google Cloud is replicated across exactly two zones, so your StorageClass must restrict allowedTopologies to exactly two zones.

See the example StorageClass in Using Kubernetes Engine to Deploy Apps with Regional Persistent Disks, and more details in the GKE documentation: Provisioning regional persistent disks.
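As a minimal sketch, assuming us-central1-a and us-central1-b are the two zones you want (any two zones in the region work), the StorageClass from the question would become:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: regionalpd-storageclass
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  replication-type: regional-pd
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: topology.kubernetes.io/zone
    values:
    # exactly two zones: the regional PD is replicated between these
    - us-central1-a
    - us-central1-b

With exactly two zones allowed, the scheduler can place the pod in either of them, and the disk's replica is already present in whichever zone it lands in.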

-- Jonas
Source: StackOverflow