How do I get Kubernetes Persistent Volumes to deploy in the proper zone?

4/28/2017

I'm running a Kubernetes 1.6.2 cluster across three nodes in different zones on GKE, and I'm trying to deploy my StatefulSet, where each pod in the StatefulSet gets a PV attached to it. The problem is that Kubernetes is creating the PVs in the one zone where I don't have a node!

$ kubectl describe node gke-multi-consul-default-pool-747c9378-zls3|grep 'zone=us-central1'
            failure-domain.beta.kubernetes.io/zone=us-central1-a
$ kubectl describe node gke-multi-consul-default-pool-7e987593-qjtt|grep 'zone=us-central1'
            failure-domain.beta.kubernetes.io/zone=us-central1-f
$ kubectl describe node gke-multi-consul-default-pool-8e9199ea-91pj|grep 'zone=us-central1'
            failure-domain.beta.kubernetes.io/zone=us-central1-c

$ kubectl describe pv pvc-3f668058-2c2a-11e7-a7cd-42010a8001e2|grep 'zone=us-central1'
        failure-domain.beta.kubernetes.io/zone=us-central1-b

I'm using the standard StorageClass, which has no default zone set:

$ kubectl describe storageclass standard
Name:       standard
IsDefaultClass: Yes
Annotations:    storageclass.beta.kubernetes.io/is-default-class=true
Provisioner:    kubernetes.io/gce-pd
Parameters: type=pd-standard
Events:     <none>

So I thought that the scheduler would automatically provision the volumes in a zone where a cluster node existed, but it doesn't seem to be doing that.

For reference, here is the YAML for my StatefulSet:

apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: "{{ template "fullname" . }}"
  labels:
    heritage: {{.Release.Service | quote }}
    release: {{.Release.Name | quote }}
    chart: "{{.Chart.Name}}-{{.Chart.Version}}"
    component: "{{.Release.Name}}-{{.Values.Component}}"
spec:
  serviceName: "{{ template "fullname" . }}"
  replicas: {{default 3 .Values.Replicas}}
  template:
    metadata:
      name: "{{ template "fullname" . }}"
      labels:
        heritage: {{.Release.Service | quote }}
        release: {{.Release.Name | quote }}
        chart: "{{.Chart.Name}}-{{.Chart.Version}}"
        component: "{{.Release.Name}}-{{.Values.Component}}"
        app: "consul"
      annotations:
        pod.alpha.kubernetes.io/initialized: "true"
    spec:
      securityContext:
        fsGroup: 1000
      containers:
      - name: "{{ template "fullname" . }}"
        image: "{{.Values.Image}}:{{.Values.ImageTag}}"
        imagePullPolicy: "{{.Values.ImagePullPolicy}}"
        ports:
        - name: http
          containerPort: {{.Values.HttpPort}}
        - name: rpc
          containerPort: {{.Values.RpcPort}}
        - name: serflan-tcp
          protocol: "TCP"
          containerPort: {{.Values.SerflanPort}}
        - name: serflan-udp
          protocol: "UDP"
          containerPort: {{.Values.SerflanUdpPort}}
        - name: serfwan-tcp
          protocol: "TCP"
          containerPort: {{.Values.SerfwanPort}}
        - name: serfwan-udp
          protocol: "UDP"
          containerPort: {{.Values.SerfwanUdpPort}}
        - name: server
          containerPort: {{.Values.ServerPort}}
        - name: consuldns
          containerPort: {{.Values.ConsulDnsPort}}
        resources:
          requests:
            cpu: "{{.Values.Cpu}}"
            memory: "{{.Values.Memory}}"
        env:
        - name: INITIAL_CLUSTER_SIZE
          value: {{ default 3 .Values.Replicas | quote }}
        - name: STATEFULSET_NAME
          value: "{{ template "fullname" . }}"
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: STATEFULSET_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: datadir
          mountPath: /var/lib/consul
        - name: gossip-key
          mountPath: /etc/secrets
          readOnly: true
        - name: config
          mountPath: /etc/consul
        - name: tls
          mountPath: /etc/tls
        lifecycle:
          preStop:
            exec:
              command:
                - /bin/sh
                - -c
                - consul leave
        livenessProbe:
          exec:
            command:
            - consul
            - members
          initialDelaySeconds: 300
          timeoutSeconds: 5
        command:
          - "/bin/sh"
          - "-ec"
          - "/tmp/consul-start.sh"
      volumes:
      - name: config
        configMap:
          name: consul
      - name: gossip-key
        secret:
          secretName: {{ template "fullname" . }}-gossip-key
      - name: tls
        secret:
          secretName: consul
  volumeClaimTemplates:
  - metadata:
      name: datadir
      annotations:
      {{- if .Values.StorageClass }}
        volume.beta.kubernetes.io/storage-class: {{.Values.StorageClass | quote}}
      {{- else }}
        volume.alpha.kubernetes.io/storage-class: default
      {{- end }}
    spec:
      accessModes:
        - "ReadWriteOnce"
      resources:
        requests:
          # upstream recommended max is 700M
          storage: "{{.Values.Storage}}"
-- Mark VanDeWeert
kubernetes

3 Answers

4/28/2017

From the Kubernetes documentation about Persistent Volumes (https://kubernetes.io/docs/concepts/storage/persistent-volumes/#gce):

zone: GCE zone. If not specified, a random zone in the same region as controller-manager will be chosen.

I guess your controller-manager is in region us-central1, so any zone can be chosen from that region. In your case, I guess the only zone that is not covered is us-central1-b, so you either have to start a node there as well, or set the zone in the StorageClass resource.
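
One quick way to check is to list the zone label on the nodes and on the provisioned PVs (the same failure-domain label that appears in the question's output):

$ kubectl get nodes -L failure-domain.beta.kubernetes.io/zone
$ kubectl get pv -L failure-domain.beta.kubernetes.io/zone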

-- Nándor Krácser
Source: StackOverflow

4/29/2017

You could create a storage class for each zone, and then a PV/PVC can specify that storage class. Your stateful sets/deployments could be set up to target nodes in a specific zone via nodeSelector, so they always get scheduled on a node in that zone (see built-in node labels, and the pod-template fragment sketched after the manifests below).

storage_class.yml

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: us-central-1a
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  zone: us-central1-a

persistent_volume.yml

apiVersion: v1
kind: PersistentVolume
metadata:
  name: some-volume
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: us-central-1a

Note that you can use storageClassName in Kubernetes 1.6; otherwise the annotation volume.beta.kubernetes.io/storage-class should work too (though it will be deprecated in the future).
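
For the scheduling side, a nodeSelector on the built-in zone label keeps the pods in the same zone as that storage class. This is only a sketch of the relevant fragment of a StatefulSet's pod template, using the zone from the StorageClass above:

# Fragment of a StatefulSet spec; the nodeSelector pins pods to nodes
# in us-central1-a, matching volumes provisioned by the us-central-1a
# StorageClass defined above.
spec:
  template:
    spec:
      nodeSelector:
        failure-domain.beta.kubernetes.io/zone: us-central1-a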

-- Nabeel
Source: StackOverflow

10/23/2017

There is a bug open for this issue here.

The workaround in the meantime is to set the zones parameter in your StorageClass to specify the exact zones where your Kubernetes cluster has nodes.
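
A minimal sketch of such a StorageClass, assuming the cluster's nodes sit in us-central1-a, us-central1-c, and us-central1-f as in the question (the class name here is only illustrative):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: standard-zoned   # illustrative name
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  # Limit dynamic provisioning to zones that actually have nodes.
  zones: us-central1-a,us-central1-c,us-central1-f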

-- Michelle
Source: StackOverflow