I'm running a Kubernetes 1.6.2 cluster on GKE with three nodes, each in a different zone, and I'm trying to deploy a StatefulSet where each pod gets a PV attached to it. The problem is that Kubernetes is creating the PVs in the one zone where I don't have a node!
$ kubectl describe node gke-multi-consul-default-pool-747c9378-zls3|grep 'zone=us-central1'
failure-domain.beta.kubernetes.io/zone=us-central1-a
$ kubectl describe node gke-multi-consul-default-pool-7e987593-qjtt|grep 'zone=us-central1'
failure-domain.beta.kubernetes.io/zone=us-central1-f
$ kubectl describe node gke-multi-consul-default-pool-8e9199ea-91pj|grep 'zone=us-central1'
failure-domain.beta.kubernetes.io/zone=us-central1-c
$ kubectl describe pv pvc-3f668058-2c2a-11e7-a7cd-42010a8001e2|grep 'zone=us-central1'
failure-domain.beta.kubernetes.io/zone=us-central1-b
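(For a quicker view of which zones the nodes cover, the same built-in label can be listed as a column with kubectl's -L flag:)
$ kubectl get nodes -L failure-domain.beta.kubernetes.io/zone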
I'm using the standard StorageClass, which has no default zone set:
$ kubectl describe storageclass standard
Name: standard
IsDefaultClass: Yes
Annotations: storageclass.beta.kubernetes.io/is-default-class=true
Provisioner: kubernetes.io/gce-pd
Parameters: type=pd-standard
Events: <none>
So I thought that the volumes would automatically be provisioned in a zone where a cluster node exists, but that doesn't seem to be happening.
For reference, here is the YAML for my StatefulSet:
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: "{{ template "fullname" . }}"
  labels:
    heritage: {{.Release.Service | quote }}
    release: {{.Release.Name | quote }}
    chart: "{{.Chart.Name}}-{{.Chart.Version}}"
    component: "{{.Release.Name}}-{{.Values.Component}}"
spec:
  serviceName: "{{ template "fullname" . }}"
  replicas: {{default 3 .Values.Replicas}}
  template:
    metadata:
      name: "{{ template "fullname" . }}"
      labels:
        heritage: {{.Release.Service | quote }}
        release: {{.Release.Name | quote }}
        chart: "{{.Chart.Name}}-{{.Chart.Version}}"
        component: "{{.Release.Name}}-{{.Values.Component}}"
        app: "consul"
      annotations:
        pod.alpha.kubernetes.io/initialized: "true"
    spec:
      securityContext:
        fsGroup: 1000
      containers:
      - name: "{{ template "fullname" . }}"
        image: "{{.Values.Image}}:{{.Values.ImageTag}}"
        imagePullPolicy: "{{.Values.ImagePullPolicy}}"
        ports:
        - name: http
          containerPort: {{.Values.HttpPort}}
        - name: rpc
          containerPort: {{.Values.RpcPort}}
        - name: serflan-tcp
          protocol: "TCP"
          containerPort: {{.Values.SerflanPort}}
        - name: serflan-udp
          protocol: "UDP"
          containerPort: {{.Values.SerflanUdpPort}}
        - name: serfwan-tcp
          protocol: "TCP"
          containerPort: {{.Values.SerfwanPort}}
        - name: serfwan-udp
          protocol: "UDP"
          containerPort: {{.Values.SerfwanUdpPort}}
        - name: server
          containerPort: {{.Values.ServerPort}}
        - name: consuldns
          containerPort: {{.Values.ConsulDnsPort}}
        resources:
          requests:
            cpu: "{{.Values.Cpu}}"
            memory: "{{.Values.Memory}}"
        env:
        - name: INITIAL_CLUSTER_SIZE
          value: {{ default 3 .Values.Replicas | quote }}
        - name: STATEFULSET_NAME
          value: "{{ template "fullname" . }}"
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: STATEFULSET_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: datadir
          mountPath: /var/lib/consul
        - name: gossip-key
          mountPath: /etc/secrets
          readOnly: true
        - name: config
          mountPath: /etc/consul
        - name: tls
          mountPath: /etc/tls
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - consul leave
        livenessProbe:
          exec:
            command:
            - consul
            - members
          initialDelaySeconds: 300
          timeoutSeconds: 5
        command:
        - "/bin/sh"
        - "-ec"
        - "/tmp/consul-start.sh"
      volumes:
      - name: config
        configMap:
          name: consul
      - name: gossip-key
        secret:
          secretName: {{ template "fullname" . }}-gossip-key
      - name: tls
        secret:
          secretName: consul
  volumeClaimTemplates:
  - metadata:
      name: datadir
      annotations:
      {{- if .Values.StorageClass }}
        volume.beta.kubernetes.io/storage-class: {{.Values.StorageClass | quote}}
      {{- else }}
        volume.alpha.kubernetes.io/storage-class: default
      {{- end }}
    spec:
      accessModes:
        - "ReadWriteOnce"
      resources:
        requests:
          # upstream recommended max is 700M
          storage: "{{.Values.Storage}}"
Answer from the Kubernetes documentation on Persistent Volumes (https://kubernetes.io/docs/concepts/storage/persistent-volumes/#gce): "zone: GCE zone. If not specified, a random zone in the same region as controller-manager will be chosen."
I guess your controller manager is in region us-central1, so any zone in that region can be chosen. In your case the only zone that is not covered by a node is us-central1-b, so you either have to start a node there as well, or set the zone in the StorageClass resource.
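If you go the route of adding a node in the missing zone, on GKE that means adding the zone to the cluster rather than adding a single node. A rough sketch is below; the cluster name is taken from your node names, but the primary zone and the --additional-zones flag are assumptions you should verify against gcloud container clusters update --help for your gcloud version:
# Sketch only: primary zone and flag syntax are assumptions to verify
$ gcloud container clusters update multi-consul --zone us-central1-a \
    --additional-zones us-central1-b,us-central1-c,us-central1-f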
You could create a storage class for each zone, and then a PV/PVC may specify that storage class. Your StatefulSets/Deployments could be set up to target a specific zone via a nodeSelector, so they always get scheduled on a node in that zone (see the built-in node labels and the nodeSelector sketch at the end of this answer).
storage_class.yml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: us-central-1a
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  zone: us-central1-a
persistent_volume.yml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: some-volume
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: us-central-1a
Note that you can use storageClassName in Kubernetes 1.6; otherwise the annotation volume.beta.kubernetes.io/storage-class should also work (although it will be deprecated in the future).
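As a rough sketch of the nodeSelector approach mentioned above (the name some-app and the image are placeholders, and the zone value is just an example; the label is the built-in failure-domain label visible on the nodes in the question), a StatefulSet's pod template could be pinned to the same zone as the storage class like this:
node_selector.yml
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: some-app
spec:
  serviceName: some-app
  replicas: 1
  template:
    metadata:
      labels:
        app: some-app
    spec:
      nodeSelector:
        # built-in zone label, same one shown on the nodes above
        failure-domain.beta.kubernetes.io/zone: us-central1-a
      containers:
      - name: some-app
        image: some-image
With the us-central-1a storage class above plus this nodeSelector, both the pods and their volumes end up in us-central1-a.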