I am trying to deploy a stateful set mounted on a Persistent Volume.
I installed Kubernetes on AWS via kops.
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-07T12:22:21Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-07T11:55:20Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
According to this issue I need to create the PVC first:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: zk-data-claim
spec:
  storageClassName: default
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: zk-logs-claim
spec:
  storageClassName: default
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
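Assuming the two claims above are saved in one file (zk-pvc.yaml is just a placeholder name here), they can be created and watched with:
## create both claims and watch them go from Pending to Bound
kubectl apply -f zk-pvc.yaml
kubectl get pvc -w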
The default storage class exists, and both PVCs bind to a PV successfully:
$ kubectl get sc
NAME            PROVISIONER             AGE
default         kubernetes.io/aws-ebs   20d
gp2 (default)   kubernetes.io/aws-ebs   20d
ssd (default)   kubernetes.io/aws-ebs   20d
$ kubectl get pvc
NAME            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
zk-data-claim   Bound    pvc-5584fdf7-3853-11e8-a73b-02bb35448afe   2Gi        RWO            default        11m
zk-logs-claim   Bound    pvc-5593e249-3853-11e8-a73b-02bb35448afe   2Gi        RWO            default        11m
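For completeness, the binding details and the dynamically provisioned volumes behind them can be inspected with:
kubectl describe pvc zk-data-claim
kubectl get pv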
I can see these two volumes in the EC2 EBS Volumes list; they show as "available" at first and later become "in-use".
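The same state is visible from the AWS side; a sketch using the AWS CLI, assuming the volumes carry the kubernetes.io/created-for/pvc/name tag that the aws-ebs provisioner normally applies:
aws ec2 describe-volumes \
  --filters Name=tag:kubernetes.io/created-for/pvc/name,Values=zk-data-claim \
  --query 'Volumes[].{Id:VolumeId,State:State,AttachedTo:Attachments[].InstanceId}'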
I then reference them in my StatefulSet:
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: zk
spec:
  serviceName: zk-cluster
  replicas: 3
  template:
    metadata:
      labels:
        app: zookeeper
    spec:
      volumes:
        - name: zk-data
          persistentVolumeClaim:
            claimName: zk-data-claim
        - name: zk-logs
          persistentVolumeClaim:
            claimName: zk-logs-claim
      containers:
        ....
          volumeMounts:
            - name: zk-data
              mountPath: /opt/zookeeper/data
            - name: zk-logs
              mountPath: /opt/zookeeper/logs
Which fails with:
Unable to mount volumes for pod "zk-0_default(83b8dc93-3850-11e8-a73b-02bb35448afe)": timeout expired waiting for volumes to attach/mount for pod "default"/"zk-0". list of unattached/unmounted volumes=[zk-data zk-logs]
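The events behind that timeout can be inspected with the pod name from the error above:
kubectl describe pod zk-0
kubectl get events --sort-by=.metadata.creationTimestamp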
I'm working in the default namespace.
Any ideas what could be causing this failure?
Yeah, that is a very, very well-known problem with AWS and Kubernetes. Most often it is caused by a stale directory on another Node causing the EBS volume to still be "in use" from that Node's perspective, so the Linux machine will not let go of the device when requested by the AWS API. You will see plenty of chatter about that in the kubelet.service journals on both machines: the one that currently has the EBS volume and the one that wants it.
It has been my experience that only ssh-ing into the Node where the EBS volume is currently attached, finding the mount(s), unmounting them, and then waiting for the exponential back-off timer to expire will solve that :-(
The hand-waving version is:
## cleaning up stale docker containers might not be a terrible idea
docker rm $(docker ps -aq -f status=exited)
## identify any (and there could very well be multiple) mounts
## of the EBS device-name
mount | awk '/dev\/xvdf/ {print $2}' | xargs umount
## or sometimes kubernetes will actually name the on-disk directory ebs~ so:
mount | awk '/ebs~something/{print $2}' | xargs umount
You may experience some success involving lsof in that process, too, but hopefully(!) cleaning up the exited containers will remove the need for such a thing.
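One way to fold lsof into that, reusing the same mount lookup as above (again assuming the device shows up as /dev/xvdf):
## list anything still holding files open on the stuck filesystem,
## so it can be stopped before retrying the umount
mount | awk '/dev\/xvdf/ {print $2}' | xargs -r -n1 lsof +f --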
The problem was that my cluster was built with C5 nodes. C5 and M5 instances expose EBS volumes under a different device-naming convention (NVMe), and that naming is not recognised by Kubernetes 1.9.
The fix was to recreate the cluster with t2-type nodes.
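The mismatch is easy to see from a shell on one of the nodes; roughly, assuming SSH access to a worker:
## on Nitro-based instance types (C5/M5) the attached EBS volumes
## show up as NVMe devices...
lsblk
ls /dev/nvme*
## ...while the volume plugin in 1.9 waits for the classic device names
ls /dev/xvd*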