I have a Pod mounting a volume from a PersistentVolumeClaim. The PVC uses a StorageClass provisioning EBS volumes with the xfs filesystem. The setup is as below:
volumeMounts:
  - mountPath: "/opt/st1"
    name: opt-st1
volumes:
  - name: opt-st1
    persistentVolumeClaim:
      claimName: st1-xfs-pvc
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: st1-xfs-pvc
  labels:
    app: st1-xfs-pvc
spec:
  storageClassName: st1-xfs-sc
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Gi
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: st1-xfs-sc
provisioner: kubernetes.io/aws-ebs
parameters:
  type: st1
  fsType: xfs
reclaimPolicy: Retain
mountOptions:
  - debug
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
  - matchLabelExpressions:
      - key: failure-domain.beta.kubernetes.io/zone
        values:
          - us-east-1a
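A side note on this configuration: with `volumeBindingMode: WaitForFirstConsumer`, the PVC stays `Pending` until a Pod that uses it is scheduled, so the EBS volume is provisioned only in the zone where the Pod lands. A quick way to observe this (a sketch, assuming `kubectl` is pointed at the cluster and the manifests above are applied):

```shell
# The PVC remains Pending until a consuming Pod is scheduled;
# with WaitForFirstConsumer this is expected, not an error.
kubectl get pvc st1-xfs-pvc

# The PVC events should show that binding is deferred
# until the first consumer appears.
kubectl describe pvc st1-xfs-pvc
```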
When I run this setup on an EKS-based cluster (version 1.13), I get the following error:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 45s default-scheduler Successfully assigned jira-node-deployment-5f4f59c44d-jbc4c to ip-10-237-86-124.ec2.internal
Warning FailedAttachVolume 40s (x4 over 44s) attachdetach-controller AttachVolume.Attach failed for volume "pvc-50996814-bf53-11e9-848f-0ec61103f6e0" : "Error attaching EBS volume \"vol-077709885f54252c7\"" to instance "i-0fe9867c4129f058e" since volume is in "creating" state
Normal SuccessfulAttachVolume 33s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-50996814-bf53-11e9-848f-0ec61103f6e0"
Warning FailedMount 24s kubelet, ip-10-237-86-124.ec2.internal MountVolume.MountDevice failed for volume "pvc-50996814-bf53-11e9-848f-0ec61103f6e0" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/us-east-1a/vol-077709885f54252c7 --scope -- mount -t xfs -o debug,defaults /dev/xvdbp /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/us-east-1a/vol-077709885f54252c7
Output: Running scope as unit run-979548.scope.
mount: /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/us-east-1a/vol-077709885f54252c7: wrong fs type, bad option, bad superblock on /dev/xvdbp, missing codepage or helper program, or other error.
Warning FailedMount 22s kubelet, ip-10-237-86-124.ec2.internal MountVolume.MountDevice failed for volume "pvc-50996814-bf53-11e9-848f-0ec61103f6e0" : mount failed: exit status 32
If I connect to the Kubernetes worker node and run the same command manually, I can reproduce the error:
$ systemd-run --description='Kubernetes transient mount for /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/us-east-1a/vol-068d85e415249b896' --scope -- mount -t xfs -o debug,defaults /dev/xvdcg /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/us-east-1a/vol-068d85e415249b896
Running scope as unit run-982245.scope.
mount: /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/us-east-1a/vol-077709885f54252c7: mount point does not exist.
$ mkdir /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/us-east-1a/vol-077709885f54252c7
$ systemd-run --description='Kubernetes transient mount for /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/us-east-1a/vol-068d85e415249b896' --scope -- mount -t xfs -o debug,defaults /dev/xvdcg /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/us-east-1a/vol-068d85e415249b896
Running scope as unit run-982245.scope.
mount: /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/us-east-1a/vol-077709885f54252c7: wrong fs type, bad option, bad superblock on /dev/xvdbp, missing codepage or helper program, or other error.
$ echo $?
32
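Exit status 32 from mount(8) means a generic "mount failure"; the actual reason is usually written to the kernel log. Checking it right after the failed mount would have pointed at the problem directly (a sketch; the exact wording of the message depends on the kernel version):

```shell
# xfs rejects mount options it does not understand, and the kernel
# log names the offending option (wording varies by kernel version).
dmesg | grep -i xfs | tail -n 5
```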
I noticed that if I remove the debug option from the command and run it again, the volume mounts fine...
$ systemd-run --description='Kubernetes transient mount for /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/us-east-1a/vol-077709885f54252c7' --scope -- mount -t xfs -o defaults /dev/xvdbp /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/us-east-1a/vol-077709885f54252c7
Running scope as unit run-986177.scope.
... and the Pod runs fine a few seconds after that:
Normal Pulled 50s kubelet, ip-10-237-86-124.ec2.internal Container image "nginx:alpine" already present on machine
Normal Created 49s kubelet, ip-10-237-86-124.ec2.internal Created container
Normal Started 46s kubelet, ip-10-237-86-124.ec2.internal Started container
I also noticed that if I use ext4 instead of xfs, the setup described above works fine. In hindsight this makes sense: ext4 recognizes debug as a valid mount option, while xfs does not.
After a while I realized that I had added the debug mount option myself in the StorageClass config:

mountOptions:
  - debug

Once I removed those two lines, everything worked as expected.
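For reference, this is the working StorageClass with the mountOptions block removed and everything else unchanged:

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: st1-xfs-sc
provisioner: kubernetes.io/aws-ebs
parameters:
  type: st1
  fsType: xfs
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
  - matchLabelExpressions:
      - key: failure-domain.beta.kubernetes.io/zone
        values:
          - us-east-1a
```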