I'm using the bitnami/etcd chart, which has the ability to create snapshots on an EFS-mounted PVC.
However, I get a permission error whenever a PVC provisioned by aws-efs-csi-driver is mounted into a non-root pod (uid/gid is 1001).
I'm using the Helm chart from https://kubernetes-sigs.github.io/aws-efs-csi-driver/, version 2.2.0.
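For reference, the driver is installed roughly like this (the repo alias, release name, and namespace here are illustrative):
helm repo add aws-efs-csi-driver https://kubernetes-sigs.github.io/aws-efs-csi-driver/
helm repo update
helm upgrade --install aws-efs-csi-driver aws-efs-csi-driver/aws-efs-csi-driver \
  --namespace kube-system --version 2.2.0 -f values.yaml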
The chart values:
# you can obtain the fileSystemId with
# aws efs describe-file-systems --query "FileSystems[*].FileSystemId"
storageClasses:
  - name: efs
    parameters:
      fileSystemId: fs-exxxxxxx
      directoryPerms: "777"
      gidRangeStart: "1000"
      gidRangeEnd: "2000"
      basePath: "/snapshots"
# enable it after the following issue is resolved
# https://github.com/bitnami/charts/issues/7769
# node:
#   nodeSelector:
#     etcd: "true"
I then manually created the PV:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: etcd-snapshotter-pv
  annotations:
    argocd.argoproj.io/sync-wave: "60"
spec:
  capacity:
    storage: 32Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-exxxxxxx
Then, if I mount that EFS PVC in a non-root pod, I get the following error:
➜ klo etcd-snapshotter-001-ph8w9
etcd 23:18:38.76 DEBUG ==> Using endpoint etcd-snapshotter-001-ph8w9:2379
{"level":"warn","ts":1633994320.7789018,"logger":"client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0005ea380/#initially=[etcd-snapshotter-001-ph8w9:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.120.2.206:2379: connect: connection refused\""}
etcd-snapshotter-001-ph8w9:2379 is unhealthy: failed to commit proposal: context deadline exceeded
Error: unhealthy cluster
etcd 23:18:40.78 WARN ==> etcd endpoint etcd-snapshotter-001-ph8w9:2379 not healthy. Trying a different endpoint
etcd 23:18:40.78 DEBUG ==> Using endpoint etcd-2.etcd-headless.etcd.svc.cluster.local:2379
etcd-2.etcd-headless.etcd.svc.cluster.local:2379 is healthy: successfully committed proposal: took = 1.6312ms
etcd 23:18:40.87 INFO ==> Snapshotting the keyspace
Error: could not open /snapshots/db-2021-10-11_23-18.part (open /snapshots/db-2021-10-11_23-18.part: permission denied)
As a result, I have to spawn a new "root" pod, exec into it, and manually adjust the permissions:
apiVersion: v1
kind: Pod
metadata:
  name: perm
spec:
  securityContext:
    runAsUser: 0
    runAsGroup: 0
    fsGroup: 0
  containers:
    - name: app1
      image: busybox
      command: ["/bin/sh"]
      args: ["-c", "sleep 3000"]
      volumeMounts:
        - name: persistent-storage
          mountPath: /snapshots
      securityContext:
        runAsUser: 0
        runAsGroup: 0
  volumes:
    - name: persistent-storage
      persistentVolumeClaim:
        claimName: etcd-snapshotter
  nodeSelector:
    etcd: "true"
➜ k apply -f setup.yaml
➜ k exec -ti perm -- ash
/ # cd /snapshots
/snapshots # chown -R 1001.1001 .
/snapshots # chmod -R 777 .
/snapshots # exit
➜ k create job --from=cronjob/etcd-snapshotter etcd-snapshotter-001
job.batch/etcd-snapshotter-001 created
➜ klo etcd-snapshotter-001-bmv79
etcd 23:31:10.22 DEBUG ==> Using endpoint etcd-1.etcd-headless.etcd.svc.cluster.local:2379
etcd-1.etcd-headless.etcd.svc.cluster.local:2379 is healthy: successfully committed proposal: took = 2.258532ms
etcd 23:31:10.32 INFO ==> Snapshotting the keyspace
{"level":"info","ts":1633995070.4244702,"caller":"snapshot/v3_snapshot.go:68","msg":"created temporary db file","path":"/snapshots/db-2021-10-11_23-31.part"}
{"level":"info","ts":1633995070.4907935,"logger":"client","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1633995070.4908395,"caller":"snapshot/v3_snapshot.go:76","msg":"fetching snapshot","endpoint":"etcd-1.etcd-headless.etcd.svc.cluster.local:2379"}
{"level":"info","ts":1633995070.4965465,"logger":"client","caller":"v3/maintenance.go:219","msg":"completed snapshot read; closing"}
{"level":"info","ts":1633995070.544217,"caller":"snapshot/v3_snapshot.go:91","msg":"fetched snapshot","endpoint":"etcd-1.etcd-headless.etcd.svc.cluster.local:2379","size":"320 kB","took":"now"}
{"level":"info","ts":1633995070.5507936,"caller":"snapshot/v3_snapshot.go:100","msg":"saved","path":"/snapshots/db-2021-10-11_23-31"}
Snapshot saved at /snapshots/db-2021-10-11_23-31
➜ k exec -ti perm -- ls -la /snapshots
total 924
drwxrwxrwx 2 1001 1001 6144 Oct 11 23:31 .
drwxr-xr-x 1 root root 46 Oct 11 23:25 ..
-rw------- 1 1001 root 319520 Oct 11 23:31 db-2021-10-11_23-31
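For completeness, the same fix could be applied declaratively as a one-shot Job instead of an interactive pod (a sketch; the Job name and image tag are illustrative):
apiVersion: batch/v1
kind: Job
metadata:
  name: fix-snapshot-perms
spec:
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        etcd: "true"
      containers:
        - name: fix-perms
          image: busybox:1.34
          # must run as root for chown/chmod to succeed
          securityContext:
            runAsUser: 0
            runAsGroup: 0
          command: ["/bin/sh", "-c"]
          args: ["chown -R 1001:1001 /snapshots && chmod -R 777 /snapshots"]
          volumeMounts:
            - name: persistent-storage
              mountPath: /snapshots
      volumes:
        - name: persistent-storage
          persistentVolumeClaim:
            claimName: etcd-snapshotter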
I have these settings in the storage class:
  gidRangeStart: "1000"
  gidRangeEnd: "2000"
but they have no effect.
The PVC is defined as:
➜ kg pvc etcd-snapshotter -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: efs.csi.aws.com
  name: etcd-snapshotter
  namespace: etcd
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 32Gi
  storageClassName: efs
  volumeMode: Filesystem
  volumeName: etcd-snapshotter-pv
By default the StorageClass parameter provisioningMode is unset; set it to provisioningMode: "efs-ap" to enable dynamic provisioning with access points.
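With that change, the chart values above become (a sketch; only the provisioningMode line is new):
storageClasses:
  - name: efs
    parameters:
      provisioningMode: "efs-ap"
      fileSystemId: fs-exxxxxxx
      directoryPerms: "777"
      gidRangeStart: "1000"
      gidRangeEnd: "2000"
      basePath: "/snapshots"
In efs-ap mode the driver creates an EFS access point per volume and applies directoryPerms and a GID from the gidRangeStart–gidRangeEnd range to the access point's root directory, so a pod running as uid/gid 1001 can write to it. That is also why the gidRange* parameters had no effect here: the manually created PV with volumeHandle: fs-exxxxxxx mounts the filesystem root directly and never goes through an access point. Either let the driver provision the PV from the PVC (drop volumeName and the hand-written PV), or keep a static PV but point it at an existing access point, which the driver's static-provisioning examples express as volumeHandle: fs-exxxxxxx::fsap-xxxxxxxx.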