Azure AKS backup using Velero

5/20/2020

I noticed that Velero can only back up AKS PVCs when those PVCs are backed by Azure disks, not Azure file shares. To handle this I tried to use restic to back up the file shares themselves, but it gives me a strange log:

This is what the relevant part of my deployment looks like:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    backup.velero.io/backup-volumes: grafana-data
    deployment.kubernetes.io/revision: "17"

And the log of my backup:

time="2020-05-26T13:51:54Z" level=info msg="Adding pvc grafana-data to additionalItems" backup=velero/grafana-test-volume cmd=/velero logSource="pkg/backup/pod_action.go:67" pluginName=velero
time="2020-05-26T13:51:54Z" level=info msg="Backing up item" backup=velero/grafana-test-volume group=v1 logSource="pkg/backup/item_backupper.go:169" name=grafana-data namespace=grafana resource=persistentvolumeclaims
time="2020-05-26T13:51:54Z" level=info msg="Executing custom action" backup=velero/grafana-test-volume group=v1 logSource="pkg/backup/item_backupper.go:330" name=grafana-data namespace=grafana resource=persistentvolumeclaims
time="2020-05-26T13:51:54Z" level=info msg="Skipping item because it's already been backed up." backup=velero/grafana-test-volume group=v1 logSource="pkg/backup/item_backupper.go:163" name=grafana-data namespace=grafana resource=persistentvolumeclaims

As you can see, it somehow did not back up the grafana-data volume, since it claims the volume has already been backed up (which it actually has not).
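To double-check which items and volumes Velero actually included, the velero CLI can describe the backup in detail (using the backup name grafana-test-volume taken from the logs above):

```shell
# Show per-item details for the backup, including restic pod volume backups
velero backup describe grafana-test-volume --details

# Inspect the full backup log for skipped or errored items
velero backup logs grafana-test-volume
```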

My azurefile StorageClass holds these contents:

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"allowVolumeExpansion":true,"apiVersion":"storage.k8s.io/v1beta1","kind":"StorageClass","metadata":{"annotations":{},"labels":{"kubernetes.io/cluster-service":"true"},"name":"azurefile"},"parameters":{"skuName":"Standard_LRS"},"provisioner":"kubernetes.io/azure-file"}
  creationTimestamp: "2020-05-18T15:18:18Z"
  labels:
    kubernetes.io/cluster-service: "true"
  name: azurefile
  resourceVersion: "1421202"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/azurefile
  uid: e3cc4e52-c647-412a-bfad-81ab6eb222b1
mountOptions:
- nouser_xattr
parameters:
  skuName: Standard_LRS
provisioner: kubernetes.io/azure-file
reclaimPolicy: Delete
volumeBindingMode: Immediate

As you can see, I actually patched the storage class to include the nouser_xattr mount option, as was suggested earlier.

When I check the restic pod logs, I see the following:

E0524 10:22:08.908190       1 reflector.go:156] github.com/vmware-tanzu/velero/pkg/generated/informers/externalversions/factory.go:117: Failed to list *v1.PodVolumeBackup: Get https://10.0.0.1:443/apis/velero.io/v1/namespaces/velero/podvolumebackups?limit=500&resourceVersion=1212830: dial tcp 10.0.0.1:443: i/o timeout
I0524 10:22:08.909577       1 trace.go:116] Trace[1946538740]: "Reflector ListAndWatch" name:github.com/vmware-tanzu/velero/pkg/generated/informers/externalversions/factory.go:117 (started: 2020-05-24 10:21:38.908988405 +0000 UTC m=+487217.942875118) (total time: 30.000554209s):
Trace[1946538740]: [30.000554209s] [30.000554209s] END

When I check the PodVolumeBackup resources, I see the contents below. I don't know what is expected here, though:

kubectl -n velero get podvolumebackups -o yaml
apiVersion: v1
items: []
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
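The i/o timeout against 10.0.0.1:443 suggests the restic daemonset pods cannot reach the Kubernetes API server, which would also explain why no PodVolumeBackup resources ever get created. A few commands to narrow this down (the label name=restic is what Velero's restic daemonset uses by default):

```shell
# Verify the restic DaemonSet is deployed and its pods are Ready on every node
kubectl -n velero get daemonset restic
kubectl -n velero get pods -l name=restic -o wide

# Tail the logs of one restic pod to look for repeated API-server connectivity errors
kubectl -n velero logs daemonset/restic --tail=50
```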

To summarize, I installed Velero like this:

velero install \
  --provider azure \
  --plugins velero/velero-plugin-for-microsoft-azure:v1.0.1 \
  --bucket $BLOB_CONTAINER \
  --secret-file ./credentials-velero \
  --backup-location-config resourceGroup=$AZURE_BACKUP_RESOURCE_GROUP,storageAccount=$AZURE_STORAGE_ACCOUNT_ID \
  --snapshot-location-config apiTimeout=5m,resourceGroup=$AZURE_BACKUP_RESOURCE_GROUP \
  --use-restic \
  --wait
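After the install finishes, it is worth confirming that the server components are healthy and that the backup storage location is reachable, for example:

```shell
# Client and server versions should match
velero version

# Both the velero deployment and the restic daemonset pods should be Running
kubectl -n velero get pods

# The backup storage location should report an Available phase
kubectl -n velero get backupstoragelocations -o yaml
```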

The end result is the deployment described below:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    backup.velero.io/backup-volumes: app-upload
    deployment.kubernetes.io/revision: "18"
  creationTimestamp: "2020-05-18T16:55:38Z"
  generation: 10
  labels:
    app: app
    velero.io/backup-name: mekompas-tenant-production-20200518020012
    velero.io/restore-name: mekompas-tenant-production-20200518020012-20200518185536
  name: app
  namespace: mekompas-tenant-production
  resourceVersion: "427893"
  selfLink: /apis/extensions/v1beta1/namespaces/mekompas-tenant-production/deployments/app
  uid: c1961ec3-b7b1-4f81-9aae-b609fa3d31fc
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: app
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/restartedAt: "2020-05-18T20:24:19+02:00"
      creationTimestamp: null
      labels:
        app: app
    spec:
      containers:
      - image: nginx:1.17-alpine
        imagePullPolicy: IfNotPresent
        name: app-nginx
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/www/html
          name: app-files
        - mountPath: /etc/nginx/conf.d
          name: nginx-vhost
      - env:
        - name: CONF_DB_HOST
          value: db.mekompas-tenant-production
        - name: CONF_DB
          value: mekompas
        - name: CONF_DB_USER
          value: mekompas
        - name: CONF_DB_PASS
          valueFrom:
            secretKeyRef:
              key: DATABASE_PASSWORD
              name: secret
        - name: CONF_EMAIL_FROM_ADDRESS
          value: noreply@mekompas.nl
        - name: CONF_EMAIL_FROM_NAME
          value: mekompas
        - name: CONF_EMAIL_REPLYTO_ADDRESS
          value: slc@mekompas.nl
        - name: CONF_UPLOAD_PATH
          value: /uploads
        - name: CONF_SMTP_HOST
          value: smtp.sendgrid.net
        - name: CONF_SMTP_PORT
          value: "587"
        - name: CONF_SMTP_USER
          value: apikey
        - name: CONF_SMTP_PASSWORD
          valueFrom:
            secretKeyRef:
              key: MAIL_PASSWORD
              name: secret
        image: me.azurecr.io/mekompas/php-fpm-alpine:1.12.0
        imagePullPolicy: Always
        lifecycle:
          postStart:
            exec:
              command:
              - /bin/sh
              - -c
              - cp -r /app/. /var/www/html && chmod -R 777 /var/www/html/templates_c
                && chmod -R 777 /var/www/html/core/lib/htmlpurifier-4.9.3/library/HTMLPurifier/DefinitionCache
        name: app-php
        ports:
        - containerPort: 9000
          name: upstream-php
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/www/html
          name: app-files
        - mountPath: /uploads
          name: app-upload
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: registrypullsecret
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: app-upload
        persistentVolumeClaim:
          claimName: upload
      - emptyDir: {}
        name: app-files
      - configMap:
          defaultMode: 420
          name: nginx-vhost
        name: nginx-vhost
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2020-05-18T18:12:20Z"
    lastUpdateTime: "2020-05-18T18:12:20Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2020-05-18T16:55:38Z"
    lastUpdateTime: "2020-05-20T16:03:48Z"
    message: ReplicaSet "app-688699c5fb" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 10
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1

Best, Pim

-- Dirkos
azure-aks
backup
kubernetes
velero

1 Answer

5/20/2020

Have you added nouser_xattr to your StorageClass mountOptions list?

This requirement is documented in GitHub issue 1800.

Also mentioned on the restic integration page (check under the Azure section), where they provide this snippet to patch your StorageClass resource:

kubectl patch storageclass/<YOUR_AZURE_FILE_STORAGE_CLASS_NAME> \
  --type json \
  --patch '[{"op":"add","path":"/mountOptions/-","value":"nouser_xattr"}]'

If you have no existing mountOptions list, you can try:

kubectl patch storageclass azurefile \
  --type merge \
  --patch '{"mountOptions": ["nouser_xattr"]}'
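Either way, it is worth verifying afterwards that the option was actually persisted, e.g.:

```shell
# The output should include nouser_xattr
kubectl get storageclass azurefile -o jsonpath='{.mountOptions}'
```

Note that mount options only take effect when a volume is mounted, so pods already using existing azurefile PVCs need to be restarted before the new option applies.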

Ensure the pod template of the Deployment resource includes the annotation backup.velero.io/backup-volumes. Annotations on Deployment resources will propagate to ReplicaSet resources, but not to Pod resources.

Specifically, in your example the annotation backup.velero.io/backup-volumes: app-upload should be a child of spec.template.metadata.annotations, rather than a child of metadata.annotations.

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    # *** move velero annotation from here ***
  labels:
    app: app
  name: app
  namespace: mekompas-tenant-production
spec:
  template:
    metadata:
      annotations:
        # *** velero annotation goes here in order to end up on the pod ***
        backup.velero.io/backup-volumes: app-upload
      labels:
        app: app
    spec:
      containers:
      - image: nginx:1.17-alpine
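Rather than editing the manifest by hand, the annotation can also be moved with a merge patch; note that changing the pod template triggers a rolling restart of the Deployment:

```shell
# Add the backup annotation to the pod template (triggers a rollout)
kubectl -n mekompas-tenant-production patch deployment app \
  --type merge \
  --patch '{"spec":{"template":{"metadata":{"annotations":{"backup.velero.io/backup-volumes":"app-upload"}}}}}'
```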
-- bpdohall
Source: StackOverflow