Velero - Restore Partially fails for volumes provisioned using CSI driver

1/6/2022

As part of a POC, I am trying to back up and restore volumes provisioned by the GKE CSI driver within the same GKE cluster. However, the restore ends as PartiallyFailed and I cannot get any logs to debug it.

Steps:

Create volume snapshot class: kubectl create -f vsc.yaml

# vsc.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-gce-vsc
  labels:
    "velero.io/csi-volumesnapshot-class": "true"
driver: pd.csi.storage.gke.io
deletionPolicy: Delete

Create storage class: kubectl create -f sc.yaml

# sc.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-example
provisioner: pd.csi.storage.gke.io
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  type: pd-standard

Create namespace: kubectl create namespace csi-app

Create a persistent volume claim: kubectl create -f pvc.yaml

# pvc.yaml
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: podpvc
  namespace: csi-app
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: pd-example
  resources:
    requests:
      storage: 6Gi

Create a pod to consume the pvc: kubectl create -f pod.yaml

# pod.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: web-server
  namespace: csi-app
spec:
  containers:
   - name: web-server
     image: nginx
     volumeMounts:
       - mountPath: /var/lib/www/html
         name: mypvc
  volumes:
   - name: mypvc
     persistentVolumeClaim:
       claimName: podpvc
       readOnly: false

Once the PVC was bound, I created the Velero backup:
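Because the storage class uses WaitForFirstConsumer, the PVC only binds after the pod is scheduled, so it is worth confirming the phase before backing up. A minimal sketch of that check, assuming kubectl points at the cluster; the function name wait_for_pvc_bound and the 2-second poll interval are illustrative choices, not part of the original steps:

```shell
# Poll until the PVC reports phase "Bound", or give up after a timeout.
# Arguments: namespace, PVC name, timeout in seconds (default 120).
# Hypothetical helper; usage: wait_for_pvc_bound csi-app podpvc
wait_for_pvc_bound() {
  ns="$1"; pvc="$2"; timeout="${3:-120}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    phase="$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.status.phase}' 2>/dev/null)"
    if [ "$phase" = "Bound" ]; then
      echo "PVC $ns/$pvc is Bound"
      return 0
    fi
    sleep 2
    elapsed=$((elapsed + 2))
  done
  echo "Timed out waiting for PVC $ns/$pvc (last phase: ${phase:-unknown})" >&2
  return 1
}
```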

velero backup create test --include-resources=pvc,pv --include-namespaces=csi-app --wait

Output:

Backup request "test" submitted successfully.
Waiting for backup to complete. You may safely press ctrl-c to stop waiting - your backup will continue in the background.
...
Backup completed with status: Completed. You may check for more information using the commands `velero backup describe test` and `velero backup logs test`.

velero describe backup test

Name:         test
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.21.5-gke.1302
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=21

Phase:  Completed

Errors:    0
Warnings:  1

Namespaces:
  Included:  csi-app
  Excluded:  <none>

Resources:
  Included:        pvc, pv
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2021-12-22 15:40:08 +0300 +03
Completed:  2021-12-22 15:40:10 +0300 +03

Expiration:  2022-01-21 15:40:08 +0300 +03

Total items to be backed up:  2
Items backed up:              2

Velero-Native Snapshots: <none included>
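The describe output above reports one warning without saying what it is. One way to surface it is to filter the backup log by level. This is only a sketch: it assumes the velero CLI can download the log from the backup storage location and that log lines carry a level=... field; the function name backup_warnings is made up for illustration.

```shell
# Print only the warning and error lines from a backup's log.
# Argument: backup name. Hypothetical helper; usage: backup_warnings test
backup_warnings() {
  backup="$1"
  velero backup logs "$backup" \
    | grep -E 'level=(warning|error)' \
    || echo "(no warning or error lines found)"
}
```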

After the backup completed, I verified that it was available in my GCS bucket.

Delete all the existing resources to test restore.

kubectl delete -f pod.yaml
kubectl delete -f pvc.yaml
kubectl delete -f sc.yaml
kubectl delete namespace csi-app
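Namespace deletion is asynchronous, so a restore started right after these commands could in principle run while csi-app is still terminating. A small sketch for ruling that out, assuming kubectl access; wait_until_gone is a hypothetical helper name and the polling interval is arbitrary:

```shell
# Block until `kubectl get KIND NAME` stops succeeding, or time out.
# Arguments: resource kind, resource name, timeout in seconds (default 120).
# Hypothetical helper; usage: wait_until_gone namespace csi-app
wait_until_gone() {
  kind="$1"; name="$2"; timeout="${3:-120}"
  elapsed=0
  while kubectl get "$kind" "$name" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "$kind/$name still exists after ${timeout}s" >&2
      return 1
    fi
    sleep 2
    elapsed=$((elapsed + 2))
  done
  echo "$kind/$name is gone"
}
```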

Run restore command:

velero restore create --from-backup test --wait

Output:

Restore request "test-20211222154302" submitted successfully.
Waiting for restore to complete. You may safely press ctrl-c to stop waiting - your restore will continue in the background.
.
Restore completed with status: PartiallyFailed. You may check for more information using the commands `velero restore describe test-20211222154302` and `velero restore logs test-20211222154302`.

Neither velero restore describe nor velero restore logs returns any description or logs for this restore.
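When the velero CLI returns nothing for a restore, the same information can sometimes be read straight from the cluster. The sketch below assumes a default install (namespace velero, deployment named velero) and reads the Restore custom resource plus the server log; restore_debug is an illustrative name, and the status fields shown may be empty depending on how the restore failed.

```shell
# Show a restore's status from its custom resource, then any server log
# lines that mention it. Arguments: restore name, Velero namespace
# (default "velero"). Hypothetical helper;
# usage: restore_debug test-20211222154302
restore_debug() {
  name="$1"; ns="${2:-velero}"
  echo "== Restore status =="
  kubectl -n "$ns" get restores.velero.io "$name" \
    -o jsonpath='phase: {.status.phase}{"\n"}failureReason: {.status.failureReason}{"\n"}'
  echo "== Server log lines mentioning the restore =="
  kubectl -n "$ns" logs deployment/velero \
    | grep -- "$name" \
    || echo "(no matching log lines)"
}
```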

What did you expect to happen: I expected the PV, the PVC, and the namespace to be restored.

The following information will help us better understand what's going on:

The velero debug --backup test --restore test-20211222154302 command was stuck for more than 10 minutes, so I couldn't generate the support bundle. Output:

2021/12/22 15:45:16 Collecting velero resources in namespace: velero
2021/12/22 15:45:24 Collecting velero deployment logs in namespace: velero
2021/12/22 15:45:28 Collecting log and information for backup: test

Environment:

Velero version (use velero version):
Client:
Version: v1.7.1
Git commit: -
Server:
Version: v1.7.1
Velero features (use velero client config get features):
features:
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:33:37Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.5-gke.1302", GitCommit:"639f3a74abf258418493e9b75f2f98a08da29733", GitTreeState:"clean", BuildDate:"2021-10-21T21:35:48Z", GoVersion:"go1.16.7b7", Compiler:"gc", Platform:"linux/amd64"}
Kubernetes installer & version:
GKE 1.21.5-gke.1302
Cloud provider or hardware configuration:
GCP
OS (e.g. from /etc/os-release):
GCP Container-Optimized OS (COS)
-- Shibily Shukoor
google-kubernetes-engine
kubernetes
velero