As part of a POC, I am trying to back up and restore volumes provisioned by the GKE CSI driver in the same GKE cluster. However, the restore fails, with no logs to debug.
Steps:
Create volume snapshot class: kubectl create -f vsc.yaml
# vsc.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-gce-vsc
  labels:
    "velero.io/csi-volumesnapshot-class": "true"
driver: pd.csi.storage.gke.io
deletionPolicy: Delete
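As a sanity check that the CSI snapshot path works independently of Velero, a VolumeSnapshot can be created manually against this class. This manifest is illustrative: the name `manual-snap` is made up, and it assumes the `podpvc` claim created in a later step already exists.

```yaml
# manual-snap.yaml — illustrative manifest; assumes the PVC "podpvc"
# in namespace "csi-app" (created in a later step) already exists
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: manual-snap
  namespace: csi-app
spec:
  volumeSnapshotClassName: csi-gce-vsc
  source:
    persistentVolumeClaimName: podpvc
```

If this snapshot never reaches `readyToUse: true`, the problem is in the CSI snapshot stack itself rather than in Velero.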
Create storage class: kubectl create -f sc.yaml
# sc.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-example
provisioner: pd.csi.storage.gke.io
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  type: pd-standard
Create namespace: kubectl create namespace csi-app
Create a persistent volume claim: kubectl create -f pvc.yaml
# pvc.yaml
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: podpvc
  namespace: csi-app
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: pd-example
  resources:
    requests:
      storage: 6Gi
Create a pod to consume the PVC: kubectl create -f pod.yaml
# pod.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: web-server
  namespace: csi-app
spec:
  containers:
    - name: web-server
      image: nginx
      volumeMounts:
        - mountPath: /var/lib/www/html
          name: mypvc
  volumes:
    - name: mypvc
      persistentVolumeClaim:
        claimName: podpvc
        readOnly: false
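Because the storage class uses `WaitForFirstConsumer`, the PVC stays `Pending` until the pod is scheduled. A quick check before backing up, assuming the resources above (these commands need access to the cluster):

```shell
# PVC should report STATUS=Bound once the web-server pod is scheduled
kubectl -n csi-app get pvc podpvc

# The pod mounting it should reach Running
kubectl -n csi-app get pod web-server
```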
Once the PVC was bound, I created the Velero backup:
velero backup create test --include-resources=pvc,pv --include-namespaces=csi-app --wait
Output:
Backup request "test" submitted successfully.
Waiting for backup to complete. You may safely press ctrl-c to stop waiting - your backup will continue in the background.
...
Backup completed with status: Completed. You may check for more information using the commands `velero backup describe test` and `velero backup logs test`.
velero backup describe test
Name: test
Namespace: velero
Labels: velero.io/storage-location=default
Annotations: velero.io/source-cluster-k8s-gitversion=v1.21.5-gke.1302
velero.io/source-cluster-k8s-major-version=1
velero.io/source-cluster-k8s-minor-version=21
Phase: Completed
Errors: 0
Warnings: 1
Namespaces:
  Included: csi-app
  Excluded: <none>
Resources:
  Included: pvc, pv
  Excluded: <none>
  Cluster-scoped: auto
Label selector: <none>
Storage Location: default
Velero-Native Snapshot PVs: auto
TTL: 720h0m0s
Hooks: <none>
Backup Format Version: 1.1.0
Started: 2021-12-22 15:40:08 +0300 +03
Completed: 2021-12-22 15:40:10 +0300 +03
Expiration: 2022-01-21 15:40:08 +0300 +03
Total items to be backed up: 2
Items backed up: 2
Velero-Native Snapshots: <none included>
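Since the describe output reports `Velero-Native Snapshots: <none included>`, I checked whether any CSI snapshot objects were produced for the backup at all. A hedged way to verify, assuming the Velero CSI plugin labels its snapshot objects with the backup name (cluster access required):

```shell
# Confirm whether the EnableCSI feature flag is configured on the client
velero client config get features

# Look for CSI snapshot objects associated with the backup
kubectl get volumesnapshotcontents -l velero.io/backup-name=test
kubectl -n csi-app get volumesnapshots -l velero.io/backup-name=test
```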
After the backup completed, I verified it was available in my GCS bucket.
Delete all the existing resources to test the restore:
kubectl delete -f pod.yaml
kubectl delete -f pvc.yaml
kubectl delete -f sc.yaml
kubectl delete namespace csi-app
Run the restore command:
velero restore create --from-backup test --wait
Output:
Restore request "test-20211222154302" submitted successfully.
Waiting for restore to complete. You may safely press ctrl-c to stop waiting - your restore will continue in the background.
.
Restore completed with status: PartiallyFailed. You may check for more information using the commands `velero restore describe test-20211222154302` and `velero restore logs test-20211222154302`.
Neither `velero restore describe` nor `velero restore logs` returns any description/logs.
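When the client-side `velero restore logs` call returns nothing, the server-side logs and the raw restore object usually still hold the error details. A fallback sketch (requires access to the cluster):

```shell
# Read the Velero server logs directly; restore errors normally appear here
kubectl -n velero logs deployment/velero --tail=200

# Inspect the restore resource itself for status and failure reasons
kubectl -n velero get restore test-20211222154302 -o yaml
```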
What did you expect to happen:
I was expecting the PV, PVC, and the namespace to be restored.
The following information will help us better understand what's going on:
The `velero debug --backup test --restore test-20211222154302` command was stuck for more than 10 minutes, and I couldn't generate the support bundle.
Output:
2021/12/22 15:45:16 Collecting velero resources in namespace: velero
2021/12/22 15:45:24 Collecting velero deployment logs in namespace: velero
2021/12/22 15:45:28 Collecting log and information for backup: test
Environment:
Velero version (use velero version):
Client:
Version: v1.7.1
Git commit: -
Server:
Version: v1.7.1
Velero features (use velero client config get features):
features:
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:33:37Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.5-gke.1302", GitCommit:"639f3a74abf258418493e9b75f2f98a08da29733", GitTreeState:"clean", BuildDate:"2021-10-21T21:35:48Z", GoVersion:"go1.16.7b7", Compiler:"gc", Platform:"linux/amd64"}
Kubernetes installer & version:
GKE 1.21.5-gke.1302
Cloud provider or hardware configuration:
GCP
OS (e.g. from /etc/os-release):
GCP Container-Optimized OS (COS)