How to run periodic volume snapshots using k8s client and cronjob

5/22/2020

I have a PersistentVolumeClaim that I want to take snapshots of. I know there are VolumeSnapshot docs. I think the best way to run periodic snapshots is to create a CronJob for that.

So I've created a Docker image with the Python k8s client and my custom script. This way I can run it whenever I want, and I can access the kube config and all resources directly from the pod.

FROM python:3.8-slim-buster
RUN apt-get -qq update && apt-get -qq install -y git
COPY . .
RUN pip install --upgrade pip
RUN pip install git+https://github.com/kubernetes-client/python.git

The first problem I encountered was with this VolumeSnapshot template, which uses apiVersion: snapshot.storage.k8s.io/v1beta1. I tried to apply it by doing

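import kubernetes
from kubernetes import utils

# ApiClient with the default configuration
client = kubernetes.client.ApiClient()
# Create every object defined in the YAML file
utils.create_from_yaml(
    k8s_client=client,
    yaml_file='snapshot.yaml',
    verbose=True
)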

but it fails with

AttributeError: module 'kubernetes.client' has no attribute 'SnapshotStorageV1beta1Api'

And indeed, I can't find it in the Python client, the JS client, or the official v1.18 docs. Maybe it's because it's in beta?

Then I tried writing some custom code. So I have

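import kubernetes

def main():
    # Build a client from a default configuration
    _configuration = kubernetes.client.Configuration()
    _client = kubernetes.client.ApiClient(_configuration)
    _storage_api = kubernetes.client.StorageV1beta1Api(_client)
    # Lists the resource types served by the storage.k8s.io/v1beta1 group
    storages = _storage_api.get_api_resources()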

But it returns a list of V1 storage classes, and I can't find any way to create a snapshot from the response.
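For context, VolumeSnapshot objects are defined through CustomResourceDefinitions rather than a built-in API group, which would explain why the generated client has no typed class for them. A minimal sketch of one way around this is the client's generic CustomObjectsApi. The function names, the snapshot name, and the VolumeSnapshotClass name below are hypothetical, and the sketch assumes the snapshot CRDs and a CSI driver are installed in the cluster and that the pod's service account is allowed to create volumesnapshots.

```python
def build_volume_snapshot(name, pvc_name, snapshot_class):
    """Return a v1beta1 VolumeSnapshot manifest as a plain dict."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1beta1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name},
        "spec": {
            # Assumption: a VolumeSnapshotClass with this name already exists.
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


def create_volume_snapshot(namespace, body):
    """Submit the manifest through the generic custom-objects endpoint."""
    # Imported here so the manifest builder works without the client installed.
    from kubernetes import client, config

    config.load_incluster_config()  # assumes the script runs inside a pod
    api = client.CustomObjectsApi()
    return api.create_namespaced_custom_object(
        group="snapshot.storage.k8s.io",
        version="v1beta1",
        namespace=namespace,
        plural="volumesnapshots",
        body=body,
    )


# Hypothetical names, for illustration only.
body = build_volume_snapshot("my-snapshot", "NAME_OF_PVC", "my-snapshot-class")
```

Calling create_volume_snapshot("YOUR_NAMESPACE", body) from the CronJob's pod would then submit the object. Whether a snapshot is actually taken depends on the storage driver behind the PVC.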

In my terminology, a snapshot is a copy of the current state saved somewhere else. Do you know how to achieve that?

I feel like I'm overengineering this, but I don't want to use third-party tools such as Stash.

I'm running on GKE.

-- Tomasz Wojcik
google-kubernetes-engine
kubernetes

1 Answer

5/23/2020

That PersistentVolumeClaim is bound to a PersistentVolume. If you log into the Google Cloud console and go to the following item in the sidebar

Compute Engine -> Disks

you will get a list of all the GCE disks that are being used in your project.

You will need to run

kubectl get pvc --namespace YOUR_NAMESPACE

This gets you a list of all PVCs. Figure out which one is the PVC you want. Then, to get the disk name so you can find it in the console, do something like this:

$ kubectl describe pvc NAME_OF_PVC --namespace YOUR_NAMESPACE
Name:          NAME_OF_PVC
Namespace:     YOUR_NAMESPACE
StorageClass:  standard
Status:        Bound
Volume:        pvc-61e864b6-6fbf-4a36-80af-8a65e1588b58
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      10Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    <none>
Events:        <none>

Here the volume name is pvc-61e864b6-6fbf-4a36-80af-8a65e1588b58.
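Since the question already uses the Python client, the same lookup can be sketched there as well. This is only an illustration: the helper names are made up, and the stand-in object at the bottom just mimics the two fields of the API response that the helper reads, using the values from the output above.

```python
from types import SimpleNamespace


def pvc_volume_name(pvc):
    """Return the bound PersistentVolume name of a PVC object, or None."""
    if pvc.status is None or pvc.status.phase != "Bound":
        return None
    return pvc.spec.volume_name


def lookup_volume_name(pvc_name, namespace):
    """Fetch the PVC from the cluster and return its volume name."""
    # Imported here so the helper above works without the client installed.
    from kubernetes import client, config

    # Assumption: a local kubeconfig; inside a pod use load_incluster_config().
    config.load_kube_config()
    pvc = client.CoreV1Api().read_namespaced_persistent_volume_claim(
        pvc_name, namespace
    )
    return pvc_volume_name(pvc)


# Stand-in for the object the API returns (hypothetical, for illustration).
example = SimpleNamespace(
    spec=SimpleNamespace(volume_name="pvc-61e864b6-6fbf-4a36-80af-8a65e1588b58"),
    status=SimpleNamespace(phase="Bound"),
)
print(pvc_volume_name(example))
```

The returned string is the name to filter by in the console.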

Go back to the console, filter by this name, and click on the disk. You should be able to create a snapshot from there.

Now the correct approach would be to create a snapshot schedule and bind it to your disk as shown here (https://cloud.google.com/blog/products/compute/introducing-scheduled-snapshots-for-compute-engine-persistent-disk).

When you are done creating the snapshot schedule, you can edit the disk in the console and assign it the snapshot schedule you created.
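If you would rather script this than click through the console, the same two steps can be sketched with the gcloud CLI driven from Python. Treat this as a rough sketch under assumptions: the schedule name, region, zone, retention, and start time are placeholders, the disk is assumed to be zonal, and the flags are as I understand them from the gcloud reference, so check them against `gcloud compute resource-policies create snapshot-schedule --help` before relying on them.

```python
import subprocess


def schedule_commands(schedule, disk, region, zone,
                      retention_days=14, start_time="04:00"):
    """Build the two gcloud invocations as argv lists (nothing is executed)."""
    create = [
        "gcloud", "compute", "resource-policies", "create",
        "snapshot-schedule", schedule,
        f"--region={region}",
        f"--max-retention-days={retention_days}",
        f"--start-time={start_time}",  # UTC
        "--daily-schedule",
    ]
    attach = [
        "gcloud", "compute", "disks", "add-resource-policies", disk,
        f"--resource-policies={schedule}",
        f"--zone={zone}",
    ]
    return create, attach


def apply_schedule(schedule, disk, region, zone):
    """Run both commands in order; raises CalledProcessError on failure."""
    for cmd in schedule_commands(schedule, disk, region, zone):
        subprocess.run(cmd, check=True)
```

Building the argument lists separately from running them makes it easy to print and review the commands first. The disk argument is the full GCE disk name found in the console in the earlier step.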

-- V3RL4223N3
Source: StackOverflow