According to the official docs (https://kubernetes.io/docs/tasks/administer-cluster/change-pv-reclaim-policy/), a PV with the "Retain" policy can be manually recovered. What does that actually mean? Is there a tool I can use to read the data from that "retained" PV and write it to another PV, or does it mean you can mount the volume manually in order to gain access?
The process to manually recover the volume is as follows.
You can mount the same PV, along with its data, in a different pod even after the PVC is deleted (the PV must still exist, which is typically the case if the reclaim policy of the StorageClass is Retain).
Verify that the PV is in the Released state (i.e., no PVC currently claims it).
➜ ~ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                     STORAGECLASS   REASON   AGE
pvc-eae6acda-59c7-11e9-ab12-06151ee9837e   16Gi       RWO            Retain           Released   default/dhanvi-test-pvc   gp2                     52m
Edit the PV (kubectl edit pv pvc-eae6acda-59c7-11e9-ab12-06151ee9837e) and remove the spec.claimRef section. The PV's claim is then unset and its status becomes Available, as shown below.
➜ ~ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
pvc-eae6acda-59c7-11e9-ab12-06151ee9837e   16Gi       RWO            Retain           Available           gp2                     57m
Then claim the PV with a PVC that names it explicitly through volumeName, as below.
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: dhanvi-test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 16Gi
  volumeName: "pvc-eae6acda-59c7-11e9-ab12-06151ee9837e"
The PVC can then be used in a pod as below.
volumes:
  - name: dhanvi-test-pv
    persistentVolumeClaim:
      claimName: dhanvi-test-pvc
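For context, the volumes fragment above sits inside a pod spec. The following is a minimal sketch of a complete Pod that mounts the re-claimed volume; the pod name, container name, image, and mount path are assumptions chosen for illustration and are not part of the original answer.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dhanvi-test-pod        # hypothetical pod name
spec:
  containers:
    - name: inspector          # hypothetical container name
      image: busybox:1.36      # assumed image; any image with a shell works
      # List the recovered data, then stay alive so you can exec into the pod.
      command: ["sh", "-c", "ls -la /data && sleep 3600"]
      volumeMounts:
        - name: dhanvi-test-pv # must match the volume name below
          mountPath: /data     # hypothetical mount path
  volumes:
    - name: dhanvi-test-pv
      persistentVolumeClaim:
        claimName: dhanvi-test-pvc
```

Once the pod is running, the old data should be visible under the mount path.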
Update: Volume cloning might help https://kubernetes.io/blog/2019/06/21/introducing-volume-cloning-alpha-for-kubernetes/
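As a rough sketch of what a clone request looks like: a new PVC references an existing, bound PVC through spec.dataSource. This assumes the volume is provisioned by a CSI driver that supports cloning, and that both claims live in the same namespace and use the same storage class. The clone name and storage class name below are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dhanvi-test-pvc-clone           # hypothetical name for the clone
spec:
  storageClassName: csi-storage-class   # hypothetical; must be a CSI-backed class that supports cloning
  dataSource:
    kind: PersistentVolumeClaim
    name: dhanvi-test-pvc               # existing, bound source PVC
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 16Gi                     # must be at least the size of the source
```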
There are three reclaim policies, which define what happens to the persistent volume after the bound persistent volume claim is deleted:
Delete means that the persistent volume, as well as the associated storage asset in the external infrastructure, is deleted.
Recycle scrubs the volume (rm -rf /thevolume/*), after which it is available for new persistent volume claims. Note that this policy is deprecated in favor of dynamic provisioning.
Retain leaves the persistent volume in the Released state, in which new persistent volume claims cannot bind to it. The whole reclaim process is manual: you need to delete the persistent volume object yourself. You can back up the data from the storage asset and delete the data afterwards. Then you can either delete the storage asset or create a new persistent volume for the same asset.
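To illustrate the last option, here is a minimal sketch of a new PersistentVolume that points at an existing storage asset. It assumes the asset is an NFS export; the PV name, server address, export path, and capacity are hypothetical, and a different storage backend would need its own volume source in place of the nfs block.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: recovered-pv                    # hypothetical name
spec:
  capacity:
    storage: 16Gi                       # assumed size of the existing asset
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain # keep the asset if the claim is deleted again
  storageClassName: ""                  # no class; a PVC must also set "" to bind to this PV
  nfs:
    server: nfs.example.com             # hypothetical NFS server
    path: /exports/thevolume            # hypothetical export holding the retained data
```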
If you want to write the data to another persistent volume using Kubernetes, you could use a Job to copy the data.
In that case, make sure you use the persistent volume access modes ROX (ReadOnlyMany) or RWX (ReadWriteMany), and start a Job running a container that claims the persistent volume to be backed up (using a selector) and also claims a destination backup volume. Then copy the data inside the container.
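The Job approach could be sketched roughly as follows. This is an illustration only: it assumes two PVCs already exist and are bound, one for the source volume and one for the destination, and it references them by name rather than by selector. The Job name, claim names, image, and mount paths are all hypothetical.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pv-copy                         # hypothetical Job name
spec:
  backoffLimit: 0                       # do not retry automatically; inspect failures by hand
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: copy
          image: busybox:1.36           # assumed image providing sh and cp
          # Copy everything, including hidden files, preserving attributes.
          command: ["sh", "-c", "cp -a /source/. /destination/"]
          volumeMounts:
            - name: source
              mountPath: /source
              readOnly: true            # never write to the volume being backed up
            - name: destination
              mountPath: /destination
      volumes:
        - name: source
          persistentVolumeClaim:
            claimName: source-pvc       # hypothetical PVC bound to the retained PV
            readOnly: true
        - name: destination
          persistentVolumeClaim:
            claimName: backup-pvc       # hypothetical PVC for the backup volume
```

When the Job completes successfully, the destination volume should hold a copy of the source data.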
Alternatively, you can do the backup outside Kubernetes. Your method then depends on the type of storage asset you are using. For example, if you are using NFS, you could mount the source and the destination and copy the data on the command line.
Both options I've outlined are more or less manual backup strategies. If you are aiming for a more sophisticated backup strategy for production workloads, you might have a look at Stash, a backup tool for the disks of production workloads in Kubernetes.