How to copy an S3 bucket onto Kubernetes nodes

10/29/2018

I want to copy an S3 bucket onto my Kubernetes nodes as a DaemonSet, so that each new node gets its own copy of the bucket as soon as it launches. I prefer copying to the node rather than into each pod directly via the AWS API, because with multiple pods that would mean repeated API calls, and the content would have to be copied again every time a pod starts.

-- Rajaneesh
amazon-s3
daemonset
kops
kubernetes

1 Answer

10/29/2018

Assuming that your S3 content is static and doesn't change often, I believe a one-time Job makes more sense than a DaemonSet for copying the whole S3 bucket to local disk. It's not clear how you would signal the kube-scheduler that your node is not ready until the S3 bucket is fully copied, but perhaps you can taint your node before the Job starts and remove the taint after the Job finishes.
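As a minimal sketch of that Job, assuming the `amazon/aws-cli` image and a `hostPath` mount (the Job name, taint key, bucket name, and host path below are all illustrative placeholders):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: s3-sync                          # illustrative name
spec:
  template:
    spec:
      containers:
      - name: sync
        image: amazon/aws-cli            # ships the aws CLI
        args: ["s3", "sync", "s3://my-bucket", "/data"]   # my-bucket is a placeholder
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        hostPath:                        # write straight onto the node's disk
          path: /var/s3-cache
          type: DirectoryOrCreate
      tolerations:                       # let the Job run while the node is tainted
      - key: s3-sync
        operator: Exists
      restartPolicy: OnFailure
```

For the tainting idea, you could run something like `kubectl taint nodes <node> s3-sync=pending:NoSchedule` before the Job, and `kubectl taint nodes <node> s3-sync-` once it completes; only pods with a matching toleration (like the Job above) would schedule in between.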

Note also that S3 is relatively slow and is designed around reading and writing individual objects, so if your bucket holds a large amount of data it will take a long time to copy to the node disk.

If your S3 content is dynamic (constantly changing), it becomes more challenging since you would have to keep the files in sync. Your apps would probably need a cache architecture: look for a file on the local disk first, and fall back to a request to S3 if it isn't there.
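That read-through cache could look something like the sketch below. The `fetch` callable stands in for the actual S3 GET (in practice it might wrap boto3's `get_object`); the function name and cache layout are assumptions, not an established API:

```python
from pathlib import Path

def read_with_cache(key: str, cache_dir: str, fetch) -> bytes:
    """Return an object's bytes, preferring the local copy on the node.

    `fetch(key) -> bytes` is only invoked on a cache miss; in a real
    deployment it would make the request to S3.
    """
    local = Path(cache_dir) / key
    if local.exists():
        return local.read_bytes()        # cache hit: serve from node disk
    data = fetch(key)                    # cache miss: go to S3
    local.parent.mkdir(parents=True, exist_ok=True)
    local.write_bytes(data)              # populate the cache for the next reader
    return data
```

Note this sketch never invalidates: once a file is cached it is served from disk forever, so for constantly changing content you would still need some expiry or sync mechanism on top.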

-- Rico
Source: StackOverflow