Daily file drop into Kubernetes

5/9/2019

I have a requirement to create a solution in Kubernetes using non-cloud native technologies.

One of the requirements is to import data on a daily basis into a database in the cluster. The cluster may run either on-prem or in the cloud (behind a VPN, etc.).

The question I have is what options do I have for getting the data into the cluster? At the moment the data would be supplied in a .csv file.

Would it be possible to have a cron job run from within the cluster to pick up files? Can the cluster access the network that hosts it, etc.?

-- Gary F
kubernetes

2 Answers

5/9/2019

The best way is to use cloud object storage like S3/GCS, or on-prem object storage like Ceph RadosGW or MinIO. Object storage gives you the well-known S3-compatible API over HTTP(S), supported by many mature client libraries. AWS S3 and GCS have numerous advantages:

  1. HTTP(S) transport, which is ubiquitous and very rarely blocked by firewalls.
  2. Effectively infinite capacity and a pay-as-you-go pricing model.
  3. ACLs and strong authentication.
  4. Signed URLs, which let you grant access via a link.
  5. Object versioning and lifecycle rules.

In Kubernetes you can set up a CronJob with credentials that downloads data from (or uploads it to) object storage on a regular schedule.
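A minimal sketch of such a CronJob, assuming an S3 bucket named `daily-imports`, credentials in a Secret called `s3-credentials`, and the `amazon/aws-cli` image (all of these names are illustrative; `batch/v1` requires Kubernetes 1.21+, older clusters use `batch/v1beta1`):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-csv-import
spec:
  schedule: "0 2 * * *"            # run once a day at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: fetch-csv
            image: amazon/aws-cli:latest
            # Copy today's CSV out of the bucket; a follow-up container
            # or script would then load it into the database.
            command: ["aws", "s3", "cp", "s3://daily-imports/data.csv", "/data/data.csv"]
            envFrom:
            - secretRef:
                name: s3-credentials   # provides AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
            volumeMounts:
            - name: workdir
              mountPath: /data
          volumes:
          - name: workdir
            emptyDir: {}
```

For on-prem object storage such as MinIO, the same pattern works by pointing the CLI at the internal endpoint (e.g. `aws --endpoint-url ... s3 cp ...`).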

-- Vasily Angapov
Source: StackOverflow

5/9/2019

If I were you, I would:

  1. Create a network-shared directory, say, /tmp/data-to-import.

  2. Mount that directory into my cron job's Pod using a volume.

  3. Mount the same shared directory (from step 1) on whatever server or local disk produces the files.

Now, the only thing you have to do daily is drop your data file into that directory.
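The volume wiring above could look roughly like this, assuming an NFS export at `nfs-server.example.com:/exports/data-to-import` and a hypothetical importer image containing a script that loads CSV files into the database (both names are illustrative):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: csv-import
spec:
  schedule: "30 1 * * *"            # daily at 01:30
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: importer
            image: registry.example.com/csv-importer:latest  # hypothetical image
            # Assumed to load any CSV files it finds under /import
            # into the database.
            command: ["/usr/local/bin/import-csv.sh", "/import"]
            volumeMounts:
            - name: drop-dir
              mountPath: /import
          volumes:
          - name: drop-dir
            nfs:
              server: nfs-server.example.com   # illustrative NFS host
              path: /exports/data-to-import
```

A `hostPath` volume would also work for a single-node or on-prem setup, at the cost of tying the Pod to a specific node.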

There are various ways to solve this problem, but since you asked for opinions, this is mine. :)

-- Prateek Jain
Source: StackOverflow