I've dockerized a Python project that requires several CSV files (~2 GB). To keep the image size down, I didn't include the CSVs in the build. Instead, I give the running container the data from a directory outside the container through a volume. Locally, with Docker, I can just run:
```shell
docker run -v ~/local/path/:/container/path my-image:latest
```
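Whatever volume type ends up backing it, the application only needs to agree with the orchestrator on one thing: the path where the data appears. The sketch below is a minimal illustration of the container side, assuming a hypothetical `DATA_DIR` environment variable (set locally with `docker run -e`, or through the container's `env` list in a Kubernetes manifest) that defaults to `/container/path`; `find_csvs` is a made-up helper name, not part of any library.

```python
import os
from pathlib import Path
from typing import List, Optional

# Hypothetical default: the mount point used in the docker run command above.
DEFAULT_DATA_DIR = "/container/path"


def find_csvs(data_dir: Optional[str] = None) -> List[Path]:
    """Return the CSV files in the mounted data directory, sorted by path.

    The directory comes from the argument, else the hypothetical DATA_DIR
    environment variable, else DEFAULT_DATA_DIR.
    """
    root = Path(data_dir or os.environ.get("DATA_DIR", DEFAULT_DATA_DIR))
    if not root.is_dir():
        # Fail loudly: a missing directory usually means the volume was not mounted.
        raise FileNotFoundError(f"data directory {root} is not mounted")
    return sorted(p for p in root.iterdir() if p.suffix.lower() == ".csv")
```

Because the path is the only coupling, the same image runs unchanged under `docker run -v` and under any Kubernetes volume mounted at that path.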
This works, but I'm not sure how to do the same thing in Kubernetes. I've been reading the documentation and am confused by the number of volume types, where the CSVs should actually be stored, etc.
Based on the information about the project that I've provided, is there an obvious solution?
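Here is a typical example of sharing a volume between containers in the same Pod. You can keep your data in one container and your code in another, with both mounting the same `emptyDir` volume. Note that an `emptyDir` starts out empty and is deleted along with the Pod, so the data container has to copy its files into the volume at runtime.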
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: two-containers
spec:
  restartPolicy: Never
  volumes:
  - name: shared-data
    emptyDir: {}
  containers:
  - name: nginx-container
    image: nginx
    volumeMounts:
    - name: shared-data
      mountPath: /usr/share/nginx/html
  - name: debian-container
    image: debian
    volumeMounts:
    - name: shared-data
      mountPath: /pod-data
    command: ["/bin/sh"]
    args: ["-c", "echo Hello from the debian container > /pod-data/index.html"]
```
Hope it helps.
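Regular containers in a Pod start without waiting for one another, so the code container may see the shared directory before the data container has finished filling it. One way to handle that is to poll for a marker file; the sketch below assumes a hypothetical convention where the data container writes a file named `_READY` into the shared volume after its last CSV is copied. The names `READY_MARKER` and `wait_for_shared_data` are illustrative only. Kubernetes init containers, which run to completion before the regular containers start, avoid the polling altogether.

```python
import time
from pathlib import Path

# Hypothetical convention: the data container writes this file last.
READY_MARKER = "_READY"


def wait_for_shared_data(mount_path: str, timeout: float = 300.0, poll: float = 1.0) -> bool:
    """Poll the shared volume until the marker file appears.

    Returns True once the marker exists, or False if `timeout` seconds pass first.
    """
    marker = Path(mount_path) / READY_MARKER
    deadline = time.monotonic() + timeout
    while True:
        if marker.exists():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

In the code container you would call it with that container's own `mountPath` before loading anything, and exit non-zero if it returns False.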
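If you'd like to replicate that exact behavior from Docker, the most common way is to use a `hostPath` volume, which mounts a directory from the node's filesystem into the Pod. Keep in mind that the directory must exist on whichever node the Pod is scheduled to, so on a multi-node cluster the CSVs have to be present on every node (or the Pod has to be pinned to the node that has them). Something like this: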
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pd
spec:
  containers:
  - image: my-image:latest
    name: my-container
    volumeMounts:
    - mountPath: /container/path
      name: test-volume
  volumes:
  - name: test-volume
    hostPath:
      path: /usr/local/path
      type: Directory
```
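If the host path or mount point varies between environments, one option is to generate the manifest instead of editing YAML by hand. `kubectl apply -f` accepts JSON as well as YAML, so the minimal sketch below builds the same `hostPath` Pod as a Python dict and prints it as JSON. The function name `hostpath_pod` is hypothetical; the field names mirror the manifest above.

```python
import json


def hostpath_pod(name: str, image: str, host_path: str, mount_path: str) -> dict:
    """Build a Pod manifest that mounts a node directory through hostPath."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "my-container",
                    "image": image,
                    # Must match the volume name declared under spec.volumes.
                    "volumeMounts": [{"name": "test-volume", "mountPath": mount_path}],
                }
            ],
            "volumes": [
                {
                    "name": "test-volume",
                    # type Directory: the mount fails unless the directory
                    # already exists on the node.
                    "hostPath": {"path": host_path, "type": "Directory"},
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = hostpath_pod("test-pd", "my-image:latest", "/usr/local/path", "/container/path")
    print(json.dumps(manifest, indent=2))
```

Saved as, say, `make_pod.py`, it can be applied with `python make_pod.py | kubectl apply -f -`.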