How to use Skaffold with Kubernetes volumes?

8/28/2019

I have a Python application whose Docker build takes about 15-20 minutes. Here is what my Dockerfile looks like, more or less:

FROM ubuntu:18.04
...
COPY . /usr/local/app
RUN pip install -r /usr/local/app/requirements.txt
...
CMD ...

Now if I use Skaffold, any code change triggers a rebuild, and it is going to reinstall all requirements (since everything from the COPY step onward is rebuilt) regardless of whether they were already installed. In docker-compose this issue would be solved using volumes. In Kubernetes, if we use volumes in the following way:

apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
  - image: test:test
    name: test-container
    volumeMounts:
    - mountPath: /usr/local/venv # the directory of the Python virtualenv
      name: test-volume
  volumes:
  - name: test-volume
    awsElasticBlockStore:
      volumeID: <volume-id>
      fsType: ext4

will this extra requirements reinstall be avoided with Skaffold?

-- Rajdeep Mukherjee
docker-compose
docker-volume
kubernetes
skaffold
volumes

3 Answers

8/29/2019

I can't speak for Skaffold specifically, but the container image build can be improved. If layer caching is available, the dependencies are only reinstalled when your requirements.txt changes. This is documented in the "ADD or COPY" Best Practices.

FROM ubuntu:18.04
...
COPY requirements.txt /usr/local/app/
RUN pip install -r /usr/local/app/requirements.txt
COPY . /usr/local/app
...
CMD ...

You may sometimes need to trigger updates if the module versions are loosely defined and, say, you want a new patch version. I've found requirements should be specific so versions don't slide underneath your application without your knowledge/testing.
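For illustration, a fully pinned requirements.txt (the package names and versions here are just examples) looks like:

# requirements.txt - exact pins, so the layer cache only busts on a deliberate change
flask==1.1.1
requests==2.22.0
gunicorn==19.9.0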

Kaniko in-cluster builds

For kaniko builds to make use of a cache in a cluster where there is no persistent storage by default, kaniko needs either a persistent volume mounted (--cache-dir) or a container image repo (--cache-repo) where the cached layers are available.
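As a sketch, an in-cluster kaniko executor invocation with layer caching enabled might look like the following (the registry names are hypothetical; --cache, --cache-repo, --context and --destination are standard kaniko flags):

/kaniko/executor \
  --context=dir:///workspace \
  --dockerfile=/workspace/Dockerfile \
  --destination=registry.example.com/test:test \
  --cache=true \
  --cache-repo=registry.example.com/test/cache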

-- Matt
Source: StackOverflow

8/28/2019

If your goal is to speed up the dev process: instead of triggering an entirely new build-and-deploy cycle every time you change a line of code, you can switch to a sync-based dev process, i.e. deploy once and then update the files within the running containers as you edit code.

Skaffold supports file sync to directly update files inside the deployed containers if you change them on your local machine. However, the docs state "File sync is alpha" (https://skaffold.dev/docs/how-tos/filesync/) and I can completely agree from working with it a while ago: the sync mechanism is only one-directional (no sync from container back to local) and pretty buggy, e.g. it crashes frequently when switching git branches, installing dependencies, etc., which can be pretty annoying.
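For reference, sync rules are declared per artifact in skaffold.yaml. A minimal sketch (the exact schema depends on your Skaffold apiVersion) might be:

build:
  artifacts:
  - image: test:test
    sync:
      manual:
      - src: '**/*.py' # copy changed .py files
        dest: .        # to the same relative path in the container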

If you want a more stable alternative for sync-based Kubernetes development which is very easy to get started with, take a look at DevSpace: https://github.com/devspace-cloud/devspace

I am one of the maintainers of DevSpace and started the project because Skaffold was much too slow for our team and it did not have a file sync back then.

-- LukasGentele
Source: StackOverflow

8/29/2019

@Matt's answer is a great best practice (+1) - Skaffold in and of itself won't solve the underlying layer cache invalidation issue, which results in having to re-install the requirements during every build.

For additional performance, you can cache all the Python packages in a volume mounted in your pod, for example:

apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
  - image: test:test
    name: test-container
    volumeMounts:
    - mountPath: /usr/local/venv
      name: test-volume
    - mountPath: /root/.cache/pip
      name: pip-cache
  volumes:
  - name: test-volume
    awsElasticBlockStore:
      volumeID: <volume-id>
      fsType: ext4
  - name: pip-cache
    awsElasticBlockStore:
      volumeID: <volume-id>
      fsType: ext4

That way, if the build cache is ever invalidated and you have to re-install the packages in your requirements.txt, you'll still save some time by fetching them from the local cache instead of downloading them again.
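Note that pip only consults ~/.cache/pip by default. If you mount the cache volume at a different path, point pip at it explicitly with its --cache-dir flag (the mount path here is just an example):

pip install --cache-dir=/pip-cache -r /usr/local/app/requirements.txt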

If you're building with kaniko you can also cache base images to a persistent disk using the kaniko-warmer, for example:

...
volumeMounts:
...
- mountPath: /cache
  name: kaniko-warmer
volumes:
...
- name: kaniko-warmer
  awsElasticBlockStore:
    volumeID: <volume-id>
    fsType: ext4

Running the kaniko-warmer inside the pod:

docker run --rm -it -v /cache:/cache --entrypoint /kaniko/warmer gcr.io/kaniko-project/warmer --cache-dir=/cache --image=python:3.7-slim --image=nginx:1.17.3

Your skaffold.yaml might look something like:

apiVersion: skaffold/v1beta13
kind: Config
build:
  artifacts:
  - image: test:test
    kaniko:
      buildContext:
        localDir: {}
      cache:
        hostPath: /cache
  cluster:
    namespace: jx
    dockerConfig:
      secretName: jenkins-docker-cfg
  tagPolicy:
    envTemplate:
      template: '{{.DOCKER_REGISTRY}}/{{.IMAGE_NAME}}'
deploy:
  kubectl: {}

-- masseyb
Source: StackOverflow