Big title, I know, but it is a very specific issue.
I'm setting up a new Jenkins cluster and trying to build images with Docker-in-Docker containers, unlike the current Jenkins cluster, which mounts that ugly-as-hell /var/run/docker.sock. What gets built is a monorepo with several Dockerfiles, and the builds run in parallel.
The problem is that when building huge layers (for example, after a yarn install that downloads half of the internet), the step hangs at the Done in XX.XXs line and never moves on to the next step, whatever it is.
Sometimes the build passes (generally right after I change something in the cluster), but the following ones hang forever. When it passes, I can build 8 Node.js images in ~28 min; the following builds time out after 60 min.
Here is some code to show how I'm doing this. All the other images use the same template as the one provided.
Jenkins pod template:
apiVersion: "v1"
kind: "Pod"
metadata:
  labels:
    name: "jnlp"
    jenkins/jenkins-jenkins-agent: "true"
spec:
  containers:
  - env:
    - name: "DOCKER_HOST"
      value: "tcp://localhost:2375"
    image: "12345678910.dkr.ecr.us-east-1.amazonaws.com/kubernetes-agent:2.0" # internal image
    imagePullPolicy: "IfNotPresent"
    name: "jnlp"
    resources:
      limits:
        cpu: "1000m"
        memory: "1Gi"
      requests:
        cpu: "500m"
        memory: "500Mi"
    tty: true
    volumeMounts:
    - mountPath: "/home/jenkins"
      name: "workspace-volume"
      readOnly: false
    workingDir: "/home/jenkins"
  - args:
    - "--tls=false"
    env:
    - name: "DOCKER_BUILDKIT"
      value: "1"
    - name: "DOCKER_TLS_CERTDIR"
      value: ""
    - name: "DOCKER_DRIVER"
      value: "overlay2"
    image: "docker:20.10.12-dind-alpine3.15"
    imagePullPolicy: "IfNotPresent"
    name: "docker"
    resources:
      limits:
        memory: "4Gi"
        cpu: "2"
      requests:
        memory: "1Gi"
        cpu: "500m"
    securityContext:
      privileged: true
    tty: true
    volumeMounts:
    - mountPath: "/var/lib/docker"
      name: "docker"
      readOnly: false
    - mountPath: "/home/jenkins"
      name: "workspace-volume"
      readOnly: false
    workingDir: "/home/jenkins"
  nodeSelector:
    spot: "true"
  restartPolicy: "Never"
  volumes:
  - emptyDir:
      medium: ""
    name: "docker"
  - emptyDir:
      medium: ""
    name: "workspace-volume"
Dockerfile:
# We don't use alpine image due to dependency issues
FROM node:12.14.1-stretch-slim as base
RUN apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get -y install --no-install-recommends \
        apt-utils build-essential bzip2 ca-certificates cron curl g++ git libfontconfig make python \
    && update-ca-certificates \
    && apt-get autoremove -y \
    && apt-get clean \
    && rm -rf /tmp/* /var/tmp/* \
    && rm -f /var/log/alternatives.log /var/log/apt/* \
    && rm -rf /var/lib/apt/lists/* \
    && rm /var/cache/debconf/*-old
ENV NODE_ENV development
# Put here, to optimize caching
EXPOSE 8043
WORKDIR /opt/app
RUN chown -R node:node /opt/app
USER node
COPY --chown=node:node package.json yarn.lock .yarnclean /opt/app/
COPY 100-wkhtmltoimage-special.conf /etc/fonts/conf.d/
RUN yarn config set network-timeout 600000 -g && \
    yarn --frozen-lockfile && \
    yarn autoclean --force && \
    yarn cache clean
FROM base as dev
# --debug and inspect port
EXPOSE 5858 9229
COPY --chown=node:node . /opt/app
RUN npx gulp build && sh ./app-ssl
FROM base as prod
COPY --from=dev /opt/app /opt/app
# Like `npm prune --production`
RUN yarn --production --ignore-scripts --prefer-offline
CMD ["yarn", "start"]
The command:
docker build \
    --network host --force-rm \
    --build-arg BUILDKIT_INLINE_CACHE=1 \
    --cache-from 12345678910.dkr.ecr.us-east-1.amazonaws.com/name-of-my-image:latest \
    --cache-from 12345678910.dkr.ecr.us-east-1.amazonaws.com/name-of-my-image:latest-dev \
    --cache-from 12345678910.dkr.ecr.us-east-1.amazonaws.com/name-of-my-image:${VERSION} \
    --cache-from 12345678910.dkr.ecr.us-east-1.amazonaws.com/name-of-my-image:${VERSION}-dev \
    --tag 12345678910.dkr.ecr.us-east-1.amazonaws.com/name-of-my-image:${VERSION}-dev \
    --tag 12345678910.dkr.ecr.us-east-1.amazonaws.com/name-of-my-image:latest-dev \
    --target dev .
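For anyone trying to reproduce this, a rough sketch of how the command above could be run under a hard time limit, so a hung build fails early instead of waiting for the 60 min pipeline timeout. This is not part of my current setup: run_with_limit is a hypothetical helper name, and it assumes a timeout utility with GNU coreutils semantics (exit status 124 when the limit is hit) is available in the agent container.

```shell
#!/bin/sh
# Hypothetical helper, not from the original pipeline.
# Runs a command under a hard time limit and reports when the limit is hit.
# Assumes `timeout` behaves like GNU coreutils (exit status 124 on timeout).
run_with_limit() {
  limit=$1
  shift
  timeout "$limit" "$@"
  status=$?
  if [ "$status" -eq 124 ]; then
    echo "command timed out after $limit" >&2
  fi
  return "$status"
}
```

It would be used as run_with_limit 45m docker build --progress=plain ... --target dev . with the same flags as above. With BuildKit enabled, --progress=plain prints each build step as plain text, which may show what the builder is doing after the last yarn line.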
The end of the log:
...
[2022-01-18T19:37:19.928Z] [4/5] Building fresh packages...
[2022-01-18T19:37:19.928Z] [5/5] Cleaning modules...
[2022-01-18T19:37:34.774Z] Done in 486.04s.
[2022-01-18T19:37:34.774Z] yarn autoclean v1.21.1
[2022-01-18T19:37:34.774Z] [1/1] Cleaning modules...
[2022-01-18T19:37:46.952Z] info Removed 0 files
[2022-01-18T19:37:46.952Z] info Saved 0 MB.
[2022-01-18T19:37:46.952Z] Done in 12.85s.
[2022-01-18T19:37:46.952Z] yarn cache v1.21.1
[2022-01-18T19:38:13.453Z] success Cleared cache.
[2022-01-18T19:38:13.453Z] Done in 24.21s.
[2022-01-18T20:28:51.170Z] make: *** [Makefile:21: build-dev] Terminated <=== Pipeline reaches timeout! Look how long it hangs from the previous line.
script returned exit code 2
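To make the silent stretch easier to spot in longer logs, here is a small sketch that finds the largest gap between consecutive timestamped lines. largest_gap is a hypothetical helper name; it assumes every relevant line starts with a [YYYY-MM-DDTHH:MM:SS.mmmZ] stamp as above and that the log does not cross midnight UTC.

```shell
#!/bin/sh
# Hypothetical helper: read a timestamped Jenkins log on stdin and print the
# largest gap (in seconds) between two consecutive stamped lines, together
# with the line that preceded the gap.
# Assumption: all timestamps fall on the same UTC day.
largest_gap() {
  awk '
    match($0, /^\[[0-9-]+T[0-9:.]+Z\]/) {
      # Drop the surrounding brackets, then split the stamp into
      # date, hours, minutes and seconds.
      split(substr($0, 2, RLENGTH - 2), p, /[T:Z]/)
      now = p[2] * 3600 + p[3] * 60 + p[4]
      if (seen && now - prev > max) {
        max = now - prev
        before = prevline
      }
      prev = now
      prevline = $0
      seen = 1
    }
    END {
      if (before != "") printf "%.3f s after: %s\n", max, before
    }
  '
}
```

Saved to a file and fed the log above (largest_gap < build.log, where build.log is a placeholder name), it points at the Done in 24.21s. line: about 3037.7 s, a bit over 50 minutes, pass before the Terminated message.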
If anyone needs any more information, please let me know. Thanks!