I'm facing a pretty strange problem.
First, my setup: I have a private GitLab server that uses GitLab CI runners on Kubernetes to build Docker images. For that purpose I use the Kaniko image. The runners are provisioned by GitLab itself through the built-in Kubernetes management. All of this runs behind a pfSense server.
Now to my problem: sometimes the Kaniko pods can't resolve the hostname of the GitLab server. This leads to a failed git pull and therefore to a failed build. I would estimate the failure rate at about 60%, which is way too high for us. After retrying the build a few times, it runs without any problem.
The Kubernetes cluster running GitLab CI is set up on CentOS 7. SELinux and firewalld are disabled. All of the hosts can resolve the GitLab server. The problem is also not tied to a specific host: I have seen it fail on all 5 servers, including the manager server. I haven't seen this problem in other pods, but the other deployments in the cluster don't really make connections via DNS. I am sure the runner can reach DNS in general, because it pulls the Kaniko image from gcr.io.
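To put a firmer number on the failure rate, a small loop like the one below could be run from a shell inside one of the affected pods. This is only a sketch: `count_failures` is a hypothetical helper name, and it assumes a POSIX `sh` plus some lookup command such as `nslookup` is available in the container (which should be checked for the image in use).

```shell
#!/bin/sh
# Hypothetical helper: run a command N times and print how many attempts failed.
# Usage: count_failures <attempts> <command> [args...]
count_failures() {
  attempts="$1"
  shift
  failures=0
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Discard the command's output; only its exit status matters here.
    "$@" >/dev/null 2>&1 || failures=$((failures + 1))
    i=$((i + 1))
  done
  echo "$failures"
}
```

For example, `count_failures 20 nslookup git.mydomain.com` would print how many of 20 lookups failed, assuming `nslookup` exits non-zero on a failed lookup in that image.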
Has anyone seen this problem or know of a workaround?
Here is my CI config:
build:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    - echo $REGISTRY_AUTH > /kaniko/.docker/config.json
    - /kaniko/executor --context $CI_PROJECT_DIR --dockerfile $CI_PROJECT_DIR/Dockerfile --destination $REGISTRY_URL/$REGISTRY_IMAGE:$CI_JOB_ID
  only:
    - master
The following error happens:
Initialized empty Git repository in /builds/MYPROJECT/.git/
Fetching changes...
Created fresh repository.
fatal: unable to access 'https://gitlab-ci-token:[MASKED]@git.mydomain.com/MYPROJECT.git/': Could not resolve host: git.mydomain.com
There is an environment variable for gitlab-runner that can solve this problem:
- name: RUNNER_PRE_CLONE_SCRIPT
  value: "exec command before git fetch ..."
For example, edit /etc/hosts:
echo '127.0.0.1 git.demo.xxxx' >> /etc/hosts
Or edit /etc/resolv.conf:
echo 'nameserver 8.8.8.8' > /etc/resolv.conf
Hope it works for you.
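If the pre-clone script can run more than once in the same container, a plain `echo ... >> /etc/hosts` will keep appending duplicate lines. A minimal sketch of a more defensive variant follows. `add_host_entry` is a hypothetical helper, not part of gitlab-runner, and the IP address in the usage example is a placeholder that has to be replaced with the real address of the GitLab server.

```shell
#!/bin/sh
# Hypothetical helper: append "<ip> <name>" to a hosts file, but only
# if the file does not already mention that name.
# Usage: add_host_entry <ip> <hostname> [hosts-file]
add_host_entry() {
  ip="$1"
  name="$2"
  file="${3:-/etc/hosts}"   # defaults to /etc/hosts; overridable for testing
  # -q: quiet, -F: treat the name as a fixed string rather than a regex
  if ! grep -qF "$name" "$file" 2>/dev/null; then
    printf '%s %s\n' "$ip" "$name" >> "$file"
  fi
}
```

In a pre-clone script this could be called as `add_host_entry 10.0.0.5 git.mydomain.com`, where `10.0.0.5` is an assumed address used only for illustration.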
We had the same issue for a couple of days. We tried changing the CoreDNS config, moving the runners to a different k8s cluster, and so on. Finally, today I checked my personal runner and found that it was using a different version: the runners in the cluster had gitlab/gitlab-runner:alpine-v12.3.0, while mine had gitlab/gitlab-runner:alpine-v12.0.1. We added the line
image: gitlab/gitlab-runner:alpine-v12.1.0
to values.yaml, and this solved the problem for us.