I have a Kubernetes cluster deployed with Kops on AWS, not EKS (too expensive for now). The CNI is Calico. I have 4 nodes:
They are labelled.
I have a GitLab Kubernetes runner installed with this Helm chart. It is configured to run on the big worker, and to spawn its job pods on the big worker as well.
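For context, the pinning is done with node selectors in the chart values, roughly like this (a sketch only; the size=big label key/value are illustrative, not my actual labels):

# values.yaml for the gitlab-runner Helm chart (sketch; label key/value are examples)
nodeSelector:              # schedules the runner manager pod on the big worker
  size: big
runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        # schedule the spawned job pods on the big worker too
        [runners.kubernetes.node_selector]
          "size" = "big"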
It was working well, but for the last few days I've noticed that sometimes (if not every time) the first job of a pipeline causes an error in the calico-node pod running on the big node. The error is:
bird: Mesh_10_1_1_136: Socket error: bind: Address not available
bird: Mesh_10_1_1_212: Socket error: bind: Address not available
bird: Mesh_10_1_1_14: Socket error: bind: Address not available
10.1.1.136, .212 and .14 are the IPs of the master and the two smaller nodes. The IP of the big node never appears.
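For reference, the node addresses as seen by Kubernetes and as recorded by Calico can be compared like this (the calico-node pod name below is just an example):

# Internal IPs as Kubernetes sees them
kubectl get nodes -o wide

# Address Calico has recorded for the big node (annotation written by calico-node)
kubectl describe node ip-10-1-1-198.eu-west-1.compute.internal | grep projectcalico.org

# Recent logs of the calico-node pod running on the big node
kubectl logs -n kube-system calico-node-xxxxx --tail=100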
So my questions are:
Thank you very much in advance. Cheers.
EDIT: I've found these lines in the calico-node logs:
2021-05-24 22:43:26.881 [INFO][49] monitor-addresses/startup.go 576: Node IPv4 changed, will check for conflicts
2021-05-24 22:43:26.907 [WARNING][49] monitor-addresses/startup.go 1107: IPv4 address has changed. This could happen if there are multiple nodes with the same name. node="ip-10-1-1-198.eu-west-1.compute.internal" original="10.1.1.198" updated="192.168.0.1"
2021-05-24 22:43:26.936 [INFO][45] confd/client.go 877: Recompute BGP peerings: HostBGPConfig(node=ip-10-1-1-198.eu-west-1.compute.internal; name=ip_addr_v4) updated; HostBGPConfig(node=ip-10-1-1-198.eu-west-1.compute.internal; name=network_v4) updated
2021-05-24 22:43:26.937 [INFO][53] felix/int_dataplane.go 1325: Received *proto.HostMetadataUpdate update from calculation graph msg=hostname:"ip-10-1-1-198.eu-west-1.compute.internal" ipv4_addr:"192.168.0.1"
2021-05-24 22:43:26.937 [INFO][53] felix/int_dataplane.go 1453: Applying dataplane updates
2021-05-24 22:43:26.937 [INFO][53] felix/ipip_mgr.go 222: All-hosts IP set out-of sync, refreshing it.
2021-05-24 22:43:26.937 [INFO][53] felix/ipsets.go 119: Queueing IP set for creation family="inet" setID="all-hosts-net" setType="hash:net"
2021-05-24 22:43:26.941 [INFO][49] monitor-addresses/startup.go 308: Updated node IP addresses
2021-05-24 22:43:26.951 [INFO][53] felix/ipsets.go 749: Doing full IP set rewrite family="inet" numMembersInPendingReplace=4 setID="all-hosts-net"
bird: Mesh_10_1_1_212: Received: Peer de-configured
bird: Mesh_10_1_1_212: State changed to stop
bird: Mesh_10_1_1_212: State changed to down
bird: Mesh_10_1_1_212: Starting
bird: Mesh_10_1_1_212: State changed to start
2021-05-24 22:43:26.971 [INFO][53] felix/int_dataplane.go 1467: Finished applying updates to dataplane. msecToApply=33.685021
bird: Reconfiguration requested by SIGHUP
bird: Reconfiguring
bird: device1: Reconfigured
bird: direct1: Reconfigured
bird: Reconfigured
2021-05-24 22:43:26.984 [INFO][45] confd/resource.go 277: Target config /etc/calico/confd/config/bird6.cfg has been updated due to change in key: /calico/bgp/v1/host
bird: Reconfiguration requested by SIGHUP
bird: Reconfiguring
bird: device1: Reconfigured
bird: direct1: Reconfigured
bird: Restarting protocol Mesh_10_1_1_136
bird: Mesh_10_1_1_136: Shutting down
bird: Mesh_10_1_1_136: State changed to stop
bird: Restarting protocol Mesh_10_1_1_14
bird: Mesh_10_1_1_14: Shutting down
bird: Mesh_10_1_1_14: State changed to stop
bird: Restarting protocol Mesh_10_1_1_212
bird: Mesh_10_1_1_212: Shutting down
bird: Mesh_10_1_1_212: State changed to stop
bird: Mesh_10_1_1_212: State changed to down
bird: Mesh_10_1_1_212: Initializing
bird: Mesh_10_1_1_212: Starting
bird: Mesh_10_1_1_212: State changed to start
bird: Mesh_10_1_1_136: State changed to down
bird: Mesh_10_1_1_136: Initializing
bird: Mesh_10_1_1_136: Starting
bird: Mesh_10_1_1_136: State changed to start
bird: Mesh_10_1_1_14: State changed to down
bird: Mesh_10_1_1_14: Initializing
bird: Mesh_10_1_1_14: Starting
bird: Mesh_10_1_1_14: State changed to start
bird: Reconfigured
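These logs suggest calico-node re-ran its IP autodetection and picked 192.168.0.1 instead of the node's 10.1.1.198 address, which would explain why BIRD then fails to bind its BGP mesh sessions. The address Calico picks is controlled by the IP_AUTODETECTION_METHOD environment variable on the calico-node DaemonSet; a minimal sketch of pinning it (the interface name ens5 is an assumption, and since Kops manages Calico as an addon a manual patch like this may be overwritten):

# Pin Calico's IPv4 autodetection to the node's primary NIC
# (interface name is an assumption; check with `ip addr` on the node)
kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=interface=ens5

# Alternative: pick the interface that can reach a known in-cluster address
# kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=can-reach=10.1.1.136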
EDIT 2: The GitLab job that triggers this error is the following:
integrationtesting:
  tags:
    - kubernetes
  image: docker/compose:alpine-1.29.2
  stage: tests
  before_script:
    - echo "NPM_TOKEN=$NPM_TOKEN" > test_integ/dependencies/.env
    - docker-compose -f test_integ/dependencies/docker-compose.yaml up --build -d
  script:
    - docker-compose -f test_integ/dependencies/tester-compose.yaml up --build --abort-on-container-exit --exit-code-from tester
  after_script:
    - docker-compose -f test_integ/dependencies/docker-compose.yaml -f test_integ/dependencies/tester-compose.yaml down
With:
docker-compose.yaml
version: "3.9"
networks:
testinteg:
name: testinteg
services:
mongosrv:
container_name: "mongosrv"
image: mongo
networks:
- testinteg
users:
container_name: "users"
build:
context: "../.."
dockerfile: "Dockerfile"
target: run
args:
NPM_TOKEN: "${NPM_TOKEN}"
network: host
environment:
NODE_ENV: "dev"
PORT: 80
LOG_LEVEL: "debug"
LOG_FORMAT: "splat,simple"
PASSWORD_JWT_SECRET: "anothersecurestring"
PASSWORD_JWT_TTL: "30s"
SSL_ENABLED: "false"
MOCK_DB: "false"
MONGO_DB: "users"
MONGO_HOST: "mongosrv"
depends_on:
- mongosrv
networks:
- testinteg
tester-compose.yaml
version: "3.9"
networks:
testinteg:
name: testinteg
services:
tester:
container_name: "tester"
build:
context: "../.."
dockerfile: "Dockerfile"
target: testinteg
args:
NPM_TOKEN: "${NPM_TOKEN}"
network: host
environment:
MSHOST: users
MSPORT: 80
volumes:
- ../tests:/app/test_integ/tests
networks:
- testinteg
Final information: the Dockerfile runs npm ci against a private npm registry hosted on JFrog Artifactory. Without the network: host option in the build section, the build fails to resolve the registry domain (a Docker-in-Docker issue).