I deployed RKE in an air-gapped environment with the following specification:
Nodes:
3 controlplane nodes (also running etcd), 2 workers
RKE version: v1.0.0
Docker version:
Client:
 Debug Mode: false

Server:
 Containers: 24
  Running: 7
  Paused: 0
  Stopped: 17
 Images: 4
 Server Version: 19.03.1-ol
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: **************
 runc version: ******
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.14.35-1902.8.4.el7uek.x86_64
 Operating System: Oracle Linux Server 7.7
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 1.409GiB
 Name: rke01.kuberlocal.co
 ID: *******************************
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  registry.console:5000
  127.0.0.0/8
 Live Restore Enabled: false
 Registries:
Operating system and kernel: Oracle Linux 7.7 (reports as Red Hat Enterprise Linux Server release 7.7), kernel 4.14.35-1902.8.4.el7uek.x86_64
Type/provider of hosts: VirtualBox (test environment)
cluster.yml file:
# If you intend to deploy Kubernetes in an air-gapped environment,
# please consult the documentation on how to configure custom RKE images.
nodes:
- address: rke01
  port: "22"
  internal_address: 192.168.40.11
  role:
  - controlplane
  - etcd
  hostname_override: ""
  user: rke
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: rke02
  port: "22"
  internal_address: 192.168.40.17
  role:
  - controlplane
  - etcd
  hostname_override: ""
  user: rke
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: rke03
  port: "22"
  internal_address: 192.168.40.13
  role:
  - controlplane
  - etcd
  hostname_override: ""
  user: rke
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: rke04
  port: "22"
  internal_address: 192.168.40.14
  role:
  - worker
  hostname_override: ""
  user: rke
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: rke05
  port: "22"
  internal_address: 192.168.40.15
  role:
  - worker
  hostname_override: ""
  user: rke
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
services:
  etcd:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    external_urls: []
    ca_cert: ""
    cert: ""
    key: ""
    path: ""
    uid: 0
    gid: 0
    snapshot: null
    retention: ""
    creation: ""
    backup_config: null
  kube-api:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    service_cluster_ip_range: 10.43.0.0/16
    service_node_port_range: ""
    pod_security_policy: false
    always_pull_images: false
    secrets_encryption_config: null
    audit_log: null
    admission_configuration: null
    event_rate_limit: null
  kube-controller:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    cluster_cidr: 10.42.0.0/16
    service_cluster_ip_range: 10.43.0.0/16
  scheduler:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
  kubelet:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    cluster_domain: bmi.rke.cluster.local
    infra_container_image: ""
    cluster_dns_server: 10.43.0.10
    fail_swap_on: false
    generate_serving_certificate: false
  kubeproxy:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
network:
  plugin: weave
  weave_network_provider:
    password: "********"
  options: {}
  node_selector: {}
authentication:
  strategy: x509
  sans: []
  webhook: null
addons: ""
addons_include: []
system_images:
  etcd: registry.console:5000/rancher/coreos-etcd:v3.3.15-rancher1
  alpine: registry.console:5000/rancher/rke-tools:v0.1.51
  nginx_proxy: registry.console:5000/rancher/rke-tools:v0.1.51
  cert_downloader: registry.console:5000/rancher/rke-tools:v0.1.51
  kubernetes_services_sidecar: registry.console:5000/rancher/rke-tools:v0.1.51
  kubedns: registry.console:5000/rancher/k8s-dns-kube-dns:1.15.0
  dnsmasq: registry.console:5000/rancher/k8s-dns-dnsmasq-nanny:1.15.0
  kubedns_sidecar: registry.console:5000/rancher/k8s-dns-sidecar:1.15.0
  kubedns_autoscaler: registry.console:5000/rancher/cluster-proportional-autoscaler:1.7.1
  coredns: registry.console:5000/rancher/coredns-coredns:1.6.2
  coredns_autoscaler: registry.console:5000/rancher/cluster-proportional-autoscaler:1.7.1
  kubernetes: registry.console:5000/rancher/hyperkube:v1.16.3-rancher1
  flannel: registry.console:5000/rancher/coreos-flannel:v0.11.0-rancher1
  flannel_cni: registry.console:5000/rancher/flannel-cni:v0.3.0-rancher5
  calico_node: registry.console:5000/rancher/calico-node:v3.8.1
  calico_cni: registry.console:5000/rancher/calico-cni:v3.8.1
  calico_controllers: registry.console:5000/rancher/calico-kube-controllers:v3.8.1
  calico_ctl: ""
  calico_flexvol: registry.console:5000/rancher/calico-pod2daemon-flexvol:v3.8.1
  canal_node: registry.console:5000/rancher/calico-node:v3.8.1
  canal_cni: registry.console:5000/rancher/calico-cni:v3.8.1
  canal_flannel: registry.console:5000/rancher/coreos-flannel:v0.11.0
  canal_flexvol: registry.console:5000/rancher/calico-pod2daemon-flexvol:v3.8.1
  weave_node: registry.console:5000/weaveworks/weave-kube:2.5.2
  weave_cni: registry.console:5000/weaveworks/weave-npc:2.5.2
  pod_infra_container: registry.console:5000/rancher/pause:3.1
  ingress: registry.console:5000/rancher/nginx-ingress-controller:nginx-0.25.1-rancher1
  ingress_backend: registry.console:5000/rancher/nginx-ingress-controller-defaultbackend:1.5-rancher1
  metrics_server: registry.console:5000/rancher/metrics-server:v0.3.4
  windows_pod_infra_container: rancher/kubelet-pause:v0.1.3
ssh_key_path: ~/.ssh/id_rsa
ssh_cert_path: ""
ssh_agent_auth: false
authorization:
  mode: rbac
  options: {}
#ignore_docker_version: false
ignore_docker_version: true
kubernetes_version: ""
private_registries:
- url: registry.console:5000
  user: registry_user
  password: ***********
  is_default: true
ingress:
  provider: ""
  options: {}
  node_selector: {}
  extra_args: {}
  dns_policy: ""
  extra_envs: []
  extra_volumes: []
  extra_volume_mounts: []
cluster_name: ""
cloud_provider:
  name: ""
prefix_path: "/opt/rke/"
addon_job_timeout: 30
bastion_host:
  address: ""
  port: ""
  user: ""
  ssh_key: ""
  ssh_key_path: ""
  ssh_cert: ""
  ssh_cert_path: ""
monitoring:
  provider: ""
  options: {}
  node_selector: {}
restore:
  restore: false
  snapshot_name: ""
dns:
  provider: coredns
Steps to Reproduce:
rke -d up --config cluster.yml
Results:
INFO[0129] [sync] Successfully synced nodes Labels and Taints
DEBU[0129] Host: rke01 has role: controlplane
DEBU[0129] Host: rke01 has role: etcd
DEBU[0129] Host: rke03 has role: controlplane
DEBU[0129] Host: rke03 has role: etcd
DEBU[0129] Host: rke04 has role: worker
DEBU[0129] Host: rke05 has role: worker
INFO[0129] [network] Setting up network plugin: weave
INFO[0129] [addons] Saving ConfigMap for addon rke-network-plugin to Kubernetes
INFO[0129] [addons] Successfully saved ConfigMap for addon rke-network-plugin to Kubernetes
INFO[0129] [addons] Executing deploy job rke-network-plugin
DEBU[0129] [k8s] waiting for job rke-network-plugin-deploy-job to complete..
FATA[0159] Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system
kubectl get pods --all-namespaces
NAMESPACE     NAME                                  READY   STATUS   RESTARTS   AGE
kube-system   rke-network-plugin-deploy-job-4jgcq   0/1     Error    0          4m6s
kube-system   rke-network-plugin-deploy-job-57jr8   0/1     Error    0          3m50s
kube-system   rke-network-plugin-deploy-job-h2gr8   0/1     Error    0          90s
kube-system   rke-network-plugin-deploy-job-p92br   0/1     Error    0          2m50s
kube-system   rke-network-plugin-deploy-job-xrgpl   0/1     Error    0          4m1s
kube-system   rke-network-plugin-deploy-job-zqhmk   0/1     Error    0          3m30s
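The failing pods' logs can also be pulled directly with kubectl, using a pod name from the listing above:

kubectl -n kube-system logs rke-network-plugin-deploy-job-zqhmk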
kubectl describe pod rke-network-plugin-deploy-job-zqhmk -n kube-system
Name:           rke-network-plugin-deploy-job-zqhmk
Namespace:      kube-system
Priority:       0
Node:           rke01/192.168.40.11
Start Time:     Sun, 12 Jan 2020 09:40:00 +0330
Labels:         controller-uid=*******************
                job-name=rke-network-plugin-deploy-job
Annotations:    <none>
Status:         Failed
IP:             192.168.40.11
IPs:
  IP:           192.168.40.11
Controlled By:  Job/rke-network-plugin-deploy-job
Containers:
  rke-network-plugin-pod:
    Container ID:  docker://7658aecff174e4ac53caaf088782dab50654911065371cd0d8dcdd50b8fbef3b
    Image:         registry.console:5000/rancher/hyperkube:v1.16.3-rancher1
    Image ID:      docker-pullable://registry.console:5000/rancher/hyperkube@sha256:0a55590eb8453bcc46a4bdb8217a48cf56a7c7f7c52d72a267632ffa35b3b8c8
    Port:          <none>
    Host Port:     <none>
    Command:
      kubectl
      apply
      -f
      /etc/config/rke-network-plugin.yaml
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Sun, 12 Jan 2020 09:40:00 +0330
      Finished:     Sun, 12 Jan 2020 09:40:01 +0330
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/config from config-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from rke-job-deployer-token-9dt6n (ro)
Conditions:
  Type             Status
  Initialized      True
  Ready            False
  ContainersReady  False
  PodScheduled     True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rke-network-plugin
    Optional:  false
  rke-job-deployer-token-9dt6n:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rke-job-deployer-token-9dt6n
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  Type    Reason   Age    From            Message
  ----    ------   ----   ----            -------
  Normal  Pulled   4m10s  kubelet, rke01  Container image "registry.console:5000/rancher/hyperkube:v1.16.3-rancher1" already present on machine
  Normal  Created  4m10s  kubelet, rke01  Created container rke-network-plugin-pod
  Normal  Started  4m10s  kubelet, rke01  Started container rke-network-plugin-pod
Container logs (docker logs -f 267a894bb999):
unable to recognize "/etc/config/rke-network-plugin.yaml": Get https://10.43.0.1:443/api?timeout=32s: dial tcp 10.43.0.1:443: connect: network is unreachable
unable to recognize "/etc/config/rke-network-plugin.yaml": Get https://10.43.0.1:443/api?timeout=32s: dial tcp 10.43.0.1:443: connect: network is unreachable
unable to recognize "/etc/config/rke-network-plugin.yaml": Get https://10.43.0.1:443/api?timeout=32s: dial tcp 10.43.0.1:443: connect: network is unreachable
unable to recognize "/etc/config/rke-network-plugin.yaml": Get https://10.43.0.1:443/api?timeout=32s: dial tcp 10.43.0.1:443: connect: network is unreachable
unable to recognize "/etc/config/rke-network-plugin.yaml": Get https://10.43.0.1:443/api?timeout=32s: dial tcp 10.43.0.1:443: connect: network is unreachable
unable to recognize "/etc/config/rke-network-plugin.yaml": Get https://10.43.0.1:443/api?timeout=32s: dial tcp 10.43.0.1:443: connect: network is unreachable
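For context, a "connect: network is unreachable" dial error means the kernel found no route for the destination. 10.43.0.1 is the cluster's kubernetes service VIP, which kube-proxy normally rewrites via iptables, so this suggests those rules are missing and the node has no default route. A quick check, as a sketch to run on the failing node (rke01 is assumed here from the pod placement above):

ip route show            # look for a "default via ..." entry
ip route get 10.43.0.1   # prints the chosen route, or "Network is unreachable"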
Network interfaces (output of ip addr):
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether *********** brd ff:ff:ff:ff:ff:ff
    inet 192.168.40.11/24 brd 192.168.40.255 scope global dynamic enp0s8
       valid_lft 847sec preferred_lft 847sec
    inet6 ************* scope link
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether *************** brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 ************* scope link
       valid_lft forever preferred_lft forever
docker network ls
NETWORK ID     NAME     DRIVER   SCOPE
c6063ba5a4d0   bridge   bridge   local
822441eae3cf   host     host     local
314798c82599   none     null     local
Is the issue related to the network interfaces? If yes, how can I create the missing interface?
I had the same issue, and these two steps solved my problem:
1. Increasing addon_job_timeout in cluster.yml.
2. In my case, one of the nodes was in a DiskPressure state; after resolving that, I created a network interface with the command below:
docker network create --driver=bridge --subnet=10.43.0.0/16 br0_rke
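For reference, the first step lines up with the log above: the FATA[0159] fires exactly 30 seconds after the deploy job starts at INFO[0129], which matches the addon_job_timeout: 30 in the posted cluster.yml. A sketch of the change (the value is in seconds; 120 is an arbitrary example):

# cluster.yml -- allow the addon deploy jobs more time to complete
addon_job_timeout: 120

DiskPressure can be confirmed from the node conditions, for example (rke01 used here only as an example node name):

kubectl describe node rke01 | grep -A8 Conditions   # look for DiskPressure True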