I've been battling a Kubernetes install problem. We brought up a new OpenStack environment, and the scripts that work in the old environment fail in the new one.
We are using K8s v1.5.4, installed with these scripts: https://github.com/coreos/coreos-kubernetes/tree/master/multi-node/generic
CoreOS 1298.7.0
The master seems to come up fine: I can deploy pods to it, and it always shows Ready when running kubectl get nodes.
The worker installation script runs, however the worker never reaches a Ready state.
kubectl get nodes --show-labels
NAME STATUS AGE LABELS
MYIP.118.240.122 Ready,SchedulingDisabled 7m beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=MYIP.118.240.122
MYIP.118.240.129 NotReady 5m beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=MYIP.118.240.129
If I run kubectl describe node MYIP.118.240.129 I get the following:
(testtest)➜ dev kubectl describe node MYIP.118.240.129
Name: MYIP.118.240.129
Role:
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/hostname=MYIP.118.240.129
Taints: <none>
CreationTimestamp: Fri, 14 Apr 2017 15:27:47 -0600
Phase:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk Unknown Fri, 14 Apr 2017 15:27:47 -0600 Fri, 14 Apr 2017 15:28:29 -0600 NodeStatusUnknown Kubelet stopped posting node status.
MemoryPressure False Fri, 14 Apr 2017 15:27:47 -0600 Fri, 14 Apr 2017 15:27:47 -0600 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 14 Apr 2017 15:27:47 -0600 Fri, 14 Apr 2017 15:27:47 -0600 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready Unknown Fri, 14 Apr 2017 15:27:47 -0600 Fri, 14 Apr 2017 15:28:29 -0600 NodeStatusUnknown Kubelet stopped posting node status.
Addresses: MYIP.118.240.129,MYIP.118.240.129,MYIP.118.240.129
Capacity:
alpha.kubernetes.io/nvidia-gpu: 0
cpu: 1
memory: 2052924Ki
pods: 110
Allocatable:
alpha.kubernetes.io/nvidia-gpu: 0
cpu: 1
memory: 2052924Ki
pods: 110
System Info:
Machine ID: efee03ac51c641888MYIP50dfa2a40350d
System UUID: 4467C959-37FE-48ED-A263-C36DD0D445F1
Boot ID: 50eb5e93-5aed-441b-b3ef-36da1472e4ea
Kernel Version: 4.9.16-coreos-r1
OS Image: Container Linux by CoreOS 1298.7.0 (Ladybug)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.12.6
Kubelet Version: v1.5.4+coreos.0
Kube-Proxy Version: v1.5.4+coreos.0
ExternalID: MYIP.118.240.129
Non-terminated Pods: (5 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
kube-system heapster-v1.2.0-216693398-sfz1m 50m (5%) 50m (5%) 90Mi (4%) 90Mi (4%)
kube-system kube-dns-782804071-psmfc 260m (26%) 0 (0%) 140Mi (6%) 220Mi (10%)
kube-system kube-dns-autoscaler-2715466192-jmb3h 20m (2%) 0 (0%) 10Mi (0%) 0 (0%)
kube-system kube-proxy-MYIP.118.240.129 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kubernetes-dashboard-3543765157-w8zv2 100m (10%) 100m (10%) 50Mi (2%) 50Mi (2%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
430m (43%) 150m (15%) 290Mi (14%) 360Mi (17%)
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
11m 11m 1 {kubelet MYIP.118.240.129} Normal Starting Starting kubelet.
11m 11m 1 {kubelet MYIP.118.240.129} Warning ImageGCFailed unable to find data for container /
11m 11m 2 {kubelet MYIP.118.240.129} Normal NodeHasSufficientDisk Node MYIP.118.240.129 status is now: NodeHasSufficientDisk
11m 11m 2 {kubelet MYIP.118.240.129} Normal NodeHasSufficientMemory Node MYIP.118.240.129 status is now: NodeHasSufficientMemory
11m 11m 2 {kubelet MYIP.118.240.129} Normal NodeHasNoDiskPressure Node MYIP.118.240.129 status is now: NodeHasNoDiskPressure
(testtest)➜ dev
All ports are open within this internal network between the worker and master.
If I run docker ps on the worker I get:
ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c25cf12b43f3 quay.io/coreos/hyperkube:v1.5.4_coreos.0 "/hyperkube proxy --m" 4 minutes ago Up 4 minutes k8s_kube-proxy.96aded63_kube-proxy-MYIP.118.240.129_kube-system_23185d6abc4d5c8f11da2ca1943fd398_5ba9628a
c4d14dfd7d52 gcr.io/google_containers/pause-amd64:3.0 "/pause" 6 minutes ago Up 6 minutes k8s_POD.d8dbe16c_kube-proxy-MYIP.118.240.129_kube-system_23185d6abc4d5c8f11da2ca1943fd398_e8a1c6d6
kubelet logs after running all weekend:
Apr 17 20:53:15 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:53:15.507939 1353 container_manager_linux.go:625] error opening pid file /run/docker/libcontainerd/docker-containerd.pid: open /run/docker/libcontainerd/docker-containerd.pid: no such file or directory
Apr 17 20:48:15 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:48:15.484016 1353 container_manager_linux.go:625] error opening pid file /run/docker/libcontainerd/docker-containerd.pid: open /run/docker/libcontainerd/docker-containerd.pid: no such file or directory
Apr 17 20:43:15 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:15.405888 1353 container_manager_linux.go:625] error opening pid file /run/docker/libcontainerd/docker-containerd.pid: open /run/docker/libcontainerd/docker-containerd.pid: no such file or directory
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: W0417 20:43:07.361035 1353 kubelet.go:1497] Deleting mirror pod "kube-proxy-MYIP.118.240.129_kube-system(37537fb7-2159-11e7-b692-fa163e952b1c)" because it is outdated
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.018406 1353 event.go:208] Unable to write event: 'Post https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/events: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer' (may retry after sleeping)
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.017813 1353 reflector.go:188] pkg/kubelet/kubelet.go:386: Failed to list *api.Node: Get https://MYIP.118.240.122:443/api/v1/nodes?fieldSelector=metadata.name%3DMYIP.118.240.129&resourceVersion=0: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.017711 1353 reflector.go:188] pkg/kubelet/kubelet.go:378: Failed to list *api.Service: Get https://MYIP.118.240.122:443/api/v1/services?resourceVersion=0: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.016457 1353 kubelet_node_status.go:302] Error updating node status, will retry: error getting node "MYIP.118.240.129": Get https://MYIP.118.240.122:443/api/v1/nodes?fieldSelector=metadata.name%3DMYIP.118.240.129: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.0161MYIP 1353 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/e8ea63b2-2159-11e7-b692-fa163e952b1c-default-token-93sd7\" (\"e8ea63b2-2159-11e7-b692-fa163e952b1c\")" failed. No retries permitted until 2017-04-17 20:45:07.016165356 +0000 UTC (durationBeforeRetry 2m0s). Error: MountVolume.SetUp failed for volume "kubernetes.io/secret/e8ea63b2-2159-11e7-b692-fa163e952b1c-default-token-93sd7" (spec.Name: "default-token-93sd7") pod "e8ea63b2-2159-11e7-b692-fa163e952b1c" (UID: "e8ea63b2-2159-11e7-b692-fa163e952b1c") with: Get https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/secrets/default-token-93sd7: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.016058 1353 secret.go:197] Couldn't get secret kube-system/default-token-93sd7
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.015943 1353 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/ec05331e-2158-11e7-b692-fa163e952b1c-default-token-93sd7\" (\"ec05331e-2158-11e7-b692-fa163e952b1c\")" failed. No retries permitted until 2017-04-17 20:45:07.015913703 +0000 UTC (durationBeforeRetry 2m0s). Error: MountVolume.SetUp failed for volume "kubernetes.io/secret/ec05331e-2158-11e7-b692-fa163e952b1c-default-token-93sd7" (spec.Name: "default-token-93sd7") pod "ec05331e-2158-11e7-b692-fa163e952b1c" (UID: "ec05331e-2158-11e7-b692-fa163e952b1c") with: Get https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/secrets/default-token-93sd7: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.015843 1353 secret.go:197] Couldn't get secret kube-system/default-token-93sd7
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.015732 1353 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/e8fdcca4-2159-11e7-b692-fa163e952b1c-default-token-93sd7\" (\"e8fdcca4-2159-11e7-b692-fa163e952b1c\")" failed. No retries permitted until 2017-04-17 20:45:07.015656131 +0000 UTC (durationBeforeRetry 2m0s). Error: MountVolume.SetUp failed for volume "kubernetes.io/secret/e8fdcca4-2159-11e7-b692-fa163e952b1c-default-token-93sd7" (spec.Name: "default-token-93sd7") pod "e8fdcca4-2159-11e7-b692-fa163e952b1c" (UID: "e8fdcca4-2159-11e7-b692-fa163e952b1c") with: Get https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/secrets/default-token-93sd7: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.015559 1353 secret.go:197] Couldn't get secret kube-system/default-token-93sd7
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.015429 1353 reflector.go:188] pkg/kubelet/config/apiserver.go:44: Failed to list *api.Pod: Get https://MYIP.118.240.122:443/api/v1/pods?fieldSelector=spec.nodeName%3DMYIP.118.240.129&resourceVersion=0: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.012918 1353 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/ec091be8-2158-11e7-b692-fa163e952b1c-default-token-93sd7\" (\"ec091be8-2158-11e7-b692-fa163e952b1c\")" failed. No retries permitted until 2017-04-17 20:45:07.012889039 +0000 UTC (durationBeforeRetry 2m0s). Error: MountVolume.SetUp failed for volume "kubernetes.io/secret/ec091be8-2158-11e7-b692-fa163e952b1c-default-token-93sd7" (spec.Name: "default-token-93sd7") pod "ec091be8-2158-11e7-b692-fa163e952b1c" (UID: "ec091be8-2158-11e7-b692-fa163e952b1c") with: Get https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/secrets/default-token-93sd7: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.012820 1353 secret.go:197] Couldn't get secret kube-system/default-token-93sd7
Apr 17 20:43:07 philtest.openstacklocal kubelet-wrapper[1353]: E0417 20:43:07.012661 1353 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/ec09da25-2158-11e7-b692-fa163e952b1c-default-token-93sd7\" (\"ec09da25-2158-11e7-b692-fa163e952b1c\")" failed. No retries permitted until 2017-04-17 20:45:07.012630687 +0000 UTC (durationBeforeRetry 2m0s). Error: MountVolume.SetUp failed for volume "kubernetes.io/secret/ec09da25-2158-11e7-b692-fa163e952b1c-default-token-93sd7" (spec.Name: "default-token-93sd7") pod "ec09da25-2158-11e7-b692-fa163e952b1c" (UID: "ec09da25-2158-11e7-b692-fa163e952b1c") with: Get https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/secrets/default-token-93sd7: read tcp MYIP.118.240.129:50102->MYIP.118.240.122:443: read: connection reset by peer
As you can see in the logs, the worker node is having trouble talking to the master node.
However if I ssh into the worker and run a command like:
core@philtest ~ $ curl https://MYIP.118.240.122:443/api/v1/namespaces/kube-system/secrets/default-token-93sd7 --insecure
Unauthorized
It's TLS, so of course I didn't expect it to authenticate without a client certificate.
Any suggestions on how to debug this?
Thanks!
It turned out the problem was an inconsistent MTU setting in the OpenStack network. Packets larger than roughly 1500 bytes were being dropped.
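If you want to check for this kind of MTU mismatch yourself, here is a quick sketch (the interface name eth0 and the master IP are assumptions; adjust them for your environment). The idea is to compare the interface MTU with what actually fits across the path by sending pings with the don't-fragment bit set:

# show the MTU the worker interface thinks it has (eth0 is an assumption)
ip link show eth0

# 1472 bytes of ICMP payload + 28 bytes of headers = 1500; -M do sets don't-fragment
ping -c 3 -M do -s 1472 MYIP.118.240.122

# retry with a smaller payload; if this works while the 1472-byte ping stalls or
# reports "message too long", packets near 1500 bytes are being dropped on the path
ping -c 3 -M do -s 1200 MYIP.118.240.122

If the small ping succeeds and the large one doesn't, the path MTU is lower than the interface MTU, which matches the symptoms above: small control-plane requests get through, larger TLS responses are silently dropped.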
You need to check whether you added your IP address to the SSL generation file (openssl.cnf) for the master. Try recreating your certificate with the in-cluster Kubernetes service IP as well (if you follow the CoreOS guide, it's 10.3.0.1). Your openssl.cnf will look like this:
[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name
[req_distinguished_name]
[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
subjectAltName = @alt_names
[alt_names]
DNS.1 = kubernetes
DNS.2 = kubernetes.default
DNS.3 = kubernetes.default.svc
DNS.4 = kubernetes.default.svc.cluster.local
IP.1 = 10.3.0.1
IP.2 = PRIVATE_MASTER_IP
IP.3 = PUBLIC_MASTER_IP
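With the SANs in place, you then re-sign the API server certificate against this config. A minimal sketch, assuming the CA key pair (ca.pem / ca-key.pem) from the CoreOS cluster TLS guide is in the current directory and you use that guide's file names:

# generate a fresh key for the API server (or reuse the existing one)
openssl genrsa -out apiserver-key.pem 2048
# create a CSR that picks up the SANs from the openssl.cnf shown above
openssl req -new -key apiserver-key.pem -out apiserver.csr -subj "/CN=kube-apiserver" -config openssl.cnf
# sign it with the cluster CA, keeping the v3_req extensions (and therefore the SANs)
openssl x509 -req -in apiserver.csr -CA ca.pem -CAkey ca-key.pem -CAcreateserial -out apiserver.pem -days 365 -extensions v3_req -extfile openssl.cnf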
You will also need to recreate the certificates for your node(s). After that, delete the default-token secret from the affected namespaces so it is regenerated automatically (see the example below). Source: the CoreOS docs.
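For example (the secret name default-token-93sd7 is the one appearing in the logs above; adjust the name and namespace to whatever kubectl get secrets shows in your cluster):

kubectl --namespace=kube-system get secrets
kubectl --namespace=kube-system delete secret default-token-93sd7
# the service account controller recreates the token using the current CA/certs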