Pods can't see the API at https://api_service_cluster_ip:443, and some of my pods can't see the API at https://api_service_cluster_ip:6443

3/26/2019

I am deploying the following interactive pod:

kubectl run -i -t centos7interactive2 --restart=Never --image=centos:7 /bin/bash

Then I try to curl my API server from within the pod:

curl -k https://10.96.0.1:6443/api/v1

This fails (hangs) from a pod on chad:

[root@togo ~]# kubectl describe pod centos7interactive2
Name:               centos7interactive2
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               chad.corp.sensis.com/10.93.98.23
Start Time:         Tue, 26 Mar 2019 13:29:15 -0400
Labels:             run=centos7interactive2
Annotations:        <none>
Status:             Running
IP:                 10.96.2.7
Containers:
  centos7interactive2:
    Container ID:  docker://8b7e301b8e8e2d091bdce641be81cc4dc1413ebab47889fec8102175d399e038
    Image:         centos:7
    Image ID:      docker-pullable://centos@sha256:8d487d68857f5bc9595793279b33d082b03713341ddec91054382641d14db861
    Port:          <none>
    Host Port:     <none>
    Args:
      /bin/bash
    State:          Running
      Started:      Tue, 26 Mar 2019 13:29:16 -0400
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-k2vv5 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-k2vv5:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-k2vv5
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age   From                           Message
  ----    ------     ----  ----                           -------
  Normal  Scheduled  56s   default-scheduler              Successfully assigned default/centos7interactive2 to chad.corp.sensis.com
  Normal  Pulled     55s   kubelet, chad.corp.sensis.com  Container image "centos:7" already present on machine
  Normal  Created    55s   kubelet, chad.corp.sensis.com  Created container
  Normal  Started    55s   kubelet, chad.corp.sensis.com  Started container

Nor can this pod ping 10.96.0.1.
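
For reference, the same check can be bounded with curl's --connect-timeout so a failure returns quickly instead of hanging (a sketch, not from my original test run; the second address is the master's node IP, for comparison):

curl -k --connect-timeout 5 https://10.96.0.1:6443/api/v1     # ClusterIP route (hangs/times out from chad)
curl -k --connect-timeout 5 https://10.93.98.204:6443/api/v1  # master node address, for comparison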

If I create the interactive CentOS pod again, it is scheduled to qatar:

[root@togo ~]# kubectl describe pod centos7interactive2
Name:               centos7interactive2
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               qatar.corp.sensis.com/10.93.98.36
Start Time:         Tue, 26 Mar 2019 13:36:23 -0400
Labels:             run=centos7interactive2
Annotations:        <none>
Status:             Running
IP:                 10.96.1.11
Containers:
  centos7interactive2:
    Container ID:  docker://cfc95172944dcd4d643e68ff761f73d32ff1435d674769ddc38da44847a4af88
    Image:         centos:7
    Image ID:      docker-pullable://centos@sha256:8d487d68857f5bc9595793279b33d082b03713341ddec91054382641d14db861
    Port:          <none>
    Host Port:     <none>
    Args:
      /bin/bash
    State:          Running
      Started:      Tue, 26 Mar 2019 13:36:24 -0400
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-k2vv5 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-k2vv5:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-k2vv5
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age   From                            Message
  ----    ------     ----  ----                            -------
  Normal  Scheduled  8s    default-scheduler               Successfully assigned default/centos7interactive2 to qatar.corp.sensis.com
  Normal  Pulled     7s    kubelet, qatar.corp.sensis.com  Container image "centos:7" already present on machine
  Normal  Created    7s    kubelet, qatar.corp.sensis.com  Created container
  Normal  Started    7s    kubelet, qatar.corp.sensis.com  Started container

In this case it has no problem pinging or curling 10.96.0.1:

[root@centos7interactive2 /]# curl -k https://10.96.0.1:6443/api/v1/
{
  "kind": "APIResourceList",
  "groupVersion": "v1",
  "resources": [
    {
      "name": "bindings",
      "singularName": "",
      "namespaced": true,
      "kind": "Binding",
      "verbs": [
        "create"
      ]
    },
    {
      "name": "componentstatuses",
      "singularName": "",
      "namespaced": false,
      "kind": "ComponentStatus",
      "verbs": [
        "get",
        "list"
      ],
      "shortNames": [
        "cs"
      ]
    },
    {
      "name": "configmaps",
      "singularName": "",
      "namespaced": true,
      "kind": "ConfigMap",
      "verbs": [
        "create",
        "delete",
        "deletecollection",
        "get",
        "list",
        "patch",
        "update",
        "watch"
      ],
      "shortNames": [
        "cm"
      ]
    },
    {
      "name": "endpoints",
      "singularName": "",
      "namespaced": true,
      "kind": "Endpoints",
      "verbs": [
        "create",
        "delete",
        "deletecollection",
        "get",
        "list",
        "patch",
        "update",
        "watch"
      ],
      "shortNames": [
        "ep"
      ]
    },
    {
      "name": "events",
      "singularName": "",
      "namespaced": true,
      "kind": "Event",
      "verbs": [
        "create",
        "delete",
        "deletecollection",
        "get",
        "list",
        "patch",
        "update",
        "watch"
      ],
      "shortNames": [
        "ev"
      ]
    },
    {
      "name": "limitranges",
      "singularName": "",
      "namespaced": true,
      "kind": "LimitRange",
      "verbs": [
        "create",
        "delete",
        "deletecollection",
        "get",
        "list",
        "patch",
        "update",
        "watch"
      ],
      "shortNames": [
        "limits"
      ]
    },
    {
      "name": "namespaces",
      "singularName": "",
      "namespaced": false,
      "kind": "Namespace",
      "verbs": [
        "create",
        "delete",
        "get",
        "list",
        "patch",
        "update",
        "watch"
      ],
      "shortNames": [
        "ns"
      ]
    },
    {
      "name": "namespaces/finalize",
      "singularName": "",
      "namespaced": false,
      "kind": "Namespace",
      "verbs": [
        "update"
      ]
    },
    {
      "name": "namespaces/status",
      "singularName": "",
      "namespaced": false,
      "kind": "Namespace",
      "verbs": [
        "get",
        "patch",
        "update"
      ]
    },
    {
      "name": "nodes",
      "singularName": "",
      "namespaced": false,
      "kind": "Node",
      "verbs": [
        "create",
        "delete",
        "deletecollection",
        "get",
        "list",
        "patch",
        "update",
        "watch"
      ],
      "shortNames": [
        "no"
      ]
    },
    {
      "name": "nodes/proxy",
      "singularName": "",
      "namespaced": false,
      "kind": "NodeProxyOptions",
      "verbs": [
        "create",
        "delete",
        "get",
        "patch",
        "update"
      ]
    },
    {
      "name": "nodes/status",
      "singularName": "",
      "namespaced": false,
      "kind": "Node",
      "verbs": [
        "get",
        "patch",
        "update"
      ]
    },
    {
      "name": "persistentvolumeclaims",
      "singularName": "",
      "namespaced": true,
      "kind": "PersistentVolumeClaim",
      "verbs": [
        "create",
        "delete",
        "deletecollection",
        "get",
        "list",
        "patch",
        "update",
        "watch"
      ],
      "shortNames": [
        "pvc"
      ]
    },
    {
      "name": "persistentvolumeclaims/status",
      "singularName": "",
      "namespaced": true,
      "kind": "PersistentVolumeClaim",
      "verbs": [
        "get",
        "patch",
        "update"
      ]
    },
    {
      "name": "persistentvolumes",
      "singularName": "",
      "namespaced": false,
      "kind": "PersistentVolume",
      "verbs": [
        "create",
        "delete",
        "deletecollection",
        "get",
        "list",
        "patch",
        "update",
        "watch"
      ],
      "shortNames": [
        "pv"
      ]
    },
    {
      "name": "persistentvolumes/status",
      "singularName": "",
      "namespaced": false,
      "kind": "PersistentVolume",
      "verbs": [
        "get",
        "patch",
        "update"
      ]
    },
    {
      "name": "pods",
      "singularName": "",
      "namespaced": true,
      "kind": "Pod",
      "verbs": [
        "create",
        "delete",
        "deletecollection",
        "get",
        "list",
        "patch",
        "update",
        "watch"
      ],
      "shortNames": [
        "po"
      ],
      "categories": [
        "all"
      ]
    },
    {
      "name": "pods/attach",
      "singularName": "",
      "namespaced": true,
      "kind": "PodAttachOptions",
      "verbs": [
        "create",
        "get"
      ]
    },
    {
      "name": "pods/binding",
      "singularName": "",
      "namespaced": true,
      "kind": "Binding",
      "verbs": [
        "create"
      ]
    },
    {
      "name": "pods/eviction",
      "singularName": "",
      "namespaced": true,
      "group": "policy",
      "version": "v1beta1",
      "kind": "Eviction",
      "verbs": [
        "create"
      ]
    },
    {
      "name": "pods/exec",
      "singularName": "",
      "namespaced": true,
      "kind": "PodExecOptions",
      "verbs": [
        "create",
        "get"
      ]
    },
    {
      "name": "pods/log",
      "singularName": "",
      "namespaced": true,
      "kind": "Pod",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "pods/portforward",
      "singularName": "",
      "namespaced": true,
      "kind": "PodPortForwardOptions",
      "verbs": [
        "create",
        "get"
      ]
    },
    {
      "name": "pods/proxy",
      "singularName": "",
      "namespaced": true,
      "kind": "PodProxyOptions",
      "verbs": [
        "create",
        "delete",
        "get",
        "patch",
        "update"
      ]
    },
    {
      "name": "pods/status",
      "singularName": "",
      "namespaced": true,
      "kind": "Pod",
      "verbs": [
        "get",
        "patch",
        "update"
      ]
    },
    {
      "name": "podtemplates",
      "singularName": "",
      "namespaced": true,
      "kind": "PodTemplate",
      "verbs": [
        "create",
        "delete",
        "deletecollection",
        "get",
        "list",
        "patch",
        "update",
        "watch"
      ]
    },
    {
      "name": "replicationcontrollers",
      "singularName": "",
      "namespaced": true,
      "kind": "ReplicationController",
      "verbs": [
        "create",
        "delete",
        "deletecollection",
        "get",
        "list",
        "patch",
        "update",
        "watch"
      ],
      "shortNames": [
        "rc"
      ],
      "categories": [
        "all"
      ]
    },
    {
      "name": "replicationcontrollers/scale",
      "singularName": "",
      "namespaced": true,
      "group": "autoscaling",
      "version": "v1",
      "kind": "Scale",
      "verbs": [
        "get",
        "patch",
        "update"
      ]
    },
    {
      "name": "replicationcontrollers/status",
      "singularName": "",
      "namespaced": true,
      "kind": "ReplicationController",
      "verbs": [
        "get",
        "patch",
        "update"
      ]
    },
    {
      "name": "resourcequotas",
      "singularName": "",
      "namespaced": true,
      "kind": "ResourceQuota",
      "verbs": [
        "create",
        "delete",
        "deletecollection",
        "get",
        "list",
        "patch",
        "update",
        "watch"
      ],
      "shortNames": [
        "quota"
      ]
    },
    {
      "name": "resourcequotas/status",
      "singularName": "",
      "namespaced": true,
      "kind": "ResourceQuota",
      "verbs": [
        "get",
        "patch",
        "update"
      ]
    },
    {
      "name": "secrets",
      "singularName": "",
      "namespaced": true,
      "kind": "Secret",
      "verbs": [
        "create",
        "delete",
        "deletecollection",
        "get",
        "list",
        "patch",
        "update",
        "watch"
      ]
    },
    {
      "name": "serviceaccounts",
      "singularName": "",
      "namespaced": true,
      "kind": "ServiceAccount",
      "verbs": [
        "create",
        "delete",
        "deletecollection",
        "get",
        "list",
        "patch",
        "update",
        "watch"
      ],
      "shortNames": [
        "sa"
      ]
    },
    {
      "name": "services",
      "singularName": "",
      "namespaced": true,
      "kind": "Service",
      "verbs": [
        "create",
        "delete",
        "get",
        "list",
        "patch",
        "update",
        "watch"
      ],
      "shortNames": [
        "svc"
      ],
      "categories": [
        "all"
      ]
    },
    {
      "name": "services/proxy",
      "singularName": "",
      "namespaced": true,
      "kind": "ServiceProxyOptions",
      "verbs": [
        "create",
        "delete",
        "get",
        "patch",
        "update"
      ]
    },
    {
      "name": "services/status",
      "singularName": "",
      "namespaced": true,
      "kind": "Service",
      "verbs": [
        "get",
        "patch",
        "update"
      ]
    }
  ]
}

In that case I have no problem getting to 10.96.0.1. Both nodes seem healthy, yet one consistently prevents my pods from reaching the master via its ClusterIP address.

[root@togo work]# kubectl config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://10.93.98.204:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: kubernetes-admin
  name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED

My cluster seems healthy.

[root@togo work]# kubectl get all --all-namespaces
NAMESPACE     NAME                                               READY   STATUS    RESTARTS   AGE
kube-system   pod/coredns-86c58d9df4-jjgpn                       1/1     Running   1          5d22h
kube-system   pod/coredns-86c58d9df4-n6lcv                       1/1     Running   1          5d22h
kube-system   pod/etcd-togo.corp.sensis.com                      1/1     Running   1          5d22h
kube-system   pod/kube-apiserver-togo.corp.sensis.com            1/1     Running   1          5d22h
kube-system   pod/kube-controller-manager-togo.corp.sensis.com   1/1     Running   1          5d22h
kube-system   pod/kube-flannel-ds-amd64-6759k                    1/1     Running   0          26h
kube-system   pod/kube-flannel-ds-amd64-fxpv9                    1/1     Running   1          5d22h
kube-system   pod/kube-flannel-ds-amd64-n6zk9                    1/1     Running   0          5d22h
kube-system   pod/kube-flannel-ds-amd64-rbbms                    1/1     Running   0          26h
kube-system   pod/kube-flannel-ds-amd64-shqnr                    1/1     Running   1          5d22h
kube-system   pod/kube-flannel-ds-amd64-tqkgw                    1/1     Running   0          26h
kube-system   pod/kube-proxy-h9jpr                               1/1     Running   1          5d22h
kube-system   pod/kube-proxy-m567z                               1/1     Running   0          26h
kube-system   pod/kube-proxy-t6swp                               1/1     Running   0          26h
kube-system   pod/kube-proxy-tlfjd                               1/1     Running   0          26h
kube-system   pod/kube-proxy-vzdpl                               1/1     Running   1          5d22h
kube-system   pod/kube-proxy-xn5dv                               1/1     Running   0          5d22h
kube-system   pod/kube-scheduler-togo.corp.sensis.com            1/1     Running   1          5d22h
kube-system   pod/tiller-deploy-5b7c66d59c-k9xkv                 1/1     Running   1          5d22h

NAMESPACE     NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
default       service/kubernetes      ClusterIP   10.96.0.1       <none>        443/TCP         5d22h
kube-system   service/kube-dns        ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP   5d22h
kube-system   service/tiller-deploy   ClusterIP   10.105.40.102   <none>        44134/TCP       5d22h

NAMESPACE     NAME                                     DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                     AGE
kube-system   daemonset.apps/kube-flannel-ds-amd64     6         6         6       6            6           beta.kubernetes.io/arch=amd64     5d22h
kube-system   daemonset.apps/kube-flannel-ds-arm       0         0         0       0            0           beta.kubernetes.io/arch=arm       5d22h
kube-system   daemonset.apps/kube-flannel-ds-arm64     0         0         0       0            0           beta.kubernetes.io/arch=arm64     5d22h
kube-system   daemonset.apps/kube-flannel-ds-ppc64le   0         0         0       0            0           beta.kubernetes.io/arch=ppc64le   5d22h
kube-system   daemonset.apps/kube-flannel-ds-s390x     0         0         0       0            0           beta.kubernetes.io/arch=s390x     5d22h
kube-system   daemonset.apps/kube-proxy                6         6         6       6            6           <none>                            5d22h

NAMESPACE     NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/coredns         2/2     2            2           5d22h
kube-system   deployment.apps/tiller-deploy   1/1     1            1           5d22h

NAMESPACE     NAME                                       DESIRED   CURRENT   READY   AGE
kube-system   replicaset.apps/coredns-86c58d9df4         2         2         2       5d22h
kube-system   replicaset.apps/tiller-deploy-5b7c66d59c   1         1         1       5d22h
[root@togo work]# kubectl get nodes -o wide
NAME                    STATUS   ROLES    AGE     VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION               CONTAINER-RUNTIME
benin.corp.sensis.com   Ready    <none>   26h     v1.13.4   10.93.97.123    <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64        docker://18.9.3
chad.corp.sensis.com    Ready    <none>   5d22h   v1.13.4   10.93.98.23     <none>        CentOS Linux 7 (Core)   3.10.0-957.10.1.el7.x86_64   docker://18.9.3
qatar.corp.sensis.com   Ready    <none>   5d22h   v1.13.4   10.93.98.36     <none>        CentOS Linux 7 (Core)   3.10.0-957.10.1.el7.x86_64   docker://18.9.3
spain.corp.sensis.com   Ready    <none>   26h     v1.13.4   10.93.103.236   <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64        docker://18.9.3
togo.corp.sensis.com    Ready    master   5d22h   v1.13.4   10.93.98.204    <none>        CentOS Linux 7 (Core)   3.10.0-957.5.1.el7.x86_64    docker://18.9.3
tonga.corp.sensis.com   Ready    <none>   26h     v1.13.4   10.93.97.202    <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64        docker://18.9.3

A separate problem is that neither pod can reach the API at https://10.96.0.1:443, despite the following service seen above (I can curl port 6443 directly, however):

default       service/kubernetes      ClusterIP   10.96.0.1       <none>        443/TCP
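
For reference, the 443 vs. 6443 difference is just the Service's port mapping: the kubernetes Service listens on its ClusterIP port 443 and kube-proxy forwards that to the API server endpoint on 6443. A sketch of how to confirm the mapping (not output I captured at the time):

kubectl get svc kubernetes -o jsonpath='{.spec.ports[0].port} -> {.spec.ports[0].targetPort}{"\n"}'
# expected: 443 -> 6443
kubectl get ep kubernetes
# expected: a single endpoint, the master's address 10.93.98.204:6443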

Can someone please help me isolate these two problems?

  1. Why can chad not get to https://10.96.0.1:6443?
  2. Why can neither chad nor qatar get to https://10.96.0.1:443?
-- Dave Sargrad
kubernetes

2 Answers

3/27/2019

Thanks to @Janos Lenart, I've made progress.

I followed his suggestion and reset my cluster using a pod-network-cidr of 10.244.0.0/16. I am now able to reach the API service both at its public address and at its cluster address.

The fix was primarily to use the right pod-network-cidr. As Janos indicates, the default flannel YAML specifies this as 10.244.0.0/16.

kubeadm init --apiserver-advertise-address=10.93.98.204 --pod-network-cidr=10.244.0.0/16
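
For context, the pod network CIDR has to match what the flannel manifest configures; the stock kube-flannel.yml ships with this net-conf (a snippet quoted from the upstream manifest, for reference):

  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }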

The cluster config after adding a single node:

[root@togo dsargrad]# kubectl config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://10.93.98.204:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: kubernetes-admin
  name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED

The default cluster services in all namespaces:

[root@togo dsargrad]# kubectl get svc --all-namespaces
NAMESPACE     NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
default       kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP         23m
kube-system   kube-dns     ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP   22m

I then run a CentOS 7 pod with an interactive shell:

kubectl run -i -t centos7interactive2 --restart=Never --image=centos:7 /bin/bash

Then I try to curl the API server both at its cluster IP (10.96.0.1:443) and at its public address (10.93.98.204:6443).

These connections succeed; however, I do see a certificate error.

On its public address:

[root@centos7interactive2 /]#  curl https://10.93.98.204:6443
curl: (60) Peer's Certificate issuer is not recognized.
More details here: http://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.

And on its cluster address:

[root@centos7interactive2 /]# curl https://10.96.0.1:443
curl: (60) Peer's Certificate issuer is not recognized.
More details here: http://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.

Is this cert error expected, or have I missed a step?
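
If the error is simply curl not trusting the cluster CA, one way to avoid -k from inside a pod (a sketch using the service account credentials Kubernetes mounts into every pod; I have not verified it on this cluster) would be:

SA=/var/run/secrets/kubernetes.io/serviceaccount
curl --cacert $SA/ca.crt \
     -H "Authorization: Bearer $(cat $SA/token)" \
     https://kubernetes.default.svc/api/v1/
# the ClusterIP https://10.96.0.1/api/v1/ should also validate, since kubeadm
# includes the first service IP in the API server certificate's SANs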

-- Dave Sargrad
Source: StackOverflow

3/26/2019

First of all, ping is not expected to work with cluster IPs such as 10.96.0.1, as only specific TCP or UDP ports are forwarded, never ICMP traffic.
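
A TCP-level probe is more informative than ping here; for example (a sketch, using only what the centos:7 image already has):

timeout 5 bash -c 'echo > /dev/tcp/10.96.0.1/443' && echo open || echo closed/filtered   # pure TCP check, no ICMP involved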

To aid your debugging efforts a little, I can confirm that https://10.96.0.1:443 should work from any of your Pods (and, in fact, from any of your Nodes as well). If you execute kubectl get ep kubernetes, it should show 10.93.98.204:6443 as the target. And as you tested, you should be able to reach that address (https://10.93.98.204:6443) from your Pods and Nodes as well. If you can't, it is probably a firewall problem somewhere.
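
On your cluster that should look roughly like this (a sketch of the expected output, not captured from your environment):

kubectl get ep kubernetes
NAME         ENDPOINTS           AGE
kubernetes   10.93.98.204:6443   5d22h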

Secondly, there might be a problem with your overlay network setup. I've noticed that the Pods you started got IPs like 10.96.2.7 and 10.96.1.11, which indicates that the overlay network (flannel) is probably configured as 10.96.0.0/16. From the Kubernetes service's IP (aka cluster IP), 10.96.0.1, however, it seems that the service network is also configured as 10.96.0.0/X. This is most likely wrong; there should be no overlap between the overlay network and the service network (aka cluster network). This is just a guess, of course, since there isn't enough information in your question (which is otherwise well detailed and formatted!).
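
You can check the two ranges directly, for example (a sketch; the resource names below are the kubeadm and flannel defaults and may differ in your setup):

# service/cluster network handed to the API server
kubectl -n kube-system get pod kube-apiserver-togo.corp.sensis.com -o yaml | grep service-cluster-ip-range
# pod/overlay network configured for flannel
kubectl -n kube-system get cm kube-flannel-cfg -o jsonpath='{.data.net-conf\.json}'
# the two CIDRs must not overlap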

I'd suggest you start again from scratch, as it is not trivial to reconfigure these network ranges.

-- Janos Lenart
Source: StackOverflow