I am deploying the following interactive pod:
kubectl run -i -t centos7interactive2 --restart=Never --image=centos:7 /bin/bash
Then I try to curl my API server from within the pod:
curl -k https://10.96.0.1:6443/api/v1
This fails (hangs) from a pod on chad:
[root@togo ~]# kubectl describe pod centos7interactive2
Name: centos7interactive2
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: chad.corp.sensis.com/10.93.98.23
Start Time: Tue, 26 Mar 2019 13:29:15 -0400
Labels: run=centos7interactive2
Annotations: <none>
Status: Running
IP: 10.96.2.7
Containers:
centos7interactive2:
Container ID: docker://8b7e301b8e8e2d091bdce641be81cc4dc1413ebab47889fec8102175d399e038
Image: centos:7
Image ID: docker-pullable://centos@sha256:8d487d68857f5bc9595793279b33d082b03713341ddec91054382641d14db861
Port: <none>
Host Port: <none>
Args:
/bin/bash
State: Running
Started: Tue, 26 Mar 2019 13:29:16 -0400
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-k2vv5 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-k2vv5:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-k2vv5
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 56s default-scheduler Successfully assigned default/centos7interactive2 to chad.corp.sensis.com
Normal Pulled 55s kubelet, chad.corp.sensis.com Container image "centos:7" already present on machine
Normal Created 55s kubelet, chad.corp.sensis.com Created container
Normal Started 55s kubelet, chad.corp.sensis.com Started container
Nor can this pod ping 10.96.0.1
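For what it's worth, here is roughly how I would compare the kube-proxy state on the two nodes (a sketch only; it assumes kube-proxy is running in its default iptables mode, and the pod name below is just a placeholder):

# on chad: look for the kube-proxy translation rules for the Service IP
iptables-save | grep '10.96.0.1'

# find the kube-proxy pod running on chad and check its logs
kubectl -n kube-system get pods -o wide | grep kube-proxy
kubectl -n kube-system logs <kube-proxy-pod-on-chad>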
If I create the interactive CentOS pod again, it happens to be scheduled to qatar:
[root@togo ~]# kubectl describe pod centos7interactive2
Name: centos7interactive2
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: qatar.corp.sensis.com/10.93.98.36
Start Time: Tue, 26 Mar 2019 13:36:23 -0400
Labels: run=centos7interactive2
Annotations: <none>
Status: Running
IP: 10.96.1.11
Containers:
centos7interactive2:
Container ID: docker://cfc95172944dcd4d643e68ff761f73d32ff1435d674769ddc38da44847a4af88
Image: centos:7
Image ID: docker-pullable://centos@sha256:8d487d68857f5bc9595793279b33d082b03713341ddec91054382641d14db861
Port: <none>
Host Port: <none>
Args:
/bin/bash
State: Running
Started: Tue, 26 Mar 2019 13:36:24 -0400
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-k2vv5 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-k2vv5:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-k2vv5
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 8s default-scheduler Successfully assigned default/centos7interactive2 to qatar.corp.sensis.com
Normal Pulled 7s kubelet, qatar.corp.sensis.com Container image "centos:7" already present on machine
Normal Created 7s kubelet, qatar.corp.sensis.com Created container
Normal Started 7s kubelet, qatar.corp.sensis.com Started container
In this case it has no problem pinging or curling 10.96.0.1:
[root@centos7interactive2 /]# curl -k https://10.96.0.1:6443/api/v1/
{
"kind": "APIResourceList",
"groupVersion": "v1",
"resources": [
{
"name": "bindings",
"singularName": "",
"namespaced": true,
"kind": "Binding",
"verbs": [
"create"
]
},
{
"name": "componentstatuses",
"singularName": "",
"namespaced": false,
"kind": "ComponentStatus",
"verbs": [
"get",
"list"
],
"shortNames": [
"cs"
]
},
{
"name": "configmaps",
"singularName": "",
"namespaced": true,
"kind": "ConfigMap",
"verbs": [
"create",
"delete",
"deletecollection",
"get",
"list",
"patch",
"update",
"watch"
],
"shortNames": [
"cm"
]
},
{
"name": "endpoints",
"singularName": "",
"namespaced": true,
"kind": "Endpoints",
"verbs": [
"create",
"delete",
"deletecollection",
"get",
"list",
"patch",
"update",
"watch"
],
"shortNames": [
"ep"
]
},
{
"name": "events",
"singularName": "",
"namespaced": true,
"kind": "Event",
"verbs": [
"create",
"delete",
"deletecollection",
"get",
"list",
"patch",
"update",
"watch"
],
"shortNames": [
"ev"
]
},
{
"name": "limitranges",
"singularName": "",
"namespaced": true,
"kind": "LimitRange",
"verbs": [
"create",
"delete",
"deletecollection",
"get",
"list",
"patch",
"update",
"watch"
],
"shortNames": [
"limits"
]
},
{
"name": "namespaces",
"singularName": "",
"namespaced": false,
"kind": "Namespace",
"verbs": [
"create",
"delete",
"get",
"list",
"patch",
"update",
"watch"
],
"shortNames": [
"ns"
]
},
{
"name": "namespaces/finalize",
"singularName": "",
"namespaced": false,
"kind": "Namespace",
"verbs": [
"update"
]
},
{
"name": "namespaces/status",
"singularName": "",
"namespaced": false,
"kind": "Namespace",
"verbs": [
"get",
"patch",
"update"
]
},
{
"name": "nodes",
"singularName": "",
"namespaced": false,
"kind": "Node",
"verbs": [
"create",
"delete",
"deletecollection",
"get",
"list",
"patch",
"update",
"watch"
],
"shortNames": [
"no"
]
},
{
"name": "nodes/proxy",
"singularName": "",
"namespaced": false,
"kind": "NodeProxyOptions",
"verbs": [
"create",
"delete",
"get",
"patch",
"update"
]
},
{
"name": "nodes/status",
"singularName": "",
"namespaced": false,
"kind": "Node",
"verbs": [
"get",
"patch",
"update"
]
},
{
"name": "persistentvolumeclaims",
"singularName": "",
"namespaced": true,
"kind": "PersistentVolumeClaim",
"verbs": [
"create",
"delete",
"deletecollection",
"get",
"list",
"patch",
"update",
"watch"
],
"shortNames": [
"pvc"
]
},
{
"name": "persistentvolumeclaims/status",
"singularName": "",
"namespaced": true,
"kind": "PersistentVolumeClaim",
"verbs": [
"get",
"patch",
"update"
]
},
{
"name": "persistentvolumes",
"singularName": "",
"namespaced": false,
"kind": "PersistentVolume",
"verbs": [
"create",
"delete",
"deletecollection",
"get",
"list",
"patch",
"update",
"watch"
],
"shortNames": [
"pv"
]
},
{
"name": "persistentvolumes/status",
"singularName": "",
"namespaced": false,
"kind": "PersistentVolume",
"verbs": [
"get",
"patch",
"update"
]
},
{
"name": "pods",
"singularName": "",
"namespaced": true,
"kind": "Pod",
"verbs": [
"create",
"delete",
"deletecollection",
"get",
"list",
"patch",
"update",
"watch"
],
"shortNames": [
"po"
],
"categories": [
"all"
]
},
{
"name": "pods/attach",
"singularName": "",
"namespaced": true,
"kind": "PodAttachOptions",
"verbs": [
"create",
"get"
]
},
{
"name": "pods/binding",
"singularName": "",
"namespaced": true,
"kind": "Binding",
"verbs": [
"create"
]
},
{
"name": "pods/eviction",
"singularName": "",
"namespaced": true,
"group": "policy",
"version": "v1beta1",
"kind": "Eviction",
"verbs": [
"create"
]
},
{
"name": "pods/exec",
"singularName": "",
"namespaced": true,
"kind": "PodExecOptions",
"verbs": [
"create",
"get"
]
},
{
"name": "pods/log",
"singularName": "",
"namespaced": true,
"kind": "Pod",
"verbs": [
"get"
]
},
{
"name": "pods/portforward",
"singularName": "",
"namespaced": true,
"kind": "PodPortForwardOptions",
"verbs": [
"create",
"get"
]
},
{
"name": "pods/proxy",
"singularName": "",
"namespaced": true,
"kind": "PodProxyOptions",
"verbs": [
"create",
"delete",
"get",
"patch",
"update"
]
},
{
"name": "pods/status",
"singularName": "",
"namespaced": true,
"kind": "Pod",
"verbs": [
"get",
"patch",
"update"
]
},
{
"name": "podtemplates",
"singularName": "",
"namespaced": true,
"kind": "PodTemplate",
"verbs": [
"create",
"delete",
"deletecollection",
"get",
"list",
"patch",
"update",
"watch"
]
},
{
"name": "replicationcontrollers",
"singularName": "",
"namespaced": true,
"kind": "ReplicationController",
"verbs": [
"create",
"delete",
"deletecollection",
"get",
"list",
"patch",
"update",
"watch"
],
"shortNames": [
"rc"
],
"categories": [
"all"
]
},
{
"name": "replicationcontrollers/scale",
"singularName": "",
"namespaced": true,
"group": "autoscaling",
"version": "v1",
"kind": "Scale",
"verbs": [
"get",
"patch",
"update"
]
},
{
"name": "replicationcontrollers/status",
"singularName": "",
"namespaced": true,
"kind": "ReplicationController",
"verbs": [
"get",
"patch",
"update"
]
},
{
"name": "resourcequotas",
"singularName": "",
"namespaced": true,
"kind": "ResourceQuota",
"verbs": [
"create",
"delete",
"deletecollection",
"get",
"list",
"patch",
"update",
"watch"
],
"shortNames": [
"quota"
]
},
{
"name": "resourcequotas/status",
"singularName": "",
"namespaced": true,
"kind": "ResourceQuota",
"verbs": [
"get",
"patch",
"update"
]
},
{
"name": "secrets",
"singularName": "",
"namespaced": true,
"kind": "Secret",
"verbs": [
"create",
"delete",
"deletecollection",
"get",
"list",
"patch",
"update",
"watch"
]
},
{
"name": "serviceaccounts",
"singularName": "",
"namespaced": true,
"kind": "ServiceAccount",
"verbs": [
"create",
"delete",
"deletecollection",
"get",
"list",
"patch",
"update",
"watch"
],
"shortNames": [
"sa"
]
},
{
"name": "services",
"singularName": "",
"namespaced": true,
"kind": "Service",
"verbs": [
"create",
"delete",
"get",
"list",
"patch",
"update",
"watch"
],
"shortNames": [
"svc"
],
"categories": [
"all"
]
},
{
"name": "services/proxy",
"singularName": "",
"namespaced": true,
"kind": "ServiceProxyOptions",
"verbs": [
"create",
"delete",
"get",
"patch",
"update"
]
},
{
"name": "services/status",
"singularName": "",
"namespaced": true,
"kind": "Service",
"verbs": [
"get",
"patch",
"update"
]
}
]
}
In that case I have no problem getting to 10.96.0.1. Both nodes seem healthy, yet one consistently prevents my pods from reaching the master via its ClusterIP address.
[root@togo work]# kubectl config view
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: DATA+OMITTED
server: https://10.93.98.204:6443
name: kubernetes
contexts:
- context:
cluster: kubernetes
user: kubernetes-admin
name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
user:
client-certificate-data: REDACTED
client-key-data: REDACTED
My cluster seems healthy.
[root@togo work]# kubectl get all --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/coredns-86c58d9df4-jjgpn 1/1 Running 1 5d22h
kube-system pod/coredns-86c58d9df4-n6lcv 1/1 Running 1 5d22h
kube-system pod/etcd-togo.corp.sensis.com 1/1 Running 1 5d22h
kube-system pod/kube-apiserver-togo.corp.sensis.com 1/1 Running 1 5d22h
kube-system pod/kube-controller-manager-togo.corp.sensis.com 1/1 Running 1 5d22h
kube-system pod/kube-flannel-ds-amd64-6759k 1/1 Running 0 26h
kube-system pod/kube-flannel-ds-amd64-fxpv9 1/1 Running 1 5d22h
kube-system pod/kube-flannel-ds-amd64-n6zk9 1/1 Running 0 5d22h
kube-system pod/kube-flannel-ds-amd64-rbbms 1/1 Running 0 26h
kube-system pod/kube-flannel-ds-amd64-shqnr 1/1 Running 1 5d22h
kube-system pod/kube-flannel-ds-amd64-tqkgw 1/1 Running 0 26h
kube-system pod/kube-proxy-h9jpr 1/1 Running 1 5d22h
kube-system pod/kube-proxy-m567z 1/1 Running 0 26h
kube-system pod/kube-proxy-t6swp 1/1 Running 0 26h
kube-system pod/kube-proxy-tlfjd 1/1 Running 0 26h
kube-system pod/kube-proxy-vzdpl 1/1 Running 1 5d22h
kube-system pod/kube-proxy-xn5dv 1/1 Running 0 5d22h
kube-system pod/kube-scheduler-togo.corp.sensis.com 1/1 Running 1 5d22h
kube-system pod/tiller-deploy-5b7c66d59c-k9xkv 1/1 Running 1 5d22h
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 5d22h
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 5d22h
kube-system service/tiller-deploy ClusterIP 10.105.40.102 <none> 44134/TCP 5d22h
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/kube-flannel-ds-amd64 6 6 6 6 6 beta.kubernetes.io/arch=amd64 5d22h
kube-system daemonset.apps/kube-flannel-ds-arm 0 0 0 0 0 beta.kubernetes.io/arch=arm 5d22h
kube-system daemonset.apps/kube-flannel-ds-arm64 0 0 0 0 0 beta.kubernetes.io/arch=arm64 5d22h
kube-system daemonset.apps/kube-flannel-ds-ppc64le 0 0 0 0 0 beta.kubernetes.io/arch=ppc64le 5d22h
kube-system daemonset.apps/kube-flannel-ds-s390x 0 0 0 0 0 beta.kubernetes.io/arch=s390x 5d22h
kube-system daemonset.apps/kube-proxy 6 6 6 6 6 <none> 5d22h
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/coredns 2/2 2 2 5d22h
kube-system deployment.apps/tiller-deploy 1/1 1 1 5d22h
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/coredns-86c58d9df4 2 2 2 5d22h
kube-system replicaset.apps/tiller-deploy-5b7c66d59c 1 1 1 5d22h
[root@togo work]# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
benin.corp.sensis.com Ready <none> 26h v1.13.4 10.93.97.123 <none> CentOS Linux 7 (Core) 3.10.0-693.el7.x86_64 docker://18.9.3
chad.corp.sensis.com Ready <none> 5d22h v1.13.4 10.93.98.23 <none> CentOS Linux 7 (Core) 3.10.0-957.10.1.el7.x86_64 docker://18.9.3
qatar.corp.sensis.com Ready <none> 5d22h v1.13.4 10.93.98.36 <none> CentOS Linux 7 (Core) 3.10.0-957.10.1.el7.x86_64 docker://18.9.3
spain.corp.sensis.com Ready <none> 26h v1.13.4 10.93.103.236 <none> CentOS Linux 7 (Core) 3.10.0-693.el7.x86_64 docker://18.9.3
togo.corp.sensis.com Ready master 5d22h v1.13.4 10.93.98.204 <none> CentOS Linux 7 (Core) 3.10.0-957.5.1.el7.x86_64 docker://18.9.3
tonga.corp.sensis.com Ready <none> 26h v1.13.4 10.93.97.202 <none> CentOS Linux 7 (Core) 3.10.0-693.el7.x86_64 docker://18.9.3
A separate problem is that neither pod can reach the API at https://10.96.0.1:443, despite the following service seen above (I can curl port 6443 directly, however):
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP
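As a sanity check from inside the pod, something along these lines should at least distinguish a hang from a refused connection (a sketch only; /version is just a conveniently small endpoint, and even a 401/403 response would prove the TCP path works):

curl -k --connect-timeout 5 https://10.96.0.1:443/version
curl -k --connect-timeout 5 https://10.93.98.204:6443/version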
Can someone please help me isolate these two problems?
Thanks to @Janus Lenart, I've made progress.
I followed his suggestion and reset my cluster using a pod-network-cidr of 10.244.0.0/16. I am now able to reach the API server using both the public address and the cluster address.
The fix was primarily to use the right pod-network-cidr. As Janus indicates, the default flannel yaml specifies this as 10.244.0.0/16.
kubeadm init --apiserver-advertise-address=10.93.98.204 --pod-network-cidr=10.244.0.0/16
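For completeness, the rest of the sequence around that init was roughly the following (a sketch only; the flannel manifest is the stock one, whose net-conf.json already uses 10.244.0.0/16, and <token>/<hash> stand for the values printed by kubeadm init):

# on every node, wipe the previous cluster state first
kubeadm reset

# on the master (togo), after the kubeadm init shown above
mkdir -p $HOME/.kube && cp /etc/kubernetes/admin.conf $HOME/.kube/config

# install flannel from its stock manifest
kubectl apply -f kube-flannel.yml

# on each worker, re-join the cluster
kubeadm join 10.93.98.204:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>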
The cluster config after adding a single node:
[root@togo dsargrad]# kubectl config view
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: DATA+OMITTED
server: https://10.93.98.204:6443
name: kubernetes
contexts:
- context:
cluster: kubernetes
user: kubernetes-admin
name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
user:
client-certificate-data: REDACTED
client-key-data: REDACTED
The default cluster services in all namespaces:
[root@togo dsargrad]# kubectl get svc --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 23m
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 22m
I then run a CentOS 7 pod with an interactive shell:
kubectl run -i -t centos7interactive2 --restart=Never --image=centos:7 /bin/bash
Then I try to curl the API server at both its cluster IP (10.96.0.1:443) and its public address (10.93.98.204:6443).
These connections succeed; however, I do see a certificate error.
On its public address:
[root@centos7interactive2 /]# curl https://10.93.98.204:6443
curl: (60) Peer's Certificate issuer is not recognized.
More details here: http://curl.haxx.se/docs/sslcerts.html
curl performs SSL certificate verification by default, using a "bundle"
of Certificate Authority (CA) public keys (CA certs). If the default
bundle file isn't adequate, you can specify an alternate file
using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
the bundle, the certificate verification probably failed due to a
problem with the certificate (it might be expired, or the name might
not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
the -k (or --insecure) option.
And on its cluster address:
[root@centos7interactive2 /]# curl https://10.96.0.1:443
curl: (60) Peer's Certificate issuer is not recognized.
More details here: http://curl.haxx.se/docs/sslcerts.html
curl performs SSL certificate verification by default, using a "bundle"
of Certificate Authority (CA) public keys (CA certs). If the default
bundle file isn't adequate, you can specify an alternate file
using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
the bundle, the certificate verification probably failed due to a
problem with the certificate (it might be expired, or the name might
not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
the -k (or --insecure) option.
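(For reference, the pod's default service account mounts a CA bundle and token, as seen in the describe output above; verifying against that CA would look roughly like this, assuming the standard mount path:)

SA=/var/run/secrets/kubernetes.io/serviceaccount
curl --cacert $SA/ca.crt -H "Authorization: Bearer $(cat $SA/token)" https://10.96.0.1:443/api/v1/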
Is this cert error expected, or have I missed a step?
First of all, ping is not expected to work with cluster IPs such as 10.96.0.1, as only specific TCP or UDP ports are forwarded, never ICMP traffic.
To aid your debugging efforts a little, I can confirm that https://10.96.0.1:443 should work from any of your Pods (actually, from any of your Nodes as well). If you execute kubectl get ep kubernetes, it should show 10.93.98.204:6443 as the target. And as you tested, you should be able to reach that as well (https://10.93.98.204:6443) from your Pods and Nodes. If you can't, it is perhaps a firewall problem somewhere.
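Roughly, the expected output would look like this (the AGE column will differ, of course):

kubectl get ep kubernetes
NAME         ENDPOINTS           AGE
kubernetes   10.93.98.204:6443   5d22h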
Secondly, there might be a problem with your overlay network setup. I've noticed that the Pods you started got IPs like 10.96.2.7 and 10.96.1.11, which indicates that the overlay network (flannel) is probably configured as 10.96.0.0/16. Judging from the Kubernetes Service's IP (aka cluster IP), 10.96.0.1, however, it seems that network too is configured as 10.96.0.0/X. This is most likely wrong: there should be no overlap between the overlay network and the service network (aka cluster network). This is just a guess, of course, since there isn't enough information in your question (which is otherwise well detailed and formatted!).
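A quick way to check which ranges are actually configured (a sketch; the file path and ConfigMap name assume a standard kubeadm install with the stock flannel manifest):

# service (cluster IP) range, from the API server static pod manifest on the master
grep service-cluster-ip-range /etc/kubernetes/manifests/kube-apiserver.yaml

# pod (overlay) range handed to flannel
kubectl -n kube-system get cm kube-flannel-cfg -o jsonpath='{.data.net-conf\.json}'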
I'd suggest you start again from scratch, as it is not trivial to reconfigure these network ranges.