I am using Kubernetes 1.17 on CentOS 7 with flannel:v0.11.0 and am having issues with the reachability of my CLUSTER-IPs from the control plane.
I installed and set up the cluster manually with kubeadm.
This is basically my cluster:
k8s-master-01 10.0.0.50/24
k8s-worker-01 10.0.0.60/24
k8s-worker-02 10.0.0.61/24
Pod CIDR: 10.244.0.0/16
Service CIDR: 10.96.0.0/12
Note: each node has two NICs (eth0: uplink, eth1: private network). The IPs listed above are assigned to eth1 on each node. kubelet, kube-proxy, and flannel are configured to send/receive their traffic over the private network on eth1.
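For context, this is roughly how flannel and kubelet are pinned to eth1 (a sketch of my setup; flanneld's --iface argument and kubelet's --node-ip flag are the standard knobs for this):
# kube-flannel.yml: flanneld container args in the DaemonSet
- --iface=eth1
# /etc/sysconfig/kubelet: per-node kubelet extra args (eth1 address of this node)
KUBELET_EXTRA_ARGS=--node-ip=10.0.0.50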
I first ran into the problem when I tried to expose the metrics-server API through kube-apiserver. I followed the instructions from here. It seems that the control plane isn't able to communicate properly with the service network.
Here are my pods in the kube-system namespace:
$ kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-6955765f44-jrbs6 0/1 Running 9 24d 10.244.0.30 k8s-master-01 <none> <none>
coredns-6955765f44-mwn2l 1/1 Running 8 24d 10.244.1.37 k8s-worker-01 <none> <none>
etcd-k8s-master-01 1/1 Running 9 24d 10.0.0.50 k8s-master-01 <none> <none>
kube-apiserver-k8s-master-01 1/1 Running 0 2m26s 10.0.0.50 k8s-master-01 <none> <none>
kube-controller-manager-k8s-master-01 1/1 Running 15 24d 10.0.0.50 k8s-master-01 <none> <none>
kube-flannel-ds-amd64-7d6jq 1/1 Running 11 26d 10.0.0.60 k8s-worker-01 <none> <none>
kube-flannel-ds-amd64-c5rj2 1/1 Running 11 26d 10.0.0.50 k8s-master-01 <none> <none>
kube-flannel-ds-amd64-dsg6l 1/1 Running 11 26d 10.0.0.61 k8s-worker-02 <none> <none>
kube-proxy-mrz9v 1/1 Running 10 24d 10.0.0.50 k8s-master-01 <none> <none>
kube-proxy-slt95 1/1 Running 9 24d 10.0.0.61 k8s-worker-02 <none> <none>
kube-proxy-txlrp 1/1 Running 9 24d 10.0.0.60 k8s-worker-01 <none> <none>
kube-scheduler-k8s-master-01 1/1 Running 14 24d 10.0.0.50 k8s-master-01 <none> <none>
metrics-server-67684d476-mrvj2 1/1 Running 2 7d23h 10.244.2.43 k8s-worker-02 <none> <none>
And here are my services:
$ kubectl get services --all-namespaces -o wide
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 26d <none>
default phpdemo ClusterIP 10.96.52.157 <none> 80/TCP 11d app=phpdemo
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 26d k8s-app=kube-dns
kube-system metrics-server ClusterIP 10.96.71.138 <none> 443/TCP 5d3h k8s-app=metrics-server
kubernetes-dashboard dashboard-metrics-scraper ClusterIP 10.99.136.237 <none> 8000/TCP 23d k8s-app=dashboard-metrics-scraper
kubernetes-dashboard kubernetes-dashboard ClusterIP 10.97.209.113 <none> 443/TCP 23d k8s-app=kubernetes-dashboard
The metrics API doesn't work due to failed connection checks:
$ kubectl describe apiservice v1beta1.metrics.k8s.io
...
Status:
Conditions:
Last Transition Time: 2019-12-27T21:25:01Z
Message: failing or missing response from https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: Get https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Reason: FailedDiscoveryCheck
Status: False
Type:                  Available
kube-apiserver can't establish a connection:
$ kubectl logs --tail=20 kube-apiserver-k8s-master-01 -n kube-system
...
I0101 22:27:00.712413 1 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
W0101 22:27:00.712514 1 handler_proxy.go:97] no RequestInfo found in the context
E0101 22:27:00.712559 1 controller.go:114] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
, Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
I0101 22:27:00.712591 1 controller.go:127] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
E0101 22:27:04.712991 1 available_controller.go:419] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: Get https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E0101 22:27:09.714801 1 available_controller.go:419] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: Get https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E0101 22:27:34.709557 1 available_controller.go:419] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: Get https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E0101 22:27:39.714173 1 available_controller.go:419] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: Get https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I tried to figure out what happens on kube-apiserver and could finally confirm the problem: I get a delayed response after more than 60 seconds (unfortunately time is not installed in the container):
$ kubectl exec -it kube-apiserver-k8s-master-01 -n kube-system -- /bin/sh
# echo -e "GET /apis/metrics.k8s.io/v1beta1 HTTP/1.1\r\nHost:v1beta1.metrics.k8s.io\r\n" | openssl s_client -connect 10.96.71.138:443 -quiet
Can't use SSL_get_servername
depth=1 CN = localhost-ca@1577481905
verify error:num=19:self signed certificate in certificate chain
verify return:1
depth=1 CN = localhost-ca@1577481905
verify return:1
depth=0 CN = localhost@1577481906
verify return:1
HTTP/1.1 400 Bad Request
Content-Type: text/plain; charset=utf-8
Connection: close
The same command succeeds from two of my own test pods (running on two different worker nodes), so the service IPs are reachable from the pod network on the worker nodes:
$ kubectl exec -it phpdemo-55858f97c4-fjc6q -- /bin/sh
/usr/local/bin # echo -e "GET /apis/metrics.k8s.io/v1beta1 HTTP/1.1\r\nHost:v1beta1.metrics.k8s.io\r\n" | openssl s_client -connect 10.96.71.138:443 -quiet
Can't use SSL_get_servername
depth=1 CN = localhost-ca@1577481905
verify error:num=19:self signed certificate in certificate chain
verify return:1
depth=1 CN = localhost-ca@1577481905
verify return:1
depth=0 CN = localhost@1577481906
verify return:1
HTTP/1.1 403 Forbidden
Content-Type: application/json
X-Content-Type-Options: nosniff
Date: Wed, 01 Jan 2020 22:53:44 GMT
Content-Length: 212
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"forbidden: User \"system:anonymous\" cannot get path \"/apis/metrics.k8s.io/v1beta1\"","reason":"Forbidden","details":{},"code":403}
And it works from the worker node as well:
[root@k8s-worker-02 ~ ] time curl -k https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {
},
"status": "Failure",
"message": "forbidden: User \"system:anonymous\" cannot get path \"/apis/metrics.k8s.io/v1beta1\"",
"reason": "Forbidden",
"details": {
},
"code": 403
}
real 0m0.146s
user 0m0.048s
sys 0m0.089s
This doesn't work on my master node; I get a delayed response after more than 60 seconds:
[root@k8s-master-01 ~ ] time curl -k https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {
},
"status": "Failure",
"message": "forbidden: User \"system:anonymous\" cannot get path \"/apis/metrics.k8s.io/v1beta1\"",
"reason": "Forbidden",
"details": {
},
"code": 403
}
real 1m3.248s
user 0m0.061s
sys 0m0.079s
On the master node I can see lots of conntrack entries stuck in SYN_SENT and marked UNREPLIED:
[root@k8s-master-01 ~ ] conntrack -L -d 10.96.71.138
tcp 6 75 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48550 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=19813 mark=0 use=1
tcp 6 5 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48287 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=23710 mark=0 use=1
tcp 6 40 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48422 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=24286 mark=0 use=1
tcp 6 5 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48286 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=35030 mark=0 use=1
tcp 6 80 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48574 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=40636 mark=0 use=1
tcp 6 50 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48464 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=65512 mark=0 use=1
tcp 6 5 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48290 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=47617 mark=0 use=1
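Note that the source address in these entries is 10.0.2.15, not the node's private address 10.0.0.50, so the SYNs apparently leave via the wrong interface. One way to confirm whether they ever reach the flannel overlay is to watch the VXLAN interface while the curl above hangs (a diagnostic sketch):
# on k8s-master-01, while a curl against 10.96.71.138 is hanging:
tcpdump -ni flannel.1 tcp and port 4443
# no output here, while conntrack keeps showing SYN_SENT, means the
# DNATed packets are not being routed onto the overlay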
The iptables rules for the service are in place:
[root@k8s-master-01 ~ ] iptables-save | grep 10.96.71.138
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.71.138/32 -p tcp -m comment --comment "kube-system/metrics-server: cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.71.138/32 -p tcp -m comment --comment "kube-system/metrics-server: cluster IP" -m tcp --dport 443 -j KUBE-SVC-LC5QY66VUV2HJ6WZ
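The KUBE-SVC chain should in turn jump to a KUBE-SEP rule that DNATs to the metrics-server pod on port 4443 (matching the reply tuple in the conntrack output above). That can be followed the same way (a diagnostic sketch):
$ iptables-save -t nat | grep -E 'KUBE-SVC-LC5QY66VUV2HJ6WZ|KUBE-SEP'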
kube-proxy is up and running on each node without errors.
$ kubectl get pods -A -o wide
...
kube-system kube-proxy-mrz9v 1/1 Running 10 21d 10.0.0.50 k8s-master-01 <none> <none>
kube-system kube-proxy-slt95 1/1 Running 9 21d 10.0.0.61 k8s-worker-02 <none> <none>
kube-system kube-proxy-txlrp 1/1 Running 9 21d 10.0.0.60 k8s-worker-01 <none> <none>
$ kubectl -n kube-system logs kube-proxy-mrz9v
W0101 21:31:14.268698 1 server_others.go:323] Unknown proxy mode "", assuming iptables proxy
I0101 21:31:14.283958 1 node.go:135] Successfully retrieved node IP: 10.0.0.50
I0101 21:31:14.284034 1 server_others.go:145] Using iptables Proxier.
I0101 21:31:14.284624 1 server.go:571] Version: v1.17.0
I0101 21:31:14.286031 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0101 21:31:14.286093 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0101 21:31:14.287207 1 conntrack.go:83] Setting conntrack hashsize to 32768
I0101 21:31:14.298760 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0101 21:31:14.298984 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0101 21:31:14.300618 1 config.go:313] Starting service config controller
I0101 21:31:14.300665 1 shared_informer.go:197] Waiting for caches to sync for service config
I0101 21:31:14.300720 1 config.go:131] Starting endpoints config controller
I0101 21:31:14.300740 1 shared_informer.go:197] Waiting for caches to sync for endpoints config
I0101 21:31:14.400864 1 shared_informer.go:204] Caches are synced for service config
I0101 21:31:14.401021 1 shared_informer.go:204] Caches are synced for endpoints config
$ kubectl -n kube-system logs kube-proxy-slt95
W0101 21:31:13.856897 1 server_others.go:323] Unknown proxy mode "", assuming iptables proxy
I0101 21:31:13.905653 1 node.go:135] Successfully retrieved node IP: 10.0.0.61
I0101 21:31:13.905704 1 server_others.go:145] Using iptables Proxier.
I0101 21:31:13.906370 1 server.go:571] Version: v1.17.0
I0101 21:31:13.906983 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0101 21:31:13.907032 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0101 21:31:13.907413 1 conntrack.go:83] Setting conntrack hashsize to 32768
I0101 21:31:13.912221 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0101 21:31:13.912321 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0101 21:31:13.915322 1 config.go:313] Starting service config controller
I0101 21:31:13.915353 1 shared_informer.go:197] Waiting for caches to sync for service config
I0101 21:31:13.915755 1 config.go:131] Starting endpoints config controller
I0101 21:31:13.915779 1 shared_informer.go:197] Waiting for caches to sync for endpoints config
I0101 21:31:14.016995 1 shared_informer.go:204] Caches are synced for endpoints config
I0101 21:31:14.017115 1 shared_informer.go:204] Caches are synced for service config
$ kubectl -n kube-system logs kube-proxy-txlrp
W0101 21:31:13.552518 1 server_others.go:323] Unknown proxy mode "", assuming iptables proxy
I0101 21:31:13.696793 1 node.go:135] Successfully retrieved node IP: 10.0.0.60
I0101 21:31:13.696846 1 server_others.go:145] Using iptables Proxier.
I0101 21:31:13.697396 1 server.go:571] Version: v1.17.0
I0101 21:31:13.698000 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0101 21:31:13.698101 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0101 21:31:13.698509 1 conntrack.go:83] Setting conntrack hashsize to 32768
I0101 21:31:13.704280 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0101 21:31:13.704467 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0101 21:31:13.704888 1 config.go:131] Starting endpoints config controller
I0101 21:31:13.704935 1 shared_informer.go:197] Waiting for caches to sync for endpoints config
I0101 21:31:13.705046 1 config.go:313] Starting service config controller
I0101 21:31:13.705059 1 shared_informer.go:197] Waiting for caches to sync for service config
I0101 21:31:13.806299 1 shared_informer.go:204] Caches are synced for endpoints config
I0101 21:31:13.806430 1 shared_informer.go:204] Caches are synced for service config
Here are my (default) kube-proxy settings:
$ kubectl -n kube-system get configmap kube-proxy -o yaml
apiVersion: v1
data:
config.conf: |-
apiVersion: kubeproxy.config.k8s.io/v1alpha1
bindAddress: 0.0.0.0
clientConnection:
acceptContentTypes: ""
burst: 10
contentType: application/vnd.kubernetes.protobuf
kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
qps: 5
clusterCIDR: 10.244.0.0/16
configSyncPeriod: 15m0s
conntrack:
maxPerCore: 32768
min: 131072
tcpCloseWaitTimeout: 1h0m0s
tcpEstablishedTimeout: 24h0m0s
enableProfiling: false
healthzBindAddress: 0.0.0.0:10256
hostnameOverride: ""
iptables:
masqueradeAll: false
masqueradeBit: 14
minSyncPeriod: 0s
syncPeriod: 30s
ipvs:
excludeCIDRs: null
minSyncPeriod: 0s
scheduler: ""
strictARP: false
syncPeriod: 30s
kind: KubeProxyConfiguration
metricsBindAddress: 127.0.0.1:10249
mode: ""
nodePortAddresses: null
oomScoreAdj: -999
portRange: ""
udpIdleTimeout: 250ms
winkernel:
enableDSR: false
networkName: ""
sourceVip: ""
kubeconfig.conf: |-
apiVersion: v1
kind: Config
clusters:
- cluster:
certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
server: https://10.0.0.50:6443
name: default
contexts:
- context:
cluster: default
namespace: default
user: default
name: default
current-context: default
users:
- name: default
user:
tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
creationTimestamp: "2019-12-06T22:07:40Z"
labels:
app: kube-proxy
name: kube-proxy
namespace: kube-system
resourceVersion: "185"
selfLink: /api/v1/namespaces/kube-system/configmaps/kube-proxy
uid: bac4a8df-e318-4c91-a6ed-9305e58ac6d9
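Since mode is empty, kube-proxy falls back to the iptables proxier, as the logs above confirm. The effective mode can also be double-checked on each node via the metrics endpoint (a sketch, assuming the default metricsBindAddress from the config above):
[root@k8s-master-01 ~ ] curl http://127.0.0.1:10249/proxyMode
iptables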
$ kubectl -n kube-system get daemonset kube-proxy -o yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
annotations:
deprecated.daemonset.template.generation: "2"
creationTimestamp: "2019-12-06T22:07:40Z"
generation: 2
labels:
k8s-app: kube-proxy
name: kube-proxy
namespace: kube-system
resourceVersion: "115436"
selfLink: /apis/apps/v1/namespaces/kube-system/daemonsets/kube-proxy
uid: 64a53d29-1eaa-424f-9ebd-606bcdb3169c
spec:
revisionHistoryLimit: 10
selector:
matchLabels:
k8s-app: kube-proxy
template:
metadata:
creationTimestamp: null
labels:
k8s-app: kube-proxy
spec:
containers:
- command:
- /usr/local/bin/kube-proxy
- --config=/var/lib/kube-proxy/config.conf
- --hostname-override=$(NODE_NAME)
env:
- name: NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
image: k8s.gcr.io/kube-proxy:v1.17.0
imagePullPolicy: IfNotPresent
name: kube-proxy
resources: {}
securityContext:
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/kube-proxy
name: kube-proxy
- mountPath: /run/xtables.lock
name: xtables-lock
- mountPath: /lib/modules
name: lib-modules
readOnly: true
dnsPolicy: ClusterFirst
hostNetwork: true
nodeSelector:
beta.kubernetes.io/os: linux
priorityClassName: system-node-critical
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: kube-proxy
serviceAccountName: kube-proxy
terminationGracePeriodSeconds: 30
tolerations:
- key: CriticalAddonsOnly
operator: Exists
- operator: Exists
volumes:
- configMap:
defaultMode: 420
name: kube-proxy
name: kube-proxy
- hostPath:
path: /run/xtables.lock
type: FileOrCreate
name: xtables-lock
- hostPath:
path: /lib/modules
type: ""
name: lib-modules
updateStrategy:
rollingUpdate:
maxUnavailable: 1
type: RollingUpdate
status:
currentNumberScheduled: 3
desiredNumberScheduled: 3
numberAvailable: 3
numberMisscheduled: 0
numberReady: 3
observedGeneration: 2
updatedNumberScheduled: 3
Is this just the result of a misconfiguration, or is this a bug? Any help is appreciated.
This issue is related to flannel bugs with the vxlan backend that cause NAT rules and/or routing entries to be missing or incomplete:
https://github.com/coreos/flannel/issues/1243
https://github.com/coreos/flannel/issues/1245
As a workaround, setting up a static route to the service network via the cni0 interface on my nodes helped instantly:
ip route add 10.96.0.0/12 dev cni0
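Since cni0 is only created at runtime by the CNI plugin, the route does not survive a reboot; a one-shot systemd unit is one way to reapply it (a sketch, assuming the interface keeps the name cni0; the unit name is made up):
# /etc/systemd/system/service-route-cni0.service
[Unit]
Description=Static route to the service network via cni0
After=kubelet.service

[Service]
Type=oneshot
# wait for cni0 to appear, then (re)apply the route idempotently
ExecStart=/bin/sh -c 'until ip link show cni0 >/dev/null 2>&1; do sleep 2; done; ip route replace 10.96.0.0/12 dev cni0'
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target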
Here is what I did to make it work (a sketch of the resulting manifests follows below):
1. Set the --enable-aggregator-routing=true flag on the kube-apiserver.
2. Set the flags below in metrics-server-deployment.yaml:
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP
3. Set hostNetwork: true in metrics-server-deployment.yaml.
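For reference, the relevant parts of the two manifests then look roughly like this (a sketch; the image tag and all omitted fields are illustrative):
# /etc/kubernetes/manifests/kube-apiserver.yaml (step 1)
spec:
  containers:
  - command:
    - kube-apiserver
    - --enable-aggregator-routing=true
    # ... remaining flags unchanged ...

# metrics-server-deployment.yaml (steps 2 and 3)
spec:
  template:
    spec:
      hostNetwork: true
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.6  # illustrative tag
        args:
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP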