Env:
# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
km12-01 Ready master 26d v1.13.0 10.42.140.154 <none> Ubuntu 16.04.5 LTS 4.17.0-041700-generic docker://17.6.2
km12-02 Ready master 26d v1.13.0 10.42.104.113 <none> Ubuntu 16.04.5 LTS 4.17.0-041700-generic docker://17.6.2
km12-03 Ready master 26d v1.13.0 10.42.177.142 <none> Ubuntu 16.04.5 LTS 4.17.0-041700-generic docker://17.6.2
prod-k8s-node002 Ready node 25d v1.13.0 10.42.78.21 <none> Ubuntu 16.04.5 LTS 4.17.0-041700-generic docker://17.3.2
prod-k8s-tmpnode005 Ready node 24d v1.13.0 10.42.177.75 <none> Ubuntu 16.04.5 LTS 4.17.0-041700-generic docker://17.3.2
calico v3.3.1
What Happen:
I have a deployment, with 2 pods scheduled on prod-k8s-node002
and prod-k8s-tmpnode005
. Just like this:
# kubectl -n gitlab-prod get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
api-monkey-76489bd8c9-7gbkt 1/1 Running 0 35m 192.244.199.37 prod-k8s-node002 <none> <none>
api-monkey-76489bd8c9-w2zrs 1/1 Running 0 55m 192.244.124.240 prod-k8s-tmpnode005 <none> <none>
Now I curl each pod from a master node, say it is km12-01
:
# # wait 0ms, response a json string, which has a property named 'data', it is a string, which is '1' repeated by 1000 times
# curl -v '192.244.124.240:3000/health/test?ms=0&content=1&repeat=1000'
* Trying 192.244.124.240...
* Connected to 192.244.124.240 (192.244.124.240) port 3000 (#0)
> GET /health/test?ms=0&content=1&repeat=1000 HTTP/1.1
> Host: 192.244.124.240:3000
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 200 OK
< X-Powered-By: Express
< Access-Control-Allow-Origin: *
< Content-Type: application/json; charset=utf-8
< Content-Length: 1011
< ETag: W/"3f3-oUOV2TWikka+Y8l16Cqo/Q"
< Date: Sat, 08 Dec 2018 20:12:26 GMT
< Connection: keep-alive
<
* Connection #0 to host 192.244.124.240 left intact
{"data":"1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111"}
# curl -v '192.244.199.37:3000/health/test?ms=0&content=1&repeat=1000'
* Trying 192.244.199.37...
* Connected to 192.244.199.37 (192.244.199.37) port 3000 (#0)
> GET /health/test?ms=0&content=1&repeat=1000 HTTP/1.1
> Host: 192.244.199.37:3000
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 200 OK
< X-Powered-By: Express
< Access-Control-Allow-Origin: *
< Content-Type: application/json; charset=utf-8
< Content-Length: 1011
< ETag: W/"3f3-oUOV2TWikka+Y8l16Cqo/Q"
< Date: Sat, 08 Dec 2018 20:15:16 GMT
< Connection: keep-alive
<
* Connection #0 to host 192.244.199.37 left intact
{"data":"1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111"}
Good, both work.
So if I make a longer response body (2000 bytes), what will happen?
# curl -v '192.244.124.240:3000/health/test?ms=0&content=1&repeat=2000'
* Trying 192.244.124.240...
* Connected to 192.244.124.240 (192.244.124.240) port 3000 (#0)
> GET /health/test?ms=0&content=1&repeat=2000 HTTP/1.1
> Host: 192.244.124.240:3000
> User-Agent: curl/7.47.0
> Accept: */*
>
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer
Unfortunately, the connection was hanging on and then reset by peer after couple minutes. But it works on the host of the requested pod.
Curl a pod from its host:
# curl -v '192.244.124.240:3000/health/test?ms=0&content=1&repeat=2000'
* Trying 192.244.124.240...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Connected to 192.244.124.240 (192.244.124.240) port 3000 (#0)
> GET /health/test?ms=0&content=1&repeat=2000 HTTP/1.1
> Host: 192.244.124.240:3000
> User-Agent: curl/7.47.0
> Accept: */*
>
{"data":"11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111"}< HTTP/1.1 200 OK
< X-Powered-By: Express
< Access-Control-Allow-Origin: *
< Content-Type: application/json; charset=utf-8
< Content-Length: 2011
< ETag: W/"7db-ViIL+fXpsfh/YwmlqDHSsQ"
< Date: Sat, 08 Dec 2018 20:22:55 GMT
< Connection: keep-alive
<
{ [2011 bytes data]
100 2011 100 2011 0 0 440k 0 --:--:-- --:--:-- --:--:-- 490k
* Connection #0 to host 192.244.124.240 left intact
The other one is in the same situation.
Summary:
I tried many times, and found a interesting and strange thing: If I request a pod from any other node (except the host of the pod), the response body cannot be longer than 1140 bytes. Otherwise, the connection will hang on.
My problem:
How it happen? And how can I break this limitation?
This is the kubeadm-1.12.0 initialization file:
# cat kubeadm-init.yaml
apiVersion: kubeadm.k8s.io/v1alpha3
controlPlaneEndpoint: 10.42.79.210:6443
kind: ClusterConfiguration
kubernetesVersion: v1.12.0
networking:
podSubnet: 192.244.0.0/16
serviceSubnet: 192.96.0.0/16
apiServerCertSANs:
- 10.42.140.154
- 10.42.104.113
- 10.42.177.142
- km12-01
- km12-02
- km12-03
- 127.0.0.1
- localhost
- 10.42.79.210
etcd:
external:
endpoints:
- https://10.42.140.154:2379
- https://10.42.104.113:2379
- https://10.42.177.142:2379
caFile: /etc/kubernetes/ssl/ca.pem
certFile: /etc/etcd/ssl/etcd.pem
keyFile: /etc/etcd/ssl/etcd-key.pem
dataDir: /var/lib/etcd
clusterDNS:
- 192.96.0.2
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
We use an external etcd cluster, and the underlying network uses IPVS rather than iptables. We did an upgrade from 1.12.0 to 1.13.0.
# kubeadm upgrade apply 1.13.0
You can use this to test, and we're the ones that came up after the upgrade. @Adam Otto
There are some IPVS errors as follows: