I created a K8s cluster of 5 VMs (1 master and 4 slaves, all running Ubuntu 16.04.3 LTS) using kubeadm. I used flannel to set up networking in the cluster. I was able to successfully deploy an application and then exposed it via a NodePort service. From here things got complicated for me.
Before I started, I disabled the default firewalld service on the master and the nodes.
As I understand from the K8s Services doc, the type NodePort exposes the service on all nodes in the cluster. However, when I created it, the service was exposed on only 2 of the 4 nodes in the cluster. I am guessing that's not the expected behavior (right?)
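For reference, the service itself is unremarkable; a minimal sketch of how it could have been created with kubectl expose (the actual command isn't shown here, and the nodePort 30847 would normally be auto-assigned from the 30000-32767 range rather than requested):

# Sketch: expose an existing deployment as a NodePort service.
# Assumes a deployment named springboot-helloworld in namespace playground;
# --port matches the service port 9000 seen in the specs below.
kubectl expose deployment springboot-helloworld \
    --name=sb-hw-svc --type=NodePort --port=9000 -n playground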
For troubleshooting, here are some resource specs:
root@vm-vivekse-003:~# kubectl get nodes
NAME              STATUS    AGE       VERSION
vm-deepejai-00b   Ready     5m        v1.7.3
vm-plashkar-006   Ready     4d        v1.7.3
vm-rosnthom-00f   Ready     4d        v1.7.3
vm-vivekse-003    Ready     4d        v1.7.3    // the master
vm-vivekse-004    Ready     16h       v1.7.3
root@vm-vivekse-003:~# kubectl get pods -o wide -n playground
NAME                                     READY     STATUS    RESTARTS   AGE       IP           NODE
kubernetes-bootcamp-2457653786-9qk80     1/1       Running   0          2d        10.244.3.6   vm-rosnthom-00f
springboot-helloworld-2842952983-rw0gc   1/1       Running   0          1d        10.244.3.7   vm-rosnthom-00f
root@vm-vivekse-003:~# kubectl get svc -o wide -n playground
NAME        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE       SELECTOR
sb-hw-svc   10.101.180.19   <nodes>       9000:30847/TCP   5h        run=springboot-helloworld
root@vm-vivekse-003:~# kubectl describe svc sb-hw-svc -n playground
Name:              sb-hw-svc
Namespace:         playground
Labels:            <none>
Annotations:       <none>
Selector:          run=springboot-helloworld
Type:              NodePort
IP:                10.101.180.19
Port:              <unset>  9000/TCP
NodePort:          <unset>  30847/TCP
Endpoints:         10.244.3.7:9000
Session Affinity:  None
Events:            <none>
root@vm-vivekse-003:~# kubectl get endpoints sb-hw-svc -n playground -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  creationTimestamp: 2017-08-09T06:28:06Z
  name: sb-hw-svc
  namespace: playground
  resourceVersion: "588958"
  selfLink: /api/v1/namespaces/playground/endpoints/sb-hw-svc
  uid: e76d9cc1-7ccb-11e7-bc6a-fa163efaba6b
subsets:
- addresses:
  - ip: 10.244.3.7
    nodeName: vm-rosnthom-00f
    targetRef:
      kind: Pod
      name: springboot-helloworld-2842952983-rw0gc
      namespace: playground
      resourceVersion: "473859"
      uid: 16d9db68-7c1a-11e7-bc6a-fa163efaba6b
  ports:
  - port: 9000
    protocol: TCP
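Since the endpoints object looks correct, one more thing worth checking alongside these specs is whether kube-proxy (the component that programs the NodePort and ClusterIP rules on every host) is healthy on each node. A sketch; kube-proxy-xxxxx is a hypothetical pod name to substitute with a real one:

# List the kube-proxy pods and the nodes they run on
kubectl get pods -n kube-system -o wide | grep kube-proxy

# Inspect the logs of the instance on a faulty node
kubectl logs -n kube-system kube-proxy-xxxxx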
After some tinkering I realized that on those 2 "faulty" nodes, the service was not reachable even from within those hosts themselves.
Node01 (working):
root@vm-vivekse-004:~# curl 127.0.0.1:30847 //<localhost>:<nodeport>
Hello Docker World!!
root@vm-vivekse-004:~# curl 10.101.180.19:9000 //<cluster-ip>:<port>
Hello Docker World!!
root@vm-vivekse-004:~# curl 10.244.3.7:9000 //<pod-ip>:<port>
Hello Docker World!!
Node02 (working):
root@vm-rosnthom-00f:~# curl 127.0.0.1:30847
Hello Docker World!!
root@vm-rosnthom-00f:~# curl 10.101.180.19:9000
Hello Docker World!!
root@vm-rosnthom-00f:~# curl 10.244.3.7:9000
Hello Docker World!!
Node03 (not working):
root@vm-plashkar-006:~# curl 127.0.0.1:30847
curl: (7) Failed to connect to 127.0.0.1 port 30847: Connection timed out
root@vm-plashkar-006:~# curl 10.101.180.19:9000
curl: (7) Failed to connect to 10.101.180.19 port 9000: Connection timed out
root@vm-plashkar-006:~# curl 10.244.3.7:9000
curl: (7) Failed to connect to 10.244.3.7 port 9000: Connection timed out
Node04 (not working):
root@vm-deepejai-00b:/# curl 127.0.0.1:30847
curl: (7) Failed to connect to 127.0.0.1 port 30847: Connection timed out
root@vm-deepejai-00b:/# curl 10.101.180.19:9000
curl: (7) Failed to connect to 10.101.180.19 port 9000: Connection timed out
root@vm-deepejai-00b:/# curl 10.244.3.7:9000
curl: (7) Failed to connect to 10.244.3.7 port 9000: Connection timed out
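To reproduce the whole matrix quickly from a single host, a loop like this may help (assuming the node hostnames resolve, which the curl vm-rosnthom-00f:30847 test further below suggests they do):

# Probe the NodePort on every slave with a short timeout
for host in vm-vivekse-004 vm-rosnthom-00f vm-plashkar-006 vm-deepejai-00b; do
    echo -n "$host: "
    curl --max-time 5 -s http://$host:30847 || echo "unreachable"
done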
I tried netstat and telnet on all 4 slaves. Here's the output:
Node01 (the working host):
root@vm-vivekse-004:~# netstat -tulpn | grep 30847
tcp6 0 0 :::30847 :::* LISTEN 27808/kube-proxy
root@vm-vivekse-004:~# telnet 127.0.0.1 30847
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Node02 (the working host):
root@vm-rosnthom-00f:~# netstat -tulpn | grep 30847
tcp6 0 0 :::30847 :::* LISTEN 11842/kube-proxy
root@vm-rosnthom-00f:~# telnet 127.0.0.1 30847
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Node03 (the not-working host):
root@vm-plashkar-006:~# netstat -tulpn | grep 30847
tcp6 0 0 :::30847 :::* LISTEN 7791/kube-proxy
root@vm-plashkar-006:~# telnet 127.0.0.1 30847
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection timed out
Node04 (the not-working host):
root@vm-deepejai-00b:/# netstat -tulpn | grep 30847
tcp6 0 0 :::30847 :::* LISTEN 689/kube-proxy
root@vm-deepejai-00b:/# telnet 127.0.0.1 30847
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection timed out
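So kube-proxy is listening on the port on all four hosts, yet connections still time out on two of them. That points at the iptables NAT rules kube-proxy programs rather than at the listener itself. One way to compare a faulty node against a working one (a sketch; the service name should appear in the comments kube-proxy attaches to its rules):

# Dump the NAT rules programmed for this NodePort
iptables-save -t nat | grep 30847

# And the chains the service traffic flows through
iptables-save -t nat | grep sb-hw-svc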
Additional info:
From the kubectl get pods output, I can see that the pod is actually deployed on slave vm-rosnthom-00f. I am able to ping this host from all 5 VMs, and curl vm-rosnthom-00f:30847 also works from all the VMs.
I can clearly see that the internal cluster networking is messed up, but I am unsure how to resolve it! The iptables -L output is identical on all the slaves, and even the local loopback (ifconfig lo) is up and running on all of them. I'm completely clueless as to how to fix it!
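For anyone narrowing down the same symptom: since the pod IP (10.244.3.7) is also unreachable from the faulty nodes, the flannel overlay itself is suspect. A couple of checks that may help (the flannel.1 interface name assumes flannel's default vxlan backend):

# Verify a flannel pod is running on every node
kubectl get pods -n kube-system -o wide | grep flannel

# On each node, check for routes to the 10.244.0.0/16 pod subnets
ip route | grep 10.244

# And that the flannel interface is up
ip addr show flannel.1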
If you want to reach the service from any node in the cluster you need to define the service type as ClusterIP. Since you defined the service type as NodePort, you can connect only from the node where the service is running.

My above answer was not correct. Based on the documentation we should be able to connect from any NodeIP:NodePort, but it's not working in my cluster either.
https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services---service-types
NodePort: Exposes the service on each Node's IP at a static port (the NodePort). A ClusterIP service, to which the NodePort service will route, is automatically created. You'll be able to contact the NodePort service, from outside the cluster, by requesting <NodeIP>:<NodePort>.
On one of my nodes IP forwarding was not enabled. After setting it, I was able to connect to my service using NodeIP:NodePort:
sysctl -w net.ipv4.ip_forward=1
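To verify the setting on each node and make the fix survive reboots, something like this should work on Ubuntu 16.04:

# Check whether forwarding is enabled (should print 1)
sysctl net.ipv4.ip_forward

# Enable it immediately
sysctl -w net.ipv4.ip_forward=1

# Persist across reboots
echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf
sysctl -p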