I set up a 4-node Kubernetes cluster by following the guide found here: https://www.tecmint.com/install-a-kubernetes-cluster-on-centos-8/
It has one master and 3 worker nodes.
I'm running a deployment called "hello-world" based on the bashofmann/rancher-demo image, with 20 replicas. I've also created a NodePort service called hello-world that maps NodePort 30213 to port 8080 on each respective pod.
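For reference, I created the Deployment and Service with something roughly equivalent to the manifests below (reconstructed from the details above; the app: hello-world labels are just what I'd expect and may not match the cluster exactly):

# kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-world
spec:
  replicas: 20
  selector:
    matchLabels:
      app: hello-world        # assumed label, adjust if different
  template:
    metadata:
      labels:
        app: hello-world
    spec:
      containers:
      - name: hello-world
        image: bashofmann/rancher-demo
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: hello-world
spec:
  type: NodePort
  selector:
    app: hello-world           # assumed label, adjust if different
  ports:
  - port: 8080
    targetPort: 8080
    nodePort: 30213
EOF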
See below for the basic details:
# kubectl get all
NAME                               READY   STATUS    RESTARTS   AGE
pod/hello-world-655b948488-22dq4   1/1     Running   0          112m
pod/hello-world-655b948488-2fd7f   1/1     Running   0          112m
pod/hello-world-655b948488-2hrtw   1/1     Running   0          112m
pod/hello-world-655b948488-5h4ns   1/1     Running   0          112m
pod/hello-world-655b948488-5zg9w   1/1     Running   0          112m
pod/hello-world-655b948488-7kcsp   1/1     Running   0          112m
pod/hello-world-655b948488-c5m67   1/1     Running   0          112m
pod/hello-world-655b948488-dswcv   1/1     Running   0          112m
pod/hello-world-655b948488-fbtx6   1/1     Running   0          112m
pod/hello-world-655b948488-g7bxp   1/1     Running   0          112m
pod/hello-world-655b948488-gfb4v   1/1     Running   0          112m
pod/hello-world-655b948488-j6lz9   1/1     Running   0          112m
pod/hello-world-655b948488-jthnq   1/1     Running   0          112m
pod/hello-world-655b948488-pm5b8   1/1     Running   0          112m
pod/hello-world-655b948488-qt7gs   1/1     Running   0          112m
pod/hello-world-655b948488-s2hjv   1/1     Running   0          112m
pod/hello-world-655b948488-vcjzz   1/1     Running   0          112m
pod/hello-world-655b948488-vprgn   1/1     Running   0          112m
pod/hello-world-655b948488-x4b9n   1/1     Running   0          112m
pod/hello-world-655b948488-ztfh7   1/1     Running   0          112m

NAME                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
service/hello-world    NodePort    10.110.212.243   <none>        8080:30213/TCP   114m
service/kubernetes     ClusterIP   10.96.0.1        <none>        443/TCP          2d2h

NAME                          READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/hello-world   20/20   20           20          112m

NAME                                     DESIRED   CURRENT   READY   AGE
replicaset.apps/hello-world-655b948488   20        20        20      112m
# kubectl get nodes -o wide
NAME          STATUS   ROLES                  AGE    VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION                 CONTAINER-RUNTIME
k8s-master    Ready    control-plane,master   2d2h   v1.21.3   192.168.188.190   <none>        CentOS Linux 8   4.18.0-305.10.2.el8_4.x86_64   docker://20.10.7
k8s-worker1   Ready    <none>                 2d2h   v1.21.3   192.168.188.191   <none>        CentOS Linux 8   4.18.0-305.10.2.el8_4.x86_64   docker://20.10.7
k8s-worker2   Ready    <none>                 2d2h   v1.21.3   192.168.188.192   <none>        CentOS Linux 8   4.18.0-305.10.2.el8_4.x86_64   docker://20.10.7
k8s-worker3   Ready    <none>                 2d2h   v1.21.3   192.168.188.193   <none>        CentOS Linux 8   4.18.0-305.10.2.el8_4.x86_64   docker://20.10.7
I've discovered that the cluster is not load balancing across the three worker nodes. If I open my web browser and go to http://192.168.188.191:30213, the website loads, but only from pods on k8s-worker1. Likewise, if I go to http://192.168.188.192:30213, the website loads, but only from pods on k8s-worker2.
The neat thing about this particular image is that it displays which pod is serving the request at any given time; the page refreshes itself and cycles through the available pods in the cluster. I can see that whenever the page successfully refreshes, it is only ever served by a pod on k8s-worker1. It also displays how many replicas it can see. I should be seeing 20, but I only ever see at most 8.
It never loads the website from any pods on the other worker nodes. From k8s-master I can run "curl --insecure http://192.168.188.191:30213" and get a response about 33% of the time; the rest of the time it fails. I believe this is because the service is trying to load-balance the request to pods on the other worker nodes, and those requests fail.
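To sanity-check how the 20 replicas are actually spread across the workers (and whether only 8 of them really sit on k8s-worker1), something like the following should show the distribution (the app=hello-world selector and the NODE column position assume the default labels and -o wide output layout):

# kubectl get pods -l app=hello-world -o wide
# kubectl get pods -l app=hello-world -o wide --no-headers | awk '{print $7}' | sort | uniq -c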
Since I'm still pretty new to this stuff, I'm not sure what to look at. Is it possible there's something wrong with the ReplicaSet?
Each worker node has the following firewall rules opened up:
# firewall-cmd --list-ports
6443/tcp 2379-2380/tcp 10250/tcp 10251/tcp 10252/tcp 10255/tcp 6783/tcp 6783/udp 6784/udp 443/tcp
Are there more ports I need to open up? Am I supposed to open up the whole 30000-32767 range? That seems like a potential security risk.
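For reference, if opening the whole NodePort range (or just 30213) turns out to be the right fix, I assume the firewalld commands on each worker would look something like this; I haven't applied them yet:

# firewall-cmd --permanent --add-port=30000-32767/tcp   # entire default NodePort range
# firewall-cmd --permanent --add-port=30213/tcp         # or just this service's NodePort
# firewall-cmd --reload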
The behaviour you are seeing is probably caused by the type of Service you created to load-balance your pods: you used type NodePort.
According to the official Kubernetes documentation, a NodePort Service exposes the Service on each node's IP at a static port (in your case 30213). In this scenario, when you make a request to a particular node, you consistently see that node and only the pods scheduled on it, which in your example is 8 (note that this number may vary depending on how the pods are distributed).
If you want to load-balance across all of your pods, you should use either a Service of type LoadBalancer, or a ClusterIP Service plus an Ingress.
Please note that both of the options I mentioned require the cluster to be able to obtain an external IP that is load-balanced "outside" of Kubernetes' scope. If you are using a managed Kubernetes offering (e.g. GKE, EKS, AKS), you get this "for free". Since you are running a custom cluster installation, you can have a look at the MetalLB project for the load-balancer part.
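As a rough sketch, assuming MetalLB (or similar) is installed and has an address pool configured, your existing Service could then be switched to type LoadBalancer, reusing your names and ports:

# kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: hello-world
spec:
  type: LoadBalancer
  selector:
    app: hello-world    # adjust to match your Deployment's pod labels
  ports:
  - port: 8080
    targetPort: 8080
EOF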