In order to access the Kubernetes dashboard you have to run kubectl proxy on your local machine, then point your web browser to the proxy. Similarly, if you want to submit a Spark job you again run kubectl proxy on your local machine then run spark-submit against the localhost address.
My question is: why does Kubernetes have this peculiar arrangement? The dashboard service is running on the Kubernetes cluster, so why am I not pointing my web browser at the cluster directly? Why have a proxy? In some cases the need for a proxy is inconvenient. For example, I want to submit a Spark job from my web server. I can't do that directly: I have to run a proxy first, and that ties me to a specific cluster, even though I may have many Kubernetes clusters.
Why was Kubernetes designed such that you can only access it through a proxy?
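You can access your application in the cluster in different ways:
1. Through the API server acting as a proxy. Requests have to pass the API server's authentication and authorization stage.
2. Through hostPort. The container is exposed at hostIP:hostPort, where hostIP is the IP address of the Kubernetes node where the container is running and hostPort is the port requested by the user.
3. Through a Service of type NodePort. The Service gets a port in the range 30000-32767. All cluster nodes listen on that port and forward all traffic to the corresponding Service.
4. Through a Service of type LoadBalancer. The external load balancer sends traffic to NodeIP:NodePort for that service.
So, basically: [[[ Kubernetes Service type:ClusterIP ] + NodePort ] + LoadBalancer ]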
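The nesting in that last line can be read as "each Service type includes what the previous one provides". A small illustrative sketch of that idea follows; the names SERVICE_LAYERS, ENTRY_POINTS and entry_points are made up for this example and are not part of any Kubernetes API, and the layering shown is the default behaviour.

```python
# Illustrative only: these names are hypothetical, not a Kubernetes API.
# By default each Service type builds on the one before it.
SERVICE_LAYERS = ["ClusterIP", "NodePort", "LoadBalancer"]

ENTRY_POINTS = {
    "ClusterIP": "cluster-internal virtual IP",
    "NodePort": "NodeIP:NodePort on every node",
    "LoadBalancer": "external load balancer address",
}


def entry_points(service_type):
    """Return every entry point a Service of the given type ends up with."""
    depth = SERVICE_LAYERS.index(service_type)
    return [ENTRY_POINTS[t] for t in SERVICE_LAYERS[: depth + 1]]


if __name__ == "__main__":
    for service_type in SERVICE_LAYERS:
        print(service_type, "->", entry_points(service_type))
```

So a LoadBalancer Service is still reachable through its node port and its cluster IP; the load balancer is just one more layer in front.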
Now, about kubectl proxy. It uses the first way, going through the API server, to connect to the cluster. It reads the cluster configuration in .kube/config and uses the credentials from there to pass the API server's authentication and authorization stage. It then creates a communication channel from the local machine to the API server, so you can use a local port to send requests to the Kubernetes cluster API without having to specify credentials for each request.
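To make that concrete, here is a minimal sketch of a client talking to the API through the proxy. It assumes kubectl proxy is already running on its default address, 127.0.0.1:8001; the helper names api_url and list_namespaces are made up for this example. Note that the client sends no credentials at all, because the proxy adds them.

```python
import json
import urllib.request

# Assumption: `kubectl proxy` is running locally on its default port, 8001.
PROXY_BASE = "http://127.0.0.1:8001"


def api_url(path, base=PROXY_BASE):
    """Join the proxy address and a Kubernetes API path."""
    return base.rstrip("/") + "/" + path.lstrip("/")


def list_namespaces(base=PROXY_BASE):
    """Fetch namespace names through the proxy; no credentials are sent."""
    with urllib.request.urlopen(api_url("/api/v1/namespaces", base)) as resp:
        body = json.load(resp)
    return [item["metadata"]["name"] for item in body["items"]]


if __name__ == "__main__":
    # Only works while the proxy is up and your kubeconfig is valid.
    print(list_namespaces())
```

If the proxy is not running, the call simply fails with a connection error, which is a quick way to see that everything goes through that local channel.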
Why was Kubernetes designed such that you can only access it through a proxy?
In simple terms, for security and convenience.
A cluster is isolated by default. This reduces the burden on administrators, who would otherwise have to think about every security breach that exposed services might open.
The proxy provides a secure connection between the cluster (API server) and the client. This saves you from changing all your applications to implement security logic just to communicate with the cluster: you authenticate once, and every application uses this secure connection without any changes.
As your examples show, you didn't have to authenticate to the cluster yourself. kubectl did the work for you, and every API server call has the security set up on your behalf.
The cluster can also be accessed without the proxy. The catch is that you have to configure access manually and authenticate the app to the server, and you lose the convenience of doing it all with a single command.
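As a rough sketch of what "without the proxy" can look like, the snippet below calls the API server directly with a bearer token, which is one of the authentication methods the API server accepts. The server URL, token value and CA file path are placeholders you would replace with your cluster's own values, and the helper names build_request and list_namespaces are made up for this example.

```python
import json
import ssl
import urllib.request


def build_request(server, token, path):
    """Build an authenticated request for one specific cluster."""
    req = urllib.request.Request(server.rstrip("/") + "/" + path.lstrip("/"))
    # The credential the proxy would otherwise attach for you.
    req.add_header("Authorization", "Bearer " + token)
    return req


def list_namespaces(server, token, ca_file):
    """Call the API server directly, verifying it against the cluster CA."""
    context = ssl.create_default_context(cafile=ca_file)
    req = build_request(server, token, "/api/v1/namespaces")
    with urllib.request.urlopen(req, context=context) as resp:
        body = json.load(resp)
    return [item["metadata"]["name"] for item in body["items"]]


if __name__ == "__main__":
    # Placeholder values: substitute your own cluster's address, token and CA.
    print(list_namespaces("https://cluster-a.example.com:6443",
                          "REPLACE_WITH_TOKEN",
                          "/path/to/ca.crt"))
```

Because the server address and token are plain parameters, a web server could keep one set per cluster and pick the right one per request, which addresses the "many clusters" case from the question. The cost is that you now manage those credentials yourself.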