I am trying to set up a single-node Hadoop cluster on Kubernetes. The odd thing is that when I log into the pod via kubectl exec -it <pod> /bin/bash,
I can happily access e.g. the name node on port 9000.
root@hadoop-5dcf94b54d-7fgfq:/hadoop/hadoop-2.8.5# telnet localhost 9000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
I can also bin/hdfs dfs -put
files and such. I can also access the UI via kubectl port-forward <podname> 50070:50070
and see a data node up and running. So the cluster (the setup is 'pseudo-distributed' as described here) seems to be working fine.
However, when I want to access my service via Kubernetes DNS, I get a Connection refused.
telnet hadoop.aca534.svc.cluster.local 9000
Trying 10.32.89.21...
telnet: Unable to connect to remote host: Connection refused
What is the difference when accessing a port via k8s-dns?
The port must be open; I can also see that the Hadoop name node is listening on 9000.
lsof -i :9000
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 2518 root 227u IPv4 144574393 0t0 TCP localhost:9000 (LISTEN)
java 2518 root 237u IPv4 144586825 0t0 TCP localhost:9000->localhost:58480 (ESTABLISHED)
java 2660 root 384u IPv4 144584032 0t0 TCP localhost:58480->localhost:9000 (ESTABLISHED)
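One detail worth noting in the lsof output above: the listener shows up as localhost:9000, not *:9000. A small stand-alone sketch (hypothetical, not part of the cluster setup) of why the bind address matters for reachability:

```python
import socket

# A server that binds to 127.0.0.1 is only reachable over the loopback
# interface; traffic arriving on any other interface of the same host is
# refused, even though the port is "open" locally.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # loopback only; port 0 = any free port
server.listen(1)
port = server.getsockname()[1]

# Connecting via loopback succeeds (this is what `telnet localhost 9000`
# exercised inside the pod):
conn = socket.create_connection(("127.0.0.1", port), timeout=2)
conn.close()

# Connecting to the same port via a non-loopback address of this host
# would raise ConnectionRefusedError, because nothing listens there.
server.close()
```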
For complete reference, here is my Kubernetes service and deployment specification.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    service: hadoop
  name: hadoop
spec:
  selector:
    matchLabels:
      service: hadoop
  replicas: 1
  template:
    metadata:
      labels:
        service: hadoop
        run: hadoop
        track: stable
    spec:
      containers:
        - name: hadoop
          image: falcowinkler/hadoop:2.8.5
          imagePullPolicy: Never
          ports:
            # HDFS Ports
            - containerPort: 50010
            - containerPort: 50020
            - containerPort: 50070
            - containerPort: 50075
            - containerPort: 50090
            - containerPort: 8020
            - containerPort: 9000
            # Map Reduce Ports
            - containerPort: 19888
            # YARN Ports
            - containerPort: 8030
            - containerPort: 8031
            - containerPort: 8032
            - containerPort: 8033
            - containerPort: 8040
            - containerPort: 8042
            - containerPort: 8088
            - containerPort: 22
            # Other Ports
            - containerPort: 49707
            - containerPort: 2122
---
apiVersion: v1
kind: Service
metadata:
  labels:
    service: hadoop
  name: hadoop
spec:
  ports:
    - name: hadoop
      port: 9000
    - name: ssh
      port: 22
    - name: hadoop-ui
      port: 50070
  selector:
    service: hadoop
  type: ClusterIP
What is the difference when accessing a port via k8s-dns?
When you connect to a Pod IP address, you connect directly to the pod, not to the service.
When you use the DNS name of your service, it resolves to a Service IP address, which forwards your request to the actual pods, using the selector as a filter to find a destination. So these are two different ways of reaching pods.
You can also call the Service IP address directly instead of using DNS; it works the same way. Moreover, the Service IP address, unlike Pod IPs, is static, so you can rely on it all the time if you want.
For in-cluster communication you are using the ClusterIP service type, which is the default (and which you set explicitly), so everything is OK there.
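To make the telnet check from the question scriptable, here is a minimal sketch (assuming Python is available inside a pod; the host name and port are the ones from the question):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Both DNS resolution failures (socket.gaierror) and refused/timed-out
    connections are subclasses of OSError, so one except clause covers them.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Inside the cluster you would call, e.g.:
# can_connect("hadoop.aca534.svc.cluster.local", 9000)
```

This mirrors exactly what the failing telnet command does: resolve the service name via cluster DNS, then attempt a TCP connection to the resolved Service IP.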
You can see the endpoints your service currently forwards requests to with kubectl get endpoints $servicename
(or in the "Endpoints" field of kubectl describe service $servicename).
As for your current connection problem, I can recommend that you:
Check the endpoints of your service (there should be one or more pod IP addresses),
Set the targetPort
parameter for each of the service ports, e.g.:
apiVersion: v1
kind: Service
metadata:
  labels:
    service: hadoop
  name: hadoop
spec:
  ports:
    - name: hadoop
      port: 9000
      targetPort: 9000 # here
    - name: ssh
      port: 22
      targetPort: 22 # here
    - name: hadoop-ui
      port: 50070
      targetPort: 50070 # here
  selector:
    service: hadoop
  type: ClusterIP
P.S. Here is a nice topic with an explanation of how Services work. You can also check the official documentation.