Pods cannot communicate with each other

9/10/2019

I have two jobs that will each run only once. One is called Master and one is called Slave. As the names imply, the Master pod needs some info from the Slave and then queries some API online. A simple scheme of how they communicate can be drawn like this:

Slave --- port 6666 ---> Master ---- port 8888 ---> internet:www.example.com

To achieve this I created 5 YAML files:

  1. A job-master.yaml for creating the Master pod:
apiVersion: batch/v1
kind: Job
metadata:
  name: master-job
  labels:
    app: master-job
    role: master-job
spec:
  template:
    metadata:
      name: master
    spec:
      containers:
      - name: master
        image: registry.gitlab.com/example
        command: ["python", "run.py", "-wait"]
        ports:
        - containerPort: 6666

      imagePullSecrets:
      - name: regcred
      restartPolicy: Never
  2. A service (ClusterIP) that allows the Slave to send info to the Master pod on port 6666:
apiVersion: v1
kind: Service
metadata:
  name: master-service
  labels:
    app: master-job
    role: master-job
spec:
  selector:
    app: master-job
    role: master-job
  ports:
    - protocol: TCP
      port: 6666
      targetPort: 6666
  3. A service (NodePort) that should allow the Master to fetch info online:
apiVersion: v1
kind: Service
metadata:
  name: master-np-service
spec:
  type: NodePort
  selector:
    app: master-job
  ports:
    - protocol: TCP
      port: 8888
      targetPort: 8888
      nodePort: 31000
  4. A job for the Slave pod:
apiVersion: batch/v1
kind: Job
metadata:
  name: slave-job
  labels:
    app: slave-job
spec:
  template:
    metadata:
      name: slave
    spec:
      containers:
      - name: slave
        image: registry.gitlab.com/example2
        ports:
        - containerPort: 6666
        #command: ["python", "run.py", "master-service.default.svc.cluster.local"]
        #command: ["python", "run.py", "10.106.146.155"]
        command: ["python", "run.py", "master-service"]
      imagePullSecrets:
      - name: regcred
      restartPolicy: Never
  5. And a service (ClusterIP) that allows the Slave pod to send the info to the Master pod:
apiVersion: v1
kind: Service
metadata:
  name: slave-service
spec:
  selector:
    app: slave-job
  ports:
    - protocol: TCP
      port: 6666
      targetPort: 6666

But no matter what I do (as can be seen in the commented lines of job_slave.yaml), the pods cannot communicate with each other, except when I hard-code the Master's IP in the command section of the Slave. The Master pod also cannot communicate with the outside world, even though I created a configMap with upstreamNameservers: | ["8.8.8.8"]. Everything is running in a minikube environment, but I cannot pinpoint what my problem is. Any help is appreciated.
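
For reference, the ConfigMap I applied looks roughly like this (the kube-dns style from the Kubernetes docs; I'm not certain it even takes effect if the cluster is running CoreDNS rather than kube-dns):

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  upstreamNameservers: |
    ["8.8.8.8"]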

-- Andi Domi
docker
kubernetes
minikube

2 Answers

9/10/2019

Your Job spec has two parts: a description of the Job itself, and a description of the Pods it creates. (Using a Job here is a little odd and I'd probably pick a Deployment instead, but the same logic applies.) The Service object's selector: has to match the labels: of the Pods, not of the Job.

In the YAML files you show, the Jobs have the right labels but the generated Pods don't. You need to add the (possibly duplicate) labels to the pod-spec part:

apiVersion: batch/v1
kind: Job
metadata:
  name: master-job
  labels: {...}
spec:
  template:
    metadata:
      # name: will get ignored here
      labels:
        app: master-job
        role: master-job

You should be able to verify this with kubectl describe service master-service. At the end of its output is a line that says Endpoints:. If the Service's selector and the Pod labels don't match, this will say <none>; if they do match, you will see the Pod IP addresses.
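
With matching labels, the relevant part of that output looks roughly like this (addresses illustrative):

Name:              master-service
Namespace:         default
Selector:          app=master-job,role=master-job
Type:              ClusterIP
IP:                10.106.146.155
Port:              <unset>  6666/TCP
TargetPort:        6666/TCP
Endpoints:         172.17.0.4:6666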

(You don't need a NodePort service unless you need to accept requests from outside the cluster, and even then it could be the same Service you use to accept requests from within the cluster; a sketch follows below. You also don't need to include objects' types in their names. Nothing you've shown has any obvious relevance to communication out of the cluster.)
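
If you actually did need to receive traffic from outside the cluster, that single combined Service could look something like this (reusing the names and port from your question; a NodePort Service still gets a cluster IP, so in-cluster clients can keep using master-service:6666):

apiVersion: v1
kind: Service
metadata:
  name: master-service
spec:
  type: NodePort
  selector:
    app: master-job
    role: master-job
  ports:
    - protocol: TCP
      port: 6666
      targetPort: 6666
      nodePort: 31000    # optional; auto-assigned if omitted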

-- David Maze
Source: StackOverflow

9/10/2019

Try with a headless service:

apiVersion: v1
kind: Service
metadata:
  name: master-service
  labels:
    app: master-job
    role: master-job
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: master-job
    role: master-job
  ports:
    - protocol: TCP
      port: 6666
      targetPort: 6666

and use command: ["python", "run.py", "master-service"] in your job_slave.yaml.

Make sure your master job is listening on port 6666 inside your container.
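
If you want to double-check that, and since the image already runs Python, one option is to exec into the running master pod (the pod name below is a placeholder; take the real one from kubectl get pods) and try a local connection:

kubectl exec <master-pod-name> -- python -c "import socket; socket.create_connection(('127.0.0.1', 6666), timeout=2); print('port 6666 is open')"

If nothing is listening, the one-liner fails with a connection error instead of printing the message.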

-- FL3SH
Source: StackOverflow