Kubernetes pod distribution

3/1/2019

I've worked quite a lot with Docker in the past years, but I'm a newbie when it comes to Kubernetes. I'm starting today and I'm struggling to see the usefulness of the Pod concept compared with the way I used to do things with Docker Swarm.

Let's say that I have a cluster with 7 powerful machines and I have the following stack:

  • I want three Cassandra replicas each running in a dedicated machine (3/7)
  • I want two Kafka replicas each running in a dedicated machine (5/7)
  • I want a MyProducer replica running on its own machine, receiving messages from the web and pushing them into Kafka (6/7)
  • I want three MyConsumer replicas all running on the last machine (7/7), which pull from Kafka and insert into Cassandra.

With Docker Swarm I used to handle container distribution with node labels, e.g. I would label three machines and the Cassandra container configuration as C_HOST, two machines and the Kafka configuration as K_HOST, and so on. The Swarm deployment would then place each container correctly.
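For reference, the Swarm setup described above might look something like this in a compose file. This is only an illustrative sketch: the label name C_HOST comes from the question, while the image tag and service layout are assumptions:

```yaml
# docker-compose.yml fragment (Swarm mode) sketching the label-based
# placement described above. The node label C_HOST follows the question;
# the image tag is illustrative.
services:
  cassandra:
    image: cassandra:3.11
    deploy:
      replicas: 3          # one replica per labeled machine
      placement:
        constraints:
          - node.labels.C_HOST == true   # only schedule on C_HOST-labeled nodes
```

A matching `kafka` service would use a `K_HOST` constraint in the same way.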

I have the following questions:

  • Do Kubernetes pods bring any advantage compared to my previous approach (e.g. simplicity)? I understand that I am still required to configure labels; if so, I don't see the appeal.

  • What would be the correct way to configure these pods? Would it be one pod for Cassandra replicas, one pod for Kafka replicas, one pod for MyConsumer replicas and one pod for MyProducer?

-- João Matos
docker
docker-swarm
kubernetes

2 Answers

3/1/2019

Using pod anti-affinity, you can ensure that a pod is not co-located with other pods with specific labels.

So say you have a label "app" with values "cassandra", "kafka", "my-producer" and "my-consumer".

Since you want to have cassandra, kafka and my-producer on dedicated nodes all by themselves, you simply configure an anti-affinity to ALL the existing labels:

(see https://kubernetes.io/docs/concepts/configuration/assign-pod-node/ for full schema)

  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - cassandra
          - kafka
          - my-producer
          - my-consumer
      topologyKey: kubernetes.io/hostname # required: nodes are the "topology" here

This is for a "Pod" resource, so you'd define it in a Deployment (where you also define the number of replicas), inside the pod template.

Since you want three instances of my-consumer running on the same node (or really, you don't care where they run, since by now only one node is left), you do not need to define any affinity or anti-affinity for it:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-consumer
  namespace: default
  labels:
    app: my-consumer
spec:
  selector:
    matchLabels:
      app: my-consumer
  replicas: 3 # here you set the number of replicas that should run
  template:   # this is the pod template
    metadata:
      labels:
        app: my-consumer # now this is the label you can set an anti-affinity to
    spec:
      containers:
      - image: ${IMAGE}
        name: my-consumer
#     affinity:
#       # for the other deployments, put the anti-affinity settings
#       # from above here, at the pod spec level (sibling of containers)
-- Markus Dresch
Source: StackOverflow

3/1/2019

You can still use node labels together with the nodeSelector parameter.

You can add a node label using kubectl:

kubectl label nodes <node-name> <label-key>=<label-value>

But a more advanced way is to use affinity for pod distribution...
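A minimal sketch of the nodeSelector approach, mirroring the question's setup. The label role=cassandra-host and the image tag are hypothetical, chosen for illustration:

```yaml
# Assumes nodes were labeled first, e.g.:
#   kubectl label nodes node1 role=cassandra-host
apiVersion: v1
kind: Pod
metadata:
  name: cassandra
  labels:
    app: cassandra
spec:
  nodeSelector:
    role: cassandra-host   # schedule only onto nodes carrying this label
  containers:
  - name: cassandra
    image: cassandra:3.11  # illustrative image tag
```

This is essentially the Kubernetes equivalent of Swarm placement constraints: simple exact-match label selection, without the richer operators that affinity rules offer.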

-- ozlevka
Source: StackOverflow