Kubernetes different container args depending on number of pods in replica set

9/27/2017

I want to scale an application with workers.
There could be 1 worker or 100, and I want to scale them seamlessly.
The idea is to use a ReplicaSet. However, for domain-specific reasons, the appropriate way to scale them is for each worker to know its ID and the total number of workers.

For example, in case I have 3 workers, I'd have this:

id:0, num_workers:3
id:1, num_workers:3
id:2, num_workers:3

Is there a way to do this with Kubernetes?
I pass this information to the app as command-line arguments, and I assume it would be fine to have it in environment variables too.
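
For concreteness, a minimal sketch of how such a worker might consume those values; the flag names --id and --num-workers and the WORKER_ID/NUM_WORKERS variables are just placeholders, not anything decided yet:

import argparse
import os

# Accept the identity either as command-line flags or, as a fallback,
# from environment variables.
parser = argparse.ArgumentParser()
parser.add_argument("--id", type=int,
                    default=int(os.environ.get("WORKER_ID", "0")))
parser.add_argument("--num-workers", type=int,
                    default=int(os.environ.get("NUM_WORKERS", "1")))
args = parser.parse_args()

print("id:%d, num_workers:%d" % (args.id, args.num_workers))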

It's OK on size changes for all workers to be killed and new ones spawned.

-- nmiculinic
kubernetes
replicaset

2 Answers

12/7/2017

Before giving the Kubernetes-specific answer, I want to point out that this approach pushes cluster coordination down into the app, which is almost by definition harder than using a distributed-system primitive designed for that task. For example, if every new worker registers itself in etcd, then the workers can watch keys to detect changes, meaning no one has to destroy a running application just to update its list of peers, their contact information, their capacity, their current workload, or whatever other interesting information you would enjoy having while building a distributed worker system.
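
As a rough illustration of that pattern, here is a minimal sketch assuming the python-etcd3 client, an etcd reachable at etcd:2379, and a made-up key prefix /workers/ (all of those are assumptions, not part of this answer):

import socket

import etcd3

# Connect to etcd; host and port are placeholders for this sketch.
client = etcd3.client(host="etcd", port=2379)

# Register this worker under a shared prefix, tied to a lease so the key
# vanishes automatically if the worker dies and stops refreshing it
# (a real worker would call lease.refresh() periodically).
lease = client.lease(ttl=30)
client.put("/workers/" + socket.gethostname(), "alive", lease=lease)

# Discover the current set of peers.
peers = sorted(meta.key.decode() for _, meta in client.get_prefix("/workers/"))
print("current peers:", peers)

# Watch the prefix so membership changes arrive as events, with no need
# to restart every worker when the set of peers changes.
events, cancel = client.watch_prefix("/workers/")
for event in events:
    print("membership change:", event.key.decode())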

But, on with the show:


If you want stable identifiers, then a StatefulSet is the modern answer. Whether that is an exact fit for your situation depends on whether, for your problem domain, id:0 being "rebooted" still counts as id:0, or whether the fact that it has stopped and started now disqualifies it from being id:0.
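
With a StatefulSet, each pod's hostname carries its ordinal (worker-0, worker-1, ...), so the worker can recover its id with no extra machinery. A minimal sketch, assuming the StatefulSet is named worker and that the replica count is handed to the pod in an environment variable named NUM_WORKERS (that variable is an assumption; Kubernetes does not inject it for you):

import os
import socket

# StatefulSet pods are named <statefulset-name>-<ordinal>, and a pod's
# hostname matches its pod name, so the ordinal is a stable worker id.
hostname = socket.gethostname()              # e.g. "worker-2"
worker_id = int(hostname.rsplit("-", 1)[1])

# The total size still has to be supplied separately, e.g. an env var set
# alongside the replicas field (NUM_WORKERS is assumed, not automatic).
num_workers = int(os.environ["NUM_WORKERS"])

print("id:%d, num_workers:%d" % (worker_id, num_workers))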

Keeping a running count of the cluster size is trickier. If you are willing to be flexible in the launch mechanism, you can have a pre-launch binary populate the environment right before spawning the actual worker (that example is for reading from etcd directly, but the same principle holds for querying the Kubernetes API and then launching).
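
Translated to the Kubernetes API, such a pre-launch wrapper might look roughly like the sketch below; the app=worker label, the WORKER_ID/NUM_WORKERS variable names, and the /app/worker binary path are all placeholders rather than anything this answer prescribes:

import os
import socket

from kubernetes import client, config

# Authenticate with the API server using the pod's service account.
config.load_incluster_config()
v1 = client.CoreV1Api()

with open("/var/run/secrets/kubernetes.io/serviceaccount/namespace") as f:
    namespace = f.read().strip()

# Count the sibling pods carrying the same (assumed) label.
pods = v1.list_namespaced_pod(namespace, label_selector="app=worker").items
names = sorted(pod.metadata.name for pod in pods)

env = dict(os.environ)
env["WORKER_ID"] = str(names.index(socket.gethostname()))
env["NUM_WORKERS"] = str(len(names))

# Replace this process with the real worker, which then sees its identity
# in its environment from the very start.
os.execve("/app/worker", ["/app/worker"], env)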

You could do that same trick in a more static manner by having an initContainer write the current state of affairs to a file, which the app would then read in. Or, due to all Pod containers sharing networking, the app could contact a "sidecar" container on localhost to obtain that information via an API.
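
For the initContainer variant, the worker's side reduces to reading a file from a volume shared with that init container; a tiny sketch, where the path /etc/cluster/peers and the one-pod-name-per-line format are purely made up:

import socket

# The init container is assumed to have written one peer pod name per
# line to a file on an emptyDir volume shared with this container.
with open("/etc/cluster/peers") as f:
    peers = sorted(line.strip() for line in f if line.strip())

num_workers = len(peers)
worker_id = peers.index(socket.gethostname())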

So far so good, except for the

on size changes for all workers to be killed and new ones spawned

The best answer I have for that requirement is that if the app must know its peers at launch time, then I am pretty sure you have left the realm of "scale $foo --replicas=5" and entered the "destroy the peers and start afresh" realm of kubectl delete pods -l some-label=of-my-pods; which is, thankfully, exactly what updateStrategy: type: OnDelete does when combined with that delete pods command.

-- mdaniel
Source: StackOverflow

1/11/2018

In the end, I tried something different: I used the Kubernetes API to get the number of running pods with the same label. This is Python code using the official Kubernetes Python client.

import socket

from kubernetes import client
from kubernetes import config

# Authenticate with the API server using the pod's service account.
config.load_incluster_config()
v1 = client.CoreV1Api()

# The namespace this pod runs in is mounted into every pod.
with open(
    '/var/run/secrets/kubernetes.io/serviceaccount/namespace',
    'r'
) as f:
    namespace = f.read().strip()

# List all sibling pods carrying the worker label.
workers = []
for pod in v1.list_namespaced_pod(
    namespace,
    watch=False,
    label_selector="app=worker"
).items:
    workers.append(pod.metadata.name)

# Sorting gives every pod the same view of the ordering, so the index of
# this pod's own name (the hostname equals the pod name) is a stable id.
workers.sort()
num_workers = len(workers)
worker_id = workers.index(socket.gethostname())
-- nmiculinic
Source: StackOverflow