We are using a Apache-Kafka deployment on Kubernetes which is based on the ability to label pods after they have been created (see https://github.com/Yolean/kubernetes-kafka). The init container of the broker pods takes advantage of this feature to set a label on itself with its own numeric index (e.g. "0", "1", etc) as value. The label is used in the service descriptors to select exactly one pod.
This approach works fine on our DIND-Kubernetes environment. However, when tried to port the deployment onto a Docker-EE Kubernetes environment we ran into trouble because the command kubectl label pod
generates a run time error which is completely misleading (also see https://github.com/fabric8io/kubernetes-client/issues/853).
In order to verify the run time error in a minimal setup we created the following deployment scripts.
# create a simple pod as a test target for labeling
> kubectl run -ti -n default --image alpine sh
# get the pod name for all further steps
> kubectl -n default get pods
NAME READY STATUS RESTARTS AGE
nfs-provisioner-7d49cdcb4f-8qx95 1/1 Running 1 7d
nginx-deployment-76dcc8c697-ng4kb 1/1 Running 1 7d
nginx-deployment-76dcc8c697-vs24j 1/1 Running 0 20d
sh-777f6db646-hrm65 1/1 Running 0 3m <--- This is the test pod
test-76bbdb4654-9wd9t 1/1 Running 2 6d
test4-76dbf847d5-9qck2 1/1 Running 0 5d
# get client and server versions
> kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.5",
GitCommit:"32ac1c9073b132b8ba18aa830f46b77dcceb0723", GitTreeState:"clean",
BuildDate:"2018-06-21T11:46:00Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.11- docker-8d637ae", GitCommit:"8d637aedf46b9c21dde723e29c645b9f27106fa5",
GitTreeState:"clean", BuildDate:"2018-04-26T16:51:21Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
# set label
kubectl -n default label pod sh-777f6db646-hrm65 "mylabel=hallo"
pod "sh-777f6db646-hrm65" labeled <---- successful execution
Everything works fine as expected.
kubectl
1.10.5FROM debian:stretch-
slim@sha256:ea42520331a55094b90f6f6663211d4f5a62c5781673935fe17a4dfced777029
ENV KUBERNETES_VERSION=1.10.5
RUN set -ex; \
export DEBIAN_FRONTEND=noninteractive; \
runDeps='curl ca-certificates procps netcat'; \
buildDeps=''; \
apt-get update && apt-get install -y $runDeps $buildDeps --no-install- recommends; \
rm -rf /var/lib/apt/lists/*; \
\
curl -sLS -o k.tar.gz -k https://dl.k8s.io/v${KUBERNETES_VERSION}/kubernetes-client-linux-amd64.tar.gz; \
tar -xvzf k.tar.gz -C /usr/local/bin/ --strip-components=3 kubernetes/client/bin/kubectl; \
rm k.tar.gz; \
\
apt-get purge -y --auto-remove $buildDeps; \
rm /var/log/dpkg.log /var/log/apt/*.log
This image is deployed as 10.100.180.74:5000/test/kubectl-client-1.10.5
in a site local registry and will be referred to below.
apiVersion: apps/v1beta2
kind: StatefulSet
metadata:
name: pod-labeler
namespace: default
spec:
selector:
matchLabels:
app: pod-labeler
replicas: 1
serviceName: pod-labeler
updateStrategy:
type: OnDelete
template:
metadata:
labels:
app: pod-labeler
annotations:
spec:
terminationGracePeriodSeconds: 10
containers:
- name: check-version
image: 10.100.180.74:5000/test/kubectl-client-1.10.5
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: POD_NAME
value: sh-777f6db646-hrm65
command: ["/usr/local/bin/kubectl", "version" ]
- name: label-pod
image: 10.100.180.74:5000/test/kubectl-client-1.10.5
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: POD_NAME
value: sh-777f6db646-hrm65
command: ["/bin/bash", "-c", "/usr/local/bin/kubectl -n default label pod $POD_NAME 'mylabel2=hallo'" ]
We get the following logging output
# Log of the container "check-version"
2018-07-18T11:11:10.791011157Z Client Version: version.Info{Major:"1",
Minor:"10", GitVersion:"v1.10.5",
GitCommit:"32ac1c9073b132b8ba18aa830f46b77dcceb0723", GitTreeState:"clean",
BuildDate:"2018-\
06-21T11:46:00Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
2018-07-18T11:11:10.791058997Z Server Version: version.Info{Major:"1",
Minor:"8+", GitVersion:"v1.8.11-docker-8d637ae",
GitCommit:"8d637aedf46b9c21dde723e29c645b9f27106fa5", GitTreeState:"clean",
BuildDate:"2018-04-26T16:51:21Z", GoVersion:"go1.8.3", Compiler:"gc",
Platform:"linux/amd64"}
and the run time error
2018-07-18T11:24:15.448695813Z The Pod "sh-777f6db646-hrm65" is invalid:
spec.tolerations: Forbidden: existing toleration can not be modified except its tolerationSeconds
Any ideas?
It was pointed out in one of the comments that the issue may be caused by a missing permission even though the error message does not insinuate so. We officially filed a ticket with Docker and actually got exactly this result: In order to be able to set/modify a label from within a pod the default user of the namespace must be given the "Scheduler" role on the swarm resource (which later shows up as \
in the GUI). Granting this permission fixes the problem. See added grant in Docker-EE-GUI below.
From my point of view, this is far from obvious. The Docker support representative offered to investigate if this is actually expected behavior or results from a bug. As soon as we learn more on this question I will include it into our answer.
As for using more debugging output: Unfortunately, adding --v=9
to the calls of kubectl
does not return any useful information. There's too much output to be displayed here but the overall logging is very similar in both cases: It consists of a lot GET API requests which are all successful followed by a final PATCH API request which succeeds in the one case but fails in the other as described above.