I am running Kubernetes 1.9.4 on my GKE cluster.
I have two pods, gate, which is trying to connect to coolapp; both are written in Elixir. I am using libcluster to connect my nodes, and I get the following error:
[libcluster:app_name] cannot query kubernetes (unauthorized): endpoints is forbidden: User "system:serviceaccount:staging:default" cannot list endpoints in the namespace "staging": Unknown user "system:serviceaccount:staging:default"
Here is my config in gate under config/prod:
config :libcluster,
  topologies: [
    app_name: [
      strategy: Cluster.Strategy.Kubernetes,
      config: [
        kubernetes_selector: "tier=backend",
        kubernetes_node_basename: System.get_env("MY_POD_NAMESPACE") || "${MY_POD_NAMESPACE}"]]]
Here is my vm.args configuration:
## Name of the node
-name ${MY_POD_NAMESPACE}@${MY_POD_IP}
## Cookie for distributed erlang
-setcookie ${ERLANG_COOKIE}
# Enable SMP automatically based on availability
-smp auto
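The `${MY_POD_NAMESPACE}` and `${MY_POD_IP}` placeholders only become a real node name if something substitutes them at boot. Assuming the releases are built with Distillery (which is where `RELEASE_CONFIG_DIR` and `REPLACE_OS_VARS` come from), `REPLACE_OS_VARS=true` is what triggers that substitution. The sketch below is a hypothetical Python illustration of the idea, not Distillery's code; the pod IP `10.0.0.7` is made up:

```python
import re

# Matches ${NAME} placeholders, the form used in the vm.args above.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def replace_os_vars(text: str, env: dict) -> str:
    """Replace each ${VAR} with env[VAR].

    Design choice of this sketch (not necessarily Distillery's behaviour):
    unknown placeholders are left untouched so missing variables are easy to spot.
    """
    return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(0)), text)


vm_args = "-name ${MY_POD_NAMESPACE}@${MY_POD_IP}\n-setcookie ${ERLANG_COOKIE}\n"
# Hypothetical values: the namespace from the deployments, a made-up pod IP.
env = {"MY_POD_NAMESPACE": "staging", "MY_POD_IP": "10.0.0.7", "ERLANG_COOKIE": "xxxxxx"}
print(replace_os_vars(vm_args, env))
```

With those values the node name becomes `staging@10.0.0.7`, the same `basename@ip` shape that `kubernetes_node_basename: "staging"` in the libcluster config refers to.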
Creating the secret and the config map:
kubectl create secret generic erlang-config --namespace staging --from-literal=erlang-cookie=xxxxxx
kubectl create configmap vm-config --namespace staging --from-file=vm.args
gate/deployment.yaml:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: gate
  namespace: staging
spec:
  replicas: 1
  revisionHistoryLimit: 1
  strategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: gate
        tier: backend
    spec:
      securityContext:
        runAsUser: 0
        runAsNonRoot: false
      containers:
      - name: gate
        image: gcr.io/development/gate:0.1.7
        args:
        - foreground
        ports:
        - containerPort: 80
        volumeMounts:
        - name: config-volume
          mountPath: /beamconfig
        env:
        - name: MY_POD_NAMESPACE
          value: staging
        - name: MY_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: RELEASE_CONFIG_DIR
          value: /beamconfig
        - name: ERLANG_COOKIE
          valueFrom:
            secretKeyRef:
              name: erlang-config
              key: erlang-cookie
      volumes:
      - name: config-volume
        configMap:
          name: vm-config
coolapp/deployment.yaml:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: coolapp
  namespace: staging
spec:
  replicas: 1
  revisionHistoryLimit: 1
  strategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: coolapp
        tier: backend
    spec:
      securityContext:
        runAsUser: 0
        runAsNonRoot: false
      # volumes
      volumes:
      - name: config-volume
        configMap:
          name: vm-config
      containers:
      - name: coolapp
        image: gcr.io/development/coolapp:1.0.3
        volumeMounts:
        - name: secrets-volume
          mountPath: /secrets
          readOnly: true
        - name: config-volume
          mountPath: /beamconfig
        ports:
        - containerPort: 80
        args:
        - "foreground"
        env:
        - name: MY_POD_NAMESPACE
          value: staging
        - name: MY_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: REPLACE_OS_VARS
          value: "true"
        - name: RELEASE_CONFIG_DIR
          value: /beamconfig
        - name: ERLANG_COOKIE
          valueFrom:
            secretKeyRef:
              name: erlang-config
              key: erlang-cookie
      # proxy_container
      - name: cloudsql-proxy
        image: gcr.io/cloudsql-docker/gce-proxy:1.11
        command: ["/cloud_sql_proxy", "--dir=/cloudsql",
                  "-instances=staging:us-central1:com-staging=tcp:5432",
                  "-credential_file=/secrets/cloudsql/credentials.json"]
        volumeMounts:
        - name: cloudsql-instance-credentials
          mountPath: /secrets/cloudsql
          readOnly: true
        - name: cloudsql
          mountPath: /cloudsql
The default service account for the staging namespace (in which your Pods using libcluster are apparently running) lacks the RBAC permissions to list endpoints in that namespace.
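The user in the error message identifies that service account: Kubernetes names service-account users `system:serviceaccount:<namespace>:<name>`, while `kubectl create clusterrolebinding --serviceaccount` expects `<namespace>:<name>`. A small hypothetical Python helper (not part of any Kubernetes tooling) makes the mapping explicit:

```python
from __future__ import annotations

SA_PREFIX = "system:serviceaccount:"


def parse_service_account_user(username: str) -> tuple[str, str]:
    """Split 'system:serviceaccount:<namespace>:<name>' into (namespace, name)."""
    if not username.startswith(SA_PREFIX):
        raise ValueError(f"not a service account username: {username!r}")
    namespace, sep, name = username[len(SA_PREFIX):].partition(":")
    if not sep or not namespace or not name:
        raise ValueError(f"malformed service account username: {username!r}")
    return namespace, name


# The user string taken verbatim from the libcluster error above.
ns, name = parse_service_account_user("system:serviceaccount:staging:default")
print(f"--serviceaccount={ns}:{name}")
```

For the error above this yields namespace `staging` and name `default`, i.e. the `staging:default` value used with `kubectl` below.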
Your application likely requires a number of other permissions (not mentioned in the error message above) to work correctly; identifying all of them is out of scope for SO.
One way to resolve this issue is to grant superuser permissions to that service account. This is not a secure solution, only a stopgap fix:
$ kubectl create clusterrolebinding make-staging-sa-cluster-admin \
--serviceaccount=staging:default \
--clusterrole=cluster-admin
clusterrolebinding "make-staging-sa-cluster-admin" created
To grant only the specific permission (listing endpoints in the staging namespace), you would first need to create a Role:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: some-permissions
  namespace: staging
rules:
- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["get", "list", "watch"]
Then create a RoleBinding for the default service account in the staging namespace:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: give-default-sa-some-permissions
  namespace: staging
subjects:
- kind: ServiceAccount
  name: default
  namespace: staging
roleRef:
  kind: Role
  name: some-permissions
  apiGroup: rbac.authorization.k8s.io
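RBAC is purely additive: a request is allowed if any rule in any role bound to the user matches its API group, resource, and verb, and core resources such as endpoints live in the API group `""`. The Python sketch below is a deliberately simplified model of that check (it ignores `resourceNames`, subresources, and `nonResourceURLs`, and is not the API server's implementation), applied to the Role's rules:

```python
from __future__ import annotations


def rule_allows(rule: dict, api_group: str, resource: str, verb: str) -> bool:
    """Simplified RBAC rule match: each field must list the value or '*'."""
    def hit(values: list[str], wanted: str) -> bool:
        return "*" in values or wanted in values

    return (hit(rule.get("apiGroups", []), api_group)
            and hit(rule.get("resources", []), resource)
            and hit(rule.get("verbs", []), verb))


def role_allows(rules: list[dict], api_group: str, resource: str, verb: str) -> bool:
    # Rules only ever add permissions; there are no deny rules to consult.
    return any(rule_allows(r, api_group, resource, verb) for r in rules)


# The rules of the "some-permissions" Role above.
some_permissions = [
    {"apiGroups": [""], "resources": ["endpoints"], "verbs": ["get", "list", "watch"]},
]

print(role_allows(some_permissions, "", "endpoints", "list"))
print(role_allows(some_permissions, "", "pods", "list"))
```

Under this model the Role grants exactly the `list endpoints` call from the error and nothing else, which is why any further permissions the app needs must be added as extra rules.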
I'm not an Erlang/Elixir or libcluster user, but it seems libcluster is using the namespace's default service account to query the master for a list of the endpoints available in the cluster.
The libcluster README says as much:
If set to Cluster.Strategy.Kubernetes, it will use the Kubernetes API to query endpoints based on a basename and label selector, using the token and namespace injected into every pod; once it has a list of endpoints, it uses that list to form a cluster, and keep it up to date.
Reading kubernetes.ex in the libcluster source, together with the error you get, confirms this.
You will need to set up a ClusterRole and a RoleBinding for the service account in the staging namespace. This will allow libcluster to query the master dynamically to discover the other Erlang nodes in the cluster/namespace.
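To make the README description concrete, here is a hypothetical Python sketch of the discovery step as I understand it from the README quote: read the token injected into the pod, list endpoints matching the selector, and turn each ready address into a `basename@ip` node name. The mount path `/var/run/secrets/kubernetes.io/serviceaccount` is the standard in-pod location; the API host `kubernetes.default.svc` and the exact node-name format are assumptions, and this is not libcluster's code. The list request it describes is the one rejected in the error above:

```python
from __future__ import annotations

from pathlib import Path
from urllib.parse import quote

# Standard mount point of the service-account token and namespace inside a pod.
SA_DIR = Path("/var/run/secrets/kubernetes.io/serviceaccount")
# Assumption: in-cluster DNS name of the API server.
API_HOST = "kubernetes.default.svc"


def auth_headers(sa_dir: Path = SA_DIR) -> dict[str, str]:
    """Bearer-token header; the token identifies system:serviceaccount:<ns>:default."""
    token = (sa_dir / "token").read_text().strip()
    return {"Authorization": f"Bearer {token}"}


def endpoints_url(namespace: str, selector: str, host: str = API_HOST) -> str:
    """URL of the 'list endpoints' call that RBAC must allow."""
    return (f"https://{host}/api/v1/namespaces/{namespace}/endpoints"
            f"?labelSelector={quote(selector)}")


def node_names(endpoints_list: dict, basename: str) -> list[str]:
    """Build 'basename@ip' names from an EndpointsList (ready addresses only)."""
    names = []
    for item in endpoints_list.get("items", []):
        for subset in item.get("subsets") or []:
            for address in subset.get("addresses") or []:
                names.append(f"{basename}@{address['ip']}")
    return names


# Made-up API response with two ready pod IPs.
sample = {"kind": "EndpointsList",
          "items": [{"subsets": [{"addresses": [{"ip": "10.0.0.7"}, {"ip": "10.0.0.8"}]}]}]}
print(endpoints_url("staging", "tier=backend"))
print(node_names(sample, "staging"))
```

If the generated names do not match the `-name` each node was actually started with in vm.args, the nodes would not connect even once RBAC allows the query.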
Here are some handy resources for follow-up reading: