I am running kubernetes 1.9.4 on my gke cluster
I have two pods , gate which is trying to connect to coolapp, both written in elixir
I am using libcluster to connect my nodes I get the following error:
[libcluster:app_name] cannot query kubernetes (unauthorized): endpoints is forbidden: User "system:serviceaccount:staging:default" cannot list endpoints in the namespace "staging": Unknown user "system:serviceaccount:staging:default"
here is my config in gate under config/prod:
config :libcluster,
topologies: [
app_name: [
strategy: Cluster.Strategy.Kubernetes,
config: [
kubernetes_selector: "tier=backend",
kubernetes_node_basename: System.get_env("MY_POD_NAMESPACE") || "${MY_POD_NAMESPACE}"]]]here is my configuration:
vm-args
## Name of the node
-name ${MY_POD_NAMESPACE}@${MY_POD_IP}
## Cookie for distributed erlang
-setcookie ${ERLANG_COOKIE}
# Enable SMP automatically based on availability
-smp auto
creating the secrets:
kubectl create secret generic erlang-config --namespace staging --from-literal=erlang-cookie=xxxxxx
kubectl create configmap vm-config --namespace staging --from-file=vm.argsgate/deployment.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: gate
namespace: staging
spec:
replicas: 1
revisionHistoryLimit: 1
strategy:
type: RollingUpdate
template:
metadata:
labels:
app: gate
tier: backend
spec:
securityContext:
runAsUser: 0
runAsNonRoot: false
containers:
- name: gate
image: gcr.io/development/gate:0.1.7
args:
- foreground
ports:
- containerPort: 80
volumeMounts:
- name: config-volume
mountPath: /beamconfig
env:
- name: MY_POD_NAMESPACE
value: staging
- name: MY_POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
- name: MY_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: RELEASE_CONFIG_DIR
value: /beamconfig
- name: ERLANG_COOKIE
valueFrom:
secretKeyRef:
name: erlang-config
key: erlang-cookie
volumes:
- name: config-volume
configMap:
name: vm-configcoolapp/deployment.yaml:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: coolapp
namespace: staging
spec:
replicas: 1
revisionHistoryLimit: 1
strategy:
type: RollingUpdate
template:
metadata:
labels:
app: coolapp
tier: backend
spec:
securityContext:
runAsUser: 0
runAsNonRoot: false
# volumes
volumes:
- name: config-volume
configMap:
name: vm-config
containers:
- name: coolapp
image: gcr.io/development/coolapp:1.0.3
volumeMounts:
- name: secrets-volume
mountPath: /secrets
readOnly: true
- name: config-volume
mountPath: /beamconfig
ports:
- containerPort: 80
args:
- "foreground"
env:
- name: MY_POD_NAMESPACE
value: staging
- name: MY_POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
- name: MY_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: REPLACE_OS_VARS
value: "true"
- name: RELEASE_CONFIG_DIR
value: /beamconfig
- name: ERLANG_COOKIE
valueFrom:
secretKeyRef:
name: erlang-config
key: erlang-cookie
# proxy_container
- name: cloudsql-proxy
image: gcr.io/cloudsql-docker/gce-proxy:1.11
command: ["/cloud_sql_proxy", "--dir=/cloudsql",
"-instances=staging:us-central1:com-staging=tcp:5432",
"-credential_file=/secrets/cloudsql/credentials.json"]
volumeMounts:
- name: cloudsql-instance-credentials
mountPath: /secrets/cloudsql
readOnly: true
- name: cloudsql
mountPath: /cloudsqlThe default service account for the staging namespace (in which apparently your Pods using libcluster are running) lacks RBAC permissions to get endpoints in that namespace.
Likely your application requires a number of other permissions (that are not mentioned in the above error message) to work correctly; identifying all such permissions is out of scope for SO.
A way to resolve this issue is to grant superuser permissions that service account. This is not a secure solution but a stop gap fix.
$ kubectl create clusterrolebinding make-staging-sa-cluster-admin \
--serviceaccount=staging:default \
--clusterrole=cluster-admin
clusterrolebinding "make-staging-sa-cluster-admin" created
To grant the specific permission only (get endpoints in the staging namespace) you would need to create a Role first:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: some-permissions
namespace: staging
rules:
- apiGroups: [""]
resources: ["endpoints"]
verbs: ["get", "list", "watch"]And create a RoleBinding for the default service account in the staging namespace:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: give-default-sa-some-permissions
namespace: staging
subjects:
- kind: ServiceAccount
name: default
namespace: staging
roleRef:
kind: Role
name: some-permissions
apiGroup: rbac.authorization.k8s.ioNot an erlang/elixir or libcluster user, but it seems it is trying to use the default service account for the namespace to try and query the master for a list of endpoints available in the cluster.
The readme for libcluster says as much:
If set to Cluster.Strategy.Kubernetes, it will use the Kubernetes API to query endpoints based on a basename and label selector, using the token and namespace injected into every pod; once it has a list of endpoints, it uses that list to form a cluster, and keep it up to date.
Reading the code for the kubernetes.ex in libcluster and the error you get confirm as much.
You will need to setup a ClusterRole and RoleBinding for the service account in the staging namespace. This will allow libcluster to dynamically query the master to discover other erlang nodes in the cluster/namespace.
Here are some handy resources for follow up reading: