Prometheus - Kubernetes RBAC

4/7/2017

I upgrade my GKE API server to 1.6, and am in the process of upgrading nodes to 1.6, but ran into a snag...

I've got a prometheus server (version 1.5.2) running in a pod managed by a Kubernetes deployment with a couple of nodes running version 1.5.4 Kubelet, with a single new node running 1.6.

Prometheus can't connect to the new node--it's metrics endpoint is returning 401 Unauthorized.

This seems to be a RBAC issue, but I'm not sure how to proceed. I can't find docs on what roles the Prometheus server needs, or even how to grant them to the server.

From the coreos/prometheus-operator repo I was able to piece together a configuration that I might expect to work:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
---

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
secrets:
- name: prometheus-token-xxxxx

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: prometheus-prometheus
    component: server
    release: prometheus
  name: prometheus-server
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-prometheus
      component: server
      release: prometheus
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: prometheus-prometheus
        component: server
        release: prometheus
    spec:
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      serviceAccount: prometheus
      serviceAccountName: prometheus
      ...

But Prometheus is still getting 401s.

UPDATE: seems like a kubernetes authentication issue as Jordan said. See new, more focused question here; https://serverfault.com/questions/843751/kubernetes-node-metrics-endpoint-returns-401

-- pnovotnak
google-kubernetes-engine
kubernetes
prometheus

3 Answers

5/9/2017

As per discussion on @JorritSalverda's ticket; https://github.com/prometheus/prometheus/issues/2606#issuecomment-294869099

Since GKE doesn't allow you to get to client certificates that would allow you to authenticate yourself with the kubelet, the best solution for users on GKE seems to use the kubernetes API server as a proxy requests to nodes.

To do this (quoting @JorritSalverda);

"For my Prometheus server running inside GKE I now have it running with the following relabeling:

relabel_configs:
- action: labelmap
  regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
  replacement: kubernetes.default.svc.cluster.local:443
- target_label: __scheme__
  replacement: https
- source_labels: [__meta_kubernetes_node_name]
  regex: (.+)
  target_label: __metrics_path__
  replacement: /api/v1/nodes/${1}/proxy/metrics

And the following ClusterRole bound to the service account used by Prometheus:

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]

Because the GKE cluster still has an ABAC fallback in case RBAC fails I'm not 100% sure yet this covers all required permissions.

-- pnovotnak
Source: StackOverflow

4/7/2017

401 means unauthenticated, which means it is not an RBAC issue. I believe GKE no longer allows anonymous access to the kubelet in 1.6. What credentials are you using to authenticate to the kubelet?

-- Jordan Liggitt
Source: StackOverflow

5/10/2017

This is what I have working for role definition and binding.

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
-- Pouya Naghizadeh
Source: StackOverflow