I upgraded my GKE API server to 1.6 and am in the process of upgrading the nodes to 1.6, but ran into a snag...
I have a Prometheus server (version 1.5.2) running in a pod managed by a Kubernetes Deployment. The cluster has a couple of nodes running Kubelet 1.5.4 and a single new node running 1.6.
Prometheus can't connect to the new node: its metrics endpoint returns 401 Unauthorized.
This seems to be an RBAC issue, but I'm not sure how to proceed. I can't find docs on what roles the Prometheus server needs, or even how to grant them to the server.
From the coreos/prometheus-operator repo I was able to piece together a configuration that I might expect to work:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
secrets:
- name: prometheus-token-xxxxx
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: prometheus-prometheus
    component: server
    release: prometheus
  name: prometheus-server
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-prometheus
      component: server
      release: prometheus
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: prometheus-prometheus
        component: server
        release: prometheus
    spec:
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      serviceAccount: prometheus
      serviceAccountName: prometheus
...
But Prometheus is still getting 401s.
UPDATE: As Jordan said, this seems to be a Kubernetes authentication issue. See the new, more focused question here: https://serverfault.com/questions/843751/kubernetes-node-metrics-endpoint-returns-401
As per the discussion on @JorritSalverda's ticket: https://github.com/prometheus/prometheus/issues/2606#issuecomment-294869099
Since GKE doesn't give you access to the client certificates you would need to authenticate with the kubelet, the best solution for users on GKE seems to be using the Kubernetes API server as a proxy for requests to the nodes.
To do this (quoting @JorritSalverda):
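As an illustration of the proxy approach, here is a minimal sketch of a single proxied scrape made from inside a pod. It assumes the standard in-cluster service-account mount (`/var/run/secrets/kubernetes.io/serviceaccount/token` and `ca.crt`) and the API server address used in the relabeling below. The node name is supplied by the caller. The pod sends only its service-account bearer token. The API server authenticates that token, checks RBAC, and then forwards the request to the kubelet with its own credentials. `build_proxy_request` and `fetch_node_metrics` are hypothetical helper names, not part of any library.

```python
import ssl
import urllib.request

# Standard in-cluster service-account mount (assumed to be present in the pod).
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"
TOKEN_PATH = f"{SA_DIR}/token"
CA_PATH = f"{SA_DIR}/ca.crt"
# Same API server address the relabeling config rewrites __address__ to.
API_SERVER = "https://kubernetes.default.svc.cluster.local:443"


def build_proxy_request(node_name, token, api_server=API_SERVER):
    """Build a GET for a node's kubelet /metrics, routed through the API server proxy."""
    url = f"{api_server}/api/v1/nodes/{node_name}/proxy/metrics"
    # Only the service-account token is sent; no kubelet client certificate is needed.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_node_metrics(node_name):
    """Perform the proxied scrape (only works inside a pod with a mounted token)."""
    with open(TOKEN_PATH) as f:
        token = f.read().strip()
    # Trust the cluster CA so the API server's TLS certificate verifies.
    ctx = ssl.create_default_context(cafile=CA_PATH)
    req = build_proxy_request(node_name, token)
    with urllib.request.urlopen(req, context=ctx) as resp:
        return resp.read().decode("utf-8")
```

Inside a pod you would call `fetch_node_metrics("<node-name>")`. A 401 here would point at the token (authentication). A 403 would point at the role bound to the service account (authorization).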
"For my Prometheus server running inside GKE I now have it running with the following relabeling:
relabel_configs:
- action: labelmap
  regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
  replacement: kubernetes.default.svc.cluster.local:443
- target_label: __scheme__
  replacement: https
- source_labels: [__meta_kubernetes_node_name]
  regex: (.+)
  target_label: __metrics_path__
  replacement: /api/v1/nodes/${1}/proxy/metrics
And the following ClusterRole bound to the service account used by Prometheus:
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
Because the GKE cluster still has an ABAC fallback in case RBAC fails, I'm not 100% sure yet that this covers all required permissions."
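To see what the relabeling above does to a discovered node target, here is a simplified Python model of Prometheus's `replace` and `labelmap` actions. It is not Prometheus's implementation. It assumes anchored regexes (hence `fullmatch`), a default regex of `(.*)`, a default replacement of `$1`, and `;` as the separator for source labels, and it ignores edge cases such as empty results. The discovered labels are hypothetical: the node name `gke-node-1`, its address, and its hostname label are made up for the example.

```python
import re


def expand(template, match):
    """Substitute Prometheus-style $1 / ${1} group references (simplified)."""
    return re.sub(
        r"\$\{(\d+)\}|\$(\d+)",
        lambda ref: match.group(int(ref.group(1) or ref.group(2))) or "",
        template,
    )


def relabel(labels, configs):
    """Apply a subset of Prometheus relabeling (replace, labelmap) to a label dict."""
    out = dict(labels)
    for cfg in configs:
        action = cfg.get("action", "replace")
        # Prometheus anchors relabel regexes, so match the whole string.
        regex = re.compile(cfg.get("regex", "(.*)"))
        replacement = cfg.get("replacement", "$1")
        if action == "labelmap":
            # Copy each label whose *name* matches to a new name built from the groups.
            for name, value in list(out.items()):
                m = regex.fullmatch(name)
                if m:
                    out[expand(replacement, m)] = value
        elif action == "replace":
            # With no source_labels the joined value is "", which (.*) still matches.
            source = ";".join(out.get(n, "") for n in cfg.get("source_labels", []))
            m = regex.fullmatch(source)
            if m:
                out[cfg["target_label"]] = expand(replacement, m)
        else:
            raise ValueError(f"unsupported action: {action}")
    return out


# The relabel_configs from the quoted answer, as Python dicts.
CONFIGS = [
    {"action": "labelmap", "regex": "__meta_kubernetes_node_label_(.+)"},
    {"target_label": "__address__",
     "replacement": "kubernetes.default.svc.cluster.local:443"},
    {"target_label": "__scheme__", "replacement": "https"},
    {"source_labels": ["__meta_kubernetes_node_name"], "regex": "(.+)",
     "target_label": "__metrics_path__",
     "replacement": "/api/v1/nodes/${1}/proxy/metrics"},
]

# Hypothetical labels for one node discovered with the "node" role.
discovered = {
    "__address__": "10.128.0.5:10250",
    "__scheme__": "http",
    "__metrics_path__": "/metrics",
    "__meta_kubernetes_node_name": "gke-node-1",
    "__meta_kubernetes_node_label_kubernetes_io_hostname": "gke-node-1",
}

result = relabel(discovered, CONFIGS)
url = f'{result["__scheme__"]}://{result["__address__"]}{result["__metrics_path__"]}'
print(url)
```

The scrape no longer goes to the kubelet's own address. It goes to the API server at `/api/v1/nodes/gke-node-1/proxy/metrics`, which is why the role needs `nodes/proxy`.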
401 means unauthenticated, which means it is not an RBAC issue. I believe GKE no longer allows anonymous access to the kubelet in 1.6. What credentials are you using to authenticate to the kubelet?
This is what I have working for the role definition and binding:
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
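To compare this role with the one quoted from @JorritSalverda, here is a deliberately simplified model of RBAC rule matching. It is not the API server's authorizer: it supports only exact matches and `*`, and it ignores `resourceNames`, URL prefix wildcards, and aggregation. It assumes that a scrape of `/api/v1/nodes/<name>/proxy/metrics` is a `get` on the `nodes/proxy` subresource in the core (`""`) API group. `PolicyRule`, `allows_resource`, and `allows_non_resource` are hypothetical names. Under those assumptions, the role above would not authorize the API-server-proxy scrape. Its `nonResourceURLs: ["/metrics"]` rule covers a non-resource path on the API server, not the node proxy.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PolicyRule:
    """Simplified stand-in for an RBAC PolicyRule (hypothetical helper)."""
    verbs: List[str]
    api_groups: List[str] = field(default_factory=list)
    resources: List[str] = field(default_factory=list)
    non_resource_urls: List[str] = field(default_factory=list)


def _matches(allowed: List[str], requested: str) -> bool:
    # "*" acts as a wildcard; otherwise require an exact match.
    return "*" in allowed or requested in allowed


def allows_resource(rules, verb, api_group, resource, subresource=""):
    """True if any rule grants `verb` on the (sub)resource in `api_group`."""
    name = f"{resource}/{subresource}" if subresource else resource
    return any(
        _matches(r.verbs, verb)
        and _matches(r.api_groups, api_group)
        and _matches(r.resources, name)
        for r in rules
    )


def allows_non_resource(rules, verb, path):
    """True if any rule grants `verb` on the non-resource URL `path`."""
    return any(
        _matches(r.verbs, verb) and _matches(r.non_resource_urls, path)
        for r in rules
    )


READ = ["get", "list", "watch"]

# The role from this answer (no nodes/proxy).
answer_role = [
    PolicyRule(verbs=READ, api_groups=[""],
               resources=["nodes", "services", "endpoints", "pods"]),
    PolicyRule(verbs=["get"], non_resource_urls=["/metrics"]),
]

# The role quoted from @JorritSalverda, which adds nodes/proxy.
proxy_role = [
    PolicyRule(verbs=READ, api_groups=[""],
               resources=["nodes", "nodes/proxy", "services", "endpoints", "pods"]),
]

for label, role in [("answer_role", answer_role), ("proxy_role", proxy_role)]:
    print(label, allows_resource(role, "get", "", "nodes", "proxy"))
```

If you combine this answer's role with the proxy relabeling, adding `nodes/proxy` to its resources is probably needed. Keep the `nonResourceURLs` rule only if you also scrape the API server's own `/metrics`.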