I've recently been making use of the GKE Workload Identity feature. I'd be interested to know in more detail how the gke-metadata-server component works.
My current understanding:
1. Client code (gcloud or other language SDKs) falls through to the GCE metadata method.
2. A request is made to http://metadata.google.internal/path.
3. Setting GKE_METADATA_SERVER on my node pool configures this to resolve to the gke-metadata-server pod on that node.
4. The gke-metadata-server pod, running with --privileged and host networking, has a means of determining the source (pod IP?), then looks up the pod and its service account to check for the iam.gke.io/gcp-service-account annotation.
5. The pod's identity ([PROJECT_ID].svc.id.goog[[K8S_NAMESPACE]/[KSA_NAME]]) is then used to get a token for the service account annotated on its Kubernetes service account.

I guess the main puzzle for me right now is the verification of the calling pod's identity. Originally I thought this would use the TokenReview API, but now I'm not sure how the Google client tools would know to use the service account token mounted into the pod...
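To make the identity format mentioned above concrete, here is a minimal sketch of how that string is assembled into the IAM member used when granting a Kubernetes service account access to a Google service account. The function name and the sample values (my-project, default, my-ksa) are illustrative assumptions, not anything taken from GKE itself.

```python
def workload_identity_member(project_id: str, namespace: str, ksa_name: str) -> str:
    """Build the IAM member string for a Kubernetes service account.

    The format is PROJECT_ID.svc.id.goog[K8S_NAMESPACE/KSA_NAME],
    prefixed with "serviceAccount:" when used in an IAM policy binding.
    """
    identity_namespace = f"{project_id}.svc.id.goog"
    return f"serviceAccount:{identity_namespace}[{namespace}/{ksa_name}]"


# Hypothetical values, for illustration only.
print(workload_identity_member("my-project", "default", "my-ksa"))
# prints: serviceAccount:my-project.svc.id.goog[default/my-ksa]
```

This member is what would typically be bound to the Google service account with the Workload Identity User role.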
Edit follow-up questions:
Q1: In between step 2 and 3, is the request to metadata.google.internal routed to the GKE metadata proxy by the setting GKE_METADATA_SERVER on the node pool?
Q2: Why does the metadata server pod need host networking?
Q3: In the video here: https://youtu.be/s4NYEJDFc0M?t=2243 it's taken as a given that the pod makes a GCP call. How does the GKE metadata server identify the pod making the call to start the process?
Before going into details, please familiarize yourself with these components:
OIDC provider: Runs on Google's infrastructure, provides cluster-specific metadata, and signs authorized JWTs.
GKE metadata server: Runs as a DaemonSet (one instance on every node), exposes a pod-specific metadata server (which provides backwards compatibility with old client libraries), and emulates the existing node metadata server.
Google IAM: Issues access tokens, validates bindings, and validates OIDC signatures.
Google Cloud: Accepts access tokens and serves the API calls made with them.
JWT: JSON Web Token
mTLS: Mutual Transport Layer Security
The steps below explain how the GKE metadata server works with these components:
Step 1: An authorized user binds the cluster to the identity namespace.
Step 2: The workload tries to access a Google Cloud service using client libraries.
Step 3: The GKE metadata server requests an OIDC-signed JWT from the control plane. That connection is authenticated using mutual TLS (mTLS) with the node credential.
Step 4: The GKE metadata server uses that OIDC-signed JWT to request an access token for the [identity namespace]/[Kubernetes service account] from IAM. IAM validates that the appropriate bindings exist on the identity namespace and in the OIDC provider.
Step 5: IAM also validates that the JWT was signed by the cluster's correct OIDC provider. It then returns an access token for the [identity namespace]/[Kubernetes service account].
Step 6: The metadata server sends the access token it just received back to IAM. IAM exchanges it for a short-lived GCP service account token after validating the appropriate bindings.
Step 7: The GKE metadata server returns the GCP service account token to the workload.
Step 8: The workload can then use that token to make calls to any Google Cloud service.
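From the workload's point of view, the steps above collapse into a single HTTP call to the metadata endpoint, which client libraries make on your behalf. The sketch below shows roughly what that call looks like using only the Python standard library. The token path and the Metadata-Flavor header are the standard Compute metadata API conventions; the function names are my own, and the fetch only works from inside a pod or VM where the metadata server is reachable.

```python
import json
import urllib.request

METADATA_HOST = "http://metadata.google.internal"
TOKEN_PATH = "/computeMetadata/v1/instance/service-accounts/default/token"


def build_token_request(host: str = METADATA_HOST) -> urllib.request.Request:
    """Build the GET request a client library sends to obtain a token.

    The metadata API rejects requests without the Metadata-Flavor header.
    """
    return urllib.request.Request(
        host + TOKEN_PATH,
        headers={"Metadata-Flavor": "Google"},
    )


def fetch_access_token(timeout: float = 5.0) -> dict:
    """Send the request and parse the JSON response.

    Only works where the metadata server is reachable, e.g. inside a
    GKE pod. The response includes access_token, expires_in and token_type.
    """
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling fetch_access_token() inside a pod on a Workload Identity node pool should return a token for the Google service account bound to the pod's Kubernetes service account, rather than the node's own service account.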
I also found a video regarding Workload Identity which you will find useful.
EDIT: Below are answers to your follow-up questions:
Q1: In between step 2 and 3, is the request to metadata.google.internal routed to the gke metadata proxy by the setting GKE_METADATA_SERVER on the node pool?
You are right, GKE_METADATA_SERVER is set on the node pool. This exposes a metadata API to the workloads that is compatible with the V1 Compute Metadata APIs. Once a workload tries to access a Google Cloud service, the GKE metadata server performs a lookup (it checks whether a pod exists in its list whose IP matches the source IP of the incoming request) before it goes on to request the OIDC token from the control plane.
Keep in mind that the GKE_METADATA_SERVER setting can only be enabled if Workload Identity is enabled at the cluster level.
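The lookup described above can be pictured with a small sketch. This is not the actual gke-metadata-server code, which is not public as far as I know; the data shapes and function names are hypothetical, and the annotation check reflects the guess in the question rather than confirmed behaviour.

```python
from typing import Dict, List, Optional, Tuple

# Annotation named in the question; it links a KSA to a Google service account.
GSA_ANNOTATION = "iam.gke.io/gcp-service-account"


def find_pod_by_ip(pods: List[dict], source_ip: str) -> Optional[dict]:
    """Return the pod on this node whose IP matches the request's source IP."""
    for pod in pods:
        if pod.get("ip") == source_ip:
            return pod
    return None


def gsa_for_request(
    pods: List[dict],
    service_accounts: Dict[Tuple[str, str], dict],
    source_ip: str,
) -> Optional[str]:
    """Resolve source IP -> pod -> Kubernetes service account -> annotated GSA."""
    pod = find_pod_by_ip(pods, source_ip)
    if pod is None:
        return None  # unknown caller: no identity can be established
    ksa = service_accounts.get((pod["namespace"], pod["service_account"]), {})
    return ksa.get("annotations", {}).get(GSA_ANNOTATION)


# Hypothetical node state, for illustration only.
pods = [{"ip": "10.0.0.5", "namespace": "default", "service_account": "my-ksa"}]
service_accounts = {
    ("default", "my-ksa"): {
        "annotations": {GSA_ANNOTATION: "my-gsa@my-project.iam.gserviceaccount.com"}
    }
}
print(gsa_for_request(pods, service_accounts, "10.0.0.5"))
```

A request from an IP that matches no pod on the node yields None here, which corresponds to the server declining to issue a token.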
Q2: Why does the metadata server pod need host networking?
The gke-metadata-server intercepts all GCE metadata server requests from pods; however, requests from pods that themselves use the host network are not intercepted.
Q3: How does the GKE metadata server identify the pod making the call to start the process?
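The pods are identified using iptables rules: these route metadata requests to the GKE metadata server on the node, which then matches the source IP of the request against its list of pods, as described under Q1.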