We're running several clusters with AWS's EKS.
All the clusters are already on 1.19, but the NodeGroups are still running on 1.18. The NodeGroups were last updated in December, and everything worked well then. The aws-auth ConfigMap hasn't been modified since.
Now we want to update them. Whether we click Update in the Console or run the following command:
aws eks --region <clusterRegion> update-nodegroup-version --cluster-name=<clusterName> --nodegroup-name=<nodeGroupName>
...it fails with:
An error occurred (InvalidRequestException) when calling the UpdateNodegroupVersion operation: Nodegroup health has issues other than [ AsgInstanceLaunchFailures, InstanceLimitExceeded, InsufficientFreeAddresses, ClusterUnreachable ]
A look in the details of the nodegroup shows the following message:
AccessDenied: The aws-auth ConfigMap in your cluster is invalid.
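The same health details can also be read from the CLI instead of the Console. This is a minimal sketch, assuming the AWS CLI is configured for the account; the function name `nodegroup_health_issues` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the health issues EKS reports for one node group.
# $1 = region, $2 = cluster name, $3 = node group name
nodegroup_health_issues() {
  aws eks --region "$1" describe-nodegroup \
    --cluster-name "$2" \
    --nodegroup-name "$3" \
    --query 'nodegroup.health.issues' \
    --output json
}
```

Calling it as `nodegroup_health_issues <clusterRegion> <clusterName> <nodeGroupName>` should list each issue with its code and message.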
The related ConfigMap (which works fine for all of us to access the cluster) has the following content (stripped of sensitive information):
mapRoles: |
  - groups:
    - system:bootstrappers
    - system:nodes
    rolearn: arn:aws:iam::<accountId>:role/<ourEksClusterNodeRole>
    username: system:node:{{EC2PrivateDNSName}}
  - groups:
    - system:masters
    rolearn: arn:aws:iam::<accountId>:role/AWSReservedSSO_SystemAdministrator_<someRandomString>
    username: {{SessionName}}
It turns out that the way the AWS documentation proposes to integrate SSO users into the clusters is not compatible with the latest version of EKS.
The placeholder {{SessionName}} cannot be evaluated, so I had to change the ConfigMap like this:
mapRoles: |
  - groups:
    - system:bootstrappers
    - system:nodes
    rolearn: arn:aws:iam::<accountId>:role/<ourEksClusterNodeRole>
    username: system:node:{{EC2PrivateDNSName}}
  - groups:
    - system:masters
    rolearn: arn:aws:iam::<accountId>:role/AWSReservedSSO_SystemAdministrator_<someRandomString>
    username: awssso-system-administrator
The downside of this approach is that we lose audit information in the logs, because every SSO user now shows up under the same static username.
To get around this, the following works (although it is really weird):
1. Adjust the aws-auth ConfigMap as shown above.
2. Wait a few minutes.
3. Trigger the AMI release version upgrade.
4. Wait until it is done.
5. Change the aws-auth ConfigMap back (restoring the {{SessionName}} placeholder).
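The five steps above could be scripted roughly as follows. This is only a sketch under several assumptions: `kubectl` and the AWS CLI are already authenticated against the cluster, the username lines look exactly like the ones shown above, and the five-minute sleep stands in for "a few minutes" (the duration actually required is unknown). All function names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Step 1 filter: replace the placeholder with a static username (YAML on stdin).
to_static() {
  sed 's/username: {{SessionName}}/username: awssso-system-administrator/'
}

# Step 5 filter: restore the placeholder (YAML on stdin).
to_placeholder() {
  sed 's/username: awssso-system-administrator/username: {{SessionName}}/'
}

# Read aws-auth, pipe it through the filter function named in $1, write it back.
rewrite_aws_auth() {
  kubectl -n kube-system get configmap aws-auth -o yaml \
    | "$1" \
    | kubectl replace -f -
}

# $1 = region, $2 = cluster name, $3 = node group name
run_workaround() {
  rewrite_aws_auth to_static                       # step 1
  sleep 300                                        # step 2 (duration is a guess)
  aws eks --region "$1" update-nodegroup-version \
    --cluster-name "$2" --nodegroup-name "$3"      # step 3
  aws eks --region "$1" wait nodegroup-active \
    --cluster-name "$2" --nodegroup-name "$3"      # step 4
  rewrite_aws_auth to_placeholder                  # step 5
}
```

Nothing runs until `run_workaround <clusterRegion> <clusterName> <nodeGroupName>` is called. The two filters are kept separate from the cluster calls so they can be tried on a saved copy of the ConfigMap first.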