Cannot update EKS NodeGroup because of aws-auth ConfigMap issues

3/23/2021

We're running several clusters with AWS's EKS.

Currently all the clusters are already on 1.19, but the NodeGroups are still running 1.18. The last update of the NodeGroups was in December, and everything worked fine back then. The aws-auth ConfigMap hasn't been modified since that time.

Now we want to update them. Whether we click Update in the Console or use the following command:

aws eks --region <clusterRegion> update-nodegroup-version --cluster-name=<clusterName> --nodegroup-name=<nodeGroupName>

...it fails with:

An error occurred (InvalidRequestException) when calling the UpdateNodegroupVersion operation: Nodegroup health has issues other than [ AsgInstanceLaunchFailures, InstanceLimitExceeded, InsufficientFreeAddresses, ClusterUnreachable ]

A look at the details of the nodegroup shows the following message:

AccessDenied: The aws-auth ConfigMap in your cluster is invalid.
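The same health detail can also be pulled via the CLI (a minimal sketch, using the same placeholders as above; the --query expression just filters the describe-nodegroup output down to the health issues):

aws eks --region <clusterRegion> describe-nodegroup --cluster-name=<clusterName> --nodegroup-name=<nodeGroupName> --query 'nodegroup.health.issues'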

The related ConfigMap (which works fine for all of us when accessing the cluster) has the following content (stripped of sensitive information):

mapRoles: |
    - groups:
      - system:bootstrappers
      - system:nodes
      rolearn: arn:aws:iam::<accountId>:role/<ourEksClusterNodeRole>
      username: system:node:{{EC2PrivateDNSName}}
    - groups:
      - system:masters
      rolearn: arn:aws:iam::<accountId>:role/AWSReservedSSO_SystemAdministrator_<someRandomString>
      username: {{SessionName}}
-- GreNodge
amazon-eks
amazon-web-services
authentication
kubernetes

1 Answer

3/26/2021

It turns out that the way proposed by the AWS documentation to integrate SSO users into the clusters is not compatible with the latest version of EKS.

The placeholder {{SessionName}} cannot be evaluated, so I had to change it like this:

mapRoles: |
    - groups:
      - system:bootstrappers
      - system:nodes
      rolearn: arn:aws:iam::<accountId>:role/<ourEksClusterNodeRole>
      username: system:node:{{EC2PrivateDNSName}}
    - groups:
      - system:masters
      rolearn: arn:aws:iam::<accountId>:role/AWSReservedSSO_SystemAdministrator_<someRandomString>
      username: awssso-system-administrator
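
One way to apply this change (a minimal sketch, assuming kubectl is pointed at the affected cluster) is to edit the ConfigMap in place:

kubectl -n kube-system edit configmap aws-auth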

The downside of this approach is that we lose audit information in the logs.

To get around this (although it is really weird), we now do the following (see the sketch below):

1. Adjust the aws-auth ConfigMap as shown above.
2. Wait a few minutes.
3. Trigger the AMI release version upgrade.
4. Wait until it is done.
5. Change the aws-auth ConfigMap back.
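
A rough shell sketch of that sequence, assuming kubectl and the AWS CLI are configured for the affected cluster (the backup file name is just illustrative):

# 1. Keep a copy of the current aws-auth ConfigMap for reference, then
#    switch the SSO role mapping to the static username as shown above.
kubectl -n kube-system get configmap aws-auth -o yaml > aws-auth-backup.yaml
kubectl -n kube-system edit configmap aws-auth

# 2. Give the change a few minutes to take effect.
sleep 300

# 3. Trigger the AMI release version upgrade of the node group.
aws eks --region <clusterRegion> update-nodegroup-version --cluster-name=<clusterName> --nodegroup-name=<nodeGroupName>

# 4. Wait until the node group reports ACTIVE again.
aws eks --region <clusterRegion> wait nodegroup-active --cluster-name <clusterName> --nodegroup-name <nodeGroupName>

# 5. Revert the username for the SSO role back to {{SessionName}}.
kubectl -n kube-system edit configmap aws-auth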

-- GreNodge
Source: StackOverflow