Upgrade Kubernetes Cluster using Terraform's provisioner

12/22/2018

Scenario:

I'm in a situation where I don't have the liberty to use any off-the-shelf Kubernetes upgrade tool such as kops or Kubespray, so I'm compelled to use Terraform to provision the instances, with Kubernetes installed as part of bootstrapping using provisioners. Now, the tricky part is that my K8s cluster is running version 1.11.6 and I want to upgrade it to 1.12.3.
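For context, the setup is roughly the shape sketched below: EC2 instances provisioned by Terraform, with Kubernetes installed at bootstrap time by a provisioner pinned to a version variable. This is only a minimal sketch in Terraform 0.11-style syntax; the package names, SSH details and the K8S_VERSIONS variable are assumptions, not the actual code.

# Minimal sketch only (Terraform 0.11-style); package names, connection
# details and the K8S_VERSIONS variable are assumptions, not the actual code.
resource "aws_instance" "ec2-master" {
  ami           = "ami-xxxxxx"
  instance_type = "m4.large"
  key_name      = "my-keypair"

  connection {
    type        = "ssh"
    user        = "ec2-user"
    private_key = "${file("~/.ssh/id_rsa")}"
  }

  # Kubernetes is installed at boot via a provisioner rather than by an
  # upgrade tool, so a version bump only affects newly created instances.
  provisioner "remote-exec" {
    inline = [
      "sudo yum install -y kubeadm-${var.K8S_VERSIONS["kubernetes"]} kubelet-${var.K8S_VERSIONS["kubernetes"]}",
    ]
  }
}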

What I did:

In the Terraform scripts, I doubled the node count and updated the K8s version, then ran the Terraform deployment. The new nodes (those created after doubling the count) came up successfully with the new version. After that, I terminated the instances running the old version of K8s, so now only the new nodes with the new K8s version remain. I then ran terraform refresh to sync the state file with the real resources that exist remotely on AWS.
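In Terraform terms, that first pass amounted to roughly the following edit (a sketch only, using the MASTER_COUNT variable shown in the update below; the doubled values are illustrative):

# Sketch of the first-pass edit: double the per-environment counts so the
# replacement nodes come up alongside the old ones (values illustrative).
variable "MASTER_COUNT" {
  type = "map"
  default = {
    dev  = "2" # was "1"
    prod = "6" # was "3"
  }
}

# ...and bump the kubernetes entry in the version map, so that only the
# newly created instances pick up the new release at bootstrap.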

Problem

To verify that the state file and the remote resources are in sync, I ran terraform plan, which shows some resources to be created. Basically, the plan says it's going to create the new nodes again, and I can't understand why.

Can someone please clarify what's going wrong here? Thanks in advance.

Update:

My K8s version map

# Variable name assumed here; the original snippet omitted the enclosing block.
variable "K8S_VERSIONS" {
  type = "map"
  default = {
    kubernetes               = "1.11.5"
    etcd                     = "3.3.1"
    dockerCE                 = "18.06.1.ce-3.el7"
    cfssl                    = "1.2"
    kube-dns                 = "1.14.10"
    core-dns                 = "1.2.0"
    helm                     = "2.9.1"
    calico-node              = "3.0.6"
    calico-cni               = "2.0.5"
    calico-kube-controller   = "2.0.4"
    nginx-ingress-controller = "0.19.0"
  }
}
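For reference, a map like this would typically be wired into the node bootstrap roughly as follows; this is only a guess at the plumbing, and the template file name and variable names are assumptions:

# Hypothetical wiring of the version map into the bootstrap step
# (Terraform 0.11-style); the template name and variable names are assumed.
data "template_file" "master_userdata" {
  template = "${file("${path.module}/userdata.sh.tpl")}"

  vars = {
    kubernetes_version = "${var.K8S_VERSIONS["kubernetes"]}"
    etcd_version       = "${var.K8S_VERSIONS["etcd"]}"
  }
}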

My node count (shown for the master below, but the same pattern applies to all node types such as etcd, CA, worker, etc.):

variable "MASTER_COUNT" {
  type = "map"
  default = {
    #bastion
    dev  = "1"
    prod = "3"
  }
}

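These per-environment maps then feed the count of each node resource, along these lines (a sketch; var.environment is an assumed variable holding "dev" or "prod"):

# Hypothetical count lookup (Terraform 0.11-style); var.environment is an
# assumed variable selecting "dev" or "prod".
resource "aws_instance" "ec2-master" {
  count = "${var.MASTER_COUNT[var.environment]}"

  # ... ami, instance_type, provisioners, etc.
}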
terraform plan still shows the resources below to be created. Basically, it tries to recreate the nodes with the older version of K8s, which shouldn't happen since I have already run terraform refresh, which should sync my local state with the remote.

Terraform will perform the following actions:

  + module.master.aws_instance.ec2-master[0]
      id:                                                <computed>
      ami:                                               "ami-######"
      arn:                                               <computed>
      associate_public_ip_address:                       <computed>
      availability_zone:                                 <computed>
      cpu_core_count:                                    <computed>
      cpu_threads_per_core:                              <computed>
      ebs_block_device.#:                                "2"

  + module.master.aws_instance.ec2-master[1]
      id:                                                <computed>
      ami:                                               "ami-#######"
      arn:                                               <computed>
      associate_public_ip_address:                       <computed>
      availability_zone:                                 <computed>
      cpu_core_count:                                    <computed>
      cpu_threads_per_core:                              <computed>
      ebs_block_device.#:                                "2"

  + module.master.aws_instance.ec2-master[2]
      id:                                                <computed>
      ami:                                               "ami-######"
      arn:                                               <computed>
      associate_public_ip_address:                       <computed>
      availability_zone:                                 <computed>
      cpu_core_count:                                    <computed>
      cpu_threads_per_core:                              <computed>
      ebs_block_device.#:                                "2"

  - module.master.aws_instance.ec2-master[3]

  - module.master.aws_instance.ec2-master[4]

  - module.master.aws_instance.ec2-master[5]

# plus some other changes, such as Auto Scaling group and load balancer re-creations

Plan: 10 to add, 1 to change, 16 to destroy.
-- jagatjyoti
amazon-web-services
kubernetes
terraform

1 Answer

1/2/2019

Finally, I was able to resolve this, which means a K8s minor-version upgrade was successful. The following steps were followed (a rough sketch of the scale-down and reconciliation steps appears after the list):

  • Deploy a K8s cluster running version 1.11.2
  • Double the node count, change the version to 1.11.5 and re-deploy
  • New nodes get created with the updated version
  • Remove the nodes running the old version, i.e. 1.11.2
  • Run terraform refresh to sync the state file with the real, running infrastructure
  • Change the node count back to 3, i.e. halve it
  • Run terraform plan and verify (multiple runs of refresh might be needed)
  • Run terraform apply to apply the changes
  • The state file should now be in sync with the remote
  • Run terraform plan again, which shouldn't show any resources to be created
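A rough sketch of what the scale-down and reconciliation steps look like in practice, assuming the MASTER_COUNT variable from the question (the values and command sequence are illustrative, not the exact code):

# Sketch of the scale-down edit once the old nodes are gone
# (Terraform 0.11-style; values illustrative).
variable "MASTER_COUNT" {
  type = "map"
  default = {
    dev  = "1" # back to the original size
    prod = "3"
  }
}

# Then, from the CLI:
#   terraform refresh   # sync the state file with what actually exists in AWS
#   terraform plan      # verify; more than one refresh run may be needed
#   terraform apply     # apply the remaining changes
#   terraform plan      # should now report nothing to create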

I will be trying a major-version upgrade shortly and will post the results here.

-- jagatjyoti
Source: StackOverflow