kubernetes cluster migration

7/2/2018

I currently have multiple AWS accounts, each with its own Kubernetes cluster. Unfortunately, when the clusters were initially deployed with kops, the VPCs were created with overlapping CIDR blocks. This normally wouldn't be a problem, as each cluster essentially existed in its own universe.

Things have changed a bit, and now we want to implement cross-account VPC peering. The idea is that users connect over the VPN and have access to all resources through said peering. My understanding is that the CIDR block overlap is going to be a major problem when peering is implemented.

It doesn't seem that one can just change the CIDR block of an existing cluster. Is my only option to back up and restore the cluster into a new VPC with something like Ark? Has anyone gone through a full cluster migration? I'd be curious if there is a better answer.

-- mootpt
amazon-web-services
kubernetes

1 Answer

7/2/2018

Your understanding is correct: with kops, you can't change the CIDR blocks of an existing cluster; it's stuck in the VPC in which it was created, and you can't change the CIDR block of a VPC:

The IP address range of a VPC is made up of the CIDR blocks associated with it. You select one CIDR block when you create the VPC, and you can add or remove secondary CIDR blocks later. The CIDR block that you add when you create the VPC cannot be changed, but you can add and remove secondary CIDR blocks to change the IP address range of the VPC. (emphasis mine)
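
To make that concrete (the VPC and association IDs here are hypothetical), the AWS CLI lets you add and remove secondary CIDR blocks, but there is no call to modify or replace the primary one, and peering still fails if any of the blocks overlap:

# A secondary CIDR block can be associated with an existing VPC...
aws ec2 associate-vpc-cidr-block --vpc-id vpc-abc123 --cidr-block 172.24.0.0/16

# ...and later removed again via its association ID...
aws ec2 disassociate-vpc-cidr-block --association-id vpc-cidr-assoc-0123456789abcdef0

# ...but there is no API to change the VPC's primary CIDR block.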

That leads us to the second point: migrating your cluster. This can be broken down into two phases:

  1. Migrating the infrastructure managed by kops
  2. Migrating the workloads on the cluster

1. Migrating the infrastructure managed by kops

You will need to migrate (i.e. recreate) the kops cluster itself: the EC2 instances, the kops InstanceGroup and Cluster objects, the various AWS infrastructure, etc. For that, you can use the kops toolbox template command:

kops toolbox template --values /path/to/values.yaml --template /path/to/cluster/template.yaml > /path/to/output/cluster.yaml
kops create -f /path/to/output/cluster.yaml

This is a Helm-like tool that lets you templatize your kops cluster configuration and pass in different values.yaml files. You might want to wrap this command in a small shell script or Makefile for one-command cluster deployments, so your k8s cluster infrastructure can be stood up easily and repeatably; a rough sketch follows.
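
For example, a minimal wrapper might look something like the following. The file names and exported values are placeholders; CLUSTER_SUBDOMAIN and SUBNET_CIDR are exported because the template below reads them from the environment, and KOPS_STATE_STORE matches the configBase bucket used in the template:

#!/usr/bin/env bash
# create-cluster.sh -- render the kops template and register the cluster spec
set -euo pipefail

# Values the template pulls in via the "env" template function (placeholders)
export CLUSTER_SUBDOMAIN="staging"
export SUBNET_CIDR="172.23.1.0/24"

# Where kops keeps its state; matches the configBase bucket in the template
export KOPS_STATE_STORE="s3://my-bucket.example.io"

kops toolbox template \
  --values values.yaml \
  --template template.yaml \
  > cluster.yaml

kops create -f cluster.yaml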

Sample template.yaml and values.yaml files might look like the following, which include the spec for the Cluster plus the master, worker, and autoscaling InstanceGroups.

# template.yaml
{{ $clusterSubdomain := (env "CLUSTER_SUBDOMAIN") }}
{{ $subnetCidr := (env "SUBNET_CIDR") }}

apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  name: {{ $clusterSubdomain }}.k8s.example.io
spec:
  hooks:
  - manifest: |
      [Unit]
      Description=Create example user
      ConditionPathExists=!/home/example/.ssh/authorized_keys

      [Service]
      Type=oneshot
      ExecStart=/bin/sh -c 'useradd example && echo "{{ .examplePublicKey }}" > /home/example/.ssh/authorized_keys'
    name: useradd-example.service
    roles:
    - Node
    - Master
  - manifest: |
      Type=oneshot
      ExecStart=/usr/bin/coreos-cloudinit --from-file=/home/core/cloud-config.yaml
    name: reboot-window.service
    roles:
    - Node
    - Master
  kubeAPIServer:
    authorizationRbacSuperUser: admin
    featureGates:
      TaintBasedEvictions: "true"
  kubeControllerManager:
    featureGates:
      TaintBasedEvictions: "true"
    horizontalPodAutoscalerUseRestClients: false
  kubeScheduler:
    featureGates:
      TaintBasedEvictions: "true"
  kubelet:
    featureGates:
      TaintBasedEvictions: "true"
  fileAssets:
  - content: |
      yes
    name: docker-1.12
    path: /etc/coreos/docker-1.12
    roles:
    - Node
    - Master
  - content: |
      #cloud-config
      coreos:
        update:
          reboot-strategy: "etcd-lock"
        locksmith:
          window-start: {{ .locksmith.windowStart }}
          window-length: {{ .locksmith.windowLength }}
    name: cloud-config.yaml
    path: /home/core/cloud-config.yaml
    roles:
    - Node
    - Master
  api:
    dns: {}
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://my-bucket.example.io/{{ $clusterSubdomain }}.k8s.example.io
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-{{ .zone }}
      name: a
    name: main
  - etcdMembers:
    - instanceGroup: master-{{ .zone }}
      name: a
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubernetesApiAccess:
  - {{ .apiAccessCidr }}
  kubernetesVersion: {{ .k8sVersion }}
  masterPublicName: api.{{ $clusterSubdomain }}.k8s.example.io
  networkCIDR: {{ .vpcCidr }}
  networkID: {{ .vpcId }}
  networking:
    canal: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - {{ .sshAccessCidr }}
  subnets:
  - cidr: {{ $subnetCidr }}
    name: {{ .zone }}
    type: Public
    zone: {{ .zone }}
  topology:
    dns:
      type: Public
    masters: public
    nodes: public
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: {{ $clusterSubdomain }}.k8s.example.io
  name: master-{{ .zone }}
spec:
{{- if .additionalSecurityGroups }}
  additionalSecurityGroups:
{{- range .additionalSecurityGroups }}
  - {{ . }}
{{- end }}
{{- end }}
  image: {{ .image }}
  machineType: {{ .awsMachineTypeMaster }}
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-{{ .zone }}
  role: Master
  subnets:
  - {{ .zone }}
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: {{ $clusterSubdomain }}.k8s.example.io
  name: nodes
spec:
{{- if .additionalSecurityGroups }}
  additionalSecurityGroups:
{{- range .additionalSecurityGroups }}
  - {{ . }}
{{- end }}
{{- end }}
  image: {{ .image }}
  machineType: {{ .awsMachineTypeNode }}
  maxSize: {{ .nodeCount }}
  minSize: {{ .nodeCount }}
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  subnets:
  - {{ .zone }}
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  name: ag.{{ $clusterSubdomain }}.k8s.example.io
  labels:
    kops.k8s.io/cluster: {{ $clusterSubdomain }}.k8s.example.io
spec:
{{- if .additionalSecurityGroups }}
  additionalSecurityGroups:
{{- range .additionalSecurityGroups }}
  - {{ . }}
{{- end }}
{{- end }}
  image: {{ .image }}
  machineType: {{ .awsMachineTypeAg }}
  maxSize: 10
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: ag.{{ $clusterSubdomain }}.k8s.example.io
  role: Node
  subnets:
  - {{ .zone }}

And the values.yaml file:

# values.yaml:

region: us-west-2 
zone: us-west-2a  
environment: staging 
image: ami-abc123
awsMachineTypeNode: c5.large
awsMachineTypeMaster: m5.xlarge
awsMachineTypeAg: c5.large
nodeCount: "2"
k8sVersion: "1.9.3"
vpcId: vpc-abc123
vpcCidr: 172.23.0.0/16
apiAccessCidr: <e.g. office ip> 
sshAccessCidr: <e.g. office ip>
additionalSecurityGroups:
- sg-def234 # kubernetes-standard
- sg-abc123 # example scan engine targets
examplePublicKey: "ssh-rsa ..."
locksmith:
  windowStart: Mon 16:00 # 8am Monday PST
  windowLength: 4h
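
Note that kops create -f only registers the cluster spec in the state store. Assuming the standard kops workflow (the cluster name and key path below are placeholders), you would then push an SSH public key, apply the changes, and validate:

# Add the SSH public key the instances should trust
kops create secret --name staging.k8s.example.io sshpublickey admin -i ~/.ssh/id_rsa.pub

# Create/update the actual AWS resources for the cluster
kops update cluster staging.k8s.example.io --yes

# Wait until the masters and nodes report as healthy
kops validate cluster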

2. Migrating the workloads on the cluster

I don't have any hands-on experience with Ark, but it does seem to fit your use case well:

Cluster migration

Using Backups and Restores

Heptio Ark can help you port your resources from one cluster to another, as long as you point each Ark Config to the same cloud object storage. In this scenario, we are also assuming that your clusters are hosted by the same cloud provider. Note that Heptio Ark does not support the migration of persistent volumes across cloud providers.

  1. (Cluster 1) Assuming you haven't already been checkpointing your data with the Ark schedule operation, you need to first back up your entire cluster (replacing <BACKUP-NAME> as desired):

     ark backup create <BACKUP-NAME>

     The default TTL is 30 days (720 hours); you can use the --ttl flag to change this as necessary.

  2. (Cluster 2) Make sure that the persistentVolumeProvider and backupStorageProvider fields in the Ark Config match the ones from Cluster 1, so that your new Ark server instance is pointing to the same bucket.

  3. (Cluster 2) Make sure that the Ark Backup object has been created. Ark resources are synced with the backup files available in cloud storage.

  4. (Cluster 2) Once you have confirmed that the right Backup (<BACKUP-NAME>) is now present, you can restore everything with:

     ark restore create --from-backup <BACKUP-NAME>

Configuring Ark on AWS clusters seems straightforward enough: https://github.com/heptio/ark/blob/master/docs/aws-config.md.
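
I haven't run this myself, but going by the persistentVolumeProvider and backupStorageProvider fields quoted above and the examples in that repo, the Ark Config on Cluster 2 might look roughly like the following; the bucket name and region are placeholders, and the bucket must be the same one Cluster 1 writes its backups to:

apiVersion: ark.heptio.com/v1
kind: Config
metadata:
  namespace: heptio-ark
  name: default
persistentVolumeProvider:
  name: aws
  config:
    region: us-west-2
backupStorageProvider:
  name: aws
  # Must point at the same bucket as Cluster 1's Ark Config
  bucket: my-ark-backups
  config:
    region: us-west-2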

With some initial setup of the kops toolbox template and the Ark configuration, you should have a clean, repeatable way to migrate your cluster and turn your pets into cattle, as the meme goes.

-- erstaples
Source: StackOverflow