Why is my GKE cluster upscaling when I create a namespace?

5/7/2020

I'm observing strange behaviour in a newly created GKE cluster. Just after creation, there is one node. When I create my first namespace, the cluster autoscales up to 2 nodes, even though resource usage on the first node is still very low. What could be causing this, and how can I prevent it? I've created my cluster with the following definition (using the Python API):

            cluster={
                "name": "mycluster",
                "initial_cluster_version": "latest",
                "network_policy": {
                    "enabled": True,
                    "provider": "PROVIDER_UNSPECIFIED"
                },
                "node_pools": [
                    {
                        "name": "default",
                        "autoscaling": {
                            "enabled": True,
                            "max_node_count": 5,
                            "min_node_count": 1
                        },
                        "config": {
                            "image_type": "UBUNTU",
                            "machine_type": "n1-standard-4",
                            "oauth_scopes": [
                                # Allows pulling images from GCR
                                "https://www.googleapis.com/auth/devstorage.read_only",

                                # Needed for monitoring
                                "https://www.googleapis.com/auth/logging.write",
                                "https://www.googleapis.com/auth/monitoring"
                            ]
                        },
                        "initial_node_count": 1
                    }
                ]
            },
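For context, the definition above is passed as the cluster argument of a create_cluster call, roughly like this (a minimal sketch assuming the google-cloud-container client, which matches the snake_case field names; the project ID below is a placeholder, and the cluster dict is the one shown above):

    # Minimal sketch, assuming the google-cloud-container client.
    from google.cloud import container_v1

    client = container_v1.ClusterManagerClient()

    operation = client.create_cluster(
        # parent is "projects/<PROJECT_ID>/locations/<ZONE_OR_REGION>"
        parent="projects/my-project/locations/europe-west1-d",
        # the cluster definition shown above, as a plain dict; the client
        # converts it into the Cluster protobuf message
        cluster=cluster,
    )
    print(operation.name, operation.status)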

The event logs are the following:

[
 {
   "insertId": "7be8fc3c-b770-4013-9500-09be89e39935@a1",
   "jsonPayload": {
     "status": {
       "measureTime": "1589026811",
       "autoscaledNodesCount": 1,
       "autoscaledNodesTarget": 1
     }
   },
   "resource": {
     "type": "k8s_cluster",
     "labels": {
       "project_id": "arlas-cloud-sandbox",
       "location": "europe-west1-d",
       "cluster_name": "cluster-test"
     }
   },
   "timestamp": "2020-05-09T12:20:12.229243264Z",
   "logName": "projects/arlas-cloud-sandbox/logs/container.googleapis.com%2Fcluster-autoscaler-visibility",
   "receiveTimestamp": "2020-05-09T12:20:13.028616473Z"
 },
 {
   "insertId": "9a09bd12-a44b-4360-a3c6-1072d7f9b098@a1",
   "jsonPayload": {
     "decision": {
       "scaleUp": {
         "increasedMigs": [
           {
             "requestedNodes": 1,
             "mig": {
               "zone": "europe-west1-d",
               "name": "gke-cluster-test-default-pool-e317f053-grp",
               "nodepool": "default-pool"
             }
           }
         ],
         "triggeringPodsTotalCount": 1,
         "triggeringPods": [
           {
             "controller": {
               "name": "calico-typha-8dd55d66c",
               "kind": "ReplicaSet",
               "apiVersion": "apps/v1"
             },
             "name": "calico-typha-8dd55d66c-gvwsx"
           }
         ]
       },
       "decideTime": "1589026971",
       "eventId": "15bc10aa-a0d9-4e3f-a0f9-bf3d16bd13e5"
     }
   },
   "resource": {
     "type": "k8s_cluster",
     "labels": {
       "project_id": "arlas-cloud-sandbox",
       "location": "europe-west1-d",
       "cluster_name": "cluster-test"
     }
   },
   "timestamp": "2020-05-09T12:22:51.653877878Z",
   "logName": "projects/arlas-cloud-sandbox/logs/container.googleapis.com%2Fcluster-autoscaler-visibility",
   "receiveTimestamp": "2020-05-09T12:22:52.331187267Z"
 },
 {
   "insertId": "221df494-6573-4eec-8524-a9c28e984b93@a1",
   "jsonPayload": {
     "status": {
       "autoscaledNodesTarget": 2,
       "measureTime": "1589026968",
       "autoscaledNodesCount": 1
     }
   },
   "resource": {
     "type": "k8s_cluster",
     "labels": {
       "project_id": "arlas-cloud-sandbox",
       "location": "europe-west1-d",
       "cluster_name": "cluster-test"
     }
   },
   "timestamp": "2020-05-09T12:22:51.682941907Z",
   "logName": "projects/arlas-cloud-sandbox/logs/container.googleapis.com%2Fcluster-autoscaler-visibility",
   "receiveTimestamp": "2020-05-09T12:22:52.140573874Z"
 },
 {
   "insertId": "867c38f2-caf2-47f8-a904-baed5fa79418@a1",
   "jsonPayload": {
     "status": {
       "measureTime": "1589027087",
       "autoscaledNodesCount": 2,
       "autoscaledNodesTarget": 2
     }
   },
   "resource": {
     "type": "k8s_cluster",
     "labels": {
       "cluster_name": "cluster-test",
       "project_id": "arlas-cloud-sandbox",
       "location": "europe-west1-d"
     }
   },
   "timestamp": "2020-05-09T12:24:48.480433786Z",
   "logName": "projects/arlas-cloud-sandbox/logs/container.googleapis.com%2Fcluster-autoscaler-visibility",
   "receiveTimestamp": "2020-05-09T12:24:49.510037788Z"
 },
 {
   "insertId": "4e134516-d91b-47eb-8df7-d60d7be2fcca@a1",
   "jsonPayload": {
     "resultInfo": {
       "measureTime": "1589027087",
       "results": [
         {
           "eventId": "15bc10aa-a0d9-4e3f-a0f9-bf3d16bd13e5"
         }
       ]
     }
   },
   "resource": {
     "type": "k8s_cluster",
     "labels": {
       "project_id": "arlas-cloud-sandbox",
       "location": "europe-west1-d",
       "cluster_name": "cluster-test"
     }
   },
   "timestamp": "2020-05-09T12:24:48.514851831Z",
   "logName": "projects/arlas-cloud-sandbox/logs/container.googleapis.com%2Fcluster-autoscaler-visibility",
   "receiveTimestamp": "2020-05-09T12:24:49.545221929Z"
 },
 {
   "insertId": "5674038e-8931-4530-b6b2-9854e6731573@a1",
   "jsonPayload": {
     "status": {
       "measureTime": "1589027277",
       "autoscaledNodesCount": 2,
       "autoscaledNodesTarget": 2
     }
   },
   "resource": {
     "type": "k8s_cluster",
     "labels": {
       "cluster_name": "cluster-test",
       "project_id": "arlas-cloud-sandbox",
       "location": "europe-west1-d"
     }
   },
   "timestamp": "2020-05-09T12:27:58.294272079Z",
   "logName": "projects/arlas-cloud-sandbox/logs/container.googleapis.com%2Fcluster-autoscaler-visibility",
   "receiveTimestamp": "2020-05-09T12:27:58.638736825Z"
 },
 {
   "insertId": "6d26df4f-039d-4c18-8fc0-441891b30e4b@a1",
   "jsonPayload": {
     "status": {
       "autoscaledNodesTarget": 2,
       "measureTime": "1589027467",
       "autoscaledNodesCount": 2
     }
   },
   "resource": {
     "type": "k8s_cluster",
     "labels": {
       "project_id": "arlas-cloud-sandbox",
       "location": "europe-west1-d",
       "cluster_name": "cluster-test"
     }
   },
   "timestamp": "2020-05-09T12:31:08.443814963Z",
   "logName": "projects/arlas-cloud-sandbox/logs/container.googleapis.com%2Fcluster-autoscaler-visibility",
   "receiveTimestamp": "2020-05-09T12:31:08.951331569Z"
 }
]

Looks like calico-typha is the culprit, but what's going on?

-- Alain B.
google-kubernetes-engine
project-calico

1 Answer

5/19/2020

TL;DR

Your cluster is not scaling up because you created a namespace.

Here is the reason:

Limitations and requirements

Your cluster must have at least 2 nodes of type n1-standard-1 or higher. The recommended minimum size cluster to run network policy enforcement is 3 n1-standard-1 instances.

Cloud.google.com: Kubernetes Engine: Network Policy: Limitations and requirements

The fact that you created your GKE cluster with an initial node count of 1 caused the calico-typha-XXX pod to request a scale-up of the cluster to the minimum of 2 nodes.
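If you would rather not see that scale-up right after creation, you could start the node pool at the documented minimum yourself. A minimal sketch of the relevant part of the node pool definition from the question (only min_node_count and initial_node_count change; "config" stays as it was):

    "node_pools": [
        {
            "name": "default",
            "autoscaling": {
                "enabled": True,
                "max_node_count": 5,
                # keep the autoscaler floor at the 2 nodes required
                # for network policy enforcement
                "min_node_count": 2
            },
            # "config": { ... } unchanged from the question
            # start at 2 nodes so calico-typha does not have to
            # trigger a scale-up right after cluster creation
            "initial_node_count": 2
        }
    ]

Keeping min_node_count at 2 is optional (calico-typha would block a scale-down below 2 anyway), but it makes the requirement explicit.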


Assume the following:

  • GKE cluster with release channel of Regular
  • Autoscaling enabled with:
    • initial node count: 1 node
    • minimum: 1 node
    • maximum: 3 nodes
  • Nodes with machine type: n1-standard-1 or higher
  • Network Policy enabled.

When you create a cluster with the above settings, you will get a cluster with 1 node. This changes as soon as calico-typha-XXX-XXX detects that the number of nodes is less than 2 and sends a request to scale up.

You can get more details about this by issuing the following commands:

  • $ kubectl get pods -A
  • $ kubectl describe pod -n kube-system calico-typha-XXX-XXX

You should see an event similar to this in the output:

  Normal   TriggeredScaleUp  18m   cluster-autoscaler                              pod triggered scale-up: [{https://content.googleapis.com/compute/v1/projects/REDACTED/zones/europe-west3-c/instanceGroups/gke-ubuntu-grp 1->2 (max: 3)}]

You can also look in the Kubernetes events log:

  • $ kubectl get events -A

Please bear in mind that the -A parameter includes all namespaces, which is what surfaces valuable entries like this one:

kube-system   3m6s        Normal    TriggeredScaleUp          pod/calico-typha-6b8d44c954-7s9zx                                pod triggered scale-up: [{https://content.googleapis.com/compute/v1/projects/REDACTED/zones/europe-west3-c/instanceGroups/gke-ubuntu-grp 1->2 (max: 3)}]

For more details, please take a look at the GKE Network Policy documentation linked above.

-- Dawid Kruk
Source: StackOverflow