I'm observing a strange behaviour in a newly created GKE cluster. Just after creating it, there is one node. When I create my first namespace, it autoscales up to 2 nodes, although resource usage on the first node is still very low. What could be the cause of that, and is there a way to prevent it? I've created my cluster with the following definition (using the Python API):
cluster={
    "name": "mycluster",
    "initial_cluster_version": "latest",
    "network_policy": {
        "enabled": True,
        "provider": "PROVIDER_UNSPECIFIED"
    },
    "node_pools": [
        {
            "name": "default",
            "autoscaling": {
                "enabled": True,
                "max_node_count": 5,
                "min_node_count": 1
            },
            "config": {
                "image_type": "UBUNTU",
                "machine_type": "n1-standard-4",
                "oauth_scopes": [
                    # Allows pulling images from GCR
                    "https://www.googleapis.com/auth/devstorage.read_only",
                    # Needed for monitoring
                    "https://www.googleapis.com/auth/logging.write",
                    "https://www.googleapis.com/auth/monitoring"
                ]
            },
            "initial_node_count": 1
        }
    ]
},
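For completeness, the definition is passed to the cluster manager client roughly like this (a trimmed sketch rather than my exact code; the project ID is a placeholder and the dict is shortened to its name field):

# Sketch only: placeholder project ID, definition trimmed for brevity.
from google.cloud import container_v1

client = container_v1.ClusterManagerClient()

cluster = {
    "name": "mycluster",
    # ... plus the network_policy / node_pools fields shown above
}

# create_cluster returns a long-running Operation describing the provisioning.
operation = client.create_cluster(
    project_id="my-gcp-project",  # placeholder
    zone="europe-west1-d",
    cluster=cluster,
)
print(operation.status)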
The event logs are the following:
[
  {
    "insertId": "7be8fc3c-b770-4013-9500-09be89e39935@a1",
    "jsonPayload": {
      "status": {
        "measureTime": "1589026811",
        "autoscaledNodesCount": 1,
        "autoscaledNodesTarget": 1
      }
    },
    "resource": {
      "type": "k8s_cluster",
      "labels": {
        "project_id": "arlas-cloud-sandbox",
        "location": "europe-west1-d",
        "cluster_name": "cluster-test"
      }
    },
    "timestamp": "2020-05-09T12:20:12.229243264Z",
    "logName": "projects/arlas-cloud-sandbox/logs/container.googleapis.com%2Fcluster-autoscaler-visibility",
    "receiveTimestamp": "2020-05-09T12:20:13.028616473Z"
  },
  {
    "insertId": "9a09bd12-a44b-4360-a3c6-1072d7f9b098@a1",
    "jsonPayload": {
      "decision": {
        "scaleUp": {
          "increasedMigs": [
            {
              "requestedNodes": 1,
              "mig": {
                "zone": "europe-west1-d",
                "name": "gke-cluster-test-default-pool-e317f053-grp",
                "nodepool": "default-pool"
              }
            }
          ],
          "triggeringPodsTotalCount": 1,
          "triggeringPods": [
            {
              "controller": {
                "name": "calico-typha-8dd55d66c",
                "kind": "ReplicaSet",
                "apiVersion": "apps/v1"
              },
              "name": "calico-typha-8dd55d66c-gvwsx"
            }
          ]
        },
        "decideTime": "1589026971",
        "eventId": "15bc10aa-a0d9-4e3f-a0f9-bf3d16bd13e5"
      }
    },
    "resource": {
      "type": "k8s_cluster",
      "labels": {
        "project_id": "arlas-cloud-sandbox",
        "location": "europe-west1-d",
        "cluster_name": "cluster-test"
      }
    },
    "timestamp": "2020-05-09T12:22:51.653877878Z",
    "logName": "projects/arlas-cloud-sandbox/logs/container.googleapis.com%2Fcluster-autoscaler-visibility",
    "receiveTimestamp": "2020-05-09T12:22:52.331187267Z"
  },
  {
    "insertId": "221df494-6573-4eec-8524-a9c28e984b93@a1",
    "jsonPayload": {
      "status": {
        "autoscaledNodesTarget": 2,
        "measureTime": "1589026968",
        "autoscaledNodesCount": 1
      }
    },
    "resource": {
      "type": "k8s_cluster",
      "labels": {
        "project_id": "arlas-cloud-sandbox",
        "location": "europe-west1-d",
        "cluster_name": "cluster-test"
      }
    },
    "timestamp": "2020-05-09T12:22:51.682941907Z",
    "logName": "projects/arlas-cloud-sandbox/logs/container.googleapis.com%2Fcluster-autoscaler-visibility",
    "receiveTimestamp": "2020-05-09T12:22:52.140573874Z"
  },
  {
    "insertId": "867c38f2-caf2-47f8-a904-baed5fa79418@a1",
    "jsonPayload": {
      "status": {
        "measureTime": "1589027087",
        "autoscaledNodesCount": 2,
        "autoscaledNodesTarget": 2
      }
    },
    "resource": {
      "type": "k8s_cluster",
      "labels": {
        "cluster_name": "cluster-test",
        "project_id": "arlas-cloud-sandbox",
        "location": "europe-west1-d"
      }
    },
    "timestamp": "2020-05-09T12:24:48.480433786Z",
    "logName": "projects/arlas-cloud-sandbox/logs/container.googleapis.com%2Fcluster-autoscaler-visibility",
    "receiveTimestamp": "2020-05-09T12:24:49.510037788Z"
  },
  {
    "insertId": "4e134516-d91b-47eb-8df7-d60d7be2fcca@a1",
    "jsonPayload": {
      "resultInfo": {
        "measureTime": "1589027087",
        "results": [
          {
            "eventId": "15bc10aa-a0d9-4e3f-a0f9-bf3d16bd13e5"
          }
        ]
      }
    },
    "resource": {
      "type": "k8s_cluster",
      "labels": {
        "project_id": "arlas-cloud-sandbox",
        "location": "europe-west1-d",
        "cluster_name": "cluster-test"
      }
    },
    "timestamp": "2020-05-09T12:24:48.514851831Z",
    "logName": "projects/arlas-cloud-sandbox/logs/container.googleapis.com%2Fcluster-autoscaler-visibility",
    "receiveTimestamp": "2020-05-09T12:24:49.545221929Z"
  },
  {
    "insertId": "5674038e-8931-4530-b6b2-9854e6731573@a1",
    "jsonPayload": {
      "status": {
        "measureTime": "1589027277",
        "autoscaledNodesCount": 2,
        "autoscaledNodesTarget": 2
      }
    },
    "resource": {
      "type": "k8s_cluster",
      "labels": {
        "cluster_name": "cluster-test",
        "project_id": "arlas-cloud-sandbox",
        "location": "europe-west1-d"
      }
    },
    "timestamp": "2020-05-09T12:27:58.294272079Z",
    "logName": "projects/arlas-cloud-sandbox/logs/container.googleapis.com%2Fcluster-autoscaler-visibility",
    "receiveTimestamp": "2020-05-09T12:27:58.638736825Z"
  },
  {
    "insertId": "6d26df4f-039d-4c18-8fc0-441891b30e4b@a1",
    "jsonPayload": {
      "status": {
        "autoscaledNodesTarget": 2,
        "measureTime": "1589027467",
        "autoscaledNodesCount": 2
      }
    },
    "resource": {
      "type": "k8s_cluster",
      "labels": {
        "project_id": "arlas-cloud-sandbox",
        "location": "europe-west1-d",
        "cluster_name": "cluster-test"
      }
    },
    "timestamp": "2020-05-09T12:31:08.443814963Z",
    "logName": "projects/arlas-cloud-sandbox/logs/container.googleapis.com%2Fcluster-autoscaler-visibility",
    "receiveTimestamp": "2020-05-09T12:31:08.951331569Z"
  }
]
Looks like calico-typha is the culprit, but what's going on?
TL;DR
Your cluster is not scaling up because you created a namespace. Here is the actual reason:
Limitations and requirements
Your cluster must have at least 2 nodes of type n1-standard-1 or higher. The recommended minimum size cluster to run network policy enforcement is 3 n1-standard-1 instances.
Cloud.google.com: Kubernetes Engine: Network Policy: Limitations and requirements
The fact that you created your GKE cluster with an initial node count of 1 caused calico-typha-XXX to send a request to scale up the cluster to the minimum of 2 nodes.
Assume the following:

- a GKE cluster on the Regular release channel
- machine type n1-standard-1 or higher

When you create a cluster meeting the above requirements with network policy enabled, you will get a cluster with 1 node. This changes as soon as calico-typha-XXX-XXX detects that the number of nodes is less than 2 and sends a request to scale up.
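If you want to prevent the scale-up right after creation, one option (a sketch based on the 2-node requirement quoted above, not the only approach) is to start the node pool at 2 nodes and keep the autoscaler minimum at 2, for example:

# Sketch: start the pool at the documented 2-node minimum for network policy
# enforcement, so there is nothing for the autoscaler to add right after creation.
cluster = {
    "name": "mycluster",
    "initial_cluster_version": "latest",
    "network_policy": {
        "enabled": True,
        "provider": "PROVIDER_UNSPECIFIED"
    },
    "node_pools": [
        {
            "name": "default",
            "autoscaling": {
                "enabled": True,
                "max_node_count": 5,
                "min_node_count": 2  # was 1
            },
            "config": {
                "image_type": "UBUNTU",
                "machine_type": "n1-standard-4",
                "oauth_scopes": [
                    "https://www.googleapis.com/auth/devstorage.read_only",
                    "https://www.googleapis.com/auth/logging.write",
                    "https://www.googleapis.com/auth/monitoring"
                ]
            },
            "initial_node_count": 2  # was 1
        }
    ]
}

Alternatively, if you do not actually need network policy enforcement, leaving network_policy disabled should avoid scheduling calico-typha in the first place.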
You can get more detailed logs about this by issuing the following commands:
$ kubectl get pods -A
$ kubectl describe pod -n kube-system calico-typha-XXX-XXX
You should see a part of the output similar to this:
Normal TriggeredScaleUp 18m cluster-autoscaler pod triggered scale-up: [{https://content.googleapis.com/compute/v1/projects/REDACTED/zones/europe-west3-c/instanceGroups/gke-ubuntu-grp 1->2 (max: 3)}]
You can also look at the Kubernetes events:
$ kubectl get events -A
Keep in mind that the -A parameter makes the command list events from all namespaces, which surfaces valuable information like:
kube-system 3m6s Normal TriggeredScaleUp pod/calico-typha-6b8d44c954-7s9zx pod triggered scale-up: [{https://content.googleapis.com/compute/v1/projects/REDACTED/zones/europe-west3-c/instanceGroups/gke-ubuntu-grp 1->2 (max: 3)}]
Please also take a look at the Network Policy documentation referenced above.