Reduce Cloud Run on GKE costs

8/5/2019

It would be great if I could get answers to the following questions on Google Cloud Run:

  1. If I create a cluster with more than 1 vCPU, will the extra vCPUs be utilized by my Cloud Run service, or is it always capped at 1 vCPU irrespective of my cluster configuration? This line in the docs has me confused: "Cloud Run allocates 1 vCPU per container instance, and this cannot be changed." I know this holds for managed Cloud Run, but does it also hold for Cloud Run on GKE?
  2. If the resources specified for the cluster actually get utilized (say, I create a node pool of 2 n1-standard-4 nodes with 15 GB of memory each), then why am I asked to choose a memory size again when creating/deploying to Cloud Run on GKE? What is the significance of the "Memory allocated" dropdown?
  3. If Cloud Run autoscales from 0 to N according to traffic, why can't I set the number of nodes in my cluster to 0 (I tried and started seeing error messages about unscheduled pods)?
  4. I followed the docs on custom domain mapping and set it up. Can I restrict which requests get handled by a container instance, based on the domain name or IP they come from (even if the domain is only set artificially by specifying a Host header, as in the Run docs: curl -v -H "Host: hello.default.example.com" YOUR-IP)?

So that I don't incur charges if I get HTTP requests from anywhere but my verified domain?

Any help will be very much appreciated. Thank you.

-- Pranay Shah
google-cloud-platform
google-cloud-run
google-kubernetes-engine

1 Answer

8/5/2019

1: The managed Cloud Run platform always allows 1 vCPU per revision. On GKE that is also the default, but on GKE only, you can override it with the --cpu parameter: https://cloud.google.com/sdk/gcloud/reference/beta/run/deploy#--cpu

2: Can you clarify exactly what you are asked for, and during which operation?

3: Cloud Run is built on top of Kubernetes thanks to Knative. Cloud Run is in charge of scaling pods up and down based on traffic, while Kubernetes is in charge of scaling pods and nodes based on CPU and memory usage. The two mechanisms aren't the same. Moreover, node scaling is "slow" and can't keep up with spiky traffic. Finally, something has to run on your cluster to listen for incoming requests and to serve and scale your pods correctly. That component needs a cluster with at least one node.
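
To make the difference between the two scaling layers concrete, here is a rough sketch of the arithmetic. It is a simplified model, not the actual Knative or GKE autoscaler logic: it assumes pods are sized from in-flight requests against a per-pod concurrency target, and that nodes are sized from the pod count with a floor of one node. The function names and all numbers (80 requests per pod, 4 pods per node) are made up for illustration.

```python
import math


def desired_pods(in_flight_requests: int, concurrency_per_pod: int) -> int:
    """Pods needed so each pod stays at or below its concurrency target."""
    if concurrency_per_pod <= 0:
        raise ValueError("concurrency_per_pod must be positive")
    return math.ceil(in_flight_requests / concurrency_per_pod)


def desired_nodes(pods: int, pods_per_node: int, min_nodes: int = 1) -> int:
    """Nodes needed to fit the pods, never fewer than min_nodes.

    min_nodes defaults to 1 (an assumption of this sketch) because the
    components that receive requests and drive scaling must themselves
    run somewhere on the cluster.
    """
    if pods_per_node <= 0:
        raise ValueError("pods_per_node must be positive")
    return max(min_nodes, math.ceil(pods / pods_per_node))


if __name__ == "__main__":
    # Hypothetical traffic levels, expressed as in-flight requests.
    for traffic in (0, 1, 80, 81, 500):
        pods = desired_pods(traffic, concurrency_per_pod=80)
        nodes = desired_nodes(pods, pods_per_node=4)
        print(f"in-flight={traffic:>3}  pods={pods}  nodes={nodes}")
```

With no traffic the pod count in this model drops to zero, but the node count stays at one, which is the situation described above.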

4: Cloud Run doesn't let you configure this, and I think Knative can't either. But you can deploy an ESP (Extensible Service Proxy) in front to route requests to a specific Cloud Run service. That way you split the traffic beforehand and direct it to different services, which then scale independently. Each service can have its own max scale and concurrency parameters. ESP can also implement rate limiting.
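
As a rough illustration of why per-service limits help bound cost: the max scale and concurrency settings together put a ceiling on how many requests a service can be handling at once. The sketch below only does that multiplication. The service names and values are hypothetical, and `ServiceLimits` is an illustrative helper, not part of any Cloud Run or Knative API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLimits:
    """Illustrative per-service scaling limits (not a real API object)."""

    name: str
    max_scale: int    # maximum number of container instances
    concurrency: int  # maximum concurrent requests per instance

    def max_in_flight(self) -> int:
        """Upper bound on requests the service can handle at once."""
        return self.max_scale * self.concurrency


# Hypothetical split: generous limits for the verified domain,
# tight limits for traffic arriving from anywhere else.
services = [
    ServiceLimits(name="trusted-domain", max_scale=10, concurrency=80),
    ServiceLimits(name="everything-else", max_scale=1, concurrency=10),
]

if __name__ == "__main__":
    for svc in services:
        print(f"{svc.name}: at most {svc.max_in_flight()} concurrent requests")
```

Traffic beyond a service's ceiling does not trigger more instances, so the tightly limited service cannot scale past its cap no matter how many requests arrive.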

-- guillaume blaquiere
Source: StackOverflow