How many Spark Executor Pods should you run per Kubernetes Node?

5/28/2019

Spark needs lots of resources to do its job. Kubernetes is a great environment for resource management. How many Spark pods should you run per node to get the best resource utilization?

I am trying to run a Spark cluster on a Kubernetes cluster.

-- hnajafi
apache-spark
kubernetes

1 Answer

6/25/2019

It depends on many factors. We need to know how many resources you have and how much is being consumed by the pods. To find that out, you need to set up a Metrics Server.

Metrics Server is a cluster-wide aggregator of resource usage data.
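A minimal sketch of deploying Metrics Server and checking what it reports; the release manifest URL below is an assumption and may differ for the version you actually run:

    # Deploy Metrics Server from the upstream release manifest
    # (URL is an assumption; pin it to the release you actually use)
    kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

    # Verify that resource usage data is being collected
    kubectl top nodes
    kubectl top pods --all-namespaces

Once kubectl top returns numbers for your nodes and Spark pods, you can see how much headroom each node actually has.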

The next step is to set up the HPA (Horizontal Pod Autoscaler).

The Horizontal Pod Autoscaler automatically scales the number of pods in a replication controller, deployment or replica set based on observed CPU utilization or other custom metrics. HPA normally fetches metrics from a series of aggregated APIs:

  • metrics.k8s.io
  • custom.metrics.k8s.io
  • external.metrics.k8s.io
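If you want to confirm that these aggregated APIs are actually serving data, you can query them directly through the API server; a sketch, assuming the standard v1beta1 API versions:

    # Resource metrics served by Metrics Server
    kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
    kubectl get --raw "/apis/metrics.k8s.io/v1beta1/pods"

    # Custom and external metrics, only available if a metrics adapter is installed
    kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1"
    kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1"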

How to make it work?

HPA is supported by kubectl by default:

  • kubectl create - creates a new autoscaler
  • kubectl get hpa - lists your autoscalers
  • kubectl describe hpa - gets a detailed description of autoscalers
  • kubectl delete - deletes an autoscaler

Example: kubectl autoscale rs foo --min=2 --max=5 --cpu-percent=80 creates an autoscaler for the replica set foo, with target CPU utilization set to 80% and the number of replicas between 2 and 5. You can and should adjust all values to your needs; a declarative equivalent is sketched below.
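If you prefer to keep the autoscaler in version control, the same thing can be written declaratively; a sketch of an equivalent manifest, assuming the ReplicaSet is named foo and lives in the current namespace:

    # Declarative equivalent of the autoscale command above
    # (a sketch; "foo" is the ReplicaSet from the example, not something you necessarily have)
    cat <<EOF | kubectl apply -f -
    apiVersion: autoscaling/v1
    kind: HorizontalPodAutoscaler
    metadata:
      name: foo
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: ReplicaSet
        name: foo
      minReplicas: 2
      maxReplicas: 5
      targetCPUUtilizationPercentage: 80
    EOF

    # Check what the autoscaler is doing
    kubectl get hpa foo
    kubectl describe hpa foo

Note that for a CPU-based target to work, the pods behind foo need CPU resource requests set; otherwise the HPA cannot compute a utilization percentage.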

Here is detailed documentation on how to use the kubectl autoscale command.

Please let me know if you find that useful.

-- OhHiMark
Source: StackOverflow