I run a k8s cluster with autoscaling on AWS, which I use to run Spark (master + workers). Part of it is the following node group for the worker nodes:
managedNodeGroups:
  - name: mng0
    instanceType: m5.large
    desiredCapacity: 0
    privateNetworking: false # if only 'Private' subnets are given, this must be enabled
    minSize: 4
    maxSize: 50
    securityGroups:
      attachIDs: xxxxx
    iam:
      withAddonPolicies:
        autoScaler: true
        cloudwatch: true
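To illustrate why every new job ends up waiting for scale-up: the Spark worker pods each request most of a node, so once the running jobs have consumed the warm nodes, any additional worker pod is unschedulable until the autoscaler adds a node. The requests below are only illustrative (not copied from my actual manifests):

# Illustrative Spark worker resource requests (example values, not my exact manifest):
# each worker effectively claims a whole m5.large (2 vCPU / 8 GiB), so an
# unschedulable worker pod triggers a scale-up of the node group above.
resources:
  requests:
    cpu: 1500m
    memory: 6Gi
  limits:
    cpu: 2000m
    memory: 7Gi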
With this setup I always have at least 4 nodes available for a 'warm start' when a Spark job comes in, which avoids the roughly 2-minute wait for new nodes to boot. However, if a second Spark job requests nodes while more than 4 nodes are already up (i.e. the warm nodes are in use by the first job), that second job again has to wait for more nodes to be started. I want a situation where a new Spark job is always picked up right away, without the 'starting a new node' overhead. This is especially relevant for me because dataset sizes vary a lot (from MBs to TBs) and the cluster is used for exploratory analysis as well as ETL; for exploratory analysis on small datasets in particular I want a near-instant start.
Question: Can I specify a number of idle/waiting/standby nodes that are always ready to accept new Spark jobs?
Is this the right approach for achieving what I want, or is there a better approach?
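For reference, the closest thing I have come across is the cluster-autoscaler "overprovisioning" pattern: a low-priority Deployment of pause pods that keeps spare nodes up and gets preempted as soon as real Spark pods need the space. A minimal sketch of what I have in mind (the names, replica count and resource numbers are placeholders I made up):

# Hypothetical overprovisioning sketch: a negative-priority class plus a
# Deployment of placeholder (pause) pods. The autoscaler keeps enough nodes
# running to host the placeholders; when a Spark pod arrives it preempts a
# placeholder and starts immediately, and the autoscaler brings up replacement
# capacity in the background.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10            # lower than the default (0), so any normal pod preempts these
globalDefault: false
description: "Placeholder pods used to keep spare nodes warm"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
  namespace: default
spec:
  replicas: 2                      # number of 'spare' nodes to keep warm (placeholder value)
  selector:
    matchLabels:
      app: overprovisioning
  template:
    metadata:
      labels:
        app: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9   # does nothing, only holds the resource reservation
          resources:
            requests:
              cpu: 1500m                     # roughly one m5.large worth of allocatable CPU (assumption)
              memory: 5Gi

Unlike simply raising minSize, this would keep spare capacity on top of whatever the running jobs already use, so a new job can start immediately while the autoscaler replaces the preempted placeholders. Would something like this be the recommended way, or does EKS/eksctl offer a more direct setting for standby capacity?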