Sorry kinda new to K8s...
I'm using a k8s CronJob to push etcd snapshots to our S3 object store. There are 7 etcd nodes per cluster, and I have the Job configured to run to completion 7 times with a parallelism of 7. Using tolerations and node selectors, I limit scheduling to my etcd nodes.
```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: etcd-backup-to-s3
  namespace: backups
spec:
  concurrencyPolicy: Allow
  failedJobsHistoryLimit: 10
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      activeDeadlineSeconds: 300
      backoffLimit: 3
      completions: 7
      parallelism: 7
      <SNIP>
```
Is there a way to configure this to handle a scenario where 1 or more etcd nodes might be offline?
i.e., is there any way to dynamically compute the value needed for 'completions', so that at run time I can determine how many etcd nodes are online at that moment?
Or is there a completely different way I should be going about this?
Thanks for any help.
If the script in the job exits successfully on a connection timeout, or alternatively on a failed etcd probe (ping), then all pods in the Job will complete even if some of the etcd instances are down.
For example, you can fail the job only if the probe stage succeeded but the backup stage failed.
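A minimal sketch of that logic, assuming an etcdctl binary in the job image; the ETCD_ENDPOINT, snapshot path, and S3 bucket are placeholders (not from your setup), and a real cluster will likely also need --cacert/--cert/--key flags on the etcdctl calls:

```bash
#!/bin/sh
# Sketch of the probe-then-backup logic: skip cleanly if etcd is
# unreachable, fail only if the backup itself fails.
set -eu

# Placeholder assumptions; substitute whatever your job actually uses.
ETCD_ENDPOINT="${ETCD_ENDPOINT:-https://127.0.0.1:2379}"
SNAPSHOT="/tmp/etcd-snapshot-$(date +%s).db"

# Probe stage: if this node's etcd member is unreachable, exit 0 so the
# pod still counts toward the Job's completion count.
if ! etcdctl --endpoints="$ETCD_ENDPOINT" endpoint health; then
    echo "etcd unreachable on this node, skipping backup" >&2
    exit 0
fi

# Backup stage: the probe succeeded, so any failure from here on is a
# real failure; with set -e the pod fails and is retried up to backoffLimit.
etcdctl --endpoints="$ETCD_ENDPOINT" snapshot save "$SNAPSHOT"
aws s3 cp "$SNAPSHOT" "s3://my-backup-bucket/etcd/$(hostname)-$(basename "$SNAPSHOT")"
```

With this shape, the Job's 7 completions can still be reached when an etcd instance is unhealthy, without computing 'completions' dynamically.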