Is it possible, and how, to limit a Kubernetes Job to create a maximum number of Pods if they always fail?

1/2/2019

As a QA engineer in our company I am a daily user of Kubernetes, and we use Kubernetes Jobs to create performance-test Pods. One advantage of a Job, according to the docs, is

to create one Job object in order to reliably run one Pod to completion

But in our tests this feature creates an endless stream of Pods if the previous ones fail, which occupies resources in our team's shared cluster, and deleting such Pods takes a lot of time.

Currently the job manifest is like this:

    {
      "apiVersion": "batch/v1",
      "kind": "Job",
      "metadata": {
        "name": "upgradeperf",
        "namespace": "ntg6-grpc26-tts"
      },
      "spec": {
        "template": {
          "spec": {
            "containers": [
              {
                "name": "upgradeperfjob",
                "image": "mycompany.com:5000/ncs-cd-qa/upgradeperf:0.1.1",
                "command": [
                  "python",
                  "/jmeterwork/jmeter.py",
                  "-gu",
                  "git@gitlab-pri-eastus2.dev.mycompany.net:mobility-ncs-tools/tts-cdqa-tool.git",
                  "-gb",
                  "upgradeperf",
                  "-t",
                  "JMeter/testcases/ttssvc/JMeterTestPlan_ttssvc_cmpsize.jmx",
                  "-JtestDataFile",
                  "JMeter/testcases/ttssvc/testData/avaml_opus.csv",
                  "-JthreadNum",
                  "3",
                  "-JthreadLoopCount",
                  "1500",
                  "-JresultsFile",
                  "results_upgradeperf_cavaml_opus_t3_l1500.csv",
                  "-Jhost",
                  "mtl-blade32-03.mycompany.com",
                  "-Jport",
                  "28416"
                ]
              }
            ],
            "restartPolicy": "Never",
            "imagePullSecrets": [
              {
                "name": "docker-registry-secret"
              }
            ]
          }
        }
      }
    }

In some cases, such as misconfigured IPs/ports, "reliably running one Pod to completion" is impossible, and recreating Pods is a waste of time and resources. So is it possible, and how, to limit a Kubernetes Job to create at most a certain number of Pods (say 3) if they always fail?

-- Lei Yang
kubernetes
kubernetes-jobs
kubernetes-pod

2 Answers

1/2/2019

Depending on your Kubernetes version, you can resolve this problem with one of these methods:

  1. Set the option restartPolicy: OnFailure. The failed container will then be restarted in the same Pod, so you will not get lots of failed Pods; instead you will get one Pod with lots of restarts.

  2. From Kubernetes 1.8 on, there is a parameter backoffLimit to control the retry behavior of a failed Job. This parameter defines the number of retries before the Job is treated as failed; the default is 6. For this parameter to work you must set restartPolicy: Never.
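Applied to the manifest in the question, the second option might look like the sketch below (spec.backoffLimit set to 3; the container and imagePullSecrets fields are elided, so this is a fragment rather than a complete manifest):

```json
{
  "apiVersion": "batch/v1",
  "kind": "Job",
  "metadata": {
    "name": "upgradeperf",
    "namespace": "ntg6-grpc26-tts"
  },
  "spec": {
    "backoffLimit": 3,
    "template": {
      "spec": {
        "restartPolicy": "Never",
        "containers": [ "..." ]
      }
    }
  }
}
```

With restartPolicy: Never, each retry is a new Pod, so backoffLimit: 3 caps the number of failed Pods the Job controller will create before marking the Job as failed.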

-- Kun Li
Source: StackOverflow

1/2/2019

You probably didn't set restartPolicy: Never in your pod spec; add that, and I would expect it to match your expected behavior better.

-- coderanger
Source: StackOverflow