StatefulSet/kfserving-controller-manager: Back-off restarting failed container

9/24/2019

I would like to install Kubeflow on-prem (Ubuntu 19.04 with 32 GB of RAM). My setup is as follows:

## microk8s
# Install
snap install microk8s --classic --stable

# version
microk8s.kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:13:54Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:50Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}

## kfctl
kfctl version
# kfctl v0.6.2-0-g47a0e4c7

## kustomize
kustomize version
Version: {KustomizeVersion:3.2.0 GitCommit:a3103f1e62ddb5b696daa3fd359bb6f2e8333b49 BuildDate:2019-09-18T16:26:36Z GoOs:linux GoArch:amd64}

To deploy Kubeflow, I found several YAML files. After deploying, the StatefulSets look like this:

microk8s.kubectl -n kubeflow get statefulsets

#NAME                                       READY   AGE
#admission-webhook-bootstrap-stateful-set   1/1     127m
#application-controller-stateful-set        1/1     127m
#kfserving-controller-manager               0/1     126m
#metacontroller                             1/1     127m
#seldon-operator-controller-manager         1/1     126m
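
In a listing like this, the failing entry is the one whose READY count is below the desired count. A small filter can pick such rows out automatically. This is only a sketch: `not_ready` is a hypothetical helper name, and it assumes the default column layout shown above (NAME, READY, AGE) with a single header line.

```shell
# not_ready: read `kubectl get statefulsets` output on stdin and print the
# names of StatefulSets whose READY column (e.g. "0/1") shows fewer ready
# replicas than desired. The first line is assumed to be the header.
not_ready() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] + 0 < r[2] + 0) print $1 }'
}
```

It would be used as `microk8s.kubectl -n kubeflow get statefulsets | not_ready`.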
microk8s.kubectl -n kubeflow describe statefulsets/kfserving-controller-manager
Name:               kfserving-controller-manager
Namespace:          kubeflow
CreationTimestamp:  Tue, 24 Sep 2019 03:54:37 +0400
Selector:           control-plane=kfserving-controller-manager,controller-tools.k8s.io=1.0,kustomize.component=kfserving
Labels:             control-plane=kfserving-controller-manager
                    controller-tools.k8s.io=1.0
                    kustomize.component=kfserving
Annotations:        kubectl.kubernetes.io/last-applied-configuration:
                      {"apiVersion":"apps/v1","kind":"StatefulSet","metadata":{"annotations":{},"labels":{"control-plane":"kfserving-controller-manager","contro...
Replicas:           1 desired | 1 total
Update Strategy:    RollingUpdate
  Partition:        824638326680
Pods Status:        1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  control-plane=kfserving-controller-manager
           controller-tools.k8s.io=1.0
           kustomize.component=kfserving
  Containers:
   kube-rbac-proxy:
    Image:      gcr.io/kubebuilder/kube-rbac-proxy:v0.4.0
    Port:       8443/TCP
    Host Port:  0/TCP
    Args:
      --secure-listen-address=0.0.0.0:8443
      --upstream=http://127.0.0.1:8080/
      --logtostderr=true
      --v=10
    Environment:  <none>
    Mounts:       <none>
   manager:
    Image:      gcr.io/kfserving/kfserving-controller:v0.1.1
    Port:       9876/TCP
    Host Port:  0/TCP
    Command:
      /manager
    Args:
      --metrics-addr=127.0.0.1:8080
    Limits:
      cpu:     100m
      memory:  300Mi
    Requests:
      cpu:     100m
      memory:  200Mi
    Environment:
      POD_NAMESPACE:   (v1:metadata.namespace)
      SECRET_NAME:    kfserving-webhook-server-secret
    Mounts:
      /tmp/cert from cert (ro)
  Volumes:
   cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kfserving-webhook-server-secret
    Optional:    false
Volume Claims:   <none>
Events:          <none>
microk8s.kubectl -n kubeflow describe pods/kfserving-controller-manager-0
Name:           kfserving-controller-manager-0
Namespace:      kubeflow
Priority:       0
Node:           amine-x580vd/192.168.0.19
Start Time:     Tue, 24 Sep 2019 03:54:37 +0400
Labels:         control-plane=kfserving-controller-manager
                controller-revision-hash=kfserving-controller-manager-65cfcdb6c5
                controller-tools.k8s.io=1.0
                kustomize.component=kfserving
                statefulset.kubernetes.io/pod-name=kfserving-controller-manager-0
Annotations:    <none>
Status:         Running
IP:             10.1.1.217
Controlled By:  StatefulSet/kfserving-controller-manager
Containers:
  kube-rbac-proxy:
    Container ID:  containerd://4236fbf1001cbe82ec44a7a220b42e4a25e66f15fc8bd0ef6daee1168021e9a9
    Image:         gcr.io/kubebuilder/kube-rbac-proxy:v0.4.0
    Image ID:      gcr.io/kubebuilder/kube-rbac-proxy@sha256:297896d96b827bbcb1abd696da1b2d81cab88359ac34cce0e8281f266b4e08de
    Port:          8443/TCP
    Host Port:     0/TCP
    Args:
      --secure-listen-address=0.0.0.0:8443
      --upstream=http://127.0.0.1:8080/
      --logtostderr=true
      --v=10
    State:          Running
      Started:      Tue, 24 Sep 2019 03:54:48 +0400
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-r4lgm (ro)
  manager:
    Container ID:  containerd://30c1e1e55042c28886c5bf45bb0d603dd76abf09524b367976e9e27f72fde4d8
    Image:         gcr.io/kfserving/kfserving-controller:v0.1.1
    Image ID:      gcr.io/kfserving/kfserving-controller@sha256:6809110b3db68530c8f96df5b965fb7003316055dc0d3af6736328a9327d2ad9
    Port:          9876/TCP
    Host Port:     0/TCP
    Command:
      /manager
    Args:
      --metrics-addr=127.0.0.1:8080
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 24 Sep 2019 06:00:27 +0400
      Finished:     Tue, 24 Sep 2019 06:00:28 +0400
    Ready:          False
    Restart Count:  29
    Limits:
      cpu:     100m
      memory:  300Mi
    Requests:
      cpu:     100m
      memory:  200Mi
    Environment:
      POD_NAMESPACE:  kubeflow (v1:metadata.namespace)
      SECRET_NAME:    kfserving-webhook-server-secret
    Mounts:
      /tmp/cert from cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-r4lgm (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kfserving-webhook-server-secret
    Optional:    false
  default-token-r4lgm:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-r4lgm
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason   Age                   From                   Message
  ----     ------   ----                  ----                   -------
  Normal   Pulling  20m (x27 over 130m)   kubelet, amine-x580vd  Pulling image "gcr.io/kfserving/kfserving-controller:v0.1.1"
  Warning  BackOff  17s (x588 over 130m)  kubelet, amine-x580vd  Back-off restarting failed container
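
The events above only say that the `manager` container keeps restarting (exit code 1), not why. A possible next step is to read what the previous, crashed instance of that container logged. The sketch below wraps `kubectl logs --previous` in a helper; `crash_logs` and the `KUBECTL` variable are hypothetical names introduced here, and the default of `microk8s.kubectl` is an assumption based on the setup described in this post.

```shell
# KUBECTL defaults to the microk8s wrapper used in this post; set it to
# "kubectl" on other clusters.
KUBECTL="${KUBECTL:-microk8s.kubectl}"

# crash_logs NAMESPACE POD CONTAINER [TAIL]
# Print the last TAIL lines (default 50) written by the previous, terminated
# instance of CONTAINER in POD.
crash_logs() {
  $KUBECTL -n "$1" logs "$2" -c "$3" --previous --tail="${4:-50}"
}
```

For the pod in question the call would be `crash_logs kubeflow kfserving-controller-manager-0 manager`. Since the container exits within a second of starting, the error message printed at startup is likely to be near the end of that output.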

Any suggestions for getting this StatefulSet to work correctly? Or is there a version of Kubeflow you would recommend for an on-prem install?

Thanks

-- Amine Jallouli
kubeflow
kubernetes
microk8s
