Enable autostart of a Kubernetes service

1/7/2021

Is there a way to get a Kubernetes Service restarted in case it is deleted or goes down?

We have two services defined in a file service-zookeeper-group.yaml

On making a random check we found the service "zk-group" missing. (maybe someone accidentally deleted it)

NAME        TYPE          CLUSTER-IP     EXTERNAL-IP   PORT(S)
zk-service  LoadBalancer  10.88.12.113   10.128.0.3    2181:32581/TCP
zk-group    ClusterIP     None           <none>        <none>

Is there a way to perform a "liveness" check for the service "zk-group" and have it recreated if it goes down?

service-zookeeper-group.yaml

apiVersion: v1
kind: Service
metadata:
  name: zk-group
  namespace: {{ .Release.Namespace }}
  labels:
    app: {{ .Release.Name }}-zk-app
spec:
  clusterIP: None
  selector:
    app: {{ .Release.Name }}-zk-app

---

apiVersion: v1
kind: Service
metadata:
  name: zk-service
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
  labels:
    app: {{ .Release.Name }}-zk-app
spec:
  ports:
  - protocol: TCP
    name: tcp-{{ .Values.zk.port }}
    port: {{ .Values.zk.port }}
    targetPort: {{ .Values.zk.port }}
  type: LoadBalancer
  loadBalancerIP: {{ .Values.networking.loadBalancerIP }}
  loadBalancerSourceRanges: {{ .Values.networking.loadBalancerSourceRanges | toJson }}
  selector:
    app: {{ .Release.Name }}-zk-app
-- George
kubernetes
service

1 Answer

1/7/2021

There is no native Kubernetes utility that checks whether a Service exists and, if not, recreates it. If you really need this functionality, you would probably have to extend your cluster with a custom Kubernetes controller.

You can maintain the desired state with the GitOps pattern, using a GitOps controller such as Flux or Argo CD. If someone really "accidentally" removes a resource, those controllers reapply the desired state within their reconciliation interval; a minimal sketch follows.
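For example, an Argo CD Application with self-healing enabled could look like the following sketch (the repository URL, path, and namespaces are placeholders for your own setup, not taken from the question):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: zookeeper
  namespace: argocd                  # namespace where Argo CD runs
spec:
  project: default
  source:
    repoURL: https://github.com/example/config-repo.git   # placeholder Git repo
    targetRevision: main
    path: zookeeper                  # placeholder path holding your Service manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: default               # placeholder target namespace
  syncPolicy:
    automated:
      prune: true                    # remove resources that were deleted from Git
      selfHeal: true                 # recreate drifted or deleted resources, e.g. your zk-group Service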

"maybe someone accidentally deleted it"

Your specific case sounds more like a lack of monitoring and governance:

  1. If a service is deleted accidentally, why don't any alerts pop up? A simple probe (e.g. via the Prometheus Blackbox exporter, or Elastic Heartbeat if you are using the Elastic Stack) can check the availability of services within the cluster; see the first sketch after this list.
  2. How is it possible that something as important as a Service gets deleted accidentally? Who works with the cluster, and who really needs the permission to delete or modify Service resources? It makes sense to review the roles/permissions on your cluster; see the Role sketch after this list.
  3. Consider activating the Kubernetes audit trail: if I were the cluster administrator, I would want to know who deleted the service; see the audit policy sketch after this list.
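For point 1, a minimal sketch of such a probe, assuming you run the Prometheus Operator together with the Blackbox exporter (the exporter address, namespaces, and the tcp_connect module are assumptions about your setup, not givens from the question):

apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: zk-service-probe
  namespace: monitoring                             # assumed monitoring namespace
spec:
  prober:
    url: blackbox-exporter.monitoring.svc:9115      # assumed exporter address
  module: tcp_connect                               # requires a tcp_connect module in the exporter config
  targets:
    staticConfig:
      static:
      - zk-service.default.svc:2181                 # the ZooKeeper client port from your manifest

An alert on the resulting probe_success metric dropping to 0 would have caught the missing service long before a random check did.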
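For point 2, one way to tighten permissions is a namespaced Role that grants read-only access to Services and deliberately omits the delete verb (name and namespace are placeholders); bind it to the relevant users or groups with a RoleBinding:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: service-reader                 # placeholder name
  namespace: default                   # placeholder namespace
rules:
- apiGroups: [""]
  resources: ["services"]
  verbs: ["get", "list", "watch"]      # read-only: no delete, patch, or update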
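For point 3, on a self-managed cluster you can pass an audit policy to the API server via --audit-policy-file; the following sketch records the metadata of every Service deletion (on GKE, which the load-balancer annotation in your manifest suggests, deletions are captured by Cloud Audit Logs instead):

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# record who deleted a Service, when, and from where
- level: Metadata
  verbs: ["delete"]
  resources:
  - group: ""                          # core API group
    resources: ["services"]
# everything else stays unlogged in this minimal sketch
- level: None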
-- cvoigt
Source: StackOverflow