How to simulate Power Failure In Kubernetes

7/1/2019

I have my rook-ceph cluster running on AWS. It's loaded up with data. Is there any way to simulate a POWER FAILURE so that I can test the behaviour of my cluster?

-- Rajat Singh
amazon-web-services
google-kubernetes-engine
kubernetes
openshift

3 Answers

7/2/2019

It depends on the purpose of your crash test. I see two options:

  1. You want to test whether you deployed Kubernetes correctly on AWS - then I'd terminate the related AWS EC2 instance (or set of instances); see the sketch below.

  2. You want to test whether your end application is resilient to Kubernetes node failures - then I'd just check which pods are running on the given node and kill them all at once with:

kubectl delete pods <pod> --grace-period=0 --force
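
A minimal sketch of both options, assuming a hypothetical node name and instance ID (look up the real instance ID from the node's spec.providerID); the aws ec2 terminate-instances command and the kubectl --field-selector flag are standard:

#!/bin/bash
# Option 1: terminate the EC2 instance backing the node - the closest thing to pulling the power cord.
# i-0123456789abcdef0 is a placeholder instance ID.
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0

# Option 2: force-delete every pod scheduled on a given node, with no grace period.
# ip-10-0-1-23.ec2.internal is a placeholder node name.
NODE=ip-10-0-1-23.ec2.internal
kubectl get pods --all-namespaces --field-selector spec.nodeName=$NODE \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}' |
while read -r ns pod; do
  kubectl delete pod "$pod" -n "$ns" --grace-period=0 --force
done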
-- Rafał Leszko
Source: StackOverflow

7/1/2019

From Docker you can send the kill signal SIGPWR, which corresponds to a power failure (System V):

docker kill --signal="SIGPWR" <container>

and from Kubernetes:

kubectl exec <pod> -- /killme.sh

and the script killme.sh:

beginning of script -----
#!/bin/bash
# Find the PID(s) of the process to signal (iperf in this example)
kiperf=$(pidof iperf)
# Send signal 30 (SIGPWR on x86 Linux) to all iperf processes
kill -30 $kiperf
end of script -----------

You can find the meaning of signal 30 here.
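
If you don't want to bake a script like killme.sh into the image, a minimal alternative sketch (an assumption, not part of the original answer) is to send the same signal straight to the container's main process, provided the image ships a shell:

kubectl exec <pod> -- /bin/sh -c 'kill -30 1'

Note that PID 1 in a container ignores signals it has not installed a handler for, so this only has an effect if your application actually handles SIGPWR.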

-- Soleil
Source: StackOverflow

7/3/2019

Cluster Pods do not disappear until someone (a person or a controller) destroys them, or there is an unavoidable hardware or system software error.

Developers call these unavoidable cases involuntary disruptions to an application. Examples are:

  • a hardware failure of the physical machine backing the node
  • cluster administrator deletes VM (instance) by mistake
  • cloud provider or hypervisor failure makes VM disappear
  • a kernel panic
  • the node disappears from the cluster due to cluster network partition
  • eviction of a pod due to the node being out-of-resources

Except for the out-of-resources condition, all these conditions should be familiar to most users; they are not specific to Kubernetes.

Developers call other cases voluntary disruptions. These include both actions initiated by the application owner and those initiated by a Cluster Administrator.

Typical application owner actions include:

  • deleting the deployment or other controller that manages the pod
  • updating a deployment’s pod template causing a restart
  • directly deleting a pod (e.g. by accident)

You can find more information here: kubernetes-disruption, application-disruption.
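
Those pages describe PodDisruptionBudgets as the mechanism to limit voluntary disruptions. As a rough sketch only (the API version, labels and minAvailable value are assumptions, not taken from this question), a budget for Rook's OSD pods could look like:

kubectl apply -f - <<EOF
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: rook-ceph-osd-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: rook-ceph-osd
EOF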

You can set up Prometheus on your cluster and measure metrics during the failure.
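
A minimal sketch of one common way to do that, assuming Helm is available and using the community kube-prometheus-stack chart (the release name and namespace are placeholders):

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace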

-- MaggieO
Source: StackOverflow