Kubernetes Deployments and ReplicationControllers provide self-healing by ensuring that the desired number of replicas is present.
The autoscaling feature can also increase the number of replicas once a specific CPU threshold is reached.
Are there tools available that would provide flexibility in the auto-healing and auto-scale features?
For example: automatically adjust the number of replicas during peak hours or days. Or, when a pod dies because of an external issue, prevent the system from re-creating the container until a condition succeeds, e.g. a ping or telnet test.
You can block pod startup by waiting for external services in an entrypoint script or init container. That's the closest that exists today to waiting for external conditions.
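As a minimal sketch of that idea, an entrypoint script could poll the external service over TCP before starting the application. The function name, the default timeout, and the retry interval below are assumptions for illustration, not part of any Kubernetes API.

```python
import socket
import time


def wait_for_tcp(host, port, timeout=60.0, interval=2.0):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect plays the role of the "telnet test".
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

An entrypoint could call something like wait_for_tcp("db.example.internal", 5432) (a hypothetical host and port) and exit with a non-zero status if it returns False, so the application never starts against a missing dependency.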
There is no time-based autoscaler today, although it would be possible to script it fairly easily on a schedule.
In OpenShift, you can easily scale your app by running this command in a cron job.
Scale command
oc scale dc app --replicas=5
And of course, scale it back down by changing the number of replicas.
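A rough sketch of what the cron job could run: pick the replica count from the current day and hour, then build the oc scale command shown above. The peak window and the replica counts are made-up values, and the helper names are hypothetical.

```python
import datetime

# Hypothetical schedule: these hours and replica counts are assumptions.
PEAK_HOURS = range(8, 20)  # 08:00 to 19:59
PEAK_REPLICAS = 5
OFF_PEAK_REPLICAS = 2


def replicas_for(now):
    """Pick a replica count from the weekday and hour of `now`."""
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    if is_weekday and now.hour in PEAK_HOURS:
        return PEAK_REPLICAS
    return OFF_PEAK_REPLICAS


def scale_command(now, dc="app"):
    """Build the `oc scale` argument list for the given time."""
    return ["oc", "scale", "dc", dc, f"--replicas={replicas_for(now)}"]
```

The cron job would then pass scale_command(datetime.datetime.now()) to subprocess.run with check=True. Running it once an hour is enough for a schedule at this granularity.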
Autoscale
This is what "OpenShift for Developers" says about autoscaling: OpenShift also supports automatic scaling, defining upper and lower thresholds for CPU usage by pod.
If the upper threshold is consistently exceeded by the running pods for your application, a new instance of your application will be started. When CPU usage drops back below the lower threshold, because your application is no longer working as hard, the number of instances will be scaled back again.
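To make the quoted behaviour concrete, here is a toy model of the threshold logic. It is only an illustration of the description above, not OpenShift's actual algorithm: the function name, the default thresholds, and the one-replica step are assumptions, and it ignores the "consistently exceeded" part by looking at a single CPU reading.

```python
def next_replica_count(current, cpu_percent, lower=30.0, upper=80.0,
                       min_replicas=1, max_replicas=10):
    """Return the replica count after one scaling decision."""
    if cpu_percent > upper:
        desired = current + 1  # above the upper threshold: add an instance
    elif cpu_percent < lower:
        desired = current - 1  # below the lower threshold: remove one
    else:
        desired = current      # inside the band: leave it alone
    # Never go outside the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```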
I think Kubernetes has now released version 1.3, which allows autoscaling, but it is not yet integrated into OpenShift.
Health Check
When it comes to health checks, OpenShift has:
readiness probe: checks the status of the test you configure before the router starts sending traffic to the instance.
liveness probe: runs periodically once traffic has been switched to an instance of your application to ensure it is still behaving correctly. If the liveness probe fails, OpenShift will automatically shut down that instance of your application and replace it with a new one.
You can perform these kinds of tests: HTTP check, container execution check, and TCP socket check.
So with these tools, I guess you can create readiness and liveness checks to ensure that your pod is running properly; if it is not, the failing instance is replaced and receives no traffic until its readiness check passes.
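For the HTTP check, the application only needs to expose an endpoint the probe can call. Here is a minimal sketch using Python's standard library. The /healthz and /ready paths, the port, and the ready flag are assumptions for illustration; the probe configuration would have to point at whatever paths you actually choose.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    # Hypothetical flag; a real app would flip it once startup work is done.
    ready = True

    def do_GET(self):
        if self.path == "/healthz":
            # Liveness: the process is up and able to answer requests.
            self._reply(200, b"ok")
        elif self.path == "/ready":
            # Readiness: answer 503 until the app can take traffic.
            if HealthHandler.ready:
                self._reply(200, b"ready")
            else:
                self._reply(503, b"not ready")
        else:
            self._reply(404, b"not found")

    def _reply(self, status, body):
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep probe requests out of the log


def make_server(port=8080):
    """Create the health server; call serve_forever() on it to run."""
    return HTTPServer(("0.0.0.0", port), HealthHandler)
```

An HTTP probe treats a status code from 200 to 399 as success, so returning 503 from /ready is enough to keep traffic away from an instance that is not ready yet.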