Created a two-node cluster with kubeadm.
Installed Istio 1.1.11.
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-19T16:40:16Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-19T16:32:14Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Executed the commands as given in the Istio documentation:
$ for i in install/kubernetes/helm/istio-init/files/crd*yaml; do kubectl apply -f $i; done
$ kubectl apply -f install/kubernetes/istio-demo.yaml
Services got created.
$ kubectl get pods -n istio-system
The telemetry and policy pods went into CrashLoopBackOff:
istio-policy-648b5f5bb5-dv5np 1/2 **CrashLoopBackOff** 5 2m52s
istio-telemetry-57946b8569-9m7gd 1/2 **CrashLoopBackOff** 5 2m52s
When describing the pods, I get the following error:
Warning FailedMount 2m16s (x2 over 2m18s) kubelet, ip-xxx-xxx-xxx-xxx MountVolume.SetUp failed for volume "policy-adapter-secret" : couldn't propagate object cache: timed out waiting for the condition
I tried restarting the VM and restarting the Docker service; it did not help.
Because of the above error, the pods repeatedly try to restart and then crash.
Need your help in resolving this.
Around the web you can find many issues related to couldn't propagate object cache: timed out waiting for the condition. There is already an open issue on GitHub: https://github.com/kubernetes/kubernetes/issues/70044.
One of the workarounds commonly suggested there is restarting the kubelet on the affected node; see the sketch below.
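A minimal sketch of that workaround, assuming a systemd-managed kubelet on the node named in the FailedMount event:
$ sudo systemctl restart kubelet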
Regarding the Istio part: I have tried it on kubeadm, Minikube and GKE. In all cases the istio-policy-XXX-XXX and istio-telemetry-XXX-XXX pods were restarted because their liveness probes failed.
Telemetry example:
---
Warning Unhealthy 8m49s (x9 over 9m29s) kubelet, gke-istio-default-pool-c41459f8-zbhn Liveness probe failed: Get http://10.56.0.6:15014/version: dial tcp 10.56.0.6:15014: connect: connection refused
Normal Killing 8m49s (x3 over 9m19s) kubelet, gke-istio-default-pool-c41459f8-zbhn Killing container with id docker://mixer:Container failed liveness probe.. Container will be killed and recreated.
Policy example:
---
Warning Unhealthy 7m28s (x9 over 8m8s) kubelet, gke-istio-default-pool-c41459f8-3c6d Liveness probe failed: Get http://10.56.2.6:15014/version: dial tcp 10.56.2.6:15014: connect: connection refused
Normal Killing 7m28s (x3 over 7m58s) kubelet, gke-istio-default-pool-c41459f8-3c6d Killing container with id docker://mixer:Container failed liveness probe.. Container will be killed and recreated.
Even in the documentation example you can observe that the telemetry and policy pods were restarted 2 times.
After verifying both YAMLs (istio-demo.yaml and istio-demo-auth.yaml), I found that the telemetry and policy Deployments have their liveness probes configured with a 5-second initial delay:
livenessProbe:
  httpGet:
    path: /version
    port: 15014
  initialDelaySeconds: 5
  periodSeconds: 5
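You can confirm the same setting on the live cluster; one way is to grep the Deployment manifest (the extra context lines cover fields defaulted by the API server):
$ kubectl -n istio-system get deployment istio-telemetry -o yaml | grep -A 10 livenessProbe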
If you run kubectl logs on the mixer container of the istio-telemetry pod, you might see errors like the ones below.
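For example (the pod name here is taken from the question above; substitute the one from your own cluster):
$ kubectl logs -n istio-system istio-telemetry-57946b8569-9m7gd -c mixer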
2019-07-18T15:16:01.887334Z info pickfirstBalancer: HandleSubConnStateChange: 0xc420751a80, CONNECTING
...
2019-07-18T15:16:21.887741Z info pickfirstBalancer: HandleSubConnStateChange: 0xc420751a80, TRANSIENT_FAILURE
2019-07-18T15:16:21.887862Z error mcp Failed to create a new MCP sink stream: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.0.25.27:9901: i/o timeout"
...
2019-07-18T15:16:44.282027Z info pickfirstBalancer: HandleSubConnStateChange: 0xc420751a80, CONNECTING
2019-07-18T15:16:44.287281Z info pickfirstBalancer: HandleSubConnStateChange: 0xc420751a80, READY
2019-07-18T15:16:44.888794Z info mcp (re)trying to establish new MCP sink stream
2019-07-18T15:16:44.888922Z info mcp New MCP sink stream created
So in short, the mixer container in both Deployments (telemetry and policy) needs about 44 seconds to establish all of its connections.
If you change initialDelaySeconds to 60 seconds in both Deployments, the pods should no longer be restarted because of failed liveness probes.
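If you prefer not to edit and re-apply istio-demo.yaml, here is a sketch of the same change patched onto the live Deployments. It assumes, as in the events above, that the container is named mixer; kubectl patch defaults to a strategic merge patch, so only initialDelaySeconds is changed, and the patch triggers a rollout of both pods:
$ kubectl -n istio-system patch deployment istio-telemetry --patch '{"spec":{"template":{"spec":{"containers":[{"name":"mixer","livenessProbe":{"initialDelaySeconds":60}}]}}}}'
$ kubectl -n istio-system patch deployment istio-policy --patch '{"spec":{"template":{"spec":{"containers":[{"name":"mixer","livenessProbe":{"initialDelaySeconds":60}}]}}}}'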
Hope it helps
These Mixer services may be crashlooping if your node(s) don't have enough memory to run Istio. More and more, people use tools like Meshery to install Istio (and other service meshes), because it highlights points of contention like memory. When deploying either the istio-demo or istio-demo-auth configuration profile, you'll want to ensure you have a minimum of 4 GB of RAM per node (particularly if the Istio control plane is deployed to only one node).
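A quick way to check whether memory is the constraint on the node(s) running istio-system (kubectl top node requires metrics-server; kubectl describe works everywhere):
$ kubectl describe nodes | grep -A 7 Allocatable
$ kubectl top node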