Periodic Istio Citadel pod restart

9/27/2021

The istio-citadel pods of an Istio 1.4.10 Helm release are restarted periodically.

2 replicas of istio-citadel are running.

This happens once every 4-5 days, when the number of CSR requests reaches 28.3k and memory usage reaches 9.8G. After each restart, memory increases steadily until the pod crashes again. A CPU spike is also observed, with usage of around 10 CPU.

I could see the error log below 4 minutes before the restart.

Sep 27, 2021 @ 12:20:39.370	2021-09-27T06:50:39.370213Z	error	ca.liveness turns unavailable: 1 error occurred:
Sep 27, 2021 @ 12:20:39.370	* liveness: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp <citadel-service-ip>:8060: connect: cannot assign requested address"

I want to understand the error message and what could explain the periodic restarts.
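One thing that could be checked: on Linux, "connect: cannot assign requested address" is the EADDRNOTAVAIL error, which on an outgoing connection often means no free local (ephemeral) port was available. The default ephemeral range of 32768-60999 holds 28,232 ports, which is close to the 28.3k CSR count above, though that may be a coincidence. The sketch below is only a diagnostic idea under that assumption, not a confirmed cause. It counts TCP sockets per state toward port 8060 from text in the /proc/net/tcp format. The function name count_states_to_port is hypothetical.

```python
from collections import Counter

# Hex state codes used in /proc/net/tcp (Linux kernel TCP states).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states_to_port(proc_net_tcp_text, remote_port):
    """Count sockets per TCP state whose remote port equals remote_port.

    proc_net_tcp_text is the raw content of /proc/net/tcp (IPv4 only).
    """
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        # Field 2 is the remote address as HEXIP:HEXPORT, field 3 the state.
        remote_hex_port = fields[2].rsplit(":", 1)[1]
        if int(remote_hex_port, 16) == remote_port:
            state = TCP_STATES.get(fields[3].upper(), fields[3])
            counts[state] += 1
    return counts
```

Inside the citadel pod, the function could be given the contents of /proc/net/tcp with remote_port=8060. A count that keeps growing toward the size of the ephemeral port range (for example many sockets in TIME_WAIT or ESTABLISHED) would be consistent with the liveness probe leaking or churning connections. A small, stable count would point elsewhere.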

-- Sharat Naik
amazon-web-services
grpc
istio
istio-operator
kubernetes