Random timeouts in a Node.js + gRPC application on Kubernetes

10/29/2018

We have a weird networking issue.

We have a Hyperledger Fabric client application written in Node.js, running in Kubernetes, which communicates with an external Hyperledger Fabric network.

We randomly get timeout errors on this communication. When the pod is restarted, everything works fine for a while, then the timeout errors start; sometimes the problem resolves on its own, and then it goes bad again.

This is on Azure AKS. We also set up a quick Kubernetes cluster in AWS with Rancher and deployed the app there, and the same timeout errors happened there too.

We ran scripts in the same container all night long, hitting the external Hyperledger endpoint every minute with both cURL and a small Node.js script, and we didn't get a single error.
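A minimal sketch of such a probe, assuming a plain TCP connect check against the peer endpoint (host and port below are placeholders, not our real values):

```js
// probe.js - minimal TCP reachability check against the Fabric peer endpoint.
const net = require('net');

const HOST = 'peer0.example.com'; // placeholder peer host
const PORT = 7051;                // placeholder peer gRPC port

function probe() {
  const socket = net.connect({ host: HOST, port: PORT, timeout: 5000 });

  socket.on('connect', () => {
    console.log(new Date().toISOString(), 'connect OK');
    socket.end();
  });

  socket.on('timeout', () => {
    console.error(new Date().toISOString(), 'connect timeout');
    socket.destroy();
  });

  socket.on('error', (err) => {
    console.error(new Date().toISOString(), 'connect error:', err.message);
  });
}

setInterval(probe, 60 * 1000); // once a minute, as in the test
probe();
```

Note that each run of this probe opens a fresh connection.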

We ran the application in another VM as plain Docker containers and there was no issue there.

We inspected the network traffic inside the container. When this issue happens, netstat shows an established connection, but tcpdump shows no traffic; not a single packet is even attempted to be sent.

Checking the Hyperledger Fabric SDK code, it uses gRPC (with protocol buffers) behind the scenes.

So any clues maybe?

-- r a f t
grpc
hyperledger-fabric
kubernetes
node.js

1 Answer

11/2/2018

This turned out to be not a Kubernetes issue but a dropped-connection issue.

gRPC keeps the connection open, and after some period of inactivity intermediary components drop the connection. In the Azure AKS case this is the load balancer, as every outbound connection goes through a load balancer. There is a non-configurable idle timeout period of 4 minutes, after which the load balancer drops the connection.

The fix is configuring gRPC to send keepalive messages.
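With the Node gRPC library, keepalive behaviour is controlled through channel options. A minimal sketch, assuming fabric-client (v1.x) with a placeholder peer URL and TLS certificate; exact option support may vary by SDK version:

```js
const Client = require('fabric-client');

// gRPC channel options enabling HTTP/2 keepalive pings. Values are illustrative;
// the ping interval must be shorter than the load balancer's 4-minute idle timeout.
const grpcOptions = {
  'grpc.keepalive_time_ms': 120000,          // send a keepalive ping after 2 minutes of inactivity
  'grpc.keepalive_timeout_ms': 20000,        // wait up to 20s for the ping ack before failing
  'grpc.keepalive_permit_without_calls': 1,  // ping even when no RPC is in flight
  'grpc.http2.max_pings_without_data': 0     // don't limit pings sent without data frames
};

const peerTlsCertPem = '...'; // peer TLS CA certificate (placeholder)

const client = new Client();

// Peer URL and TLS settings are placeholders.
const peer = client.newPeer('grpcs://peer0.example.com:7051', Object.assign({
  pem: peerTlsCertPem,
  'ssl-target-name-override': 'peer0.example.com'
}, grpcOptions));
```

The key point is keeping grpc.keepalive_time_ms well under the 4-minute idle timeout, so the connection never looks idle to the load balancer. If you use a connection profile, equivalent settings can typically go under each peer's grpcOptions section.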

The scripts in the container worked without a problem, as they open a new connection every time they run.

The application running as plain Docker containers didn't have this issue because we were hitting the endpoints every minute, hence never reaching the idle timeout threshold. When we hit the endpoints every 10 minutes, the timeout issue started there too.

-- r a f t
Source: StackOverflow