Apache Ignite: 1000s of warnings "Unable to perform handshake within timeout" get added to the log

6/3/2020

Recently I updated Apache Ignite, which runs embedded in my .NET Core 3.1 application, from 2.7.5 to 2.8.1, and today I noticed thousands of warnings like this in the log:

Jun 03 18:26:54 quote-service-us-deployment-5d874d8546-psbcs org.apache.ignite.internal.processors.odbc.ClientListenerNioListener: Site: WARN - Unable to perform handshake within timeout [timeout=10000, remoteAddr=/10.250.0.4:57941]
Jun 03 18:26:59 quote-service-uk-deployment-d644cbc86-7xcvw org.apache.ignite.internal.processors.odbc.ClientListenerNioListener: Site: WARN - Unable to perform handshake within timeout [timeout=10000, remoteAddr=/10.250.0.4:57982]
Jun 03 18:26:59 quote-service-us-deployment-5d874d8546-psbcs org.apache.ignite.internal.processors.odbc.ClientListenerNioListener: Site: WARN - Unable to perform handshake within timeout [timeout=10000, remoteAddr=/10.250.0.4:57985]
Jun 03 18:27:04 quote-service-uk-deployment-d644cbc86-7xcvw org.apache.ignite.internal.processors.odbc.ClientListenerNioListener: Site: WARN - Unable to perform handshake within timeout [timeout=10000, remoteAddr=/10.250.0.4:58050]
Jun 03 18:27:04 quote-service-us-deployment-5d874d8546-psbcs org.apache.ignite.internal.processors.odbc.ClientListenerNioListener: Site: WARN - Unable to perform handshake within timeout [timeout=10000, remoteAddr=/10.250.0.4:58051]
Jun 03 18:27:09 quote-service-uk-deployment-d644cbc86-7xcvw org.apache.ignite.internal.processors.odbc.ClientListenerNioListener: Site: WARN - Unable to perform handshake within timeout [timeout=10000, remoteAddr=/10.250.0.4:58114]
Jun 03 18:27:09 quote-service-us-deployment-5d874d8546-psbcs org.apache.ignite.internal.processors.odbc.ClientListenerNioListener: Site: WARN - Unable to perform handshake within timeout [timeout=10000, remoteAddr=/10.250.0.4:58118] 

I don't use ODBC or JDBC directly in my app, and the app runs in a Kubernetes cluster in a virtual network. Interestingly, in every case the IP on the other end of the connection (10.250.0.4 here) belongs to the kube-proxy pod. I am a bit perplexed by this.

UPD: The same IP address is also reported for the azure-ip-masq-agent and azure-cni-networkmonitor pods (I assume those belong to Azure Kubernetes Service, which I use to run the cluster).

So it is possible that the network monitor is attempting to reach the ODBC port (just guessing). Is there any way to suppress that warning or disable ODBC connections entirely? I don't use ODBC, but I'd like to keep JDBC connections enabled, as I occasionally connect to the Ignite instances with DBeaver. Thank you!

-- Alex Avrutin
ignite
kubernetes
load-balancing

1 Answer

6/3/2020

If you've defined a Service and opened port 10800, then Kubernetes performs a health check on that port through kube-proxy. The probe opens a TCP connection but never completes the Ignite client handshake, so Ignite logs the "unable to perform handshake" warning.
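In other words, the probe behaves roughly like this sketch (plain Java; a throwaway ServerSocket stands in for the Ignite client port, which is a simplification for illustration):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class TcpProbe {
    // Mimic kube-proxy's TCP health check: open a connection to the
    // port, then close it without sending any protocol handshake bytes.
    // On the Ignite side this shows up as a connection that never
    // completes the handshake within the configured timeout.
    static boolean probe(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true; // TCP connect succeeded -> endpoint is "healthy"
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in listener on an ephemeral port (hypothetical, for demo only).
        try (ServerSocket server = new ServerSocket(0)) {
            System.out.println(probe("127.0.0.1", server.getLocalPort(), 1000));
        }
    }
}
```

From kube-proxy's point of view the check passed; from Ignite's point of view a client connected and then went silent.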

ClientListenerNioListener: Site: WARN - Unable to perform handshake within timeout timeout=10000, remoteAddr=/10.250.0.4:58050

Here the client connector listener (ClientListenerNioListener) is saying that it could not complete a handshake with the client at remoteAddr=/10.250.0.4:58050 within 10 seconds.

config client connector: https://apacheignite.readme.io/docs/binary-client-protocol#connectivity
client connector handshake: https://apacheignite.readme.io/docs/binary-client-protocol#connection-handshake
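As for disabling ODBC while keeping JDBC: the client connector exposes per-protocol switches. A minimal Java sketch, assuming the Ignite 2.8 configuration API (the handshakeTimeout property appeared in 2.8; verify the setters against your version, and the same properties can be set via Spring XML):

```java
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.ClientConnectorConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ClientConnectorExample {
    public static void main(String[] args) {
        ClientConnectorConfiguration clientCfg = new ClientConnectorConfiguration()
            .setPort(10800)
            // Keep JDBC so tools like DBeaver can still connect...
            .setJdbcEnabled(true)
            // ...but refuse ODBC connections outright.
            .setOdbcEnabled(false)
            // Optionally give slow clients more time before the
            // handshake-timeout warning is logged (default 10000 ms).
            .setHandshakeTimeout(30_000);

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setClientConnectorConfiguration(clientCfg);

        Ignition.start(cfg);
    }
}
```

Note that this does not stop the warnings caused by TCP health checks, since the probe never reaches the protocol layer; for that, change the health-check configuration on the Kubernetes side as described below.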

Example of a Service with port 10800 opened:

apiVersion: v1
kind: Service
metadata:
  # The name must be equal to TcpDiscoveryKubernetesIpFinder.serviceName
  name: ignite
  # The name must be equal to TcpDiscoveryKubernetesIpFinder.namespaceName
  namespace: ignite
spec:
  type: LoadBalancer
  ports:
    - name: rest
      port: 8080
      targetPort: 8080
    - name: sql
      port: 10800
      targetPort: 10800

You can redefine the Service so it doesn't open the port, or update the Service definition to use a different port for the health check: https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip

from the doc:
service.spec.healthCheckNodePort - specifies the health check node port (numeric port number) for the service. If healthCheckNodePort isn’t specified, the service controller allocates a port from your cluster’s NodePort range. You can configure that range by setting an API server command line option, --service-node-port-range. It will use the user-specified healthCheckNodePort value if specified by the client. It only has an effect when type is set to LoadBalancer and externalTrafficPolicy is set to Local.
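Putting that together, a Service that moves the health check off the application ports might look like the sketch below (the healthCheckNodePort value 32100 is a hypothetical example; if you omit the field, Kubernetes allocates a port from the NodePort range for you):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ignite
  namespace: ignite
spec:
  type: LoadBalancer
  # With Local policy, kube-proxy health-checks the dedicated node
  # port below instead of the application ports, so Ignite no longer
  # sees half-open connections on 10800.
  externalTrafficPolicy: Local
  healthCheckNodePort: 32100   # hypothetical; any free NodePort works
  ports:
    - name: rest
      port: 8080
      targetPort: 8080
    - name: sql
      port: 10800
      targetPort: 10800
```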

-- Alex K
Source: StackOverflow