Kubernetes "fatal alert: protocol_version" error when creating a Deployment

7/15/2016

We're running a Kubernetes cluster on Google Cloud Platform with a Deployment of 8 Hazelcast-based replicas. This ran fine for over a month, but recently we started receiving the error message below whenever we try to start our Deployment (irrelevant stack frames omitted):

2016-07-15 12:58:02,117 [My-hazelcast.my-deployment-368708980-8v7ig @ my-deployment-368708980-8v7ig] ERROR - [10.68.5.3]:5701 [MyProject] [3.6.2] Error executing: GET at: https://kubernetes.default.svc/api/v1/namespaces/default/endpoints/my-service. Cause: Received fatal alert: protocol_version 

io.fabric8.kubernetes.client.KubernetesClientException: Error executing: GET at: https://kubernetes.default.svc/api/v1/namespaces/default/endpoints/my-service. Cause: Received fatal alert: protocol_version
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestException(OperationSupport.java:272) ~[kubernetes-client-1.3.66.jar:na]
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:205) ~[kubernetes-client-1.3.66.jar:na]
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:196) ~[kubernetes-client-1.3.66.jar:na]
  at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:483) ~[kubernetes-client-1.3.66.jar:na]
  at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:108) ~[kubernetes-client-1.3.66.jar:na]
  at com.noctarius.hazelcast.kubernetes.ServiceEndpointResolver.resolve(ServiceEndpointResolver.java:62) ~[hazelcast-kubernetes-discovery-0.9.2.jar:na]
  at com.noctarius.hazelcast.kubernetes.HazelcastKubernetesDiscoveryStrategy.discoverNodes(HazelcastKubernetesDiscoveryStrategy.java:74) ~[hazelcast-kubernetes-discovery-0.9.2.jar:na]
  at com.hazelcast.spi.discovery.impl.DefaultDiscoveryService.discoverNodes(DefaultDiscoveryService.java:74) ~[hazelcast-all-3.6.2.jar:3.6.2]
  ....
Caused by: javax.net.ssl.SSLException: Received fatal alert: protocol_version
  at sun.security.ssl.Alerts.getSSLException(Alerts.java:208) ~[na:1.7.0_95]
  at sun.security.ssl.Alerts.getSSLException(Alerts.java:154) ~[na:1.7.0_95]
  at sun.security.ssl.SSLSocketImpl.recvAlert(SSLSocketImpl.java:1991) ~[na:1.7.0_95]
  ...
  at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:203) ~[kubernetes-client-1.3.66.jar:na]
... 18 common frames omitted

When I google this error, I get a lot of hits about a TLS protocol version mismatch. Apparently, Java 8 defaults to a different TLS protocol version (TLS 1.2) than Java 7 and 6 (TLS 1.0). However, all of our containers run the same Docker image (based on the hazelcast/hazelcast:3.6.2 image), which is based on Java 7, so there should be no protocol version mismatch between them (and this layer of our image has not changed).
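One way to check which TLS versions a given JVM actually offers is to ask it directly from inside the container. The following is a minimal sketch using only the standard javax.net.ssl API; the class name TlsDefaults is made up for illustration, and the output depends on the JDK build and its security configuration:

```java
import javax.net.ssl.SSLContext;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class TlsDefaults {
    // Protocols the default SSLContext enables when nothing is configured explicitly.
    static String[] enabledByDefault() {
        try {
            return SSLContext.getDefault().getDefaultSSLParameters().getProtocols();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Protocols the JVM is able to speak if they are enabled explicitly.
    static String[] supported() {
        try {
            return SSLContext.getDefault().getSupportedSSLParameters().getProtocols();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version       = " + System.getProperty("java.version"));
        System.out.println("enabled by default = " + Arrays.toString(enabledByDefault()));
        System.out.println("supported          = " + Arrays.toString(supported()));
    }
}
```

If TLSv1.2 appears under "supported" but not under "enabled by default", the JVM can speak TLS 1.2 but will not offer it unless the client code enables it.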

We've tried reverting all of our recent changes in an attempt to resolve this error, to no avail. And frankly, nobody on our team has recently changed anything related to SSL or the Hazelcast Kubernetes discovery mechanism. We did recently update our Google Cloud SDK components (gcloud components update) at the urging of the Cloud SDK tools ("Updates are available for some Cloud SDK components."). We're now running Google Cloud SDK version 117.0.0, but I don't see any breaking changes related to SSL or TLS in the release notes.

Why would we suddenly start seeing this "fatal alert: protocol_version" error message in our Kubernetes pods, and how can I resolve it?

-- Ogre Psalm33
google-cloud-platform
hazelcast
java
kubernetes
ssl

1 Answer

7/18/2016

The initial Google searches indicating that this was a TLS version error (a TLS 1.0 vs. 1.2 incompatibility) turned out to be useful. This answer to a question about a similar SSLException protocol_version error is what pointed me in the right direction.

I got a test container to run and used kubectl exec my-test-pod -i -t -- /bin/bash -il to open an interactive bash shell in it. From there I determined that requests to the Kubernetes API endpoint queried by the Hazelcast discovery service could NOT connect using TLS 1.0, but could using TLS 1.2:

/opt/hazelcast# curl -k  --tlsv1.0 https://kubernetes.default.svc/api/v1/namespaces/default/endpoints/my-service
curl: (35) error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version

/opt/hazelcast# curl -k  --tlsv1.2 https://kubernetes.default.svc/api/v1/namespaces/default/endpoints/my-service
Unauthorized               # <-- Unauthorized is expected, as I didn't specify a user/passwd.
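The same check can be approximated from Java, which is closer to what the discovery client does. This is only a sketch: the class TlsProbe and its method names are hypothetical, port 443 is an assumption, and the host is the one from the curl commands above. Unlike curl -k, it uses the JVM's default trust store, so certificate validation is not skipped.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocket;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.security.GeneralSecurityException;

public class TlsProbe {
    // Creates an unconnected client socket with exactly one protocol version enabled.
    static SSLSocket socketFor(String protocol) {
        try {
            SSLContext ctx = SSLContext.getInstance("TLS");
            ctx.init(null, null, null); // default key managers, trust managers, and RNG
            SSLSocket socket = (SSLSocket) ctx.getSocketFactory().createSocket();
            socket.setEnabledProtocols(new String[] { protocol });
            return socket;
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Attempts a handshake using only the given protocol and describes the outcome.
    static String probe(String host, int port, String protocol) {
        try (SSLSocket socket = socketFor(protocol)) {
            socket.connect(new InetSocketAddress(host, port), 5000);
            socket.setSoTimeout(5000);
            socket.startHandshake();
            return "handshake ok (" + socket.getSession().getProtocol() + ")";
        } catch (Exception e) {
            return "handshake failed: " + e;
        }
    }

    public static void main(String[] args) {
        // Host taken from the curl commands above; pass another host as the first argument.
        String host = args.length > 0 ? args[0] : "kubernetes.default.svc";
        for (String protocol : new String[] { "TLSv1", "TLSv1.2" }) {
            System.out.println(protocol + " -> " + probe(host, 443, protocol));
        }
    }
}
```

When reading the output, look at the failure text: a rejected protocol version should mention protocol_version, as in the stack trace in the question, whereas an untrusted server certificate fails with a different, certificate-related exception even when the protocol version is accepted.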

I am still not sure exactly what changed: possibly a layer of a public Docker image we use, or possibly something within Google's cloud services (Java 7 is End of Life, after all). The fine folks at Hazelcast suggested that perhaps the REST API had been updated. Evidently, though, something changed that caused the endpoint used by the discovery service to expect clients to use TLS 1.2.

The solution was to download the Hazelcast Docker image we were using, and tweak it to use Java 8 instead of Java 7, and then rebuild the image in our own development sandbox:

$ pwd
/home/jdoe/devel/hazelcast-docker-3.6.2/hazelcast-oss
$ head -n3 Dockerfile
FROM java:8
ENV HZ_VERSION 3.6.2
ENV HZ_HOME /opt/hazelcast/

Voila! Our Deployment is running again.

-- Ogre Psalm33
Source: StackOverflow