I am using DataStax's C++ driver version 2.8.0 for Apache Cassandra inside a Kubernetes application. Cassandra is deployed as a 3-node cluster via this Helm chart.
The chart uses Kubernetes headless services to expose the Cassandra endpoints, so there is an entry in the Kubernetes DNS for those endpoints.
I have a C++ app running in a Kubernetes pod that interacts with Cassandra and uses that DNS entry to resolve the endpoints. The application holds a single Cassandra connection object, following the driver usage guidelines. The connection is initialized at the beginning of the program, and a failure to initialize it, or to execute a query later on, causes the program to fail.
Everything works fine, but Cassandra nodes/pods may eventually go down for some reason. When that happens, they are respawned but get assigned a different IP. The C++ driver seems able to pick up the new endpoints from the DNS without any additional code. However, in that situation the connection is not closed on the client side, and the previous endpoints appear to remain in the connection pool at some level. This leads to a series of log events similar to the following:
1531920921.161 [WARN] (src/pool.cpp:420:virtual void cass::Pool::on_close(cass::Connection*)): Connection pool was unable to reconnect to host XXX.XXX.XXX.XX because of the following error: Connection timeout
and
1531920921.894 [WARN] (src/pool.cpp:420:virtual void cass::Pool::on_close(cass::Connection*)): Connection pool was unable to reconnect to host XXX.XXX.XXX.XX because of the following error: Connect error 'host is unreachable'
These messages pop up every [reconnect timeout]. The more IP reassignments there are, the more log messages appear, which, as you can guess, can add up to a pretty large number for long-lived applications.
Is there a feature of the driver's API that deals with this? Or, more generally, is there a good/recommended way to handle it on the client side? One option, external to the driver, would be to reset the connection from the client code, but (although I may be missing something) I fail to see a way to "catch" such events: they only show up in the logs.
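
To illustrate what relying on the logs would involve, here is a minimal sketch of a helper that recognizes the warning shown above and extracts the unreachable host and the error text. `ReconnectFailure` and `parse_reconnect_failure` are hypothetical names, not part of the driver's API, and the sketch assumes the message wording stays exactly as in the log lines above. How the log text would reach this function is deliberately left out.

```cpp
#include <cstddef>
#include <string>

// Hypothetical type for illustration only; not part of the driver's API.
struct ReconnectFailure {
  std::string host;   // address the pool could not reconnect to
  std::string error;  // error text reported after the host
};

// Returns true and fills `out` if `line` contains the reconnect warning
// in the format shown in the logs above; returns false otherwise.
// Assumption: the wording of the warning does not change between versions.
bool parse_reconnect_failure(const std::string& line, ReconnectFailure& out) {
  static const std::string kHostMarker = "unable to reconnect to host ";
  static const std::string kErrorMarker = " because of the following error: ";

  const std::size_t marker_pos = line.find(kHostMarker);
  if (marker_pos == std::string::npos) return false;

  // The host starts right after the first marker and ends at the second.
  const std::size_t host_begin = marker_pos + kHostMarker.size();
  const std::size_t host_end = line.find(kErrorMarker, host_begin);
  if (host_end == std::string::npos) return false;

  out.host = line.substr(host_begin, host_end - host_begin);
  out.error = line.substr(host_end + kErrorMarker.size());
  return true;
}
```

Matching on message text like this is fragile, since it depends on the exact wording of a log line, which is why I would prefer an API-level way to detect stale hosts.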