I have created a Kubernetes cluster using EKS and deployed my application to it with 3 replicas. The application connects to an Aurora DB instance that is accessible through a public URL. This morning, on the first use of the application, an UnknownHostException appeared in my pod logs:
2019-01-30 08:34:47.352 WARN 5 --- [onnection adder] unknown.jul.logger : IOException occurred while connecting to my-database-aurora-psql.cc3ft0tcxorz.eu-north-1.rds.amazonaws.com:5999
java.net.UnknownHostException: my-database-aurora-psql.cc3ft0tcxorz.eu-north-1.rds.amazonaws.com
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184) ~[na:1.8.0_181]
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[na:1.8.0_181]
at java.net.Socket.connect(Socket.java:589) ~[na:1.8.0_181]
at org.postgresql.core.PGStream.<init>(PGStream.java:69) ~[postgresql-42.2.1.jar!/:42.2.1]
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:158) ~[postgresql-42.2.1.jar!/:42.2.1]
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49) [postgresql-42.2.1.jar!/:42.2.1]
at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:195) [postgresql-42.2.1.jar!/:42.2.1]
at org.postgresql.Driver.makeConnection(Driver.java:452) [postgresql-42.2.1.jar!/:42.2.1]
at org.postgresql.Driver.connect(Driver.java:254) [postgresql-42.2.1.jar!/:42.2.1]
at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:117) [HikariCP-2.7.8.jar!/:na]
at com.zaxxer.hikari.pool.PoolBase.newConnection(PoolBase.java:365) [HikariCP-2.7.8.jar!/:na]
at com.zaxxer.hikari.pool.PoolBase.newPoolEntry(PoolBase.java:194) [HikariCP-2.7.8.jar!/:na]
at com.zaxxer.hikari.pool.HikariPool.createPoolEntry(HikariPool.java:460) [HikariCP-2.7.8.jar!/:na]
at com.zaxxer.hikari.pool.HikariPool.access$100(HikariPool.java:71) [HikariCP-2.7.8.jar!/:na]
at com.zaxxer.hikari.pool.HikariPool$PoolEntryCreator.call(HikariPool.java:697) [HikariCP-2.7.8.jar!/:na]
at com.zaxxer.hikari.pool.HikariPool$PoolEntryCreator.call(HikariPool.java:683) [HikariCP-2.7.8.jar!/:na]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_181]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
I have another application using this database, and it had no problem accessing the same DB URL. I can also see from the console that the DB was never down. The Aurora DB is running in the same AWS zone as the EKS Kubernetes cluster. Could this be caused by some internal network problem? Is EKS using internal routing in this case? I suspect that some internal route failed, since the other application, which does not run in this cluster (or in AWS at all), did not have this issue.
This looks most like a temporary DNS resolution failure, either on the Route 53 side (which could explain why your other application wasn't impacted) or on the client side. It will be very difficult to track down unless you can reproduce it and capture more logs and network statistics (for example, netstat output) while the issue is happening.
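If you want more evidence the next time it happens, one option is to resolve the hostname yourself with a few retries and log how long each attempt takes. The following is only a minimal sketch, not something taken from your setup: the class name `DnsRetry`, the `Resolver` interface, and the retry count and delay are all hypothetical choices for illustration. The only real lookup call used is `InetAddress.getAllByName`.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic helper: retries a DNS lookup and logs each attempt.
public class DnsRetry {

    // Hypothetical seam so the lookup can be replaced (e.g. in tests).
    @FunctionalInterface
    public interface Resolver {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    // Tries to resolve `host` up to `maxAttempts` times, sleeping `delayMillis`
    // between failed attempts. Rethrows the last UnknownHostException if all fail.
    public static InetAddress[] resolveWithRetry(String host, int maxAttempts,
                                                 long delayMillis, Resolver resolver)
            throws UnknownHostException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        UnknownHostException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            long start = System.nanoTime();
            try {
                InetAddress[] addresses = resolver.resolve(host);
                long tookMs = (System.nanoTime() - start) / 1_000_000;
                System.out.println("DNS attempt " + attempt + " for " + host
                        + " succeeded in " + tookMs + " ms");
                return addresses;
            } catch (UnknownHostException e) {
                last = e;
                long tookMs = (System.nanoTime() - start) / 1_000_000;
                System.out.println("DNS attempt " + attempt + " for " + host
                        + " failed after " + tookMs + " ms: " + e.getMessage());
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Pass the database hostname as the first argument; defaults to localhost.
        String host = args.length > 0 ? args[0] : "localhost";
        InetAddress[] addresses =
                resolveWithRetry(host, 3, 500, InetAddress::getAllByName);
        for (InetAddress address : addresses) {
            System.out.println(host + " -> " + address.getHostAddress());
        }
    }
}
```

Running this from inside one of the pods with the Aurora hostname as the argument would show whether lookups fail intermittently there and whether a retry a moment later succeeds, which would support the transient DNS explanation.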