Lagom App error when deployed on Kubernetes Single Node Self Contained

7/18/2018

The pods are running but producing the warnings below, over and over again.

When using sbt runAll everything works fine. Deployed on Kubernetes (with resources generated by the RP) it does not.

I'm using:

rp generate-kubernetes-resources "xrates-list-impl:1.0.1"
--generate-all --registry-use-local --pod-controller-replicas 3 --external-service "kafka_native=_broker._tcp.kafka.default.svc.cluster.local"
--external-service "cas_native=_server._tcp.cassandra.default.svc.cluster.local" --env JAVA_OPTS="-Dplay.http.secret.key=****" > xrates-list-resource.yaml

Can anybody point me in which direction should I investigate?

2018-07-18T22:07:03.890Z [warn] akka.cluster.sharding.ShardRegion [sourceThread=application-akka.actor.default-dispatcher-2, akkaTimestamp=22:07:03.890UTC, akkaSource=akka.tcp://application@192.168.55.137:10001/system/sharding/kafkaProducer-xrateslist, sourceActorSystem=application] - Trying to register to coordinator at [ActorSelection[Anchor(akka.tcp://application@192.168.55.134:10001/), Path(/system/sharding/kafkaProducer-xrateslistCoordinator/singleton/coordinator)]], but no acknowledgement. Total [46] buffered messages. [Coordinator [Member(address = akka.tcp://application@192.168.55.134:10001, status = Up)] is reachable.]
2018-07-18T22:07:04.680Z [warn] akka.cluster.sharding.ShardRegion [sourceThread=application-akka.actor.default-dispatcher-5, akkaTimestamp=22:07:04.680UTC, akkaSource=akka.tcp://application@192.168.55.137:10001/system/sharding/XRatesListProcessor, sourceActorSystem=application] - Trying to register to coordinator at [ActorSelection[Anchor(akka.tcp://application@192.168.55.134:10001/), Path(/system/sharding/XRatesListProcessorCoordinator/singleton/coordinator)]], but no acknowledgement. Total [46] buffered messages. [Coordinator [Member(address = akka.tcp://application@192.168.55.134:10001, status = Up)] is reachable.]
2018-07-18T22:07:05.879Z [warn] akka.cluster.sharding.ShardRegion [sourceThread=application-akka.actor.default-dispatcher-6, akkaTimestamp=22:07:05.879UTC, akkaSource=akka.tcp://application@192.168.55.137:10001/system/sharding/kafkaProducer-xrateslist, sourceActorSystem=application] - Trying to register to coordinator at [ActorSelection[Anchor(akka.tcp://application@192.168.55.134:10001/), Path(/system/sharding/kafkaProducer-xrateslistCoordinator/singleton/coordinator)]], but no acknowledgement. Total [46] buffered messages. [Coordinator [Member(address = akka.tcp://application@192.168.55.134:10001, status = Up)] is reachable.]
2018-07-18T22:07:06.679Z [warn] akka.cluster.sharding.ShardRegion [sourceThread=application-akka.actor.default-dispatcher-4, akkaTimestamp=22:07:06.679UTC, akkaSource=akka.tcp://application@192.168.55.137:10001/system/sharding/XRatesListProcessor, sourceActorSystem=application] - Trying to register to coordinator at [ActorSelection[Anchor(akka.tcp://application@192.168.55.134:10001/), Path(/system/sharding/XRatesListProcessorCoordinator/singleton/coordinator)]], but no acknowledgement. Total [47] buffered messages. [Coordinator [Member(address = akka.tcp://application@192.168.55.134:10001, status = Up)] is reachable.]
2018-07-18T22:07:07.889Z [warn] akka.cluster.sharding.ShardRegion [sourceThread=application-akka.actor.default-dispatcher-4, akkaTimestamp=22:07:07.889UTC, akkaSource=akka.tcp://application@192.168.55.137:10001/system/sharding/kafkaProducer-xrateslist, sourceActorSystem=application] - Trying to register to coordinator at [ActorSelection[Anchor(akka.tcp://application@192.168.55.134:10001/), Path(/system/sharding/kafkaProducer-xrateslistCoordinator/singleton/coordinator)]], but no acknowledgement. Total [47] buffered messages. [Coordinator [Member(address = akka.tcp://application@192.168.55.134:10001, status = Up)] is reachable.]
2018-07-18T22:07:08.679Z [warn] akka.cluster.sharding.ShardRegion [sourceThread=application-akka.actor.default-dispatcher-4, akkaTimestamp=22:07:08.679UTC, akkaSource=akka.tcp://application@192.168.55.137:10001/system/sharding/XRatesListProcessor, sourceActorSystem=application] - Trying to register to coordinator at [ActorSelection[Anchor(akka.tcp://application@192.168.55.134:10001/), Path(/system/sharding/XRatesListProcessorCoordinator/singleton/coordinator)]], but no acknowledgement. Total [47] buffered messages. [Coordinator [Member(address = akka.tcp://application@192.168.55.134:10001, status = Up)] is reachable.]

Added after James Roper answer

The RP generates a Kubernetes deployment which is configured to use the akka.discovery.method = kubernetes-api

From what I understand, the pods also have properly joined the cluster, but then it tries:

Trying to register to coordinator at  [ActorSelection[Anchor(akka://application/), Path(/system/sharding/XRatesListProcessorCoordinator/singleton/coordinator)]], but no acknowledgemen t.

See the log below:

2018-07-19T08:21:50.627Z [info] akka.management.cluster.bootstrap.contactpoint.HttpClusterBootstrapRoutes [sourceThread=application-akka.actor.default-dispatcher-20, akkaTimestamp=08:21:50.627UTC, akkaSource=HttpClusterBootstrap
Routes(akka://application), sourceActorSystem=application] - Bootstrap request from 192.168.55.181:52210: Contact Point returning 1 seed-nodes ([Set(ClusterMember(akka.tcp://application@192.168.55.130:10001,-609640290,Up,Set(dc-default
)))])
2018-07-19T08:21:50.788Z [info] akka.cluster.Cluster(akka://application) [sourceThread=application-akka.actor.default-dispatcher-20, akkaTimestamp=08:21:50.788UTC, akkaSource=akka.cluster.Cluster(akka://application), sourceActor
System=application] - Cluster Node [akka.tcp://application@192.168.55.130:10001] - Received InitJoin message from [Actor[akka.tcp://application@192.168.55.181:10001/system/cluster/core/daemon/joinSeedNodeProcess-1#1697915162]] to [akka
.tcp://application@192.168.55.130:10001]
2018-07-19T08:21:50.788Z [info] akka.cluster.Cluster(akka://application) [sourceThread=application-akka.actor.default-dispatcher-20, akkaTimestamp=08:21:50.788UTC, akkaSource=akka.cluster.Cluster(akka://application), sourceActor
System=application] - Cluster Node [akka.tcp://application@192.168.55.130:10001] - Sending InitJoinAck message from node [akka.tcp://application@192.168.55.130:10001] to [Actor[akka.tcp://application@192.168.55.181:10001/system/cluster
/core/daemon/joinSeedNodeProcess-1#1697915162]]
2018-07-19T08:21:50.831Z [info] akka.cluster.singleton.ClusterSingletonProxy [sourceThread=application-akka.actor.default-dispatcher-6, akkaTimestamp=08:21:50.830UTC, akkaSource=akka.tcp://application@192.168.55.130:10001/user/c
assandraOffsetStorePrepare-singletonProxy, sourceActorSystem=application] - Singleton identified at [akka://application/user/cassandraOffsetStorePrepare-singleton/singleton]
2018-07-19T08:21:50.834Z [info] akka.cluster.singleton.ClusterSingletonProxy [sourceThread=application-akka.actor.default-dispatcher-4, akkaTimestamp=08:21:50.834UTC, akkaSource=akka.tcp://application@192.168.55.130:10001/user/r
eadSideGlobalPrepare-XRatesListProcessor-singletonProxy, sourceActorSystem=application] - Singleton identified at [akka://application/user/readSideGlobalPrepare-XRatesListProcessor-singleton/singleton]
2018-07-19T08:21:50.957Z [info] akka.cluster.Cluster(akka://application) [sourceThread=application-akka.actor.default-dispatcher-4, akkaTimestamp=08:21:50.957UTC, akkaSource=akka.cluster.Cluster(akka://application), sourceActorS
ystem=application] - Cluster Node [akka.tcp://application@192.168.55.130:10001] - Node [akka.tcp://application@192.168.55.181:10001] is JOINING, roles [dc-default]
2018-07-19T08:21:50.974Z [info] akka.management.cluster.bootstrap.contactpoint.HttpClusterBootstrapRoutes [sourceThread=application-akka.actor.default-dispatcher-4, akkaTimestamp=08:21:50.973UTC, akkaSource=HttpClusterBootstrapR
outes(akka://application), sourceActorSystem=application] - Bootstrap request from 192.168.55.142:45738: Contact Point returning 2 seed-nodes ([Set(ClusterMember(akka.tcp://application@192.168.55.130:10001,-609640290,Up,Set(dc-default)
), ClusterMember(akka.tcp://application@192.168.55.181:10001,152184767,Joining,Set(dc-default)))])
2018-07-19T08:21:51.151Z [info] akka.cluster.Cluster(akka://application) [sourceThread=application-akka.actor.default-dispatcher-5, akkaTimestamp=08:21:51.150UTC, akkaSource=akka.cluster.Cluster(akka://application), sourceActorS
ystem=application] - Cluster Node [akka.tcp://application@192.168.55.130:10001] - Leader is moving node [akka.tcp://application@192.168.55.181:10001] to [Up]
2018-07-19T08:21:51.258Z [info] akka.cluster.Cluster(akka://application) [sourceThread=application-akka.actor.default-dispatcher-5, akkaTimestamp=08:21:51.258UTC, akkaSource=akka.cluster.Cluster(akka://application), sourceActorS
ystem=application] - Cluster Node [akka.tcp://application@192.168.55.130:10001] - Received InitJoin message from [Actor[akka.tcp://application@192.168.55.142:10001/system/cluster/core/daemon/joinSeedNodeProcess-1#1126318041]] to [akka.
tcp://application@192.168.55.130:10001]
2018-07-19T08:21:51.259Z [info] akka.cluster.Cluster(akka://application) [sourceThread=application-akka.actor.default-dispatcher-5, akkaTimestamp=08:21:51.259UTC, akkaSource=akka.cluster.Cluster(akka://application), sourceActorS
ystem=application] - Cluster Node [akka.tcp://application@192.168.55.130:10001] - Sending InitJoinAck message from node [akka.tcp://application@192.168.55.130:10001] to [Actor[akka.tcp://application@192.168.55.142:10001/system/cluster/
core/daemon/joinSeedNodeProcess-1#1126318041]]
2018-07-19T08:21:51.443Z [info] akka.cluster.Cluster(akka://application) [sourceThread=application-akka.actor.default-dispatcher-22, akkaTimestamp=08:21:51.442UTC, akkaSource=akka.cluster.Cluster(akka://application), sourceActor
System=application] - Cluster Node [akka.tcp://application@192.168.55.130:10001] - Node [akka.tcp://application@192.168.55.142:10001] is JOINING, roles [dc-default]
2018-07-19T08:21:52.150Z [info] akka.cluster.Cluster(akka://application) [sourceThread=application-akka.actor.default-dispatcher-5, akkaTimestamp=08:21:52.149UTC, akkaSource=akka.cluster.Cluster(akka://application), sourceActorS
ystem=application] - Cluster Node [akka.tcp://application@192.168.55.130:10001] - Leader is moving node [akka.tcp://application@192.168.55.142:10001] to [Up]
2018-07-19T08:21:54.811Z [warn] akka.cluster.sharding.ShardRegion [sourceThread=application-akka.actor.default-dispatcher-7, akkaTimestamp=08:21:54.810UTC, akkaSource=akka.tcp://application@192.168.55.130:10001/system/sharding/X
RatesListProcessor, sourceActorSystem=application] - Trying to register to coordinator at [ActorSelection[Anchor(akka://application/), Path(/system/sharding/XRatesListProcessorCoordinator/singleton/coordinator)]], but no acknowledgemen
t. Total [1] buffered messages. [Coordinator [Member(address = akka.tcp://application@192.168.55.130:10001, status = Up)] is reachable.]
2018-07-19T08:21:54.925Z [error] akka.cluster.sharding.PersistentShardCoordinator [sourceThread=application-akka.actor.default-dispatcher-6, akkaTimestamp=08:21:54.922UTC, akkaSource=akka.tcp://application@192.168.55.130:10001/s
ystem/sharding/XRatesListEntityCoordinator/singleton/coordinator, sourceActorSystem=application] - Persistence failure when replaying events for persistenceId [/sharding/XRatesListEntityCoordinator]. Last known sequence number [0]
java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://application/system/IO-DNS#1459575867]] after [5000 ms]. Sender[null] sent message of type "akka.io.Dns$Resolve".
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
-- monad
akka
kubernetes
lagom

1 Answer

7/19/2018

The most likely cause of those error messages is that your node hasn't formed/joined a cluster, and so is refusing to do any clustered operations (such as deciding which nodes which persistent entity shards should be started on) that might be unsafe.

As an aside, to understand why it refuses to do anything, when a node starts up, it needs to make sure that before it does anything, it joins a cluster. If it doesn't join a cluster, then there's a risk that there might already be a cluster running that it doesn't know about. The consequence of ignoring this and just starting like normal is that your persistent entities may end up running on multiple nodes at once, which will undermine their consistency, a command might be executed on one node, leading to a particular state, but if due to there being multiple clusters, the same entity is also running on another node, that node won't be aware of the previous command and subsequent state change. It then might accept another command, one that would be invalid due to the state change on the other node, but allow it. So when a node starts up, if it can't find a cluster to join, it's too dangerous for it to just on a whim decide to form a new cluster due to this (the most common place where this is a problem in practice is when you have no cluster, and start multiple nodes at once. Since they are all starting, they might not discover each other, and potentially will all decide to form their own clusters). Instead you need to have some mechanism that's able to authoritatively, or at least somewhat reliably, know that there is no existing cluster, so that a new cluster is allowed to be formed under that circumstance. This won't happen automatically, you need to use one of the mechanisms described below.

The best and easiest way to bootstrap the cluster on Kubernetes is to use Akka cluster bootstrap:

https://developer.lightbend.com/docs/akka-management/current/bootstrap.html

This uses DNS to discover other contact points, and reliably decide whether it's safe to create a new cluster. It's not perfect, you can read about the edge cases in the docs. Another approach is to use Kubernetes stateful sets:

https://developer.lightbend.com/guides/akka-cluster-kubernetes-k8s-deploy/

This requires a little more to configure since you need to create the stateful set, but as I understand it it doesn't have the same edge cases where it could fail.

-- James Roper
Source: StackOverflow