I am currently running Apache NiFi as a StatefulSet on Kubernetes. I'm testing to see how the cluster recovers if I kill a pod but am experiencing a problem when the pod (NiFi node) rejoins the cluster.
The node rejoins as an additional node instead of under its original identity. For example, if I have a 3-node NiFi cluster and kill and restart one pod/NiFi node, I end up with a 4-node cluster with one node disconnected.
Before: 3/3 nodes connected
After: 3/4 nodes connected (the restarted pod joins as a new node; the old identity shows as disconnected)
I believe that the node's identity is stored somewhere in a config file which isn't persisted when the pod is killed. So far I am using persistent volumes to persist the following config files:
I haven't persisted nifi.properties (it is dynamically generated on startup, and I can't see anything in there that could uniquely identify the node).
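For illustration, the mounts follow the usual per-file subPath pattern, something like this (the file and claim names here are just examples, not the full list):

```yaml
# Illustrative fragment only: per-file mounts of the kind described
# above. File and claim names are examples; the paths assume the
# official apache/nifi image layout (/opt/nifi/nifi-current).
volumeMounts:
  - name: nifi-conf
    mountPath: /opt/nifi/nifi-current/conf/flow.xml.gz
    subPath: flow.xml.gz
  - name: nifi-conf
    mountPath: /opt/nifi/nifi-current/conf/authorizers.xml
    subPath: authorizers.xml
```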
So I guess the question is: how is a node uniquely identified to the cluster, and where is that identity stored?
EDIT: I'm using an external ZooKeeper.
Thank you in advance,
Harry
Each node stores the state of the cluster in its local state manager, which by default is written to a write-ahead log under nifi-home/state/local (the directory is set by the local provider in conf/state-management.xml, default ./state/local). Most likely you are losing the state/local directory when the node is restarted.
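If the state directory is backed by a per-pod persistent volume, the restarted pod reattaches the same claim and comes back with its original identity. A minimal sketch, assuming the official apache/nifi image (NiFi home /opt/nifi/nifi-current); names and sizes are illustrative:

```yaml
# Sketch of a StatefulSet that persists NiFi's local state directory,
# so the write-ahead log in state/local survives a pod restart.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nifi
spec:
  serviceName: nifi
  replicas: 3
  selector:
    matchLabels:
      app: nifi
  template:
    metadata:
      labels:
        app: nifi
    spec:
      containers:
        - name: nifi
          image: apache/nifi:latest   # pin a real version in practice
          volumeMounts:
            # Persist the local state manager's write-ahead log so the
            # node keeps its cluster identity across restarts.
            - name: nifi-state
              mountPath: /opt/nifi/nifi-current/state
  volumeClaimTemplates:
    - metadata:
        name: nifi-state
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```

Because volumeClaimTemplates create one PVC per ordinal (e.g. nifi-state-nifi-1), a killed pod such as nifi-1 is rescheduled with the same claim, so nothing in state/local is lost.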