I have a simple Elasticsearch cluster running on a Kubernetes cluster, deployed with the Elasticsearch (ECK) operator, version 1.7.
This is how my Elasticsearch object looks:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: sifter-elastic-data-factory
spec:
  version: 7.10.1
  nodeSets:
  - name: master
    count: 1
    config:
      node.roles: [ master ]
    podTemplate:
      spec:
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 8Gi
              cpu: 3000m
            limits:
              memory: 8Gi
              cpu: 3000m
          env:
          - name: ES_JAVA_OPTS
            value: -Xms6g -Xmx6g
          - name: cluster.initial_master_nodes
            value: "sifter-elastic-data-factory-es-master-0"
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data-data-factory
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
        storageClassName: ssd
  - name: data
    count: 3
    config:
      node.roles: [ data, ingest ]
    podTemplate:
      spec:
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 8Gi
              cpu: 3000m
            limits:
              memory: 8Gi
              cpu: 3000m
          env:
          - name: ES_JAVA_OPTS
            value: -Xms6g -Xmx6g
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data-data-factory
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 60Gi
        storageClassName: ssd
  http:
    service:
      spec:
        type: ClusterIP
    tls:
      selfSignedCertificate:
        disabled: true
This works fine when one of the data nodes is restarted: the Kubernetes StatefulSet brings the deleted pod back up, the node discovers the current ES master, and it picks up from there.
But if the master node dies (or is deleted), the new master pod throws one of the following exceptions:
"Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid gsBEw4N2S-K31IxI4tu4-w than local cluster uuid QlL6zADsR_-8cF7mW4n9Og, rejecting",
or
{"type": "server", "timestamp": "2021-08-12T18:32:34,916Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "xxxxx-elastic-data-factory", "node.name": "xxxxx-elastic-data-factory-es-master-0", "message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and cluster.initial_master_nodes is empty on this node: have discovered {xxxxx-elastic-data-factory-es-master-0}{6ftRopASSq-jAh-Y7DOy_g}{vdTgG6vFSweeMmuTdCOkVw}{10.1.7.72}{10.1.7.72:9300}{lmr}{k8s_node_name=aks-npdev-10099729-vmss0000dv, ml.machine_memory=8589934592, xpack.installed=true, transform.node=false, ml.max_open_jobs=20}; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, ::1:9300, ::1:9301, ::1:9302, ::1:9303, ::1:9304, ::1:9305] from hosts providers and {xxxxx-elastic-data-factory-es-master-0}{6ftRopASSq-jAh-Y7DOy_g}{vdTgG6vFSweeMmuTdCOkVw}{10.1.7.72}{10.1.7.72:9300}{lmr}{k8s_node_name=aks-npdev-10099729-vmss0000dv, ml.machine_memory=8589934592, xpack.installed=true, transform.node=false, ml.max_open_jobs=20} from last-known cluster state; node term 0, last-accepted version 0 in term 0" }
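To make the mismatch in the first message easier to see, the two conflicting UUIDs can be extracted from the log line with a quick shell sketch (the sed patterns here are my own, not output of any Elastic tool):

```shell
#!/bin/sh
# The rejection message pasted above, as a shell variable.
msg='Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid gsBEw4N2S-K31IxI4tu4-w than local cluster uuid QlL6zADsR_-8cF7mW4n9Og, rejecting'

# UUID of the cluster state the node is being asked to join.
remote=$(echo "$msg" | sed -n 's/.*different cluster uuid \([^ ]*\) than.*/\1/p')
# UUID already persisted in the local node's data directory.
local_uuid=$(echo "$msg" | sed -n 's/.*local cluster uuid \([^,]*\),.*/\1/p')

echo "joining node sees cluster uuid $remote, but local data dir has $local_uuid"
```

So the restarted master is carrying a different cluster UUID on disk than the rest of the cluster, which is why the join is rejected.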
I am not sure why I get different errors on different occasions.
What should I do to keep the cluster running even if the master node goes down?