I deployed a 3-node Elasticsearch cluster on AWS EKS. After launching the cluster, I can see 3 pods: 2 of them run fine, but one keeps failing, terminating, and restarting.
Below is the error log from the failing pod.
{"type": "server", "timestamp": "2021-12-26T08:17:33,061Z", "level": "INFO", "component": "o.e.i.g.DatabaseRegistry", "cluster.name": "elk", "node.name": "elk-es-node-1", "message": "downloading geoip database [GeoLite2-ASN.mmdb] to [/tmp/elasticsearch-9470345091343635510/geoip-databases/HoGUMQ9ISsCjQ4KhIL2IFA/GeoLite2-ASN.mmdb.tmp.gz]" }
{"type": "server", "timestamp": "2021-12-26T08:17:33,070Z", "level": "ERROR", "component": "o.e.i.g.DatabaseRegistry", "cluster.name": "elk", "node.name": "elk-es-node-1", "message": "failed to download database [GeoLite2-ASN.mmdb]",
"stacktrace": ["org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];",
"at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:179) ~[elasticsearch-7.15.2.jar:7.15.2]",
"at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:165) ~[elasticsearch-7.15.2.jar:7.15.2]",
"at org.elasticsearch.action.search.TransportSearchAction.executeSearch(TransportSearchAction.java:605) ~[elasticsearch-7.15.2.jar:7.15.2]",
"at org.elasticsearch.action.search.TransportSearchAction.executeLocalSearch(TransportSearchAction.java:494) ~[elasticsearch-7.15.2.jar:7.15.2]",
"at org.elasticsearch.action.search.TransportSearchAction.lambda$executeRequest$3(TransportSearchAction.java:288) ~[elasticsearch-7.15.2.jar:7.15.2]",
"at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:134) ~[elasticsearch-7.15.2.jar:7.15.2]",
"at org.elasticsearch.index.query.Rewriteable.rewriteAndFetch(Rewriteable.java:103) ~[elasticsearch-7.15.2.jar:7.15.2]",
"at org.elasticsearch.index.query.Rewriteable.rewriteAndFetch(Rewriteable.java:76) ~[elasticsearch-7.15.2.jar:7.15.2]",
"at org.elasticsearch.action.search.TransportSearchAction.executeRequest(TransportSearchAction.java:329) ~[elasticsearch-7.15.2.jar:7.15.2]",
"at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:217) ~[elasticsearch-7.15.2.jar:7.15.2]",
"at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:93) ~[elasticsearch-7.15.2.jar:7.15.2]",
"at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:173) ~[elasticsearch-7.15.2.jar:7.15.2]",
"at org.elasticsearch.action.support.ActionFilter$Simple.apply(ActionFilter.java:42) ~[elasticsearch-7.15.2.jar:7.15.2]",
"at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:171) ~[elasticsearch-7.15.2.jar:7.15.2]",
"at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:149) ~[elasticsearch-7.15.2.jar:7.15.2]",
"at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:77) ~[elasticsearch-7.15.2.jar:7.15.2]",
"at org.elasticsearch.client.node.NodeClient.executeLocally(NodeClient.java:90) ~[elasticsearch-7.15.2.jar:7.15.2]",
"at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:70) ~[elasticsearch-7.15.2.jar:7.15.2]",
"at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:402) ~[elasticsearch-7.15.2.jar:7.15.2]",
"at org.elasticsearch.client.FilterClient.doExecute(FilterClient.java:54) ~[elasticsearch-7.15.2.jar:7.15.2]",
"at org.elasticsearch.client.OriginSettingClient.doExecute(OriginSettingClient.java:40) ~[elasticsearch-7.15.2.jar:7.15.2]",
"at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:402) ~[elasticsearch-7.15.2.jar:7.15.2]",
"at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:390) ~[elasticsearch-7.15.2.jar:7.15.2]",
"at org.elasticsearch.client.support.AbstractClient.search(AbstractClient.java:534) ~[elasticsearch-7.15.2.jar:7.15.2]",
"at org.elasticsearch.ingest.geoip.DatabaseRegistry.lambda$retrieveDatabase$11(DatabaseRegistry.java:359) [ingest-geoip-7.15.2.jar:7.15.2]",
"at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:678) [elasticsearch-7.15.2.jar:7.15.2]",
"at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]",
"at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]",
"at java.lang.Thread.run(Thread.java:833) [?:?]"] }
{"type": "server", "timestamp": "2021-12-26T08:17:33,295Z", "level": "INFO", "component": "o.e.l.LicenseService", "cluster.name": "elk", "node.name": "elk-es-node-1", "message": "license [8a88ef40-3b0b-439e-9f46-32e911999b7d] mode [basic] - valid" }
{"type": "server", "timestamp": "2021-12-26T08:17:33,309Z", "level": "INFO", "component": "o.e.h.AbstractHttpServerTransport", "cluster.name": "elk", "node.name": "elk-es-node-1", "message": "publish_address {elk-es-node-1.elk-es-node.default.svc/10.0.1.182:9200}, bound_addresses {0.0.0.0:9200}", "cluster.uuid": "hqRP62pNTze1IWQ0sOOR2Q", "node.id": "HoGUMQ9ISsCjQ4KhIL2IFA" }
{"type": "server", "timestamp": "2021-12-26T08:17:33,310Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "elk", "node.name": "elk-es-node-1", "message": "started", "cluster.uuid": "hqRP62pNTze1IWQ0sOOR2Q", "node.id": "HoGUMQ9ISsCjQ4KhIL2IFA" }
The error message says "failed to download database [GeoLite2-ASN.mmdb]", but I don't know what this means.
Below is my Elasticsearch K8S spec file.
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elk
spec:
  version: 7.15.2
  serviceAccountName: docker-sa
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  nodeSets:
  - name: node
    count: 3
    config:
      network.host: 0.0.0.0
      xpack.security.enabled: false
    podTemplate:
      spec:
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        containers:
        - name: elasticsearch
          readinessProbe:
            exec:
              command:
              - bash
              - -c
              - /mnt/elastic-internal/scripts/readiness-probe-script.sh
            failureThreshold: 3
            initialDelaySeconds: 10
            periodSeconds: 12
            successThreshold: 1
            timeoutSeconds: 12
          env:
          - name: READINESS_PROBE_TIMEOUT
            value: "120"
          resources:
            requests:
              cpu: 1
              memory: 4Gi
          volumeMounts:
          - name: elasticsearch-data
            mountPath: /usr/share/elasticsearch/data
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        storageClassName: ebs-sc
        resources:
          requests:
            storage: 1024Gi
Any idea why this happens?
I have provisioned a node group of EC2 instances with a desired count of 2 and a maximum of 3. At the moment, only 2 nodes are running.
Below is the node information:
$ kubectl get nodes
NAME                                            STATUS   ROLES    AGE     VERSION
ip-10-0-1-216.ap-southeast-2.compute.internal   Ready    <none>   6d16h   v1.21.5-eks-bc4871b
ip-10-0-2-24.ap-southeast-2.compute.internal    Ready    <none>   6d17h   v1.21.5-eks-bc4871b
$ kubectl describe node ip-10-0-1-216.ap-southeast-2.compute.internal
Name: ip-10-0-1-216.ap-southeast-2.compute.internal
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=t3.xlarge
beta.kubernetes.io/os=linux
eks.amazonaws.com/capacityType=ON_DEMAND
eks.amazonaws.com/nodegroup=elk
eks.amazonaws.com/nodegroup-image=ami-045371401c5f70a1e
failure-domain.beta.kubernetes.io/region=ap-southeast-2
failure-domain.beta.kubernetes.io/zone=ap-southeast-2a
kubernetes.io/arch=amd64
kubernetes.io/hostname=ip-10-0-1-216.ap-southeast-2.compute.internal
kubernetes.io/os=linux
node.kubernetes.io/instance-type=t3.xlarge
topology.ebs.csi.aws.com/zone=ap-southeast-2a
topology.kubernetes.io/region=ap-southeast-2
topology.kubernetes.io/zone=ap-southeast-2a
Annotations: csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-0fd067997eddeaf86"}
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Fri, 24 Dec 2021 20:29:13 +1100
Taints: <none>
Unschedulable: false
Lease:
HolderIdentity: ip-10-0-1-216.ap-southeast-2.compute.internal
AcquireTime: <unset>
RenewTime: Fri, 31 Dec 2021 13:25:51 +1100
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Fri, 31 Dec 2021 13:25:52 +1100 Mon, 27 Dec 2021 20:44:02 +1100 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 31 Dec 2021 13:25:52 +1100 Fri, 24 Dec 2021 20:29:13 +1100 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Fri, 31 Dec 2021 13:25:52 +1100 Fri, 24 Dec 2021 20:29:13 +1100 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Fri, 31 Dec 2021 13:25:52 +1100 Fri, 24 Dec 2021 20:29:34 +1100 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 10.0.1.216
Hostname: ip-10-0-1-216.ap-southeast-2.compute.internal
InternalDNS: ip-10-0-1-216.ap-southeast-2.compute.internal
Capacity:
attachable-volumes-aws-ebs: 25
cpu: 4
ephemeral-storage: 20959212Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 16205904Ki
pods: 58
Allocatable:
attachable-volumes-aws-ebs: 25
cpu: 3920m
ephemeral-storage: 18242267924
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 15189072Ki
pods: 58
System Info:
Machine ID: ec2ac34dd2e6a84bdc340dd6a62c3514
System UUID: ec2ac34d-d2e6-a84b-dc34-0dd6a62c3514
Boot ID: 26c6dde2-1131-4068-9265-c0c4e12d54d3
Kernel Version: 5.4.156-83.273.amzn2.x86_64
OS Image: Amazon Linux 2
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://20.10.7
Kubelet Version: v1.21.5-eks-bc4871b
Kube-Proxy Version: v1.21.5-eks-bc4871b
ProviderID: aws:///ap-southeast-2a/i-0fd067997eddeaf86
Non-terminated Pods: (14 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
cert-manager cert-manager-68ff46b886-tzfqw 0 (0%) 0 (0%) 0 (0%) 0 (0%) 3d16h
cert-manager cert-manager-cainjector-7cdbb9c945-w99rf 0 (0%) 0 (0%) 0 (0%) 0 (0%) 3d16h
cert-manager cert-manager-webhook-58d45d56b8-wsmwp 0 (0%) 0 (0%) 0 (0%) 0 (0%) 3d16h
default elk-es-node-0 1 (25%) 100m (2%) 4Gi (27%) 50Mi (0%) 3d16h
default sidecar-66f887c666-pbbcv 0 (0%) 0 (0%) 0 (0%) 0 (0%) 3d19h
elastic-system elastic-operator-0 100m (2%) 1 (25%) 150Mi (1%) 512Mi (3%) 3d16h
kube-system aws-load-balancer-controller-9c59c86d8-jd6j9 100m (2%) 200m (5%) 200Mi (1%) 500Mi (3%) 3d16h
kube-system aws-node-hpnxj 10m (0%) 0 (0%) 0 (0%) 0 (0%) 4d14h
kube-system cluster-autoscaler-76fd4db4c-lcff5 100m (2%) 100m (2%) 600Mi (4%) 600Mi (4%) 3d16h
kube-system coredns-68f7974869-x4kxs 100m (2%) 0 (0%) 70Mi (0%) 170Mi (1%) 3d16h
kube-system ebs-csi-controller-55b9f85d5c-c94m5 0 (0%) 0 (0%) 0 (0%) 0 (0%) 3d16h
kube-system ebs-csi-controller-55b9f85d5c-mt57k 0 (0%) 0 (0%) 0 (0%) 0 (0%) 3d16h
kube-system ebs-csi-node-692n6 0 (0%) 0 (0%) 0 (0%) 0 (0%) 4d13h
kube-system kube-proxy-cqll5 100m (2%) 0 (0%) 0 (0%) 0 (0%) 4d14h
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1510m (38%) 1400m (35%)
memory 5116Mi (34%) 1832Mi (12%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
attachable-volumes-aws-ebs 0 0
Events: <none>
Name: ip-10-0-2-24.ap-southeast-2.compute.internal
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=t3.xlarge
beta.kubernetes.io/os=linux
eks.amazonaws.com/capacityType=ON_DEMAND
eks.amazonaws.com/nodegroup=elk
eks.amazonaws.com/nodegroup-image=ami-045371401c5f70a1e
failure-domain.beta.kubernetes.io/region=ap-southeast-2
failure-domain.beta.kubernetes.io/zone=ap-southeast-2b
kubernetes.io/arch=amd64
kubernetes.io/hostname=ip-10-0-2-24.ap-southeast-2.compute.internal
kubernetes.io/os=linux
node.kubernetes.io/instance-type=t3.xlarge
topology.ebs.csi.aws.com/zone=ap-southeast-2b
topology.kubernetes.io/region=ap-southeast-2
topology.kubernetes.io/zone=ap-southeast-2b
Annotations: csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-0bac62ee2ae10a59a"}
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Fri, 24 Dec 2021 20:23:12 +1100
Taints: <none>
Unschedulable: false
Lease:
HolderIdentity: ip-10-0-2-24.ap-southeast-2.compute.internal
AcquireTime: <unset>
RenewTime: Fri, 31 Dec 2021 13:26:12 +1100
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Fri, 31 Dec 2021 13:25:52 +1100 Mon, 27 Dec 2021 20:23:13 +1100 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 31 Dec 2021 13:25:52 +1100 Fri, 24 Dec 2021 20:23:12 +1100 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Fri, 31 Dec 2021 13:25:52 +1100 Fri, 24 Dec 2021 20:23:12 +1100 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Fri, 31 Dec 2021 13:25:52 +1100 Fri, 24 Dec 2021 20:23:32 +1100 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 10.0.2.24
Hostname: ip-10-0-2-24.ap-southeast-2.compute.internal
InternalDNS: ip-10-0-2-24.ap-southeast-2.compute.internal
Capacity:
attachable-volumes-aws-ebs: 25
cpu: 4
ephemeral-storage: 20959212Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 16205904Ki
pods: 58
Allocatable:
attachable-volumes-aws-ebs: 25
cpu: 3920m
ephemeral-storage: 18242267924
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 15189072Ki
pods: 58
System Info:
Machine ID: ec29ef8b81ba9ab1229c8348fa8f6c97
System UUID: ec29ef8b-81ba-9ab1-229c-8348fa8f6c97
Boot ID: 8bf36296-625d-4749-8e81-abe0d7dd85c8
Kernel Version: 5.4.156-83.273.amzn2.x86_64
OS Image: Amazon Linux 2
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://20.10.7
Kubelet Version: v1.21.5-eks-bc4871b
Kube-Proxy Version: v1.21.5-eks-bc4871b
ProviderID: aws:///ap-southeast-2b/i-0bac62ee2ae10a59a
Non-terminated Pods: (7 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
default elk-es-node-1 1 (25%) 100m (2%) 4Gi (27%) 50Mi (0%) 3d16h
default kibana-kb-7f66d6978d-6knrx 100m (2%) 100m (2%) 1Gi (6%) 1Gi (6%) 3d16h
default transform-798f5758cd-4x94w 1 (25%) 0 (0%) 2Gi (13%) 0 (0%) 6d16h
kube-system aws-node-qmfg7 10m (0%) 0 (0%) 0 (0%) 0 (0%) 6d17h
kube-system coredns-68f7974869-wqs2k 100m (2%) 0 (0%) 70Mi (0%) 170Mi (1%) 3d16h
kube-system ebs-csi-node-48h2n 0 (0%) 0 (0%) 0 (0%) 0 (0%) 6d17h
kube-system kube-proxy-f99dh 100m (2%) 0 (0%) 0 (0%) 0 (0%) 6d17h
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 2310m (58%) 200m (5%)
memory 7238Mi (48%) 1244Mi (8%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
attachable-volumes-aws-ebs 0 0
Events: <none>
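The allocatable figures and request totals in the two node descriptions above can be turned into a quick headroom check. This is only a rough sketch in Python: the numbers are copied from the output above and from the spec (cpu: 1, memory: 4Gi per Elasticsearch pod), the names `headroom` and `fits` are hypothetical helpers, and the check looks at CPU and memory requests only. Real scheduling also depends on other constraints, such as the zone of the EBS volume, which this sketch ignores.

```python
# Figures copied from the `kubectl describe node` output above.
ALLOCATABLE_CPU_M = 3920        # millicores, identical on both nodes
ALLOCATABLE_MEM_KI = 15189072   # KiB, identical on both nodes

# "Allocated resources" request totals per node.
NODES = {
    "ip-10-0-1-216": {"cpu_m": 1510, "mem_mi": 5116},
    "ip-10-0-2-24": {"cpu_m": 2310, "mem_mi": 7238},
}

# Requests of one Elasticsearch pod, taken from the spec: cpu 1, memory 4Gi.
POD_CPU_M = 1000
POD_MEM_MI = 4 * 1024


def headroom(requested):
    """Return (free CPU in millicores, free memory in MiB) for one node."""
    cpu_free = ALLOCATABLE_CPU_M - requested["cpu_m"]
    mem_free = ALLOCATABLE_MEM_KI // 1024 - requested["mem_mi"]
    return cpu_free, mem_free


def fits(requested):
    """True if one more Elasticsearch pod fits by CPU and memory requests."""
    cpu_free, mem_free = headroom(requested)
    return cpu_free >= POD_CPU_M and mem_free >= POD_MEM_MI


for name, requested in NODES.items():
    cpu_free, mem_free = headroom(requested)
    print(f"{name}: {cpu_free}m CPU free, {mem_free}Mi memory free, "
          f"fits one more pod: {fits(requested)}")
```

By these numbers, both nodes have room for one more pod with the requests from the spec (2410m / 9717Mi free on the first node, 1610m / 7595Mi on the second), so CPU and memory requests alone would not prevent a third pod from being scheduled.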