There was a pod named n404-neo4j-core-1 running on k8s-slave2. After k8s-slave2 was turned off, the pod got stuck in the Terminating state.
I was expecting the pod to be deleted and a new pod to be created on another node. If this problem is not resolved, the Neo4j cluster cannot stay highly available.
kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
n404-neo4j-core-0 1/1 Running 0 3d19h *** k8s-node1 <none> <none>
n404-neo4j-core-1 1/1 Terminating 0 78m *** k8s-slave2 <none> <none>
kubectl describe pod n404-neo4j-core-1
Name: n404-neo4j-core-1
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: k8s-slave2/10.176.6.67
Start Time: Mon, 01 Jun 2020 23:53:13 -0700
Labels: app.kubernetes.io/component=core
app.kubernetes.io/instance=n404
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=neo4j
controller-revision-hash=n404-neo4j-core-67484bd88
helm.sh/chart=neo4j-4.0.4-1
statefulset.kubernetes.io/pod-name=n404-neo4j-core-1
Annotations: <none>
Status: Terminating (lasts 21m)
Termination Grace Period: 30s
IP: 10.36.0.1
Controlled By: StatefulSet/n404-neo4j-core
Containers:
n404-neo4j:
Container ID: docker://a045d7747678ca62734800d153d01f634b9972b527289541d357cbc27456bf7b
Image: neo4j:4.0.4-enterprise
Image ID: docker-pullable://neo4j@sha256:714d83e56a5db61eb44d65c114720f8cb94b06cd044669e16957aac1bd1b5c34
Ports: 5000/TCP, 7000/TCP, 6000/TCP, 7474/TCP, 7687/TCP, 3637/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
Command:
/bin/bash
-c
export core_idx=$(hostname | sed 's|.*-||')
# Processes key configuration elements and exports env vars we need.
. /helm-init/init.sh
# We advertise the discovery-lb addresses (see discovery-lb.yaml) because
# it is for internal cluster comms and is limited to private ports.
export DISCOVERY_HOST="discovery-n404-neo4j-${core_idx}.default.svc.cluster.local"
export NEO4J_causal__clustering_discovery__advertised__address="$DISCOVERY_HOST:5000"
export NEO4J_causal__clustering_transaction__advertised__address="$DISCOVERY_HOST:6000"
export NEO4J_causal__clustering_raft__advertised__address="$DISCOVERY_HOST:7000"
echo "Starting Neo4j CORE $core_idx on $HOST"
exec /docker-entrypoint.sh "neo4j"
State: Running
Started: Mon, 01 Jun 2020 23:53:14 -0700
Ready: True
Restart Count: 0
Liveness: tcp-socket :7687 delay=300s timeout=2s period=10s #success=1 #failure=3
Readiness: tcp-socket :7687 delay=120s timeout=2s period=10s #success=1 #failure=3
Environment Variables from:
n404-neo4j-common-config ConfigMap Optional: false
n404-neo4j-core-config ConfigMap Optional: false
Environment:
NEO4J_SECRETS_PASSWORD: <set to the key 'neo4j-password' in secret 'n404-neo4j-secrets'> Optional: false
Mounts:
/data from datadir (rw)
/helm-init from init-script (rw)
/plugins from plugins (rw)
/var/run/secrets/kubernetes.io/serviceaccount from n404-neo4j-sa-token-jp7g9 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady True
PodScheduled True
Volumes:
datadir:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: datadir-n404-neo4j-core-1
ReadOnly: false
init-script:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: n404-init-script
Optional: false
plugins:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
n404-neo4j-sa-token-jp7g9:
Type: Secret (a volume populated by a Secret)
SecretName: n404-neo4j-sa-token-jp7g9
Optional: false
QoS Class: BestEffort
Node-Selectors: svc=neo4j
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
You should not shut down a Kubernetes node abruptly. If you do, you'll end up with strange scenarios like this one.
First, cordon the node. This tells the scheduler that the node is no longer available for scheduling new pods.
kubectl cordon <node>
Then, drain the node. This evicts the running pods so they are recreated on other nodes.
kubectl drain <node>
Now you can safely remove the node from the cluster.
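For the node in this question, a typical decommissioning sequence might look like the following (a sketch; the --ignore-daemonsets flag is usually needed when DaemonSet pods run on the node):
kubectl cordon k8s-slave2                      # mark the node unschedulable
kubectl drain k8s-slave2 --ignore-daemonsets   # evict running pods so they are recreated elsewhere
kubectl delete node k8s-slave2                 # remove the node from the cluster
Only after these steps is it safe to power the machine off.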
This is the so-called 'at most one' semantics of StatefulSets in Kubernetes; see https://v1-16.docs.kubernetes.io/docs/tasks/run-application/force-delete-stateful-set-pod/
Quoting from that page: StatefulSet ensures that, at any time, there is at most one Pod with a given identity running in a cluster. This is referred to as at most one semantics provided by a StatefulSet.
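In this particular case, because k8s-slave2 is already powered off and will not come back on its own, one way to let the StatefulSet recreate n404-neo4j-core-1 on another node is to force delete the stuck pod, as described on that page. Only do this once you are certain the node is really down, otherwise you risk two pods running with the same identity:
kubectl delete pod n404-neo4j-core-1 --grace-period=0 --force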
From the docs here:
Kubernetes (versions 1.5 or newer) will not delete Pods just because a Node is unreachable. The Pods running on an unreachable Node enter the ‘Terminating’ or ‘Unknown’ state after a timeout. Pods may also enter these states when the user attempts graceful deletion of a Pod on an unreachable Node. The only ways in which a Pod in such a state can be removed from the apiserver are as follows:
- The Node object is deleted (either by you, or by the Node Controller).
- The kubelet on the unresponsive Node starts responding, kills the Pod and removes the entry from the apiserver.
- Force deletion of the Pod by the user.
The recommended best practice is to use the first or second approach. If a Node is confirmed to be dead (e.g. permanently disconnected from the network, powered down, etc), then delete the Node object. If the Node is suffering from a network partition, then try to resolve this or wait for it to resolve. When the partition heals, the kubelet will complete the deletion of the Pod and free up its name in the apiserver. Normally, the system completes the deletion once the Pod is no longer running on a Node, or the Node is deleted by an administrator. You may override this by force deleting the Pod.
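Applied to this question: assuming k8s-slave2 is confirmed to be permanently down, deleting the Node object is the recommended route; the apiserver then removes the stuck pod and the StatefulSet controller recreates it on another node:
kubectl delete node k8s-slave2
kubectl get pods -o wide -w    # watch n404-neo4j-core-1 get recreated on a healthy node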