Today my pod could not start and showed this error:
2021-04-22 12:41:26.325 WARN 1 --- [ngPollService-1] c.c.f.a.i.RemoteConfigLongPollService : Long polling failed, will retry in 64 seconds. appId: 0010010006, cluster: default, namespaces: TEST1.RABBITMQ_CONFIG_REPORT+TEST1.RABBITMQ-CONFIG+application+TEST1.EUREKA+TEST1.DATASOURCE-DRUID+TEST1.COMMON_CONFIG+TEST1.REDIS-CONFIG, long polling url: null, reason: Get config services failed from http://service-apollo-config-server-test-alpha.sre.svc.cluster.local:8080/services/config?appId=0010010006&ip=172.30.184.11 [Cause: Could not complete get operation [Cause: Connection refused (Connection refused)]]
This error tells me that the pod could not reach the config service, so fetching the configuration from the config center failed and the pod could not start. I then logged in to a pod on another node (one that works fine) and ran curl against the config service like this:
curl http://service-apollo-config-server-test-alpha.sre.svc.cluster.local:8080
This works fine, so the config service itself is OK. Then I ran the same command in a pod on the problem node:
bash-4.4# curl http://service-apollo-config-server-test-alpha.sre.svc.cluster.local:8080
curl: (7) Failed to connect to service-apollo-config-server-test-alpha.sre.svc.cluster.local port 8080: Connection refused
I also pinged the config service from the problem node, and that works fine:
ping service-apollo-config-server-test-alpha.sre.svc.cluster.local
Then I scanned the config service with nmap from the problem node:
bash-4.4# nmap service-apollo-config-server-test-alpha.sre.svc.cluster.local
Starting Nmap 7.70 ( https://nmap.org ) at 2021-04-22 12:45 CST
Nmap scan report for service-apollo-config-server-test-alpha.sre.svc.cluster.local (10.254.82.131)
Host is up (0.000010s latency).
Not shown: 996 closed ports
PORT STATE SERVICE
22/tcp open ssh
111/tcp open rpcbind
3306/tcp open mysql
8443/tcp open https-alt
Port 8080 is not in the list. The network seems fine, but the service cannot be reached from this node. Why can't pods on the problem node access the config service? What should I do to find the cause and fix it? I also found that on the problem node, access by pod IP works while access by service IP or service name fails, for example:
# pod ip access works
curl 172.30.112.2:11025
# service ip failed
curl 10.254.94.209:11025
# service name failed
curl soa-illidan-superhub.dabai-fat.svc.cluster.local:11025
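The comparison above can be turned into a small check. This is only a sketch: `probe` and `classify_access` are hypothetical helper names, not part of any tool, and the interpretation assumes the pod IP and the service IP point at the same backend and port.

```shell
#!/bin/sh
# Hypothetical helpers for comparing pod-IP access with Service-IP access
# from the same node. Names and return strings are illustrative only.

# probe ADDRESS: quietly request the address, return curl's exit code
# (0 on success, 7 for "connection refused", and so on).
probe() {
    curl -s -o /dev/null --max-time 5 "$1"
}

# classify_access POD_RC SVC_RC: interpret the two curl exit codes.
classify_access() {
    pod_rc=$1   # exit code from probing <pod-ip>:<port>
    svc_rc=$2   # exit code from probing <service-ip>:<port>
    if [ "$pod_rc" -eq 0 ] && [ "$svc_rc" -eq 0 ]; then
        echo "ok"
    elif [ "$pod_rc" -eq 0 ]; then
        # Pod network works but the Service virtual IP does not:
        # the Service forwarding on this node (kube-proxy) is suspect.
        echo "check-kube-proxy"
    else
        # Even the pod IP is unreachable: look at the pod network or the app.
        echo "check-pod-network"
    fi
}

# Example with the addresses from this post (run on the problem node):
#   probe 172.30.112.2:11025;  pod_rc=$?
#   probe 10.254.94.209:11025; svc_rc=$?
#   classify_access "$pod_rc" "$svc_rc"
```

With the results shown above (pod IP succeeds, service IP is refused), this would point at kube-proxy on the node rather than at the application or the pod network.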
Finally I found that the kube-proxy process had exited on the problem node (CentOS 7.6). I started it again with this command:
systemctl start kube-proxy
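That fixed it. kube-proxy is the component that maps service IPs to pod IPs on each node, which is why pod IPs kept working while the service IP and service name did not.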
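To make the same check repeatable, something like the following could be run on a node. It is a sketch under the assumption that kube-proxy is installed as a systemd unit named `kube-proxy`, as on this node; `ensure_kube_proxy` is a hypothetical function name, and enabling the unit at boot is an extra step beyond the fix described above.

```shell
#!/bin/sh
# Sketch: start kube-proxy if it is not running, and enable it at boot.
# Assumes a systemd unit called "kube-proxy" (an assumption about the install).
ensure_kube_proxy() {
    if systemctl is-active --quiet kube-proxy; then
        echo "kube-proxy already running"
        return 0
    fi
    # Not active: start it now, then enable it so it also starts after a reboot.
    systemctl start kube-proxy || return 1
    systemctl enable kube-proxy || return 1
    echo "kube-proxy started"
}
```

Running `ensure_kube_proxy` as root on the problem node would cover the restart. To see why the process exited in the first place, `journalctl -u kube-proxy` shows the unit's log.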