Kubectl is working but I can't access any of the components

9/26/2019

We have a small private k8s cluster that was working fine until this morning. Now only kubectl is working and no traffic is going through.

I mean I can launch new deployments, kill them, etc., and I can see that they are up and running.
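For example, even a basic sanity check like this looks fine (sketched with a hypothetical Deployment/Service called my-app):

    $ kubectl get pods -o wide          # pods are Running and have pod IPs
    $ kubectl get endpoints my-app      # the Service still lists those pod IPs as endpoints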

But when I try to access them via HTTP, AMQP, etc., I can't.
I was watching our nginx logs while trying to open the homepage: nothing loaded in the browser and no request was logged by nginx, which means no traffic reached nginx at all.
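One way to narrow down where the traffic stops is to probe each layer from one of the nodes; all names, IPs, and ports below are placeholders:

    $ kubectl get svc my-app                    # note the ClusterIP and port
    $ curl -m 5 http://<cluster-ip>:<port>/     # does the Service answer from a node?
    $ kubectl get pods -l app=my-app -o wide    # note a pod IP
    $ curl -m 5 http://<pod-ip>:<port>/         # does the pod answer directly?

If the pod IP responds but the ClusterIP does not, kube-proxy is the suspect; if neither responds across nodes, the CNI is.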
We are using Weave Net as our CNI.

I checked the DNS logs and also tested DNS directly, and it is working. I don't know where to start looking to solve this problem; any suggestions?
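By testing DNS I mean something like this throwaway pod (busybox:1.28 on purpose, since nslookup is broken in newer busybox images):

    $ kubectl run -it --rm dnstest --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default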

Update

After some hours the problem mostly resolved itself and now I can access my applications, but I want to ask another question which is closely related to this:

Is there a way to detect whether a problem like this comes from the underlying (host) networking or from the cluster networking (the internal k8s network)? I am asking because in the past I had a problem with k8s DNS, and this time I thought something was wrong with the k8s CNI.
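So far the only heuristic I know is to run the same test on both layers (node and pod IPs below are placeholders): if node-to-node traffic on the host IPs works but pod-to-pod traffic across nodes does not, the overlay/CNI is the suspect.

    $ ping -c 3 <other-node-host-ip>                                  # host network: node to node
    $ kubectl exec <pod-on-node-A> -- ping -c 3 <pod-on-node-B-ip>    # overlay: pod to pod across nodes

(The exec variant assumes the image ships ping; otherwise use a busybox test pod.)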

Update 2

Now I see this error in the Weave logs:

ERRO: 2019/09/27 11:10:03.358321 Captured frame from MAC (d2:14:2a:47:62:d9) to (02:01:5b:b9:8e:fd) associated with another peer 4a:8d:75:d7:59:ff(serflex-argus-2)
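This error suggests Weave is seeing a MAC address on this host that it believes belongs to another peer. To get more detail on the per-peer state behind the connection summary below, the same weave script can be queried inside one of the weave-net pods (pod name is a placeholder):

    $ kubectl exec -n kube-system <weave-net-pod> -c weave -- /home/weave/weave --local status connections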

And my environment:

  • Kubernetes version:
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:11:31Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:50Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: in-house private cluster consisting of 5 nodes, set up with kubeadm.

  • OS (e.g: cat /etc/os-release): All machines are running Ubuntu 18.04.3

  • Kernel (e.g. uname -a): Linux k8s-master 4.15.0-62-generic #69-Ubuntu SMP Wed Sep 4 20:55:53 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

  • Install tools: kubeadm
  • Network plugin and version (weave status):
/home/weave # ./weave --local status
        Version: 2.5.2 (up to date; next check at 2019/09/27 15:12:49)
        Service: router
       Protocol: weave 1..2
           Name: 02:01:5b:b9:8e:fd(k8s-master)
     Encryption: disabled
  PeerDiscovery: enabled
        Targets: 1
    Connections: 5 (4 established, 1 failed)
          Peers: 5 (with 20 established connections)
 TrustedSubnets: none
        Service: ipam
         Status: ready
  • Docker version: Docker version 19.03.2, build 6a30dfc
-- AVarf
kubernetes
weave

1 Answer

10/7/2019

I couldn't find a solution for this problem, so I had to tear down the cluster and recreate it. This time I used Calico, and after running for a week there has been no problem.
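For completeness, a sketch of the usual kubeadm + Calico bootstrap (the manifest URL is from the Calico v3.8 docs current at the time and may have moved since):

    $ sudo kubeadm init --pod-network-cidr=192.168.0.0/16
    $ kubectl apply -f https://docs.projectcalico.org/v3.8/manifests/calico.yaml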

The only thing I think could have caused the problem is the 200 MB memory limit on Weave: 4 out of 5 of my Weave pods were hitting that limit, and on their GitHub I found that Weave has a known memory-leak issue. Because of this I decided to change the CNI.
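If you want to verify this on your own cluster, the configured limit and the current usage can be checked like this (kubectl top needs metrics-server installed):

    $ kubectl -n kube-system get ds weave-net -o jsonpath='{.spec.template.spec.containers[0].resources}'
    $ kubectl -n kube-system top pod -l name=weave-net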

-- AVarf
Source: StackOverflow