kubectl reports "connection refused" on Linux but works on another machine (Mac)

10/20/2019

Update: I just ran the same command inside a Docker container on the same Linux machine, and it worked. So this might be an issue related to the Linux distro. I personally suspect something related to SSL certificates.

I set up a Kubernetes cluster on AWS EKS, along with a complete running environment, from my MacBook. However, I cannot get kubectl to work correctly on my Linux machine (Arch Linux).

I tried running kubectl --v=1000 get svc (some cluster info is masked):

I1020 11:01:44.053581    3266 loader.go:359] Config loaded from file /home/realturner/.kube/config
I1020 11:01:44.054963    3266 round_trippers.go:419] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.14.7 (linux/amd64) kubernetes/1861c59" 'https://XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.eks.amazonaws.com/api?timeout=32s'
I1020 11:01:44.299305    3266 round_trippers.go:438] GET https://XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.eks.amazonaws.com/api?timeout=32s   in 244 milliseconds
I1020 11:01:44.299331    3266 round_trippers.go:444] Response Headers:
I1020 11:01:44.299367    3266 cached_discovery.go:121] skipped caching discovery info due to Get https://XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.eks.amazonaws.com/api?timeout=32s:  dial tcp: lookup XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.eks.amazonaws.com  on [::1]:53: read udp [::1]:34122->[::1]:53: read: connection refused

By comparison, the successful run on the other machine returns response headers and a body:

I1020 11:03:44.358266    1675 loader.go:359] Config loaded from file /Users/realturner/.kube/config-tv
I1020 11:03:44.359417    1675 round_trippers.go:419] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.14.6 (darwin/amd64) kubernetes/96fac5c" 'https://XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.eks.amazonaws.com/api?timeout=32s'
I1020 11:03:46.186432    1675 round_trippers.go:438] GET https://XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.eks.amazonaws.com/api?timeout=32s  200 OK in 1826 milliseconds
I1020 11:03:46.186481    1675 round_trippers.go:444] Response Headers:
I1020 11:03:46.186498    1675 round_trippers.go:447]     Content-Length: 149
I1020 11:03:46.186512    1675 round_trippers.go:447]     Date: Sun, 20 Oct 2019 03:03:46 GMT
I1020 11:03:46.186525    1675 round_trippers.go:447]     Audit-Id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
I1020 11:03:46.186538    1675 round_trippers.go:447]     Content-Type: application/json
I1020 11:03:46.262841    1675 request.go:942] Response Body: {"kind":"APIVersions","versions":["v1"],"serverAddressByClientCIDRs":[{"clientCIDR":"0.0.0.0/0","serverAddress":"ip-10-xxx-xxx-xxx.ec2.internal:443"}]}

I suspected a network or firewall problem, but a plain curl to that endpoint does get a response, albeit a permission error:

$ curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.14.7 (linux/amd64) kubernetes/1861c59" 'https://XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.eks.amazonaws.com/api?timeout=32s'
Note: Unnecessary use of -X or --request, GET is already inferred.
*   Trying x.xxx.xxx.xx:443...
* TCP_NODELAY set
* Connected to XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.eks.amazonaws.com (x.xxx.xxx.xx) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Request CERT (13):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Certificate (11):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=kube-apiserver
*  start date: Oct 17 10:29:43 2019 GMT
*  expire date: Oct 16 10:29:44 2020 GMT
*  issuer: CN=kubernetes
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x55cb670e87b0)
> GET /api?timeout=32s HTTP/2
> Host: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.eks.amazonaws.com
> Accept: application/json, */*
> User-Agent: kubectl/v1.14.7 (linux/amd64) kubernetes/1861c59
> 
* Connection state changed (MAX_CONCURRENT_STREAMS == 250)!
< HTTP/2 403 
< audit-id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
< content-type: application/json
< x-content-type-options: nosniff
< content-length: 188
< date: Sun, 20 Oct 2019 14:56:42 GMT
< 
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"forbidden: User \"system:anonymous\" cannot get path \"/api\"","reason":"Forbidden","details":{},"code":403}
* Connection #0 to host XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.eks.amazonaws.com left intact

Edit: here is my .kube/config, which is the same as .kube/config-tv on the Mac (some items masked). It was generated by aws eks update-kubeconfig --name <Cluster>:

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: <Same as the one in `Certificate authority` of EKS>
    server: https://<PREFIX>.<REGION>.eks.amazonaws.com
  name: arn:aws:eks:<REGION>:<AccountId>:cluster/<Cluster>
contexts:
- context:
    cluster: arn:aws:eks:<REGION>:<AccountId>:cluster/<Cluster>
    user: arn:aws:eks:<REGION>:<AccountId>:cluster/<Cluster>
  name: arn:aws:eks:<REGION>:<AccountId>:cluster/<Cluster>
current-context: arn:aws:eks:<REGION>:<AccountId>:cluster/<Cluster>
kind: Config
preferences: {}
users:
- name: arn:aws:eks:<REGION>:<AccountId>:cluster/<Cluster>
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      args:
      - --region
      - <REGION>
      - eks
      - get-token
      - --cluster-name
      - <Cluster>
      command: aws
-- realturner
archlinux
aws-eks
kubectl
kubernetes
ssl

1 Answer

10/22/2019

Wow, it turned out to be a DNS resolution problem, even though I had been using the newly installed system for several days without noticing it.

Previously I had only tested DNS resolution with getent hosts <DNS> and curl -v <PREFIX>.<REGION>.eks.amazonaws.com. Both resolved the name correctly, so I did not suspect DNS until I found that my /etc/resolv.conf was actually empty.
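
A likely explanation, which is my assumption and not something verified in this post: kubectl is a Go binary, and Go's built-in resolver reads /etc/resolv.conf directly and falls back to port 53 on localhost when no nameserver is listed. That would match the [::1]:53 in the failing log, while getent and curl go through glibc's resolver and may take a different path. A minimal sketch for checking whether a resolv.conf file lists any nameserver (the function name count_nameservers is made up for illustration):

```shell
#!/bin/sh
# Count the active "nameserver" entries in a resolv.conf-style file.
# Defaults to /etc/resolv.conf; pass another path to inspect a different file.
count_nameservers() {
    file="${1:-/etc/resolv.conf}"
    # A missing or unreadable file provides no nameservers.
    [ -r "$file" ] || { echo 0; return 0; }
    # grep -c prints the number of matching lines; it exits non-zero when
    # that number is 0, so "|| true" keeps the function's status clean.
    grep -c -E '^[[:space:]]*nameserver[[:space:]]+[^[:space:]]' "$file" || true
}

# Report on the system file (read-only check).
if [ "$(count_nameservers)" -eq 0 ]; then
    echo "no nameserver entries found in /etc/resolv.conf"
else
    echo "/etc/resolv.conf lists $(count_nameservers) nameserver(s)"
fi
```

Commented-out lines such as "# nameserver 1.1.1.1" are not counted, since the pattern only matches lines that begin with the keyword.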

I had missed configuring systemd's DNS resolution. As documented here:

Note that if you want to take advantage of automatic DNS configuration from DHCP, you need to enable systemd-resolved and symlink /run/systemd/resolve/resolv.conf to /etc/resolv.conf

After creating the symbolic link, it now works as expected!
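
For anyone hitting the same thing, here is a rough sketch of that fix as a shell function. The function name link_resolv_conf and its two optional path arguments are my own additions so it can be tried on scratch files first; the default paths are the ones from the quoted note.

```shell
#!/bin/sh
# Point a resolv.conf path at the file maintained by systemd-resolved.
# Both paths are parameters (hypothetical convenience) with the real
# locations as defaults.
link_resolv_conf() {
    source_file="${1:-/run/systemd/resolve/resolv.conf}"
    target="${2:-/etc/resolv.conf}"
    # Refuse to create a dangling link if the source does not exist.
    if [ ! -e "$source_file" ]; then
        echo "missing $source_file; is systemd-resolved running?" >&2
        return 1
    fi
    # -s: symbolic link, -f: replace an existing file at the target.
    ln -sf "$source_file" "$target"
}

# On the real system, run as root, roughly:
#   systemctl enable --now systemd-resolved
#   link_resolv_conf
```

Checking that the source file exists before linking avoids replacing an empty /etc/resolv.conf with a broken symlink when systemd-resolved has not been started yet.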

-- realturner
Source: StackOverflow