OCP 4.7.1 - Curl oauth-openshift.apps resutls in SSL_ERROR_SYSCALL

3/17/2021

I am having an issue with the authentication operator not becoming stable (bouncing Between Avaialbe = True, and Degraded = True). The operator is trying to check the health using the endpoing https://oauth-openshift.apps.oc.sow.expert/healthz. and it sees it as not available (at least sometimes).

Cluster version :

[root@bastion ~]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.1     True        False         44h     Error while reconciling 4.7.1: the cluster operator ingress is degraded

Cluster operator describe:

[root@bastion ~]# oc describe clusteroperator authentication
Name:         authentication
Namespace:
Labels:       <none>
Annotations:  exclude.release.openshift.io/internal-openshift-hosted: true
              include.release.openshift.io/self-managed-high-availability: true
              include.release.openshift.io/single-node-developer: true
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2021-03-15T19:54:21Z
  Generation:          1
  Managed Fields:
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:exclude.release.openshift.io/internal-openshift-hosted:
          f:include.release.openshift.io/self-managed-high-availability:
          f:include.release.openshift.io/single-node-developer:
      f:spec:
      f:status:
        .:
        f:extension:
    Manager:      cluster-version-operator
    Operation:    Update
    Time:         2021-03-15T19:54:21Z
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
        f:relatedObjects:
        f:versions:
    Manager:         authentication-operator
    Operation:       Update
    Time:            2021-03-15T20:03:18Z
  Resource Version:  1207037
  Self Link:         /apis/config.openshift.io/v1/clusteroperators/authentication
  UID:               b7ca7d49-f6e5-446e-ac13-c5cc6d06fac1
Spec:
Status:
  Conditions:
    Last Transition Time:  2021-03-17T11:42:49Z
    Message:               OAuthRouteCheckEndpointAccessibleControllerDegraded: Get "https://oauth-openshift.apps.oc.sow.expert/healthz": EOF
    Reason:                AsExpected
    Status:                False
    Type:                  Degraded
    Last Transition Time:  2021-03-17T11:42:53Z
    Message:               All is well
    Reason:                AsExpected
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2021-03-17T11:43:21Z
    Message:               OAuthRouteCheckEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.oc.sow.expert/healthz": EOF
    Reason:                OAuthRouteCheckEndpointAccessibleController_EndpointUnavailable
    Status:                False
    Type:                  Available
    Last Transition Time:  2021-03-15T20:01:24Z
    Message:               All is well
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
  Extension:               <nil>
  Related Objects:
    Group:      operator.openshift.io
    Name:       cluster
    Resource:   authentications
    Group:      config.openshift.io
    Name:       cluster
    Resource:   authentications
    Group:      config.openshift.io
    Name:       cluster
    Resource:   infrastructures
    Group:      config.openshift.io
    Name:       cluster
    Resource:   oauths
    Group:      route.openshift.io
    Name:       oauth-openshift
    Namespace:  openshift-authentication
    Resource:   routes
    Group:
    Name:       oauth-openshift
    Namespace:  openshift-authentication
    Resource:   services
    Group:
    Name:       openshift-config
    Resource:   namespaces
    Group:
    Name:       openshift-config-managed
    Resource:   namespaces
    Group:
    Name:       openshift-authentication
    Resource:   namespaces
    Group:
    Name:       openshift-authentication-operator
    Resource:   namespaces
    Group:
    Name:       openshift-ingress
    Resource:   namespaces
    Group:
    Name:       openshift-oauth-apiserver
    Resource:   namespaces
  Versions:
    Name:     oauth-apiserver
    Version:  4.7.1
    Name:     operator
    Version:  4.7.1
    Name:     oauth-openshift
    Version:  4.7.1_openshift
Events:       <none>

When I curl multiple times to the same endpoint from bastion server, it results in two different responses once with the error "OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to oauth-openshift.apps.oc.sow.expert:443" and the other seems to be successful as follows:

[root@bastion ~]#  curl -vk https://oauth-openshift.apps.oc.sow.expert/healthz
*   Trying 192.168.124.173...
* TCP_NODELAY set
* Connected to oauth-openshift.apps.oc.sow.expert (192.168.124.173) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to oauth-openshift.apps.oc.sow.expert:443
* Closing connection 0
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to oauth-openshift.apps.oc.sow.expert:443


[root@bastion ~]#  curl -vk https://oauth-openshift.apps.oc.sow.expert/healthz
*   Trying 192.168.124.173...
* TCP_NODELAY set
* Connected to oauth-openshift.apps.oc.sow.expert (192.168.124.173) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: CN=*.apps.oc.sow.expert
*  start date: Mar 15 20:05:53 2021 GMT
*  expire date: Mar 15 20:05:54 2023 GMT
*  issuer: CN=ingress-operator@1615838672
*  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* TLSv1.3 (OUT), TLS app data, [no content] (0):
> GET /healthz HTTP/1.1
> Host: oauth-openshift.apps.oc.sow.expert
> User-Agent: curl/7.61.1
> Accept: */*
>
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS app data, [no content] (0):
< HTTP/1.1 200 OK
< Cache-Control: no-cache, no-store, max-age=0, must-revalidate
< Content-Type: text/plain; charset=utf-8
< Expires: 0
< Pragma: no-cache
< Referrer-Policy: strict-origin-when-cross-origin
< X-Content-Type-Options: nosniff
< X-Dns-Prefetch-Control: off
< X-Frame-Options: DENY
< X-Xss-Protection: 1; mode=block
< Date: Wed, 17 Mar 2021 11:49:50 GMT
< Content-Length: 2
<
* Connection #0 to host oauth-openshift.apps.oc.sow.expert left intact
ok 

In the Bastion server, I am hosting the HAProxy load balancer and the squid proxy to allow internal instalnces to access the internet.

HAProxy configurations is as follows:

[root@bastion ~]# cat /etc/haproxy/haproxy.cfg
#---------------------------------------------------------------------
# Example configuration for a possible web application.  See the
# full configuration options online.
#
#   https://www.haproxy.org/download/1.8/doc/configuration.txt
#
#---------------------------------------------------------------------

#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    # to have these messages end up in /var/log/haproxy.log you will
    # need to:
    #
    # 1) configure syslog to accept network log events.  This is done
    #    by adding the '-r' option to the SYSLOGD_OPTIONS in
    #    /etc/sysconfig/syslog
    #
    # 2) configure local2 events to go to the /var/log/haproxy.log
    #   file. A line like the following can be added to
    #   /etc/sysconfig/syslog
    #
    #    local2.*                       /var/log/haproxy.log
    #
    log         127.0.0.1 local2

    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon

    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats

    # utilize system-wide crypto-policies
    #ssl-default-bind-ciphers PROFILE=SYSTEM
    #ssl-default-server-ciphers PROFILE=SYSTEM

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode                    tcp
    log                     global
    option                  tcplog
    option                  dontlognull
    option http-server-close
    #option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000

# Control Plane config - external
frontend api
    bind 192.168.124.174:6443
    mode tcp
    default_backend             api-be

# Control Plane config - internal
frontend api-int
    bind 10.164.76.113:6443
    mode tcp
    default_backend             api-be

backend api-be
    mode tcp
    balance     roundrobin
#    server  bootstrap 10.94.124.2:6443 check
    server  master01 10.94.124.3:6443 check
    server  master02 10.94.124.4:6443 check
    server  master03 10.94.124.5:6443 check

frontend machine-config
    bind 10.164.76.113:22623
    mode tcp
    default_backend             machine-config-be

backend machine-config-be
    mode tcp
    balance     roundrobin
#    server  bootstrap 10.94.124.2:22623 check
    server  master01 10.94.124.3:22623 check
    server  master02 10.94.124.4:22623 check
    server  master03 10.94.124.5:22623 check

# apps config
frontend https
    mode tcp
    bind 10.164.76.113:443
    default_backend             https

frontend http
    mode tcp
    bind 10.164.76.113:80
    default_backend             http

frontend https-ext
    mode tcp
    bind 192.168.124.173:443
    default_backend             https

frontend http-ext
    mode tcp
    bind 192.168.124.173:80
    default_backend             http

backend https
    mode tcp
    balance     roundrobin
    server  storage01 10.94.124.6:443 check
    server  storage02 10.94.124.7:443 check
    server  storage03 10.94.124.8:443 check
    server  worker01 10.94.124.15:443 check
    server  worker02 10.94.124.16:443 check
    server  worker03 10.94.124.17:443 check
    server  worker04 10.94.124.18:443 check
    server  worker05 10.94.124.19:443 check
    server  worker06 10.94.124.20:443 check

backend http
    mode tcp
    balance     roundrobin
    server  storage01 10.94.124.6:80 check
    server  storage02 10.94.124.7:80 check
    server  storage03 10.94.124.8:80 check
    server  worker01 10.94.124.15:80 check
    server  worker02 10.94.124.16:80 check
    server  worker03 10.94.124.17:80 check
    server  worker04 10.94.124.18:80 check
    server  worker05 10.94.124.19:80 check
    server  worker06 10.94.124.20:80 check

And Here is the squid proxy configurations:

[root@bastion ~]# cat /etc/squid/squid.conf
#
# Recommended minimum configuration:
#

# Example rule allowing access from your local networks.
# Adapt to list your (internal) IP networks from where browsing
# should be allowed
acl localnet src 0.0.0.1-0.255.255.255  # RFC 1122 "this" network (LAN)
acl localnet src 10.0.0.0/8             # RFC 1918 local private network (LAN)
acl localnet src 100.64.0.0/10          # RFC 6598 shared address space (CGN)
acl localnet src 169.254.0.0/16         # RFC 3927 link-local (directly plugged) machines
acl localnet src 172.16.0.0/12          # RFC 1918 local private network (LAN)
acl localnet src 192.168.0.0/16         # RFC 1918 local private network (LAN)
acl localnet src fc00::/7               # RFC 4193 local private network range
acl localnet src fe80::/10              # RFC 4291 link-local (directly plugged) machines

acl SSL_ports port 443
acl Safe_ports port 80          # http
acl Safe_ports port 21          # ftp
acl Safe_ports port 443         # https
acl Safe_ports port 70          # gopher
acl Safe_ports port 210         # wais
acl Safe_ports port 1025-65535  # unregistered ports
acl Safe_ports port 280         # http-mgmt
acl Safe_ports port 488         # gss-http
acl Safe_ports port 591         # filemaker
acl Safe_ports port 777         # multiling http
acl CONNECT method CONNECT

#
# Recommended minimum Access Permission configuration:
#
# Deny requests to certain unsafe ports
#http_access deny !Safe_ports

# Deny CONNECT to other than secure SSL ports
#http_access deny CONNECT !SSL_ports

# Only allow cachemgr access from localhost
http_access allow localhost manager
http_access deny manager

# We strongly recommend the following be uncommented to protect innocent
# web applications running on the proxy server who think the only
# one who can access services on "localhost" is a local user
#http_access deny to_localhost

#
# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
#

# Example rule allowing access from your local networks.
# Adapt localnet in the ACL section to list your (internal) IP networks
# from where browsing should be allowed
http_access allow localnet
http_access allow localhost

# And finally deny all other access to this proxy
http_access deny all

# Squid normally listens to port 3128
http_port 3128
http_port 10.164.76.113:3128

# Uncomment and adjust the following to add a disk cache directory.
#cache_dir ufs /var/spool/squid 100 16 256

# Leave coredumps in the first cache dir
coredump_dir /var/spool/squid

#
# Add any of your own refresh_pattern entries above these.
#
refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern ^gopher:        1440    0%      1440
refresh_pattern -i (/cgi-bin/|\?) 0     0%      0
refresh_pattern .               0       20%     4320

Can someone please help me resolve the connection problem when hitting the application endpoint?

EDITED:

I get the following error in the console pod logs:

[root@bastion cp]# oc logs -n openshift-console console-6697f85d68-p8jxf
W0404 14:59:30.706793       1 main.go:211] Flag inactivity-timeout is set to less then 300 seconds and will be ignored!
I0404 14:59:30.706887       1 main.go:288] cookies are secure!
E0404 14:59:31.221158       1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc.sow.expert/oauth/token failed: Head "https://oauth-openshift.apps.oc.sow.expert": EOF
E0404 14:59:41.690905       1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc.sow.expert/oauth/token failed: Head "https://oauth-openshift.apps.oc.sow.expert": EOF
E0404 14:59:52.155373       1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc.sow.expert/oauth/token failed: Head "https://oauth-openshift.apps.oc.sow.expert": EOF
E0404 15:00:02.618751       1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc.sow.expert/oauth/token failed: Head "https://oauth-openshift.apps.oc.sow.expert": EOF
E0404 15:00:13.071041       1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc.sow.expert/oauth/token failed: Head "https://oauth-openshift.apps.oc.sow.expert": EOF
E0404 15:00:23.531058       1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc.sow.expert/oauth/token failed: Head "https://oauth-openshift.apps.oc.sow.expert": EOF
E0404 15:00:33.999953       1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc.sow.expert/oauth/token failed: Head "https://oauth-openshift.apps.oc.sow.expert": EOF
E0404 15:00:44.455873       1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc.sow.expert/oauth/token failed: Head "https://oauth-openshift.apps.oc.sow.expert": EOF
E0404 15:00:54.935240       1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc.sow.expert/oauth/token failed: Head "https://oauth-openshift.apps.oc.sow.expert": EOF
I0404 15:01:05.666751       1 main.go:670] Binding to [::]:8443...
I0404 15:01:05.666776       1 main.go:672] using TLS
-- Abdullah Alsowaygh
haproxy
kubernetes
open-closed-principle
openshift
squid

1 Answer

4/8/2021

I just resolved this issue. To check you have the same issue:

oc logs -n openshift-console console-xxxxxxx-yyyyy

Check if you have messages like these:

error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc4.tt.testing/oauth/token failed: Head "https://oauth-openshift.apps.oc4.tt.testing": dial tcp: lookup oauth-openshift.apps.oc4.tt.testing on 172.30.0.10:53: no such host

In my case I'm deploying through libvirt. Libvirt does part of the DNS resolving. I had already added this entry to the libvirt network however I had to delete and add it again.

WORKER_IP=192.168.126.51
virsh net-update oc4-xxxx delete dns-host "<host ip='$WORKER_IP'><hostname>oauth-openshift.apps.oc4.tt.testing</hostname></host>"
virsh net-update oc4-xxxx add dns-host "<host ip='$WORKER_IP'><hostname>oauth-openshift.apps.oc4.tt.testing</hostname></host>"
-- th3penguinwhisperer
Source: StackOverflow