I am having an issue with the authentication operator not becoming stable: it keeps bouncing between Available = True and Degraded = True. The operator checks health using the endpoint https://oauth-openshift.apps.oc.sow.expert/healthz and sees it as unavailable, at least some of the time.
Cluster version:
[root@bastion ~]# oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.7.1 True False 44h Error while reconciling 4.7.1: the cluster operator ingress is degraded
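The status line above already names a second degraded operator (ingress), and the oauth-openshift route is served by the ingress routers, so it may help to see every unhealthy operator at once. A minimal sketch, assuming the usual `oc get clusteroperators` column order (NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE); `unhealthy_operators` is a hypothetical helper name, not part of `oc`:

```shell
# Hypothetical helper: reads `oc get clusteroperators` output on stdin and
# prints the name of each operator that is not Available or is Degraded.
# Assumes columns: NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
# (an operator with an empty VERSION column would shift the fields).
unhealthy_operators() {
  awk 'NR > 1 && ($3 != "True" || $5 == "True") { print $1 }'
}

# Usage (needs a logged-in oc session):
#   oc get clusteroperators | unhealthy_operators
```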
Output of describing the authentication cluster operator:
[root@bastion ~]# oc describe clusteroperator authentication
Name: authentication
Namespace:
Labels: <none>
Annotations: exclude.release.openshift.io/internal-openshift-hosted: true
include.release.openshift.io/self-managed-high-availability: true
include.release.openshift.io/single-node-developer: true
API Version: config.openshift.io/v1
Kind: ClusterOperator
Metadata:
Creation Timestamp: 2021-03-15T19:54:21Z
Generation: 1
Managed Fields:
API Version: config.openshift.io/v1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.:
f:exclude.release.openshift.io/internal-openshift-hosted:
f:include.release.openshift.io/self-managed-high-availability:
f:include.release.openshift.io/single-node-developer:
f:spec:
f:status:
.:
f:extension:
Manager: cluster-version-operator
Operation: Update
Time: 2021-03-15T19:54:21Z
API Version: config.openshift.io/v1
Fields Type: FieldsV1
fieldsV1:
f:status:
f:conditions:
f:relatedObjects:
f:versions:
Manager: authentication-operator
Operation: Update
Time: 2021-03-15T20:03:18Z
Resource Version: 1207037
Self Link: /apis/config.openshift.io/v1/clusteroperators/authentication
UID: b7ca7d49-f6e5-446e-ac13-c5cc6d06fac1
Spec:
Status:
Conditions:
Last Transition Time: 2021-03-17T11:42:49Z
Message: OAuthRouteCheckEndpointAccessibleControllerDegraded: Get "https://oauth-openshift.apps.oc.sow.expert/healthz": EOF
Reason: AsExpected
Status: False
Type: Degraded
Last Transition Time: 2021-03-17T11:42:53Z
Message: All is well
Reason: AsExpected
Status: False
Type: Progressing
Last Transition Time: 2021-03-17T11:43:21Z
Message: OAuthRouteCheckEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.oc.sow.expert/healthz": EOF
Reason: OAuthRouteCheckEndpointAccessibleController_EndpointUnavailable
Status: False
Type: Available
Last Transition Time: 2021-03-15T20:01:24Z
Message: All is well
Reason: AsExpected
Status: True
Type: Upgradeable
Extension: <nil>
Related Objects:
Group: operator.openshift.io
Name: cluster
Resource: authentications
Group: config.openshift.io
Name: cluster
Resource: authentications
Group: config.openshift.io
Name: cluster
Resource: infrastructures
Group: config.openshift.io
Name: cluster
Resource: oauths
Group: route.openshift.io
Name: oauth-openshift
Namespace: openshift-authentication
Resource: routes
Group:
Name: oauth-openshift
Namespace: openshift-authentication
Resource: services
Group:
Name: openshift-config
Resource: namespaces
Group:
Name: openshift-config-managed
Resource: namespaces
Group:
Name: openshift-authentication
Resource: namespaces
Group:
Name: openshift-authentication-operator
Resource: namespaces
Group:
Name: openshift-ingress
Resource: namespaces
Group:
Name: openshift-oauth-apiserver
Resource: namespaces
Versions:
Name: oauth-apiserver
Version: 4.7.1
Name: operator
Version: 4.7.1
Name: oauth-openshift
Version: 4.7.1_openshift
Events: <none>
When I curl the same endpoint multiple times from the bastion server, I get two different results: sometimes the error "OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to oauth-openshift.apps.oc.sow.expert:443", and sometimes what seems to be a successful response, as follows:
[root@bastion ~]# curl -vk https://oauth-openshift.apps.oc.sow.expert/healthz
* Trying 192.168.124.173...
* TCP_NODELAY set
* Connected to oauth-openshift.apps.oc.sow.expert (192.168.124.173) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to oauth-openshift.apps.oc.sow.expert:443
* Closing connection 0
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to oauth-openshift.apps.oc.sow.expert:443
[root@bastion ~]# curl -vk https://oauth-openshift.apps.oc.sow.expert/healthz
* Trying 192.168.124.173...
* TCP_NODELAY set
* Connected to oauth-openshift.apps.oc.sow.expert (192.168.124.173) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: CN=*.apps.oc.sow.expert
* start date: Mar 15 20:05:53 2021 GMT
* expire date: Mar 15 20:05:54 2023 GMT
* issuer: CN=ingress-operator@1615838672
* SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* TLSv1.3 (OUT), TLS app data, [no content] (0):
> GET /healthz HTTP/1.1
> Host: oauth-openshift.apps.oc.sow.expert
> User-Agent: curl/7.61.1
> Accept: */*
>
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS app data, [no content] (0):
< HTTP/1.1 200 OK
< Cache-Control: no-cache, no-store, max-age=0, must-revalidate
< Content-Type: text/plain; charset=utf-8
< Expires: 0
< Pragma: no-cache
< Referrer-Policy: strict-origin-when-cross-origin
< X-Content-Type-Options: nosniff
< X-Dns-Prefetch-Control: off
< X-Frame-Options: DENY
< X-Xss-Protection: 1; mode=block
< Date: Wed, 17 Mar 2021 11:49:50 GMT
< Content-Length: 2
<
* Connection #0 to host oauth-openshift.apps.oc.sow.expert left intact
ok
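Since the two curl runs above differ only by chance, repeating the probe and counting outcomes gives a rough failure ratio. This is a minimal sketch, not a definitive test; `probe_many` is a hypothetical helper name, and it counts only transport-level failures (curl exits 0 for any completed HTTP response because `-f` is not used):

```shell
# Hypothetical helper: probe URL COUNT times and tally curl's exit status.
# -s silences progress, -k skips certificate verification (as in the
# transcript above), --max-time bounds each attempt to 5 seconds.
probe_many() {
  pm_url=$1
  pm_count=${2:-20}
  pm_ok=0
  pm_failed=0
  pm_i=0
  while [ "$pm_i" -lt "$pm_count" ]; do
    if curl -sk -o /dev/null --max-time 5 "$pm_url"; then
      pm_ok=$((pm_ok + 1))
    else
      pm_failed=$((pm_failed + 1))
    fi
    pm_i=$((pm_i + 1))
  done
  echo "ok=$pm_ok failed=$pm_failed"
}

# Usage:
#   probe_many https://oauth-openshift.apps.oc.sow.expert/healthz 30
```

A failure count that stays near a fixed fraction of attempts would be consistent with some, but not all, load-balancer backends dropping the connection; that is only a hint, not proof.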
On the bastion server, I am hosting the HAProxy load balancer and the Squid proxy that allows internal instances to access the internet.
The HAProxy configuration is as follows:
[root@bastion ~]# cat /etc/haproxy/haproxy.cfg
#---------------------------------------------------------------------
# Example configuration for a possible web application. See the
# full configuration options online.
#
# https://www.haproxy.org/download/1.8/doc/configuration.txt
#
#---------------------------------------------------------------------
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
# to have these messages end up in /var/log/haproxy.log you will
# need to:
#
# 1) configure syslog to accept network log events. This is done
# by adding the '-r' option to the SYSLOGD_OPTIONS in
# /etc/sysconfig/syslog
#
# 2) configure local2 events to go to the /var/log/haproxy.log
# file. A line like the following can be added to
# /etc/sysconfig/syslog
#
# local2.* /var/log/haproxy.log
#
log 127.0.0.1 local2
chroot /var/lib/haproxy
pidfile /var/run/haproxy.pid
maxconn 4000
user haproxy
group haproxy
daemon
# turn on stats unix socket
stats socket /var/lib/haproxy/stats
# utilize system-wide crypto-policies
#ssl-default-bind-ciphers PROFILE=SYSTEM
#ssl-default-server-ciphers PROFILE=SYSTEM
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
mode tcp
log global
option tcplog
option dontlognull
option http-server-close
#option forwardfor except 127.0.0.0/8
option redispatch
retries 3
timeout http-request 10s
timeout queue 1m
timeout connect 10s
timeout client 1m
timeout server 1m
timeout http-keep-alive 10s
timeout check 10s
maxconn 3000
# Control Plane config - external
frontend api
bind 192.168.124.174:6443
mode tcp
default_backend api-be
# Control Plane config - internal
frontend api-int
bind 10.164.76.113:6443
mode tcp
default_backend api-be
backend api-be
mode tcp
balance roundrobin
# server bootstrap 10.94.124.2:6443 check
server master01 10.94.124.3:6443 check
server master02 10.94.124.4:6443 check
server master03 10.94.124.5:6443 check
frontend machine-config
bind 10.164.76.113:22623
mode tcp
default_backend machine-config-be
backend machine-config-be
mode tcp
balance roundrobin
# server bootstrap 10.94.124.2:22623 check
server master01 10.94.124.3:22623 check
server master02 10.94.124.4:22623 check
server master03 10.94.124.5:22623 check
# apps config
frontend https
mode tcp
bind 10.164.76.113:443
default_backend https
frontend http
mode tcp
bind 10.164.76.113:80
default_backend http
frontend https-ext
mode tcp
bind 192.168.124.173:443
default_backend https
frontend http-ext
mode tcp
bind 192.168.124.173:80
default_backend http
backend https
mode tcp
balance roundrobin
server storage01 10.94.124.6:443 check
server storage02 10.94.124.7:443 check
server storage03 10.94.124.8:443 check
server worker01 10.94.124.15:443 check
server worker02 10.94.124.16:443 check
server worker03 10.94.124.17:443 check
server worker04 10.94.124.18:443 check
server worker05 10.94.124.19:443 check
server worker06 10.94.124.20:443 check
backend http
mode tcp
balance roundrobin
server storage01 10.94.124.6:80 check
server storage02 10.94.124.7:80 check
server storage03 10.94.124.8:80 check
server worker01 10.94.124.15:80 check
server worker02 10.94.124.16:80 check
server worker03 10.94.124.17:80 check
server worker04 10.94.124.18:80 check
server worker05 10.94.124.19:80 check
server worker06 10.94.124.20:80 check
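In `mode tcp`, a bare `check` on a server line is only a TCP connect test, so a backend that accepts the connection and then closes it during the TLS handshake would still count as up. One way to see whether particular backends behave differently is to bypass the load balancer and request the health endpoint from each backend IP directly. The sketch below assumes it is run from a machine that can reach the backend IPs (for example the bastion); `backend_ips` and `probe_backends` are hypothetical helper names:

```shell
# Hypothetical helpers, not part of HAProxy or curl.

# Print the IP of every active "server" line in one named backend section
# of an haproxy.cfg read from stdin. Commented-out servers are skipped
# because their first field is "#".
backend_ips() {
  awk -v name="$1" '
    $1 ~ /^(global|defaults|frontend|backend|listen)$/ {
      in_be = ($1 == "backend" && $2 == name)
      next
    }
    in_be && $1 == "server" { split($3, addr, ":"); print addr[1] }
  '
}

# Request /healthz for HOST from each IP directly. --resolve pins the
# hostname to that IP, so SNI and the Host header stay the same as in a
# normal request while the load balancer is bypassed.
probe_backends() {
  pb_host=$1
  shift
  for pb_ip in "$@"; do
    if curl -sk -o /dev/null --max-time 5 \
        --resolve "$pb_host:443:$pb_ip" "https://$pb_host/healthz"; then
      echo "$pb_ip ok"
    else
      echo "$pb_ip FAILED"
    fi
  done
}

# Usage:
#   probe_backends oauth-openshift.apps.oc.sow.expert \
#     $(backend_ips https < /etc/haproxy/haproxy.cfg)
```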
And here is the Squid proxy configuration:
[root@bastion ~]# cat /etc/squid/squid.conf
#
# Recommended minimum configuration:
#
# Example rule allowing access from your local networks.
# Adapt to list your (internal) IP networks from where browsing
# should be allowed
acl localnet src 0.0.0.1-0.255.255.255 # RFC 1122 "this" network (LAN)
acl localnet src 10.0.0.0/8 # RFC 1918 local private network (LAN)
acl localnet src 100.64.0.0/10 # RFC 6598 shared address space (CGN)
acl localnet src 169.254.0.0/16 # RFC 3927 link-local (directly plugged) machines
acl localnet src 172.16.0.0/12 # RFC 1918 local private network (LAN)
acl localnet src 192.168.0.0/16 # RFC 1918 local private network (LAN)
acl localnet src fc00::/7 # RFC 4193 local private network range
acl localnet src fe80::/10 # RFC 4291 link-local (directly plugged) machines
acl SSL_ports port 443
acl Safe_ports port 80 # http
acl Safe_ports port 21 # ftp
acl Safe_ports port 443 # https
acl Safe_ports port 70 # gopher
acl Safe_ports port 210 # wais
acl Safe_ports port 1025-65535 # unregistered ports
acl Safe_ports port 280 # http-mgmt
acl Safe_ports port 488 # gss-http
acl Safe_ports port 591 # filemaker
acl Safe_ports port 777 # multiling http
acl CONNECT method CONNECT
#
# Recommended minimum Access Permission configuration:
#
# Deny requests to certain unsafe ports
#http_access deny !Safe_ports
# Deny CONNECT to other than secure SSL ports
#http_access deny CONNECT !SSL_ports
# Only allow cachemgr access from localhost
http_access allow localhost manager
http_access deny manager
# We strongly recommend the following be uncommented to protect innocent
# web applications running on the proxy server who think the only
# one who can access services on "localhost" is a local user
#http_access deny to_localhost
#
# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
#
# Example rule allowing access from your local networks.
# Adapt localnet in the ACL section to list your (internal) IP networks
# from where browsing should be allowed
http_access allow localnet
http_access allow localhost
# And finally deny all other access to this proxy
http_access deny all
# Squid normally listens to port 3128
http_port 3128
http_port 10.164.76.113:3128
# Uncomment and adjust the following to add a disk cache directory.
#cache_dir ufs /var/spool/squid 100 16 256
# Leave coredumps in the first cache dir
coredump_dir /var/spool/squid
#
# Add any of your own refresh_pattern entries above these.
#
refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
refresh_pattern . 0 20% 4320
Can someone please help me resolve the connection problem when hitting the application endpoint?
EDITED:
I get the following error in the console pod logs:
[root@bastion cp]# oc logs -n openshift-console console-6697f85d68-p8jxf
W0404 14:59:30.706793 1 main.go:211] Flag inactivity-timeout is set to less then 300 seconds and will be ignored!
I0404 14:59:30.706887 1 main.go:288] cookies are secure!
E0404 14:59:31.221158 1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc.sow.expert/oauth/token failed: Head "https://oauth-openshift.apps.oc.sow.expert": EOF
E0404 14:59:41.690905 1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc.sow.expert/oauth/token failed: Head "https://oauth-openshift.apps.oc.sow.expert": EOF
E0404 14:59:52.155373 1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc.sow.expert/oauth/token failed: Head "https://oauth-openshift.apps.oc.sow.expert": EOF
E0404 15:00:02.618751 1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc.sow.expert/oauth/token failed: Head "https://oauth-openshift.apps.oc.sow.expert": EOF
E0404 15:00:13.071041 1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc.sow.expert/oauth/token failed: Head "https://oauth-openshift.apps.oc.sow.expert": EOF
E0404 15:00:23.531058 1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc.sow.expert/oauth/token failed: Head "https://oauth-openshift.apps.oc.sow.expert": EOF
E0404 15:00:33.999953 1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc.sow.expert/oauth/token failed: Head "https://oauth-openshift.apps.oc.sow.expert": EOF
E0404 15:00:44.455873 1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc.sow.expert/oauth/token failed: Head "https://oauth-openshift.apps.oc.sow.expert": EOF
E0404 15:00:54.935240 1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc.sow.expert/oauth/token failed: Head "https://oauth-openshift.apps.oc.sow.expert": EOF
I0404 15:01:05.666751 1 main.go:670] Binding to [::]:8443...
I0404 15:01:05.666776 1 main.go:672] using TLS
I just resolved this issue. To check whether you have the same issue, look at the console pod logs:
oc logs -n openshift-console console-xxxxxxx-yyyyy
Check if you have messages like these:
error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc4.tt.testing/oauth/token failed: Head "https://oauth-openshift.apps.oc4.tt.testing": dial tcp: lookup oauth-openshift.apps.oc4.tt.testing on 172.30.0.10:53: no such host
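The tail of the message matters: "no such host" is a DNS lookup failure, while a trailing "EOF" means the name resolved and a connection was opened but then closed by the other side. A small sketch for counting each kind in the console logs; `classify_auth_errors` is a hypothetical helper name and the matching is based only on the message shapes quoted in this thread:

```shell
# Hypothetical helper: reads console pod logs on stdin and counts the
# "error contacting auth provider" lines by failure type.
#   dns   = line mentions "no such host" (lookup failed)
#   eof   = line ends in ": EOF" (connection closed by the peer)
#   other = any other auth-provider error
classify_auth_errors() {
  awk '
    /error contacting auth provider/ {
      if ($0 ~ /no such host/) dns++
      else if ($0 ~ /: EOF[ \t\r]*$/) eof++
      else other++
    }
    END { printf "dns=%d eof=%d other=%d\n", dns, eof, other }
  '
}

# Usage:
#   oc logs -n openshift-console console-xxxxxxx-yyyyy | classify_auth_errors
```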
In my case I'm deploying through libvirt, which handles part of the DNS resolution. I had already added this entry to the libvirt network; however, I had to delete it and add it again.
WORKER_IP=192.168.126.51
virsh net-update oc4-xxxx delete dns-host "<host ip='$WORKER_IP'><hostname>oauth-openshift.apps.oc4.tt.testing</hostname></host>"
virsh net-update oc4-xxxx add dns-host "<host ip='$WORKER_IP'><hostname>oauth-openshift.apps.oc4.tt.testing</hostname></host>"
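To confirm the entry is actually present after re-adding it, the network definition can be dumped and searched. A minimal sketch, assuming `virsh net-dumpxml` prints each `<hostname>` element in the usual form; `has_dns_host` is a hypothetical helper name:

```shell
# Hypothetical helper: succeeds if the network XML on stdin contains a
# <hostname> element for the given name (fixed-string match, no regex).
has_dns_host() {
  grep -qF "<hostname>$1</hostname>"
}

# Usage:
#   virsh net-dumpxml oc4-xxxx | has_dns_host oauth-openshift.apps.oc4.tt.testing \
#     && echo "entry present" || echo "entry missing"
```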