Issues upgrading OKD cluster from 3.6 to 3.7 (trying to get it fully to the latest)

1/11/2019

I'm running the automated in-place upgrade in a lab environment so that I can test before upgrading our more heavily used developer environment. The upgrade playbook runs without any errors. I reboot all nodes and then proceed to test, but I get failures with the service network: pods that are deployed are unable to reach anything on the 172.30.0.0 network. I've confirmed that I am unable to hit the Kubernetes API on the 172.30.0.1 endpoint; I get a "no route to host" message. Investigating further, it appears that after the upgrade none of the iptables NAT rules are being created for the 172.30.0.x addresses.

I've looked through the documentation to see if there was something I missed in the inventory file, but I don't see anything obvious. I've since rolled the machines back to a pre-upgrade snapshot and verified that all networking works, which it does. Performing the upgrade again produces the same results. Has anyone ever run into this before?
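
For reference, this is roughly how I'm testing (commands are illustrative; assuming the default 172.30.0.0/16 service network and the standard kube-proxy KUBE-SERVICES chain):

    # From inside a pod: the API service IP should at least accept the connection
    curl -kv https://172.30.0.1:443/healthz

    # On a node: after the upgrade there are no NAT entries for the service IPs
    iptables -t nat -L KUBE-SERVICES -n | grep 172.30.0.1
    iptables-save -t nat | grep 172.30.0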

-- Tony Saxon
kubernetes
okd
openshift
openshift-origin

1 Answer

1/11/2019

It figures that after looking at it for 24 hours and finally deciding to post about it, I would figure it out a few hours later. It looks like it was an issue with kube-proxy:

https://github.com/kubernetes/kubernetes/issues/58956
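
If you hit the same thing, a quick way to check whether kube-proxy is the culprit is to look for iptables sync errors in the node logs (assuming the OKD 3.x node service name origin-node; the exact error text depends on your iptables version):

    journalctl -u origin-node --no-pager | grep -iE 'proxy|iptables'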

I was upgrading with the latest release-3.6 branch of openshift-ansible, but apparently it's still an issue. Fixed by downgrading iptables on the nodes:

yum downgrade http://vault.centos.org/centos/7.5.1804/updates/x86_64/Packages/iptables-1.4.21-24.1.el7_5.x86_64.rpm http://vault.centos.org/centos/7.5.1804/updates/x86_64/Packages/iptables-services-1.4.21-24.1.el7_5.x86_64.rpm
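
After the downgrade, something like this confirms the fix took (using origin-node as the node service name, which is an assumption from my setup; adjust to yours):

    # Verify the older iptables packages are installed
    rpm -q iptables iptables-services

    # Restart the node service so kube-proxy reprograms the rules
    systemctl restart origin-node

    # The service-network NAT rules should be back
    iptables-save -t nat | grep 172.30.0.1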

-- Tony Saxon
Source: StackOverflow