Troubleshoot OpenStack Octavia LBaaS v2 ERROR

9/29/2018

I have two Ubuntu 18.04 bare metal servers. Using a DevStack deployment, I have stood up a multi-node (2-node) cluster where one server runs the controller services plus compute, while the second runs only compute. On the controller node, I have enabled LBaaS v2 with Octavia.

# LBaaS
enable_plugin neutron-lbaas https://git.openstack.org/openstack/neutron-lbaas stable/queens
enable_plugin octavia https://git.openstack.org/openstack/octavia stable/queens
enable_service q-lbaasv2 octavia o-cw o-hk o-hm o-api

I've created a Kubernetes cluster with 1 master and 2 minion nodes. Some initial testing was successful: deploying WordPress via Helm created a load balancer, and I was able to access the app as expected.

I'm now trying to set up an nginx-ingress controller. When I deploy my nginx-ingress controller LoadBalancer service, I can see the load balancer created in OpenStack. However, attempts to access the ingress controller using the external IP always result in an empty reply.

Using the CLI I can see the load balancer, pools, and members. The member entries indicate there is an error:

+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| address             | 10.0.0.9                             |
| admin_state_up      | True                                 |
| created_at          | 2018-09-28T22:15:51                  |
| id                  | 109ad896-5953-4b2b-bbc9-d251d44c3817 |
| name                |                                      |
| operating_status    | ERROR                                |
| project_id          | 12b95a935dc3481688eb840249c9b167     |
| protocol_port       | 31042                                |
| provisioning_status | ACTIVE                               |
| subnet_id           | 1e5efaa0-f95f-44a1-a271-541197f372ab |
| updated_at          | 2018-09-28T22:16:33                  |
| weight              | 1                                    |
| monitor_port        | None                                 |
| monitor_address     | None                                 |
+---------------------+--------------------------------------+
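For context, this is roughly how I'm inspecting things from the CLI (a sketch, assuming a reasonably recent python-octaviaclient; the pool, member, and load balancer IDs are placeholders):

# Show a single member (produces a table like the one above)
openstack loadbalancer member show <pool-id> <member-id>

# Show the full status tree: listeners, pools, and members with their
# provisioning_status and operating_status in one view
openstack loadbalancer status show <lb-id>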

However, there is no indication of what the error is. There is no corresponding error in the logs that I can find.

Using kubectl port-forward, I verified that the nginx-ingress controller is up, running, and correctly configured. The problem seems to be in the load balancer.
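For completeness, the check looked roughly like this (a sketch; the pod name is a placeholder and the local port is arbitrary):

# Forward a local port straight to the ingress controller pod, bypassing
# the OpenStack load balancer entirely
kubectl port-forward <nginx-ingress-controller-pod> 8080:80

# In another shell: a non-empty response here means the controller itself is fine
curl -v http://localhost:8080/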

My question is how can I diagnose what the error is?

I found only one troubleshooting guide related to LBaaS v2, and it claims I should be able to see q-lbaas- namespaces when I run ip netns list. However, there are none defined.
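That guide appears to cover the older namespace-based haproxy driver; with the Octavia provider the load balancer runs inside amphora service VMs rather than network namespaces, so the absence of those namespaces may be expected. The amphorae can be listed instead (a sketch, assuming admin credentials and a python-octaviaclient version that includes the amphora commands):

# Octavia runs each load balancer in one or more amphora service VMs
openstack loadbalancer amphora list

# The amphorae also show up as Nova instances
openstack server list --all-projects | grep amphora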

Using helm --dry-run --debug, the rendered service YAML is:

# Source: nginx-ingress/templates/controller-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx-ingress
    chart: nginx-ingress-0.25.1
    component: "controller"
    heritage: Tiller
    release: oslb2
  name: oslb2-nginx-ingress-controller
spec:
  clusterIP: ""
  externalTrafficPolicy: "Local"
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: http
    - name: https
      port: 443
      protocol: TCP
      targetPort: https
  selector:
    app: nginx-ingress
    component: "controller"
    release: oslb2
  type: "LoadBalancer"

Interestingly, in comparing it to a previous (WordPress) LoadBalancer service that worked, I noticed that the nginx-ingress externalTrafficPolicy is set to Local, while WordPress specified Cluster. I changed the values.yaml for the nginx-ingress chart to set externalTrafficPolicy to Cluster, and now the load balancer is working.
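For reference, the same change can also be applied to the live service without re-rendering the chart (a sketch; the service name is taken from the manifest above):

kubectl patch svc oslb2-nginx-ingress-controller \
  -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'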

We'd like to keep the policy at "Local" to preserve source IPs. Any thoughts on why it doesn't work?

-- jmer
internal-load-balancer
kubernetes
nginx
openstack

1 Answer

10/1/2018

It turns out I was barking up the wrong tree (apologies). There is no issue with the load balancer.

The problem stems from Kubernetes' inability to match the minion/worker hostname with its node name. The nodes register under the short form of the hostname, e.g. k8s-cluster-fj7cs2gokrnz-minion-1, while kube-proxy does its look-up based on the fully qualified name, e.g. k8s-cluster-fj7cs2gokrnz-minion-1.novalocal.
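The mismatch is easy to see side by side (a sketch; run the second command on the minion itself):

# Node names as registered with the API server (short form)
kubectl get nodes -o wide

# Fully qualified hostname that kube-proxy falls back to by default
hostname -f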

I found this in the kube-proxy log:

Sep 27 23:26:20 k8s-cluster-fj7cs2gokrnz-minion-1.novalocal runc[2205]: W0927 23:26:20.050146       1 server.go:586] Failed to retrieve node info: nodes "k8s-cluster-fj7cs2gokrnz-minion-1.novalocal" not found
Sep 27 23:26:20 k8s-cluster-fj7cs2gokrnz-minion-1.novalocal runc[2205]: W0927 23:26:20.050241       1 proxier.go:463] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP

This has the effect of making Kubernetes fail to find "Local" endpoints for LoadBalancer (or other) services. When you specify externalTrafficPolicy: "Local", K8s drops the packets, since it (i) is restricted to routing only to endpoints local to the node and (ii) believes there are no local endpoints.
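One way to see the effect (a sketch; the labels and service name are taken from the chart manifest above, and healthCheckNodePort is only populated for Local-policy LoadBalancer services):

# Which node actually hosts the controller pod (the only node that would
# pass health checks under the Local policy)
kubectl get pods -l app=nginx-ingress,component=controller,release=oslb2 -o wide

# The per-node health-check port the cloud load balancer should probe
kubectl get svc oslb2-nginx-ingress-controller \
  -o jsonpath='{.spec.healthCheckNodePort}'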

Other folks who have encountered this issue configure kube-proxy with --hostname-override to make the two match up.
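One way to apply that workaround (a sketch, assuming kube-proxy is started with command-line flags; adjust to however kube-proxy is launched in your deployment, and keep any existing flags):

# Make kube-proxy identify itself with the same short name the node
# registered under, so the local-endpoint lookup succeeds
kube-proxy --hostname-override=$(hostname -s)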

-- jmer
Source: StackOverflow