Multi-cluster communication with custom hostnames and HTTPS requests fails

1/26/2022

We are migrating a legacy system that uses Consul DNS to an Istio service mesh on Kubernetes. Services within the mesh have to be addressed in the format https://app.service.consul. Running such requests inside the Istio service mesh has some challenges. We managed to make it work with service entries and a custom Envoy filter, but we are now stuck making it work in an Istio multi-cluster setup when calling the service in the external cluster. Below is a simplified diagram of how the service is called, for reference.

[Image: simplified diagram of how the service is called]

Request from the client app service when trying to reach the external cluster service fails:

curl https://static-server-https.service.consul
OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to static-server-https.service.consul:443


# but should be
hello world from cluster-b

Verbose output:

curl -v https://static-server-https.service.consul
*   Trying 240.240.0.1:443...
* Connected to static-server-https.service.consul (240.240.0.1) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
*  CAfile: /etc/certs/ca/ca-bundle.pem
*  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to static-server-https.service.consul:443 
* Closing connection 0
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to static-server-https.service.consul:443

Istio proxy outbound log to external cluster:

[2022-01-25T10:09:59.177Z] "- - -" 0 UF,URX - - "-" 0 0 10000 - "-" "-" "-" "-" "35.157.89.251:443" outbound|443||static-server-https.service.consul - 240.240.0.2:443 100.96.7.214:47428 - -

Request from the client app service when trying to reach the internal cluster service succeeds:

curl https://static-server-https.service.consul
hello world from cluster-a

Istio proxy outbound log to internal cluster:

[2022-01-25T09:45:54.381Z] "- - -" 0 - - - "-" 102 191 8 - "-" "-" "-" "-" "100.70.235.145:443" outbound|443||static-server-https.service.consul 100.96.7.214:48320 240.240.0.2:443 100.96.7.214:55200 - -

Notes on istio proxy outbound logs

The Istio proxy outbound logs look quite different from the logs produced by Istio’s out-of-the-box multi-cluster communication. The following is an example log from the standard configuration, when using http://static-server-http.default.svc.cluster.local:8080, with response code 200:

[2022-01-25T10:02:40.334Z] "GET / HTTP/1.1" 200 - via_upstream - "-" 0 29 7 6 "-" "curl/7.81.0-DEV" "7a46868c-7d22-9f79-89e5-b4b04835bb35" "static-server-http:8080" "18.195.240.125:15443" outbound|8080||static-server-http.default.svc.cluster.local 100.96.7.214:35468 100.96.7.223:8080 100.96.7.214:44864 - default

Compare this to the log of the successful request when using https://static-server-https.service.consul:

[2022-01-25T09:45:54.381Z] "- - -" 0 - - - "-" 102 191 8 - "-" "-" "-" "-" "100.70.235.145:443" outbound|443||static-server-https.service.consul 100.96.7.214:48320 240.240.0.2:443 100.96.7.214:55200 - -

My guess is that this is due to the custom Envoy filter, but we are not sure at this point how to interpret the difference.
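
To make the comparison of these log lines easier, here is a minimal Python sketch that splits a line into named fields. It assumes the proxies use Istio's default text access log format (no custom accessLogFormat); the names FIELDS, parse_access_log and summarize are made up for illustration and are not part of any Istio tooling.

```python
import shlex

# Field order of Istio's default text access log format.
# Assumption: the mesh does not override accessLogFormat; adjust if it does.
FIELDS = [
    "start_time", "request", "response_code", "response_flags",
    "response_code_details", "connection_termination_details",
    "upstream_transport_failure_reason", "bytes_received", "bytes_sent",
    "duration_ms", "upstream_service_time", "x_forwarded_for", "user_agent",
    "request_id", "authority", "upstream_host", "upstream_cluster",
    "upstream_local_address", "downstream_local_address",
    "downstream_remote_address", "requested_server_name", "route_name",
]


def parse_access_log(line):
    """Split one access log line into a dict of named fields."""
    tokens = shlex.split(line)  # keeps double-quoted fields together
    if len(tokens) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(tokens)}")
    return dict(zip(FIELDS, tokens))


def summarize(line):
    """Return the few fields that differ most between the log lines."""
    entry = parse_access_log(line)
    return {
        # A request field of "- - -" means the proxy saw no HTTP request,
        # i.e. the connection was handled as plain TCP.
        "layer": "tcp" if entry["request"] == "- - -" else "http",
        "flags": entry["response_flags"],
        "upstream_host": entry["upstream_host"],
        "duration_ms": int(entry["duration_ms"]),
    }


# Log lines copied from this post.
FAILING = (
    '[2022-01-25T10:09:59.177Z] "- - -" 0 UF,URX - - "-" 0 0 10000 - '
    '"-" "-" "-" "-" "35.157.89.251:443" '
    'outbound|443||static-server-https.service.consul - '
    '240.240.0.2:443 100.96.7.214:47428 - -'
)
STANDARD = (
    '[2022-01-25T10:02:40.334Z] "GET / HTTP/1.1" 200 - via_upstream - "-" '
    '0 29 7 6 "-" "curl/7.81.0-DEV" "7a46868c-7d22-9f79-89e5-b4b04835bb35" '
    '"static-server-http:8080" "18.195.240.125:15443" '
    'outbound|8080||static-server-http.default.svc.cluster.local '
    '100.96.7.214:35468 100.96.7.223:8080 100.96.7.214:44864 - default'
)

if __name__ == "__main__":
    for label, line in (("failing", FAILING), ("standard", STANDARD)):
        print(label, summarize(line))
```

Read this way, the failing entry is a TCP-level entry whose upstream host is on port 443 and which ends after 10000 ms with the flags UF,URX (in Envoy's documentation, an upstream connection failure and the maximum number of connect attempts being reached). The standard cross-cluster entry is an HTTP-level entry whose upstream host is on port 15443.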

Notes on kiali service representation

We also found that Kiali renders the graph of the HTTPS-configured connection as TCP instead of HTTP. We do not yet know why this is the case or whether it is relevant.

Notes on the setup

The setup is based on the multi-primary setup from the official documentation. Note that our setup runs on AWS infrastructure, which requires supplying the network load balancer IP addresses instead of its host names.

The east-west gateway for cross-cluster communication is modified to allow traffic for the Consul hostnames:

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: cross-network-gateway
  namespace: istio-system
spec:
  selector:
    istio: eastwestgateway
  servers:
    - port:
        number: 15443
        name: tls
        protocol: TLS
      tls:
        mode: AUTO_PASSTHROUGH
      hosts:
        - "*.local"
        - "*.service.consul"

The target service is deployed with a service entry that configures the AWS NLB IP addresses as endpoints. We also have a custom Envoy filter to allow the use of HTTPS in our requests.

apiVersion: v1
kind: Service
metadata:
  name: static-server-https
  namespace: default
spec:
  selector:
    app: static-server-https
  ports:
    - name: https
      protocol: TCP 
      port: 443 
      targetPort: 8080
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: static-server-https
  namespace: default
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: static-server-https
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: static-server-https
  template:
    metadata:
      name: static-server-https
      labels:
        app: static-server-https
      annotations:
        # the certificates for the custom envoy https filter
        sidecar.istio.io/userVolumeMount: '[{"name":"consul-external-cert", "mountPath":"/etc/certs/consul-external", "readonly":true},{"name":"root-ca", "mountPath":"/etc/certs/ca", "readonly":true}]'
        sidecar.istio.io/userVolume: '[{"name":"consul-external-cert", "secret":{"secretName":"consul-wildcard"}},{"name":"root-ca", "secret":{"secretName":"ca-bundle"}}]'
    spec:
      containers:
      - name: static-server-https
        image: hashicorp/http-echo:latest
        args:
          - -text="hello world from {{.Values.mesh.cluster.this.name}}"
          - -listen=:8080
        ports:
          - containerPort: 8080
            name: http
      serviceAccountName: static-server-https
---
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: static-server-https
  namespace: default
spec:
  hosts:
  - static-server-https.service.consul
  location: MESH_INTERNAL
  ports:
  - number: 443
    name: https
    protocol: TLS
  resolution: DNS
  endpoints: 
  # the AWS NLB addresses to reach the app in the external cluster
  - address: {{.Values.mesh.cluster.external.gateway.address1}}
    ports: 
      http: 15443 
  - address: {{.Values.mesh.cluster.external.gateway.address2}}
    ports: 
      http: 15443 
  - address: {{.Values.mesh.cluster.external.gateway.address3}}
    ports: 
      http: 15443 
  # the app endpoint for the local cluster
  - address: static-server-https.default
  subjectAltNames:
  - "spiffe://cluster.local/ns/default/sa/static-server-https"
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: static-server-https
spec:
  host: static-server-https.service.consul
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
---
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: custom-envoy-filter
spec:
  configPatches:
    - applyTo: FILTER_CHAIN
      match:
        context: SIDECAR_OUTBOUND
        listener:
          portNumber: 443
          filterChain:
            filter:
              name: "*.service.consul"
      patch:
        operation: MERGE
        value:
          transport_socket:
            name: tls
            typed_config:
              "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext"
              common_tls_context:
                tls_certificates:
                  - certificate_chain:
                      filename: /etc/certs/consul-external/tls.crt
                    private_key:
                      filename: /etc/certs/consul-external/tls.key
                validation_context:
                  trusted_ca:
                    filename: /etc/certs/ca/ca.crt
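
One detail of the ServiceEntry above that may be worth checking (an observation, not a confirmed cause): the service port is named https, while the endpoint port maps use the key http. According to the Istio reference for endpoint ports, the map is keyed by service port name, and an endpoint without a matching entry receives traffic on the service port itself. That would be consistent with the failing log entry showing an upstream on port 443 rather than 15443. The following Python sketch flags such mismatches; the function name and the placeholder addresses are hypothetical, and the dict only mirrors the relevant parts of the manifest.

```python
def unmatched_endpoint_ports(service_entry):
    """Return (address, port_name) pairs whose name is not in spec.ports."""
    spec = service_entry["spec"]
    declared = {port["name"] for port in spec.get("ports", [])}
    return [
        (endpoint["address"], name)
        for endpoint in spec.get("endpoints", [])
        for name in endpoint.get("ports", {})
        if name not in declared
    ]


# Relevant parts of the ServiceEntry from this post. The gateway addresses
# are templated in the original, so placeholders are used here.
SERVICE_ENTRY = {
    "spec": {
        "hosts": ["static-server-https.service.consul"],
        "ports": [{"number": 443, "name": "https", "protocol": "TLS"}],
        "endpoints": [
            {"address": "<gateway-address-1>", "ports": {"http": 15443}},
            {"address": "<gateway-address-2>", "ports": {"http": 15443}},
            {"address": "<gateway-address-3>", "ports": {"http": 15443}},
            {"address": "static-server-https.default"},
        ],
    }
}

if __name__ == "__main__":
    for address, name in unmatched_endpoint_ports(SERVICE_ENTRY):
        print(f"{address}: port name {name!r} is not declared in spec.ports")
```

If this turns out to be relevant, renaming the endpoint port keys to https (or the service port to http) would make the names agree. Whether that alone is enough for the cross-cluster call to succeed is not something this sketch can tell.
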
-- Kunal Malhotra
consul
istio
kubernetes

0 Answers