I am trying to enable mTLS in my mesh that I have already working with istio's sidecars. The problem I have is that I just get working connections up to one point, and then it fails to connect.
This is how the services are set up right now with my failing implementation of mTLS (simplified):
Istio IngressGateway -> NGINX pod -> API Gateway -> Service A -> [ Database ] -> Service B
First thing to note is that I was using a NGINX pod as a load balancer to proxy_pass my requests to my API Gateway or my frontend page. I tried keeping that without the istio IngressGateway but I wasn't able to make it work. Then I tried to use Istio IngressGateway and connect directly to the API Gateway with VirtualService but also fails for me. So I'm leaving it like this for the moment because it was the only way that my request got to the API Gateway successfully.
Another thing to note is that Service A first connects to a Database outside the mesh and then makes a request to Service B which is inside the mesh and with mTLS enabled.
NGINX, API Gateway, Service A and Service B are within the mesh with mTLS enabled and "istioctl authn tls-check" shows that status is OK.
NGINX and API Gateway are in a namespace called "gateway", Database is in "auth" and Service A and Service B are in another one called "api".
Istio IngressGateway is in namespace "istio-system" right now.
So the problem is that everything work if I set STRICT mode to the gateway namespace and PERMISSIVE to api, but once I set STRICT to api, I see the request getting into Service A, but then it fails to send the request to Service B with a 500.
This is the output when it fails that I can see in the istio-proxy container in the Service A pod:
api/serviceA[istio-proxy]: [2019-09-02T12:59:55.366Z] "- - -" 0 - "-" "-" 1939 0 2 - "-" "-" "-" "-" "10.20.208.248:4567" outbound|4567||database.auth.svc.cluster.local 10.20.128.44:35366 10.20.208.248:4567
10.20.128.44:35364 -
api/serviceA[istio-proxy]: [2019-09-02T12:59:55.326Z] "POST /api/my-call HTTP/1.1" 500 - "-" "-" 74 90 60 24 "10.90.0.22, 127.0.0.1, 127.0.0.1" "PostmanRuntime/7.15.0" "14d93a85-192d-4aa7-aa45-1501a71d4924" "serviceA.api.svc.cluster.local:9090" "127.0.0.1:9090" inbound|9090|http-serviceA|serviceA.api.svc.cluster.local - 10.20.128.44:9090 127.0.0.1:0 outbound_.9090_._.serviceA.api.svc.cluster.local
No messages in ServiceB though.
Currently, I do not have a global MeshPolicy, and I am setting Policy and DestinationRule per namespace
Policy:
apiVersion: "authentication.istio.io/v1alpha1"
kind: "Policy"
metadata:
name: "default"
namespace: gateway
spec:
peers:
- mtls:
mode: STRICT
---
apiVersion: "authentication.istio.io/v1alpha1"
kind: "Policy"
metadata:
name: "default"
namespace: auth
spec:
peers:
- mtls:
mode: STRICT
---
apiVersion: "authentication.istio.io/v1alpha1"
kind: "Policy"
metadata:
name: "default"
namespace: api
spec:
peers:
- mtls:
mode: STRICT
DestinationRule:
apiVersion: "networking.istio.io/v1alpha3"
kind: "DestinationRule"
metadata:
name: "mutual-gateway"
namespace: "gateway"
spec:
host: "*.gateway.svc.cluster.local"
trafficPolicy:
tls:
mode: ISTIO_MUTUAL
---
apiVersion: "networking.istio.io/v1alpha3"
kind: "DestinationRule"
metadata:
name: "mutual-api"
namespace: "api"
spec:
host: "*.api.svc.cluster.local"
trafficPolicy:
tls:
mode: ISTIO_MUTUAL
---
apiVersion: "networking.istio.io/v1alpha3"
kind: "DestinationRule"
metadata:
name: "mutual-auth"
namespace: "auth"
spec:
host: "*.auth.svc.cluster.local"
trafficPolicy:
tls:
mode: ISTIO_MUTUAL
Then I have some DestinationRule to disable mTLS for Database (I have some other services in the same namespace that I want to enable with mTLS) and for Kubernetes API
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
name: "myDatabase"
namespace: "auth"
spec:
host: "database.auth.svc.cluster.local"
trafficPolicy:
tls:
mode: DISABLE
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
name: "k8s-api-server"
namespace: default
spec:
host: "kubernetes.default.svc.cluster.local"
trafficPolicy:
tls:
mode: DISABLE
Then I have my IngressGateway like so:
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
name: ingress-gateway
namespace: istio-system
spec:
selector:
istio: ingressgateway # use istio default ingress gateway
servers:
- port:
number: 80
name: http
protocol: HTTP
hosts:
- my-api.example.com
tls:
httpsRedirect: true # sends 301 redirect for http requests
- port:
number: 443
name: https
protocol: HTTPS
tls:
mode: SIMPLE
serverCertificate: /etc/istio/ingressgateway-certs/tls.crt
privateKey: /etc/istio/ingressgateway-certs/tls.key
hosts:
- my-api.example.com
And lastly, my VirtualServices:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: ingress-nginx
namespace: gateway
spec:
hosts:
- my-api.example.com
gateways:
- ingress-gateway.istio-system
http:
- match:
- uri:
prefix: /
route:
- destination:
port:
number: 80
host: ingress.gateway.svc.cluster.local # this is NGINX pod
corsPolicy:
allowOrigin:
- my-api.example.com
allowMethods:
- POST
- GET
- DELETE
- PATCH
- OPTIONS
allowCredentials: true
allowHeaders:
- "*"
maxAge: "24h"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: api-gateway
namespace: gateway
spec:
hosts:
- my-api.example.com
- api-gateway.gateway.svc.cluster.local
gateways:
- mesh
http:
- match:
- uri:
prefix: /
route:
- destination:
port:
number: 80
host: api-gateway.gateway.svc.cluster.local
corsPolicy:
allowOrigin:
- my-api.example.com
allowMethods:
- POST
- GET
- DELETE
- PATCH
- OPTIONS
allowCredentials: true
allowHeaders:
- "*"
maxAge: "24h"
One thing that I don't understand is why do I have to create a VirtualService for my API Gateway and why do I have to use "mesh" in the gateways block. If I remove this block, I don't get my request in API Gateway, but if I do, it works and my requests even get to the next service (Service A), but not the next one to that.
Thanks for the help. I am really stuck with this.
Dump of listeners of ServiceA:
ADDRESS PORT TYPE
10.20.128.44 9090 HTTP
10.20.253.21 443 TCP
10.20.255.77 80 TCP
10.20.240.26 443 TCP
0.0.0.0 7199 TCP
10.20.213.65 15011 TCP
0.0.0.0 7000 TCP
10.20.192.1 443 TCP
0.0.0.0 4568 TCP
0.0.0.0 4444 TCP
10.20.255.245 3306 TCP
0.0.0.0 7001 TCP
0.0.0.0 9160 TCP
10.20.218.226 443 TCP
10.20.239.14 42422 TCP
10.20.192.10 53 TCP
0.0.0.0 4567 TCP
10.20.225.206 443 TCP
10.20.225.166 443 TCP
10.20.207.244 5473 TCP
10.20.202.47 44134 TCP
10.20.227.251 3306 TCP
0.0.0.0 9042 TCP
10.20.207.141 3306 TCP
0.0.0.0 15014 TCP
0.0.0.0 9090 TCP
0.0.0.0 9091 TCP
0.0.0.0 9901 TCP
0.0.0.0 15010 TCP
0.0.0.0 15004 TCP
0.0.0.0 8060 TCP
0.0.0.0 8080 TCP
0.0.0.0 20001 TCP
0.0.0.0 80 TCP
0.0.0.0 10589 TCP
10.20.128.44 15020 TCP
0.0.0.0 15001 TCP
0.0.0.0 9000 TCP
10.20.219.237 9090 TCP
10.20.233.60 80 TCP
10.20.200.156 9100 TCP
10.20.204.239 9093 TCP
0.0.0.0 10055 TCP
0.0.0.0 10054 TCP
0.0.0.0 10251 TCP
0.0.0.0 10252 TCP
0.0.0.0 9093 TCP
0.0.0.0 6783 TCP
0.0.0.0 10250 TCP
10.20.217.136 443 TCP
0.0.0.0 15090 HTTP
Dump clusters in json format: https://pastebin.com/73zmAPWg
Dump listeners in json format: https://pastebin.com/Pk7ddPJ2
Curl command from serviceA container to serviceB:
/opt/app # curl -X POST -v "http://serviceB.api.svc.cluster.local:4567/session/xxxxxxxx=?parameters=hi"
* Trying 10.20.228.217...
* TCP_NODELAY set
* Connected to serviceB.api.svc.cluster.local (10.20.228.217) port 4567 (#0)
> POST /session/xxxxxxxx=?parameters=hi HTTP/1.1
> Host: serviceB.api.svc.cluster.local:4567
> User-Agent: curl/7.61.1
> Accept: */*
>
* Empty reply from server
* Connection #0 to host serviceB.api.svc.cluster.local left intact
curl: (52) Empty reply from server
If I disable mTLS, request gets from serviceA to serviceB with Curl
General tips for debugging Istio service mesh: