I am unable to connect to any service I expose from my GKE cluster despite being able to see the pod up and running. Any ideas what I am doing wrong?
I have a GKE cluster with private nodes (public master). It is configured to assign pod IPs from 10.0.x.x and service IPs from 10.2.x.x.
The Terraform configuration for the cluster is shown below:
resource "google_container_cluster" "playground" {
  provider = "google-beta"
  name = "playground"
  description = "Playground cluster"
  project = "${module.playground_project.project_id}"
  zone = "europe-west4-a"
  min_master_version = "1.11.2-gke.9"

  master_auth {
    username = "admin"
    password = "xxx"
  }

  lifecycle {
    ignore_changes = ["initial_node_count", "node_config", "node_pool", "network", "subnetwork"]
  }

  network = "${google_compute_network.playground.self_link}"
  subnetwork = "${google_compute_subnetwork.playground-gke.self_link}"

  private_cluster_config {
    enable_private_endpoint = false
    enable_private_nodes = true
    master_ipv4_cidr_block = "172.30.16.0/28"
  }

  master_authorized_networks_config {
    cidr_blocks = [
      {
        cidr_block = "${var.my_ip}"
      },
    ]
  }

  node_pool {
    name = "default-pool" # Default empty node pool
  }

  ip_allocation_policy {
    # create_subnetwork = true
    # subnetwork_name = "gke-playground"
    cluster_secondary_range_name = "subnet-play-gke-pods" # 10.0.0.0/15
    services_secondary_range_name = "subnet-play-gke-services" # 10.2.0.0/15
  }
}
resource "google_container_node_pool" "np" {
  provider = "google-beta"
  name = "node-pool-1"
  project = "${module.playground_project.project_id}"
  zone = "europe-west4-a"
  cluster = "playground"
  depends_on = ["google_container_cluster.playground"]

  management {
    auto_upgrade = true
    auto_repair = true
  }

  lifecycle {
    ignore_changes = ["node_count"]
  }

  # Enable this or autoscaling, not both
  # node_count = 1
  autoscaling {
    min_node_count = 1
    max_node_count = 3
  }

  initial_node_count = 1

  node_config {
    # preemptible = true
    machine_type = "n1-standard-1"
    disk_size_gb = "20"
    disk_type = "pd-standard"

    # metadata
    # labels
    # tags
    tags = ["gke"]

    labels = [
      {
        environment = "playground"
      },
    ]

    oauth_scopes = [
      "https://www.googleapis.com/auth/compute",
      "https://www.googleapis.com/auth/devstorage.read_only",
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
    ]
  }
}
I have set up a bastion within the VPC that can reach the nodes (verified by SSHing into them).
I am able to deploy an application like so:
local $ kubectl run hello --image=gcr.io/google-samples/hello-app:1.0 --port=8080
deployment.apps "hello" created
local $ kubectl get deployment hello
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
hello 1 1 1 1 1m
local $ kubectl expose deployment hello --target-port=8080 --type=NodePort
service "hello" exposed
local $ kubectl describe service hello
Name: hello
Namespace: default
Labels: run=hello
Annotations: <none>
Selector: run=hello
Type: NodePort
IP: 10.2.109.113
Port: <unset> 8080/TCP
TargetPort: 8080/TCP
NodePort: <unset> 32420/TCP
Endpoints: 10.0.2.13:8080
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
Now, if I connect to my bastion, I can query the pod directly on 10.0.2.13:8080, which is great:
bastion $ curl 10.0.2.13:8080
Hello, world!
Version: 1.0.0
Hostname: hello-68669bb559-x7zpb
But if I try to connect to the service IP, 10.2.109.113, my connection times out:
bastion $ curl -vvvv --connect-timeout 10 10.2.109.113:32420
* Rebuilt URL to: 10.2.109.113:32420/
* Trying 10.2.109.113...
* TCP_NODELAY set
* Connection timed out after 10001 milliseconds
* Curl_http_done: called premature == 1
* stopped the pause stream!
* Closing connection 0
curl: (28) Connection timed out after 10001 milliseconds
My firewall rules, as listed by gcloud, are below:
local $ gcloud compute firewall-rules list
NAME                                  NETWORK   DIRECTION  PRIORITY  ALLOW                         DENY  DISABLED
default-allow-icmp                    default   INGRESS    65534     icmp                                False
default-allow-internal                default   INGRESS    65534     tcp:0-65535,udp:0-65535,icmp        False
default-allow-rdp                     default   INGRESS    65534     tcp:3389                            False
default-allow-ssh                     default   INGRESS    65534     tcp:22                              False
egress-from-bastion-to-me-over-ssh    vpc-play  EGRESS     1000      tcp:22                              False
gke-playground-f9a5cbc4-all           vpc-play  INGRESS    1000      sctp,tcp,udp,icmp,esp,ah            False
gke-playground-f9a5cbc4-master        vpc-play  INGRESS    1000      tcp:10250,tcp:443                   False
gke-playground-f9a5cbc4-vms           vpc-play  INGRESS    1000      icmp,tcp:1-65535,udp:1-65535        False
ingress-from-bastion-to-gke-over-all  vpc-play  INGRESS    1000      all                                 False
ingress-from-me-to-bastion-over-ssh   vpc-play  INGRESS    1000      tcp:22                              False
k8s-fw-l7--fff685c495e2595e           vpc-play  INGRESS    1000      tcp:30000-32767                     False
nat-europe-west4-a                    vpc-play  INGRESS    1000      all                                 False
nat-gateway-europe-west4-a-vm-ssh     vpc-play  INGRESS    1000      tcp:22                              False
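Service IPs are not routable from outside the cluster. Note also that 32420 is the NodePort, which is opened on each node's own IP, not on the service IP. If you want to test the service, try curl [node_IP]:32420. This hits the NodePort on the node, which forwards to your service endpoint.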
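To find the values to plug into that curl command, something like the sketch below may help. It assumes kubectl is pointed at the cluster (as on the "local" machine in the question), that the first node listed is an acceptable target, and that the service exposes a single port. The function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not part of kubectl).

# Print the internal IP of the first node kubectl lists.
node_internal_ip() {
  kubectl get nodes \
    -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}'
}

# Print the NodePort allocated to the first port of the given service.
service_node_port() {
  kubectl get service "$1" -o jsonpath='{.spec.ports[0].nodePort}'
}

# Combine both into a URL that can be curled from inside the VPC.
node_port_url() {
  echo "http://$(node_internal_ip):$(service_node_port "$1")/"
}

# Example (run where kubectl works, then curl the printed URL from the bastion):
#   node_port_url hello
#   curl --connect-timeout 10 <printed URL>
```

With private nodes, the nodes only have internal addresses, so the curl still has to come from a machine inside the VPC, such as the bastion.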