What I'm trying to do: run an EKS cluster with both Linux and Windows Server 2019 nodes, where an nginx pod on a Linux node accesses an IIS pod on a Windows node.
The issue: the Windows pods don't start.
Kubelet and kube-proxy log from the Windows node:
E0526 10:59:31.963644 4392 pod_workers.go:186] Error syncing pod b35e92cc-7fa2-11e9-b07b-0ac0c740dc70 ("phoenix-57b76c578c-cczs2_kaltura(b35e92cc-7fa2-11e9-b07b-0ac0c740dc70)"), skipping: failed to "KillPodSandbox" for "b35e92cc-7fa2-11e9-b07b-0ac0c740dc70" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"phoenix-57b76c578c-cczs2_kaltura\" network: failed to parse Kubernetes args: pod does not have label vpc.amazonaws.com/PrivateIPv4Address"
I0526 10:59:37.049583 5020 proxier.go:117] Hns Endpoint resource, {"ID":"9638A3AE-DCB9-4F85-B682-9D2879E09D98","Name":"Ethernet","VirtualNetwork":"82363D68-76A8-4225-8EFC-76F179330CC1","VirtualNetworkName":"vpcbr0a05d9b85b68","Policies":[{"Type":"L2Driver"}],"MacAddress":"00:11:22:33:44:55","IPAddress":"172.31.32.190","PrefixLength":20,"IsRemoteEndpoint":true}
I0526 10:59:37.051589 5020 proxier.go:117] Hns Endpoint resource, {"ID":"8A4C02B1-537B-4650-ADC5-BA24598E3ABA","Name":"Ethernet","VirtualNetwork":"82363D68-76A8-4225-8EFC-76F179330CC1","VirtualNetworkName":"vpcbr0a05d9b85b68","Policies":[{"Type":"L2Driver"}],"MacAddress":"00:11:22:33:44:55","IPAddress":"172.31.36.90","PrefixLength":20,"IsRemoteEndpoint":true}
E0526 10:59:37.064582 5020 proxier.go:1034] Policy creation failed: hnsCall failed in Win32: The provided policy configuration is invalid or missing parameters. (0x803b000d)
E0526 10:59:37.064582 5020 proxier.go:1018] Endpoint information not available for service kaltura/phoenix:https. Not applying any policy
E0526 10:59:38.433836 4392 kubelet_network.go:102] Failed to ensure that nat chain KUBE-MARK-DROP exists: error creating chain "KUBE-MARK-DROP": executable file not found in %PATH%:
E0526 10:59:39.362013 4392 helpers.go:735] eviction manager: failed to construct signal: "allocatableMemory.available" error: system container "pods" not found in metrics
W0526 10:59:39.362013 4392 helpers.go:808] eviction manager: no observation found for eviction signal nodefs.inodesFree
E0526 10:59:48.965710 4392 cni.go:280] Error deleting network: failed to parse Kubernetes args: pod does not have label vpc.amazonaws.com/PrivateIPv4Address
E0526 10:59:48.965710 4392 remote_runtime.go:115] StopPodSandbox "04961285217a628c589467359f6ff6335355c73fdd61f3c975215105a6c307f6" from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "phoenix-57b76c578c-cczs2_kaltura" network: failed to parse Kubernetes args: pod does not have label vpc.amazonaws.com/PrivateIPv4Address
E0526 10:59:48.965710 4392 kuberuntime_manager.go:799] Failed to stop sandbox {"docker" "04961285217a628c589467359f6ff6335355c73fdd61f3c975215105a6c307f6"}
E0526 10:59:48.965710 4392 kuberuntime_manager.go:594] killPodWithSyncResult failed: failed to "KillPodSandbox" for "b35e92cc-7fa2-11e9-b07b-0ac0c740dc70" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"phoenix-57b76c578c-cczs2_kaltura\" network: failed to parse Kubernetes args: pod does not have label vpc.amazonaws.com/PrivateIPv4Address"
E0526 10:59:48.965710 4392 pod_workers.go:186] Error syncing pod b35e92cc-7fa2-11e9-b07b-0ac0c740dc70 ("phoenix-57b76c578c-cczs2_kaltura(b35e92cc-7fa2-11e9-b07b-0ac0c740dc70)"), skipping: failed to "KillPodSandbox" for "b35e92cc-7fa2-11e9-b07b-0ac0c740dc70" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"phoenix-57b76c578c-cczs2_kaltura\" network: failed to parse Kubernetes args: pod does not have label vpc.amazonaws.com/PrivateIPv4Address"
E0526 10:59:49.368785 4392 helpers.go:735] eviction manager: failed to construct signal: "allocatableMemory.available" error: system container "pods" not found in metrics
W0526 10:59:49.368785 4392 helpers.go:808] eviction manager: no observation found for eviction signal nodefs.inodesFree
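Every teardown failure above traces back to the same root: the pod is missing the vpc.amazonaws.com/PrivateIPv4Address label, which EKS's Windows networking components (the VPC admission webhook plus the VPC resource controller) are responsible for adding. A quick way to check what the pod actually carries and which mutating webhooks are registered (pod name taken from the log above):

    kubectl -n kaltura get pod phoenix-57b76c578c-cczs2 --show-labels
    kubectl get mutatingwebhookconfigurations -o yaml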
kubectl -n kaltura describe pods phoenix-695b5bdff8-zzbq6
Name:               phoenix-695b5bdff8-zzbq6
Namespace:          kaltura
Priority:           0
PriorityClassName:  <none>
Node:               ip-10-10-12-97.us-east-2.compute.internal/10.10.12.97
Start Time:         Tue, 28 May 2019 12:30:48 +0300
Labels:             app.kubernetes.io/instance=kaltura-core
                    app.kubernetes.io/name=phoenix
                    pod-template-hash=2516168994
Annotations:        <none>
Status:             Pending
IP:
Controlled By:      ReplicaSet/phoenix-695b5bdff8
Containers:
  kaltura:
    Container ID:
    Image:          <my-account-id>.dkr.ecr.us-east-2.amazonaws.com/vfd1-phoenix:latest
    Image ID:
    Port:           8040/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Liveness:       http-get http://:80/tvp_api delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:80/tvp_api delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      TCM_SECTION:  kaltura-core
      TCM_URL:      https://10.10.12.99
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-jdd98 (ro)
Conditions:
  Type             Status
  Initialized      True
  Ready            False
  ContainersReady  False
  PodScheduled     True
Volumes:
  default-token-jdd98:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-jdd98
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  beta.kubernetes.io/arch=amd64
                 beta.kubernetes.io/os=windows
                 kaltura.role=api
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason          Age                      From                                                Message
  ----    ------          ----                     ----                                                -------
  Normal  SandboxChanged  113s (x1707 over 7h27m)  kubelet, ip-10-10-12-97.us-east-2.compute.internal  Pod sandbox changed, it will be killed and re-created.
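Note that PodScheduled is True, so the node selector does match and the pod reaches the Windows node; it is the sandbox setup/teardown that loops. To double-check the node labels anyway:

    kubectl get nodes -l beta.kubernetes.io/os=windows,kaltura.role=api --show-labels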
Service and Deployment YAML (Helm templates):
apiVersion: v1
kind: Service
metadata:
  name: phoenix
  labels:
    app.kubernetes.io/name: phoenix
    app.kubernetes.io/instance: {{ .Release.Name }}
    app.kubernetes.io/managed-by: {{ .Release.Service }}
spec:
  type: ClusterIP
  ports:
    - port: 8080
      targetPort: 443
      protocol: TCP
      name: https
  selector:
    app.kubernetes.io/name: phoenix
    app.kubernetes.io/instance: {{ .Release.Name }}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: phoenix
  labels:
    app.kubernetes.io/name: phoenix
    app.kubernetes.io/instance: {{ .Release.Name }}
    app.kubernetes.io/managed-by: {{ .Release.Service }}
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: phoenix
      app.kubernetes.io/instance: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app.kubernetes.io/name: phoenix
        app.kubernetes.io/instance: {{ .Release.Name }}
    spec:
      nodeSelector:
        kaltura.role: api
        beta.kubernetes.io/os: windows
        beta.kubernetes.io/arch: amd64
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.env.repository }}/{{ .Values.env.tag }}-phoenix:latest"
          imagePullPolicy: Always
          env:
            - name: TCM_SECTION
              value: {{ .Values.env.tag }}
          ports:
            - name: http
              containerPort: 8040
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /tvp_api
              port: 80
          readinessProbe:
            httpGet:
              path: /tvp_api
              port: 80
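When in doubt about what the chart actually produces, rendering it locally shows the exact manifests sent to the cluster; a sketch, assuming the chart directory is ./kaltura-core with a standard values.yaml:

    helm template ./kaltura-core -f values.yaml | kubectl apply --dry-run -f -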
In addition to this pod I have an nginx pod running on the Linux nodes; that pod is load-balanced by aws-alb-ingress-controller.
Solved. Apparently the VPC admission webhook was defined on the default namespace, while my Windows pods were deployed to a different namespace (kaltura). Because the webhook never mutated the pods in that namespace, they never received the vpc.amazonaws.com/PrivateIPv4Address label the Windows CNI plugin looks for, which is exactly the error repeating in the kubelet log.
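For anyone hitting the same thing: either deploy the Windows workloads into the namespace the webhook watches, or widen the webhook's scope and recreate it. A minimal sketch of the relevant part of the MutatingWebhookConfiguration, assuming a namespaceSelector-based setup; the resource and service names here follow the EKS Windows preview scripts and may differ in your cluster:

    apiVersion: admissionregistration.k8s.io/v1beta1
    kind: MutatingWebhookConfiguration
    metadata:
      name: vpc-admission-webhook-cfg       # assumption: name used by the preview scripts
    webhooks:
      - name: vpc-admission-webhook.amazonaws.com
        clientConfig:
          service:
            name: vpc-admission-webhook-svc # assumption: service fronting the webhook
            namespace: kube-system          # wherever the webhook actually runs
            path: /mutate
          caBundle: <base64-encoded-CA>
        rules:
          - operations: ["CREATE"]
            apiGroups: [""]
            apiVersions: ["v1"]
            resources: ["pods"]
        # An empty namespaceSelector matches every namespace. If a selector is
        # set, it must match the namespace the Windows pods are deployed in
        # (kaltura here); otherwise the pods are never mutated and end up
        # without the PrivateIPv4Address label, which is exactly the error in the log.
        namespaceSelector: {}

After changing the webhook, delete the stuck pods so the ReplicaSet recreates them and they pass through admission again.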