My old windows pods are dead and don't respond to http requests / exec fails

12/25/2019

I have an AKS cluster with a mix of Windows and Linux nodes and an nginx-ingress.

This all worked great, but a few days ago all my windows pods have become unresponsive.

Everything is still green on the K8s dashboard, but they don't respond to HTTP requests and kubectl exec fails.

All the linux pods still work.

I created a new deployment with the exact same image and other properties, and this new pod works, responds to HTTP and kubectl exec works.

Q: How can I find out why my old pods died? How can I prevent this from occuring again in the future?

Note that this is a test cluster, so I have the luxury of being able to investigate, if this was prod I would have burned and recreated the cluster already.


Details:

https://aks-test.progress-cloud.com/eboswebApi/ is one of the old pods, https://aks-test.progress-cloud.com/eboswebApi2/ is the new pod.

When I look at the nginx log, I see a lot of connect() failed (111: Connection refused) while connecting to upstream.

When I try kubectl exec -it <podname> --namespace <namespace> -- cmd I get one of two behaviors:

Either the command immediatly returns without printing anything, or I get an error:

container 1dfffa08d834953c29acb8839ea2d4c6b78b7a530371d98c16b15132d49f5c52 encountered an error during CreateProcess: failure in a Windows system call: The remote procedure call failed and did not execute. (0x6bf) extra info: {"CommandLine":"cmd","WorkingDirectory":"C:\\inetpub\\wwwroot","Environment":{...},"EmulateConsole":true,"CreateStdInPipe":true,"CreateStdOutPipe":true,"ConsoleSize":[0,0]}
command terminated with exit code 126

kubectl describe pod works on both.

The only difference I could find was that on the old pod, I don't get any events:

Events:          <none>

whereas on the new pod I get a bunch of them for pulling the image etc:

Events:
  Type     Reason     Age   From                     Message
  ----     ------     ----  ----                     -------
  Normal   Scheduled  39m   default-scheduler        Successfully assigned ingress-basic/ebos-webapi-test-2-78786968f4-xmvfw to aksnpwin000000
  Warning  Failed     38m   kubelet, aksnpwin000000  Error: failed to start container "ebos-webapi-test-2": Error response from daemon: hcsshim::CreateComputeSystem ebos-webapi-test-2: The binding handle is invalid.
(extra info: {"SystemType":"Container","Name":"ebos-webapi-test-2","Owner":"docker","VolumePath":"\\\\?\\Volume{dac026db-26ab-11ea-bb33-e3730ff9432d}","IgnoreFlushesDuringBoot":true,"LayerFolderPath":"C:\\ProgramData\\docker\\windowsfilter\\ebos-webapi-test-2","Layers":[{"ID":"8c160b6e-685a-58fc-8c4b-beb407ad09b4","Path":"C:\\ProgramData\\docker\\windowsfilter\\12061f29088664dc41c0836c911ed7ced1f6d7ed38b1c932c25cd8ca85a3a88e"},{"ID":"6a230a46-a97c-5e30-ac4a-636e62cd9253","Path":"C:\\ProgramData\\docker\\windowsfilter\\8c0ce5a9990bc433c4d937aa148a4251ef55c1aa7caccf1b2025fd64b4feee97"},{"ID":"240d5705-d8fe-555b-a966-1fc304552b64","Path":"C:\\ProgramData\\docker\\windowsfilter\\2b334b769fe19d0edbe1ad8d1ae464c8d0103a7225b0c9e30fdad52e4b454b35"},{"ID":"5f5d8837-5f62-5a76-a706-9afb789e45e4","Path":"C:\\ProgramData\\docker\\windowsfilter\\3d1767755b0897aaae21e3fb7b71e2d880de22473f0071b0dca6301bb6110077"},{"ID":"978503cb-b816-5f66-ba41-ed154db333d5","Path":"C:\\ProgramData\\docker\\windowsfilter\\53d2e85a90d2b8743b0502013355df5c5e75448858f0c1f5b435281750653520"},{"ID":"d7d0d14e-b097-5104-a492-da3f9396bb06","Path":"C:\\ProgramData\\docker\\windowsfilter\\38830351b46e7a0598daf62d914eb2bf01e6eefde7ac560e8213f118d2bd648c"},{"ID":"90b1c608-be4c-55a1-a787-db3a97670149","Path":"C:\\ProgramData\\docker\\windowsfilter\\84b71fda82ea0eacae7b9382eae2a26f3c71bf118f5c80e7556496f21e754126"},{"ID":"700711b2-d578-5d7c-a17f-14165a5b3507","Path":"C:\\ProgramData\\docker\\windowsfilter\\08dd6f93c96c1ac6acd3d2e8b60697340c90efe651f805809dbe87b6bd26a853"},{"ID":"270de12a-461c-5b0c-8976-a48ae0de2063","Path":"C:\\ProgramData\\docker\\windowsfilter\\115de87074fadbc3c44fc33813257c566753843f8f4dd7656faa111620f71f11"},{"ID":"521250bb-4f30-5ac4-8fcd-b4cf45866627","Path":"C:\\ProgramData\\docker\\windowsfilter\\291e51f5f030d2a895740fae3f61e1333b7fae50a060788040c8d926d46dbe1c"},{"ID":"6dded7bf-8c1e-53bb-920e-631e78728316","Path":"C:\\ProgramData\\docker\\windowsfilter\\938e721c29d2f2d23a00bf83e5bc60d92f9534da409d0417f479bd5f06faa080"},{"ID":"90dec4e9-89fe-56ce-a3c2-2770e6ec362c","Path":"C:\\ProgramData\\docker\\windowsfilter\\d723ebeafd1791f80949f62cfc91a532cc5ed40acfec8e0f236afdbcd00bbff2"},{"ID":"94ac6066-b6f3-5038-9e1b-d5982fcefa00","Path":"C:\\ProgramData\\docker\\windowsfilter\\00d1bb6fc8abb630f921d3651b1222352510d5821779d8a53d994173a4ba1126"},{"ID":"037c6d16-5785-5bea-bab4-bc3f69362e0c","Path":"C:\\ProgramData\\docker\\windowsfilter\\c107cf79e8805e9ce6d81ec2a798bf4f1e3b9c60836a40025272374f719f2270"}],"ProcessorWeight":5000,"HostName":"ebos-webapi-test-2-78786968f4-xmvfw","MappedDirectories":[{"HostPath":"c:\\var\\lib\\kubelet\\pods\\c44f445c-272b-11ea-b9bc-ae0ece5532e1\\volumes\\kubernetes.io~secret\\default-token-n5tnc","ContainerPath":"c:\\var\\run\\secrets\\kubernetes.io\\serviceaccount","ReadOnly":true,"BandwidthMaximum":0,"IOPSMaximum":0,"CreateInUtilityVM":false}],"HvPartition":false,"NetworkSharedContainerName":"4c9bede623553673fde0da6e8dc92f9a55de1ff823a168a35623ad8128f83ecb"})
  Normal  Pulling  38m (x2 over 38m)  kubelet, aksnpwin000000  Pulling image "progress.azurecr.io/eboswebapi:release-2019-11-11_16-41"
  Normal  Pulled   38m (x2 over 38m)  kubelet, aksnpwin000000  Successfully pulled image "progress.azurecr.io/eboswebapi:release-2019-11-11_16-41"
  Normal  Created  38m (x2 over 38m)  kubelet, aksnpwin000000  Created container ebos-webapi-test-2
  Normal  Started  38m                kubelet, aksnpwin000000  Started container ebos-webapi-test-2
-- Lukas Rieger
azure-aks
kubernetes

0 Answers