I have an AKS cluster with a mix of Windows and Linux nodes and an nginx-ingress.
This all worked great, but a few days ago all my windows pods have become unresponsive.
Everything is still green on the K8s dashboard, but they don't respond to HTTP requests and kubectl exec
fails.
All the linux pods still work.
I created a new deployment with the exact same image and other properties, and this new pod works, responds to HTTP and kubectl exec
works.
Q: How can I find out why my old pods died? How can I prevent this from occuring again in the future?
Note that this is a test cluster, so I have the luxury of being able to investigate, if this was prod I would have burned and recreated the cluster already.
Details:
https://aks-test.progress-cloud.com/eboswebApi/ is one of the old pods, https://aks-test.progress-cloud.com/eboswebApi2/ is the new pod.
When I look at the nginx log, I see a lot of connect() failed (111: Connection refused) while connecting to upstream
.
When I try kubectl exec -it <podname> --namespace <namespace> -- cmd
I get one of two behaviors:
Either the command immediatly returns without printing anything, or I get an error:
container 1dfffa08d834953c29acb8839ea2d4c6b78b7a530371d98c16b15132d49f5c52 encountered an error during CreateProcess: failure in a Windows system call: The remote procedure call failed and did not execute. (0x6bf) extra info: {"CommandLine":"cmd","WorkingDirectory":"C:\\inetpub\\wwwroot","Environment":{...},"EmulateConsole":true,"CreateStdInPipe":true,"CreateStdOutPipe":true,"ConsoleSize":[0,0]}
command terminated with exit code 126
kubectl describe pod
works on both.
The only difference I could find was that on the old pod, I don't get any events:
Events: <none>
whereas on the new pod I get a bunch of them for pulling the image etc:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 39m default-scheduler Successfully assigned ingress-basic/ebos-webapi-test-2-78786968f4-xmvfw to aksnpwin000000
Warning Failed 38m kubelet, aksnpwin000000 Error: failed to start container "ebos-webapi-test-2": Error response from daemon: hcsshim::CreateComputeSystem ebos-webapi-test-2: The binding handle is invalid.
(extra info: {"SystemType":"Container","Name":"ebos-webapi-test-2","Owner":"docker","VolumePath":"\\\\?\\Volume{dac026db-26ab-11ea-bb33-e3730ff9432d}","IgnoreFlushesDuringBoot":true,"LayerFolderPath":"C:\\ProgramData\\docker\\windowsfilter\\ebos-webapi-test-2","Layers":[{"ID":"8c160b6e-685a-58fc-8c4b-beb407ad09b4","Path":"C:\\ProgramData\\docker\\windowsfilter\\12061f29088664dc41c0836c911ed7ced1f6d7ed38b1c932c25cd8ca85a3a88e"},{"ID":"6a230a46-a97c-5e30-ac4a-636e62cd9253","Path":"C:\\ProgramData\\docker\\windowsfilter\\8c0ce5a9990bc433c4d937aa148a4251ef55c1aa7caccf1b2025fd64b4feee97"},{"ID":"240d5705-d8fe-555b-a966-1fc304552b64","Path":"C:\\ProgramData\\docker\\windowsfilter\\2b334b769fe19d0edbe1ad8d1ae464c8d0103a7225b0c9e30fdad52e4b454b35"},{"ID":"5f5d8837-5f62-5a76-a706-9afb789e45e4","Path":"C:\\ProgramData\\docker\\windowsfilter\\3d1767755b0897aaae21e3fb7b71e2d880de22473f0071b0dca6301bb6110077"},{"ID":"978503cb-b816-5f66-ba41-ed154db333d5","Path":"C:\\ProgramData\\docker\\windowsfilter\\53d2e85a90d2b8743b0502013355df5c5e75448858f0c1f5b435281750653520"},{"ID":"d7d0d14e-b097-5104-a492-da3f9396bb06","Path":"C:\\ProgramData\\docker\\windowsfilter\\38830351b46e7a0598daf62d914eb2bf01e6eefde7ac560e8213f118d2bd648c"},{"ID":"90b1c608-be4c-55a1-a787-db3a97670149","Path":"C:\\ProgramData\\docker\\windowsfilter\\84b71fda82ea0eacae7b9382eae2a26f3c71bf118f5c80e7556496f21e754126"},{"ID":"700711b2-d578-5d7c-a17f-14165a5b3507","Path":"C:\\ProgramData\\docker\\windowsfilter\\08dd6f93c96c1ac6acd3d2e8b60697340c90efe651f805809dbe87b6bd26a853"},{"ID":"270de12a-461c-5b0c-8976-a48ae0de2063","Path":"C:\\ProgramData\\docker\\windowsfilter\\115de87074fadbc3c44fc33813257c566753843f8f4dd7656faa111620f71f11"},{"ID":"521250bb-4f30-5ac4-8fcd-b4cf45866627","Path":"C:\\ProgramData\\docker\\windowsfilter\\291e51f5f030d2a895740fae3f61e1333b7fae50a060788040c8d926d46dbe1c"},{"ID":"6dded7bf-8c1e-53bb-920e-631e78728316","Path":"C:\\ProgramData\\docker\\windowsfilter\\938e721c29d2f2d23a00bf83e5bc60d92f9534da409d0417f479bd5f06faa080"},{"ID":"90dec4e9-89fe-56ce-a3c2-2770e6ec362c","Path":"C:\\ProgramData\\docker\\windowsfilter\\d723ebeafd1791f80949f62cfc91a532cc5ed40acfec8e0f236afdbcd00bbff2"},{"ID":"94ac6066-b6f3-5038-9e1b-d5982fcefa00","Path":"C:\\ProgramData\\docker\\windowsfilter\\00d1bb6fc8abb630f921d3651b1222352510d5821779d8a53d994173a4ba1126"},{"ID":"037c6d16-5785-5bea-bab4-bc3f69362e0c","Path":"C:\\ProgramData\\docker\\windowsfilter\\c107cf79e8805e9ce6d81ec2a798bf4f1e3b9c60836a40025272374f719f2270"}],"ProcessorWeight":5000,"HostName":"ebos-webapi-test-2-78786968f4-xmvfw","MappedDirectories":[{"HostPath":"c:\\var\\lib\\kubelet\\pods\\c44f445c-272b-11ea-b9bc-ae0ece5532e1\\volumes\\kubernetes.io~secret\\default-token-n5tnc","ContainerPath":"c:\\var\\run\\secrets\\kubernetes.io\\serviceaccount","ReadOnly":true,"BandwidthMaximum":0,"IOPSMaximum":0,"CreateInUtilityVM":false}],"HvPartition":false,"NetworkSharedContainerName":"4c9bede623553673fde0da6e8dc92f9a55de1ff823a168a35623ad8128f83ecb"})
Normal Pulling 38m (x2 over 38m) kubelet, aksnpwin000000 Pulling image "progress.azurecr.io/eboswebapi:release-2019-11-11_16-41"
Normal Pulled 38m (x2 over 38m) kubelet, aksnpwin000000 Successfully pulled image "progress.azurecr.io/eboswebapi:release-2019-11-11_16-41"
Normal Created 38m (x2 over 38m) kubelet, aksnpwin000000 Created container ebos-webapi-test-2
Normal Started 38m kubelet, aksnpwin000000 Started container ebos-webapi-test-2