I run a Kubernetes server and recently had an outage that took it down for a few hours. The cause was deceptive, so I'm sharing what I found below.
To give some context: I could boot the server and log in, but after around 15 seconds everything would hang and the error
INFO: task [TASK]:[PID] blocked for more than 120 seconds
would pop up.
Before that, I was getting the following error message,
IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
To fix that, I logged in and ran
sudo systemctl disable docker
and then rebooted. I disabled the service instead of stopping it because
sudo systemctl stop docker
took too long and would hang.
After the reboot, the message
INFO: task [TASK]:[PID] blocked for more than 120 seconds
kept popping up. It wasn't tied to one specific task (often I found it was task cron:...), so I realised that something was blocking my IO and that I needed to kill it before it killed my session.
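For background, the "blocked for more than 120 seconds" message comes from the kernel's hung-task detector, which reports tasks stuck in uninterruptible sleep (state D in ps), typically while waiting on IO. Below is a minimal sketch for listing such tasks, assuming a procps-style ps. The function names filter_blocked and list_blocked are made up for illustration and are not part of any tool.

```shell
# Keep only lines whose first field (the process state) starts with "D",
# i.e. tasks in uninterruptible sleep. Reads "STAT PID COMMAND" lines on stdin.
# (Hypothetical helper name, for illustration only.)
filter_blocked() {
    awk '$1 ~ /^D/'
}

# List currently blocked tasks. The empty "=" headers suppress the header
# line; separate -o options are used because that form is the portable one.
list_blocked() {
    ps -e -o stat= -o pid= -o comm= | filter_blocked
}
```

Running list_blocked a few times in a row shows whether the same tasks stay blocked. If many unrelated tasks appear, as they did for me, the disk itself is the likely bottleneck.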
The cause turned out to be my backup software, which was running and completely saturating my disk IO. Fortunately, I had installed iotop, which showed reads and writes of around 500M/s on my hard drives, which is really pushing it.
So I stopped my backup service and everything was sorted. Your situation is probably not identical to mine, but you can use the same approach.
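Since the session may hang within seconds of logging in, it can help to have the iotop check ready as a single non-interactive command. This is only a sketch of the approach, assuming iotop is installed and you have sudo rights. The wrapper name sample_io is hypothetical, and the default of three samples is an arbitrary choice.

```shell
# Take a few non-interactive iotop samples and print them to the terminal.
#   -o  show only tasks that are actually doing IO
#   -b  batch mode (plain text output, no interactive screen)
#   -P  show processes rather than individual threads
#   -n  number of samples to take (first argument, default 3)
#   -d  delay between samples in seconds
# (sample_io is a made-up name for illustration.)
sample_io() {
    sudo iotop -o -b -P -n "${1:-3}" -d 1
}
```

Calling sample_io prints the processes with the highest disk read and write rates. Whatever sits at the top for every sample is the candidate to stop, which in my case was the backup service.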