INFO: task [TASK]:[PID] blocked for more than 120 seconds

2/26/2018

I have a Kubernetes server that recently went down for a few hours. The cause was not obvious from the error messages, so I'm sharing the answer below.

To give some context, I could boot the server and log in. After around 15 seconds, everything would hang and the error,

INFO: task [TASK]:[PID] blocked for more than 120 seconds

would pop up.


Before that, I was getting the following error message,

IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready

I fixed that by logging in and running,

sudo systemctl disable docker

When I tried sudo systemctl stop docker, it took too long and hung, so I decided to disable the service instead and then reboot.


After that the message,

INFO: task [TASK]:[PID] blocked for more than 120 seconds

kept popping up. It wasn't tied to a specific task (often I found it was task cron:...), so I realised that something was blocking my IO and that I needed to kill it before it killed my session.

-- Kerren
kubernetes
server
terminal
ubuntu

1 Answer

2/26/2018

The cause turned out to be my backup software, which was running and completely saturating my disk IO. Fortunately, I had iotop installed, and it showed read/write of 500M/s on my hard drives, which is really pushing it.
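
If iotop isn't installed, a rough sketch of the same idea is to rank processes by the byte counters in /proc/<pid>/io. The function name top_io and its two arguments are my own, not a standard tool. Note that these counters are cumulative since each process started, not a live rate like iotop shows, and reading other users' io files normally needs root.

```shell
# top_io [PROC_ROOT] [COUNT]
# Hypothetical helper: lists the COUNT processes with the most disk bytes
# read + written, as "<bytes> <pid> <name>", largest first.
# PROC_ROOT defaults to /proc; it is only a parameter so the function can be
# pointed at a different directory.
top_io() {
    proc_root="${1:-/proc}"
    count="${2:-5}"
    for io in "$proc_root"/[0-9]*/io; do
        # Skip processes we cannot read (or a glob that matched nothing).
        [ -r "$io" ] || continue
        pid_dir="${io%/io}"
        pid="${pid_dir##*/}"
        name="$(cat "$pid_dir/comm" 2>/dev/null || echo '?')"
        # read_bytes / write_bytes are the bytes that actually hit storage.
        awk -v pid="$pid" -v name="$name" '
            /^read_bytes:/  { r = $2 }
            /^write_bytes:/ { w = $2 }
            END { printf "%.0f %s %s\n", r + w, pid, name }
        ' "$io" 2>/dev/null
    done | sort -rn | head -n "$count"
}
```

From a root shell, `top_io /proc 5` prints the five heaviest processes. Running it twice a few seconds apart and comparing the numbers gives a feel for which process is currently busy.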

So I stopped my backup service and everything was sorted. Your situation is probably not identical to mine, but you can use the same approach.

  1. Log in and find which process is using up all of your disk IO.
  2. Kill the process or stop the service.
  3. Disable the service (if it is a service) so that it doesn't start on the next boot.
  4. Check whether there is a known bug, or get hold of support, and find a way to throttle the IO of that process/service so that it doesn't cause the same issue again.
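
Steps 2 to 4 can be sketched roughly as below, assuming a systemd machine. The helper names (run, tame_service, throttle_pid) and the unit name backup.service are made up for illustration. By default the sketch only prints the commands, so nothing gets stopped by accident. Whether ionice actually has an effect depends on the I/O scheduler in use, so treat the throttling part as something to try rather than a guaranteed fix.

```shell
# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 2 and 3: stop the service now and keep it from starting at boot.
tame_service() {
    svc="$1"
    run sudo systemctl stop "$svc"
    run sudo systemctl disable "$svc"
}

# Step 4 (one possible approach): move a running process into the "idle"
# I/O class so it only gets disk time when nothing else wants it.
throttle_pid() {
    run sudo ionice -c 3 -p "$1"
}
```

For example, `tame_service backup.service` prints the two systemctl commands, and `DRY_RUN=0 tame_service backup.service` actually runs them.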
-- Kerren
Source: StackOverflow