Ephemeral Storage usage in AKS

5/22/2019

I have a simple 3-node cluster created using AKS. Everything has been going fine for 3 months. However, I'm starting to have some disk space usage issues that seem related to the OS disks attached to each node.

I see no errors in kubectl describe node, and all disk-related checks pass. However, when I run kubectl logs on some pods, I sometimes get "no space left on device".

How can one manage the storage used on those disks? I can't find a way to SSH into the nodes, as they seem to be manageable only via the Azure CLI / web interface. Is there also a way to clean up whatever is taking this space? (I assume unused Docker images take up space, but I was under the impression that those get cleaned up automatically...)

-- X. Math
azure-aks
kubernetes

2 Answers

5/23/2019

Generally, AKS nodes just run your pods and other resources; the data is stored elsewhere, much like on a remote storage server. In Azure, that means managed disks and Azure file shares. You can also store growing data on the nodes themselves, but then you need to configure large disks for each node, and I don't think that's a good approach.

There are ways to SSH into the AKS nodes. One is to manually set a NAT rule in the load balancer for the node you want to SSH into. Another is to create a pod to act as a jump box; the steps are here.

The last point is that AKS deletes unused images regularly and automatically. Deleting unused images manually is not recommended.
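
For context, this automatic cleanup is the kubelet's image garbage collection, which is triggered by disk-usage thresholds rather than by a timer alone. Below is a minimal Python sketch of that decision, assuming the upstream kubelet defaults of 85% (high threshold) and 80% (low threshold). The function names are made up for illustration, and the thresholds on a given AKS node may be configured differently.

```python
# Illustrative model of the kubelet's image garbage collection thresholds.
# Assumption: upstream kubelet defaults (imageGCHighThresholdPercent=85,
# imageGCLowThresholdPercent=80); an AKS node may use different values.

HIGH_THRESHOLD_PERCENT = 85
LOW_THRESHOLD_PERCENT = 80


def usage_percent(used_bytes: int, capacity_bytes: int) -> float:
    """Return disk usage as a percentage of capacity."""
    return 100.0 * used_bytes / capacity_bytes


def bytes_to_free(used_bytes: int, capacity_bytes: int) -> int:
    """Return how many bytes of unused images GC would try to reclaim.

    Nothing is reclaimed until usage reaches the high threshold; once it
    does, GC aims to bring usage back down to the low threshold.
    """
    if usage_percent(used_bytes, capacity_bytes) < HIGH_THRESHOLD_PERCENT:
        return 0
    target_used = capacity_bytes * LOW_THRESHOLD_PERCENT // 100
    return max(0, used_bytes - target_used)
```

With these assumed defaults, a disk at 84% usage triggers nothing, while a disk at 90% usage would have the kubelet try to remove enough unused images to get back to 80%. Only images not used by any container are eligible, so space taken by container logs or by images still in use is not reclaimed this way, which may be why a disk fills up even though cleanup is automatic.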

-- Charles Xu
Source: StackOverflow

5/23/2019

Things you can do to fix this:

  1. Create the AKS cluster with a bigger OS disk (I usually use 128 GB)
  2. Upgrade AKS to a newer version (this replaces all the existing VMs with new ones, so they won't have stale Docker images on them)
  3. Manually clean up space on the nodes
  4. Manually extend the OS disk on the nodes (this only works until you scale/upgrade the cluster)

I'd probably go with option 1, else this problem would haunt you forever :(
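
For option 1, the OS disk size is set when the cluster is created, via the `--node-osdisk-size` flag of `az aks create`. Here is a small Python sketch that builds that command; the resource group and cluster names are placeholders, and the helper function itself is hypothetical, not part of any Azure SDK.

```python
# Sketch: build the Azure CLI call that creates a cluster with a larger OS disk.
# Assumption: "my-resource-group" and "my-aks-cluster" are placeholder names.
import shlex
from typing import List


def aks_create_command(resource_group: str, cluster_name: str,
                       node_count: int = 3, os_disk_gb: int = 128) -> List[str]:
    """Return the argv for `az aks create` with an explicit OS disk size."""
    return [
        "az", "aks", "create",
        "--resource-group", resource_group,
        "--name", cluster_name,
        "--node-count", str(node_count),
        "--node-osdisk-size", str(os_disk_gb),  # OS disk size per node
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(aks_create_command("my-resource-group", "my-aks-cluster")))
```

The script only prints the command; pass the list to `subprocess.run` (or paste the printed line into a shell) once the names and sizes match your environment.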

-- 4c74356b41
Source: StackOverflow