I have a simple 3-node cluster created using AKS. Everything has been going fine for 3 months. However, I'm starting to have some disk space usage issues that seem related to the Os disks attached to each nodes.
I have no error in kubectl describe node and all disk-related checks are fine. However, when I try to run kubectl logs on some pods, I sometimes get "no space left on device".
How can one manage storage used in those disks? I can't seem to find a way to SSH into those nodes as it seems to only be manageable via Azure CLI / web interface. Is there also a way to clean what takes up this space (I assume unused docker images would take place, but I was under the impression that those would get cleaned automatically...)
Generally, the AKS nodes just run the pods or other resources for you, the data is stored in other space just like remote storage server. In Azure, it means managed disks and Azure file Share. You can also store the growing data in the nodes, but you need to configure big storage for each node and I don't think it's a good way.
To SSH into the AKS nodes, there are ways. One is that set the NAT rule manually for the node which you want to SSH into in the load balancer. Another is that create a pod as the jump box and the steps here.
The last point is that the AKS will delete the unused images regularly and automatically. It's not recommended to delete the unused images manually.
Things you can do to fix this:
I'd probably go with option 1, else this problem would haunt you forever :(