How do you reconnect VHDs if Azure's Storage Account drops off?

4/7/2017

I was asked to post this here by Azure's Twitter Support (instead of on ServerFault.com).

Our Kubernetes environment on Azure Container Service had been working wonderfully for over a week without needing changes, with 24 VHDs attached.

Then we suddenly received alerts that all services had stopped working. All pods using Persistent Volume Claims were stuck in ContainerCreating. A quick kubectl describe pod podname shows:

Unable to mount volumes for pod "***-1370023040-st581_default(9b050936-1baa-11e7-9b77-000d3ab513dc)": timeout expired waiting for volumes to attach/mount for pod "default"/"***-1370023040-st581". list of unattached/unmounted volumes=[***-persistent-storage]

and

Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"***-1370023040-st581". list of unattached/unmounted volumes=[***-persistent-storage]

on all of the pods.
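For anyone hitting the same symptoms, something like the following should surface the affected pods and their claims (pod and namespace names below are placeholders, not the actual ones from our cluster):

```shell
# Find pods stuck in ContainerCreating across all namespaces
kubectl get pods --all-namespaces | grep ContainerCreating

# Inspect a stuck pod's events to see the volume attach/mount timeout
kubectl describe pod <pod-name> --namespace default

# Confirm the PersistentVolumeClaims themselves are still Bound
kubectl get pvc --namespace default
```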

In the Azure Portal I can see that the agent has only its OS VHD attached as a disk. Manual attempts to re-add the data disks fail with:

Failed to update disks for the virtual machine 'k8s-agent-CD93CDEA-0'. Error: A disk named '***mgmt-dynamic-pvc-018bdc6e-161a-11e7-8ca8-000d3ab513dc.vhd' already uses the same VHD URL …https://***.blob.core.windows.net/vhds/***mgmt-dynamic-pvc-018bdc6e-161a-11e7-8ca8-000d3ab513dc.vhd ….

Restarting the agent and the master also doesn't clear the problem.
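What I tried from the Azure CLI looks roughly like this (this is a sketch assuming Azure CLI 2.0 and unmanaged blob-backed VHDs; resource group, storage account, and disk names are placeholders):

```shell
# List the data disks Azure believes are attached to the agent VM
az vm show --resource-group <resource-group> --name k8s-agent-CD93CDEA-0 \
  --query storageProfile.dataDisks

# Attempt to reattach an unmanaged VHD by its blob URL
# (this is the step that fails with the "already uses the same VHD URL" error)
az vm unmanaged-disk attach --resource-group <resource-group> \
  --vm-name k8s-agent-CD93CDEA-0 --name <disk-name> \
  --vhd-uri https://<account>.blob.core.windows.net/vhds/<disk-name>.vhd
```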

We are using an F16S for the agent, which supports up to 32 data disks.

How do you reattach the VHDs to get going again?

-- Metalshark
azure-container-service
kubernetes

1 Answer

4/9/2017

It must have been a system outage in Azure, as the disks are coming back on their own (almost 48 hours of outage, during which we were still billed for the resources!).

It turns out you normally have to pay for Azure support, even for their own system outages.

The Twitter support team created a free ticket for us, and their telephone support confirmed this was an engineering issue on Azure's side.

-- Metalshark
Source: StackOverflow