Azure Kubernetes Service (AKS) currently does not support availability zones. What are the ways in which we can achieve resiliency from data center failures while using AKS (Master and Worker nodes)?
One thing I can think of is to run AKS clusters in two different Azure regions and put Traffic Manager in front of them. Any other recommendations, solutions, or reference architectures are appreciated.
Ark (https://github.com/heptio/ark) is an open-source tool that can help here: it backs up and restores Kubernetes cluster resources and persistent volumes. On the Azure side, you can combine it with Azure Traffic Manager, which routes traffic to the different regions.
Azure now supports AKS clusters distributed across availability zones. See https://docs.microsoft.com/en-us/azure/aks/availability-zones
A data center-wide outage and a region-wide outage are both low-probability disaster scenarios. So a solution that is resilient against data center failures but not against region-wide issues is probably not a great idea. That's why Azure didn't initially implement Availability Zones, instead using Availability Sets.
So yes, if you want your application to survive a disaster seamlessly, you need to fail over to a separate region, which means a separate cluster and replicated data.
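To make the two-region failover idea concrete, here is a minimal Python sketch of priority-based endpoint selection, roughly what a priority routing policy in front of two clusters does. It is not Traffic Manager's API. The `RegionalEndpoint` type, the `pick_endpoint` function, the endpoint names, and the example.com URLs are all hypothetical, and the health check is passed in as a plain function so the logic can be tried without any network access.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class RegionalEndpoint:
    """One AKS cluster's public entry point (all values are illustrative)."""
    name: str
    url: str
    priority: int  # lower number = preferred endpoint


def pick_endpoint(
    endpoints: Sequence[RegionalEndpoint],
    is_healthy: Callable[[RegionalEndpoint], bool],
) -> Optional[RegionalEndpoint]:
    """Return the healthy endpoint with the lowest priority number, or None."""
    for endpoint in sorted(endpoints, key=lambda e: e.priority):
        if is_healthy(endpoint):
            return endpoint
    return None


# Hypothetical endpoints: one cluster per region.
ENDPOINTS = [
    RegionalEndpoint("aks-primary", "https://primary.example.com", priority=1),
    RegionalEndpoint("aks-secondary", "https://secondary.example.com", priority=2),
]

if __name__ == "__main__":
    # Simulate a regional outage of the primary cluster.
    unavailable = {"aks-primary"}
    chosen = pick_endpoint(ENDPOINTS, lambda e: e.name not in unavailable)
    print(chosen.name if chosen else "no healthy endpoint")
```

In a real setup the health check would be an HTTP probe against each cluster's ingress, and the data behind the secondary cluster has to be replicated separately; routing alone does not cover that.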