DNS resolution problems in Kubernetes can be a headache to solve, because the root cause is often hard to find. It gets worse when you use private endpoints to access Azure resources such as storage accounts or databases: if DNS resolution stops working for some reason, those resources become unreachable.
But don’t panic, let me share some tips with you to solve this problem:
- Connect to a pod and run nslookup to check whether the name resolution problem happens only inside your Kubernetes cluster.
- Check that the service principal of your AKS cluster has not expired and is working correctly; an expired or broken service principal can cause Kubernetes services to stop working properly.
- If you use private endpoints for your AKS deployments, you depend on Azure private DNS resolution. Make sure you are using the right private DNS zone and that it is correctly linked to the relevant VNet through a VNet link. If you resolve names from on-premises, also check your DNS forwarding configuration.
- There is a known DNS issue on Linux (racing UDP packets) where UDP packets sent in parallel have race conditions that may cause timeouts. To work around it, you only have to create a new ConfigMap called coredns-custom and force name resolution over TCP; this solved a DNS resolution problem in one of our clusters after upgrading the AKS version. This should not be an issue after the 2021-01-04 release, which moves to transparent mode.
- Try upgrading the AKS node image: https://docs.microsoft.com/en-us/azure/aks/node-image-upgrade
- Upgrade your AKS cluster to a newer version.
- If none of that works, you might have to redeploy your Kubernetes cluster, but I would highly advise you to contact Microsoft Support first. They are very helpful.
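
To make the first and third tips a bit more concrete, here is a minimal sketch of a check you could run from inside a pod (assuming the pod image has Python 3 available, which is an assumption on my side). It resolves a hostname with the pod's own resolver and reports whether the answers are private or public addresses. For a private endpoint, a public address usually hints at a missing private DNS zone or VNet link. The function names and the example hostname are hypothetical and only there for illustration.

```python
import ipaddress
import socket
import sys


def resolve_ips(hostname):
    """Return the sorted, de-duplicated IP addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # info[4][0] is the address string; drop any IPv6 scope id such as "%eth0".
    return sorted({info[4][0].split("%")[0] for info in infos})


def classify(ips):
    """Split a list of IP address strings into (private, public) lists."""
    private = [ip for ip in ips if ipaddress.ip_address(ip).is_private]
    public = [ip for ip in ips if not ipaddress.ip_address(ip).is_private]
    return private, public


def check_private_endpoint(hostname):
    """Return a one-line, human-readable diagnosis for hostname."""
    try:
        ips = resolve_ips(hostname)
    except socket.gaierror as exc:
        return f"{hostname}: resolution failed ({exc})"
    private, public = classify(ips)
    if public:
        return (f"{hostname}: resolves to public address(es) {public}; "
                "the private DNS zone or VNet link may be missing")
    return f"{hostname}: resolves to private address(es) {private}"


if __name__ == "__main__":
    # Hypothetical usage: python check_dns.py mystorageaccount.blob.core.windows.net
    for name in sys.argv[1:]:
        print(check_private_endpoint(name))
```

If the same hostname resolves to a private address from a VM in the VNet but fails or returns a public address from the pod, the problem is most likely inside the cluster (CoreDNS) rather than in the private zone itself.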
So I hope these tips are useful to you, and good luck!