Bobcares

How to troubleshoot DNS failures with Amazon EKS

Sep 4, 2021

Looking for how to troubleshoot DNS failures with Amazon EKS? We can help you with this!

As a part of our AWS Support Services, we often receive similar requests from our AWS customers.

Today, let’s see the steps our Support Techs follow to help our customers troubleshoot DNS failures with Amazon EKS.

 

DNS failures with Amazon EKS

 
For querying internal and external DNS records, pods running inside the EKS cluster use the CoreDNS service’s cluster IP as the default name server.

So if there is any issue with the CoreDNS pods, it can cause DNS resolution failures in applications.

To troubleshoot issues with CoreDNS pods, we must verify that all components of the kube-dns service are working properly, because the CoreDNS pods are exposed through the kube-dns service.

The resolution below assumes the CoreDNS service ClusterIP is 10.100.0.10.

1. Firstly, we need to run the following command to find the ClusterIP of the CoreDNS service:

kubectl get service kube-dns -n kube-system

2. Then run the following command to verify that the DNS endpoints point to the CoreDNS pods:

kubectl -n kube-system get endpoints kube-dns

For example, the above command returns output similar to the following:

NAME ENDPOINTS AGE
kube-dns 192.166.4.219:53,192.166.3.117:53,192.168.2.228:53 + 1 more... 100d

3. Also, verify that a security group or network ACL isn’t blocking the pods when they communicate with CoreDNS.
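The endpoint list from step 2 is a comma-separated string; for scripting, it can be split into individual CoreDNS pod IPs with standard shell tools. A minimal sketch, using the sample output shown above:

```shell
# Sample ENDPOINTS column from 'kubectl -n kube-system get endpoints kube-dns'
endpoints="192.166.4.219:53,192.166.3.117:53,192.168.2.228:53"

# Split on commas and strip the :53 port to get one pod IP per line
echo "$endpoints" | tr ',' '\n' | sed 's/:53$//'
```

These IPs are the CoreDNS pod addresses that the later steps query directly with nslookup.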
 

Check that the kube-proxy pod is working

 
To verify that the kube-proxy pod has access to the API server of the EKS cluster, we need to check its logs for timeout errors or 403 unauthorized errors.

Run the following command to view the kube-proxy logs:

kubectl logs -n kube-system --selector 'k8s-app=kube-proxy'

 

To troubleshoot the DNS issue, connect to the application pod

 

1. Firstly, run the following command to access a shell inside the running pod:

kubectl exec -it your-pod-name -- sh

If we get an error similar to the following, the application pod might not have a shell binary available:

OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused "exec: \"sh\": executable file not found in $PATH": unknown
command terminated with exit code 126

2. Now check the resolv.conf file to verify that the ClusterIP of the kube-dns service is in the pod’s /etc/resolv.conf:

cat /etc/resolv.conf

For example, the following resolv.conf shows a pod that’s configured to point at 10.100.0.10 for DNS requests:

nameserver 10.100.0.10
search default.svc.cluster.local svc.cluster.local cluster.local ec2.internal
options ndots:5

Here the nameserver IP should match the ClusterIP of your kube-dns service.
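This comparison can be scripted. The following sketch parses resolv.conf content (here the sample shown above, embedded as a string; in a real pod, read /etc/resolv.conf instead) and checks the nameserver against an assumed kube-dns ClusterIP of 10.100.0.10:

```shell
# resolv.conf content to check (sample from above)
resolv_conf="nameserver 10.100.0.10
search default.svc.cluster.local svc.cluster.local cluster.local ec2.internal
options ndots:5"

# ClusterIP of the kube-dns service (assumed; take the real value from step 1)
expected="10.100.0.10"

# Pull the nameserver IP out of the resolv.conf content
nameserver=$(printf '%s\n' "$resolv_conf" | awk '/^nameserver/ {print $2; exit}')

if [ "$nameserver" = "$expected" ]; then
  echo "OK: pod resolves via kube-dns ($nameserver)"
else
  echo "MISMATCH: pod uses $nameserver, expected $expected"
fi
```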

3. Now, using the nslookup command, we can verify that the pod can resolve an internal domain using the default ClusterIP:

nslookup kubernetes 10.100.0.10

The result of the nslookup command:

Server: 10.100.0.10
Address: 10.100.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.100.0.1

4. Similarly, we can verify that the pod can resolve an external domain using the default ClusterIP:

nslookup amazon.com 10.100.0.10

The result of the nslookup command:

Server: 10.100.0.10
Address: 10.100.0.10#53
Non-authoritative answer:
Name: amazon.com
Address: 176.32.98.167
Name: amazon.com
Address: 205.251.243.104
Name: amazon.com
Address: 176.32.103.204

5. Also, using the nslookup command, we can verify that the pod can resolve directly against the IP address of a CoreDNS pod:

nslookup kubernetes COREDNS_POD_IP

nslookup amazon.com COREDNS_POD_IP

Replace COREDNS_POD_IP with one of the endpoint IPs from the kube-dns endpoints listed earlier.
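To cover every CoreDNS pod, the two lookups can be repeated per endpoint IP. Here is a small sketch that only prints the commands to run from inside the application pod (the IPs are the sample endpoint values from earlier; substitute your own):

```shell
# CoreDNS pod IPs (sample values from the kube-dns endpoints output; replace with yours)
pod_ips="192.166.4.219 192.166.3.117 192.168.2.228"

# Print one internal and one external lookup per CoreDNS pod
for ip in $pod_ips; do
  echo "nslookup kubernetes $ip"
  echo "nslookup amazon.com $ip"
done
```

If one pod IP resolves and another doesn’t, the problem is likely with that specific CoreDNS pod or its node, not with the service as a whole.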
 

Logs from CoreDNS pods

 
Let’s see the steps to get detailed logs from the CoreDNS pods for debugging:

1. First, enable debug logging for the CoreDNS pods by adding the log plugin to the CoreDNS ConfigMap. Open the ConfigMap for editing with the following command:

kubectl -n kube-system edit configmap coredns

2. In the editor window, add the log string to the Corefile, as in the example below.

Also note that CoreDNS can take several minutes to reload its configuration. To apply the changes immediately, we can restart the CoreDNS pods one by one.

kind: ConfigMap
apiVersion: v1
data:
  Corefile: |
    .:53 {
        log    # Enabling CoreDNS Logging
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          upstream
          fallthrough in-addr.arpa ip6.arpa
        }
    ...
    ...

3. Now run the following command to check whether the CoreDNS logs show hits (or failures) from the application pod:

kubectl logs --follow -n kube-system --selector 'k8s-app=kube-dns'

 

Search and ndots combination

 
DNS uses the nameserver for name resolution; in a pod, this is usually the cluster IP of the kube-dns service. To complete a query name into a fully qualified domain name, DNS uses the search list. The ndots value is the minimum number of dots that must appear in a queried name before the resolver tries it as an absolute name; names with fewer dots are first tried with each search domain appended.

For example, with the default ndots value of 5, a domain name that’s not fully qualified (and has fewer than five dots) isn’t queried as-is first. Instead, for all external domains that don’t fall under the internal domain cluster.local, the search domains are appended before querying.

Example with the /etc/resolv.conf setting of the application pod:

nameserver 10.100.0.10
search default.svc.cluster.local svc.cluster.local cluster.local ec2.internal
options ndots:5

The resolver looks for at least five dots in the domain being queried before treating it as an absolute name.

The logs look similar to the following if the pod makes a DNS resolution call for amazon.com:

[INFO] 192.168.3.72:33238 - 36534 "A IN amazon.com.default.svc.cluster.local. udp 54 false 512" NXDOMAIN qr,aa,rd 147 0.000473434s
[INFO] 192.168.3.72:57098 - 43241 "A IN amazon.com.svc.cluster.local. udp 46 false 512" NXDOMAIN qr,aa,rd 139 0.000066171s
[INFO] 192.168.3.72:51937 - 15588 "A IN amazon.com.cluster.local. udp 42 false 512" NXDOMAIN qr,aa,rd 135 0.000137489s
[INFO] 192.168.3.72:52618 - 14916 "A IN amazon.com.ec2.internal. udp 41 false 512" NXDOMAIN qr,rd,ra 41 0.001248388s
[INFO] 192.168.3.72:51298 - 65181 "A IN amazon.com. udp 28 false 512" NOERROR qr,rd,ra 106 0.001711104s

NXDOMAIN indicates that the domain record wasn’t found.

NOERROR indicates that the domain record was found.

Here, each search domain is appended to amazon.com in turn before the final query for the absolute domain. That final name carries a trailing dot, which makes it a fully qualified domain name. This means that every external domain name query can generate four or five additional DNS calls.
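The expansion order in those log lines can be simulated in plain shell. This sketch performs no DNS queries; it only prints the names the resolver will try, given the search list and ndots value from the resolv.conf above:

```shell
name="amazon.com"
ndots=5
search="default.svc.cluster.local svc.cluster.local cluster.local ec2.internal"

# Count the dots in the queried name (amazon.com has 1)
dots=$(printf '%s' "$name" | awk -F. '{print NF-1}')

if [ "${name%.}" != "$name" ] || [ "$dots" -ge "$ndots" ]; then
  # Trailing dot, or at least ndots dots: query the name as-is, no search expansion
  echo "$name"
else
  # Otherwise: try each search domain first, then the absolute name last
  for domain in $search; do
    echo "$name.$domain"
  done
  echo "$name."
fi
```

The five lines it prints match the five queries in the CoreDNS log above; with ndots set to 1, or with a trailing dot on the name, only the final absolute query remains.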

To fix this issue, we should either change ndots to 1, or append a dot to the end of the domain being queried so that it’s treated as fully qualified:

nslookup example.com.

 

VPC resolver (AmazonProvidedDNS) limits

 
The VPC resolver can accept a maximum of 1024 packets per second per network interface.

If more than one CoreDNS pod runs on the same worker node, the chances of hitting this limit are higher.

To schedule the CoreDNS pods on separate instances, we need to add PodAntiAffinity rules like the following to the CoreDNS deployment:

spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: k8s-app
              operator: In
              values:
              - kube-dns
          topologyKey: kubernetes.io/hostname

 

Conclusion

 
To conclude, today we discussed the steps our Support Engineers follow to help customers troubleshoot DNS failures with Amazon EKS.
