Are you finding it difficult to change the status of nodes to Ready in EKS? Well, worry not! Bobcares is here to help!
If you are looking for a way to change the status of your worker nodes back to Ready, you are in the right place. One of our customers had a similar issue and our team was able to get them back on track in a jiffy.
Let us today discuss how our Support Engineers perform this.
How to Change Status of Nodes to Ready in EKS?
In case your worker nodes are in Unknown or NotReady status, you will not be able to schedule pods. In order to schedule pods, the node has to be in Ready status. Let’s see what our Support Techs have to say about changing the status of the node from Unknown or NotReady status to Ready status.
In case the node is currently in the DiskPressure, PIDPressure, or MemoryPressure status, you have to manage the resources so that additional pods can be scheduled on the node. Additionally, if the node status is NetworkUnavailable, you will have to configure the network on the node. You can inspect these conditions as shown below.
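To see which condition is affecting a node, you can inspect the Conditions section of the node description. Here is a minimal sketch, where yourNodeName is a placeholder for the name of your node:
$ kubectl get nodes
$ kubectl describe node yourNodeName | grep -A 10 "Conditions:"
The Conditions section lists conditions such as MemoryPressure, DiskPressure, PIDPressure, and Ready along with their current values and reasons.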
How to Change Status of Nodes to Ready in EKS: Verify kube-proxy pods and aws-node pods
If the node status is NotReady, it is not available for scheduling pods.
In order to improve security posture, the managed node group does not attach the Container Network Interface (CNI) policy to the Amazon Resource Name (ARN) of the node role.
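If you need to grant these permissions to the node role yourself, you can attach the AWS managed CNI policy with the AWS CLI. This is a sketch, assuming yourNodeRoleName is the name of your node instance role; note that AWS recommends assigning these permissions to the aws-node service account through IAM roles for service accounts (IRSA) instead:
$ aws iam attach-role-policy --role-name yourNodeRoleName --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy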
1. Initially, run this command to check the status of the kube-proxy pods and the aws-node pods:
$ kubectl get pods -n kube-system -o wide
2. Then review the output. In case the node status is normal, the kube-proxy pods and the aws-node pods will be in Running status.
For instance:
$ kubectl get pods -n kube-system -o wide
NAME               READY   STATUS    RESTARTS   AGE     IP               NODE
aws-node-qvqr2     1/1     Running   0          4h31m   192.168.54.115   ip-192-168-54-115.ec2.internal
kube-proxy-292b4   1/1     Running   0          4h31m   192.168.54.115   ip-192-168-54-115.ec2.internal
Run the following command if the status of either one of the pods is not Running:
$ kubectl describe pod yourPodName -n kube-system
3. In order to obtain additional information from the kube-proxy pod and aws-node logs, run this command:
$ kubectl logs yourPodName -n kube-system
These logs, as well as the events from the describe output, show why the pods are not in Running status. For the node to reach Ready status, both the kube-proxy pod and the aws-node pod have to be in Running status on that specific node.
Note that the names of the pods may differ from those in the example above.
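To narrow the check down to a single node, you can filter the pods by node name. A quick sketch, where yourNodeName is a placeholder:
$ kubectl get pods -n kube-system --field-selector spec.nodeName=yourNodeName -o wide
This lists only the kube-system pods on that node, which makes it easy to confirm that both aws-node and kube-proxy are in Running status there.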
4. Run the following commands if the kube-proxy pods and the aws-node pods are not listed after following Step 1:
$ kubectl describe daemonset aws-node -n kube-system
$ kubectl describe daemonset kube-proxy -n kube-system
5. Review the output of Step 4’s commands to understand what prevents the pods from being started.
6. Additionally, check that the versions of kube-proxy and aws-node are compatible with the cluster version as per AWS guidelines. For instance, the following commands are helpful to check the pod versions:
$ kubectl describe daemonset aws-node --namespace kube-system | grep Image | cut -d "/" -f 2
$ kubectl get daemonset kube-proxy --namespace kube-system -o=jsonpath='{$.spec.template.spec.containers[:1].image}'
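To compare these versions against the AWS guidelines, you will also need the cluster version. A minimal sketch using the AWS CLI, where yourClusterName is a placeholder:
$ aws eks describe-cluster --name yourClusterName --query "cluster.version" --output text
You can then look up the recommended kube-proxy and aws-node versions for that cluster version in the Amazon EKS documentation.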
The node may also be in Unknown status. This means that the kubelet on the node is unable to communicate the correct node status to the control plane.
Change Status of Nodes to Ready in EKS: Troubleshoot Nodes in Unknown Status
Complete the following steps to troubleshoot nodes that are in Unknown status:
- Verify the network configuration between the control plane and the nodes
- Check the kubelet status
- Verify that the Amazon EC2 API endpoint is reachable
Verify the network configuration between the control plane and the nodes
1. Make sure that no network access control list (ACL) rules on the subnet block traffic between the worker nodes and the Amazon EKS control plane.
2. Check whether the security groups for the nodes and the control plane are in compliance with the minimum inbound and outbound requirements.
3. This is an optional step. Check if the nodes are configured to use a proxy. Make sure that traffic is passing through to the API server endpoints.
4. To check if the node has access to the API server, run this command from within the worker node:
$ nc -vz 9FCF4EA77D81408ED82517B9BB7E60D52.ylr.eu-north-1.eks.amazonaws.com 443
Connection to 9FCF4EA77D81408ED82517B9BB7E60D52.ylr.eu-north-1.eks.amazonaws.com 443 port [tcp/https] succeeded!
Remember to use your API server endpoint while running the command.
5. Ensure that the route tables are accurately configured to allow communication with the API server endpoint via a NAT gateway or an internet gateway. In case the cluster uses PrivateOnly networking, check if the VPC endpoints are configured accurately, as in the sketch below.
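For a PrivateOnly cluster, one way to confirm the endpoints is to list them with the AWS CLI. This is a sketch, where vpc-xxxxxxxx is a placeholder for the ID of the cluster VPC:
$ aws ec2 describe-vpc-endpoints --filters "Name=vpc-id,Values=vpc-xxxxxxxx" --query "VpcEndpoints[].ServiceName"
The output should include the services the cluster depends on, such as Amazon EC2, Amazon ECR, and Amazon S3.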
Verify the kubelet status
- Connect to the affected worker node via SSH.
- Run the following command to check the kubelet logs:
$ journalctl -u kubelet > kubelet.log
The kubelet.log file contains information on the different kubelet operations, which can help you identify the root cause behind the node status issue.
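For instance, to surface only recent errors instead of reading the whole log, you can filter the output. A quick sketch:
$ journalctl -u kubelet --since "1 hour ago" --no-pager | grep -i -E "error|fail"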
If the kubelet logs do not have the required information, run the following command to verify the kubelet status on the worker node:
$ sudo systemctl status kubelet
kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-eksclt.al2.conf
   Active: inactive (dead) since Thu 2020-12-04 07:58:43 UTC; 40s ago
Run the following command to restart the kubelet if it is not in Running status:
$ sudo systemctl restart kubelet
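After the restart, confirm that the kubelet is active and, optionally, enable it so that it starts automatically on boot:
$ sudo systemctl status kubelet
$ sudo systemctl enable kubelet
Once the kubelet is running again, the node should report its status back to the control plane and return to Ready shortly.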
Verify that the Amazon EC2 API endpoint is reachable
- Connect to a worker node via SSH.
- Run the following command to verify if the Amazon EC2 API endpoint for that specific AWS Region is reachable:
$ nc -vz ec2.<region>.amazonaws.com 443
Connection to ec2.us-east-1.amazonaws.com 443 port [tcp/https] succeeded!
Verify the Worker Node Instance Profile & the ConfigMap
- Verify that the worker node instance profile has all of the recommended policies (see the sketch after the ConfigMap example below).
- Check that the worker node instance role is in the aws-auth ConfigMap with this command:
$ kubectl get cm aws-auth -n kube-system -o yaml
There will be an entry for the worker node instance IAM (AWS Identity and Access Management) role in the ConfigMap. For instance:
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: <ARN of instance role (not instance profile)>
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
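To verify the first point, you can list the managed policies attached to the node instance role. A sketch using the AWS CLI, where yourNodeRoleName is a placeholder:
$ aws iam list-attached-role-policies --role-name yourNodeRoleName
Similarly, if the role is missing from the ConfigMap, you can add a mapRoles entry like the one above by editing the ConfigMap directly:
$ kubectl edit configmap aws-auth -n kube-system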
[Need help with more AWS queries? Bobcares is here to assist you]
Conclusion
These suggestions from the Support Techs at Bobcares will help you change the status of nodes to Ready in EKS in no time.