Wondering how to resolve eksctl issues with Amazon EKS clusters? We can help you.
Here, at Bobcares, we assist our customers with several AWS queries as part of our AWS Support Services.
Today, let us see steps followed by our Support Techs to resolve eksctl issues.
How to resolve eksctl issues with Amazon EKS clusters?
Specify kubelet bootstrap options
By default, eksctl creates a bootstrap script and adds it to the launch template that the worker nodes run during the bootstrap process.
To specify your own kubelet bootstrap options, use the overrideBootstrapCommand specification to override the eksctl bootstrap script.
Use the overrideBootstrapCommand for managed and self-managed node groups.
Config file specification:
managedNodeGroups:
  - name: custom-ng
    ami: ami-0e124de4755b2734d
    securityGroups:
      attachIDs: ["sg-1234"]
    maxPodsPerNode: 80
    ssh:
      allow: true
    volumeSize: 100
    volumeName: /dev/xvda
    volumeEncrypted: true
    disableIMDSv1: true
    overrideBootstrapCommand: |
      #!/bin/bash
      /etc/eks/bootstrap.sh managed-cluster --kubelet-extra-args '--node-labels=eks.amazonaws.com/nodegroup=custom-ng,eks.amazonaws.com/nodegroup-image=ami-0e124de4755b2734d'
Note: You can use overrideBootstrapCommand only when using a custom AMI.
If you don’t specify an AMI ID, then cluster creation fails.
A custom AMI ID wasn’t specified
If you don’t specify a custom AMI ID when you create managed node groups, then EKS uses an Amazon EKS optimized AMI and a bootstrap script by default.
To use an Amazon EKS optimized AMI and also have custom user data to specify bootstrap parameters, you can specify the AMI ID in your managed node group configuration.
To get the latest AMI ID for the latest Amazon EKS optimized AMI, run the following command:
aws ssm get-parameter --name /aws/service/eks/optimized-ami/1.21/amazon-linux-2/recommended/image_id --region Region --query "Parameter.Value" --output text
Note: Replace Region with your AWS Region.
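The SSM parameter name follows a fixed pattern, so you can build it for whichever Kubernetes version your cluster runs. A minimal sketch; the version value here is only an example:

```shell
# Build the SSM parameter path used in the aws ssm get-parameter call above.
# K8S_VERSION is an example value; substitute your cluster's version.
K8S_VERSION="1.21"
PARAM="/aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2/recommended/image_id"
echo "$PARAM"
```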
Resolve operation timeout issues
You’re creating a node and receive the following error:
waiting for at least 1 node(s) to become ready in "nodegroup"
When you create an EKS node group with eksctl, the eksctl CLI connects to the API server to continuously check for the Kubernetes node status.
The CLI waits for the nodes to reach the Ready state and times out if they don't. Nodes typically fail to reach the Ready state for one of the following reasons:
1. The kubelet can't communicate or authenticate with the EKS API server endpoint during the bootstrapping process.
2. The aws-node and kube-proxy pods aren't in Running state.
3. The Amazon Elastic Compute Cloud (Amazon EC2) worker node user data wasn't successfully run.
The kubelet can’t communicate with the EKS API server endpoint
If the kubelet can’t communicate with the EKS API server endpoint during the bootstrapping process, then get the EKS API server endpoint.
Run the following command on your worker node:
curl -k https://123456DC0A12EC12DE0C12BC312FCC1A.yl4.us-east-1.eks.amazonaws.com
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
}
The preceding command should return the HTTP 403 status code.
If the command times out, you might have a network connectivity issue between the EKS API server and worker nodes.
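You can script the distinction between a healthy anonymous 403 and a timeout. A minimal sketch; the response body is hard-coded to mirror the 403 output shown above so the snippet runs without a cluster, so substitute a live curl call against your own endpoint:

```shell
# RESPONSE stands in for the body returned by:
#   curl -sk https://<your-eks-api-endpoint>/
RESPONSE='{"kind":"Status","status":"Failure","reason":"Forbidden","code": 403}'
# An anonymous 403 means the endpoint is reachable; a curl timeout instead
# points to a network path problem between the node and the API server.
if printf '%s' "$RESPONSE" | grep -q '"code": *403'; then
  echo "API endpoint reachable (expected anonymous 403)"
else
  echo "unexpected response; check the network path to the API server"
fi
```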
To resolve the connectivity issue, complete one of the following steps that relates to your use case:
1. If the worker nodes are in a private subnet, then check that the EKS API server endpoint is in Private or Public and Private access mode.
2. If the EKS API server endpoint is set to Private, then you must apply certain rules for the private hosted zone to route traffic to the API server.
The Amazon Virtual Private Cloud (Amazon VPC) attributes enableDnsHostnames and enableDnsSupport must be set to True.
Also, the DHCP options set for the Amazon VPC must include AmazonProvidedDNS in its domain name servers list.
3. If you created the node group in public subnets, then make sure that the subnets' IPv4 public addressing attribute is set to True.
If you don’t set the attribute to True, then the worker nodes aren’t assigned a public IP address and can’t access the internet.
Check if the Amazon EKS cluster security group allows ingress requests to port 443 from the worker node security group.
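One way to verify this is to export the cluster security group's ingress rules with the AWS CLI and check for port 443. A hedged sketch; the rules JSON is hard-coded here so it runs anywhere, and the security group ID in the comment is a placeholder:

```shell
# RULES stands in for the output of (placeholder security group ID):
#   aws ec2 describe-security-groups --group-ids sg-0123456789abcdef0 \
#     --query "SecurityGroups[0].IpPermissions" --output json
RULES='[{"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443}]'
if printf '%s' "$RULES" | grep -q '"ToPort": *443'; then
  echo "ingress on port 443 is allowed"
else
  echo "no port 443 ingress rule found"
fi
```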
The kubelet can’t authenticate with the EKS API server endpoint
If the kubelet can’t authenticate with the EKS API server endpoint during the bootstrapping process, then complete the following steps.
1. Run the following command to verify that the worker node has access to the STS endpoint:
telnet sts.region.amazonaws.com 443
Note: Replace region with your AWS Region.
2. Then, make sure that the worker node’s IAM role was added to the aws-auth ConfigMap.
Note: For Microsoft Windows node groups, you must add an additional eks:kube-proxy-windows RBAC group to the mapRoles section for the node group IAM role.
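To confirm the mapping, dump the ConfigMap and search for the node role's ARN. A sketch; the mapRoles content and the ARN below are placeholders that illustrate the expected shape:

```shell
# MAPROLES stands in for the output of:
#   kubectl -n kube-system get configmap aws-auth -o yaml
# The ARN is a placeholder; use your node group's IAM role ARN.
MAPROLES='- rolearn: arn:aws:iam::111122223333:role/eks-node-instance-role
  username: system:node:{{EC2PrivateDNSName}}
  groups:
    - system:bootstrappers
    - system:nodes'
NODE_ROLE="arn:aws:iam::111122223333:role/eks-node-instance-role"
if printf '%s\n' "$MAPROLES" | grep -qF "$NODE_ROLE"; then
  echo "node role is mapped in aws-auth"
else
  echo "node role missing; add it to mapRoles"
fi
```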
The aws-node and kube-proxy pods aren’t in Running state
To check whether the aws-node and kube-proxy pods are in Running state, run the following command:
kubectl get pods -n kube-system
If the aws-node pod is in a failed state, then check the connection between the worker node and the Amazon EC2 endpoint:
ec2.region.amazonaws.com
Note: Replace region with your AWS Region.
Check that the AWS managed policies AmazonEKSWorkerNodePolicy and AmazonEC2ContainerRegistryReadOnly are attached to the node group’s IAM role.
If the nodes are in a private subnet, then you must configure Amazon ECR VPC endpoints to allow image pulls from Amazon Elastic Container Registry (Amazon ECR).
If you use IAM roles for service accounts (IRSA) for the Amazon VPC CNI, then attach the AmazonEKS_CNI_Policy AWS managed policy to the IAM role that the aws-node pods use. If you don't use IRSA, then attach the policy to the node group's IAM role instead.
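The pod status check above can be scripted to flag anything not in Running state. A sketch where PODS stands in for real kubectl output:

```shell
# PODS stands in for the output of: kubectl get pods -n kube-system
PODS='NAME                 READY   STATUS    RESTARTS   AGE
aws-node-abcde       1/1     Running   0          5m
kube-proxy-fghij     1/1     Running   0          5m'
# Print any pod whose STATUS column is not Running (skip the header row).
NOT_RUNNING=$(printf '%s\n' "$PODS" | awk 'NR>1 && $3 != "Running"')
if [ -z "$NOT_RUNNING" ]; then
  echo "aws-node and kube-proxy are Running"
else
  printf 'Pods not Running:\n%s\n' "$NOT_RUNNING"
fi
```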
The EC2 worker node user data wasn’t successfully run
To check whether any errors occurred when the user data was run, review the cloud-init logs at /var/log/cloud-init.log and /var/log/cloud-init-output.log.
To gather more information, run the EKS Logs Collector script on the worker nodes.
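A quick first pass over those logs is to grep for failure messages. A sketch; LOG_CONTENT stands in for one of the log files so the snippet runs anywhere:

```shell
# LOG_CONTENT stands in for /var/log/cloud-init-output.log; on a real node run:
#   grep -i fail /var/log/cloud-init.log /var/log/cloud-init-output.log
LOG_CONTENT='Cloud-init v. 19.3 running modules
util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/part-001'
printf '%s\n' "$LOG_CONTENT" | grep -i 'fail' || echo "no failures logged"
```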
[Need help with the procedure? We’d be glad to assist you]
Conclusion
In short, we saw how our Support Techs resolve eksctl issues with Amazon EKS clusters.