
ECS task stuck in the PENDING state – Let us Troubleshoot

Aug 19, 2021

An ECS task stuck in the PENDING state can be the result of an unresponsive Docker daemon, a large Docker image, or a problem with the Amazon ECS container agent.

Here, at Bobcares, we assist our customers with several AWS queries as part of our AWS Support Services.

Today, let us see how we can resolve this issue.

 

ECS task stuck in the PENDING state

An ECS task can be stuck in the PENDING state for several reasons, including:

  • An unresponsive Docker daemon
  • A large Docker image
  • The Amazon ECS container agent lost connectivity with the Amazon ECS service in the middle of a task launch
  • The Amazon ECS container agent takes a long time to stop an existing task

To avoid errors when running AWS CLI commands, we make sure the most recent AWS CLI version is installed.

 

How to resolve this?

Moving ahead, let us see a few troubleshooting steps our Support Techs employ to find why the task is stuck in the PENDING state.

The Docker daemon is unresponsive

  • For CPU issues:

1. We use Amazon CloudWatch metrics to check whether the container instance has reached its maximum CPU utilization.

2. If necessary, we increase the size of the container instance.

  • For memory issues:

1. We run the free command to see the available memory for the system.

2. Then we increase the size of the container instance as needed.

  • For I/O issues:

1. Initially, we run the iotop command.

2. This will give an idea of the services that use the most IOPS. Then, we distribute these tasks to distinct container instances using task placement constraints and strategies.

-or-

We use CloudWatch to create an alarm on the Amazon EBS BurstBalance metric. Then, we use an AWS Lambda function or custom logic to rebalance tasks.
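As a rough illustration of the memory check described above, the sketch below reads the output of free -m and prints the available memory as a percentage of the total. It assumes a version of free whose "Mem:" row ends with an "available" column (older releases may not have it). The name mem_available_pct is a hypothetical helper, not part of any AWS tooling.

```shell
# mem_available_pct: hypothetical helper that reads `free -m` output on stdin
# and prints available memory as a whole-number percentage of total memory.
# Assumes the "Mem:" row is laid out as:
#   Mem: total used free shared buff/cache available
mem_available_pct() {
  awk '/^Mem:/ { printf "%d\n", ($7 * 100) / $2 }'
}
```

On a container instance, it could be used as follows:

$ free -m | mem_available_pct

A consistently low percentage is one sign that the container instance may need to be resized.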

The Docker image is large

Larger images increase the amount of time the task is in the PENDING state.

To speed up the transition time, we tune the ECS_IMAGE_PULL_BEHAVIOR parameter to take advantage of image caching.
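The agent reads this parameter from /etc/ecs/ecs.config when it starts. Below is a minimal sketch of setting it from a script, assuming the file uses plain KEY=VALUE lines. The helper name set_ecs_option is hypothetical, and prefer-cached is only one of the documented values (default, always, once, prefer-cached), so the right choice depends on how often the image tags change.

```shell
# set_ecs_option: hypothetical helper that sets KEY=VALUE in an ECS agent
# config file, replacing an existing KEY line or appending one if none exists.
# Usage: set_ecs_option KEY VALUE [FILE]   (FILE defaults to /etc/ecs/ecs.config)
# Assumes VALUE contains no "|" or "&" characters, which sed would misread.
set_ecs_option() {
  key=$1 value=$2 file=${3:-/etc/ecs/ecs.config}
  if grep -q "^${key}=" "$file" 2>/dev/null; then
    # Rewrite the existing line via a temporary copy (portable, no sed -i).
    tmp=$(mktemp) &&
      sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp" &&
      cat "$tmp" > "$file" &&
      rm -f "$tmp"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

Run as root on the container instance, a call such as set_ecs_option ECS_IMAGE_PULL_BEHAVIOR prefer-cached would update the file. The container agent then needs a restart to pick up the change.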

The Amazon ECS container agent lost connectivity with the Amazon ECS service in the middle of a launch

1. To verify the status and connectivity of the Amazon ECS container agent, we run the following commands.

For Amazon Linux 1:

$ sudo status ecs
$ sudo docker ps -f name=ecs-agent

For Amazon Linux 2:

$ sudo systemctl status ecs
$ sudo docker ps -f name=ecs-agent

2. Then we view metadata for the ECS container instance via:

$ curl http://localhost:51678/v1/metadata
{
"Cluster": "CLUSTER_ID",
"ContainerInstanceArn": "arn:aws:ecs:REGION:ACCOUNT_ID:container-instance/CONTAINER_INSTANCE_ID",
"Version": "Amazon ECS Agent - AGENT "
}

3. In addition, to view information on running tasks, we run:

$ curl http://localhost:51678/v1/tasks
{
"Tasks": [
{
"Arn": "arn:aws:ecs:REGION:ACCOUNT_ID:task/TASK_ID",
"DesiredStatus": "RUNNING",
"KnownStatus": "RUNNING",
... ...
}
]
}

4. If the issue relates to a disconnected agent, we restart the container agent with the commands for the instance's operating system:

For Amazon Linux 1:

$ sudo stop ecs
$ sudo start ecs
ecs start/running, process xxxx

For Amazon Linux 2:

$ sudo systemctl stop ecs
$ sudo systemctl start ecs

5. To determine agent connectivity, we check the following logs for keywords such as “error,” “warn,” or “agent transition state”:

  • Amazon ECS container agent log at /var/log/ecs/ecs-agent.log.yyyy-mm-dd-hh.
  • Amazon ECS init log at /var/log/ecs/ecs-init.log.
  • Finally, the Docker logs at /var/log/docker.
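The keyword search in step 5 can be sketched as a small shell function. The name scan_agent_logs is hypothetical, and the keyword list is simply the one given above, so it may need extending for other failure modes.

```shell
# scan_agent_logs: hypothetical helper that prints every line mentioning
# "error", "warn", or "agent transition state" from the log files given,
# prefixed with the file name and line number.
scan_agent_logs() {
  # -i: case-insensitive, -n: show line numbers, -H: always show the file name,
  # -E: extended regex so the keywords can be joined with "|".
  grep -inHE 'error|warn|agent transition state' "$@"
}
```

For example, to scan the agent and init logs (reading them may require root):

$ scan_agent_logs /var/log/ecs/ecs-agent.log.* /var/log/ecs/ecs-init.log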

The Amazon ECS container agent takes a long time to stop an existing task

The Amazon ECS container agent won’t start new tasks while it still has older tasks to stop.

Generally, there are two parameters to control container stop and start timeout at the container instance level.

1. In /etc/ecs/ecs.config, we can set the value of the ECS_CONTAINER_STOP_TIMEOUT parameter to the amount of time to pass before the containers are forcibly killed if they don’t exit normally on their own.

2. In /etc/ecs/ecs.config, we can set the value of the ECS_CONTAINER_START_TIMEOUT parameter to the amount of time to pass before the Amazon ECS container agent stops trying to start the container.
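Before changing either timeout, it can help to see what is currently configured. The sketch below prints both parameters from the config file, assuming plain KEY=VALUE lines with duration values such as 30s or 2m. The helper name show_ecs_timeouts is hypothetical.

```shell
# show_ecs_timeouts: hypothetical helper that prints the stop and start
# timeouts found in an ECS agent config file, or "(not set)" when a key is
# absent, in which case the agent uses its built-in default.
# Usage: show_ecs_timeouts [FILE]   (FILE defaults to /etc/ecs/ecs.config)
show_ecs_timeouts() {
  file=${1:-/etc/ecs/ecs.config}
  for key in ECS_CONTAINER_STOP_TIMEOUT ECS_CONTAINER_START_TIMEOUT; do
    # Print the value part of each matching line; if the key appears more
    # than once, only the last occurrence is shown.
    value=$(sed -n "s/^${key}=//p" "$file" | tail -n 1)
    [ -n "$value" ] || value='(not set)'
    printf '%s=%s\n' "$key" "$value"
  done
}
```

Usage on a container instance:

$ show_ecs_timeouts

After editing either value, the container agent has to be restarted for the new timeout to take effect.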


 

Conclusion

In short, we saw how our Support Techs troubleshoot an ECS task that is stuck in the PENDING state.
