Container Instances for Amazon ECS Disconnected? We can help you.
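Generally, brief changes in the agent connection status are normal, since the container agent disconnects and reconnects as part of its regular operation. However, if the container agent remains disconnected, then it can’t operate as part of the ECS cluster.
The agent is disconnected when the agentConnected field of the container instance is false.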
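As a minimal sketch, we can read the agentConnected field with the AWS CLI from any machine that has credentials allowing ecs:DescribeContainerInstances. The function names, the cluster name, and the instance identifier below are illustrative placeholders, not values from a real setup.

```shell
# Sketch: print the agentConnected field (true or false) for one container instance.
# Arguments: $1 = cluster name, $2 = container instance ID or ARN (both placeholders).
agent_connected() {
  aws ecs describe-container-instances \
    --cluster "$1" \
    --container-instances "$2" \
    --query 'containerInstances[0].agentConnected' \
    --output json
}

# Succeeds (exit status 0) only when the field is exactly "true".
is_agent_connected() {
  [ "$(agent_connected "$1" "$2")" = "true" ]
}

# Usage:
#   is_agent_connected exampleCluster <container-instance-id> && echo connected
```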
Here, at Bobcares, we assist our customers with several AWS queries as part of our AWS Support Services.
Today, let us see how we can fix this issue.
Container Instances for Amazon ECS Disconnected
This can occur due to the following:
- Networking issues prevent communication between the instance and Amazon ECS.
- The container agent doesn’t have the required AWS IAM permissions to communicate with Amazon ECS endpoints.
- Problems with the host or Docker service inside the container instance.
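To rule out the first cause quickly, a rough connectivity check from the instance can help. The sketch below assumes the standard regional endpoint naming (ecs.<region>.amazonaws.com); the function name and the 5-second timeout are our own illustrative choices. It only shows that an HTTPS path to the endpoint exists, not that the agent itself can authenticate.

```shell
# Sketch: test HTTPS reachability of the regional Amazon ECS endpoint.
# Argument: $1 = AWS Region, for example us-east-1.
check_ecs_endpoint() {
  # Any HTTP response, even an error status, shows that the network path works.
  # curl exits non-zero on DNS failure, refused connection, or timeout.
  if curl -sS -o /dev/null --max-time 5 "https://ecs.$1.amazonaws.com"; then
    echo "reachable"
  else
    echo "unreachable"
    return 1
  fi
}

# Usage:
#   check_ecs_endpoint us-east-1
```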
Resolution
Moving ahead, let us see how our Support Techs fix this issue.
These steps apply to Amazon ECS-optimized Amazon Linux 2 AMIs.
Verify that the Docker service is running on the container instance
1. To verify that the Docker service runs on the affected container instance, we run:
sudo systemctl status docker
docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2019-06-28 03:23:52 UTC; 1 day 12h ago
     Docs: https://docs.docker.com
  Process: 5519 ExecStartPre=/usr/libexec/docker/docker-setup-runtimes.sh (code=exited, status=0/SUCCESS)
  Process: 5509 ExecStartPre=/bin/mkdir -p /run/docker (code=exited, status=0/SUCCESS)
 Main PID: 5531 (dockerd)
    Tasks: 60
   Memory: 55.4M
   CGroup: /system.slice/docker.service
           ├─5531 /usr/bin/dockerd --default-ulimit nofile=1024:4096
           ├─5570 docker-containerd --config /var/run/docker/containerd/containerd.toml
           ├─5782 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/...
           ├─6006 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/...
           └─6284 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/...
If the Docker service is inactive, then we restart the Docker service:
sudo systemctl restart docker
2. Then, to start the container agent, we run:
sudo systemctl start ecs
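The two steps above can be combined into a small helper that restarts a unit only when systemd reports it as not active. This is a sketch under our own naming (ensure_active is hypothetical); it relies only on systemctl is-active and systemctl restart.

```shell
# Sketch: restart a systemd unit only if it is not currently active.
# Argument: $1 = unit name, for example docker or ecs.
ensure_active() {
  if systemctl is-active --quiet "$1"; then
    echo "$1 is already active"
  else
    echo "$1 is not active, restarting"
    sudo systemctl restart "$1"
  fi
}

# Usage (Docker first, because the container agent depends on it):
#   ensure_active docker && ensure_active ecs
```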
Verify that the container agent is running on the container instance
To verify that the container agent runs on the affected container instance, we run:
sudo systemctl status ecs
ecs.service - Amazon Elastic Container Service - container agent
   Loaded: loaded (/usr/lib/systemd/system/ecs.service; enabled; vendor preset: disabled)
   Active: active (running) since Sat 2019-06-29 15:45:57 UTC; 4min 5s ago
     Docs: https://aws.amazon.com/documentation/ecs/
  Process: 18896 ExecStopPost=/usr/libexec/amazon-ecs-init post-stop (code=exited, status=0/SUCCESS)
  Process: 18818 ExecStop=/usr/libexec/amazon-ecs-init stop (code=exited, status=0/SUCCESS)
  Process: 19422 ExecStartPre=/usr/libexec/amazon-ecs-init pre-start (code=exited, status=0/SUCCESS)
 Main PID: 19455 (amazon-ecs-init)
    Tasks: 7
   Memory: 2.7M
   CGroup: /system.slice/ecs.service
           └─19455 /usr/libexec/amazon-ecs-init start
If the output doesn’t show the service as active, then we restart it:
sudo systemctl restart ecs
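An active systemd unit does not by itself prove that the agent has registered with a cluster. As an additional, hedged check, the sketch below queries the agent introspection endpoint on the instance. It assumes the endpoint is reachable at its default address (localhost, port 51678); the function names are our own.

```shell
# Sketch: fetch the agent's introspection metadata (JSON) from the local instance.
agent_metadata() {
  curl -s --max-time 5 http://localhost:51678/v1/metadata
}

# Succeeds only if the metadata contains a container instance ARN,
# which suggests the agent has registered with a cluster.
agent_registered() {
  agent_metadata | grep -q '"ContainerInstanceArn" *: *"arn:'
}

# Usage:
#   agent_registered && echo "agent is registered" || echo "agent is not registered"
```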
Review log files for the container agent and Docker
The container instance may still be disconnected. In that case, we review the log files on the container host for the container agent and Docker.
To output the log files for the container agent and Docker, we run:
sudo journalctl -u ecs
sudo journalctl -u docker
In addition, we can run the Amazon ECS logs collector to collect log information from the container instance.
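The journal output can be long, so it often helps to narrow it down first. The sketch below is one possible filter; the function name and the keyword list (error, warn, crit) are our own assumptions about what is worth surfacing, not an official log format.

```shell
# Sketch: keep only lines that look like errors or warnings from input on stdin.
filter_problems() {
  grep -iE 'error|warn|crit'
}

# Usage:
#   sudo journalctl -u ecs --since "1 hour ago" --no-pager | filter_problems
#   sudo journalctl -u docker --since "1 hour ago" --no-pager | filter_problems
```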
Verify if the IAM instance profile has the necessary permissions
If the container agent is still disconnected, we verify that the IAM instance profile associated with the container instance has the necessary IAM permissions.
1. To do so, we connect to the instance via SSH.
2. To view the instance metadata on the instance profile associated with the instance, we run:
curl http://xxx.xxx.xxx.xxx/latest/meta-data/iam/info
{
  "Code" : "Success",
  "LastUpdated" : "2019-06-29T15:47:03Z",
  "InstanceProfileArn" : "arn:aws:iam::1122334455:instance-profile/ecsInstanceRole",
  "InstanceProfileId" : "AIPAJ5WF3LZVY7PLUHV72"
}
3. Then we ensure that the IAM role contains the correct permissions for the container instances.
4. To check for specific credential errors from the container agent, we view the container agent log:
cat /var/log/ecs/ecs-agent.log.YYYY-MM-DD-**
The log shows the following error if the container agent doesn’t have the necessary credentials:
2019-06-29T16:10:09Z [ERROR] Unable to register as a container instance with ECS: AccessDeniedException: User: arn:aws:sts::1122334455:assumed-role/ecsInstanceRole/i-0052b2e858b1891ef is not authorized to perform: ecs:RegisterContainerInstance on resource: arn:aws:ecs:us-east-1:1122334455:cluster/exampleCluster status code: 400, request id: 0b73e260-5088-4688-a425-6f35f1ef440f
2019-06-29T16:10:09Z [ERROR] Error re-registering: AccessDeniedException: User: arn:aws:sts::1122334455:assumed-role/ecsInstanceRole/i-0052b2e858b1891ef is not authorized to perform: ecs:RegisterContainerInstance on resource: arn:aws:ecs:us-east-1:1122334455:cluster/exampleCluster status code: 400, request id: 0b73e260-5088-4688-a425-6f35f1ef440f
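To see at a glance which permissions are missing, we can pull the denied actions out of the agent log and compare them with the policies attached to the role. The sketch below assumes the error wording shown above; the function names are our own. The role name ecsInstanceRole comes from the example error, and list_role_policies needs credentials that allow iam:ListAttachedRolePolicies, so it usually runs from an administrator's machine rather than the instance.

```shell
# Sketch: list the distinct actions that AccessDeniedException lines report as denied.
# Reads log text on stdin.
denied_actions() {
  sed -n 's/.*is not authorized to perform: \([^ ]*\) on resource.*/\1/p' | sort -u
}

# Sketch: list the names of the managed policies attached to a role.
# Argument: $1 = IAM role name.
list_role_policies() {
  aws iam list-attached-role-policies \
    --role-name "$1" \
    --query 'AttachedPolicies[].PolicyName' \
    --output text
}

# Usage:
#   cat /var/log/ecs/ecs-agent.log* | denied_actions
#   list_role_policies ecsInstanceRole
```

For container instances, AWS provides the managed policy AmazonEC2ContainerServiceforEC2Role, so its absence from the second command's output is a likely place to start.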
Conclusion
In short, we saw how our Support Techs fix disconnected Amazon ECS container instances for our customers.