Facing health check failures on Amazon ECS tasks on AWS Fargate? We can help you.
Here, at Bobcares, we assist our customers with several AWS queries as part of our AWS Support Services.
Today, let us see how we can troubleshoot this issue.
Health Check Failures on Amazon ECS Tasks on AWS Fargate
Generally, the error has two variants:
(service AWS-service) (port 8080) is unhealthy in (target-group arn:uxyztargetgroup/aws-targetgroup/123456789) due to (reason Health checks failed with these codes: ) or [request timeout]
(service AWS-Service) (port 8080) is unhealthy in target-group tf-20190411170 due to (reason Health checks failed)
If we receive either of the above errors, our Support Techs recommend the following troubleshooting steps:
- If the container maps to port 80, we confirm that the container's security group allows inbound traffic on port 80 from the load balancer.
- We make sure the ping port for the load balancer health check is configured correctly. If it is not, the load balancer can deregister the container.
- Define a minimum health check grace period, so that the service scheduler ignores load balancer health checks for that period after a task starts.
- Monitor the CPU and memory metrics of the service. A task that is short on CPU or memory can respond too slowly to pass its health checks.
- Check the application logs for application errors.
- Check that the ping port and the health check path are configured correctly.
- Make sure the connection to the backend database succeeds.
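These checks assume the container answers the health check path with a success code only when it is actually ready. Below is a minimal sketch of such an endpoint that uses only the Python standard library. The path `/health`, the `db_ok` flag, and the `database_is_reachable()` helper are hypothetical stand-ins. Replace them with the path configured on your target group and a real, cheap query against your database.

```python
import http.server
import threading
import urllib.error
import urllib.request

HEALTH_PATH = "/health"  # hypothetical; must match the target group's health check path
db_ok = True             # hypothetical flag standing in for the real database state


def database_is_reachable() -> bool:
    """Hypothetical backend check; a real one might run a cheap query such as SELECT 1."""
    return db_ok


class HealthHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != HEALTH_PATH:
            status = 404  # a misconfigured health check path lands here
        elif database_is_reachable():
            status = 200  # matches the load balancer's default success code
        else:
            status = 503  # a code outside the success codes fails the health check
        self.send_response(status)
        self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging for the demo


def probe(port: int, path: str) -> int:
    """Send one GET, as a load balancer health check would, and return the status code."""
    try:
        with urllib.request.urlopen(f"http://127.0.0.1:{port}{path}", timeout=2) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


server = http.server.HTTPServer(("127.0.0.1", 0), HealthHandler)  # port 0 = any free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

results = [probe(port, "/health"), probe(port, "/wrong-path")]
db_ok = False  # simulate a database outage
results.append(probe(port, "/health"))
server.shutdown()
server.server_close()
print(results)  # [200, 404, 503]
```

The three probes correspond to three of the checks in the list: a healthy task, a health check path that does not match, and a backend database that cannot be reached.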
Troubleshoot 504 errors
A 504 error can be due to any of the following reasons:
- The load balancer failed to establish a connection to the target before the connection timeout expired.
- The load balancer established a connection to the target, but the target didn't respond before the idle timeout period elapsed.
- The network access control list (ACL) for the subnet didn't allow traffic from the targets to the load balancer nodes on the ephemeral ports (1024-65535).
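For the network ACL cause, the key question is whether the load balancer subnet's ACL allows return traffic from the targets across the whole ephemeral range, 1024-65535, as we understand the Application Load Balancer documentation. The sketch below checks a list of allowed port ranges for gaps. It makes two simplifying assumptions: every range passed in comes from an ALLOW rule that already matches the targets' CIDR, and DENY entries and rule-number ordering are ignored.

```python
from typing import List, Tuple

# Ephemeral port range that return traffic from the targets must be allowed on.
EPHEMERAL = (1024, 65535)


def uncovered(allowed: List[Tuple[int, int]],
              needed: Tuple[int, int] = EPHEMERAL) -> List[Tuple[int, int]]:
    """Return the sub-ranges of `needed` that no allowed (from_port, to_port) range covers.

    Assumes every entry is an ALLOW rule for the targets' CIDR; DENY rules and
    rule-number ordering are ignored in this sketch.
    """
    gaps = []
    cursor, end = needed
    for lo, hi in sorted(allowed):
        if hi < cursor:
            continue                       # rule ends before the part still needed
        if lo > end:
            break                          # rule (and all later ones) start past the range
        if lo > cursor:
            gaps.append((cursor, lo - 1))  # nothing covers cursor..lo-1
        cursor = max(cursor, hi + 1)
        if cursor > end:
            break
    if cursor <= end:
        gaps.append((cursor, end))
    return gaps


print(uncovered([(0, 65535)]))                     # []
print(uncovered([(80, 80), (443, 443)]))           # [(1024, 65535)]
print(uncovered([(1024, 30000), (40000, 65535)]))  # [(30001, 39999)]
```

An ACL that only opens the listener ports, as in the second call, blocks every response from the targets, so the load balancer reports timeouts even though the application is healthy.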
The corresponding service event reads:
(service AWS-Service) (port 8080) is unhealthy in target-group due to (reason Health checks failed with these codes:
In this case, our Support Techs recommend the steps below:
- First, we confirm that the backend returns a successful response without delay.
- Then we set the health check response timeout correctly. If the timeout is shorter than the backend's response time, the health check fails.
- After that, we check the load balancer access logs for more information about the errors.
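Application Load Balancer access logs are space-separated lines with quoted request and user-agent fields. As we read the ALB access log format, the ninth field is `elb_status_code` and the tenth is `target_status_code`, which is `-` when the target sent no response. The sketch below extracts the 504 entries from such lines. The field positions are our assumption about that format, `find_504s` is a hypothetical helper, and the sample lines are made up for illustration and shortened after the user-agent field.

```python
import shlex
from typing import Dict, Iterable, List


def find_504s(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return the 504 entries from ALB access log lines (field positions assumed)."""
    hits = []
    for line in lines:
        fields = shlex.split(line)  # keeps quoted fields such as the request line intact
        if len(fields) < 10 or fields[8] != "504":  # fields[8] = elb_status_code
            continue
        hits.append({
            "time": fields[1],
            "target": fields[4],                  # target IP:port
            "target_processing_time": fields[6],  # seconds, as logged by the load balancer
            "target_status_code": fields[9],      # "-" if the target sent no response
        })
    return hits


# Made-up sample lines; real entries contain more fields after the user agent.
sample = [
    'http 2024-05-01T10:00:00.000000Z app/my-alb/0123456789abcdef 10.0.1.5:51234 '
    '10.0.2.7:8080 0.001 -1 -1 504 - 120 0 "GET http://example.com:80/api/orders HTTP/1.1" "curl/8.0"',
    'http 2024-05-01T10:00:05.000000Z app/my-alb/0123456789abcdef 10.0.1.5:51240 '
    '10.0.2.7:8080 0.001 0.012 0.000 200 200 120 512 "GET http://example.com:80/api/orders HTTP/1.1" "curl/8.0"',
]

for hit in find_504s(sample):
    print(hit)
```

If the same target shows up repeatedly with a `-` target status, the target is not answering at all. In that case the first two steps in the list, backend responsiveness and the timeout value, are the place to start.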
Troubleshoot failed container health checks
This error appears when the service isn't integrated with a load balancer, but the containers in the task define health checks that they can't pass:
(service AWS-Service) (task ff3e71a4-d7e5-428b-9232-2345657889) failed container health checks
In this case, we suggest the following:
- Confirm that the health check command passed to the container is correct and uses the right syntax.
- If the task runs for a while before it fails, check the application logs and the Amazon CloudWatch logs.
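When no load balancer is involved, the failing check is the `healthCheck` block in the task definition. Its `command` is a list whose first element is `CMD`, which runs the arguments directly, or `CMD-SHELL`, which runs the string through the container's default shell. ECS also limits the timing settings. The sketch below lints a `healthCheck` object against those rules before you register the task definition. The limits reflect our reading of the ECS task definition reference, and `check_health_check` is a hypothetical helper, not an AWS tool.

```python
from typing import Any, Dict, List

# Allowed ranges as we read the ECS task definition reference (seconds, or a count for retries).
LIMITS = {"interval": (5, 300), "timeout": (2, 60), "retries": (1, 10), "startPeriod": (0, 300)}


def check_health_check(hc: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a container healthCheck object (empty if none)."""
    problems = []
    cmd = hc.get("command")
    if not isinstance(cmd, list) or not cmd:
        problems.append("command must be a non-empty list")
    elif cmd[0] not in ("CMD", "CMD-SHELL"):
        problems.append('command must start with "CMD" or "CMD-SHELL"')
    elif len(cmd) < 2:
        problems.append(f"nothing to run after {cmd[0]}")
    for key, (lo, hi) in LIMITS.items():
        if key in hc and not lo <= hc[key] <= hi:
            problems.append(f"{key}={hc[key]} outside {lo}-{hi}")
    return problems


good = {
    "command": ["CMD-SHELL", "curl -f http://localhost:8080/ || exit 1"],
    "interval": 30, "timeout": 5, "retries": 3, "startPeriod": 60,
}
bad = {"command": ["curl -f http://localhost:8080/ || exit 1"], "timeout": 90}

print(check_health_check(good))  # []
print(check_health_check(bad))   # missing CMD/CMD-SHELL prefix, and timeout out of range
```

A syntactically valid command can still fail at run time. For example, `curl -f` only works if `curl` is installed in the container image, which a static check like this cannot see.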
In short, we saw how our Support Techs troubleshoot health check failures.