CloudWatch alarm trigger without any breaching data points

by Jiji Jose | Published on August 26, 2021

Wondering why your CloudWatch alarm triggered without any breaching data points? We can help you with this!

As a part of our AWS Support Services, we often receive similar requests from our AWS customers.

Today, let’s see the steps our Support Techs follow to help customers resolve the CloudWatch alarm trigger issue.

CloudWatch alarm trigger without any breaching data points

Amazon CloudWatch is a monitoring and observability service from AWS. CloudWatch alarms that measure time-aggregated metrics perform this measurement continuously in a rolling window.

CloudWatch alarms evaluate metrics based on the data points available at a specific moment. As new values continue to flow into the CloudWatch metric, each successive alarm evaluation might use different aggregated data points. As a result, when we look at the metric later, we might not see the breaching data point that triggered the alarm, because values that arrived after the evaluation have changed the aggregate.

By reviewing the event history later, we can see the complete set of data points that have since flowed into the metric.

Detect breaching data point

To detect a breaching data point in the CloudWatch alarm metric’s graph, we have to change the Statistic to Maximum or Minimum.

Here is an example alarm configuration:

Standard resolution alarm
Metric: CPUUtilization
Threshold: 60%
Statistic: Average
Period: 120 seconds
Evaluation Period: 1
Detailed Monitoring: enabled for the monitored Amazon EC2 instance.

The metric had received the following values when the alarm evaluated the example period 11:00:00 – 11:02:00 IST:

Sample-1: 11:00:05 IST, numeric value: 80.96470588235294
Sample-2: 11:00:16 IST, numeric value: 16.929612366666664
Sample-3: 11:00:27 IST, numeric value: 53.57142857142857
Sample-4: 11:01:38 IST, numeric value: 94.89033212334336

The average of the above values is approximately 61.59, which breaches the threshold of 60%. This triggers a change to the ALARM state. The alarm’s event history lists the aggregated value exceeding the threshold as the reason for the state change.
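The arithmetic above can be checked with a short sketch. It assumes the alarm uses a "greater than threshold" comparison, which the example configuration does not state explicitly; the variable names are illustrative only.

```python
from statistics import mean

# Threshold from the example alarm configuration (percent).
THRESHOLD = 60.0

# Samples available when the alarm first evaluates 11:00:00 to 11:02:00 IST.
samples = [
    80.96470588235294,
    16.929612366666664,
    53.57142857142857,
    94.89033212334336,
]

# The alarm's Statistic is Average, so the period is aggregated with a mean.
average = mean(samples)

# Assumption: the alarm breaches when the aggregate is greater than the threshold.
state = "ALARM" if average > THRESHOLD else "OK"

print(f"average={average:.4f} state={state}")
```

With only these four samples known, the aggregate sits just above the threshold, so the evaluation reports a breach.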

When the alarm is evaluated again later, additional values have flowed in for the period 11:00:00 – 11:02:00 IST.

For example:

Sample-1: 11:00:05 IST, numeric value: 80.96470588235294
Sample-2: 11:00:16 IST, numeric value: 16.929612366666664
Sample-3: 11:00:27 IST, numeric value: 53.57142857142857
Sample-4: 11:01:38 IST, numeric value: 94.89033212334336
Sample-5: 11:01:45 IST, numeric value: 15.18181818181819
Sample-6: 11:00:51 IST, numeric value: 10.26490

The new average is 45.3, which does not breach the threshold of 60%. So the alarm changes back to the OK state. The alarm’s event history lists the aggregated value being below the threshold as the reason for the state change.
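A minimal sketch of the two evaluations, assuming a simple "greater than threshold" comparison: the same period yields a different state once the late samples are included. The `evaluate` helper is hypothetical and only mirrors the reasoning in this example, not CloudWatch's internal implementation.

```python
from statistics import mean

THRESHOLD = 60.0  # percent, from the example alarm configuration


def evaluate(samples, threshold=THRESHOLD):
    """Return (average, state) for the samples known at evaluation time."""
    avg = mean(samples)
    # Assumption: the alarm breaches when the average is greater than the threshold.
    return avg, ("ALARM" if avg > threshold else "OK")


# Samples known at the first evaluation of the period.
first_eval = [
    80.96470588235294,
    16.929612366666664,
    53.57142857142857,
    94.89033212334336,
]

# Samples that flowed into the same period afterwards.
late_samples = [15.18181818181819, 10.26490]

avg1, state1 = evaluate(first_eval)
avg2, state2 = evaluate(first_eval + late_samples)

print(f"first evaluation: average={avg1:.2f} state={state1}")
print(f"later evaluation: average={avg2:.2f} state={state2}")
```

The first call reproduces the ALARM transition and the second reproduces the return to OK, which is why the graph viewed afterwards shows no breach.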

As a result, we may not see the breaching data point in the CloudWatch metric’s graph, because the Average for this period is now shown as 45.3 in the CPUUtilization metric’s graph.

If we change the CloudWatch metric graph’s Statistic to Maximum, we can see the breaching value 94.89 in the data point at 11:00:00 IST.

Similarly, if the alarm is configured to trigger when data falls below the threshold, we need to change the CloudWatch metric graph’s Statistic to Minimum.
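To illustrate why switching the Statistic helps, the following sketch computes Average, Maximum, and Minimum over the six samples from the example and checks each against the 60% threshold. It is a local calculation under the assumption of a "greater than" comparison, not a call to CloudWatch.

```python
from statistics import mean

THRESHOLD = 60.0  # percent, from the example alarm configuration

# All six samples that eventually flowed into the 11:00:00 to 11:02:00 IST period.
samples = [
    80.96470588235294,
    16.929612366666664,
    53.57142857142857,
    94.89033212334336,
    15.18181818181819,
    10.26490,
]

# The three statistics as they would be aggregated over the period.
stats = {
    "Average": mean(samples),
    "Maximum": max(samples),
    "Minimum": min(samples),
}

# Which statistics exceed the threshold (assumed "greater than" comparison).
exceeds = {name: value > THRESHOLD for name, value in stats.items()}

for name, value in stats.items():
    print(f"{name}: {value:.2f} exceeds threshold: {exceeds[name]}")
```

Only the Maximum exceeds the threshold here, so the high values stay visible under that statistic even though the Average no longer shows a breach.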

Configure an “M out of N” alarm

To prevent an alarm from changing to the ALARM state because of a single breaching data point, we can configure an “M out of N” alarm, in which the Evaluation Period and the Datapoints to Alarm have different values.

This makes the alarm evaluate a larger number of aggregated data points, and the state of the alarm changes only if at least a certain number of data points (M) are breaching in a given set of data points (N).

Here is an example of this alarm configuration:

Standard resolution alarm
Metric: CPUUtilization
Threshold: 60%
Statistic: Average
Period: 120 seconds
Evaluation Period: 2 out of 3
Detailed Monitoring: enabled for the monitored Amazon EC2 instance

This alarm configuration is the same as the previous one except for the evaluation setting: the alarm triggers only when at least 2 of the 3 most recent data points breach the threshold.
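As a rough sketch, this configuration could be expressed as parameters for the boto3 `put_metric_alarm` call. The alarm name, instance ID, and the `GreaterThanThreshold` comparison operator are assumptions made for illustration; the article does not specify them. The `create_alarm` helper is hypothetical and is not invoked here, since it needs AWS credentials.

```python
# Parameters for CloudWatch's PutMetricAlarm API, mirroring the example above.
alarm_params = {
    "AlarmName": "example-cpu-2-out-of-3",  # hypothetical name
    "Namespace": "AWS/EC2",
    "MetricName": "CPUUtilization",
    "Dimensions": [
        # Hypothetical instance ID; replace with the monitored instance.
        {"Name": "InstanceId", "Value": "i-0123456789abcdef0"},
    ],
    "Statistic": "Average",
    "Period": 120,            # seconds
    "EvaluationPeriods": 3,   # N: number of most recent data points evaluated
    "DatapointsToAlarm": 2,   # M: breaching data points needed for ALARM
    "Threshold": 60.0,
    "ComparisonOperator": "GreaterThanThreshold",  # assumption
}


def create_alarm(params):
    """Create the alarm; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the dict can be inspected offline

    boto3.client("cloudwatch").put_metric_alarm(**params)
```

Setting `DatapointsToAlarm` lower than `EvaluationPeriods` is what makes this an “M out of N” alarm.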

The metric had received the following values when the alarm evaluated the example period starting at 11:00:00 IST:

Sample-1: 11:00:05 IST, numeric value: 80.96470588235294
Sample-2: 11:00:16 IST, numeric value: 16.929612366666664
Sample-3: 11:00:27 IST, numeric value: 53.57142857142857
Sample-4: 11:01:38 IST, numeric value: 94.89033212334336

Because of the longer evaluation period, CloudWatch also looks at data points older than 11:00:00 IST:

10:58:00 IST, Average=41.874304539920
10:59:00 IST, Average=5.230773650991253
11:00:00 IST, Average=64.93403361344538

Here the aggregated data point at 11:00:00 IST breaches the threshold, but the alarm remains in the OK state and doesn’t change to the ALARM state. This happens because only one of the three data points breaches the threshold, whereas two out of three are required to trigger the alarm.
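The “M out of N” decision for these three aggregated data points can be sketched as follows. This is a simplified model that assumes a "greater than threshold" comparison and ignores how CloudWatch treats missing data; `m_out_of_n_state` is a hypothetical helper, not part of any AWS SDK.

```python
def m_out_of_n_state(datapoints, threshold, m):
    """Return "ALARM" if at least m of the given data points exceed threshold."""
    breaching = sum(1 for value in datapoints if value > threshold)
    return "ALARM" if breaching >= m else "OK"


# Aggregated averages from the example (10:58:00, 10:59:00, 11:00:00 IST).
window = [41.874304539920, 5.230773650991253, 64.93403361344538]

# 2 out of 3: only the 11:00:00 data point breaches, so the alarm stays OK.
state_2_of_3 = m_out_of_n_state(window, threshold=60.0, m=2)

# For comparison, a 1 out of 3 rule would go to ALARM on the same data.
state_1_of_3 = m_out_of_n_state(window, threshold=60.0, m=1)

print(f"2 out of 3: {state_2_of_3}")
print(f"1 out of 3: {state_1_of_3}")
```

Requiring a second breaching data point is what filters out a brief spike that would otherwise trigger the alarm on its own.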


Conclusion

To conclude, today we discussed the steps our Support Engineers follow to help customers fix the issue ‘CloudWatch alarm trigger without any breaching data points’.
