CloudWatch alarm trigger without any breaching data points

Aug 26, 2021

Wondering why your CloudWatch alarm triggered without any breaching data points? We can help you with this!

As a part of our AWS Support Services, we often receive similar requests from our AWS customers.

Today, let’s see the steps followed by our Support Techs to help our customers to resolve the CloudWatch alarm trigger issue.

 

CloudWatch alarm trigger without any breaching data points

 
Amazon CloudWatch is a monitoring and observability service from AWS. CloudWatch alarms that measure time-aggregated metrics perform this measurement continuously in a rolling window.

CloudWatch alarms evaluate metrics based on the data points available at a specific moment. As new values continue to flow into the CloudWatch metric, each successive alarm evaluation might use different aggregated data points. As a result, the aggregated value that triggered the alarm may no longer be visible in the metric once additional data for the same period has flowed in.

By reviewing the event history later, we can see the complete set of data points that have now flowed into the metric.
 

Detect breaching data point

 
To detect a breaching data point in the CloudWatch alarm metric’s graph, we have to change the Statistic to Maximum or Minimum.

Here is an example for alarm configuration:

  • Standard resolution alarm
  • Metric: CPUUtilization
  • Threshold: 60%
  • Statistic: Average
  • Period: 120 seconds
  • Evaluation Period: 1
  • Detailed Monitoring: enabled for the monitored Amazon EC2 instance.

The metric had received the following values when the example alarm evaluated the period 11:00:00 – 11:02:00 IST:

Sample-1: 11:00:05 IST, numeric value: 80.96470588235294
Sample-2: 11:00:16 IST, numeric value: 16.929612366666664
Sample-3: 11:00:27 IST, numeric value: 53.57142857142857
Sample-4: 11:01:38 IST, numeric value: 94.89033212334336

The average of the above values is approximately 61.59, which breaches the threshold of 60%. So this triggers a change to the ALARM state. The alarm’s event history lists the aggregated value exceeding the threshold as the reason for the state change.

When the alarm is evaluated again later, additional values have flowed in for the same period, 11:00:00 – 11:02:00 IST.

For example:

Sample-1: 11:00:05 IST, numeric value: 80.96470588235294
Sample-2: 11:00:16 IST, numeric value: 16.929612366666664
Sample-3: 11:00:27 IST, numeric value: 53.57142857142857
Sample-4: 11:01:38 IST, numeric value: 94.89033212334336
Sample-5: 11:01:45 IST, numeric value: 15.18181818181819
Sample-6: 11:00:51 IST, numeric value: 10.26490

The new average is 45.3, which does not breach the threshold of 60%. So the alarm changes back to the OK state. The alarm’s event history lists the aggregated value being below the threshold as the reason for the state change.
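The arithmetic behind the two evaluations can be reproduced with a short Python sketch. It uses the sample values listed above and assumes the alarm compares with "greater than threshold", which the example implies but does not state. The function name `alarm_state` is illustrative and not part of any AWS API.

```python
from statistics import mean

THRESHOLD = 60.0  # percent, from the example alarm

# Samples available at the first evaluation of the 11:00:00 - 11:02:00 period
first_evaluation = [
    80.96470588235294,
    16.929612366666664,
    53.57142857142857,
    94.89033212334336,
]

# Samples that arrived later for the same period
late_samples = [15.18181818181819, 10.26490]


def alarm_state(samples, threshold=THRESHOLD):
    """Return 'ALARM' if the average exceeds the threshold, else 'OK'.

    Assumption: the alarm uses a 'greater than threshold' comparison.
    """
    return "ALARM" if mean(samples) > threshold else "OK"


all_samples = first_evaluation + late_samples
first_avg = mean(first_evaluation)
later_avg = mean(all_samples)

print(f"first evaluation: avg={first_avg:.2f} -> {alarm_state(first_evaluation)}")
print(f"later evaluation: avg={later_avg:.2f} -> {alarm_state(all_samples)}")
```

The same period produces two different aggregated values depending on when it is evaluated, which is why the graph viewed afterwards may show no breach.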

So now we may not see the breaching data point in our CloudWatch metric’s graph. The Average is listed as 45.3 in the CPUUtilization metric’s graph.

We can see the breaching data point 94.89 at 11:00:00 IST if we change the CloudWatch metric graph’s Statistic to Maximum.

Similarly, if the alarm is configured to trigger when data falls below the threshold, we need to change the CloudWatch metric graph’s Statistic to Minimum.
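The same check can be made outside the console by requesting the Maximum and Minimum statistics along with the Average. The following is a minimal sketch using the boto3 `get_metric_statistics` call. The instance ID and the date are hypothetical placeholders, "IST" is assumed to mean India Standard Time (UTC+05:30), and the actual AWS call is kept in a separate function because it needs boto3 and valid credentials.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "IST" in the example means India Standard Time (UTC+05:30).
IST = timezone(timedelta(hours=5, minutes=30))


def build_request(instance_id, start, end, period=120):
    """Build keyword arguments for the CloudWatch GetMetricStatistics call."""
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": start,
        "EndTime": end,
        "Period": period,
        # Request the extremes alongside the average so that a single
        # breaching sample stays visible after aggregation.
        "Statistics": ["Average", "Maximum", "Minimum"],
    }


def fetch_datapoints(request):
    """Call CloudWatch; requires boto3 and AWS credentials (not run here)."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    client = boto3.client("cloudwatch")
    response = client.get_metric_statistics(**request)
    return sorted(response["Datapoints"], key=lambda d: d["Timestamp"])


request = build_request(
    "i-0123456789abcdef0",  # hypothetical instance ID
    start=datetime(2021, 8, 26, 11, 0, tzinfo=IST),  # hypothetical date
    end=datetime(2021, 8, 26, 11, 2, tzinfo=IST),
)
print(request["Statistics"])
```

Each returned data point carries an `Average`, `Maximum`, and `Minimum` field, so a breaching sample shows up in `Maximum` even when the average has since dropped below the threshold.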
 

Configure an “M out of N” alarm

 
To prevent an alarm from suddenly changing to the ALARM state, we need to configure an “M out of N” alarm, where the Evaluation Period and the Datapoints to Alarm have different values.

This makes the alarm evaluate a larger number of aggregated data points, and the state of the alarm changes only if at least a certain number of data points (M) are breaching in a given set of data points (N).

Here is an example for this alarm configuration:

  • Standard resolution alarm
  • Metric: CPUUtilization
  • Threshold: 60%
  • Statistic: Average
  • Period: 120 seconds
  • Evaluation Period: 2 out of 3
  • Detailed Monitoring: enabled for the monitored Amazon EC2 instance

This alarm configuration is similar to the previous one; the only difference is the evaluation setting. The alarm now requires 2 out of the 3 most recent data points to breach the threshold before it triggers.

The metric had received the following values when the example alarm evaluated the period starting at 11:00:00 IST:
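As a rough illustration, such an alarm could be created with the boto3 `put_metric_alarm` call by setting `EvaluationPeriods` to 3 (N) and `DatapointsToAlarm` to 2 (M). The alarm name and instance ID below are hypothetical, and the "greater than threshold" comparison is an assumption, since the example does not name the operator.

```python
def build_alarm(instance_id, alarm_name="cpu-above-60-2-of-3"):
    """Build keyword arguments for the CloudWatch PutMetricAlarm call.

    alarm_name is a hypothetical name chosen for this sketch.
    """
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 120,           # seconds
        "Threshold": 60.0,       # percent
        # Assumption: the example alarm fires when the value is above 60%.
        "ComparisonOperator": "GreaterThanThreshold",
        "EvaluationPeriods": 3,  # N: data points considered
        "DatapointsToAlarm": 2,  # M: breaching data points required
    }


def create_alarm(params):
    """Create the alarm; requires boto3 and AWS credentials (not run here)."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    boto3.client("cloudwatch").put_metric_alarm(**params)


params = build_alarm("i-0123456789abcdef0")  # hypothetical instance ID
print(params["DatapointsToAlarm"], "out of", params["EvaluationPeriods"])
```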

Sample-1: 11:00:05 IST, numeric value: 80.96470588235294
Sample-2: 11:00:16 IST, numeric value: 16.929612366666664
Sample-3: 11:00:27 IST, numeric value: 53.57142857142857
Sample-4: 11:01:38 IST, numeric value: 94.89033212334336

Because of the increased evaluation period, CloudWatch also looks at data points older than 11:00:00 IST:

10:58:00 IST, Average=41.874304539920
10:59:00 IST, Average=5.230773650991253
11:00:00 IST, Average=64.93403361344538

Here the aggregated data point at 11:00:00 IST breaches the threshold. But the alarm remains in the OK state and doesn’t change to the ALARM state. This happens because only one out of three data points breaches the threshold, whereas two out of three are required to trigger the alarm.
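The counting rule can be sketched in a few lines of Python. This is a simplified model of the "M out of N" logic using the three averages above. It ignores how CloudWatch treats missing data and assumes a "greater than threshold" comparison. The function name `evaluate_m_out_of_n` is illustrative only.

```python
def evaluate_m_out_of_n(datapoints, threshold, m, n):
    """Return 'ALARM' if at least m of the last n data points breach.

    Simplified model: assumes a 'greater than threshold' comparison
    and that no data points are missing.
    """
    recent = datapoints[-n:]
    breaching = sum(1 for value in recent if value > threshold)
    return "ALARM" if breaching >= m else "OK"


# Aggregated averages from the example, oldest first
averages = [41.874304539920, 5.230773650991253, 64.93403361344538]

print(evaluate_m_out_of_n(averages, threshold=60.0, m=2, n=3))  # OK
print(evaluate_m_out_of_n(averages, threshold=60.0, m=1, n=3))  # ALARM
```

With M set to 1, the single breaching data point at 11:00:00 IST would be enough to change the state, which matches the behavior of the first example alarm.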

 

Conclusion

 
To conclude, today we discussed the steps followed by our Support Engineers to help our customers to fix the issue ‘CloudWatch alarm trigger without any breaching data points’.
