How to troubleshoot performance graph problems in Nagios?

by Arya MA | Published on November 18, 2020

Webmasters often notice that the performance graphs in Nagios do not display the correct data.

As a part of our Server Management Services, we regularly help our customers fix Nagios errors.

Today, let us discuss the possible causes of this error and how to fix them.

What are the performance graph problems in Nagios?

Users often notice that the performance graphs in Nagios do not display data even though their checks return valid performance data.

With the performance data feature enabled, Nagios generates performance graphs that update automatically each time a check runs.

Each check result delivers the “performance data”, which is stored in RRD databases.

In an RRD database, data sources sit at fixed positions. However, after a Nagios check is updated, the number or the names of the data sources in its result may change.

As a result, the performance graph stops growing or updating.
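One way to spot such a mismatch is to compare the number of data sources stored in an RRD file with the number of metrics the check currently returns. The following is a minimal sketch, assuming `rrdtool` is installed. The helper name `count_rrd_datasources` and the example RRD path are assumptions for illustration and may differ on your system.

```shell
#!/bin/sh
# Hypothetical helper: count the distinct data sources in `rrdtool info` output.
# Reads the output on stdin; data source lines look like: ds[1].type = "GAUGE"
count_rrd_datasources() {
    sed -n 's/^ds\[\([^]]*\)]\..*/\1/p' | sort -u | wc -l | tr -d ' '
}

# Example (the path is an assumption; adjust host and service names):
# rrdtool info /usr/local/nagios/share/perfdata/myhost/Ping.rrd | count_rrd_datasources
```

If the number printed differs from the number of metrics in the check's current performance data, the RRD layout no longer matches the check result.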

How to resolve performance graph problems in Nagios?

Our Support specialists have developed a systematic approach to troubleshooting performance graph problems in Nagios. Let us go through it step by step.

Check that Performance Data is enabled

The first step in this process is to make sure that Performance data is enabled.

For this, navigate to Admin > System Information > Monitoring Engine Status

Ensure that the Performance Data process is green.


Count The Amount Of Spooled Files

Nagios spools performance data into small files. Sometimes it stops processing these files and they begin to pile up.

The following commands will count the number of files in these locations:

# ls /usr/local/nagios/var/spool/perfdata/ | wc -l
# ls /usr/local/nagios/var/spool/xidpe/ | wc -l

If the number of files is greater than 20,000, the processes are more likely to get caught in a loop. In that case, we need to delete the files.
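The count and the threshold check can be combined in a small script. This is only a sketch: the helper names `count_spooled` and `spool_backlogged` are made up for illustration, and the 20,000 limit is the figure mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: print the number of regular files under a directory.
count_spooled() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Hypothetical helper: succeed when the backlog exceeds the limit (default 20000).
spool_backlogged() {
    [ "$(count_spooled "$1")" -gt "${2:-20000}" ]
}

for dir in /usr/local/nagios/var/spool/perfdata /usr/local/nagios/var/spool/xidpe; do
    if [ -d "$dir" ]; then
        if spool_backlogged "$dir"; then
            echo "$dir: backlog above 20000 files, consider clearing it"
        else
            echo "$dir: $(count_spooled "$dir") files"
        fi
    fi
done
```

Using `find` instead of `ls` avoids sorting a very large directory listing, which can be slow when many files have piled up.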

To delete a large number of files in a directory, execute this command:

# find /usr/local/nagios/var/spool/perfdata/ -type f -delete

After deleting the files, wait approximately thirty minutes to see if performance graphs start to work.

Increase Performance Data Logging Verbosity

If deleting the spooled files doesn’t help, we need to increase the Performance Data Logging Verbosity.

Edit the following file from an SSH session and change the LOG_LEVEL value from 0 to 2:

/usr/local/nagios/etc/pnp/process_perfdata.cfg
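Instead of editing the file by hand, the change can be scripted. The sketch below assumes the file uses one `KEY = VALUE` setting per line. The helper name `set_cfg_value` is hypothetical, and it keeps a `.bak` copy so the original value is easy to restore.

```shell
#!/bin/sh
# Hypothetical helper: set KEY to VALUE in a "KEY = VALUE" style config file.
# Keeps a .bak copy of the original file next to it.
set_cfg_value() {
    file=$1 key=$2 value=$3
    cp "$file" "$file.bak" &&
    sed "s/^\([[:space:]]*$key[[:space:]]*=[[:space:]]*\).*/\1$value/" "$file.bak" > "$file"
}

# Example (run as a user allowed to write the file):
# set_cfg_value /usr/local/nagios/etc/pnp/process_perfdata.cfg LOG_LEVEL 2
```

Only lines that start with the key are rewritten, so commented-out copies of the setting stay untouched.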

The process_perfdata.pl script should now log all errors and debug information to the file /usr/local/nagios/var/perfdata.log. We can watch it using this command:

# tail -f /usr/local/nagios/var/perfdata.log

Look for any errors, incorrect exit codes, and/or timeouts.
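On a busy server the log grows quickly, so a rough count of suspicious lines can help before reading it in detail. This is a sketch only: the helper name `count_log_problems` is hypothetical and the keywords are assumptions, since the exact wording of the messages in the log may differ.

```shell
#!/bin/sh
# Hypothetical helper: count log lines that mention an error or a timeout.
# The keywords are assumptions; adjust them to the messages you actually see.
count_log_problems() {
    grep -Eic 'error|timeout|timed out' "$1"
}

# Example:
# count_log_problems /usr/local/nagios/var/perfdata.log
```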

Remember to return LOG_LEVEL to its default setting once troubleshooting is complete.

A common error found in this log is a timeout error. To resolve it temporarily, we can increase the performance data processor’s timeout by changing the TIMEOUT field in the process_perfdata.cfg file.

Increase NPCD Logging Verbosity

NPCD is a bulk processing tool that reaps and processes the performance data. To increase its logging verbosity, edit the following file in an SSH session and change the log_level field from 0 to -1:

/usr/local/nagios/etc/pnp/npcd.cfg

Now, restart the NPCD service so that the change takes effect.

Also, remember to return this value to its default setting after completing troubleshooting.

NPCD should now log all errors and debug information to the file /usr/local/nagios/var/npcd.log. We can watch it using this command:

# tail -f /usr/local/nagios/var/npcd.log

A common error that we may find in the log file is the one indicating that we are hitting a load threshold.

We can increase this threshold by editing the following file and changing the load_threshold value to a higher one:

/usr/local/nagios/etc/pnp/npcd.cfg
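Before picking a new value, it can help to see the configured threshold next to the current system load. The sketch below assumes npcd.cfg uses one `key = value` setting per line and that the host is Linux, where `/proc/loadavg` exists. The helper name `get_cfg_value` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the value of KEY from a "key = value" style file.
# Commented-out lines are ignored because their key starts with '#'.
get_cfg_value() {
    awk -F= -v key="$2" '
        { k = $1; gsub(/[ \t]/, "", k) }
        k == key { v = $2; gsub(/[ \t]/, "", v); print v; exit }
    ' "$1"
}

cfg=/usr/local/nagios/etc/pnp/npcd.cfg
if [ -r "$cfg" ] && [ -r /proc/loadavg ]; then
    echo "load_threshold: $(get_cfg_value "$cfg" load_threshold)"
    echo "1-minute load:  $(cut -d' ' -f1 /proc/loadavg)"
fi
```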

Check Nagios User Account

In some situations, the Nagios user account can expire, causing issues like this. We can run this command to see if the Nagios user account has expired:

# chage -l nagios
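When checking several servers, the relevant line can be pulled out of the `chage -l` output with a small filter. This is a sketch that assumes English (`LANG=C`) output, where the line begins with "Account expires". The helper name `account_expiry` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract the "Account expires" value from
# `chage -l` output read on stdin.
account_expiry() {
    awk -F': ' '/^Account expires/ { print $2 }'
}

# Example:
# LANG=C chage -l nagios | account_expiry
```

A value of `never` means the account has no expiry date set. Any date in the past means the account has expired.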

We can re-enable the expired Nagios user with the command below:

# chage -I -1 -m 0 -M 99999 -E -1 nagios


Conclusion

In short, performance graphs in Nagios often do not display the correct data even though the checks return valid performance data. Today, we saw how our Support Engineers fix this error.
