How to troubleshoot performance graph problems in Nagios?

Please Note: This article is part of our historical archive. Because it was published a while ago, some of the information, links, or context may now be outdated.

Nagios users often find that the performance graphs do not display the correct data.

As a part of our Server Management Services, we regularly help our customers fix Nagios errors.

Let us today discuss the possible causes and fixes for this error.

What are the performance graph problems in Nagios?

Users often notice that the performance graphs in Nagios are not displaying data even though their checks are returning valid performance data.

With the performance data feature enabled, each check execution returns “performance data” along with its result. Nagios stores this data in RRD databases and generates the performance graphs from them, so the graphs update automatically every time a check runs.

In an RRD database, data sources are stored at fixed positions. However, after an update to a Nagios check, the number or the names of the data sources in a check result may change.

As a result, the new data no longer matches the existing RRD layout, and the performance graph stops updating.
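One way to confirm such a mismatch is to list the data sources stored in an RRD file and compare them with what the check currently returns. The following is a minimal sketch, assuming the `rrdtool` CLI is installed and relying on the `ds[NAME].type = "TYPE"` lines that `rrdtool info` prints. The helper name `list_rrd_datasources` is hypothetical, and the RRD path in the example is a placeholder.

```shell
# List data source names and types from `rrdtool info` output read on stdin.
# Hypothetical helper for illustration; not part of Nagios or PNP.
list_rrd_datasources() {
    # Matching lines look like: ds[1].type = "GAUGE"
    sed -n 's/^ds\[\([^]]*\)]\.type = "\(.*\)"$/\1 \2/p'
}

# Example (placeholder path; point it at the RRD file of the affected service):
#   rrdtool info /path/to/service.rrd | list_rrd_datasources
```

If the number of data sources printed here differs from the number of values in the check's current performance data, the RRD file was most likely created before the check changed.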

How to resolve performance graph problems in Nagios?

Our Support specialists have developed a systematic approach to troubleshooting performance graph problems in Nagios. Let us go through it step by step.

Check that Performance Data is enabled

The first step in this process is to make sure that Performance data is enabled.

For this, navigate to Admin > System Information > Monitoring Engine Status

Ensure that the Performance Data process is green.

Count The Amount Of Spooled Files

Nagios spools performance data into small files. Sometimes it stops processing these files, and they begin to pile up in the spool directories.

The following commands will count the number of files in these locations:

# ls /usr/local/nagios/var/spool/perfdata/ | wc -l
# ls /usr/local/nagios/var/spool/xidpe/ | wc -l

If either count is greater than 20,000, the processes handling these files have most likely become caught in a loop. In that case, we need to delete the spooled files.

To delete all files in the perfdata spool directory, execute this command:
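The count and the 20,000-file threshold can be combined into a small check. This is only a sketch: the function names `count_spool_files` and `check_spool_backlog` are hypothetical, and the default limit simply mirrors the figure mentioned above.

```shell
# Count regular files below a spool directory (hypothetical helper).
count_spool_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Report whether a directory holds more files than a limit (default 20000).
# Returns 1 when the limit is exceeded, 0 otherwise.
check_spool_backlog() {
    dir=$1
    limit=${2:-20000}
    count=$(count_spool_files "$dir")
    if [ "$count" -gt "$limit" ]; then
        echo "BACKLOG $dir: $count files"
        return 1
    fi
    echo "OK $dir: $count files"
}

# Example:
#   check_spool_backlog /usr/local/nagios/var/spool/perfdata/
#   check_spool_backlog /usr/local/nagios/var/spool/xidpe/
```

Using `find` instead of `ls` avoids sorting a very large listing and counts only regular files.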

# find /usr/local/nagios/var/spool/perfdata/ -type f -delete

After deleting the files, wait approximately thirty minutes to see if performance graphs start to work.
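Because this deletion cannot be undone, it may be worth wrapping it in a function that refuses to run on an empty or missing path. The sketch below is one way to do that; `purge_spool` is a hypothetical name, and the function removes files only, leaving any subdirectories in place.

```shell
# Delete all regular files below a spool directory, with a basic safety check.
# Hypothetical helper; uses the same `find ... -delete` call as the article.
purge_spool() {
    dir=$1
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "purge_spool: not a directory: '$dir'" >&2
        return 1
    fi
    find "$dir" -type f -delete
}

# Example:
#   purge_spool /usr/local/nagios/var/spool/perfdata/
```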

Increase Performance Data Logging Verbosity

If deleting the spooled files doesn’t help, we need to increase the Performance Data Logging Verbosity.

Edit the following file from an SSH session and change the LOG_LEVEL value from 0 to 2:

/usr/local/nagios/etc/pnp/process_perfdata.cfg
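The same change can be scripted instead of made in an editor. The sketch below assumes the file uses one `KEY = value` line per setting; `set_cfg_value` is a hypothetical helper that keeps a `.bak` copy, which also makes it easy to restore the original value afterwards.

```shell
# Set KEY to VALUE in a "KEY = value" style config file, keeping a .bak copy.
# Hypothetical helper; assumes one uncommented "KEY = value" line per setting.
set_cfg_value() {
    file=$1
    key=$2
    value=$3
    cp -p "$file" "$file.bak" || return 1
    # Rewrite only the value part; commented-out lines are left untouched.
    sed "s/^\([[:space:]]*$key[[:space:]]*=[[:space:]]*\).*/\1$value/" "$file.bak" > "$file"
}

# Example:
#   set_cfg_value /usr/local/nagios/etc/pnp/process_perfdata.cfg LOG_LEVEL 2
```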

The process_perfdata.pl script should now log all errors and debug information to the file /usr/local/nagios/var/perfdata.log. We can watch it using this command:

# tail -f /usr/local/nagios/var/perfdata.log

Look for any errors, incorrect exit codes, and/or timeouts.
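On a busy system the debug log grows quickly, so filtering it can help. The following sketch prints only the lines that look like problems. The search patterns are assumptions about how such messages are worded, not a documented log format, and `scan_perfdata_log` is a hypothetical name.

```shell
# Print log lines (with line numbers) that look like errors or timeouts.
# The patterns are guesses at typical wording; extend them as needed.
scan_perfdata_log() {
    grep -Ein 'error|timeout|timed out|exit code' "$1"
}

# Example:
#   scan_perfdata_log /usr/local/nagios/var/perfdata.log
```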

Remember to return this value to its default setting after completing the troubleshooting.

A common error found in this log is a timeout error. As a temporary fix, we can increase the performance data processor’s timeout by raising the TIMEOUT value in the same process_perfdata.cfg file.

Increase NPCD Logging Verbosity

NPCD is a bulk processing tool that reaps and processes the performance data. To increase its logging verbosity, edit the following file in an SSH session and change the log_level field from 0 to -1:

/usr/local/nagios/etc/pnp/npcd.cfg

Now, restart the NPCD service so that the change takes effect, for example with systemctl restart npcd on systemd-based systems or service npcd restart on older ones.

Also, remember to return this value to its default setting after completing the troubleshooting.

NPCD should now log all errors and debug information to the file /usr/local/nagios/var/npcd.log. We can watch it using this command:

# tail -f /usr/local/nagios/var/npcd.log

A common error that we may find in the log file is the one indicating that we are hitting a load threshold.

We can increase this threshold by editing the following file and changing the load_threshold value to a higher one:

/usr/local/nagios/etc/pnp/npcd.cfg

Check Nagios User Account

In some situations, the Nagios user account can expire, causing issues like this. We can run this command to see whether the Nagios user account has expired:

# chage -l nagios
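The line to look for in that output is "Account expires". As a minimal sketch, the value can be pulled out with `awk`. It assumes the English output layout of `chage -l` (hence `LC_ALL=C` in the example), and `account_expiry` is a hypothetical helper name.

```shell
# Print the "Account expires" value from `chage -l` output read on stdin.
# Hypothetical helper; assumes the English "Label : value" layout.
account_expiry() {
    awk -F': ' '/^Account expires/ { print $2 }'
}

# Example (run as root):
#   LC_ALL=C chage -l nagios | account_expiry
```

A value of "never" means the account has no expiry date. Any date in the past means the account has expired.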

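We can re-enable an expired Nagios user with the command below. It clears the account expiry date (-E -1) and the password inactivity period (-I -1), and sets the minimum and maximum password age to 0 and 99999 days (-m 0 -M 99999):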

# chage -I -1 -m 0 -M 99999 -E -1 nagios

Conclusion

In short, performance graphs in Nagios often do not display the correct data even though the checks return valid performance data. Today, we saw how our Support Engineers fix this error.
