Webmasters often notice that the performance graphs in Nagios do not display the correct data.
As a part of our Server Management Services, we help our Customers to fix Nagios errors regularly.
Let us today discuss the possible causes and fixes for this error.
What are the performance graph problems in Nagios?
Users often notice that the performance graphs in Nagios show no data even though their checks return valid performance data.
With the performance data feature enabled, Nagios generates performance graphs that are updated automatically each time a check runs.
Each check delivers its “performance data”, and the results are stored in RRD databases.
In an RRD database, data sources sit at fixed positions. However, after a Nagios check is updated, the number or the names of the data sources in its result may change.
When the check output no longer matches the layout of the existing RRD file, the performance graph stops growing and updating.
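One way to spot such a mismatch is to compare the number of data sources in the RRD file with the number of values the check currently returns. The following is a minimal sketch, not an official Nagios tool. It assumes the `rrdtool` command line client is installed, that its `info` output lists one `ds[NAME].type` line per data source, and that the check's performance data labels contain no quoted spaces. The function names are hypothetical.

```shell
# Count the data sources (DS) in an RRD file by reading the output of
# `rrdtool info` on stdin. Each DS contributes exactly one line of the
# form: ds[NAME].type = "GAUGE"
count_rrd_datasources() {
    grep -c '^ds\[[^]]*]\.type = '
}

# Count the label=value pairs in a performance data string (the part of
# the check output after the "|" character).
# Assumption: labels are not quoted and contain no spaces.
count_perfdata_labels() {
    printf '%s\n' "$1" | tr ' ' '\n' | grep -c '='
}

# Example usage (the RRD path is a placeholder, not a real location):
#   rrdtool info /path/to/service.rrd | count_rrd_datasources
#   count_perfdata_labels 'rta=0.05ms;100;500;0 pl=0%;20;60;0'
```

If the two counts differ, the check output has probably changed since the RRD file was created.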
How to resolve performance graph problems in Nagios?
Our Support specialists have developed a systematic approach to troubleshooting the performance graph problems in Nagios. Let us go through it step by step.
Check that Performance Data is enabled
The first step in this process is to make sure that Performance data is enabled.
For this, navigate to Admin > System Information > Monitoring Engine Status.
Ensure that the Performance Data process is green.
Count the Number of Spooled Files
Nagios spools performance data into small files. Sometimes it stops processing these files, and they begin to pile up in the spool directories.
The following commands will count the number of files in these locations:
# ls /usr/local/nagios/var/spool/perfdata/ | wc -l
# ls /usr/local/nagios/var/spool/xidpe/ | wc -l
If either directory holds more than 20,000 files, the processes are likely to be caught in a loop, so we need to delete the spooled files.
To delete a large number of files in a directory, execute this command (use the xidpe path instead if that is the directory with the backlog):
# find /usr/local/nagios/var/spool/perfdata/ -type f -delete
After deleting the files, wait approximately thirty minutes to see if performance graphs start to work.
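The count-and-compare step above can be wrapped in a small script. This is only a sketch: the function names are hypothetical, the 20,000 limit is the figure from the text, and `find -maxdepth` is assumed to be available (it is in GNU and BSD find). The script only reports; it does not delete anything.

```shell
# Threshold taken from the text above; adjust as needed.
SPOOL_LIMIT=20000

# Print the number of regular files directly inside a directory.
count_spooled() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Report whether a spool directory holds more files than the limit.
check_spool() {
    dir=$1
    limit=${2:-$SPOOL_LIMIT}
    n=$(count_spooled "$dir")
    if [ "$n" -gt "$limit" ]; then
        echo "BACKLOG $dir: $n files (limit $limit)"
    else
        echo "OK $dir: $n files (limit $limit)"
    fi
}

# Only inspect the Nagios spool directories when they exist on this host.
for d in /usr/local/nagios/var/spool/perfdata /usr/local/nagios/var/spool/xidpe; do
    if [ -d "$d" ]; then
        check_spool "$d"
    fi
done
```

A directory reported as BACKLOG is a candidate for the find ... -delete command shown earlier.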
Increase Performance Data Logging Verbosity
If deleting the spooled files doesn’t help, we need to increase the Performance Data Logging Verbosity.
Edit the following file from an SSH session and change the LOG_LEVEL value from 0 to 2:
/usr/local/nagios/etc/pnp/process_perfdata.cfg
The process_perfdata.pl script should now log all errors and debug information to the file /usr/local/nagios/var/perfdata.log. We can watch it using this command:
# tail -f /usr/local/nagios/var/perfdata.log
Look for any errors, incorrect exit codes, and/or timeouts.
Remember to return this value to its default setting after completing troubleshooting.
A common error found in this log is the typical timeout error. To resolve it temporarily, we can increase the performance data processor’s timeout range by changing the TIMEOUT field in the process_perfdata.cfg file.
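Instead of editing the file by hand, the LOG_LEVEL and TIMEOUT changes can be scripted. The sketch below assumes that process_perfdata.cfg uses plain `KEY = VALUE` lines and that the values are simple numbers. The helper names `get_cfg` and `set_cfg` are hypothetical, and the timeout value in the usage comment is only an illustration.

```shell
# Print the value of KEY from a "KEY = VALUE" style config file.
# Usage: get_cfg FILE KEY
get_cfg() {
    sed -n "s/^[[:space:]]*${2}[[:space:]]*=[[:space:]]*//p" "$1" | head -n 1
}

# Set KEY to VALUE, keeping a copy of the original as FILE.bak.
# Usage: set_cfg FILE KEY VALUE
# Assumption: VALUE is a plain number (no "/" or "&" characters).
set_cfg() {
    cp "$1" "$1.bak" &&
    sed "s/^\([[:space:]]*${2}[[:space:]]*=[[:space:]]*\).*/\1${3}/" "$1.bak" > "$1"
}

# Example usage (run as a user allowed to edit the file):
#   CFG=/usr/local/nagios/etc/pnp/process_perfdata.cfg
#   set_cfg "$CFG" LOG_LEVEL 2      # raise verbosity while troubleshooting
#   set_cfg "$CFG" TIMEOUT 30       # 30 is an illustrative value only
#   set_cfg "$CFG" LOG_LEVEL 0      # restore the default afterwards
```

Because `set_cfg` writes a `.bak` copy first, the previous state of the file can be restored if a change turns out to be wrong.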
Increase NPCD Logging Verbosity
NPCD is a bulk processing tool that reaps and processes the performance data. To increase its logging verbosity, edit the following file in an SSH session and change the log_level field from 0 to -1:
/usr/local/nagios/etc/pnp/npcd.cfg
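Now, restart the NPCD service so that the change takes effect (for example, with service npcd restart or systemctl restart npcd, depending on the init system).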
Also, remember to return this value to its default setting after completing troubleshooting.
NPCD should now log all errors and debug information to the file /usr/local/nagios/var/npcd.log. We can watch it using this command:
# tail -f /usr/local/nagios/var/npcd.log
A common error in this log file indicates that NPCD is hitting its load threshold.
We can increase this threshold by editing the following file and changing the load_threshold value to a higher one:
/usr/local/nagios/etc/pnp/npcd.cfg
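Before raising the value, it can help to compare the current system load with the configured threshold. The sketch below assumes a Linux host with /proc/loadavg, a `load_threshold = VALUE` line in npcd.cfg, and that the 1-minute load average is the relevant figure. The function names are hypothetical.

```shell
# Path from the text above.
NPCD_CFG=/usr/local/nagios/etc/pnp/npcd.cfg

# Succeed (exit status 0) when LOAD is numerically greater than MAX.
# Usage: load_exceeds LOAD MAX
load_exceeds() {
    awk -v load="$1" -v max="$2" 'BEGIN { exit !(load + 0 > max + 0) }'
}

# Print the configured load_threshold from an npcd.cfg style file.
get_load_threshold() {
    sed -n 's/^[[:space:]]*load_threshold[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}

# Only run the comparison where both inputs are readable.
if [ -r /proc/loadavg ] && [ -r "$NPCD_CFG" ]; then
    load=$(cut -d ' ' -f 1 /proc/loadavg)   # 1-minute load average
    max=$(get_load_threshold "$NPCD_CFG")
    if [ -n "$max" ] && load_exceeds "$load" "$max"; then
        echo "1-minute load $load is above load_threshold $max"
    else
        echo "1-minute load $load, load_threshold ${max:-not set}"
    fi
fi
```

If the load is regularly above the threshold, raising load_threshold treats the symptom; the cause of the high load is worth investigating as well.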
Check Nagios User Account
In some situations, the Nagios user account expires, which can cause issues like this. We can run this command to see whether the account has expired:
# chage -l nagios
We can re-enable the expired Nagios user with the command below:
# chage -I -1 -m 0 -M 99999 -E -1 nagios
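To check the relevant fields without reading the whole listing, the `chage -l` output can be filtered. This is a sketch that assumes the usual English output format of `chage -l` (one `Field name : value` pair per line); the function name is hypothetical.

```shell
# Read `chage -l` output on stdin and print the value of one field.
# Usage: chage -l USER | chage_field 'Account expires'
chage_field() {
    awk -v key="$1" '
        index($0, key) == 1 {          # line starts with the field name
            sub(/^[^:]*:[ \t]*/, "")   # drop "Field name : " prefix
            print
        }
    '
}

# Example usage (needs root to query another user):
#   chage -l nagios | chage_field 'Password expires'
#   chage -l nagios | chage_field 'Account expires'
```

After running the chage command above, both fields should read never.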
Conclusion
In short, performance graphs in Nagios often do not display the correct data even though their checks return valid performance data. Today, we saw how our Support Engineers fix this error.