The ‘Last check time not updating in Nagios XI’ problem can have several causes, including connection issues with the backend historical database and crashed database tables.
Here at Bobcares, we have seen several such Nagios-related errors as part of our Server Management Services for web hosts and online service providers.
Today we’ll take a look at how to fix this Nagios error.
Why is the last check time not updating in Nagios XI
Before we get into the solution part of this error, let’s first see what causes this problem to arise.
Most often, this problem is caused by a connection issue with the backend historical database. Crashed database tables, core scheduling or check execution issues, and a lack of system resources can also cause it.
How we troubleshoot the problem ‘last check time not updating Nagios XI’
Now let’s see how our Support Engineers troubleshoot this problem in Nagios.
The first troubleshooting step is to verify whether the checks are actually being scheduled and executed. If they are not, it is usually an issue with the Nagios Core engine. If they are, it is most likely a database issue.
The easiest way to verify this is to check the Nagios Core web frontend and see if the “Last Check” time is updating. To do that, browse to the link below:
http://<server_ip_or_hostname>/nagios/
Open the details of any object that currently shows stale “Last Check” times in XI. If the Core interface displays accurate “Last Check” times, proceed to Step 2 below. If the Core interface shows the same problem as the XI interface, follow Step 1 below.
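The same comparison can be made from the command line. The sketch below is only an illustration: it assumes the Core status file lives at /usr/local/nagios/var/status.dat (the usual location for a source install, but not confirmed by this article) and that it uses servicestatus { ... } blocks with host_name, service_description and last_check fields. The function name stale_services is hypothetical.

```shell
#!/bin/sh
# List services whose last_check is older than a threshold.
# Usage: stale_services STATUS_FILE MAX_AGE_SECONDS [NOW_EPOCH]
# Output: one line per stale service, "host;service;age_in_seconds".
stale_services() {
    status_file=$1
    max_age=$2
    now=${3:-$(date +%s)}   # optional third argument pins "now" (handy for testing)
    awk -v now="$now" -v max_age="$max_age" '
        # Start of a service block: reset the collected fields.
        /^servicestatus[ \t]*[{]/ { in_block = 1; host = ""; svc = ""; last = 0; next }
        in_block && /^[ \t]*host_name=/           { sub(/^[ \t]*host_name=/, "");           host = $0 }
        in_block && /^[ \t]*service_description=/ { sub(/^[ \t]*service_description=/, ""); svc = $0 }
        in_block && /^[ \t]*last_check=/          { sub(/^[ \t]*last_check=/, "");          last = $0 + 0 }
        # End of the block: report the service if its last check is too old.
        in_block && /^[ \t]*[}]/ {
            if (now - last > max_age) print host ";" svc ";" (now - last)
            in_block = 0
        }
    ' "$status_file"
}
```

For example, stale_services /usr/local/nagios/var/status.dat 900 would list services that Core itself has not checked in the last 15 minutes. If that list is empty while XI still shows old times, the problem is more likely on the database side (Step 2).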
1. The check is failing to be scheduled or executed
Issues with the Nagios Core auto-rescheduler directives:
When the auto_rescheduling feature was introduced in Nagios Core 4.0.8, it had a few bugs. Those affected will notice the nagios.log file filling up with errors pertaining to rescheduled checks.
Originally, the new directives added to nagios.cfg could cause rescheduled checks to never execute, and instead be continuously rescheduled. The original /usr/local/nagios/etc/nagios.cfg directives were:
auto_reschedule_checks=1
auto_rescheduling_interval=30
auto_rescheduling_window=180
In order to resolve this issue, reduce the auto_rescheduling_window to 45.
auto_reschedule_checks=1
auto_rescheduling_interval=30
auto_rescheduling_window=45
After making the above changes to nagios.cfg, restart Nagios Core using one of the commands below:
RHEL 6|CentOS 6|Oracle Linux 6|Ubuntu 14
# service nagios restart
RHEL 7|CentOS 7|Oracle Linux 7|Debian|Ubuntu 16/18
# systemctl restart nagios.service
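To confirm which values are actually in effect in nagios.cfg, something like the following sketch can help. The helper names show_reschedule_directives and window_within are hypothetical; the directive names and the file path are the ones quoted above.

```shell
#!/bin/sh
# Print the uncommented auto-rescheduling directives from a nagios.cfg file.
# Usage: show_reschedule_directives /usr/local/nagios/etc/nagios.cfg
show_reschedule_directives() {
    grep -E '^[[:space:]]*auto_(reschedule_checks|rescheduling_interval|rescheduling_window)=' "$1"
}

# Succeed (exit status 0) if auto_rescheduling_window is set and is at most LIMIT.
# Usage: window_within CFG_FILE LIMIT
window_within() {
    cfg=$1
    limit=$2
    # If the directive appears more than once, use the last occurrence.
    window=$(sed -n 's/^[[:space:]]*auto_rescheduling_window=\([0-9][0-9]*\).*/\1/p' "$cfg" | tail -n 1)
    [ -n "$window" ] && [ "$window" -le "$limit" ]
}
```

Running window_within /usr/local/nagios/etc/nagios.cfg 45 && echo ok prints ok only when the window has been reduced to 45 or less.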
Resource Issues forcing the rescheduling of checks:
If the system ulimit settings are too restrictive, checks may be orphaned and forced to reschedule. Normally, this behavior is identified by checking the nagios.log file for lines similar to:
[1331905537] Warning: The check of service 'SERVICE' on host 'NAMESERVER' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the service
…
[1331755699] Warning: The check of service 'SWAP' on host 'nameserver' could not be performed due to a fork() error: 'Resource temporarily unavailable'. The check will be rescheduled.
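A quick way to gauge how often these warnings occur is to count them. This is a minimal sketch: the function name count_resource_warnings is hypothetical, and the log path in the usage comment (/usr/local/nagios/var/nagios.log) is an assumption based on a default source install.

```shell
#!/bin/sh
# Count orphaned-check and fork() warnings in a Nagios log file.
# Usage: count_resource_warnings /usr/local/nagios/var/nagios.log
count_resource_warnings() {
    log=$1
    # grep -c exits non-zero when the count is 0, so "|| true" keeps the
    # function usable in scripts that run with "set -e".
    orphaned=$(grep -c 'looks like it was orphaned' "$log" || true)
    fork_err=$(grep -c 'fork() error' "$log" || true)
    echo "orphaned=$orphaned fork_errors=$fork_err"
}
```

A handful of orphaned checks over a long period may be harmless, while steadily growing counts point to the resource limits discussed here.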
If many of these lines exist in nagios.log, perform the following tasks to increase the kernel ulimits:
Edit the file /etc/security/limits.conf and define/update the following settings:
#locked memory
* hard memlock 128
* soft memlock 128
#open files
* soft nofile 10000
* hard nofile 10000
root hard nofile 10000
root soft nofile 10000
#max user processes
* hard nproc 4096
* soft nproc 4096
#stack size
* hard stack 20480
* soft stack 20480
If a setting does not exist, add the line. After making the changes, save the file and restart the server.
Once the reboot completes, execute the following command to verify that the new settings are in place:
# ulimit -a
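Rather than reading the full ulimit -a output by eye, a single value can be compared against the target. The sketch below checks only the open-file limit of the current shell against the 10000 configured above. The function names limit_ok and check_open_files are hypothetical, and the result reflects the shell you run it in, which may differ from the limits of the running Nagios process.

```shell
#!/bin/sh
# Succeed if a limit value meets a required minimum.
# "unlimited" always satisfies the requirement.
# Usage: limit_ok ACTUAL REQUIRED
limit_ok() {
    actual=$1
    required=$2
    [ "$actual" = "unlimited" ] || [ "$actual" -ge "$required" ]
}

# Compare this shell's soft open-file limit with the 10000 set in limits.conf.
check_open_files() {
    current=$(ulimit -n)
    if limit_ok "$current" 10000; then
        echo "open files: $current (ok)"
    else
        echo "open files: $current (below 10000)"
    fi
}
```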
2. ndo2db is failing to insert the check result into the “Nagios” MySQL database
Presence of crashed tables in the Nagios database:
Identification of crashed tables can be done by checking the MySQL/MariaDB logs located at:
/var/log/mysqld.log
or for MariaDB:
/var/log/mariadb/
The relevant errors should resemble:
141127 10:40:24 [ERROR] /usr/libexec/mysqld: Table ‘./nagios/nagios_logentries’ is marked as crashed and last (automatic?) repair failed
So, run the following commands to repair the tables.
# cd /usr/local/nagiosxi/scripts/
# ./repair_databases.sh
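To see which tables the database log reports as crashed, before or after running the repair script, the log can be filtered as in this sketch. The function name list_crashed_tables is hypothetical, and the pattern assumes the ./database/table form shown in the log line above.

```shell
#!/bin/sh
# Print the unique tables reported as crashed in a MySQL/MariaDB error log.
# Usage: list_crashed_tables /var/log/mysqld.log
list_crashed_tables() {
    grep 'is marked as crashed' "$1" \
        | grep -o '\./[A-Za-z0-9_]*/[A-Za-z0-9_]*' \
        | sort -u
}
```

Each table is printed once even if the error repeats many times. Because the log keeps old entries, compare timestamps in the log itself to confirm that no new errors appear after the repair.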
After following the steps above, make sure that multiple Nagios instances are not running. To check this, run the command below:
# ps -ef | grep nagios.cfg | grep -v grep
The following output is healthy:
nagios 5713 1 0 08:40 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 5723 5713 0 08:40 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
In the above output, only PID 5713 is a parent process.
The second line has the PID 5723, but it references the parent PID 5713. It is a child process of that parent, which is normal behavior. On heavily loaded systems, you may see multiple child processes, which is also normal.
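Counting parent processes can also be scripted. The sketch below reads ps -ef style output on standard input and counts lines that reference nagios.cfg and have a parent PID of 1, as in the sample output. The function name count_nagios_parents is hypothetical, and the PPID of 1 is an assumption that holds when the daemon is started by init or systemd.

```shell
#!/bin/sh
# Count Nagios Core parent processes in `ps -ef` output read from stdin.
# A parent is a line that mentions nagios.cfg and whose PPID (column 3) is 1.
count_nagios_parents() {
    awk '/nagios\.cfg/ && $3 == 1 { n++ } END { print n + 0 }'
}
```

Typical usage is ps -ef | grep nagios.cfg | grep -v grep | count_nagios_parents. A result of 1 is healthy, and anything higher means more than one Nagios instance is running.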
If your output shows more than one parent process, execute the following commands:
RHEL 6|CentOS 6|Oracle Linux 6|Ubuntu 14
# service nagios stop
# killall -9 nagios
# service nagios start
RHEL 7|CentOS 7|Oracle Linux 7|Debian|Ubuntu 16/18
# systemctl stop nagios.service
# killall -9 nagios
# systemctl start nagios.service
Conclusion
Today, we saw how our Support Engineers resolve this Nagios error and provide a solution to our customers.