Restarting Linux Services With NCPA? We can help you.
NCPA is an advanced, cross-platform agent that can be installed on Windows, Linux, AIX, and Mac OS X machines.
As part of our Server Management Services, we assist our customers with several Linux queries.
Today, let us see how to automatically restart problematic services on Linux servers using the Nagios Cross-Platform Agent (NCPA).
Restarting Linux Services With NCPA
To begin, our Support Techs suggest having NCPA installed and configured on the Linux machine whose services we want to restart.
Create Restart Script
First, we create a service_restart.sh script in the /usr/local/ncpa/plugins directory that will perform the service restart command:
# vi /usr/local/ncpa/plugins/service_restart.sh
Then we paste the following code into the terminal session:
#!/bin/bash
sudo service $1 restart
exit 0
Once done, we save the changes and close the file.
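Because the plugin restarts whatever service name it receives through -a, it may be worth limiting it to known services. The following is a minimal sketch of such a variant, not the script used in this guide; the ALLOWED_SERVICES list and both function names are hypothetical and should be adapted to your environment.

```shell
#!/bin/bash
# Hypothetical hardened variant of service_restart.sh (an assumption, not the original script).
# ALLOWED_SERVICES is an illustrative allowlist; edit it to match your server.
ALLOWED_SERVICES="crond sshd httpd"

# Return 0 if the given service name is on the allowlist, 1 otherwise.
is_allowed_service() {
    name="$1"
    for svc in $ALLOWED_SERVICES; do
        [ "$svc" = "$name" ] && return 0
    done
    return 1
}

# Restart the service only when it is allowlisted.
# Otherwise print a message and return 3, the conventional Nagios UNKNOWN code.
restart_service() {
    name="$1"
    if ! is_allowed_service "$name"; then
        echo "UNKNOWN: service '$name' is not in the allowlist"
        return 3
    fi
    sudo service "$name" restart
}

# Act only when a service name is supplied, as NCPA does through -a.
# Unlike the original script, this exits with the status of the restart attempt.
if [ "$#" -ge 1 ]; then
    restart_service "$1"
    exit $?
fi
```

With this variant, a request for a service outside the list returns a message instead of calling sudo.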
Grant NCPA Permission to Restart Services
The Nagios user needs permission to execute the service command.
We execute the following commands as root to give NCPA permission to restart services:
# echo "nagios ALL = NOPASSWD: `which service`" >> /etc/sudoers
# echo 'Defaults:nagios !requiretty' >> /etc/sudoers
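Appending to /etc/sudoers directly leaves no chance to catch a syntax error. As an alternative sketch, the same two rules can be generated by a small function, written to a drop-in file, and checked with visudo. The function name and the file name /etc/sudoers.d/ncpa_restart are assumptions for illustration, and this only works if your /etc/sudoers includes the /etc/sudoers.d directory.

```shell
#!/bin/sh
# Print the two sudoers rules from this guide.
# $1 is the full path to the service binary (for example the output of `which service`).
sudoers_rules() {
    printf 'nagios ALL = NOPASSWD: %s\n' "$1"
    printf 'Defaults:nagios !requiretty\n'
}

# Example usage, to be run as root (left commented out on purpose):
# sudoers_rules "$(command -v service)" > /etc/sudoers.d/ncpa_restart
# chmod 0440 /etc/sudoers.d/ncpa_restart
# visudo -cf /etc/sudoers.d/ncpa_restart   # syntax check only, changes nothing
```

If visudo reports an error, remove or correct the drop-in file before relying on it.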
Test the Commands from Nagios XI Server
Now we will test if the script we just created on the Linux server is working.
The example below will restart the crond service as it is unlikely to cause any issues:
# cd /usr/local/nagios/libexec
# ./check_ncpa.py -H 10.25.13.30 -P 5693 -t Str0ngT0k3n -M 'plugins/service_restart.sh' -a crond
[root@xi-c7x-x64 libexec]# ./check_ncpa.py -H 10.25.13.30 -P 5693 -t Str0ngT0k3n -M 'plugins/service_restart.sh' -a crond
Stopping crond: [ OK ]
Starting crond: [ OK ] | 'status'=0;1;2;
Since we received the output of the service_restart.sh plugin, showing crond stopping and starting, it appears to work.
Create Event Handler Script
Next, we create a script for Nagios XI to use as the event handler. It will be called service_restart.sh and located in the /usr/local/nagios/libexec/ directory on the Nagios XI server:
# vi /usr/local/nagios/libexec/service_restart.sh
Then we paste the following:
#!/bin/sh
case "$1" in
    OK)
        ;;
    WARNING)
        ;;
    UNKNOWN)
        ;;
    CRITICAL)
        /usr/local/nagios/libexec/check_ncpa.py -H "$2" -P 5693 -t "$3" -M 'plugins/service_restart.sh' -a "$4"
        ;;
esac
exit 0
Finally, we save the changes and close the file.
Now, to set the correct permissions, we execute the following commands:
# chown apache:nagios /usr/local/nagios/libexec/service_restart.sh
# chmod 775 /usr/local/nagios/libexec/service_restart.sh
Then we test the script:
# /usr/local/nagios/libexec/service_restart.sh CRITICAL 10.25.13.30 Str0ngT0k3n crond
When the script runs, it receives four arguments, referenced as $1, $2, $3 and $4 in the script:
$1 = The state of the service.
$2 = The host address of the Linux server.
$3 = The NCPA Token on the Linux server.
$4 = The name of the service being restarted.
Note that the script calls the service_restart.sh plugin only when the service is in a CRITICAL state.
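The script above restarts on every CRITICAL result, including the first soft one. A common refinement is to wait until the problem is confirmed. The sketch below assumes the handler is also passed the standard Nagios macros $SERVICESTATETYPE$ and $SERVICEATTEMPT$ as extra arguments, which the command line in this guide does not do; the function name and the choice of the third attempt are illustrative.

```shell
#!/bin/sh
# Decide whether a restart should be attempted.
# $1 = service state, $2 = state type (SOFT or HARD), $3 = current check attempt.
# Returns 0 for CRITICAL in a HARD state, or CRITICAL on the third SOFT attempt.
should_restart() {
    state="$1"
    state_type="$2"
    attempt="${3:-0}"

    # Any state other than CRITICAL never triggers a restart.
    [ "$state" = "CRITICAL" ] || return 1

    case "$state_type" in
        HARD)
            return 0
            ;;
        SOFT)
            [ "$attempt" -eq 3 ] && return 0
            ;;
    esac
    return 1
}

# Example: print the decision for a first soft failure.
if should_restart CRITICAL SOFT 1; then
    echo "restart"
else
    echo "wait"
fi
```

In a real handler, the check_ncpa.py call would run only when should_restart succeeds.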
Create Event Handler
Moving ahead, we create an event handler on the Nagios XI server to be used by our services.
For that, we navigate to Configure > Core Config Manager.
Select Commands from the list on the left, click the >_ Commands link, and then click Add New.
Then we populate the fields with the following values:
Command: Service Restart – Linux
Command-line: $USER1$/service_restart.sh $SERVICESTATE$ $HOSTADDRESS$ Str0ngT0k3n $_SERVICESERVICE$
Command type: misc command
Make sure the Active check box is checked.
Finally, click Save and then Apply Configuration.
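To make the macro substitution in the command line concrete, the sketch below builds the string Nagios would end up running once each macro is replaced. It is purely illustrative: the function name is hypothetical, and it assumes $USER1$ resolves to /usr/local/nagios/libexec, the directory used earlier in this guide.

```shell
#!/bin/sh
# Build the expanded event handler command line.
# $1 = $SERVICESTATE$, $2 = $HOSTADDRESS$, $3 = NCPA token, $4 = $_SERVICESERVICE$
build_handler_command() {
    # Assumption: $USER1$ points at the Nagios plugin directory.
    plugin_dir="${USER1:-/usr/local/nagios/libexec}"
    printf '%s/service_restart.sh %s %s %s %s\n' "$plugin_dir" "$1" "$2" "$3" "$4"
}

build_handler_command CRITICAL 10.25.13.30 Str0ngT0k3n crond
# prints: /usr/local/nagios/libexec/service_restart.sh CRITICAL 10.25.13.30 Str0ngT0k3n crond
```

The printed line matches the manual test command run earlier against the event handler script.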
Add a Service Check
Now we need to create a Service using the NCPA Monitoring Wizard. To do so, we select the crond service from the list of Services.
Then we finish the wizard to create the new service.
Update Service With Event Handler
Here, we need to do two things:
- Select Event Handler
- Add the name of the service we want to restart as a custom variable to the service object.
For that, our Support Techs suggest the following steps:
- Navigate to Configure > Core Config Manager > Monitoring > Services.
- Click the Service status for: crond to edit the service.
- Then click the Check Settings tab.
- From the Event handler drop-down list, select the option Service Restart – Linux.
- For Event handler enabled click On.
- Click the Misc Settings tab and then click the Manage Free Variables button.
- We will add a custom variable so that the event handler knows the name of the service to restart.
Name: _SERVICE
Value: crond
- Click Insert to add the variable to the list on the right.
- Then click Close >> Save.
- Finally, Apply Configuration for the changes to take effect.
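The custom variable _SERVICE is what the $_SERVICESERVICE$ macro in the event handler command line refers to: for service objects, Nagios exposes a custom variable as $_SERVICE followed by the variable name in upper case, without its leading underscore. The helper below is a hypothetical illustration of that naming rule, not part of Nagios.

```shell
#!/bin/sh
# Print the macro name Nagios uses for a service-level custom variable.
# $1 is the variable name as entered in the Free Variables dialog, e.g. _SERVICE.
service_macro_name() {
    # Strip the leading underscore, then upper-case the remainder.
    var="${1#_}"
    upper="$(printf '%s' "$var" | tr '[:lower:]' '[:upper:]')"
    printf '$_SERVICE%s$\n' "$upper"
}

service_macro_name _SERVICE
# prints: $_SERVICESERVICE$
```

A differently named variable would need the matching macro in the command line; for example, _restart_target would be referenced as $_SERVICERESTART_TARGET$.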
Test
To test, we force the service to stop on the Linux machine:
# service crond stop
We then wait for the Nagios service check to reach a critical state, or force the next check.
Once the Nagios XI Cron Scheduling Daemon service is in a critical state, the event handler will execute and the Linux crond service will restart.
The next time Nagios XI checks the Cron Scheduling Daemon service, it will return to an OK state.
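Rather than checking by hand whether crond came back, the wait can be scripted on the Linux machine. The following is a small sketch under the assumption that `service crond status` exits with 0 while the service is running; the function name and the timeout value are illustrative.

```shell
#!/bin/sh
# Poll a command once per second until it succeeds or the timeout expires.
# $1 = timeout in seconds, remaining arguments = the status command to run.
wait_until_ok() {
    timeout="$1"
    shift
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if "$@" >/dev/null 2>&1; then
            echo "OK after ${elapsed}s"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "still failing after ${timeout}s"
    return 1
}

# Example usage after stopping the service (left commented out on purpose):
# wait_until_ok 600 service crond status
```

The timeout should be longer than the check interval of the service in Nagios XI, since the event handler only fires after a check detects the failure.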
Troubleshooting
If the event handler does not work as expected, check the /usr/local/nagios/var/nagios.log file for any errors.
For example:
[1481763272] SERVICE ALERT: 10.25.13.34;Cron Scheduling Daemon;CRITICAL;SOFT;1;crond is stopped
[1481763272] wproc: SERVICE EVENTHANDLER job 7 from worker Core Worker 12627 is a non-check helper but exited with return code 13
[1481763272] wproc: early_timeout=0; exited_ok=1; wait_status=3328; error_code=0;
[1481763272] wproc: stderr line 01: execvp(/usr/local/nagios/libexec/service_restart.sh, …) failed. Errno is 13: Permission denied
Here, we can see that the worker did not have permission to execute the service_restart.sh command. In that case, re-apply the chown and chmod commands from the Create Event Handler Script step.
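A quick way to confirm a permission problem like this is to test the handler file directly. The function below is a minimal sketch with a hypothetical name; it reports the result for the user that runs it, so it is most meaningful when run as the nagios user.

```shell
#!/bin/sh
# Report whether a file exists and is executable by the current user.
check_executable() {
    if [ ! -e "$1" ]; then
        echo "MISSING: $1"
    elif [ -x "$1" ]; then
        echo "OK: $1 is executable"
    else
        echo "NOT EXECUTABLE: $1"
    fi
}

# Example usage (left commented out on purpose):
# check_executable /usr/local/nagios/libexec/service_restart.sh
# A one-line equivalent run as the nagios user:
# sudo -u nagios test -x /usr/local/nagios/libexec/service_restart.sh && echo ok
```

If the file is reported as not executable, the chown and chmod commands from earlier in this guide are the first thing to re-apply.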
Conclusion
In short, today we saw how our Support Techs go about Restarting Linux Services With NCPA.