Bobcares

Know thy server: better, cheaper and quicker!

by | Feb 25, 2010

Knowledge is power. Be it war, peace, administration or day-to-day business, information about any dynamic parameter you are dealing with adds to your strength. Imagine a day when you are idly watching a video on YouTube, and out of the blue comes a pop-up screaming: “Dude, you’ve got a problem. MySQL is kinda acting weird, and the server load is leaping over the moon. You better check it right away!!” Aaaah!! Life would be so much easier! If this is what you have been dreaming about, it is exactly what Nagios can deliver.

 

Nagios can be configured to monitor the status of services such as Apache, MySQL, POP and IMAP, as well as server health indicators such as server load, disk space and mail queue. It uses a simple client-server model: the servers to be monitored (beta machines) are installed with the Nagios client, and a monitoring server (alpha machine) is set up as the master. The alpha machine constantly monitors the status of the services running at the client end and alerts you if anything is amiss. How do you want to get notified? Alert mechanisms ranging from Firefox pop-ups and mails in Thunderbird to SMS alerts can be configured using plugins.

How is the connection made?

The two popular methods to fetch data from the beta machines are NRPE and SNMP. SNMP (Simple Network Management Protocol) gives you information about physical parameters: the hardware entities in the system, including routers and the other devices that make up the network, temperature readings and so on.

NRPE (Nagios Remote Plugin Executor) monitors the services running on the remote machine and collects information about its soft resources. It does this by letting you execute plugins on remote Linux/Unix hosts.
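To make the NRPE idea concrete, here is a minimal sketch of how the alpha machine might query a beta machine. It assumes the check_nrpe plugin lives under /usr/local/nagios/libexec (the default for a source install), and that the host address and the check_load command name are placeholders which must match an entry in the remote nrpe.cfg. The exit codes follow the standard plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); the function names are made up for this illustration.

```shell
#!/bin/sh
# ASSUMPTION: plugin path of a default source install; override via CHECK_NRPE.
CHECK_NRPE="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"

# Translate a plugin exit code into the matching Nagios state name.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo OUT-OF-RANGE ;;   # not part of the plugin convention
    esac
}

# Run a command defined on the remote host through NRPE and report its state.
# $1 = remote host, $2 = command name defined in the remote nrpe.cfg
remote_check() {
    status=0
    output=$("$CHECK_NRPE" -H "$1" -c "$2") || status=$?
    echo "$(state_name "$status"): $output"
    return "$status"
}

# Usage (hypothetical host and command name):
#   remote_check 192.0.2.10 check_load
```

Nagios itself does the equivalent of this for every NRPE-based service check, so a wrapper like this is mainly useful for testing a new client by hand before adding it to the configuration.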

 

Fail-over and Redundancy

What if we depend on Nagios all the time to monitor the services and one fine day, the Alpha machine itself crashes? We’ve got a solution for that too!! Nagios also contains features for redundant and fail-over network monitoring. Both the Alpha and one of the Beta machines monitor the same hosts and services on the network. Under normal circumstances only the Alpha sends out notifications to contacts about problems. The Beta running Nagios takes over the job of notifying contacts about problems if:

 

1. The Alpha machine that runs Nagios is down, or
2. The Nagios process on the Alpha machine stops running for some reason

 

Sample scripts for this are available in the eventhandlers/ folder of the Nagios distribution. There are two scripts I would like to explain:

 

Host Event Handler (handle-master-host-event):


#!/bin/sh
# $1 = state of the master host (UP or DOWN)
# $2 = state type (SOFT or HARD)
# Only take action on hard host states...
case "$2" in
HARD)
    case "$1" in
    DOWN)
        # The master host has gone down!
        # We should now become the master host and take
        # over the responsibilities of monitoring the
        # network, so enable notifications...
        /usr/local/nagios/libexec/eventhandlers/enable_notifications
        ;;
    UP)
        # The master host has recovered!
        # We should go back to being the slave host and
        # let the master host do the monitoring, so
        # disable notifications...
        /usr/local/nagios/libexec/eventhandlers/disable_notifications
        ;;
    esac
    ;;
esac
exit 0
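The handler above leans on the enable_notifications and disable_notifications helpers. In essence, helpers of this kind write an external command to the Nagios command file. The sketch below shows that idea and is not the shipped script itself: the command file path is an assumption based on a default source install, and the function names are made up for illustration. ENABLE_NOTIFICATIONS and DISABLE_NOTIFICATIONS are the external command names Nagios understands.

```shell
#!/bin/sh
# ASSUMPTION: command file location of a default source install.
COMMAND_FILE="${COMMAND_FILE:-/usr/local/nagios/var/rw/nagios.cmd}"

# Build one external-command line of the form "[epoch] COMMAND".
# $1 = epoch timestamp, $2 = external command name
format_external_command() {
    printf '[%s] %s\n' "$1" "$2"
}

# Append the command to the command file (a named pipe on a live system).
# $1 = external command name
send_external_command() {
    format_external_command "$(date +%s)" "$1" >> "$COMMAND_FILE"
}

# Usage:
#   send_external_command ENABLE_NOTIFICATIONS    # Beta starts notifying
#   send_external_command DISABLE_NOTIFICATIONS   # Beta goes quiet again
```

For this to work, external commands have to be enabled on the Beta's Nagios instance, and the user running the handler needs write access to the command file.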

 

Service Event Handler (handle-master-proc-event):

 


#!/bin/sh
# $1 = state of the master's Nagios process check (OK, WARNING, UNKNOWN or CRITICAL)
# $2 = state type (SOFT or HARD)
# Only take action on hard service states...
case "$2" in
HARD)
    case "$1" in
    CRITICAL)
        # The master Nagios process is not running!
        # We should now become the master host and
        # take over the responsibility of monitoring
        # the network, so enable notifications...
        /usr/local/nagios/libexec/eventhandlers/enable_notifications
        ;;
    WARNING|UNKNOWN)
        # The master Nagios process may or may not
        # be running.. We won't do anything here, but
        # to be on the safe side you may decide you
        # want the slave host to become the master in
        # these situations...
        ;;
    OK)
        # The master Nagios process is running again!
        # We should go back to being the slave host,
        # so disable notifications...
        /usr/local/nagios/libexec/eventhandlers/disable_notifications
        ;;
    esac
    ;;
esac
exit 0

 

Obviously, when everything is working fine (when Alpha is up and running), Beta remains mute. No notifications, no mails. Beta gets activated when Alpha goes down and handle-master-host-event is executed, or when the Nagios process running on Alpha goes down and handle-master-proc-event is executed. As soon as Beta takes over, it starts sending notifications for all the clients it has been monitoring alongside Alpha. Phew! We have the systems under control again!!
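The behaviour of the two handlers can be summed up as a small decision table. The sketch below is a dry run that only prints what Beta would do for each combination of state and state type, without touching a live Nagios. The function name standby_action is made up for this illustration; the states and actions are taken from the two scripts above.

```shell
#!/bin/sh
# Decide what the standby (Beta) should do for a given state of the master.
# $1 = state reported for the master host or its Nagios process
# $2 = state type (SOFT or HARD)
standby_action() {
    # Soft states mean retries are still in progress, so never switch on them.
    [ "$2" = "HARD" ] || { echo none; return 0; }
    case "$1" in
        DOWN|CRITICAL) echo enable_notifications ;;   # master is gone: take over
        UP|OK)         echo disable_notifications ;;  # master is back: go quiet
        *)             echo none ;;                   # WARNING, UNKNOWN: do nothing
    esac
}

# Print the full table for every state and state type.
for state in DOWN UP CRITICAL WARNING UNKNOWN OK; do
    for state_type in SOFT HARD; do
        printf '%-8s %-4s -> %s\n' "$state" "$state_type" \
            "$(standby_action "$state" "$state_type")"
    done
done
```

Running the real handlers by hand with the same two arguments, for example handle-master-host-event DOWN HARD, is a quick way to confirm on a test Beta that notifications really get switched on.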

So, what is this all about?

There are plenty of plugins available for Nagios, which give us better visibility into the services and monitored attributes. More plugins come up every day. Even though it may sound a bit confusing, configuring services like NRPE and SNMP is a piece of cake! The Nagios documentation, which is pretty awesome, gives you a detailed description and leads you by the hand.

We’ve been in this business for quite some time, and have seen the same mistake repeated over and over again: leaving the server unattended. Leave the server unattended, and the load surges, services fail, and boom!!! The system goes down, and everyone goes into firefighting mode. Why take the chance? Monitor the systems continuously: use Nagios.


About the Author

Jeevan Joseph has been with Bobcares for a year, and is now heading the Public Relations wing of the company. He has worked as a developer as well as in technical support in the past, for organizations ranging from startups to datacenters.

Jeevan, after graduating in Electronics and Communications Engineering, moved into the web-hosting domain out of a passion for open source. An eloquent orator, he spends his spare time with the Toastmasters, and has given sessions to over 2500 people in the past 4 years.

 

4 Comments

  1. Suhas

    Hey Jeevan,

    Very interesting article. Brief and resourceful. Concept and technology well explained. Keep it up and keep posting more such, so that even our knowledge expands!!

    Cheers
    Suhas

  2. lijo

    Good one. Clear, simple and crisp.
    Straight on the target.

  3. Sarath Nair

    Dear Jeevs,

    Nice article.

    Best regards,
    Sarath

  4. Jeevan

    @Sarath, @Lijo, @Suhas

    Folks, Thank you for finding time to read my article. I appreciate it! Thanks once again for visiting the Bobcares Blog. Keep coming back for more.

    Jeevan/Jeevs
