Bobcares

Monitor Docker Containers With Nagios

by | Mar 12, 2022

Wondering how to monitor Docker Containers With Nagios? We can help you.

At Bobcares, we offer solutions for every query, big and small, as a part of our Server Management Service.

Let’s take a look at how our Support Team helped a customer deal with this Nagios query.

How to monitor Docker Containers With Nagios?

The Docker configuration wizard supports two methods for monitoring Docker.

Docker’s Remote API is the recommended method. If that is not possible, a plugin can be executed on the Docker server through NCPA (the Nagios Cross-Platform Agent).

Using Docker Remote API

It is recommended to use Docker’s built-in HTTP API, which we can query with cURL, by having the Docker daemon listen on a TCP port in addition to its UNIX socket.

This is most easily done by adding an additional host to the Docker startup command.
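
As a rough sketch of what that additional host can look like, the snippet below prints a dockerd command line that keeps the default UNIX socket and adds a TCP listener. The helper name build_dockerd_cmd, the address 0.0.0.0 and the port 2375 are assumptions for illustration, not values taken from the wizard. Keep in mind that an unencrypted TCP listener gives control of Docker to anyone who can reach the port, so access should be restricted with a firewall or TLS.

```shell
# Sketch only: prints (does not run) a dockerd command line that keeps the
# default UNIX socket and adds a TCP listener through an extra -H option.
# The address and port passed below are illustrative assumptions.
build_dockerd_cmd() {
  addr=$1
  port=$2
  printf 'dockerd -H unix:///var/run/docker.sock -H tcp://%s:%s\n' "$addr" "$port"
}

build_dockerd_cmd "0.0.0.0" "2375"
```

How the startup command is actually changed depends on how Docker is started on the server (for example through a service unit), so treat the printed line as a template rather than a drop-in command.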

We can test the connection to the TCP port by executing the following command in a terminal session on our Nagios XI server.

Make sure to replace ip and port with the relevant values for the Docker server:

curl -f -g "http://ip:port/containers/json?all=true"

We can also test this by clicking the Populate Containers/Networks button on the first page of the Docker configuration wizard, after entering the relevant information.

Then, check whether the list populates successfully, or whether the command above returns a JSON object other than {"message": "page not found"}.

If either test succeeds, proceed to the Docker Configuration Wizard.
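
The check on the cURL output can also be scripted. The following is a minimal sketch, assuming the response body has been captured in a variable; the function name looks_like_container_list is hypothetical.

```shell
# Sketch only: decide whether a response body looks like a container list
# rather than the "page not found" error object.
looks_like_container_list() {
  case "$1" in
    *'page not found'*) return 1 ;;   # error object, usually a wrong URL
    '['*)               return 0 ;;   # /containers/json returns a JSON array
    *)                  return 1 ;;   # empty or unexpected output
  esac
}

# Example use on the Nagios XI server (replace ip and port first):
#   body=$(curl -f -g -s "http://ip:port/containers/json?all=true")
#   looks_like_container_list "$body" && echo "Docker API reachable"
```

This only inspects the shape of the output; it does not validate the JSON itself.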

Monitor Docker Containers using NCPA

If we are unable to bind the Docker daemon to a TCP port, we will need to install NCPA on the Docker machine.

Once it is installed, we need to download the check_docker.py plugin to NCPA’s plugins folder.

The plugin can be downloaded directly from the Nagios XI server. In the following commands, replace xi_address with the IP address of our Nagios XI server.

In a terminal session on the Docker server, execute the following commands:

cd /usr/local/ncpa/plugins/
wget http://xi_address/nagiosxi/includes/configwizards/docker/plugins/check_docker.py

We will also need to add the nagios user to the docker group. This enables the nagios user to read from and write to the Docker socket, which is necessary for the check_docker.py plugin to function.

In the same terminal session, execute the following command:

usermod -a -G docker nagios

We will then need to restart the machine for the group changes to take effect.
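
After the restart, it can be worth confirming that the group change is in place. Here is a small sketch; has_group is a hypothetical helper that scans the group list printed by id -nG.

```shell
# Sketch only: check whether a group name appears in a space-separated
# group list, such as the output of `id -nG <user>`.
has_group() {
  for g in $1; do
    if [ "$g" = "$2" ]; then
      return 0
    fi
  done
  return 1
}

# Example use on the Docker server:
#   has_group "$(id -nG nagios)" docker && echo "nagios is in the docker group"
```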

Docker Configuration Wizard

The Docker Configuration Wizard communicates with the Docker installation through the Docker UNIX socket.

Each check retrieves the relevant metrics from the Docker installation and compares them to the thresholds we set in the wizard.

To begin using the wizard, navigate via the top bar to Configure > Configuration Wizards and select the Docker wizard.

Step 1:

This step is split into two sections: Docker Server Information and Checks to Run.

The Docker Server Information section has different options depending on how we are accessing Docker.

Remote Agent (NCPA)

• IP Address is the IP address of the machine that is running Docker
• NCPA Listener Port is the port that NCPA is configured to listen on
• NCPA Token is the token that allows access to NCPA
• Docker Socket is the location of the Docker socket, normally /var/run/docker.sock
• Docker API Base URL is the URL used to access Docker. It is normally closely related to our API version, e.g. http:/v1.30/ for an installation running API version 1.30

Remote API

• IP Address is the IP address of the machine which is running Docker
• Docker API Base URL is the URL used to access the Docker API, e.g. http://ip:port/
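
To see how either form of the base URL combines with an API endpoint, here is a small sketch. The helper docker_api_url is hypothetical; containers/json is the endpoint used in the cURL test earlier.

```shell
# Sketch only: join a Docker API base URL and an endpoint path without
# producing a doubled slash between them.
docker_api_url() {
  base=${1%/}     # drop one trailing slash from the base URL, if present
  path=${2#/}     # drop one leading slash from the endpoint, if present
  printf '%s/%s\n' "$base" "$path"
}

docker_api_url "http:/v1.30/" "containers/json"
docker_api_url "http://ip:port/" "containers/json?all=true"

# For the socket-based form, curl can talk to the UNIX socket directly.
# Some curl versions may expect a host name in the URL, so this is an
# assumption to verify on the Docker server:
#   curl --unix-socket /var/run/docker.sock "$(docker_api_url http:/v1.30/ containers/json)"
```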

Security

The Security section is shown when we have selected the Remote API access method. Its options are only required if we have configured Docker with TLS for additional security.

The three options available need to be populated with the locations of the relevant files on our Nagios XI server.
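
Before entering the file locations, we can check that the files are readable on the Nagios XI server. The sketch below assumes the three files are a CA certificate, a client certificate and a client key, which is typical for Docker TLS but is not spelled out in the wizard description above. The function name and the example paths are hypothetical.

```shell
# Sketch only: verify that every file passed as an argument is readable by
# the current user. Prints the first unreadable path to stderr.
tls_files_readable() {
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "not readable: $f" >&2
      return 1
    fi
  done
  return 0
}

# Example use (paths and port are assumptions, adjust to the real setup):
#   tls_files_readable /path/to/ca.pem /path/to/cert.pem /path/to/key.pem &&
#     curl -f -g --cacert /path/to/ca.pem --cert /path/to/cert.pem \
#          --key /path/to/key.pem "https://ip:port/containers/json?all=true"
```

Run the check as the user that executes the Nagios checks, since file permissions differ between users.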

Checks to Run

This section provides a list of monitoring options that we will need to select before proceeding to Step 2.

The options A list of containers and The containers on a list of networks both display the Populate Container/Network List button.

Clicking the button will provide a list of containers that will be used in Step 2 of the wizard.

After making all selections, click Next to proceed to Step 2.

Step 2:

The choices presented in Step 2 depend on the checks we selected in Step 1.

In Remote Host Details, we can set the Host Name to suit our requirements.

All the services created by this wizard will be assigned to this newly created host.

Existing Containers (if the section is present)

• Service Description is the name we will see associated with this check
• Thresholds are the normal Nagios thresholds
• Timeout will tell the check how long it has to complete before returning UNKNOWN
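
The Timeout option is handled by the check itself. The sketch below only illustrates the general idea of turning a timeout into an UNKNOWN result (exit code 3 in the Nagios plugin convention), using the coreutils timeout command. run_with_timeout is a hypothetical helper, not part of check_docker.py.

```shell
# Sketch only: run a command with a time limit and map a timeout to the
# Nagios UNKNOWN state (exit code 3). `timeout` exits with 124 when the
# limit is reached.
run_with_timeout() {
  secs=$1
  shift
  rc=0
  timeout "$secs" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "UNKNOWN - check timed out after ${secs}s"
    return 3
  fi
  return "$rc"
}

# Example: a command that finishes in time keeps its own exit code.
run_with_timeout 5 true
```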

Running Containers (if the section is present)

• Service Description is the name we will see associated with this check
• Thresholds are the normal Nagios thresholds
• Timeout will tell the check how long it has to complete before returning UNKNOWN
• List Non-Running Containers will tell the check to give a list of containers that are not running in the service output
• Express thresholds as a percentage will tell the check to treat entered thresholds as a percentage, and to output the percent of containers that are running out of those selected, rather than a count
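
To make the percentage option concrete, here is a minimal sketch of the arithmetic: the share of selected containers that are running. The function name percent_running is hypothetical, and the plugin’s own rounding may differ from the integer division used here.

```shell
# Sketch only: percentage of selected containers that are running,
# using integer arithmetic (the result is truncated, not rounded).
percent_running() {
  running=$1
  total=$2
  if [ "$total" -le 0 ]; then
    echo "UNKNOWN - no containers selected"
    return 3
  fi
  echo $(( running * 100 / total ))
}

percent_running 3 4   # prints 75
```

With a count-based threshold, the check would compare 3 against the thresholds; with the percentage option, it would compare 75.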

Healthy Containers

• Service Description is the name we will see associated with this check.
• Thresholds are the normal Nagios thresholds.
• Timeout will tell the check how long it has to complete before returning UNKNOWN.
• List Unhealthy Containers will tell the check to give a list of containers that are not healthy in the service output.
• When a container has no health check… will tell the check how to treat containers that have no healthcheck specified.

By default, containers without a health check are excluded from the total count, but if we prefer, we can have them automatically counted as healthy or unhealthy.
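
The effect of the three choices on the counts can be sketched as follows. The function name health_totals and the policy names exclude, healthy and unhealthy are hypothetical labels for the wizard options, not the plugin’s actual arguments.

```shell
# Sketch only: given counts of healthy, unhealthy and no-healthcheck
# containers, print "<healthy count> <total count>" under a chosen policy.
health_totals() {
  healthy=$1
  unhealthy=$2
  none=$3
  policy=$4
  case "$policy" in
    exclude)   echo "$healthy $(( healthy + unhealthy ))" ;;
    healthy)   echo "$(( healthy + none )) $(( healthy + unhealthy + none ))" ;;
    unhealthy) echo "$healthy $(( healthy + unhealthy + none ))" ;;
    *)         echo "UNKNOWN - unsupported policy: $policy"; return 3 ;;
  esac
}

health_totals 3 1 2 exclude     # prints 3 4
health_totals 3 1 2 healthy     # prints 5 6
health_totals 3 1 2 unhealthy   # prints 3 6
```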

CPU Usage

• There may be a table that shows up before the service description.
• A container’s CPU Usage will always be collected as a percentage of its host system’s CPU usage.
• Service Description is the name we will see associated with this check.
• Timeout will tell the check how long it has to complete before returning UNKNOWN.
• List Containers that are outside of acceptable ranges will tell the check to give a list of containers that fail the check in the service output.
• Use aggregate statistics will allow us to set additional thresholds based on total and average CPU usage across all selected containers or networks. It will also allow us to discard the individual warning/critical thresholds if we choose.
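
As an illustration of what the aggregate statistics represent, the sketch below computes the total and average of a few per-container CPU percentages. The function name cpu_aggregate and the sample values are assumptions; the plugin’s real output format is not shown here.

```shell
# Sketch only: total and average of per-container CPU usage percentages,
# passed as arguments. awk handles the decimal arithmetic.
cpu_aggregate() {
  if [ "$#" -eq 0 ]; then
    echo "UNKNOWN - no containers selected"
    return 3
  fi
  printf '%s\n' "$@" |
    awk '{ sum += $1 } END { printf "total=%.1f average=%.1f\n", sum, sum / NR }'
}

cpu_aggregate 12.5 40 7.5   # prints total=60.0 average=20.0
```

With aggregate thresholds, a single busy container can stay below its individual limit while the total across all containers still triggers a warning.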

Memory Usage (if the section is present)

• There may be a table that shows up before the service description.
• A container’s Memory Usage is considered to be equivalent to its resident set size.
• Service Description is the name we will see associated with this check.
• Timeout will tell the check how long it has to complete before returning UNKNOWN.
• Express a container’s memory usage will let us determine whether the check should compare memory usage to a set quantity (in bytes), or to a percentage of its limit.
• List Containers that are outside of acceptable ranges will tell the check to give a list of containers that fail the check in the service output.
• Use aggregate statistics will allow us to set additional thresholds based on total and average memory usage across all selected containers or networks. It will also allow us to discard the individual warning/critical thresholds if we choose.
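
The difference between the two ways of expressing memory usage comes down to one calculation. Here is a minimal sketch, assuming usage and limit are both given in bytes; mem_percent is a hypothetical helper.

```shell
# Sketch only: memory usage as a percentage of the container's limit.
# Both arguments are byte counts. A limit of zero or less is treated as
# "no usable limit" and reported with exit code 3.
mem_percent() {
  awk -v used="$1" -v limit="$2" 'BEGIN {
    if (limit <= 0) exit 3
    printf "%.1f\n", used * 100 / limit
  }'
}

mem_percent 536870912 1073741824   # 512 MiB of a 1 GiB limit, prints 50.0
```

In byte mode, the threshold would be compared against 536870912 directly; in percentage mode, against 50.0.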

Click Next and then complete the wizard by choosing the required options in Step 3 – Step 5.

To finish up, click on Finish in the final step of the wizard.

Once the wizard applies the configuration, click the View status details for link to see the new services that have been created.


Conclusion

Today, we saw the steps our Support Engineers follow to monitor Docker containers with Nagios.
