Saturday, January 14, 2012

NAGIOS REFERENCE

Nagios 3.0 Jumpstart Guide For Linux – Overview, Installation and Configuration
Let us discuss the overview, installation and configuration of Nagios, a powerful open source monitoring solution for hosts and services.
 I. Overview of nagios
II. 8 steps for installing nagios on Linux:
  1. Download the nagios and plugins
  2. Take care of the prerequisites
  3. Create user and group for nagios
  4. Install nagios
  5. Configure the web interface
  6. Compile and install nagios plugins
  7. Start Nagios
  8. Login to web interface
III. Configuration files overview

I. Overview of Nagios

Nagios is a host and service monitoring tool. Following are some of the features of nagios.

  • Monitor equipment such as servers, switches, routers, firewalls, power supplies etc.
  • Monitor services such as disk space, CPU usage, memory usage, temperature of the equipment, HTTP, Mail, SSH etc.
  • Nagios can monitor pretty much anything, e.g. hosts, services, databases, applications etc.
  • Nagios has an extensible plugin interface for monitoring user-defined services. There are a lot of plugins available for Nagios. Visit NagiosPlugins and NagiosExchange to review the available user-developed plugins.
  • It can send out various notifications (email, pager etc.) when a problem occurs and when it gets resolved.
  • Web interface to view current status, notifications, problem history, log files etc.
Following is a partial screenshot of the nagios web dashboard:

Fig: Nagios Web UI

II. 8 steps for installing nagios on Linux:

1. Download the nagios and plugins

Download following files from Nagios.org and move to /home/downloads



  • nagios-3.0.1.tar.gz
  • nagios-plugins-1.4.11.tar.gz

2. Take care of the prerequisites

  • Make sure apache is working on the server by verifying from browser: http://localhost
  • Verify whether gcc is installed
[root@localhost]#rpm -qa | grep gcc
      gcc-3.4.6-8
      compat-gcc-32-3.2.3-47.3
      libgcc-3.4.6-8
      compat-libgcc-296-2.96-132.7.2
      compat-gcc-32-c++-3.2.3-47.3
      gcc-c++-3.4.6-8
  • Verify whether GD is installed
[root@localhost]# rpm -qa gd
      gd-2.0.28-5.4E

3. Create user and group for nagios

[root@localhost]# useradd nagios
[root@localhost]# passwd nagios
[root@localhost]# groupadd nagcmd
[root@localhost]# usermod -a -G nagcmd nagios
[root@localhost]# usermod -a -G nagcmd apache

4. Install nagios

[root@localhost]# tar xvf nagios-3.0.1.tar.gz
[root@localhost]# cd nagios-3.0.1
[root@localhost]# ./configure --with-command-group=nagcmd
[root@localhost]# make all
[root@localhost]# make install
[root@localhost]# make install-init
[root@localhost]# make install-config
[root@localhost]# make install-commandmode
Following are some additional parameters that you can pass to ./configure to customize your installation. I used only --with-command-group as shown above.
  Option                Example value     Description
  --prefix              /opt/nagios       Where to put the Nagios files
  --with-cgiurl         /nagios/cgi-bin   Web server URL where the CGIs will be available
  --with-htmurl         /nagios           Web server URL where Nagios will be available
  --with-nagios-user    nagios            User account under which Nagios will run
  --with-nagios-group   nagios            Group account under which Nagios will run
  --with-command-group  nagcmd            Group account that allows the apache user to submit commands to Nagios
At the end of the configure output, it will display a summary as shown below:
*** Configuration summary for nagios 3.0.1 05-28-2008 ***:

General Options:
-------------------------
Nagios executable:  nagios
Nagios user/group:  nagios,nagios
Command user/group:  nagios,nagcmd
Embedded Perl:  no
Event Broker:  yes
Install ${prefix}:  /usr/local/nagios
Lock file:  ${prefix}/var/nagios.lock
Check result directory:  ${prefix}/var/spool/checkresults
Init directory:  /etc/rc.d/init.d
Apache conf.d directory:  /etc/httpd/conf.d
Mail program:  /bin/mail
Host OS:  linux-gnu

Web Interface Options:
------------------------
HTML URL:  http://localhost/nagios/
CGI URL:  http://localhost/nagios/cgi-bin/
Traceroute (used by WAP):  /bin/traceroute

5. Configure the web interface.

[root@localhost]# make install-webconf
[root@localhost]# htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
New password:
Re-type new password:
Adding password for user nagiosadmin

6. Compile and install nagios plugins

[root@localhost]# tar xvf nagios-plugins-1.4.11.tar.gz
[root@localhost]# cd nagios-plugins-1.4.11
[root@localhost]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
[root@localhost]# make
[root@localhost]# make install
Note: On Red Hat, the ./configure command mentioned above did not work for me and hung while displaying the message: checking for redhat spopen problem… Add --enable-redhat-pthread-workaround to the ./configure command as a workaround, as shown below.
[root@localhost]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios --enable-redhat-pthread-workaround

7. Start Nagios

  • Add the nagios to the startup routine:
[root@localhost]# chkconfig --add nagios
      [root@localhost]# chkconfig nagios on
  • Verify to make sure there are no errors in the nagios configuration file:
[root@localhost]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

      Total Warnings: 0
      Total Errors:   0
      Things look okay - No serious problems were detected during the pre-flight check
  • Start the nagios
[root@localhost]# service nagios start
      Starting nagios: done.

8. Login to web interface

Nagios Web URL: http://localhost/nagios/
Use the userid (nagiosadmin) and password created in step 5 above.

III. Configuration files overview

The first configuration change to make is to replace the default email address in the /usr/local/nagios/etc/objects/contacts.cfg file with your own email address.
Following are the three major configuration files located under /usr/local/nagios/etc:

  1. nagios.cfg – This is the primary Nagios configuration file, where a lot of global parameters that control nagios can be defined.
  2. cgi.cfg – This file has configuration information for the nagios web interface.
  3. resource.cfg – If you have to pass some sensitive information (username, password etc.) to a plugin to monitor a specific service, you can define them here. This file is readable only by nagios user and group.
Following are the other configuration files under /usr/local/nagios/etc/objects directory:

  • contacts.cfg – All the contacts who need to be notified should be defined here. You can specify the name, email address, what types of notifications they should receive and the time period during which this particular contact should receive notifications.
  • commands.cfg – All the commands to check services are defined here. You can use the $HOSTNAME$ and $HOSTADDRESS$ macros in a command definition; they are substituted with the corresponding hostname or host ip-address automatically.
  • timeperiods.cfg – Define the time periods, e.g. if you want a service to be monitored only during business hours, define a time period called businesshours and specify the hours that you would like to monitor.
  • templates.cfg – Multiple host or service definitions that have similar characteristics can use a template, where all the common characteristics are defined. Using templates is a time saver.
  • localhost.cfg – Defines the monitoring for the local host. This is a sample configuration file that comes with the nagios installation, which you can use as a baseline to define other hosts that you would like to monitor.
  • printer.cfg – Sample config file for printer
  • switch.cfg – Sample config file for switch
  • windows.cfg – Sample config file for a windows machine
------------------------------------------------------------------------------------------------------------------------------------
This article explains how to monitor a remote Windows machine and the various services running on it using a Nagios monitoring server. The following three sections are covered in this article.

I. Overview
II. 4 steps to install nagios on remote windows host
  1. Install NSClient++ on the remote windows server
  2. Modify the NSClient++ Service
  3. Modify the NSC.ini
  4. Start the NSClient++ Service
III. 6 configuration steps on nagios monitoring server
  1. Verify check_nt command and windows-server template
  2. Uncomment windows.cfg in /usr/local/nagios/etc/nagios.cfg
  3. Modify /usr/local/nagios/etc/objects/windows.cfg
  4. Define windows services that should be monitored.
  5. Enable Password Protection
  6. Verify Configuration and Restart Nagios.

I. Overview

At a very high level, the following three steps happen when Nagios (installed on the nagios-server) monitors a service (e.g. disk space usage) on the remote Windows host.
  1. Nagios executes the check_nt command on the nagios-server and asks it to monitor disk usage on the remote Windows host.
  2. check_nt on the nagios-server contacts the NSClient++ service on the remote Windows host and asks it to execute USEDDISKSPACE on the remote host.
  3. The NSClient++ service returns the results of the USEDDISKSPACE command to check_nt on the nagios-server.

Following flow summarizes the above explanation:

Nagios Server (check_nt) ---> Remote host (NSClient++) ---> USEDDISKSPACE
Nagios Server (check_nt) <--- Remote host (NSClient++) <--- USEDDISKSPACE (returns disk space usage)

II. 4 steps to setup nagios on remote windows host


1. Install NSClient++ on the remote windows server

Download NSCP 0.3.1 (NSClient++-Win32-0.3.1.msi) from NSClient++ Project. NSClient++ is an open source windows service that allows performance metrics to be gathered by Nagios for windows services. Go through the following five NSClient++ installation steps to get the installation completed.

(1) NSClient++ Welcome Screen

(2) License Agreement Screen

(3) Select Installation option and location. Use the default option and click next.

Fig: NSClient++ Install Screen

(4) Ready to Install Screen.  Click on Install to get it started.

(5) Installation completed Screen.



2. Modify the NSClient++ Service

Go to Control Panel -> Administrative Tools -> Services. Double click on the “NSClientpp (Nagios) 0.3.1.14 2008-03-12 w32” service and select the check-box that says “Allow service to interact with desktop” as shown below.
Fig: NSClient++ Service Modification

3. Modify the NSC.ini

(1) Modify NSC.ini and uncomment *.dll: Edit the C:\Program Files\NSClient++\NSC.ini file and uncomment everything under [modules] except RemoteConfiguration.dll and CheckWMI.dll
[modules]
;# NSCLIENT++ MODULES
;# A list with DLLs to load at startup.
;  You will need to enable some of these for NSClient++ to work.
; ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
; *                                                               *
; * N O T I C E ! ! ! - Y O U   H A V E   T O   E D I T   T H I S *
; *                                                               *
; ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
FileLogger.dll
CheckSystem.dll
CheckDisk.dll
NSClientListener.dll
NRPEListener.dll
SysTray.dll
CheckEventLog.dll
CheckHelpers.dll
;CheckWMI.dll
;
; RemoteConfiguration IS AN EXTREM EARLY IDEA SO DONT USE FOR PRODUCTION ENVIROMNEMTS!
;RemoteConfiguration.dll
; NSCA Agent is a new beta module use with care!
NSCAAgent.dll
; LUA script module used to write your own "check deamon" (sort of) early beta.
LUAScript.dll
; Script to check external scripts and/or internal aliases, early beta.
CheckExternalScripts.dll
; Check other hosts through NRPE extreme beta and probably a bit dangerous! :)
NRPEClient.dll

(2) Modify NSC.ini and uncomment allowed_hosts. Edit the C:\Program Files\NSClient++\NSC.ini file, uncomment allowed_hosts under [Settings] and add the ip-address of the nagios-server.
;# ALLOWED HOST ADDRESSES
;  This is a comma-delimited list of IP address of hosts that are allowed to talk to the all daemons.
;  If leave this blank anyone can access the deamon remotly (NSClient still requires a valid password).
;  The syntax is host or ip/mask so 192.168.0.0/24 will allow anyone on that subnet access
allowed_hosts=192.168.1.2/255.255.255.0
Note: allowed_hosts is located under the [Settings], [NSClient] and [NRPE] sections. Make sure to change allowed_hosts under [Settings] for this purpose.

(3) Modify NSC.ini and uncomment port. Edit the C:\Program Files\NSClient++\NSC.ini file and uncomment the port# under [NSClient] section
;# NSCLIENT PORT NUMBER
;  This is the port the NSClientListener.dll will listen to.
port=12489

(4) Modify NSC.ini and specify password. You can also specify a password the nagios server needs to use to remotely access the NSClient++ agent.
[Settings]
;# OBFUSCATED PASSWORD
;  This is the same as the password option but here you can store the password in an obfuscated manner.
;  *NOTICE* obfuscation is *NOT* the same as encryption, someone with access to this file can still figure out the
;  password. Its just a bit harder to do it at first glance.
;obfuscated_password=Jw0KAUUdXlAAUwASDAAB
;
;# PASSWORD
;  This is the password (-s) that is required to access NSClient remotely. If you leave this blank everyone will be able to access the daemon remotly.
password=My2Secure$Password

4. Start the NSClient++ Service

Start the NSClient++ service either from Control Panel -> Administrative Tools -> Services (select “NSClientpp (Nagios) 0.3.1.14 2008-03-12 w32” and click Start), or from Start -> All Programs -> NSClient++ -> Start NSClient++ (Win32). Note that this starts NSClient++ as a Windows service.

If you later modify anything in the NSC.ini file, restart the “NSClientpp (Nagios) 0.3.1.14 2008-03-12 w32” service from Windows Services.

III. 6 configuration steps on nagios monitoring server


1. Verify check_nt command and windows-server template

Verify that the check_nt is enabled under /usr/local/nagios/etc/objects/commands.cfg
# 'check_nt' command definition
define command{
command_name    check_nt
command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$
}

Verify that the windows-server template is enabled under /usr/local/nagios/etc/objects/templates.cfg
# Windows host definition template - This is NOT a real host, just a template!
define host{
name                    windows-server  ; The name of this host template
use                     generic-host    ; Inherit default values from the generic-host template
check_period            24x7            ; By default, Windows servers are monitored round the clock
check_interval          5               ; Actively check the server every 5 minutes
retry_interval          1               ; Schedule host check retries at 1 minute intervals
max_check_attempts      10              ; Check each server 10 times (max)
check_command           check-host-alive        ; Default command to check if servers are "alive"
notification_period     24x7            ; Send notification out at any time - day or night
notification_interval   30              ; Resend notifications every 30 minutes
notification_options    d,r             ; Only send notifications for specific host states
contact_groups          admins          ; Notifications get sent to the admins by default
hostgroups              windows-servers ; Host groups that Windows servers should be a member of
register                0               ; DONT REGISTER THIS - ITS JUST A TEMPLATE
}

2. Uncomment windows.cfg in /usr/local/nagios/etc/nagios.cfg

# Definitions for monitoring a Windows machine
cfg_file=/usr/local/nagios/etc/objects/windows.cfg
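
If you prefer to script this change instead of editing nagios.cfg by hand, the following is a minimal sketch. The function name enable_cfg_file is my own and not part of Nagios. It assumes the line is commented out with a leading "#" and ends with the object file name. Take a backup of nagios.cfg before trying it.

```sh
#!/bin/sh
# Hypothetical helper (not shipped with Nagios): uncomment the cfg_file line
# for one object file in a nagios.cfg-style file.
# $1 = path to nagios.cfg, $2 = object file name, e.g. windows.cfg
enable_cfg_file() (
    cfg=$1 object=$2
    tmp=$(mktemp) || exit 1
    # Strip the leading "#" from the cfg_file line that ends in the file name.
    # Note: the "." in the file name is treated as a regex wildcard here.
    sed "s|^#[[:space:]]*\(cfg_file=.*/$object\)\$|\1|" "$cfg" > "$tmp" &&
        cat "$tmp" > "$cfg"   # overwrite in place, keeping owner and mode
    rm -f "$tmp"
)
```

Usage: enable_cfg_file /usr/local/nagios/etc/nagios.cfg windows.cfg, followed by grep windows.cfg /usr/local/nagios/etc/nagios.cfg to confirm the line is now active.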

3. Modify /usr/local/nagios/etc/objects/windows.cfg

By default, a sample host definition for a windows server is given under windows.cfg. Modify this to reflect the appropriate windows server that needs to be monitored through nagios.
# Define a host for the Windows machine we'll be monitoring
# Change the host_name, alias, and address to fit your situation

define host{
use             windows-server              ; Inherit default values from a template
host_name   remote-windows-host      ; The name we're giving to this host
alias            Remote Windows Host     ; A longer name associated with the host
address       192.168.1.4                   ; IP address of the remote windows host
}

4. Define windows services that should be monitored.

Following are the default windows services that are already enabled in the sample windows.cfg. Make sure to update the host_name on these services to reflect the host_name defined in the above step.
define service{
use                     generic-service
host_name               remote-windows-host
service_description     NSClient++ Version
check_command           check_nt!CLIENTVERSION
}
define service{
use                     generic-service
host_name               remote-windows-host
service_description     Uptime
check_command           check_nt!UPTIME
}
define service{
use                     generic-service
host_name               remote-windows-host
service_description     CPU Load
check_command           check_nt!CPULOAD!-l 5,80,90
}
define service{
use                     generic-service
host_name               remote-windows-host
service_description     Memory Usage
check_command           check_nt!MEMUSE!-w 80 -c 90
}
define service{
use                     generic-service
host_name               remote-windows-host
service_description     C:\ Drive Space
check_command           check_nt!USEDDISKSPACE!-l c -w 80 -c 90
}
define service{
use                     generic-service
host_name               remote-windows-host
service_description     W3SVC
check_command           check_nt!SERVICESTATE!-d SHOWALL -l W3SVC
}
define service{
use                     generic-service
host_name               remote-windows-host
service_description     Explorer
check_command           check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe
}
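
To see how a check_command value such as check_nt!USEDDISKSPACE!-l c -w 80 -c 90 turns into an actual command line, here is a rough sketch that mimics the macro substitution. It is only an illustration, not how Nagios is implemented. The function name expand_check_nt is hypothetical, and I am assuming $USER1$ points to the default plugin directory /usr/local/nagios/libexec.

```sh
#!/bin/sh
# Illustrative only: mimics how Nagios fills in $USER1$, $HOSTADDRESS$,
# $ARG1$ and $ARG2$ for the check_nt command definition shown earlier.
# It does not handle arguments that contain "|", "&" or "\".
expand_check_nt() {
    # $1 = check_command value, $2 = host address
    # Split "check_nt!ARG1!ARG2" on the "!" separators.
    IFS='!' read -r cmd_name arg1 arg2 <<EOF
$1
EOF
    # Assumption: $USER1$ is the default plugin directory from resource.cfg.
    printf '%s\n' '$USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$' |
        sed -e 's|[$]USER1[$]|/usr/local/nagios/libexec|' \
            -e "s|[\$]HOSTADDRESS[\$]|$2|" \
            -e "s|[\$]ARG1[\$]|$arg1|" \
            -e "s|[\$]ARG2[\$]|$arg2|"
}

expand_check_nt 'check_nt!USEDDISKSPACE!-l c -w 80 -c 90' 192.168.1.4
```

For the C:\ Drive Space service above, this prints /usr/local/nagios/libexec/check_nt -H 192.168.1.4 -p 12489 -v USEDDISKSPACE -l c -w 80 -c 90, which you can run by hand on the nagios-server to test the check.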

5. Enable Password Protection

If you specified a password in the NSC.ini file of the NSClient++ configuration on the Windows machine, you’ll need to modify the check_nt command definition to include the password. Modify the /usr/local/nagios/etc/objects/commands.cfg file and add the password with -s as shown below. Because Nagios treats $ as a macro delimiter, a literal $ in the password must be written as $$, and the password is single-quoted so that the shell does not expand it.
define command{
command_name check_nt
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -s 'My2Secure$$Password' -v $ARG1$ $ARG2$
}

6. Verify Configuration and Restart Nagios.

Verify the nagios configuration files as shown below.
[nagios-server]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check

Restart nagios as shown below.
[nagios-server]# /etc/rc.d/init.d/nagios stop
Stopping nagios: .done.

[nagios-server]# /etc/rc.d/init.d/nagios start
Starting nagios: done.
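
The verify and restart steps can be combined so that Nagios is never restarted with a broken configuration. The following is a minimal sketch of that idea. The function name safe_restart and the NAGIOS_* variables are my own, the default paths are the ones used in this article, and I am assuming that nagios -v exits with a non-zero status when the pre-flight check finds errors.

```sh
#!/bin/sh
# Illustrative variables; the defaults are the paths used in this article.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}
NAGIOS_INIT=${NAGIOS_INIT:-/etc/rc.d/init.d/nagios}

# Hypothetical helper: restart Nagios only if the pre-flight check passes.
safe_restart() {
    # Assumption: a non-zero exit status from "nagios -v" means errors were found.
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null; then
        "$NAGIOS_INIT" stop && "$NAGIOS_INIT" start
    else
        echo "Configuration check failed; Nagios was not restarted." >&2
        return 1
    fi
}
```

Run safe_restart as root after changing any configuration file. If it reports a failure, run the nagios -v command by itself to see the full error output.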

Verify the status of the various services running on the remote windows host from the Nagios web UI (http://nagios-server/nagios) as shown below.
Fig: Nagios Web UI - Remote Windows Host Status

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

How To Monitor Remote Linux Host using Nagios 3.0
I. Overview
II. 6 steps to install Nagios plugin and NRPE on remote host.
  1. Download Nagios Plugins and NRPE Add-on
  2. Create nagios account
  3. Install Nagios Plugins
  4. Install NRPE
  5. Setup NRPE to run as daemon
  6. Modify the /usr/local/nagios/etc/nrpe.cfg
III. 4 Configuration steps on the Nagios monitoring server to monitor remote host:
  1. Download NRPE Add-on
  2. Install check_nrpe
  3. Create host and service definition for remote host
  4. Restart the nagios service

I. Overview:

At a very high level, the following three steps happen when Nagios (installed on the nagios-server) monitors a service (e.g. disk space usage) on the remote Linux host.



  1. Nagios executes the check_nrpe command on the nagios-server and asks it to monitor disk usage on the remote host using the check_disk command.
  2. check_nrpe on the nagios-server contacts the NRPE daemon on the remote host and asks it to execute check_disk on the remote host.
  3. The NRPE daemon returns the results of the check_disk command to check_nrpe on the nagios-server.

Following flow summarizes the above explanation:

Nagios Server (check_nrpe) ---> Remote host (NRPE daemon) ---> check_disk
Nagios Server (check_nrpe) <--- Remote host (NRPE daemon) <--- check_disk (returns disk space usage)

II. 6 steps to install Nagios Plugins and NRPE on the remote host


1. Download Nagios Plugins and NRPE Add-on

Download following files from Nagios.org and move to /home/downloads:
  • nagios-plugins-1.4.11.tar.gz
  • nrpe-2.12.tar.gz

2. Create nagios account

[remotehost]# useradd nagios
[remotehost]# passwd nagios

3. Install nagios-plugin

[remotehost]# cd /home/downloads
[remotehost]# tar xvfz nagios-plugins-1.4.11.tar.gz
[remotehost]# cd nagios-plugins-1.4.11
[remotehost]# export LDFLAGS=-ldl

[remotehost]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios --enable-redhat-pthread-workaround
[remotehost]# make
[remotehost]# make install

[remotehost]# chown nagios:nagios /usr/local/nagios
[remotehost]# chown -R nagios:nagios /usr/local/nagios/libexec/

Note: On Red Hat, the ./configure command hung for me with the message: “checking for redhat spopen problem…”. Adding --enable-redhat-pthread-workaround to the ./configure command, as shown above, works around this problem.

4. Install NRPE

[remotehost]# cd /home/downloads
[remotehost]# tar xvfz nrpe-2.12.tar.gz
[remotehost]# cd nrpe-2.12

[remotehost]# ./configure
[remotehost]# make all
[remotehost]# make install-plugin
[remotehost]# make install-daemon
[remotehost]# make install-daemon-config
[remotehost]# make install-xinetd

5. Setup NRPE to run as daemon (i.e as part of xinetd):

  • Modify /etc/xinetd.d/nrpe to add the ip-address of the Nagios monitoring server to the only_from directive. Note that 127.0.0.1 and the nagios monitoring server ip-address are separated by a space (in this example, the nagios monitoring server ip-address is 192.168.1.2).
only_from       = 127.0.0.1 192.168.1.2
  • Modify the /etc/services and add the following at the end of the file.
nrpe 5666/tcp # NRPE
  • Start the service
[remotehost]# service xinetd restart
  • Verify whether NRPE is listening
[remotehost]# netstat -at | grep nrpe
       tcp 0      0 *:nrpe *:*                         LISTEN
  • Verify to make sure the NRPE is functioning properly
[remotehost]# /usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.12

6. Modify the /usr/local/nagios/etc/nrpe.cfg

The nrpe.cfg file located on the remote host contains the commands that are needed to check the services on the remote host. By default, nrpe.cfg comes with a few standard check commands as samples. check_users and check_load are shown below as examples.
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
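
To make the mapping between a command name and its command line concrete, here is a small sketch that looks up a command[...] entry in an nrpe.cfg-style file and runs it locally, roughly the way the NRPE daemon does when check_nrpe asks for it. The function names nrpe_lookup and nrpe_run_local are hypothetical helpers of mine, not NRPE tools, and the "not defined" message is only illustrative.

```sh
#!/bin/sh
# Hypothetical helpers (not part of NRPE).

# Print the command line configured for a command name.
# $1 = nrpe.cfg-style file, $2 = command name such as check_users
nrpe_lookup() {
    sed -n "s|^command\[$2]=||p" "$1" | head -n 1
}

# Run the configured command locally. The plugin's output and exit status
# are what the NRPE daemon would send back to check_nrpe.
nrpe_run_local() {
    cmd_line=$(nrpe_lookup "$1" "$2")
    if [ -z "$cmd_line" ]; then
        echo "NRPE: Command '$2' not defined"
        return 3    # 3 = UNKNOWN in the plugin exit code convention
    fi
    sh -c "$cmd_line"
}
```

Usage on the remote host: nrpe_run_local /usr/local/nagios/etc/nrpe.cfg check_users, then echo $? to see the exit status that Nagios would receive.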

In all the check commands, “-w” stands for “Warning” and “-c” stands for “Critical”. For example, in the check_disk command below, if the available disk space drops to 20% or less, nagios will send a warning message. If it drops to 10% or less, nagios will send a critical message. Change the values of the “-c” and “-w” parameters below depending on your environment.
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1

Note: You can execute any of the commands shown in the nrpe.cfg on the command line on remote host and see the results for yourself. For e.g. When I executed the check_disk command on the command line, it displayed the following:
[remotehost]#/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
DISK CRITICAL - free space: / 6420 MB (10% inode=98%);| /=55032MB;51792;58266;0;64741

In the above example, since the free disk space on /dev/hda1 is only 10% , it is displaying the CRITICAL message, which will be returned to nagios server.
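
The warning and critical thresholds map to the plugin exit codes that Nagios understands: 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. The sketch below shows that logic for a free-space percentage. It follows the "or less" description given above and is not taken from the check_disk source; the function name classify_free_space is made up for this example.

```sh
#!/bin/sh
# Hypothetical helper, not part of the Nagios plugins.
# $1 = free space in percent, $2 = warning threshold, $3 = critical threshold
# Exit codes follow the plugin convention: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
classify_free_space() {
    free_pct=$1 warn_pct=$2 crit_pct=$3
    case $free_pct in
        # Reject empty or non-numeric input.
        ''|*[!0-9]*) echo "DISK UNKNOWN - invalid value: $free_pct"; return 3 ;;
    esac
    if [ "$free_pct" -le "$crit_pct" ]; then
        echo "DISK CRITICAL - free space: ${free_pct}%"; return 2
    elif [ "$free_pct" -le "$warn_pct" ]; then
        echo "DISK WARNING - free space: ${free_pct}%"; return 1
    fi
    echo "DISK OK - free space: ${free_pct}%"; return 0
}
```

For example, classify_free_space 10 20 10 prints DISK CRITICAL - free space: 10% and returns 2, which matches the 10% case shown above.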

III. 4 Configuration steps on the Nagios monitoring server to monitor remote host:


1. Download NRPE Add-on

Download nrpe-2.12.tar.gz from Nagios.org and move it to /home/downloads.

2. Install check_nrpe on the nagios monitoring server

[nagios-server]# tar xvfz nrpe-2.12.tar.gz
[nagios-server]# cd nrpe-2.12
[nagios-server]# ./configure
[nagios-server]# make all
[nagios-server]# make install-plugin

./configure will give a configuration summary as shown below:

*** Configuration summary for nrpe 2.12 05-31-2008 ***:

General Options:
-------------------------
NRPE port: 5666
NRPE user: nagios
NRPE group: nagios
Nagios user: nagios
Nagios group: nagios

Note: I got the “checking for SSL headers… configure: error: Cannot find ssl headers” error message while performing ./configure. Install openssl-devel as shown below and run the ./configure again to fix the problem.
[nagios-server]# rpm -ivh openssl-devel-0.9.7a-43.16.i386.rpm krb5-devel-1.3.4-47.i386.rpm zlib-devel-1.2.1.2-1.2.i386.rpm e2fsprogs-devel-1.35-12.5.el4.i386.rpm
warning: openssl-devel-0.9.7a-43.16.i386.rpm: V3 DSA signature: NOKEY, key ID db42a60e
Preparing… ########################################### [100%]
1:e2fsprogs-devel ########################################### [ 25%]
2:krb5-devel ########################################### [ 50%]
3:zlib-devel ########################################### [ 75%]
4:openssl-devel ########################################### [100%]
Verify whether nagios monitoring server can talk to the remotehost.
[nagios-server]#/usr/local/nagios/libexec/check_nrpe -H 192.168.1.3
NRPE v2.12

Note: 192.168.1.3 is the ip-address of the remotehost where NRPE and the nagios plugins were installed as explained in Section II above.

3. Create host and service definition for remotehost

Create a new configuration file /usr/local/nagios/etc/objects/remotehost.cfg to define the host and service definition for this particular remotehost. It is good to take the localhost.cfg and copy it as remotehost.cfg and start modifying it according to your needs.
Host definition sample:
define host{
use linux-server
host_name remotehost
alias Remote Host
address 192.168.1.3
contact_groups admins
}

Service definition sample:
define service{
use generic-service
host_name remotehost
service_description Root Partition
contact_groups admins
check_command check_nrpe!check_disk
}
Note: In all the above examples, replace remotehost with the corresponding hostname of your remotehost.
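
The check_nrpe!check_disk service above only works if a check_nrpe command is defined on the nagios-server and if nagios.cfg contains a cfg_file line for remotehost.cfg. As far as I know, the sample commands.cfg does not include check_nrpe, so below is a sketch that prints a typical definition for you to review. The function name print_check_nrpe_command is my own; -H and -c are the same check_nrpe options used in the manual tests earlier.

```sh
#!/bin/sh
# Hypothetical helper: prints a check_nrpe command definition.
# $ARG1$ receives the NRPE command name, e.g. check_disk.
print_check_nrpe_command() {
    cat <<'EOF'
define command{
command_name    check_nrpe
command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
EOF
}

print_check_nrpe_command
```

After reviewing the output, append it to /usr/local/nagios/etc/objects/commands.cfg if no check_nrpe definition exists there yet, and add cfg_file=/usr/local/nagios/etc/objects/remotehost.cfg to nagios.cfg.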

4. Restart the nagios service

Restart (reload) nagios as shown below and log in to the nagios web UI (http://nagios-server/nagios/) to verify the status of the remotehost linux server that was added to nagios for monitoring.
[nagios-server]# service nagios reload
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

How To Monitor Network Switch and Ports Using Nagios

Nagios is hands-down the best tool for monitoring hosts and network equipment. Using Nagios plugins you can monitor pretty much anything.

I use Nagios intensively and it gives me peace of mind knowing that I will get an alert on my phone, when there is a problem. More than that, if warning levels are setup properly, Nagios will proactively alert you before a problem becomes critical.
In this article, I’ll explain how to configure Nagios to monitor a network switch and its active ports.

1. Enable switch.cfg in nagios.cfg

Uncomment the switch.cfg line in /usr/local/nagios/etc/nagios.cfg as shown below.
[nagios-server]# grep switch.cfg /usr/local/nagios/etc/nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/switch.cfg

2. Add new hostgroup for switches in switch.cfg

Add the following switches hostgroup to the /usr/local/nagios/etc/objects/switch.cfg file.
define hostgroup{
hostgroup_name  switches
alias           Network Switches
}

3. Add a new host for the switch to be monitored

In this example, I’ve defined a host to monitor the core switch in the /usr/local/nagios/etc/objects/switch.cfg file. Change the address directive to your switch ip-address accordingly.
define host{
use             generic-switch
host_name       core-switch
alias           Cisco Core Switch
address         192.168.1.50
hostgroups      switches
}

4. Add common services for all switches

Displaying the uptime of the switch and verifying whether switch is alive are common services for all switches. So, define these services under the switches hostgroup_name as shown below.
# Service definition to ping the switch using check_ping
define service{
use                     generic-service
hostgroup_name          switches
service_description     PING
check_command           check_ping!200.0,20%!600.0,60%
normal_check_interval   5
retry_check_interval    1
}

# Service definition to monitor switch uptime using check_snmp
define service{
use                     generic-service
hostgroup_name          switches
service_description     Uptime
check_command           check_snmp!-C public -o sysUpTime.0
}

5. Add service to monitor port bandwidth usage

check_local_mrtgtraf uses the Multi Router Traffic Grapher (MRTG). So, you need to install MRTG for this to work properly. The *.log file mentioned below should point to the MRTG log file on your system.
define service{
use           generic-service
host_name   core-switch
service_description Port 1 Bandwidth Usage
check_command  check_local_mrtgtraf!/var/lib/mrtg/192.168.1.11_1.log!AVG!1000000,2000000!5000000,5000000!10
}

6. Add service to monitor an active switch port

Use check_snmp to monitor a specific port as shown below. The following two services monitor port#1 and port#5. To add additional ports, change the value ifOperStatus.n accordingly, where n is the port#.



# Monitor status of port number 1 on the Cisco core switch
define service{
use                  generic-service
host_name            core-switch
service_description  Port 1 Link Status
check_command        check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB
}

# Monitor status of port number 5 on the Cisco core switch
define service{
use                  generic-service
host_name            core-switch
service_description  Port 5 Link Status
check_command        check_snmp!-C public -o ifOperStatus.5 -r 1 -m RFC1213-MIB
}
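
In these services, -r 1 tells check_snmp to report OK only when the returned value matches 1. In RFC1213-MIB, ifOperStatus is 1 when the interface is up, 2 when it is down and 3 when it is in testing mode. The sketch below spells out that mapping. The function name port_link_status is hypothetical and the function does not query SNMP itself.

```sh
#!/bin/sh
# Hypothetical helper: maps an ifOperStatus value to a Nagios-style result.
# In RFC1213-MIB, ifOperStatus is 1 (up), 2 (down) or 3 (testing).
# $1 = ifOperStatus value as returned by an SNMP query
port_link_status() {
    case $1 in
        1) echo "OK - port is up (ifOperStatus=1)"; return 0 ;;
        2) echo "CRITICAL - port is down (ifOperStatus=2)"; return 2 ;;
        3) echo "CRITICAL - port is in testing mode (ifOperStatus=3)"; return 2 ;;
        *) echo "UNKNOWN - unexpected ifOperStatus value: $1"; return 3 ;;
    esac
}
```

This explains why an unused port, which reports 2, shows up as CRITICAL in Nagios. Define these services only for ports that are expected to be active.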

7. Add services to monitor multiple switch ports together

Sometimes you may need to monitor the status of multiple ports together, i.e. Nagios should send you an alert even if only one of the ports is down. In this case, define the following service to monitor multiple ports.
# Monitor ports 1 - 6 on the Cisco core switch.
define service{
use                   generic-service
host_name             core-switch
service_description   Ports 1-6 Link Status
check_command         check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB, -o ifOperStatus.2 -r 1 -m RFC1213-MIB, -o ifOperStatus.3 -r 1 -m RFC1213-MIB, -o ifOperStatus.4 -r 1 -m RFC1213-MIB, -o ifOperStatus.5 -r 1 -m RFC1213-MIB, -o ifOperStatus.6 -r 1 -m RFC1213-MIB
}

8. Validate configuration and restart nagios

Verify the nagios configuration to make sure there are no warnings and errors.
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Total Warnings: 0
Total Errors:   0
Things look okay - No serious problems were detected during the pre-flight check
Restart the nagios server to start monitoring the switch.
# /etc/rc.d/init.d/nagios stop
Stopping nagios: .done.

# /etc/rc.d/init.d/nagios start
Starting nagios: done.
Verify the status of the switch from the Nagios web UI: http://{nagios-server}/nagios as shown below:
[Nagios GUI for Network Switch]
Fig: Nagios GUI displaying status of a Network Switch

9. Troubleshooting

Issue1: Nagios GUI displays “check_mrtgtraf: Unable to open MRTG log file” error message for the Port bandwidth usage
Solution1: Make sure the *.log file defined in the check_local_mrtgtraf service is pointing to the correct location.

Issue2: Nagios UI displays “Return code of 127 is out of bounds – plugin may be missing” error message for Port Link Status.
Solution2: Make sure both the net-snmp and net-snmp-utils packages are installed. In my case, the net-snmp-utils package was missing, and installing it resolved the issue as shown below.
[nagios-server]# rpm -qa | grep net-snmp
net-snmp-libs-5.1.2-11.el4_6.11.2
net-snmp-5.1.2-11.el4_6.11.2

[nagios-server]# rpm -ivh net-snmp-utils-5.1.2-11.EL4.10.i386.rpm
Preparing...       ########################################### [100%]
1:net-snmp-utils   ########################################### [100%]

[nagios-server]# rpm -qa | grep net-snmp
net-snmp-libs-5.1.2-11.el4_6.11.2
net-snmp-5.1.2-11.el4_6.11.2
net-snmp-utils-5.1.2-11.EL4.10
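The "out of bounds" wording makes more sense once you recall that Nagios plugins report their state through the exit code: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. Anything else is outside that range, and 127 is the conventional shell exit status for "command not found", which fits a missing snmpget binary. The Python sketch below is only an illustration of that mapping; the function name is hypothetical.

```python
PLUGIN_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def describe_return_code(code):
    """Translate a plugin exit code into a human-readable description."""
    if code in PLUGIN_STATES:
        return PLUGIN_STATES[code]
    if code == 127:
        return "out of bounds (127: the shell could not find a command)"
    return f"out of bounds ({code})"


for code in (0, 2, 127):
    print(code, describe_return_code(code))
```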
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
4 Steps to Define Nagios Contacts With Email and Pager Notification
Nagios is one of the best open source server and network monitoring solutions available.  Using the flexible nagios framework, you can monitor pretty much anything (including databases and custom applications). This article explains, in 4 simple steps, how to set up contact definitions for the people who will be notified when a host or service has an issue.

1. Define Generic Contact Template in templates.cfg

The Nagios installation provides a default generic contact template that you can use as a reference to build your contacts. Please note that all the directives in the generic-contact template below are mandatory. So, if you decide not to use the generic-contact template in your contacts, you must define all of these directives inside each contact yourself.  The following generic-contact is already available under /usr/local/nagios/etc/objects/templates.cfg, and templates.cfg is included in nagios.cfg by default as shown below.  Any directive from templates.cfg can be overridden when you define a real contact that uses the generic-contact template.
# grep templates /usr/local/nagios/etc/nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg

Note: generic-contact is available under
      /usr/local/nagios/etc/objects/templates.cfg

define contact{
        name                            generic-contact
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r,f,s
        host_notification_options       d,u,r,f,s
        service_notification_commands   notify-service-by-email
        host_notification_commands      notify-host-by-email
        register                        0
        }
  • name – This defines the name of the contact template (generic-contact).
  • service_notification_period – This defines when nagios can send notifications about service issues (for example, Apache down). By default this is the 24x7 timeperiod, which is defined under /usr/local/nagios/etc/objects/timeperiods.cfg
  • host_notification_period – This defines when nagios can send notifications about host issues (for example, server crashed). By default, this is the 24x7 timeperiod.
  • service_notification_options – This defines the types of service notifications that can be sent out. By default this covers all possible service states, including flapping events and scheduled service downtime.
  • host_notification_options – This defines the types of host notifications that can be sent out. By default this covers all possible host states, including flapping events and scheduled host downtime.
  • service_notification_commands – By default this defines that the contact should get notifications about service issues (for example, database down) via email. You can also define additional commands and add them to this directive. For example, you can define your own notify-service-by-sms command.
  • host_notification_commands – By default this defines that the contact should get notifications about host issues (for example, host down) via email. You can also define additional commands and add them to this directive. For example, you can define your own notify-host-by-sms command.
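The single-letter values in the two *_notification_options directives are easier to read once expanded. The following Python sketch decodes them; the table contents follow the Nagios object definition documentation, while the names SERVICE_OPTIONS, HOST_OPTIONS and decode_options are made up for this example.

```python
SERVICE_OPTIONS = {
    "w": "warning",
    "u": "unknown",
    "c": "critical",
    "r": "recovery",
    "f": "flapping",
    "s": "scheduled downtime",
}

HOST_OPTIONS = {
    "d": "down",
    "u": "unreachable",
    "r": "recovery",
    "f": "flapping",
    "s": "scheduled downtime",
}


def decode_options(value, table):
    """Expand a comma-separated option string such as 'w,u,c,r,f,s'."""
    return [table[flag.strip()] for flag in value.split(",")]


print(decode_options("w,u,c,r,f,s", SERVICE_OPTIONS))
print(decode_options("d,u,r,f,s", HOST_OPTIONS))
```

Note that the letter u means "unknown" for services but "unreachable" for hosts, which is why two separate tables are used.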

2. Define Individual Contacts in contacts.cfg

Once you’ve confirmed that the generic-contact template is defined properly, you can start defining individual contacts for all the people in your organization who might ever receive notifications from nagios. Please note that defining a contact doesn’t by itself mean that they’ll get notifications. You have to associate the contact with a service or host definition, as shown in the later sections below. So, feel free to define all possible contacts here (for example, Developers, DBAs, Sysadmins, IT-Manager, Customer Service Manager, Top Management etc.).
Note: Define these contacts in /usr/local/nagios/etc/objects/contacts.cfg
define contact{
        contact_name                    sgupta
        use                             generic-contact
        alias                           Sanjay Gupta (Developer)
        email                           sgupta@sureshkumarpakalapati.in
        pager                           333-333@pager.sureshkumarpakalapati.in

        }
define contact{
        contact_name                    jbourne
        use                             generic-contact
        alias                           Jason Bourne (Sysadmin)
        email                           jbourne@sureshkumarpakalapati.in

        }
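The effect of the use directive can be pictured as a dictionary merge: the contact starts with everything from the template, and any directive it defines itself wins. The Python sketch below models only that idea and is not how Nagios resolves templates internally. The function name resolve_contact is hypothetical, and the service_notification_options override on sgupta is an assumption added purely to show an override; it is not part of the definition above.

```python
GENERIC_CONTACT = {
    "service_notification_period": "24x7",
    "host_notification_period": "24x7",
    "service_notification_options": "w,u,c,r,f,s",
    "host_notification_options": "d,u,r,f,s",
    "service_notification_commands": "notify-service-by-email",
    "host_notification_commands": "notify-host-by-email",
}


def resolve_contact(contact, template):
    """Start from the template, then let the contact's own directives win."""
    resolved = dict(template)
    resolved.update({key: value for key, value in contact.items() if key != "use"})
    return resolved


sgupta = {
    "contact_name": "sgupta",
    "use": "generic-contact",
    "alias": "Sanjay Gupta (Developer)",
    "email": "sgupta@sureshkumarpakalapati.in",
    # Hypothetical override, not in the article: critical and recovery only.
    "service_notification_options": "c,r",
}

resolved = resolve_contact(sgupta, GENERIC_CONTACT)
print(resolved["service_notification_options"])
print(resolved["host_notification_period"])
```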

3. Define Contact Groups with Multiple Contacts in contacts.cfg

Once you’ve defined the individual contacts, you can group them together to send the appropriate notifications. For example, only DBAs need to be notified when the database service is down, so a db-admins group may be required. Similarly, maybe only Unix system administrators need to be notified when Apache goes down, so a unix-admins group may be required. Feel free to define as many groups as you think are required. Later you can use these groups in the individual service and host definitions.
Note: Define contact groups in /usr/local/nagios/etc/objects/contacts.cfg

define contactgroup{
contactgroup_name          db-admins
alias                      Database Administrators
members                    jsmith, jdoe, mraj
}

define contactgroup{
contactgroup_name          unix-admins
alias                      Linux System Administrator
members                    jbourne, dpatel, mshankar
}

4. Attach Contact Groups or Individual Contacts to Service and Host Definitions

Once you’ve defined the individual contacts and contact groups, it is time to start attaching them to a specific host or service definition as shown below.  
Note: The following host is defined under
     /usr/local/nagios/etc/objects/servers/email-server.cfg.
     This can be any host definition file.

define host{
use                     linux-server
host_name               email-server
alias                   Corporate Email Server
address                 192.168.1.14
contact_groups          unix-admins
}

Note: The following services are defined under
      /usr/local/nagios/etc/objects/servers/db-server.cfg.
      This can be any host definition file.

define service{
use                             generic-service
host_name                       prod-db
service_description             CPU Load
contact_groups                  unix-admins
check_command                   check_nrpe!check_load
}

define service{
use                             generic-service
host_name                       prod-db
service_description             MySQL Database Status
contact_groups                  db-admins
check_command                   check_mysql_db
}
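To see who ends up being notified for a given service, follow the chain service → contact_groups → members. The Python sketch below walks that chain using the group and service names from this article. It is a simplified model: it ignores notification periods and options, and the function name contacts_for is hypothetical.

```python
CONTACT_GROUPS = {
    "db-admins": ["jsmith", "jdoe", "mraj"],
    "unix-admins": ["jbourne", "dpatel", "mshankar"],
}

# (host_name, service_description) -> contact_groups
SERVICES = {
    ("prod-db", "CPU Load"): ["unix-admins"],
    ("prod-db", "MySQL Database Status"): ["db-admins"],
}


def contacts_for(host, description):
    """Return the contacts attached to a service via its contact groups."""
    members = []
    for group in SERVICES[(host, description)]:
        for name in CONTACT_GROUPS[group]:
            if name not in members:
                members.append(name)
    return members


print(contacts_for("prod-db", "CPU Load"))
print(contacts_for("prod-db", "MySQL Database Status"))
```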
 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
How To Monitor VPN Active Sessions and Temperature Using Nagios
 
Previously we discussed how to use Nagios to monitor Linux and Windows servers. In this article, let us review how to monitor the active sessions and temperature of a VPN device using Nagios. You can monitor pretty much anything about a piece of hardware using the nagios check_snmp plug-in.

1. Identify a cfg file to define host, hostgroup and services for VPN device

You can either create a new vpn.cfg file or re-use one of the existing .cfg files. In this article, I’ve added the VPN service and hostgroup definitions to an existing switch.cfg file. Make sure the switch.cfg line in the nagios.cfg file is not commented out, as shown below.
# grep switch.cfg /usr/local/nagios/etc/nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/switch.cfg

2. Add new hostgroup for VPN device in switch.cfg

Add the following ciscovpn hostgroup to the /usr/local/nagios/etc/objects/switch.cfg file.
define hostgroup{
hostgroup_name  ciscovpn
alias           Cisco VPN Concentrator
}

3. Add new host for VPN device in switch.cfg

In this example, I’ve defined two hosts–one for primary and another for secondary Cisco VPN concentrator in the /usr/local/nagios/etc/objects/switch.cfg file. Change the address directive to your VPN device ip-address accordingly.
define host{
use                     generic-host
host_name               cisco-vpn-primary
alias                   Cisco VPN Concentrator Primary
address                 192.168.1.7
check_command           check-host-alive
max_check_attempts      10
notification_interval   120
notification_period     24x7
notification_options    d,r
contact_groups          admins
hostgroups              ciscovpn
}

define host{
use                     generic-host
host_name               cisco-vpn-secondary
alias                   Cisco VPN Concentrator Secondary
address                 192.168.1.9
check_command           check-host-alive
max_check_attempts      10
notification_interval   120
notification_period     24x7
notification_options    d,r
contact_groups          admins
hostgroups              ciscovpn
}

4. Add new services to monitor VPN active sessions and temperature in switch.cfg

Add the “Temperature” service and “Active VPN Sessions” service to the /usr/local/nagios/etc/objects/switch.cfg file.
define service{
use                             generic-service
hostgroup_name                  ciscovpn
service_description             Temperature
is_volatile                     0
check_period                    24x7
max_check_attempts              4
normal_check_interval           10
retry_check_interval            2
contact_groups                  admins
notification_interval           960
notification_period             24x7
check_command                   check_snmp!-l Temperature -o .1.3.6.1.4.1.3076.2.1.2.22.1.29.0,.1.3.6.1.4.1.3076.2.1.2.22.1.33.0 -w 37,:40 -c :40,:45
}

define service{
use                             generic-service
hostgroup_name                  ciscovpn
service_description             Active VPN Sessions
is_volatile                     0
check_period                    24x7
max_check_attempts              4
normal_check_interval           5
retry_check_interval            1
contact_groups                  admins
notification_interval           960
notification_period             24x7
check_command                   check_snmp!-l ActiveSessions -o 1.3.6.1.4.1.3076.2.1.2.17.1.7.0,1.3.6.1.4.1.3076.2.1.2.17.1.9.0 -w :70,:8 -c :75,:10
}

5. Validate the check_snmp from command line

The check_snmp plug-in uses the ‘snmpget’ command from the NET-SNMP package. Make sure net-snmp is installed on your system as shown below. If not, download it from the NET-SNMP website.
# rpm -qa | grep -i net-snmp
net-snmp-libs-5.1.2-11.el4_6.11.2
net-snmp-5.1.2-11.el4_6.11.2
net-snmp-utils-5.1.2-11.EL4.10
Make sure the check_snmp works from command line as shown below.
# /usr/local/nagios/libexec/check_snmp -H 192.168.1.7 \
-P 2c -l Temperature -w :35,:40 -c :40,:45 \
-o .1.3.6.1.4.1.3076.2.1.2.22.1.29.0,.1.3.6.1.4.1.3076.2.1.2.22.1.33.0

Temperature OK - 35 38 | iso.3.6.1.4.1.3076.2.1.2.22.1.29.0=35
                         iso.3.6.1.4.1.3076.2.1.2.22.1.33.0=38

# /usr/local/nagios/libexec/check_snmp -H 192.168.1.7 \
-P 2c -l ActiveSessions -w :80,:40 -c :100,:50 \
-o 1.3.6.1.4.1.3076.2.1.2.17.1.7.0,1.3.6.1.4.1.3076.2.1.2.17.1.9.0

ActiveSessions CRITICAL - *110* 20 | iso.3.6.1.4.1.3076.2.1.2.17.1.7.0=110
                                     iso.3.6.1.4.1.3076.2.1.2.17.1.9.0=20
In this example, following parameters are passed to the check_snmp:
  • -H, --hostname=ADDRESS Host name, IP Address, or unix socket (must be an absolute path)
  • -P, --protocol=[1|2c|3] SNMP protocol version
  • -l, --label=STRING Prefix label for output from plugin, i.e. Temperature or ActiveSessions
  • -w, --warning=INTEGER_RANGE(s) Range(s) which will not result in a WARNING status
  • -c, --critical=INTEGER_RANGE(s) Range(s) which will not result in a CRITICAL status
  • -o, --oid=OID(s) Object identifier(s) or SNMP variables whose value you wish to query. Refer to the manual of your device to see all the supported and available OIDs for your equipment. If you have more than one OID, separate them with commas.
In the ActiveSessions example, two OIDs are monitored: one for VPN LAN-2-LAN tunnels (iso.3.6.1.4.1.3076.2.1.2.17.1.7.0) and another for PPTP sessions (iso.3.6.1.4.1.3076.2.1.2.17.1.9.0). In the above example, the number of active VPN LAN-2-LAN sessions (110) has exceeded the critical limit of 100. Object Identifiers (OIDs) are arranged in a hierarchical Management Information Base (MIB) tree with roots and branches based on the internet standard.
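The -w and -c values are ranges, one per OID, and a value raises the corresponding status only when it falls outside its range. The Python sketch below reproduces the two command-line results above under a simplifying assumption: a range written as ":80" is read as 0 to 80 inclusive. This is not the plugin's actual parser, and the function names are hypothetical.

```python
import math


def outside_range(value, spec):
    """True if value falls outside 'start:end'; an empty start is taken as 0."""
    start_text, _, end_text = spec.rpartition(":")
    start = float(start_text) if start_text else 0.0
    end = float(end_text) if end_text else math.inf
    return value < start or value > end


def evaluate(values, warning, critical):
    """Compare each value with its own warning and critical range."""
    state = "OK"
    for value, warn, crit in zip(values, warning.split(","), critical.split(",")):
        if outside_range(value, crit):
            return "CRITICAL"
        if outside_range(value, warn):
            state = "WARNING"
    return state


# Values and thresholds taken from the two check_snmp runs above.
print("Temperature", evaluate([35, 38], ":35,:40", ":40,:45"))
print("ActiveSessions", evaluate([110, 20], ":80,:40", ":100,:50"))
```

With these inputs the sketch agrees with the plugin output shown earlier: the temperature readings stay inside both ranges, while 110 LAN-2-LAN sessions is above the critical upper bound of 100.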

6. Validate configuration and restart nagios

Verify the nagios configuration to make sure there are no warnings and errors.
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Total Warnings: 0
Total Errors:   0
Things look okay - No serious problems were detected during the pre-flight check
Restart the nagios server to start monitoring the VPN device.
# /etc/rc.d/init.d/nagios stop
Stopping nagios: .done.

# /etc/rc.d/init.d/nagios start
Starting nagios: done.
Verify the status of the ActiveSessions and Temperature of the VPN device from the Nagios web UI (http://{nagios-server}/nagios) as shown below.
[Nagios Web UI with Cisco VPN device]
Fig: Nagios Web UI showing VPN Device Status

7. Troubleshooting

Issue: check_snmp works without any issues from the Linux command line, but the Nagios web UI displays the following error:
Status Information: SNMP problem - No data received from host
CMD: /usr/bin/snmpget -t 1 -r 5 -m '' -v 1 [authpriv] 192.168.1.7:161
Solution: Make sure the check_command definition for the check_snmp plugin in the switch.cfg file is properly defined. The arguments passed to the check_snmp command should match the check_snmp definition in /usr/local/nagios/etc/objects/commands.cfg
check_command check_snmp!Temperature!.1.3.6.1.4.1.3076.2.1.2.22.1.29.0,.1.3.6.1.4.1.3076.2.1.2.22.1.33.0!37,:40!:40,:45
[Note: This is wrong, as it is passing 4 arguments to check_snmp command
The value after the exclamation is considered as one argument. !{argument1}!{argument2}]

check_command check_snmp!-l Temperature -o .1.3.6.1.4.1.3076.2.1.2.22.1.29.0,.1.3.6.1.4.1.3076.2.1.2.22.1.33.0 -w 37,:40 -c :40,:45
[Note: This is correct, as it is passing 1 argument to check_snmp command
The value after the exclamation is considered as one argument. !{argument1}]
In the check_snmp command definition shown below, there is only one $ARG1$ argument. So, in the switch.cfg, while defining the check_snmp, you need to pass only one argument as shown above.
# 'check_snmp' command definition
define command{
command_name    check_snmp
command_line    $USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$
}
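To make the argument counting concrete, the Python sketch below mimics how the check_command value is split on "!" and substituted into the command_line. It is a simplified model of Nagios macro expansion, not the real implementation; the function name is hypothetical, and the $USER1$ value of /usr/local/nagios/libexec is an assumption based on the plugin path used earlier in this article.

```python
COMMAND_LINE = "$USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$"

OIDS = ".1.3.6.1.4.1.3076.2.1.2.22.1.29.0,.1.3.6.1.4.1.3076.2.1.2.22.1.33.0"
WRONG = f"check_snmp!Temperature!{OIDS}!37,:40!:40,:45"
RIGHT = f"check_snmp!-l Temperature -o {OIDS} -w 37,:40 -c :40,:45"


def expand_check_command(check_command, command_line, host_address,
                         user1="/usr/local/nagios/libexec"):
    """Split on '!' and fill $ARGn$ macros; unused arguments are dropped."""
    name, *args = check_command.split("!")
    expanded = command_line.replace("$USER1$", user1)
    expanded = expanded.replace("$HOSTADDRESS$", host_address)
    for index, arg in enumerate(args, start=1):
        expanded = expanded.replace(f"$ARG{index}$", arg)
    return name, args, expanded


for label, value in (("wrong", WRONG), ("right", RIGHT)):
    name, args, expanded = expand_check_command(value, COMMAND_LINE, "192.168.1.7")
    print(label, len(args), expanded)
```

In this model the wrong form yields four arguments, but only the first ("Temperature") reaches the plugin because the command_line contains no $ARG2$ to $ARG4$, so check_snmp is called without any -o option.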

Recommended Reading

These are the two best books that cover the latest Nagios 3. I strongly recommend that you read both of these books to gain a detailed understanding of Nagios.

http://nagios.sourceforge.net/docs/3_0/toc.html

Introduction

This guide is intended to provide you with simple instructions on how to install Nagios from source (code) on Fedora and have it monitoring your local machine inside of 20 minutes. No advanced installation options are discussed here - just the basics that will work for 95% of users who want to get started. These instructions were written based on a standard Fedora Core 6 Linux distribution.

What You'll End Up With

If you follow these instructions, here's what you'll end up with:
  • Nagios and the plugins will be installed underneath /usr/local/nagios
  • Nagios will be configured to monitor a few aspects of your local system (CPU load, disk usage, etc.)
  • The Nagios web interface will be accessible at http://localhost/nagios/

Prerequisites

During portions of the installation you'll need to have root access to your machine. Make sure you've installed the following packages on your Fedora installation before continuing.
  • Apache
  • PHP
  • GCC compiler
  • GD development libraries
You can use yum to install these packages by running the following commands (as root):
yum install httpd php

yum install gcc glibc glibc-common

yum install gd gd-devel

1) Create Account Information

Become the root user.
su -l

Create a new nagios user account and give it a password.
/usr/sbin/useradd -m nagios

passwd nagios

Create a new nagcmd group for allowing external commands to be submitted through the web interface. Add both the nagios user and the apache user to the group.
/usr/sbin/groupadd nagcmd

/usr/sbin/usermod -a -G nagcmd nagios

/usr/sbin/usermod -a -G nagcmd apache

2) Download Nagios and the Plugins

Create a directory for storing the downloads.
mkdir ~/downloads

cd ~/downloads

Download the source code tarballs of both Nagios and the Nagios plugins (visit http://www.nagios.org/download/ for links to the latest versions). These directions were tested with Nagios 3.1.1 and Nagios Plugins 1.4.11.
wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.2.3.tar.gz

wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.11.tar.gz

3) Compile and Install Nagios

Extract the Nagios source code tarball.
cd ~/downloads

tar xzf nagios-3.2.3.tar.gz

cd nagios-3.2.3

Run the Nagios configure script, passing the name of the group you created earlier like so:
./configure --with-command-group=nagcmd

Compile the Nagios source code.
make all

Install binaries, init script, sample config files and set permissions on the external command directory.
make install

make install-init

make install-config

make install-commandmode

Don't start Nagios yet - there's still more that needs to be done...

4) Customize Configuration

Sample configuration files have now been installed in the /usr/local/nagios/etc directory. These sample files should work fine for getting started with Nagios. You'll need to make just one change before you proceed...

Edit the /usr/local/nagios/etc/objects/contacts.cfg config file with your favorite editor and change the email address associated with the nagiosadmin contact definition to the address you'd like to use for receiving alerts.
vi /usr/local/nagios/etc/objects/contacts.cfg

5) Configure the Web Interface

Install the Nagios web config file in the Apache conf.d directory.
make install-webconf

Create a nagiosadmin account for logging into the Nagios web interface. Remember the password you assign to this account - you'll need it later.
htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin

Restart Apache to make the new settings take effect.
service httpd restart

Note: Consider implementing the enhanced CGI security measures described in the Nagios documentation to ensure that your web authentication credentials are not compromised.

6) Compile and Install the Nagios Plugins

Extract the Nagios plugins source code tarball.
cd ~/downloads

tar xzf nagios-plugins-1.4.11.tar.gz

cd nagios-plugins-1.4.11

Compile and install the plugins.
./configure --with-nagios-user=nagios --with-nagios-group=nagios

make

make install

7) Start Nagios

Add Nagios to the list of system services and have it automatically start when the system boots.
chkconfig --add nagios

chkconfig nagios on

Verify the sample Nagios configuration files.
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

If there are no errors, start Nagios.
service nagios start

8) Modify SELinux Settings

Fedora ships with SELinux (Security Enhanced Linux) installed and in Enforcing mode by default. This can result in "Internal Server Error" messages when you attempt to access the Nagios CGIs.

See if SELinux is in Enforcing mode.
getenforce

Put SELinux into Permissive mode.
setenforce 0

To make this change permanent, you'll have to modify the settings in /etc/selinux/config and reboot.

Instead of disabling SELinux or setting it to permissive mode, you can use the following commands to run the CGIs under SELinux enforcing/targeted mode:
chcon -R -t httpd_sys_content_t /usr/local/nagios/sbin/

chcon -R -t httpd_sys_content_t /usr/local/nagios/share/

For information on running the Nagios CGIs under Enforcing mode with a targeted policy, visit the Nagios Support Portal or Nagios Community Wiki.

9) Login to the Web Interface

You should now be able to access the Nagios web interface at the URL below. You'll be prompted for the username (nagiosadmin) and password you specified earlier.
http://localhost/nagios/

Click on the "Service Detail" navbar link to see details of what's being monitored on your local machine. It will take a few minutes for Nagios to check all the services associated with your machine, as the checks are spread out over time.

10) Other Modifications

Make sure your machine's firewall rules are configured to allow access to the web server if you want to access the Nagios interface remotely.

Configuring email notifications is out of the scope of this documentation. While Nagios is currently configured to send you email notifications, your system may not yet have a mail program properly installed or configured. Refer to your system documentation, search the web, or look to the Nagios Support Portal or Nagios Community Wiki for specific instructions on configuring your system to send email messages to external addresses. More information on notifications can be found in the Nagios documentation.

11) You're Done

Congratulations! You successfully installed Nagios. Your journey into monitoring is just beginning. You'll no doubt want to monitor more than just your local machine, so check out the following docs...

Security Considerations

Introduction

This is intended to be a brief overview of some things you should keep in mind when installing Nagios, so as to set it up in a secure manner.

Your monitoring box should be viewed as a backdoor into your other systems. In many cases, the Nagios server might be allowed access through firewalls in order to monitor remote servers. In almost all cases, it is allowed to query those remote servers for various information. Monitoring servers are always given a certain level of trust in order to query remote systems. This presents a potential attacker with an attractive backdoor to your systems. An attacker might have an easier time getting into your other systems if they compromise the monitoring server first. This is particularly true if you are making use of shared SSH keys in order to monitor remote systems.
If an intruder has the ability to submit check results or external commands to the Nagios daemon, they have the potential to submit bogus monitoring data, drive you nuts with bogus notifications, or cause event handler scripts to be triggered. If you have event handler scripts that restart services, cycle power, etc. this could be particularly problematic.

Another area of concern is the ability for intruders to sniff monitoring data (status information) as it comes across the wire. If communication channels are not encrypted, attackers can gain valuable information by watching your monitoring information. Take as an example the following situation: An attacker captures monitoring data on the wire over a period of time and analyzes the typical CPU and disk load usage of your systems, along with the number of users that are typically logged into them. The attacker is then able to determine the best time to compromise a system and use its resources (CPU, etc.) without being noticed.

Here are some tips to help ensure that you keep your systems secure when implementing a Nagios-based monitoring solution...

Best Practices
  1. Use a Dedicated Monitoring Box. I would recommend that you install Nagios on a server that is dedicated to monitoring (and possibly other admin tasks). Protect your monitoring server as if it were one of the most important servers on your network. Keep running services to a minimum and lock down access to it via TCP wrappers, firewalls, etc. Since the Nagios server is allowed to talk to your servers and may be able to poke through your firewalls, allowing users access to your monitoring server can be a security risk. Remember, it's always easier to gain root access through a system security hole if you have a local account on a box.
  2. Don't Run Nagios As Root. Nagios doesn't need to run as root, so don't do it. You can tell Nagios to drop privileges after startup and run as another user/group by using the nagios_user and nagios_group directives in the main config file. If you need to execute event handlers or plugins which require root access, you might want to try using sudo.
  3. Lock Down The Check Result Directory. Make sure that only the nagios user is able to read/write in the check result path. If users other than nagios (or root) are able to write to this directory, they could send fake host/service check results to the Nagios daemon. This could result in annoyances (bogus notifications) or security problems (event handlers being kicked off).
  4. Lock Down The External Command File. If you enable external commands, make sure you set proper permissions on the /usr/local/nagios/var/rw directory. You only want the Nagios user (usually nagios) and the web server user (usually nobody, httpd, apache2, or www-data) to have permissions to write to the command file. If you've installed Nagios on a machine that is dedicated to monitoring and admin tasks and is not used for public accounts, that should be fine. If you've installed it on a public or multi-user machine (not recommended), allowing the web server user to have write access to the command file can be a security problem. After all, you don't want just any user on your system controlling Nagios through the external command file. In this case, I would suggest only granting write access on the command file to the nagios user and using something like CGIWrap to run the CGIs as the nagios user instead of nobody.
  5. Require Authentication In The CGIs. I would strongly suggest requiring authentication for accessing the CGIs. Once you do that, read the documentation on the default rights that authenticated contacts have, and only authorize specific contacts for additional rights as necessary. Instructions on setting up authentication and configuring authorization rights can be found in the Nagios documentation. If you disable the CGI authentication features using the use_authentication directive in the CGI config file, the command CGI will refuse to write any commands to the external command file. After all, you don't want the world to be able to control Nagios, do you?
  6. Implement Enhanced CGI Security Measures. I would strongly suggest that you consider implementing the enhanced security measures for the CGIs described in the Nagios documentation. These measures can help ensure that the username/password you use to access the Nagios web interface are not intercepted by third parties.
  7. Use Full Paths In Command Definitions. When you define commands, make sure you specify the full path (not a relative one) to any scripts or binaries you're executing.
  8. Hide Sensitive Information With $USERn$ Macros. The CGIs read the main config file and object config file(s), so you don't want to keep any sensitive information (usernames, passwords, etc) in there. If you need to specify a username and/or password in a command definition use a $USERn$ macro to hide it. $USERn$ macros are defined in one or more resource files. The CGIs will not attempt to read the contents of resource files, so you can set more restrictive permissions (600 or 660) on them. See the sample resource.cfg file in the base of the Nagios distribution for an example of how to define $USERn$ macros.
  9. Strip Dangerous Characters From Macros. Use the illegal_macro_output_chars directive to strip dangerous characters from the $HOSTOUTPUT$, $SERVICEOUTPUT$, $HOSTPERFDATA$, and $SERVICEPERFDATA$ macros before they're used in notifications, etc. Dangerous characters can be anything that might be interpreted by the shell, thereby opening a security hole. An example of this is the presence of backtick (`) characters in the $HOSTOUTPUT$, $SERVICEOUTPUT$, $HOSTPERFDATA$, and/or $SERVICEPERFDATA$ macros, which could allow an attacker to execute an arbitrary command as the nagios user (one good reason not to run Nagios as the root user).
  10. Secure Access to Remote Agents. Make sure you lock down access to agents (NRPE, NSClient, SNMP, etc.) on remote systems using firewalls, access lists, etc. You don't want everyone to be able to query your systems for status information. This information could be used by an attacker to execute remote event handler scripts or to determine the best times to go unnoticed.
  11. Secure Communication Channels. Make sure you encrypt communication channels between different Nagios installations and between your Nagios servers and your monitoring agents whenever possible. You don't want someone to be able to sniff status information going across your network. This information could be used by an attacker to determine the best times to go unnoticed.
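Tips 3 and 4 above come down to checking who can write to a directory. As a small illustration, the Python sketch below inspects the permission bits of a path and reports whether "other" users or the group can write to it. The function names are hypothetical, and the demonstration runs against a temporary directory rather than a real Nagios path.

```python
import os
import stat
import tempfile


def writable_by_others(path):
    """True if the 'other' write bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IWOTH)


def writable_by_group(path):
    """True if the group write bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IWGRP)


# Demonstration on a throwaway directory instead of a live Nagios install.
with tempfile.TemporaryDirectory() as demo_dir:
    os.chmod(demo_dir, 0o777)
    print("0777:", writable_by_others(demo_dir), writable_by_group(demo_dir))
    os.chmod(demo_dir, 0o770)
    print("0770:", writable_by_others(demo_dir), writable_by_group(demo_dir))
```

On a real installation you would pass your check result path or /usr/local/nagios/var/rw instead of the temporary directory. The sketch only reads the mode bits; it does not check ownership or group membership.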

Monitoring Windows Machines

Introduction

This document describes how you can monitor "private" services and attributes of Windows machines, such as:
  • Memory usage
  • CPU load
  • Disk usage
  • Service states
  • Running processes
  • etc.
Publicly available services that are provided by Windows machines (HTTP, FTP, POP3, etc.) can be monitored easily by following the documentation on monitoring publicly available services.

Note: These instructions assume that you've installed Nagios according to the quickstart guide. The sample configuration entries below reference objects that are defined in the sample config files (commands.cfg, templates.cfg, etc.) that are installed if you follow the quickstart.

Overview

Monitoring private services or attributes of a Windows machine requires that you install an agent on it. This agent acts as a proxy between the Nagios plugin that does the monitoring and the actual service or attribute of the Windows machine. Without installing an agent on the Windows box, Nagios would be unable to monitor private services or attributes of the Windows box.

For this example, we will be installing the NSClient++ addon on the Windows machine and using the check_nt plugin to communicate with the NSClient++ addon. The check_nt plugin should already be installed on the Nagios server if you followed the quickstart guide. Other Windows agents (like NC_Net) could be used instead of NSClient++ if you wish - provided you change command and service definitions, etc. a bit. For the sake of simplicity I will only cover using the NSClient++ addon in these instructions.

Steps

There are several steps you'll need to follow in order to monitor a new Windows machine. They are:
  1. Perform first-time prerequisites
  2. Install a monitoring agent on the Windows machine
  3. Create new host and service definitions for monitoring the Windows machine
  4. Restart the Nagios daemon
What's Already Done For You

To make your life a bit easier, a few configuration tasks have already been done for you:
  • A check_nt command definition has been added to the commands.cfg file. This allows you to use the check_nt plugin to monitor Windows services.
  • A Windows server host template (called windows-server) has already been created in the templates.cfg file. This allows you to add new Windows host definitions in a simple manner.
The above-mentioned config files can be found in the /usr/local/nagios/etc/objects/ directory. You can modify the definitions in these and other config files to suit your needs better if you'd like. However, I'd recommend waiting until you're more familiar with configuring Nagios before doing so. For the time being, just follow the directions outlined below and you'll be monitoring your Windows boxes in no time.

Prerequisites

The first time you configure Nagios to monitor a Windows machine, you'll need to do a bit of extra work. Remember, you only need to do this for the *first* Windows machine you monitor.

Edit the main Nagios config file.
vi /usr/local/nagios/etc/nagios.cfg

Remove the leading pound (#) sign from the following line in the main configuration file:
#cfg_file=/usr/local/nagios/etc/objects/windows.cfg
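
If you would rather script this change than make it by hand in vi, a minimal shell sketch is shown here. The function name enable_cfg_file is hypothetical, and the sketch assumes the commented line appears in nagios.cfg exactly as shown above, with no extra spaces. It keeps a .bak copy of the file before rewriting it.

```shell
# enable_cfg_file MAIN_CFG OBJECT_CFG   (hypothetical helper, for illustration only)
# Removes the leading '#' from the "cfg_file=OBJECT_CFG" line in MAIN_CFG.
enable_cfg_file() {
    main_cfg=$1
    object_cfg=$2
    # Keep a backup copy, then rewrite the file from that backup.
    cp "$main_cfg" "$main_cfg.bak" &&
    sed "s|^#cfg_file=$object_cfg\$|cfg_file=$object_cfg|" "$main_cfg.bak" > "$main_cfg"
}

# Example (run as a user allowed to modify the Nagios config):
# enable_cfg_file /usr/local/nagios/etc/nagios.cfg /usr/local/nagios/etc/objects/windows.cfg
```

Either way, the result should be a cfg_file line for windows.cfg with no leading pound sign.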

Save the file and exit.

What did you just do? You told Nagios to look to the /usr/local/nagios/etc/objects/windows.cfg file to find additional object definitions. That's where you'll be adding Windows host and service definitions. That configuration file already contains some sample host, hostgroup, and service definitions. For the *first* Windows machine you monitor, you can simply modify the sample host and service definitions in that file, rather than creating new ones.

Installing the Windows Agent

Before you can begin monitoring private services and attributes of Windows machines, you'll need to install an agent on those machines. I recommend using the NSClient++ addon, which can be found at http://sourceforge.net/projects/nscplus. These instructions will take you through a basic installation of the NSClient++ addon, as well as the configuration of Nagios for monitoring the Windows machine.

1. Download the latest stable version of the NSClient++ addon from http://sourceforge.net/projects/nscplus
2. Unzip the NSClient++ files into a new C:\NSClient++ directory
3. Open a command prompt and change to the C:\NSClient++ directory
4. Register the NSClient++ system service with the following command:
 nsclient++ /install

5. Install the NSClient++ systray with the following command ('SysTray' is case-sensitive):
 nsclient++ SysTray

6. Open the services manager and make sure the NSClientpp service is allowed to interact with the desktop (see the 'Log On' tab of the services manager). If it isn't already allowed to interact with the desktop, check the box to allow it to.
7. Edit the NSC.INI file (located in the C:\NSClient++ directory) and make the following changes:
  • Uncomment all the modules listed in the [modules] section, except for CheckWMI.dll and RemoteConfiguration.dll
  • Optionally require a password for clients by changing the 'password' option in the [Settings] section.
  • Uncomment the 'allowed_hosts' option in the [Settings] section. Add the IP address of the Nagios server to this line, or leave it blank to allow all hosts to connect.
  • Make sure the 'port' option in the [NSClient] section is uncommented and set to '12489' (the default port).
8. Start the NSClient++ service with the following command:
 nsclient++ /start

9. If installed properly, a new icon should appear in your system tray. It will be a yellow circle with a black 'M' inside.
10. Success! The Windows server can now be added to the Nagios monitoring configuration...

Configuring Nagios

Now it's time to define some object definitions in your Nagios configuration files in order to monitor the new Windows machine.

Open the windows.cfg file for editing.
vi /usr/local/nagios/etc/objects/windows.cfg

Add a new host definition for the Windows machine that you're going to monitor. If this is the *first* Windows machine you're monitoring, you can simply modify the sample host definition in windows.cfg. Change the host_name, alias, and address fields to appropriate values for the Windows box.
define host{

 use  windows-server ; Inherit default values from a Windows server template (make sure you keep this line!)

 host_name  winserver

 alias  My Windows Server

 address  192.168.1.2

 }

Good. Now you can add some service definitions (to the same configuration file) in order to tell Nagios to monitor different aspects of the Windows machine. If this is the *first* Windows machine you're monitoring, you can simply modify the sample service definitions in windows.cfg.

Note: Replace "winserver" in the example definitions below with the name you specified in the host_name directive of the host definition you just added.

Add the following service definition to monitor the version of the NSClient++ addon that is running on the Windows server. This is useful when it comes time to upgrade your Windows servers to a newer version of the addon, as you'll be able to tell which Windows machines still need to be upgraded to the latest version of NSClient++.
define service{

 use   generic-service

 host_name   winserver

 service_description NSClient++ Version

 check_command  check_nt!CLIENTVERSION

 }

Add the following service definition to monitor the uptime of the Windows server.
define service{

 use   generic-service

 host_name   winserver

 service_description Uptime

 check_command  check_nt!UPTIME

 }

Add the following service definition to monitor the CPU utilization on the Windows server and generate a CRITICAL alert if the 5-minute CPU load is 90% or more or a WARNING alert if the 5-minute load is 80% or greater.
define service{

 use   generic-service

 host_name   winserver

 service_description CPU Load

 check_command  check_nt!CPULOAD!-l 5,80,90

 }
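
How Nagios turns a check_command value such as check_nt!CPULOAD!-l 5,80,90 into a plugin command line can be sketched as follows. The value is split on '!': the first piece names the command, and the remaining pieces fill $ARG1$, $ARG2$, and so on in that command's command_line. This sketch assumes the check_nt command_line shown in the Password Protection section (without the -s option), that $USER1$ points to /usr/local/nagios/libexec, and that the host address is 192.168.1.2. The helper expand_check_command is hypothetical and only imitates the substitution; it is not part of Nagios.

```shell
# expand_check_command TEMPLATE HOSTADDRESS CHECK_COMMAND   (hypothetical helper)
# Prints TEMPLATE with $USER1$, $HOSTADDRESS$ and $ARG1$..$ARG9$ filled in.
expand_check_command() {
    awk -v tpl="$1" -v host="$2" -v check="$3" '
    # Replace every literal occurrence of "from" in "s" with "to".
    function replace(s, from, to,    out, pos) {
        out = ""
        while ((pos = index(s, from)) > 0) {
            out = out substr(s, 1, pos - 1) to
            s = substr(s, pos + length(from))
        }
        return out s
    }
    BEGIN {
        # part[1] is the command name; part[2] is $ARG1$, part[3] is $ARG2$, ...
        n = split(check, part, "!")
        # Assumption: $USER1$ is the quickstart plugin directory.
        line = replace(tpl, "$USER1$", "/usr/local/nagios/libexec")
        line = replace(line, "$HOSTADDRESS$", host)
        for (i = 1; i <= 9; i++) {
            arg = (i + 1 <= n) ? part[i + 1] : ""
            line = replace(line, "$ARG" i "$", arg)
        }
        print line
    }'
}

# Example:
# expand_check_command '$USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$' \
#     192.168.1.2 'check_nt!CPULOAD!-l 5,80,90'
# prints: /usr/local/nagios/libexec/check_nt -H 192.168.1.2 -p 12489 -v CPULOAD -l 5,80,90
```

Running the printed command by hand on the Nagios server is a quick way to confirm that the agent answers before the service definition goes live.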

Add the following service definition to monitor memory usage on the Windows server and generate a CRITICAL alert if memory usage is 90% or more or a WARNING alert if memory usage is 80% or greater.
define service{

 use   generic-service

 host_name   winserver

 service_description Memory Usage

 check_command  check_nt!MEMUSE!-w 80 -c 90

 }

Add the following service definition to monitor usage of the C:\ drive on the Windows server and generate a CRITICAL alert if disk usage is 90% or more or a WARNING alert if disk usage is 80% or greater.
define service{

 use   generic-service

 host_name   winserver

 service_description C:\ Drive Space

 check_command  check_nt!USEDDISKSPACE!-l c -w 80 -c 90

 }

Add the following service definition to monitor the W3SVC service state on the Windows machine and generate a CRITICAL alert if the service is stopped.
define service{

 use   generic-service

 host_name   winserver

 service_description W3SVC

 check_command  check_nt!SERVICESTATE!-d SHOWALL -l W3SVC

 }

Add the following service definition to monitor the Explorer.exe process on the Windows machine and generate a CRITICAL alert if the process is not running.
define service{

 use   generic-service

 host_name   winserver

 service_description Explorer

 check_command  check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe

 }

That's it for now. You've added some basic services that should be monitored on the Windows box. Save the configuration file.

Password Protection

If you specified a password in the NSClient++ configuration file on the Windows machine, you'll need to modify the check_nt command definition to include the password. Open the commands.cfg file for editing.
vi /usr/local/nagios/etc/objects/commands.cfg

Change the definition of the check_nt command to include the "-s PASSWORD" argument (where PASSWORD is the password you specified on the Windows machine) like this:
define command{

 command_name check_nt

 command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -s PASSWORD -v $ARG1$ $ARG2$

 }

Save the file.

Restarting Nagios

You're done with modifying the Nagios configuration, so you'll need to verify your configuration files and restart Nagios. If the verification process produces any error messages, fix your configuration file before continuing. Make sure that you don't (re)start Nagios until the verification process completes without any errors!
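
The verify-then-restart sequence described above can be sketched as a small shell function. It assumes the quickstart install paths and that the Nagios init script is controlled with "service nagios restart"; adjust both if your system differs. The function name verify_and_restart and the NAGIOS_BIN and NAGIOS_CFG variables are hypothetical, used only for this illustration.

```shell
# Assumed quickstart locations; override these if your install differs.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"

# verify_and_restart   (hypothetical helper)
# Runs the pre-flight check (nagios -v) and restarts Nagios only if it passes.
verify_and_restart() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        # Assumption: the init script was installed as in the quickstart.
        service nagios restart
    else
        echo "Configuration errors found; Nagios was NOT restarted." >&2
        return 1
    fi
}

# Example (as root): verify_and_restart
```

Because the restart sits in the success branch only, a broken configuration leaves the running Nagios process untouched.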

Monitoring Publicly Available Services

Introduction

This document describes how you can monitor publicly available services, applications and protocols. By "public" I mean services that are accessible across the network - either the local network or the greater Internet. Examples of public services include HTTP, POP3, IMAP, FTP, and SSH. There are many more public services that you probably use on a daily basis. These services and applications, as well as their underlying protocols, can usually be monitored by Nagios without any special access requirements.

Private services, in contrast, cannot be monitored with Nagios without an intermediary agent of some kind. Examples of private services associated with hosts are things like CPU load, memory usage, disk usage, current user count, process information, etc. These private services or attributes of hosts are not usually exposed to external clients. This situation requires that an intermediary monitoring agent be installed on any host that you need to monitor such information on. More information on monitoring private services on different types of hosts can be found in the documentation on monitoring Windows machines and monitoring Linux/Unix machines.

Tip: Occasionally you will find that information on private services and applications can be monitored with SNMP. The SNMP agent allows you to remotely monitor otherwise private (and inaccessible) information about the host. For more information about monitoring services using SNMP, check out the documentation on monitoring switches and routers.

Note: These instructions assume that you've installed Nagios according to the quickstart guide. The sample configuration entries below reference objects that are defined in the sample commands.cfg and localhost.cfg config files.

Plugins For Monitoring Services

When you find yourself needing to monitor a particular application, service, or protocol, chances are good that a plugin exists to monitor it.
The official Nagios plugins distribution comes with plugins that can be used to monitor a variety of services and protocols. There are also a large number of contributed plugins that can be found in the contrib/ subdirectory of the plugin distribution. The NagiosExchange.org website hosts a number of additional plugins that have been written by users, so check it out when you have a chance.

If you don't happen to find an appropriate plugin for monitoring what you need, you can always write your own. Plugins are easy to write, so don't let this thought scare you off. Read the documentation on developing plugins for more information.

I'll walk you through monitoring some basic services that you'll probably use sooner or later. Each of these services can be monitored using one of the plugins that gets installed as part of the Nagios plugins distribution. Let's get started...

Creating A Host Definition

Before you can monitor a service, you first need to define a host that is associated with the service. You can place host definitions in any object configuration file specified by a cfg_file directive or placed in a directory specified by a cfg_dir directive. If you have already created a host definition, you can skip this step.

For this example, let's say you want to monitor a variety of services on a remote host. Let's call that host remotehost. The host definition can be placed in its own file or added to an already existing object configuration file. Here's what the host definition for remotehost might look like:
define host{

 use  generic-host  ; Inherit default values from a template

 host_name  remotehost  ; The name we're giving to this host

 alias  Some Remote Host ; A longer name associated with the host

 address  192.168.1.50  ; IP address of the host

 hostgroups  allhosts  ; Host groups this host is associated with

 }

Now that a definition has been added for the host that will be monitored, we can start defining services that should be monitored. As with host definitions, service definitions can be placed in any object configuration file.

Creating Service Definitions

For each service you want to monitor, you need to define a service in Nagios that is associated with the host definition you just created. You can place service definitions in any object configuration file specified by a cfg_file directive or placed in a directory specified by a cfg_dir directive.

Some example service definitions for monitoring common public services (HTTP, FTP, etc.) are given below.

Monitoring HTTP

Chances are you're going to want to monitor web servers at some point - either yours or someone else's. The check_http plugin is designed to do just that. It understands the HTTP protocol and can monitor response time, error codes, strings in the returned HTML, server certificates, and much more.

The commands.cfg file contains a command definition for using the check_http plugin. It looks like this:
define command{

 name  check_http

 command_name check_http

 command_line $USER1$/check_http -I $HOSTADDRESS$ $ARG1$

 }

A simple service definition for monitoring the HTTP service on the remotehost machine might look like this:
define service{

 use  generic-service  ; Inherit default values from a template

 host_name  remotehost

 service_description HTTP

 check_command check_http

 }

This simple service definition will monitor the HTTP service running on remotehost. It will produce alerts if the web server doesn't respond within 10 seconds or if it returns HTTP error codes (403, 404, etc.). That's all you need for basic monitoring. Pretty simple, huh?

Tip: For more advanced monitoring, run the check_http plugin manually with --help as a command-line argument to see all the options you can give the plugin. This --help syntax works with all of the plugins I'll cover in this document.

A more advanced definition for monitoring the HTTP service is shown below. This service definition will check to see if the /download/index.php URI contains the string "latest-version.tar.gz". It will produce an error if the string isn't found, the URI isn't valid, or the web server takes longer than 5 seconds to respond.
define service{

 use  generic-service  ; Inherit default values from a template

 host_name  remotehost

 service_description Product Download Link

 check_command check_http!-u /download/index.php -t 5 -s "latest-version.tar.gz"

 }
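
Before adding a check like the ones above to Nagios, it can help to run the plugin by hand on the Nagios server and look at its output and exit status. The sketch below assumes the plugins live in /usr/local/nagios/libexec (the quickstart location) and relies on the standard plugin exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The names run_plugin and PLUGIN_DIR are hypothetical and used only for illustration.

```shell
# Assumed quickstart plugin directory; override if your install differs.
PLUGIN_DIR="${PLUGIN_DIR:-/usr/local/nagios/libexec}"

# run_plugin NAME [ARGS...]   (hypothetical helper)
# Runs a plugin, shows its output, and reports the state implied by its exit code.
run_plugin() {
    plugin=$1
    shift
    rc=0
    "$PLUGIN_DIR/$plugin" "$@" || rc=$?
    case $rc in
        0) state=OK ;;
        1) state=WARNING ;;
        2) state=CRITICAL ;;
        *) state=UNKNOWN ;;
    esac
    echo "state: $state (exit code $rc)"
    return "$rc"
}

# Example, using the arguments from the advanced HTTP service above:
# run_plugin check_http -I 192.168.1.50 -u /download/index.php -t 5 -s "latest-version.tar.gz"
```

The arguments after the plugin name are the same ones that follow the "!" in the check_command line, so a check that behaves as expected here should behave the same way once Nagios schedules it.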

Monitoring FTP

When you need to monitor FTP servers, you can use the check_ftp plugin. The commands.cfg file contains a command definition for using the check_ftp plugin, which looks like this:
define command{

 command_name check_ftp

 command_line $USER1$/check_ftp -H $HOSTADDRESS$ $ARG1$

 }

A simple service definition for monitoring the FTP server on remotehost would look like this:
define service{

 use  generic-service  ; Inherit default values from a template

 host_name  remotehost

 service_description FTP

 check_command check_ftp

 }

This service definition will monitor the FTP service and generate alerts if the FTP server doesn't respond within 10 seconds. A more advanced service definition is shown below. This service will check the FTP server running on port 1023 on remotehost. It will generate an alert if the server doesn't respond within 5 seconds or if the server response doesn't contain the string "Pure-FTPd [TLS]".
define service{

 use  generic-service  ; Inherit default values from a template

 host_name  remotehost

 service_description Special FTP 

 check_command check_ftp!-p 1023 -t 5 -e "Pure-FTPd [TLS]"

 }

Monitoring SSH

When you need to monitor SSH servers, you can use the check_ssh plugin. The commands.cfg file contains a command definition for using the check_ssh plugin, which looks like this:
define command{

 command_name check_ssh

 command_line $USER1$/check_ssh $ARG1$ $HOSTADDRESS$

 }

A simple service definition for monitoring the SSH server on remotehost would look like this:
define service{

 use  generic-service  ; Inherit default values from a template

 host_name  remotehost

 service_description SSH

 check_command check_ssh

 }

This service definition will monitor the SSH service and generate alerts if the SSH server doesn't respond within 10 seconds. A more advanced service definition is shown below. This service will check the SSH server and generate an alert if the server doesn't respond within 5 seconds or if the server version string doesn't match "OpenSSH_4.2".
define service{

 use  generic-service  ; Inherit default values from a template

 host_name  remotehost

 service_description SSH Version Check 

 check_command check_ssh!-t 5 -r "OpenSSH_4.2"

 }

Monitoring SMTP

The check_smtp plugin can be used for monitoring your email servers. The commands.cfg file contains a command definition for using the check_smtp plugin, which looks like this:
define command{

 command_name check_smtp

 command_line $USER1$/check_smtp -H $HOSTADDRESS$ $ARG1$

 }

A simple service definition for monitoring the SMTP server on remotehost would look like this:
define service{

 use  generic-service  ; Inherit default values from a template

 host_name  remotehost

 service_description SMTP

 check_command check_smtp

 }

This service definition will monitor the SMTP service and generate alerts if the SMTP server doesn't respond within 10 seconds. A more advanced service definition is shown below. This service will check the SMTP server and generate an alert if the server doesn't respond within 5 seconds or if the response from the server doesn't contain "mygreatmailserver.com".
define service{

 use  generic-service  ; Inherit default values from a template

 host_name  remotehost

 service_description SMTP Response Check 

 check_command check_smtp!-t 5 -e "mygreatmailserver.com"

 }

Monitoring POP3

The check_pop plugin can be used for monitoring the POP3 service on your email servers. The commands.cfg file contains a command definition for using the check_pop plugin, which looks like this:
define command{

 command_name check_pop

 command_line $USER1$/check_pop -H $HOSTADDRESS$ $ARG1$

 }

A simple service definition for monitoring the POP3 service on remotehost would look like this:
define service{

 use  generic-service  ; Inherit default values from a template

 host_name  remotehost

 service_description POP3

 check_command check_pop

 }

This service definition will monitor the POP3 service and generate alerts if the POP3 server doesn't respond within 10 seconds. A more advanced service definition is shown below. This service will check the POP3 service and generate an alert if the server doesn't respond within 5 seconds or if the response from the server doesn't contain "mygreatmailserver.com".
define service{

 use  generic-service  ; Inherit default values from a template

 host_name  remotehost

 service_description POP3 Response Check 

 check_command check_pop!-t 5 -e "mygreatmailserver.com"

 }

Monitoring IMAP

The check_imap plugin can be used for monitoring the IMAP4 service on your email servers. The commands.cfg file contains a command definition for using the check_imap plugin, which looks like this:
define command{

 command_name check_imap

 command_line $USER1$/check_imap -H $HOSTADDRESS$ $ARG1$

 }

A simple service definition for monitoring the IMAP4 service on remotehost would look like this:
define service{

 use  generic-service  ; Inherit default values from a template

 host_name  remotehost

 service_description IMAP

 check_command check_imap

 }

This service definition will monitor the IMAP4 service and generate alerts if the IMAP server doesn't respond within 10 seconds. A more advanced service definition is shown below. This service will check the IMAP4 service and generate an alert if the server doesn't respond within 5 seconds or if the response from the server doesn't contain "mygreatmailserver.com".
define service{

 use  generic-service  ; Inherit default values from a template

 host_name  remotehost

 service_description IMAP4 Response Check 

 check_command check_imap!-t 5 -e "mygreatmailserver.com"

 }

Restarting Nagios

Once you've added the new host and service definitions to your object configuration file(s), you're ready to start monitoring them. To do this, you'll need to verify your configuration and restart Nagios. If the verification process produces any error messages, fix your configuration file before continuing. Make sure that you don't (re)start Nagios until the verification process completes without any errors!

Monitoring Linux/Unix Machines

Introduction

This document describes how you can monitor "private" services and attributes of Linux/UNIX servers, such as:
  • CPU load
  • Memory usage
  • Disk usage
  • Logged in users
  • Running processes
  • etc.
Publicly available services that are provided by Linux servers (HTTP, FTP, SSH, SMTP, etc.) can be monitored easily by following the documentation on monitoring publicly available services.

Note: These instructions assume that you've installed Nagios according to the quickstart guide. The sample configuration entries below reference objects that are defined in the sample config files (commands.cfg, templates.cfg, etc.) that are installed if you follow the quickstart.

Overview

[Note: This document has not been completed. I would recommend you read the documentation on the NRPE addon for instructions on how to monitor a remote Linux/Unix server.]

There are several different ways to monitor attributes of remote Linux/Unix servers. One is by using shared SSH keys and the check_by_ssh plugin to execute plugins on remote servers. This method will not be covered here, but can result in high load on your monitoring server if you are monitoring hundreds or thousands of services. The overhead of setting up/destroying SSH connections is the cause of this.

Another common method of monitoring remote Linux/Unix hosts is to use the NRPE addon. NRPE allows you to execute plugins on remote Linux/Unix hosts. This is useful if you need to monitor local resources/attributes like disk usage, CPU load, memory usage, etc. on a remote host.

Monitoring Routers and Switches

Introduction

This document describes how you can monitor the status of network switches and routers. Some cheaper "unmanaged" switches and hubs don't have IP addresses and are essentially invisible on your network, so there's not any way to monitor them. More expensive switches and routers have addresses assigned to them and can be monitored by pinging them or using SNMP to query status information.

I'll describe how you can monitor the following things on managed switches, hubs, and routers:
  • Packet loss, round trip average
  • SNMP status information
  • Bandwidth / traffic rate
Note: These instructions assume that you've installed Nagios according to the quickstart guide. The sample configuration entries below reference objects that are defined in the sample config files (commands.cfg, templates.cfg, etc.) that are installed when you follow the quickstart.

Overview

Monitoring switches and routers can either be easy or more involved - depending on what equipment you have and what you want to monitor. As they are critical infrastructure components, you'll no doubt want to monitor them in at least some basic manner.

Switches and routers can be monitored easily by "pinging" them to determine packet loss, RTA, etc. If your switch supports SNMP, you can monitor port status, etc. with the check_snmp plugin and bandwidth (if you're using MRTG) with the check_mrtgtraf plugin.

The check_snmp plugin will only get compiled and installed if you have the net-snmp and net-snmp-utils packages installed on your system. Make sure the plugin exists in /usr/local/nagios/libexec before you continue. If it doesn't, install net-snmp and net-snmp-utils and recompile/reinstall the Nagios plugins.

Steps

There are several steps you'll need to follow in order to monitor a new router or switch. They are:
  1. Perform first-time prerequisites
  2. Create new host and service definitions for monitoring the device
  3. Restart the Nagios daemon
What's Already Done For You

To make your life a bit easier, a few configuration tasks have already been done for you:
  • Two command definitions (check_snmp and check_local_mrtgtraf) have been added to the commands.cfg file. These allow you to use the check_snmp and check_mrtgtraf plugins to monitor network routers.
  • A switch host template (called generic-switch) has already been created in the templates.cfg file. This allows you to add new router/switch host definitions in a simple manner.
The above-mentioned config files can be found in the /usr/local/nagios/etc/objects/ directory. You can modify the definitions in these and other config files to suit your needs better if you'd like. However, I'd recommend waiting until you're more familiar with configuring Nagios before doing so. For the time being, just follow the directions outlined below and you'll be monitoring your network routers/switches in no time.

Prerequisites

The first time you configure Nagios to monitor a network switch, you'll need to do a bit of extra work. Remember, you only need to do this for the *first* switch you monitor.

Edit the main Nagios config file.
vi /usr/local/nagios/etc/nagios.cfg

Remove the leading pound (#) sign from the following line in the main configuration file:
#cfg_file=/usr/local/nagios/etc/objects/switch.cfg

Save the file and exit.

What did you just do? You told Nagios to look to the /usr/local/nagios/etc/objects/switch.cfg file to find additional object definitions. That's where you'll be adding host and service definitions for routers and switches. That configuration file already contains some sample host, hostgroup, and service definitions. For the *first* router/switch you monitor, you can simply modify the sample host and service definitions in that file, rather than creating new ones.

Configuring Nagios

You'll need to create some object definitions in order to monitor a new router/switch.

Open the switch.cfg file for editing.
vi /usr/local/nagios/etc/objects/switch.cfg

Add a new host definition for the switch that you're going to monitor. If this is the *first* switch you're monitoring, you can simply modify the sample host definition in switch.cfg. Change the host_name, alias, and address fields to appropriate values for the switch.
define host{

 use  generic-switch  ; Inherit default values from a template

 host_name  linksys-srw224p  ; The name we're giving to this switch

 alias  Linksys SRW224P Switch ; A longer name associated with the switch

 address  192.168.1.253  ; IP address of the switch

 hostgroups allhosts,switches   ; Host groups this switch is associated with

 }

Monitoring Services

Now you can add some service definitions (to the same configuration file) to monitor different aspects of the switch. If this is the *first* switch you're monitoring, you can simply modify the sample service definition in switch.cfg.

Note: Replace "linksys-srw224p" in the example definitions below with the name you specified in the host_name directive of the host definition you just added.

Monitoring Packet Loss and RTA

Add the following service definition in order to monitor packet loss and round trip average between the Nagios host and the switch every 5 minutes under normal conditions.
define service{

 use   generic-service ; Inherit values from a template

 host_name   linksys-srw224p ; The name of the host the service is associated with

 service_description PING  ; The service description

 check_command  check_ping!200.0,20%!600.0,60% ; The command used to monitor the service

 normal_check_interval 5 ; Check the service every 5 minutes under normal conditions

 retry_check_interval 1 ; Re-check the service every minute until its final/hard state is determined

 }

This service will be:
  • CRITICAL if the round trip average (RTA) is greater than 600 milliseconds or the packet loss is 60% or more
  • WARNING if the RTA is greater than 200 ms or the packet loss is 20% or more
  • OK if the RTA is less than 200 ms and the packet loss is less than 20%
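
The three rules above can be written out as a small decision function, which may help when choosing thresholds of your own. This is only a sketch of the rules as listed, with the 200 ms / 20% and 600 ms / 60% values from the example hard-coded; it is not the check_ping plugin's own code, and ping_state is a hypothetical name. How the plugin treats a value that sits exactly on a threshold is not covered by the list, so this sketch simply follows the wording.

```shell
# ping_state RTA_MS LOSS_PERCENT   (hypothetical helper)
# Prints the state implied by the rules listed above for
# check_ping!200.0,20%!600.0,60%
ping_state() {
    awk -v rta="$1" -v loss="$2" 'BEGIN {
        if (rta > 600 || loss >= 60)      print "CRITICAL"
        else if (rta > 200 || loss >= 20) print "WARNING"
        else                              print "OK"
    }'
}

# Examples:
# ping_state 150 0    # prints OK
# ping_state 250 0    # prints WARNING
# ping_state 100 60   # prints CRITICAL
```

The CRITICAL test comes first so that a result which crosses both thresholds is reported at the more severe level.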
Monitoring SNMP Status Information

If your switch or router supports SNMP, you can monitor a lot of information by using the check_snmp plugin. If it doesn't, skip this section.

Add the following service definition to monitor the uptime of the switch.
define service{

 use   generic-service ; Inherit values from a template

 host_name   linksys-srw224p

 service_description Uptime 

 check_command  check_snmp!-C public -o sysUpTime.0

 }

In the check_command directive of the service definition above, the "-C public" tells the plugin that the SNMP community name to be used is "public" and the "-o sysUpTime.0" indicates which OID should be checked. If you want to ensure that a specific port/interface on the switch is in an up state, you could add a service definition like this:
define service{

 use   generic-service ; Inherit values from a template

 host_name   linksys-srw224p

 service_description Port 1 Link Status

 check_command  check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB

 }

In the example above, the "-o ifOperStatus.1" refers to the OID for the operational status of port 1 on the switch. The "-r 1" option tells the check_snmp plugin to return an OK state if "1" is found in the SNMP result (1 indicates an "up" state on the port) and CRITICAL if it isn't found. The "-m RFC1213-MIB" is optional and tells the check_snmp plugin to only load the "RFC1213-MIB" instead of every single MIB that's installed on your system, which can help speed things up.

That's it for the SNMP monitoring example. There are a million things that can be monitored via SNMP, so it's up to you to decide what you need and want to monitor. Good luck!

Tip: You can usually find the OIDs that can be monitored on a switch by running the following command (replace 192.168.1.253 with the IP address of the switch):
snmpwalk -v1 -c public 192.168.1.253 -m ALL .1

Monitoring Bandwidth / Traffic Rate

If you're monitoring bandwidth usage on your switches or routers using MRTG, you can have Nagios alert you when traffic rates exceed thresholds you specify. The check_mrtgtraf plugin (which is included in the Nagios plugins distribution) allows you to do this.

You'll need to let the check_mrtgtraf plugin know what log file the MRTG data is being stored in, along with thresholds, etc. In my example, I'm monitoring one of the ports on a Linksys switch. The MRTG log file is stored in /var/lib/mrtg/192.168.1.253_1.log. Here's the service definition I use to monitor the bandwidth data that's stored in the log file...
define service{

 use   generic-service ; Inherit values from a template

 host_name   linksys-srw224p

 service_description Port 1 Bandwidth Usage

 check_command  check_local_mrtgtraf!/var/lib/mrtg/192.168.1.253_1.log!AVG!1000000,2000000!5000000,5000000!10

 }

In the example above, the "/var/lib/mrtg/192.168.1.253_1.log" option that gets passed to the check_local_mrtgtraf command tells the plugin which MRTG log file to read from. The "AVG" option tells it that it should use average bandwidth statistics. The "1000000,2000000" option is the warning threshold pair (in bytes) for incoming and outgoing traffic rates, respectively. The "5000000,5000000" option is the corresponding critical threshold pair (in bytes) for incoming and outgoing traffic rates. The "10" option causes the plugin to return a CRITICAL state if the MRTG log file is older than 10 minutes (it should be updated every 5 minutes).

Save the file.

Restarting Nagios

Once you've added the new host and service definitions to the switch.cfg file, you're ready to start monitoring the router/switch. To do this, you'll need to verify your configuration and restart Nagios. If the verification process produces any error messages, fix your configuration file before continuing. Make sure that you don't (re)start Nagios until the verification process completes without any errors!

Monitoring Network Printers

Introduction

This document describes how you can monitor the status of networked printers. Specifically, it covers HP printers that have internal/external JetDirect cards/devices, and other print servers (like the Troy PocketPro 100S or the Netgear PS101) that support the JetDirect protocol. The check_hpjd plugin (which is part of the standard Nagios plugins distribution) allows you to monitor the status of JetDirect-capable printers which have SNMP enabled. The plugin is capable of detecting the following printer states:
  • Paper Jam
  • Out of Paper
  • Printer Offline
  • Intervention Required
  • Toner Low
  • Insufficient Memory
  • Open Door
  • Output Tray is Full
  • and more...
Note: These instructions assume that you've installed Nagios according to the quickstart guide. The sample configuration entries below reference objects that are defined in the sample config files (commands.cfg, templates.cfg, etc.) that are installed if you follow the quickstart.

Overview

Monitoring the status of a networked printer is pretty simple. JetDirect-enabled printers usually have SNMP enabled, which allows Nagios to monitor their status using the check_hpjd plugin.

The check_hpjd plugin will only get compiled and installed if you have the net-snmp and net-snmp-utils packages installed on your system. Make sure the plugin exists in /usr/local/nagios/libexec before you continue. If it doesn't, install net-snmp and net-snmp-utils and recompile/reinstall the Nagios plugins.

Steps

There are several steps you'll need to follow in order to monitor a new network printer. They are:
  1. Perform first-time prerequisites
  2. Create new host and service definitions for monitoring the printer
  3. Restart the Nagios daemon
What's Already Done For You

To make your life a bit easier, a few configuration tasks have already been done for you:
  • A check_hpjd command definition has been added to the commands.cfg file. This allows you to use the check_hpjd plugin to monitor network printers.
  • A printer host template (called generic-printer) has already been created in the templates.cfg file. This allows you to add new printer host definitions in a simple manner.
The above-mentioned config files can be found in the /usr/local/nagios/etc/objects/ directory. You can modify the definitions in these and other config files to suit your needs better if you'd like. However, I'd recommend waiting until you're more familiar with configuring Nagios before doing so. For the time being, just follow the directions outlined below and you'll be monitoring your network printers in no time.

Prerequisites

The first time you configure Nagios to monitor a network printer, you'll need to do a bit of extra work. Remember, you only need to do this for the *first* printer you monitor.

Edit the main Nagios config file.
vi /usr/local/nagios/etc/nagios.cfg

Remove the leading pound (#) sign from the following line in the main configuration file:
#cfg_file=/usr/local/nagios/etc/objects/printer.cfg
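
If you would rather script this change than make it in an editor, the following is a minimal sketch. The enable_printer_cfg function name is made up for this example. It keeps a .bak copy of the file and removes the leading "#" only from the printer.cfg line.

```shell
# Hypothetical helper: uncomment the printer.cfg line in the main config file.
# Usage: enable_printer_cfg [path-to-nagios.cfg]
enable_printer_cfg() {
    cfg=${1:-/usr/local/nagios/etc/nagios.cfg}
    cp -p "$cfg" "$cfg.bak" || return 1     # keep a copy of the original file
    # Rewrite the file in place from the backup, stripping "#" from one line only.
    sed 's|^#\(cfg_file=.*/objects/printer\.cfg\)$|\1|' "$cfg.bak" > "$cfg"
}
```

Other commented-out cfg_file lines are left alone, and the original file can be restored from nagios.cfg.bak if needed.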

Save the file and exit.

What did you just do? You told Nagios to look to the /usr/local/nagios/etc/objects/printer.cfg to find additional object definitions. That's where you'll be adding host and service definitions for the printer. That configuration file already contains some sample host, hostgroup, and service definitions. For the *first* printer you monitor, you can simply modify the sample host and service definitions in that file, rather than creating new ones.

Configuring Nagios

You'll need to create some object definitions in order to monitor a new printer.

Open the printer.cfg file for editing.
vi /usr/local/nagios/etc/objects/printer.cfg

Add a new host definition for the networked printer that you're going to monitor. If this is the *first* printer you're monitoring, you can simply modify the sample host definition in printer.cfg. Change the host_name, alias, and address fields to appropriate values for the printer.
define host{

 use  generic-printer ; Inherit default values from a template

 host_name  hplj2605dn ; The name we're giving to this printer

 alias  HP LaserJet 2605dn ; A longer name associated with the printer

 address  192.168.1.30 ; IP address of the printer

 hostgroups allhosts  ; Host groups this printer is associated with

 }

Now you can add some service definitions (to the same configuration file) to monitor different aspects of the printer. If this is the *first* printer you're monitoring, you can simply modify the sample service definition in printer.cfg.

Note: Replace "hplj2605dn" in the example definitions below with the name you specified in the host_name directive of the host definition you just added.

Add the following service definition to check the status of the printer. The service uses the check_hpjd plugin to check the status of the printer every 10 minutes by default. The SNMP community string used to query the printer is "public" in this example.
define service{

 use   generic-service  ; Inherit values from a template

 host_name   hplj2605dn  ; The name of the host the service is associated with

 service_description Printer Status  ; The service description

 check_command  check_hpjd!-C public ; The command used to monitor the service

 normal_check_interval 10 ; Check the service every 10 minutes under normal conditions

 retry_check_interval 1 ; Re-check the service every minute until its final/hard state is determined

 }
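
Before relying on the service, it can help to run the plugin by hand against the printer. The sketch below assumes the plugin directory named earlier (/usr/local/nagios/libexec) and the address and community string from this example. The run_hpjd_check function is a hypothetical wrapper, and the exit code 3 it returns when the plugin is missing follows the Nagios plugin convention for UNKNOWN.

```shell
# Hypothetical wrapper: run check_hpjd manually for one printer.
# Usage: run_hpjd_check <printer-address> <snmp-community> [plugin-dir]
run_hpjd_check() {
    plugin_dir=${3:-/usr/local/nagios/libexec}
    if [ ! -x "$plugin_dir/check_hpjd" ]; then
        echo "check_hpjd not found in $plugin_dir (are net-snmp and net-snmp-utils installed?)" >&2
        return 3
    fi
    "$plugin_dir/check_hpjd" -H "$1" -C "$2"
}
```

For the printer defined above the call would be run_hpjd_check 192.168.1.30 public. The plugin prints a one-line status, and its exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) is what Nagios uses to set the service state.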

Add the following service definition to ping the printer every 10 minutes by default. This is useful for monitoring RTA, packet loss, and general network connectivity.
define service{

        use                     generic-service

        host_name               hplj2605dn

        service_description     PING

        check_command           check_ping!3000.0,80%!5000.0,100%

        normal_check_interval   10

        retry_check_interval    1

        }

Save the file.

Restarting Nagios

Once you've added the new host and service definitions to the printer.cfg file, you're ready to start monitoring the printer. To do this, you'll need to verify your configuration and restart Nagios. If the verification process produces any error messages, fix your configuration file before continuing. Make sure that you don't (re)start Nagios until the verification process completes without any errors!

Upgrading Nagios

Upgrading From Previous Nagios 3.x Releases

As newer alpha, beta, and stable releases of Nagios 3.x are released, you should strongly consider upgrading as soon as possible. Newer releases usually contain critical bug fixes, so it's important to stay up to date. Assuming you've already installed Nagios from source code as described in the quickstart guide, you can install newer versions of Nagios 3.x easily. You don't even need root access to do it, as everything that needed to be done as root was done during the initial install. Here's the upgrade process...

Make sure you have a good backup of your existing Nagios installation and configuration files. If anything goes wrong or doesn't work, this will allow you to roll back to your old version.

Become the nagios user. Debian/Ubuntu users should use sudo -s -u nagios.
su -l nagios
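
The backup recommended before the upgrade can be sketched as a single tar archive of the installation directory. This is only one way to do it. The backup_nagios function name and the default archive name are assumptions made for this example.

```shell
# Hypothetical helper: archive the whole Nagios installation directory.
# Usage: backup_nagios [install-dir] [archive-path]
backup_nagios() {
    src=${1:-/usr/local/nagios}
    dest=${2:-$HOME/nagios-backup-$(date +%Y%m%d).tar.gz}
    # -C makes the paths inside the archive relative (nagios/etc/..., nagios/bin/...).
    tar czf "$dest" -C "$(dirname "$src")" "$(basename "$src")"
}
```

Keep the archive outside /usr/local/nagios so that later steps cannot overwrite it. You can list its contents with tar tzf to confirm that the configuration files are included.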

Remove the following old HTML files that were used by the web frontend. They have been replaced by PHP equivalents.
rm /usr/local/nagios/share/{main,side,index}.html
Download the source code tarball of the latest version of Nagios (visit http://www.nagios.org/download/ for the link to the latest version).
wget http://osdn.dl.sourceforge.net/sourceforge/nagios/nagios-3.x.tar.gz

Extract the Nagios source code tarball.
tar xzf nagios-3.x.tar.gz

cd nagios-3.x

Run the Nagios configure script, passing the name of the group used to control external command file permissions like so:
./configure --with-command-group=nagcmd

Compile the Nagios source code.
make all

Install updated binaries, documentation, and web interface. Your existing configuration files will not be overwritten by this step.
make install

Verify your configuration files. Correct any errors shown here before proceeding with the next step.
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Restart Nagios. Debian/Ubuntu users should use /etc/init.d/nagios restart.
/sbin/service nagios restart

That's it - you're done!

Upgrading From Nagios 2.x

It shouldn't be too difficult to upgrade from Nagios 2.x to Nagios 3. The upgrade is essentially the same as what is described above for upgrading to newer 3.x releases. You will, however, have to change your configuration files a bit so they work with Nagios 3:
  • The old service_reaper_frequency variable in the main config file has been renamed to check_result_reaper_frequency.
  • The old $NOTIFICATIONNUMBER$ macro has been deprecated in favor of new $HOSTNOTIFICATIONNUMBER$ and $SERVICENOTIFICATIONNUMBER$ macros.
  • The old parallelize directive in service definitions is now deprecated and no longer used, as all service checks are run in parallel.
  • The old aggregate_status_updates option has been removed. All status file updates are now aggregated at a minimum interval of 1 second.
  • Extended host and extended service definitions have been deprecated. They are still read and processed by Nagios, but it is recommended that you move the directives found in these definitions to your host and service definitions, respectively.
  • The old downtime_file file variable in the main config file is no longer supported, as scheduled downtime entries are now saved in the retention file. To preserve existing downtime entries, stop Nagios 2.x and append the contents of your old downtime file to the retention file.
  • The old comment_file file variable in the main config file is no longer supported, as comments are now saved in the retention file. To preserve existing comments, stop Nagios 2.x and append the contents of your old comment file to the retention file.
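
To locate the places in an existing 2.x configuration that the list above refers to, a recursive grep is usually enough. The following is a rough sketch. The find_nagios2_leftovers function is a made-up helper, the patterns are approximations of the items in the list, and the hostextinfo/serviceextinfo pattern assumes the usual object names for extended host and service definitions.

```shell
# Hypothetical helper: list config lines that use Nagios 2.x-only settings.
# Usage: find_nagios2_leftovers [config-dir]
find_nagios2_leftovers() {
    grep -rnE \
        -e '^[[:space:]]*(service_reaper_frequency|aggregate_status_updates|downtime_file|comment_file)=' \
        -e '^[[:space:]]*parallelize[[:space:]]' \
        -e '\$NOTIFICATIONNUMBER\$' \
        -e 'define[[:space:]]+(host|service)extinfo' \
        "${1:-/usr/local/nagios/etc}"
}
```

Each hit is printed as file:line:text. An empty result (grep exit status 1) suggests none of these directives are left, but the verification command shown earlier remains the authoritative check.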
Also make sure to read the "What's New" section of the documentation. It describes all the changes that were made to the Nagios 3 code since the latest stable release of Nagios 2.x. Quite a bit has changed, so make sure you read it over.

Upgrading From an RPM Installation

If you currently have an RPM- or Debian/Ubuntu APT package-based installation of Nagios and you would like to transition to installing Nagios from the official source code distribution, here's the basic process you should follow:
  1. Stop Nagios
  2. Back up your existing Nagios installation
    • Configuration files
      • Main config file (usually nagios.cfg)
      • Resource config file (usually resource.cfg)
      • CGI config file (usually cgi.cfg)
      • All your object definition files
    • Retention file (usually retention.dat)
    • Current Nagios log file (usually nagios.log)
    • Archived Nagios log files
  3. Uninstall the original RPM or APT package
  4. Install Nagios from source by following the quickstart guide
  5. Restore your original Nagios configuration files, retention file, and log files
  6. Verify your configuration and start Nagios
Note that different RPMs or APT packages may install Nagios in different ways and in different locations. Make sure you've backed up all your critical Nagios files before removing the original RPM or APT package, so you can revert if you encounter problems.