Monday, August 2, 2010

Nagios: Setting up Nagios on RHEL 5.3

Last week I decided to set up Nagios on my Linux box. I installed a fresh copy of RHEL in VirtualBox and everything went fine, so I decided to put the complete setup on my blog. Here it is: "A Complete Monitoring Tool for your Linux Box"

Here is my machine configuration:

[root@irc ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.3 (Tikanga)
[root@irc ~]#

[root@irc ~]# uname -arn
Linux irc.chatserver.com 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
[root@irc ~]#

1) Create Account Information 

Become the root user.


su -l


Create a new nagios user account and give it a password.


/usr/sbin/useradd -m nagios

passwd nagios


Create a new nagcmd group for allowing external commands to be submitted through the web interface. Add both the nagios user and the apache user to the group.


/usr/sbin/groupadd nagcmd

/usr/sbin/usermod -a -G nagcmd nagios

/usr/sbin/usermod -a -G nagcmd apache
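To confirm that both accounts actually picked up the new group, you can list their groups with `id -nG`. The `in_group` helper below is a hypothetical convenience function, not part of Nagios. It assumes the apache user already exists, which is the case once the httpd package is installed.

```shell
#!/bin/sh
# in_group USER GROUP: succeed (exit 0) if USER is a member of GROUP.
# Hypothetical helper; it relies only on `id -nG`, which prints a user's group names.
in_group() {
    glist=$(id -nG "$1" 2>/dev/null) || return 1   # unknown user -> failure
    for g in $glist; do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

# Usage on the Nagios server, after the usermod commands:
# in_group nagios nagcmd && echo "nagios is in nagcmd"
# in_group apache nagcmd && echo "apache is in nagcmd"
```

Group changes apply to new logins, so a shell that was already open as nagios will not see nagcmd until you log in again.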

2) Download Nagios and the Plugins

Create a directory for storing the downloads.


mkdir ~/downloads

cd ~/downloads


Download the source code tarballs of both Nagios and the Nagios plugins (visit http://www.nagios.org/download/ for links to the latest versions). These directions use Nagios 3.2.0 and Nagios Plugins 1.4.11.


wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.2.0.tar.gz

wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.11.tar.gz
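The version number shows up in the tarball name, the URL and the directory you `cd` into, so it is easy for the steps to drift apart when you upgrade. One option is to keep the versions in shell variables. This is a minimal sketch: the variable and function names are my own, and the URLs follow the same sourceforge paths as the wget lines above.

```shell
#!/bin/sh
# Hypothetical variables: change these two lines to move to newer releases.
NAGIOS_VER=3.2.0
PLUGINS_VER=1.4.11

# Print the two download URLs, built from the sourceforge paths used above.
nagios_urls() {
    echo "http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-${NAGIOS_VER}.tar.gz"
    echo "http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-${PLUGINS_VER}.tar.gz"
}

# Usage:
# cd ~/downloads
# for url in $(nagios_urls); do wget "$url"; done
# tar xzf "nagios-${NAGIOS_VER}.tar.gz" && cd "nagios-${NAGIOS_VER}"
```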


3) Compile and Install Nagios 

Extract the Nagios source code tarball.


cd ~/downloads

tar xzf nagios-3.2.0.tar.gz

cd nagios-3.2.0


Run the Nagios configure script, passing the name of the group you created earlier like so:


./configure --with-command-group=nagcmd


Compile the Nagios source code.


make all


Install binaries, init script, sample config files and set permissions on the external command directory.


make install

make install-init

make install-config

make install-commandmode
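If you rebuild often, the configure and make steps above can be chained so the sequence stops at the first failure instead of running `make install` on a broken build. This is a sketch only; `run`, `build_nagios` and the `DRY_RUN` switch (which just prints the commands) are hypothetical names, not part of the Nagios build system.

```shell
#!/bin/sh
# run CMD [ARGS...]: echo the command, then execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = 1 ] && return 0
    "$@"
}

# build_nagios: the same steps as above, in order, aborting on the first error.
build_nagios() {
    run ./configure --with-command-group=nagcmd || return 1
    for target in all install install-init install-config install-commandmode; do
        run make "$target" || return 1
    done
}

# Usage, from inside the extracted nagios-3.2.0 directory, as root:
# build_nagios && echo "Nagios installed under /usr/local/nagios"
```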


Don't start Nagios yet - there's still more that needs to be done...

4) Customize Configuration

Sample configuration files have now been installed in the /usr/local/nagios/etc directory. These sample files should work fine for getting started with Nagios. You'll need to make just one change before you proceed...

Edit the /usr/local/nagios/etc/objects/contacts.cfg config file with your favorite editor and change the email address associated with the nagiosadmin contact definition to the address you'd like to use for receiving alerts.


vi /usr/local/nagios/etc/objects/contacts.cfg
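If you would rather not open an editor, the change can be scripted with sed. This sketch assumes, as in the sample file, that the nagiosadmin contact is the only definition with an `email` directive; `set_contact_email` is a hypothetical helper and GNU sed is assumed for `-i` and `-r`.

```shell
#!/bin/sh
# set_contact_email FILE ADDRESS: replace the value of every "email" directive
# in FILE with ADDRESS, keeping a FILE.bak copy. Comments after the value stay.
set_contact_email() {
    # Escape characters that are special in a sed replacement (\, & and the | delimiter).
    addr=$(printf '%s' "$2" | sed 's/[\\&|]/\\&/g')
    sed -i.bak -r "s|^([[:space:]]*email[[:space:]]+)[^[:space:];]+|\1${addr}|" "$1"
}

# Usage:
# set_contact_email /usr/local/nagios/etc/objects/contacts.cfg you@example.com
```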


5) Configure the Web Interface 

Install the Nagios web config file in the Apache conf.d directory.


make install-webconf


Create a nagiosadmin account for logging into the Nagios web interface. Remember the password you assign to this account - you'll need it later.


htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
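The `-c` flag creates the file and replaces any existing one, so leave it off when you add more users later. To confirm the account was written, you can look for it in the file, which holds one `user:hash` entry per line. `has_htuser` is a hypothetical helper for illustration.

```shell
#!/bin/sh
# has_htuser FILE USER: succeed if the htpasswd FILE has an entry for USER.
has_htuser() {
    [ -r "$1" ] || return 1
    # Compare only the user-name field (before the first colon), as a whole line.
    cut -d: -f1 "$1" | grep -Fqx -- "$2"
}

# Usage:
# has_htuser /usr/local/nagios/etc/htpasswd.users nagiosadmin && echo "nagiosadmin exists"
```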


Restart Apache to make the new settings take effect.


service httpd restart


Note: Consider implementing the enhanced CGI security measures described in the Nagios documentation to ensure that your web authentication credentials are not compromised.

6) Compile and Install the Nagios Plugins

Extract the Nagios plugins source code tarball.


cd ~/downloads

tar xzf nagios-plugins-1.4.11.tar.gz

cd nagios-plugins-1.4.11


Compile and install the plugins.


./configure --with-nagios-user=nagios --with-nagios-group=nagios

make

make install
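A quick way to confirm the plugins were installed is to run one by hand from /usr/local/nagios/libexec, the default install location. Plugins report their result to Nagios through the exit status (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus a line of text. The two helpers below are hypothetical and simply translate that exit status.

```shell
#!/bin/sh
# plugin_state CODE: map a plugin exit status to the Nagios state name.
plugin_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "out of range ($1)" ;;   # not part of the plugin convention
    esac
}

# run_plugin CMD [ARGS...]: run a plugin, let its own output through,
# then print the state derived from its exit status.
run_plugin() {
    "$@"
    plugin_state "$?"
}

# Usage (thresholds taken from the PING service in localhost.cfg):
# run_plugin /usr/local/nagios/libexec/check_ping -H 127.0.0.1 -w 100.0,20% -c 500.0,60%
```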


7) Start Nagios 

Add Nagios to the list of system services and have it automatically start when the system boots.


chkconfig --add nagios

chkconfig nagios on


Verify the sample Nagios configuration files.


/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg


If there are no errors, start Nagios.


service nagios start
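The "if there are no errors" check can also be scripted by reading the summary that `nagios -v` prints at the end of its pre-flight check. This sketch assumes the summary contains a line of the form `Total Errors:   0`, which is what I'd expect from the 3.x series; compare with your own output before relying on it. Both function names are hypothetical.

```shell
#!/bin/sh
# preflight_errors: read `nagios -v` output on stdin and print the count from
# its "Total Errors:" line (prints nothing if that line is missing).
preflight_errors() {
    awk -F: '/^Total Errors:/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# start_if_clean: start Nagios only when the pre-flight check reports zero errors.
start_if_clean() {
    errors=$(/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | preflight_errors)
    if [ "$errors" = 0 ]; then
        service nagios start
    else
        echo "pre-flight check reported errors: ${errors:-unknown}" >&2
        return 1
    fi
}
```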


8) Modify SELinux Settings 

RHEL ships with SELinux (Security Enhanced Linux) installed and in Enforcing mode by default. This can result in "Internal Server Error" messages when you attempt to access the Nagios CGIs.

See if SELinux is in Enforcing mode.


getenforce


Put SELinux into Permissive mode.


setenforce 0


To make this change permanent, you'll have to modify the settings in /etc/selinux/config and reboot.
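The permanent setting is the `SELINUX=` line in /etc/selinux/config. A minimal sketch of that edit follows; `set_selinux_mode` is a hypothetical helper, GNU sed is assumed for `-i`, and the new mode only takes effect at the next boot.

```shell
#!/bin/sh
# set_selinux_mode FILE MODE: set the SELINUX= line in FILE to MODE,
# keeping FILE.bak. SELINUXTYPE= is left alone because it never matches "^SELINUX=".
set_selinux_mode() {
    case "$2" in
        enforcing|permissive|disabled) ;;
        *) echo "invalid mode: $2" >&2; return 1 ;;
    esac
    sed -i.bak "s/^SELINUX=.*/SELINUX=$2/" "$1"
}

# Usage:
# set_selinux_mode /etc/selinux/config permissive
```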

Instead of disabling SELinux or setting it to permissive mode, you can use the following command to run the CGIs under SELinux enforcing/targeted mode:


chcon -R -t httpd_sys_content_t /usr/local/nagios/sbin/

chcon -R -t httpd_sys_content_t /usr/local/nagios/share/


For information on running the Nagios CGIs under Enforcing mode with a targeted policy, visit the Nagios Support Portal or Nagios Community Wiki.

9) Log in to the Web Interface 

You should now be able to access the Nagios web interface at the URL below. You'll be prompted for the username (nagiosadmin) and password you specified earlier.


http://localhost/nagios/


Click on the "Service Detail" navbar link to see details of what's being monitored on your local machine. It will take a few minutes for Nagios to check all the services associated with your machine, as the checks are spread out over time.

10) Other Modifications 

Make sure your machine's firewall rules are configured to allow access to the web server if you want to access the Nagios interface remotely.

Configuring email notifications is out of the scope of this post. While Nagios is now configured to send you email notifications, your system may not yet have a mail program properly installed or configured. Refer to your system documentation, search the web, or look to the Nagios Support Portal or Nagios Community Wiki for specific instructions on configuring your system to send email messages to external addresses. More information on notifications can be found in the Nagios documentation.

11) You're Done

Congratulations! You have successfully installed Nagios. Your journey into monitoring is just beginning.


Example:

Say your Nagios server is 10.14.236.140 and you need to monitor a Linux machine with the IP address 10.14.236.70. Proceed as follows:

[root@irc objects]# pwd
/usr/local/nagios/etc/objects
[root@irc objects]#
[root@irc objects]# ls
commands.cfg localhost.cfg printer.cfg switch.cfg timeperiods.cfg
contacts.cfg localhost.cfg.orig remotehost.cfg templates.cfg windows.cfg
[root@irc objects]#

After editing, the localhost.cfg file should look like this:


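###############################################################################
###############################################################################
#
# HOST DEFINITION
#
###############################################################################
###############################################################################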

# Define a host for the local machine

define host{
use linux-server ; Name of host template to use
; This host definition will inherit all variables that are defined
; in (or inherited by) the linux-server host template definition.
host_name localhost
alias localhost
address 127.0.0.1
}

define host{
use linux-server ; Name of host template to use
; This host definition will inherit all variables that are defined
; in (or inherited by) the linux-server host template definition.
host_name ideath.logic.com
alias ideath
address 10.14.236.70
}


###############################################################################
###############################################################################
#
# HOST GROUP DEFINITION
#
###############################################################################
###############################################################################

# Define an optional hostgroup for Linux machines

define hostgroup{
hostgroup_name linux-server ; The name of the hostgroup
alias Linux Servers ; Long name of the group
members localhost ; Comma separated list of hosts that belong to this group
}



###############################################################################
###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################
###############################################################################


# Define a service to "ping" the local machine

define service{
use local-service ; Name of service template to use
host_name localhost
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}

define service{
use local-service ; Name of service template to use
host_name ideath.logic.com
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}

# Define a service to check the disk space of the root partition
# on the local machine. Warning if < 20% free, critical if
# < 10% free space on partition.

define service{
use local-service ; Name of service template to use
host_name localhost
service_description Root Partition
check_command check_local_disk!20%!10%!/
}

define service{
use local-service ; Name of service template to use
host_name ideath.logic.com
service_description Root Partition
check_command check_local_disk!20%!10%!/
}

# Define a service to check the number of currently logged in
# users on the local machine. Warning if > 20 users, critical
# if > 50 users.

define service{
use local-service ; Name of service template to use
host_name localhost
service_description Current Users
check_command check_local_users!20!50
}

define service{
use local-service ; Name of service template to use
host_name ideath.logic.com
service_description Current Users
check_command check_local_users!20!50
}


# Define a service to check the number of currently running procs
# on the local machine. Warning if > 250 processes, critical if
# > 400 processes.

define service{
use local-service ; Name of service template to use
host_name localhost
service_description Total Processes
check_command check_local_procs!250!400!RSZDT
}


define service{
use local-service ; Name of service template to use
host_name ideath.logic.com
service_description Total Processes
check_command check_local_procs!250!400!RSZDT
}
# Define a service to check the load on the local machine.

define service{
use local-service ; Name of service template to use
host_name localhost
service_description Current Load
check_command check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
}

define service{
use local-service ; Name of service template to use
host_name ideath.logic.com
service_description Current Load
check_command check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
}

# Define a service to check the swap usage on the local machine.
# Critical if less than 10% of swap is free, warning if less than 20% is free

define service{
use local-service ; Name of service template to use
host_name localhost
service_description Swap Usage
check_command check_local_swap!20!10
}

define service{
use local-service ; Name of service template to use
host_name ideath.logic.com
service_description Swap Usage
check_command check_local_swap!20!10
}

# Define a service to check SSH on the local machine.
# Disable notifications for this service by default, as not all users may have SSH enabled.

define service{
use local-service ; Name of service template to use
host_name localhost
service_description SSH
check_command check_ssh
notifications_enabled 0
}

define service{
use local-service ; Name of service template to use
host_name ideath.logic.com
service_description SSH
check_command check_ssh
check_period 24x7
notifications_enabled 0
is_volatile 0
max_check_attempts 4
normal_check_interval 5
retry_check_interval 1
contact_groups admins
notification_options w,c,u,r
notification_interval 960
notification_period 24x7
}



# Define a service to check HTTP on the local machine.
# Disable notifications for this service by default, as not all users may have HTTP enabled.

define service{
use local-service ; Name of service template to use
host_name localhost
service_description HTTP
check_command check_http
notifications_enabled 0
}

define service{
use local-service ; Name of service template to use
host_name ideath.logic.com
service_description HTTP
check_command check_http
notifications_enabled 0
is_volatile 0
max_check_attempts 4
normal_check_interval 5
retry_check_interval 1
contact_groups admins
notification_options w,c,u,r
notification_interval 960
notification_period 24x7
}


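ideath.logic.com is the hostname of 10.14.236.70. If the Nagios server cannot resolve that name, add an entry for it to /etc/hosts (or check your DNS). After saving the file, verify the configuration with nagios -v as in step 7 and restart Nagios with service nagios restart.

Note that the check_local_* commands (disk, users, processes, load and swap) run on the Nagios server itself, so the copies of those services attached to ideath.logic.com actually report the Nagios server's own values. PING, SSH and HTTP are checked over the network against the host's address. Checking disk, load and the like on the remote machine requires an agent such as NRPE.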
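Adding each host by copy and paste is where typos in host_name creep in. One alternative is to generate the definitions from a single set of values. This is a sketch with hypothetical function names; it prints definitions in the same style as the file above and assumes the linux-server and local-service templates from the sample templates.cfg.

```shell
#!/bin/sh
# nagios_host NAME ALIAS ADDRESS: print a host definition.
nagios_host() {
    cat <<EOF
define host{
use linux-server ; Name of host template to use
host_name $1
alias $2
address $3
}
EOF
}

# nagios_service HOST DESCRIPTION CHECK_COMMAND: print a service for HOST.
nagios_service() {
    cat <<EOF
define service{
use local-service ; Name of service template to use
host_name $1
service_description $2
check_command $3
}
EOF
}

# Usage: append a host and its PING service, then verify and restart.
# {
#   nagios_host ideath.logic.com ideath 10.14.236.70
#   nagios_service ideath.logic.com PING 'check_ping!100.0,20%!500.0,60%'
# } >> /usr/local/nagios/etc/objects/localhost.cfg
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg && service nagios restart
```

Quote the check_command argument in single quotes so the shell passes the `!` and `%` characters through unchanged.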