Wednesday, November 16, 2011

Installing and configuring Graphite

Here are some notes I jotted down while installing and configuring Graphite, which isn't a trivial task, although the official documentation isn't too bad. The next step is to turn them into a Chef recipe. These instructions apply to Ubuntu 10.04 32-bit with Python 2.6.5 so YMMV.

Install pre-requisites

# apt-get install python-setuptools
# apt-get install python-memcache python-sqlite
# apt-get install apache2 libapache2-mod-python pkg-config
# easy_install-2.6 django

Install pixman, cairo and pycairo

# wget http://cairographics.org/releases/pixman-0.20.2.tar.gz
# tar xvfz pixman-0.20.2.tar.gz
# cd pixman-0.20.2
# ./configure; make; make install

# wget http://cairographics.org/releases/cairo-1.10.2.tar.gz
# tar xvfz cairo-1.10.2.tar.gz
# cd cairo-1.10.2
# ./configure; make; make install

BTW, the pycairo install was the funkiest I've seen so far for a Python package, and that says a lot:

# wget http://cairographics.org/releases/py2cairo-1.8.10.tar.gz
# tar xvfz py2cairo-1.8.10.tar.gz
# cd pycairo-1.8.10
# ./configure --prefix=/usr
# make; make install
# echo ‘/usr/local/lib’ > /etc/ld.so.conf.d/pycairo.conf
# ldconfig

Install graphite packages (carbon, whisper, graphite webapp)

# wget http://launchpad.net/graphite/0.9/0.9.8/+download/graphite-web-0.9.8.tar.gz
# wget http://launchpad.net/graphite/0.9/0.9.8/+download/carbon-0.9.8.tar.gz
# wget http://launchpad.net/graphite/0.9/0.9.8/+download/whisper-0.9.8.tar.gz

# tar xvfz whisper-0.9.8.tar.gz
# cd whisper-0.9.8
# python setup.py install

# tar xvfz carbon-0.9.8.tar.gz
# cd carbon-0.9.8
# python setup.py install
# cd /opt/graphite/conf
# cp carbon.conf.example carbon.conf
# cp storage-schemas.conf.example storage-schemas.conf

# tar xvfz graphite-web-0.9.8.tar.gz
# cd graphite-web-0.9.8
# python check-dependencies.py
# python setup.py install

Configure Apache virtual host for graphite webapp

Although the Graphite source distribution comes with an example vhost configuration for Apache, it didn't quite work for me. Here's what ended up working -- many thanks to my colleague Marco Garcia for figuring this out.

# cd /etc/apache2/sites-available/
# cat graphite

        
ServerName graphite.mysite.com        
DocumentRoot "/opt/graphite/webapp"        
ErrorLog /opt/graphite/storage/log/webapp/error.log        
CustomLog /opt/graphite/storage/log/webapp/access.log common        
                
SetHandler python-program                
PythonPath "['/opt/graphite/webapp'] + sys.path"                
PythonHandler django.core.handlers.modpython                
SetEnv DJANGO_SETTINGS_MODULE graphite.settings                
PythonDebug Off                
PythonAutoReload Off                

SetHandler None        

SetHandler None
Alias /media/ "/usr/local/lib/python2.6/dist-packages/Django-1.3-py2.6.egg/django/contrib/admin/media/"

# cd /etc/apache2/sites-enabled/
# ln -s ../sites-available/graphite 001-graphite

Make sure mod_python is enabled:

# ls -la /etc/apache2/mods-enabled/python.load

Create Django database for graphite webapp

# cd /opt/graphite/webapp/graphite
# python manage.py syncdb

Apply permissions on storage directory

# chown -R www-data:www-data /opt/graphite/storage/

Restart Apache

# service apache2 restart

Start data collection server (carbon-cache)

# cd /opt/graphite/bin
# ./carbon-cache.py start

At this point, if you go to graphite.mysite.com, you should see the dashboard of the Graphite web app.

Test data collection

The Graphite source distribution comes with an example client written in Python that sends data to the Carbon collecting server every minute. You can find it in graphite-web-0.9.8/examples/example-client.py.

Sending data is very easy -- like we say in Devops, just open a socket!

import sys
import time
import os
import platform
import subprocess
from socket import socket

CARBON_SERVER = '127.0.0.1'
CARBON_PORT = 2003
delay = 60 
if len(sys.argv) > 1:  
    delay = int( sys.argv[1] )

def get_loadavg():    
    # For more details, "man proc" and "man uptime"      
        if platform.system() == "Linux":
            return open('/proc/loadavg').read().strip().split()[:3]    
        else:
            command = "uptime"
            process = subprocess.Popen(command, stdout=subprocess.PIPE, shell=True)                      
            os.waitpid(process.pid, 0)
            output = process.stdout.read().replace(',', ' ').strip().split()          
            length = len(output)
            return output[length - 3:length]


sock = socket()
try:
    sock.connect((CARBON_SERVER,CARBON_PORT))
except:
    print "Couldn't connect to %(server)s on port %(port)d" % {'server':CARBON_SERVER, 'port':CARBON_PORT}
    sys.exit(1)

while True:
    now = int( time.time() )
    lines = []
    # We're gonna report all three loadavg values
    loadavg = get_loadavg()
    lines.append("system.loadavg_1min %s %d" % (loadavg[0],now))    
    lines.append("system.loadavg_5min %s %d" % (loadavg[1],now))    
    lines.append("system.loadavg_15min %s %d" % (loadavg[2],now))    

    message = '\n'.join(lines) + '\n' 
    #all lines must end in a newline
    print "sending message\n"
    print '-' * 80
    print message
    print
    sock.sendall(message)
    time.sleep(delay)

Some observations about the above code snippet:

the format of a message to be sent to a Graphite/Carbon server is very simple: "metric_path value timestamp\n"
metric_path is a completely arbitrary name -- it is a string containing substrings delimited by dots. Think of it as an SNMP OID, where the most general name is at the left and the most specific is at the right

in the example above, the 3 metric_path strings are system.loadavg_1min, system.loadavg_5min and system.loadavg_15min

Establish retention policies

This is explained very well in the 'Getting your data into Graphite' portion of the docs. What you want to do is to specify a retention configuration for each set of metrics that you send to Graphite. This is accomplished by editing the /opt/graphite/storage/schemas file. For the example above which send the load average for 1, 5 and 15 min to Graphite every minute, we can specify the following retention policy:

[loadavg]

priority = 100

pattern = ^system\.loadavg*

retentions = 60:43200,900:350400

This tells graphite that all metric_paths starting with system.loadavg should be stored with a retention policy that keeps per minute (60 seconds) precision data for 30 days(43,200 seconds), and per-15 min (900 sec) precision data for 10 years (350,400 seconds).

Go wild with stats!

At this point, if you run the example client, you should be able to go to the Graphite dashboard and expand the Graphite->system path and see the 3 metrics being captured: loadavg_1min, loadavg_5min and loadavg_15min. Clicking on each one will populate the graph with the corresponding data line. If you're logged in into the dashboard, you can also save a given graph.

The sky is the limit at this point in terms of the data you can capture and visualize with Graphite. As an example, I parse a common maillog file that captures all email sent out through our system. I 'tail' the file every minute and I count how many message were sent out total, and per mail server in our mail cluster. I send this data to Graphite and I watch it in near-realtime (the retention policy in my case is similar to the loadavg one above).

Here's how the Graphite graph looks like:

FAI - Fully Automatic Installation

It's a tool for unattended mass deployment of Linux. You can take one or more virgin PC's, turn on the power, and after a few minutes, the systems are installed, and completely configured to your exact needs, without any interaction necessary.

Features

Installs and updates Debian, Ubuntu, CentOS, RHEL, SuSe, ...
Centralized deployment and configuration management
Installs XEN domains, VirtualBox and Vserver
Easy set up of software RAID and LVM
Full remote control via ssh during installation
Integrated disaster recovery system
Every stage can be customized via hooks

What is FAI? Main Features

A tool for automated unattended installation. Lazy system administrators like it.
Remote network installation of different Linux flavors
Easy-to-use centralized management system for your Linux deployment.
It's fast. It only takes a few minutes for a complete installation.
Scalable. FAI users manage their computer infrastructures starting from a few computers up to several thousands of machines.
Different hardware and different configuration requirements are easy to establish using FAI. You do not need to repeat information that is shared among several machines.
Using the FAI class concept, you can group a bunch of similar machines.
Installation targets: desktops, servers, notebooks, Beowulf cluster, rendering or web server farm, Linux laboratory or classroom.
Linux rollout, mass installation and automated server provisioning are additional topics of FAI.
FAI is lightweight. No special daemons are running, no database setup is needed. It's architecture independent, since it consists only of shell, Perl and Cfengine scripts.
Besides initial installations, it is used for daily maintenance, and can set up chroot environments.
Compared to tools like kickstart or cobbler for Red Hat, autoyast for SUSE or Jumpstart for SUN Solaris, FAI is much more flexible. You can tune every small part of your configuration to your local needs using hooks.
More technical information in the flyer and poster

FAI Installation Steps

Network boot via PXE
Receive configuration data via HTTP, NFS, svn or git
Run scripts to determine FAI classes and variables
Partition local hard disks and create RAID, LVM configuration and the file systems
Install and configure software packages
Customize OS and software to your local needs
Reboot freshly installed machine

http://fai-project.org/features/

Top 5 Open Source Linux Server Provisioning Software

Server provisioning is nothing but load the Linux or UNIX like operating systems automatically with actual operating systems, device drivers, data, and make a server ready for network operation without any user input. Typically you select a server from a pool of available servers, load the operating systems (such as RHEL, Fedora, FreeBSD, Debian), and finally customize storage, network (IP, gateway, bounding etc), drivers, applications, users etc. Using the following tools you can perform automated unattended operating system installation, configuration, set virtual machines and much more. These software can be used to install a lot (say thousands) of Linux and UNIX systems at the same time.

Kickstart

From the official Redhat guide:

Many system administrators would prefer to use an automated installation method to install Red Hat / CentOS / Fedora Linux on their machines. To answer this need, Red Hat created the kickstart installation method. Using kickstart, a system administrator can create a single file containing the answers to all the questions that would normally be asked during a typical Red Hat Linux installation. Kickstart provides a way for users to automate a Red Hat Enterprise Linux installation.

Kickstart Configurator allows you to create or modify a kickstart file using a graphical user interface, so that you do not have to remember the correct syntax of the file.

Fig.01: RHEL - Kickstart Configurator

Kickstart Installations Guide
KVM: Install CentOS / RHEL Using Kickstart File (Automated Installation)

Fully Automatic Installation (FAI)

FAI is a non-interactive system to install, customize and manage Linux systems and software configurations on computers as well as virtual machines and chroot environments, from small networks to large-scale infrastructures and clusters. It is a tool for fully automatic installation of Debian and other Linux Distributions such as Suse, Redhat, Solaris via network, custom install cd, or into a chroot environment. Some people also use it to install Windows.

FAI Features

Installs and updates Debian, Ubuntu, SuSe, RHEL, CentOS, Fedora, Mandriva, Solaris, etc
Centralized deployment and configuration management
Integrated disaster recovery system
Easy set up of software RAID and LVM
Installs XEN domains, VirtualBox and Vserve
Every stage can be customized via hooks
Full remote control via ssh during installation

See the official project website and wiki for more information.

Cobbler

Cobbler is a Linux provisioning server that centralizes and simplifies control of services including DHCP, TFTP, and DNS for the purpose of performing network-based operating systems installs. It can be configured for PXE, reinstallations, and virtualized guests using Xen, KVM or VMware. Again it is mainly used by Redhat and friends, but you can configure a PXE server to boot various non-RPM boot images such as Knoppix and other flavors of Debian such as Ubuntu.

There is also a lightweight built-in configuration management system, as well as support for integrating with configuration management systems like Puppet. Cobbler has a command line interface, a web interface, and also several API access options.

Fig.02: Cobbler WebUI (image credit: Fedora project)

See the official Cobbler project home page and wiki for more information.

Spacewalk

From the official website:

Spacewalk is an open source (GPLv2) Linux systems management solution. It is the upstream community project from which the Red Hat Network Satellite product is derived. Spacewalk manages software content updates for Red Hat derived distributions such as Fedora, CentOS, and Scientific Linux, within your firewall. You can stage software content through different environments, managing the deployment of updates to systems and allowing you to view at which update level any given system is at across your deployment. A clean central web interface allows viewing of systems and their software update status, and initiating update actions.

Features:

Inventory your systems (hardware and software information)
Install and update software on your systems
Collect and distribute your custom software packages into manageable groups
Provision (kickstart) your systems
Manage and deploy configuration files to your systems
Monitor your systems
Provision and start/stop/configure virtual guests
Distribute content across multiple geographical sites in an efficient manner.

Fig.03: Spacewalk Server Provisioning System

See the official project website for more information.

OpenQRM

From the official website:

openQRM is the next generation, open-source Data-center management platform. Its fully pluggable architecture focuses on automatic, rapid- and appliance-based deployment, monitoring, high-availability, cloud computing and especially on supporting and conforming multiple virtualization technologies. openQRM is a single-management console for the complete IT-infra structure and provides a well defined API which can be used to integrate third-party tools as additional plugins.

Features

Complete separation of "hardware" (physical servers and virtual machines) from "software" (server-images)
Support for different virtualization technologies
Fully automatic Nagios configuration (single click) to monitor all systems and services
High-availability : "N to 1" fail-over
Integrated storage management
Distribution support - openQRM 4.x comes with a solid support for different linux distribution like Debian, Ubuntu, CentOS and openSuse. A single openQRM server can manage the provisioning of servers from those different linux distributions seamlessly.

Fig.04: OpenQRM Dashboard (image credit: OpenQRM project)

See the official project website for more information.

DIY: Provisioning Server

You can build your own server using PXE, TFTP server, and DHCP software. PXE allows you to boot up a system and have it automatically get an IP address via DHCP and start booting a kernel over the network. See the following articles for more information:

Conclusion

There are many proprietary software solutions available to automate the provisioning of servers, services and end-user devices from vendors such as BladeLogic, IBM, or HP. But open source software gives you more freedom to automate the installation of the Linux server. Some of the above software support UNIX and Windows operating systems too.

I'm wondering if you use Server Provisioning Software regularly. Drop your discussion below and share what works for you in the comments.