Wednesday, November 16, 2011

Installing and configuring Graphite

Here are some notes I jotted down while installing and configuring Graphite, which isn't a trivial task, although the official documentation isn't too bad. The next step is to turn them into a Chef recipe. These instructions apply to Ubuntu 10.04 32-bit with Python 2.6.5 so YMMV.

Install pre-requisites

# apt-get install python-setuptools
# apt-get install python-memcache python-sqlite
# apt-get install apache2 libapache2-mod-python pkg-config
# easy_install-2.6 django

Install pixman, cairo and pycairo

# wget http://cairographics.org/releases/pixman-0.20.2.tar.gz
# tar xvfz pixman-0.20.2.tar.gz
# cd pixman-0.20.2
# ./configure; make; make install

# wget http://cairographics.org/releases/cairo-1.10.2.tar.gz
# tar xvfz cairo-1.10.2.tar.gz
# cd cairo-1.10.2
# ./configure; make; make install

BTW, the pycairo install was the funkiest I've seen so far for a Python package, and that says a lot:

# wget http://cairographics.org/releases/py2cairo-1.8.10.tar.gz
# tar xvfz py2cairo-1.8.10.tar.gz
# cd pycairo-1.8.10
# ./configure --prefix=/usr
# make; make install
# echo '/usr/local/lib' > /etc/ld.so.conf.d/pycairo.conf
# ldconfig

Install graphite packages (carbon, whisper, graphite webapp)

# wget http://launchpad.net/graphite/0.9/0.9.8/+download/graphite-web-0.9.8.tar.gz
# wget http://launchpad.net/graphite/0.9/0.9.8/+download/carbon-0.9.8.tar.gz
# wget http://launchpad.net/graphite/0.9/0.9.8/+download/whisper-0.9.8.tar.gz

# tar xvfz whisper-0.9.8.tar.gz
# cd whisper-0.9.8
# python setup.py install

# tar xvfz carbon-0.9.8.tar.gz
# cd carbon-0.9.8
# python setup.py install
# cd /opt/graphite/conf
# cp carbon.conf.example carbon.conf
# cp storage-schemas.conf.example storage-schemas.conf

# tar xvfz graphite-web-0.9.8.tar.gz
# cd graphite-web-0.9.8
# python check-dependencies.py
# python setup.py install

Configure Apache virtual host for graphite webapp

Although the Graphite source distribution comes with an example vhost configuration for Apache, it didn't quite work for me. Here's what ended up working -- many thanks to my colleague Marco Garcia for figuring this out.
# cd /etc/apache2/sites-available/
# cat graphite
<VirtualHost *:80>
        ServerName graphite.mysite.com
        DocumentRoot "/opt/graphite/webapp"
        ErrorLog /opt/graphite/storage/log/webapp/error.log
        CustomLog /opt/graphite/storage/log/webapp/access.log common

        <Location "/">
                SetHandler python-program
                PythonPath "['/opt/graphite/webapp'] + sys.path"
                PythonHandler django.core.handlers.modpython
                SetEnv DJANGO_SETTINGS_MODULE graphite.settings
                PythonDebug Off
                PythonAutoReload Off
        </Location>

        <Location "/content/">
                SetHandler None
        </Location>

        <Location "/media/">
                SetHandler None
        </Location>

        Alias /media/ "/usr/local/lib/python2.6/dist-packages/Django-1.3-py2.6.egg/django/contrib/admin/media/"
</VirtualHost>

# cd /etc/apache2/sites-enabled/
# ln -s ../sites-available/graphite 001-graphite

Make sure mod_python is enabled:

# ls -la /etc/apache2/mods-enabled/python.load

Create Django database for graphite webapp

# cd /opt/graphite/webapp/graphite
# python manage.py syncdb

Apply permissions on storage directory

# chown -R www-data:www-data /opt/graphite/storage/

Restart Apache

# service apache2 restart

Start data collection server (carbon-cache)

# cd /opt/graphite/bin
# ./carbon-cache.py start
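
Before moving on, it can help to confirm that carbon-cache is actually accepting connections. Here is a minimal sketch in Python. It assumes carbon listens on localhost port 2003 (the port the example client further down uses), and the helper name port_is_open is mine, not part of Graphite:

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Connection refused, unreachable host, or timeout
        return False
    sock.close()
    return True


# Example (assumed host and port):
#   port_is_open("127.0.0.1", 2003)
```

If this returns False right after starting carbon-cache, the log files under /opt/graphite/storage/log are the first place I would look.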

At this point, if you go to graphite.mysite.com, you should see the dashboard of the Graphite web app.

Test data collection

The Graphite source distribution comes with an example client written in Python that sends data to the Carbon collecting server every minute. You can find it in graphite-web-0.9.8/examples/example-client.py.

Sending data is very easy -- like we say in Devops, just open a socket!

import sys
import time
import os
import platform
import subprocess
from socket import socket

CARBON_SERVER = '127.0.0.1'
CARBON_PORT = 2003
delay = 60 
if len(sys.argv) > 1:  
    delay = int( sys.argv[1] )

def get_loadavg():
    # For more details, "man proc" and "man uptime"
    if platform.system() == "Linux":
        return open('/proc/loadavg').read().strip().split()[:3]
    else:
        command = "uptime"
        process = subprocess.Popen(command, stdout=subprocess.PIPE, shell=True)
        os.waitpid(process.pid, 0)
        output = process.stdout.read().replace(',', ' ').strip().split()
        length = len(output)
        return output[length - 3:length]


sock = socket()
try:
    sock.connect((CARBON_SERVER,CARBON_PORT))
except:
    print "Couldn't connect to %(server)s on port %(port)d" % {'server':CARBON_SERVER, 'port':CARBON_PORT}
    sys.exit(1)

while True:
    now = int( time.time() )
    lines = []
    # We're gonna report all three loadavg values
    loadavg = get_loadavg()
    lines.append("system.loadavg_1min %s %d" % (loadavg[0],now))    
    lines.append("system.loadavg_5min %s %d" % (loadavg[1],now))    
    lines.append("system.loadavg_15min %s %d" % (loadavg[2],now))    

    message = '\n'.join(lines) + '\n' 
    #all lines must end in a newline
    print "sending message\n"
    print '-' * 80
    print message
    print
    sock.sendall(message)
    time.sleep(delay)


Some observations about the above code snippet:
  • the format of a message to be sent to a Graphite/Carbon server is very simple: "metric_path value timestamp\n"
  • metric_path is a completely arbitrary name -- it is a string containing substrings delimited by dots. Think of it as an SNMP OID, where the most general name is at the left and the most specific is at the right
    • in the example above, the 3 metric_path strings are system.loadavg_1min, system.loadavg_5min and system.loadavg_15min
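
The message format described above is simple enough to wrap in a small helper. This is just a sketch: the function names format_metric and format_batch are my own, and the whitespace check is an assumption about what makes a sane metric_path, not something Carbon documents:

```python
import time


def format_metric(metric_path, value, timestamp=None):
    """Build one plaintext line: 'metric_path value timestamp\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    # Whitespace inside the path would break the space-delimited format
    if not metric_path or any(c.isspace() for c in metric_path):
        raise ValueError("metric_path must be non-empty and contain no whitespace")
    return "%s %s %d\n" % (metric_path, value, timestamp)


def format_batch(metrics, timestamp=None):
    """Join several (metric_path, value) pairs into one message."""
    if timestamp is None:
        timestamp = int(time.time())
    return "".join(format_metric(path, value, timestamp) for path, value in metrics)
```

The string returned by format_batch can be handed to sock.sendall() the same way the example client does (on Python 3 you would need to encode it to bytes first).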

Establish retention policies

This is explained very well in the 'Getting your data into Graphite' portion of the docs. What you want to do is to specify a retention configuration for each set of metrics that you send to Graphite. This is accomplished by editing the /opt/graphite/conf/storage-schemas.conf file (the one copied from storage-schemas.conf.example during the carbon install above). For the example above, which sends the load average for 1, 5 and 15 min to Graphite every minute, we can specify the following retention policy:

[loadavg]
priority = 100
pattern = ^system\.loadavg
retentions = 60:43200,900:350400

This tells Graphite that all metric_paths starting with system.loadavg should be stored with a retention policy that keeps per-minute (60-second) precision data for 30 days (43,200 one-minute data points), and per-15-minute (900-second) precision data for 10 years (350,400 data points).
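
If you want to double-check the arithmetic behind a retentions line, here is a small sketch. It assumes the secondsPerPoint:numberOfPoints format used in the example above; the function names are mine:

```python
def parse_retentions(spec):
    """Parse 'secondsPerPoint:points,...' into (seconds_per_point, points) tuples."""
    archives = []
    for part in spec.split(","):
        seconds_per_point, points = part.strip().split(":")
        archives.append((int(seconds_per_point), int(points)))
    return archives


def retention_days(seconds_per_point, points):
    """How many days of history one archive covers."""
    return seconds_per_point * points / 86400.0


for spp, pts in parse_retentions("60:43200,900:350400"):
    print("%d s per point x %d points = %.0f days" % (spp, pts, retention_days(spp, pts)))
# 60 s per point x 43200 points = 30 days
# 900 s per point x 350400 points = 3650 days
```

Going the other way is just as easy: 30 days at one point per minute is 30 * 24 * 60 = 43,200 points.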

Go wild with stats!

At this point, if you run the example client, you should be able to go to the Graphite dashboard and expand the Graphite->system path and see the 3 metrics being captured: loadavg_1min, loadavg_5min and loadavg_15min. Clicking on each one will populate the graph with the corresponding data line. If you're logged in to the dashboard, you can also save a given graph.

The sky is the limit at this point in terms of the data you can capture and visualize with Graphite. As an example, I parse a common maillog file that captures all email sent out through our system. I 'tail' the file every minute and I count how many messages were sent out in total, and per mail server in our mail cluster. I send this data to Graphite and I watch it in near-realtime (the retention policy in my case is similar to the loadavg one above).
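
To give an idea of what such a counter can look like, here is a rough sketch. It is not my actual script: it assumes Postfix-style syslog lines where the fourth field is the mail server's hostname and a delivered message contains status=sent, and the mail.sent metric prefix is made up for the example:

```python
import re
from collections import Counter

# Assumption: lines look like
# "Nov 16 10:00:01 mail1 postfix/smtp[1234]: QUEUEID: to=<...>, ..., status=sent (...)"
SENT_RE = re.compile(r"^\w{3}\s+\d+\s+[\d:]+\s+(\S+)\s+postfix/smtp\[\d+\]:.*status=sent")


def count_sent(lines):
    """Count delivered messages per mail server hostname."""
    counts = Counter()
    for line in lines:
        match = SENT_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def to_graphite_lines(counts, timestamp, prefix="mail.sent"):
    """Turn the counts into 'metric_path value timestamp' lines."""
    lines = ["%s.total %d %d" % (prefix, sum(counts.values()), timestamp)]
    for host in sorted(counts):
        # Dots in a hostname would create extra levels in the metric tree
        safe_host = host.replace(".", "_")
        lines.append("%s.%s %d %d" % (prefix, safe_host, counts[host], timestamp))
    return "\n".join(lines) + "\n"
```

The resulting message goes over the same socket as in the example client, so each mail server ends up as its own line in the graph.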

Here's what the Graphite graph looks like:



Chef


Chef is an open-source systems integration framework built specifically for automating the cloud. No matter how complex the realities of your business, Chef makes it easy to deploy servers and scale applications throughout your entire infrastructure.


Chef installation and minimal configuration


I started to play with Chef the other day. The instructions on the wiki are a bit confusing, but help on twitter (thanks @jtimberman) and on the #chef IRC channel (thanks @kallistec) has been great. I am at the very minimal stage of having a chef client talking to a chef server. I hasten to write down what I've done so far, both for my own sake and for others who might want to do the same. My OS is Ubuntu 10.04 32-bit on both machines.

First of all: as the chef wiki says, make sure you have FQDNs correctly set up on both client and server, and that they can ping each other at a minimum using the FQDN. I added the FQDN to the local IP address line in /etc/hosts, so that 'hostname -f' returned the FQDN correctly. In what follows, my Chef server machine is called chef.example.com and my Chef client machine is called client.example.com.
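
If you'd rather script the FQDN sanity check than eyeball it, something along these lines works. It's only a sketch: looks_like_fqdn is a deliberately rough test (at least two dot-separated labels), and both helper names are my own:

```python
import socket


def looks_like_fqdn(name):
    """Rough check: at least two dot-separated, non-empty labels."""
    labels = name.rstrip(".").split(".")
    return len(labels) >= 2 and all(labels)


def resolves(name):
    """True if the name resolves to at least one address."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except socket.gaierror:
        return False


# Example, run on each machine:
#   fqdn = socket.getfqdn()          # roughly what 'hostname -f' reports
#   looks_like_fqdn(fqdn)            # should be True
#   resolves("chef.example.com")     # should be True from the client
```

This doesn't replace the ping test, but it catches the common case where the machine only knows its short hostname.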

Installing the Chef server


Here I went the Ruby Gems route, because the very latest Chef (0.9.4) had not been captured in the Ubuntu packages yet when I tried to install it.

a) install pre-requisites

# apt-get install ruby ruby1.8-dev libopenssl-ruby1.8 rdoc ri irb build-essential wget ssl-cert

b) install Ruby Gems

# wget http://production.cf.rubygems.org/rubygems/rubygems-1.3.7.tgz
# tar xvfz rubygems-1.3.7.tgz
# cd rubygems-1.3.7
# ruby setup.rb
# ln -sfv /usr/bin/gem1.8 /usr/bin/gem

c) install the Chef gem

# gem install chef

d) install the Chef server by bootstrapping with the chef-solo utility

d1) create /etc/chef/solo.rb with contents:


file_cache_path "/tmp/chef-solo"
cookbook_path "/tmp/chef-solo/cookbooks"
recipe_url "http://s3.amazonaws.com/chef-solo/bootstrap-latest.tar.gz"

d2) create /etc/chef/chef.json with contents:

{
"bootstrap": {
"chef": {
"url_type": "http",
"init_style": "runit",
"path": "/srv/chef",
"serve_path": "/srv/chef",
"server_fqdn": "chef.example.com",
"webui_enabled": true
}
},
"run_list": [ "recipe[bootstrap::server]" ]
}

d3) run chef-solo to bootstrap the Chef server install:

# chef-solo -c /etc/chef/solo.rb -j /etc/chef/chef.json

e) create an initial admin client with the Knife utility, to interact with the API

# knife configure -i
Where should I put the config file? [~/.chef/knife.rb]
Please enter the chef server URL: [http://localhost:4000] http://chef.example.com
Please enter a clientname for the new client: [root]
Please enter the existing admin clientname: [chef-webui]
Please enter the location of the existing admin client's private key: [/etc/chef/webui.pem]
Please enter the validation clientname: [chef-validator]
Please enter the location of the validation key: [/etc/chef/validation.pem]
Please enter the path to a chef repository (or leave blank): 

f) create an initial Chef repository

I created a directory called /srv/chef/repos , cd-ed to it and ran this command:

# git clone git://github.com/opscode/chef-repo.git

At this point, you should have a functional Chef server, although it won't help you much unless you configure some clients.

Installing a Chef client

Here's the bare minimum I did to get a Chef client to just talk to the Chef server configured above, without actually performing any cookbook recipe yet (I leave that for another post).

The first steps are very similar to the ones I followed when I installed the Chef server.


a) install pre-requisites

# apt-get install ruby ruby1.8-dev libopenssl-ruby1.8 rdoc ri irb build-essential wget ssl-cert

b) install Ruby Gems

# wget http://production.cf.rubygems.org/rubygems/rubygems-1.3.7.tgz
# tar xvfz rubygems-1.3.7.tgz
# cd rubygems-1.3.7
# ruby setup.rb
# ln -sfv /usr/bin/gem1.8 /usr/bin/gem

c) install the Chef gem

# gem install chef

d) install the Chef client by bootstrapping with the chef-solo utility

d1) create /etc/chef/solo.rb with contents:

file_cache_path "/tmp/chef-solo"
cookbook_path "/tmp/chef-solo/cookbooks"
recipe_url "http://s3.amazonaws.com/chef-solo/bootstrap-latest.tar.gz"

d2) create /etc/chef/chef.json with contents:

{
"bootstrap": {
"chef": {
"url_type": "http",
"init_style": "runit",
"path": "/srv/chef",
"serve_path": "/srv/chef",
"server_fqdn": "chef.example.com",
"webui_enabled": true
}
},
"run_list": [ "recipe[bootstrap::client]" ]
}

Note that the only difference so far between the Chef server and the Chef client bootstrap files is the one directive at the end of chef.json, which is bootstrap::server for the server and bootstrap::client for the client.
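
Since the two files are identical except for that one line, one way to avoid mixing them up is to generate them from a single template. Here is a minimal sketch in Python; the function names are mine, and the attribute values are simply the ones from the chef.json files above:

```python
import json


def bootstrap_attributes(role, server_fqdn="chef.example.com"):
    """Return the chef.json contents for role 'server' or 'client'."""
    if role not in ("server", "client"):
        raise ValueError("role must be 'server' or 'client'")
    return {
        "bootstrap": {
            "chef": {
                "url_type": "http",
                "init_style": "runit",
                "path": "/srv/chef",
                "serve_path": "/srv/chef",
                "server_fqdn": server_fqdn,
                "webui_enabled": True,
            }
        },
        # The only part that differs between server and client
        "run_list": ["recipe[bootstrap::%s]" % role],
    }


def write_chef_json(role, path):
    """Write the attributes for the given role to path as JSON."""
    with open(path, "w") as handle:
        json.dump(bootstrap_attributes(role), handle, indent=2)
        handle.write("\n")
```

Calling write_chef_json("client", "/etc/chef/chef.json") on the client box then leaves no room for pasting the server run list by accident.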

Caveat for this step: if you mess up and bootstrap the client using the wrong chef.json file containing the bootstrap::server directive, you will end up with a server and not a client. I speak from experience -- I did exactly this, then when I tried to run chef-client on this box, I got:

WARN: HTTP Request Returned 401 Unauthorized: Failed to authenticate!

/usr/lib/ruby/1.8/net/http.rb:2097:in `error!': 401 "Unauthorized" (Net::HTTPServerException)

d3) run chef-solo to bootstrap the Chef client install:

# chef-solo -c /etc/chef/solo.rb -j /etc/chef/chef.json

At this point, you should have a file called client.rb in /etc/chef on your client machine, with contents similar to:

#
# Chef Client Config File
#
# Dynamically generated by Chef - local modifications will be replaced
#

log_level :info
log_location STDOUT
ssl_verify_mode :verify_none
chef_server_url "http://chef.example.com:4000"

validation_client_name "chef-validator"
validation_key   "/etc/chef/validation.pem"
client_key   "/etc/chef/client.pem"

file_cache_path "/srv/chef/cache"
pid_file "/var/run/chef/chef-client.pid"

Mixlib::Log::Formatter.show_time = false

e) validate the client against the server

e1) copy /etc/chef/validation.pem from the server to /etc/chef on the client
e2) run chef-client on the client; for debug purposes you can use:

# chef-client -l debug

If everything goes well, you should see a message of the type:

# chef-client
INFO: Starting Chef Run
INFO: Client key /etc/chef/client.pem is not present - registering
WARN: Node client.example.com has an empty run list.
INFO: Chef Run complete in 1.209376 seconds
INFO: Running report handlers
INFO: Report handlers complete

You should also have a file called client.pem containing a private key that the client will be using when talking to the server. At this point, you should remove validation.pem from /etc/chef on the client, as it is not needed any more.

You can also run this command on the server to see if the client got registered with it:

# knife client list -c /etc/chef/knife.rb

The output should be something like:

[
"chef-validator",
"chef-webui",
"chef.example.com",
"root",
"client.example.com"
]
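
If you want to check registration from a script instead of reading the list by eye, the output above can be parsed directly. A small sketch, assuming the command prints a JSON array exactly as shown (the helper name is mine):

```python
import json


def client_is_registered(knife_output, client_name):
    """Return True if client_name appears in the JSON array printed by knife."""
    clients = json.loads(knife_output)
    return client_name in clients


# Example with the output shown above:
sample_output = """
[
"chef-validator",
"chef-webui",
"chef.example.com",
"root",
"client.example.com"
]
"""
print(client_is_registered(sample_output, "client.example.com"))
```

In practice you would capture the text from the knife command and pass it in as knife_output.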

That's it for now. As I warned you, nothing exciting happened here except for having a Chef client that talks to a server but doesn't actually DO anything. Stay tuned for more installments in my continuing chef saga though...

FAI - Fully Automatic Installation

FAI is a non-interactive system to install, customize and manage Linux systems and software configurations on computers as well as virtual machines and chroot environments, from small networks to large-scale infrastructures and clusters.
It's a tool for unattended mass deployment of Linux. You can take one or more fresh PCs, turn on the power, and after a few minutes, the systems are installed and completely configured to your exact needs, without any interaction necessary.

Features

  • Installs and updates Debian, Ubuntu, CentOS, RHEL, SuSe, ...
  • Centralized deployment and configuration management
  • Installs XEN domains, VirtualBox and Vserver
  • Easy set up of software RAID and LVM
  • Full remote control via ssh during installation
  • Integrated disaster recovery system
  • Every stage can be customized via hooks

What is FAI? Main Features

  • A tool for automated unattended installation. Lazy system administrators like it.
  • Remote network installation of different Linux flavors
  • Easy-to-use centralized management system for your Linux deployment.
  • It's fast. It only takes a few minutes for a complete installation.
  • Scalable. FAI users manage their computer infrastructures starting from a few computers up to several thousands of machines.
  • Different hardware and different configuration requirements are easy to establish using FAI. You do not need to repeat information that is shared among several machines.
  • Using the FAI class concept, you can group a bunch of similar machines.
  • Installation targets: desktops, servers, notebooks, Beowulf cluster, rendering or web server farm, Linux laboratory or classroom.
  • Linux rollout, mass installation and automated server provisioning are additional topics of FAI.
  • FAI is lightweight. No special daemons are running, no database setup is needed. It's architecture independent, since it consists only of shell, Perl and Cfengine scripts.
  • Besides initial installations, it is used for daily maintenance, and can set up chroot environments.
  • Compared to tools like kickstart or cobbler for Red Hat, autoyast for SUSE or Jumpstart for SUN Solaris, FAI is much more flexible. You can tune every small part of your configuration to your local needs using hooks.

FAI Installation Steps

  • Network boot via PXE
  • Receive configuration data via HTTP, NFS, svn or git
  • Run scripts to determine FAI classes and variables
  • Partition local hard disks and create RAID, LVM configuration and the file systems
  • Install and configure software packages
  • Customize OS and software to your local needs
  • Reboot freshly installed machine

Top 5 Open Source Linux Server Provisioning Software



Server provisioning means automatically loading a server with an operating system (Linux or a UNIX-like system), device drivers, and data, and making it ready for network operation without any user input. Typically you select a server from a pool of available servers, load the operating system (such as RHEL, Fedora, FreeBSD, or Debian), and finally customize storage, network settings (IP, gateway, bonding, etc.), drivers, applications, users, and so on. Using the following tools you can perform automated, unattended operating system installation and configuration, set up virtual machines, and much more. These tools can install many (say, thousands of) Linux and UNIX systems at the same time.

Kickstart

From the official Red Hat guide:
Many system administrators would prefer to use an automated installation method to install Red Hat / CentOS / Fedora Linux on their machines. To answer this need, Red Hat created the kickstart installation method. Using kickstart, a system administrator can create a single file containing the answers to all the questions that would normally be asked during a typical Red Hat Linux installation. Kickstart provides a way for users to automate a Red Hat Enterprise Linux installation.
Kickstart Configurator allows you to create or modify a kickstart file using a graphical user interface, so that you do not have to remember the correct syntax of the file.
Fig.01: RHEL - Kickstart Configurator

Fully Automatic Installation (FAI)

FAI is a non-interactive system to install, customize and manage Linux systems and software configurations on computers as well as virtual machines and chroot environments, from small networks to large-scale infrastructures and clusters. It is a tool for fully automatic installation of Debian and other distributions such as SUSE and Red Hat, as well as Solaris, via network, custom install CD, or into a chroot environment. Some people also use it to install Windows.

FAI Features

  1. Installs and updates Debian, Ubuntu, SuSe, RHEL, CentOS, Fedora, Mandriva, Solaris, etc
  2. Centralized deployment and configuration management
  3. Integrated disaster recovery system
  4. Easy set up of software RAID and LVM
  5. Installs XEN domains, VirtualBox and Vserver
  6. Every stage can be customized via hooks
  7. Full remote control via ssh during installation
See the official project website and wiki for more information.

Cobbler

Cobbler is a Linux provisioning server that centralizes and simplifies control of services including DHCP, TFTP, and DNS for the purpose of performing network-based operating system installs. It can be configured for PXE, reinstallations, and virtualized guests using Xen, KVM or VMware. Again, it is mainly used by Red Hat and friends, but you can configure a PXE server to boot various non-RPM boot images such as Knoppix and other flavors of Debian such as Ubuntu.
There is also a lightweight built-in configuration management system, as well as support for integrating with configuration management systems like Puppet. Cobbler has a command line interface, a web interface, and also several API access options.
Fig.02: Cobbler WebUI (image credit: Fedora project)
See the official Cobbler project home page and wiki for more information.

Spacewalk

From the official website:
Spacewalk is an open source (GPLv2) Linux systems management solution. It is the upstream community project from which the Red Hat Network Satellite product is derived. Spacewalk manages software content updates for Red Hat derived distributions such as Fedora, CentOS, and Scientific Linux, within your firewall. You can stage software content through different environments, managing the deployment of updates to systems and allowing you to view at which update level any given system is at across your deployment. A clean central web interface allows viewing of systems and their software update status, and initiating update actions.

Features:

  1. Inventory your systems (hardware and software information)
  2. Install and update software on your systems
  3. Collect and distribute your custom software packages into manageable groups
  4. Provision (kickstart) your systems
  5. Manage and deploy configuration files to your systems
  6. Monitor your systems
  7. Provision and start/stop/configure virtual guests
  8. Distribute content across multiple geographical sites in an efficient manner.
Fig.03: Spacewalk Server Provisioning System
See the official project website for more information.

OpenQRM

From the official website:
openQRM is the next generation, open-source data-center management platform. Its fully pluggable architecture focuses on automatic, rapid and appliance-based deployment, monitoring, high availability, cloud computing and especially on supporting and conforming multiple virtualization technologies. openQRM is a single management console for the complete IT infrastructure and provides a well-defined API which can be used to integrate third-party tools as additional plugins.

Features

  1. Complete separation of "hardware" (physical servers and virtual machines) from "software" (server images)
  2. Support for different virtualization technologies
  3. Fully automatic Nagios configuration (single click) to monitor all systems and services
  4. High availability: "N to 1" fail-over
  5. Integrated storage management
  6. Distribution support: openQRM 4.x comes with solid support for different Linux distributions such as Debian, Ubuntu, CentOS and openSUSE. A single openQRM server can seamlessly manage the provisioning of servers from those different distributions.
Fig.04: OpenQRM Dashboard (image credit: OpenQRM project)
See the official project website for more information.

DIY: Provisioning Server

You can build your own server using PXE, a TFTP server, and DHCP software. PXE allows you to boot up a system and have it automatically get an IP address via DHCP and start booting a kernel over the network.

Conclusion

There are many proprietary software solutions available to automate the provisioning of servers, services and end-user devices from vendors such as BladeLogic, IBM, or HP. But open source software gives you more freedom to automate the installation of the Linux server. Some of the tools above support UNIX and Windows operating systems too.
Do you use server provisioning software regularly? Share what works for you in the comments below.