Thursday, July 9, 2015

How to Set Up Hadoop Multi-Node Cluster on CentOS 7/6

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
Our earlier article about Hadoop described how to set up a single-node cluster. This article walks you step by step through installing and configuring a Hadoop multi-node cluster on CentOS/RHEL 6.

Setup Details:

Hadoop Master: ( hadoop-master )
Hadoop Slave : ( hadoop-slave-1 )
Hadoop Slave : ( hadoop-slave-2 )

Step 1. Install Java

Before installing Hadoop, make sure Java is installed on every node of the Hadoop cluster.
# java -version

java version "1.7.0_75"
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
If Java is not installed, install it on each node before continuing.

Step 2. Create User Account

Create a system user account on both the master and the slave systems to use for the Hadoop installation.
# useradd hadoop
# passwd hadoop
Changing password for user hadoop.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

Step 3: Add FQDN Mapping

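Edit the /etc/hosts file on the master and all slave servers and add one entry per node, mapping each node’s IP address to its hostname.
# vim /etc/hosts
<IP address of hadoop-master> hadoop-master
<IP address of hadoop-slave-1> hadoop-slave-1
<IP address of hadoop-slave-2> hadoop-slave-2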
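As a rough illustration of what finished entries look like, the sketch below prints one /etc/hosts line per node. The addresses are placeholders from the 192.0.2.0/24 documentation range, not the addresses of this cluster, and the hosts_entries helper is hypothetical; substitute the real IP address of each node.

```shell
#!/bin/sh
# Print /etc/hosts entries for the three cluster nodes.
# NOTE: the IP addresses below are placeholders (documentation range),
# not the real addresses of this cluster.
hosts_entries() {
    cat <<'EOF'
192.0.2.10 hadoop-master
192.0.2.11 hadoop-slave-1
192.0.2.12 hadoop-slave-2
EOF
}

# Preview the lines; as root they could be appended with:
#   hosts_entries >> /etc/hosts
hosts_entries
```

The same three lines should be present on every node so that each server resolves the others by name.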

Step 4. Configuring Key Based Login

The hadoop user must be able to SSH to itself and to the other nodes without a password. Use the following commands to configure key-based login between all Hadoop cluster servers.
# su - hadoop
$ ssh-keygen -t rsa
$ ssh-copy-id -i ~/.ssh/ hadoop@hadoop-master
$ ssh-copy-id -i ~/.ssh/ hadoop@hadoop-slave-1
$ ssh-copy-id -i ~/.ssh/ hadoop@hadoop-slave-2
$ chmod 0600 ~/.ssh/authorized_keys
$ exit
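The three ssh-copy-id calls can also be written as a loop. This is a minimal sketch that assumes the key pair sits at the ssh-keygen default location, ~/.ssh/id_rsa.pub; the copy_key_commands helper is hypothetical and only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Print one ssh-copy-id command per cluster node.
# Assumption: the public key is at the ssh-keygen default path.
PUBKEY="$HOME/.ssh/id_rsa.pub"

copy_key_commands() {
    for node in hadoop-master hadoop-slave-1 hadoop-slave-2; do
        echo "ssh-copy-id -i $PUBKEY hadoop@$node"
    done
}

# Review the output, then run it as the hadoop user with:
#   copy_key_commands | sh
copy_key_commands
```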

Step 5. Download and Extract Hadoop Source

Download the latest available version of Hadoop from its official site, on the hadoop-master server only.
# mkdir /opt/hadoop
# cd /opt/hadoop/
# wget
# tar -xzf hadoop-1.2.0.tar.gz
# mv hadoop-1.2.0 hadoop
# chown -R hadoop /opt/hadoop
# cd /opt/hadoop/hadoop/

Step 6: Configure Hadoop

First, edit the Hadoop configuration files and make the following changes.
6.1 Edit core-site.xml
# vim conf/core-site.xml
#Add the following inside the configuration tag
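For a Hadoop 1.x cluster, core-site.xml typically points every node at the NameNode through the fs.default.name property. A minimal sketch, assuming the NameNode runs on hadoop-master and listens on port 9000 (the port and the core_site_property helper are assumptions, not values taken from this setup):

```shell
#!/bin/sh
# Print a <property> element for conf/core-site.xml (Hadoop 1.x naming).
# $1 = NameNode host, $2 = NameNode port (9000 is an assumed value).
core_site_property() {
    cat <<EOF
<property>
  <name>fs.default.name</name>
  <value>hdfs://$1:$2/</value>
</property>
EOF
}

core_site_property hadoop-master 9000
```

The printed element belongs between the opening and closing configuration tags of the file.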


6.2 Edit hdfs-site.xml
# vim conf/hdfs-site.xml
# Add the following inside the configuration tag
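In Hadoop 1.x, hdfs-site.xml usually sets where the NameNode and DataNodes keep their data and how many copies of each block are stored. The sketch below is an illustration only: the name directory matches the path shown in the format log later in this article, while the data directory, the replication factor and the hdfs_site_properties helper are assumptions.

```shell
#!/bin/sh
# Print <property> elements for conf/hdfs-site.xml (Hadoop 1.x naming).
# $1 = base directory for HDFS storage, $2 = replication factor.
hdfs_site_properties() {
    cat <<EOF
<property>
  <name>dfs.name.dir</name>
  <value>$1/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>$1/data</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>$2</value>
</property>
EOF
}

# This setup has two slave nodes, so a replication factor of 2 is one option.
hdfs_site_properties /opt/hadoop/hadoop/dfs 2
```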


6.3 Edit mapred-site.xml
# vim conf/mapred-site.xml
# Add the following inside the configuration tag
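In Hadoop 1.x, mapred-site.xml normally tells the TaskTrackers where the JobTracker runs through the mapred.job.tracker property. A hedged sketch, assuming the JobTracker runs on hadoop-master on port 9001 (the port and the mapred_site_property helper are assumptions):

```shell
#!/bin/sh
# Print a <property> element for conf/mapred-site.xml (Hadoop 1.x naming).
# $1 = JobTracker host, $2 = JobTracker port (9001 is an assumed value).
mapred_site_property() {
    cat <<EOF
<property>
  <name>mapred.job.tracker</name>
  <value>$1:$2</value>
</property>
EOF
}

mapred_site_property hadoop-master 9001
```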


6.4 Edit
# vim conf/
export JAVA_HOME=/opt/jdk1.7.0_75
export HADOOP_CONF_DIR=/opt/hadoop/hadoop/conf
Set the JAVA_HOME path to match the Java installation on your system.

Step 7: Copy Hadoop Source to Slave Servers

After updating the configuration above, copy the Hadoop files to all slave servers.
# su - hadoop
$ cd /opt/hadoop
$ scp -r hadoop hadoop-slave-1:/opt/hadoop
$ scp -r hadoop hadoop-slave-2:/opt/hadoop

Step 8: Configure Hadoop on Master Server Only

Go to the Hadoop folder on hadoop-master and apply the following settings.
# su - hadoop
$ cd /opt/hadoop/hadoop
$ vim conf/masters

$ vim conf/slaves
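Both files are plain lists of hostnames, one per line. A minimal sketch, assuming conf/masters names hadoop-master and conf/slaves names the two slave nodes from the setup details; write_node_lists is a hypothetical helper.

```shell
#!/bin/sh
# Write the masters and slaves host lists into a Hadoop conf directory.
# $1 = conf directory (for example /opt/hadoop/hadoop/conf).
write_node_lists() {
    printf '%s\n' hadoop-master > "$1/masters"
    printf '%s\n' hadoop-slave-1 hadoop-slave-2 > "$1/slaves"
}

# Try it against a scratch directory before touching the real conf/.
demo_dir=$(mktemp -d)
write_node_lists "$demo_dir"
cat "$demo_dir/slaves"
```

Running the helper against /opt/hadoop/hadoop/conf as the hadoop user would replace any existing contents of the two files.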

Format Name Node on Hadoop Master only
# su - hadoop
$ cd /opt/hadoop/hadoop
$ bin/hadoop namenode -format
13/07/13 10:58:07 INFO namenode.NameNode: STARTUP_MSG:
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hadoop-master/
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.0
STARTUP_MSG:   build = -r 1479473; compiled by 'hortonfo' on Mon May  6 06:59:37 UTC 2013
STARTUP_MSG:   java = 1.7.0_25
13/07/13 10:58:08 INFO util.GSet: Computing capacity for map BlocksMap
13/07/13 10:58:08 INFO util.GSet: VM type       = 32-bit
13/07/13 10:58:08 INFO util.GSet: 2.0% max memory = 1013645312
13/07/13 10:58:08 INFO util.GSet: capacity      = 2^22 = 4194304 entries
13/07/13 10:58:08 INFO util.GSet: recommended=4194304, actual=4194304
13/07/13 10:58:08 INFO namenode.FSNamesystem: fsOwner=hadoop
13/07/13 10:58:08 INFO namenode.FSNamesystem: supergroup=supergroup
13/07/13 10:58:08 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/07/13 10:58:08 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/07/13 10:58:08 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/07/13 10:58:08 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
13/07/13 10:58:08 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/07/13 10:58:08 INFO common.Storage: Image file of size 112 saved in 0 seconds.
13/07/13 10:58:08 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/opt/hadoop/hadoop/dfs/name/current/edits
13/07/13 10:58:08 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/opt/hadoop/hadoop/dfs/name/current/edits
13/07/13 10:58:08 INFO common.Storage: Storage directory /opt/hadoop/hadoop/dfs/name has been successfully formatted.
13/07/13 10:58:08 INFO namenode.NameNode: SHUTDOWN_MSG:
SHUTDOWN_MSG: Shutting down NameNode at hadoop-master/

Step 9: Start Hadoop Services

Use the following command to start all Hadoop services on hadoop-master.
$ bin/

How to Setup MySQL (Master-Slave) Replication in RHEL, CentOS, Fedora

This tutorial provides a simple step-by-step guide for setting up MySQL (Master-Slave) Replication in RHEL 6.3/6.2/6.1/6/5.8, CentOS 6.3/6.2/6.1/6/5.8 and Fedora 17, 16, 15, 14, 13, 12 using the latest MySQL version. The guide is written specifically for the CentOS 6.3 operating system, but it also works with older Linux distributions running MySQL 5.x.
MySQL Master-Slave Replication in RedHat / CentOS / Fedora
MySQL Replication is useful for data security, fail-over solutions, database backup from the Slave, analytics and so on. We use the following setup to carry out the replication process; your scenario may differ.
  1. A working Linux OS such as CentOS 6.3, RedHat 6.3 or Fedora 17.
  2. Master and Slave are CentOS 6.3 Linux servers.
  3. Master IP Address is:
  4. Slave IP Address is:
  5. Master and Slave are on the same LAN.
  6. Master and Slave both have MySQL installed.
  7. Master allows remote MySQL connections on port 3306.
We have two servers: one is the Master and the other is the Slave. The setup is divided into two phases to make things easier: in Phase I we configure the Master server, and in Phase II the Slave server. Let’s start the replication setup process.

Phase I: Configure Master Server for Replication

In Phase I, we install MySQL, set up replication and then verify it.
Install MySQL on the Master Server
First, install MySQL using the YUM command. If MySQL is already installed, you can skip this step.
# yum install mysql-server mysql
Configure MySQL on the Master Server
Open the my.cnf configuration file with the VI editor.
# vi /etc/my.cnf
Add the following entries under the [mysqld] section, and don’t forget to replace tecmint with the name of the database that you would like to replicate on the Slave.
server-id = 1
relay-log = /var/lib/mysql/mysql-relay-bin
relay-log-index = /var/lib/mysql/mysql-relay-bin.index
log-error = /var/lib/mysql/mysql.err
master-info-file = /var/lib/mysql/
relay-log-info-file = /var/lib/mysql/
log-bin = /var/lib/mysql/mysql-bin
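The database name mentioned above is normally supplied to the Master through the binlog-do-db option, which is consistent with the Binlog_Do_DB column showing tecmint in the status output further down. A small sketch of that extra [mysqld] line; the binlog_filter_line helper is hypothetical.

```shell
#!/bin/sh
# Print the [mysqld] line that limits binary logging to one database.
# $1 = name of the database to replicate (tecmint in this guide).
binlog_filter_line() {
    printf 'binlog-do-db = %s\n' "$1"
}

binlog_filter_line tecmint
```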
Restart the MySQL service.
# /etc/init.d/mysqld restart
Log in to MySQL as root user, create the slave user and grant it privileges for replication. Replace slave_user with your user and your_password with your password. Then display the Master status.
# mysql -u root -p
mysql> GRANT REPLICATION SLAVE ON *.* TO 'slave_user'@'%' IDENTIFIED BY 'your_password';
mysql> SHOW MASTER STATUS;

| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
| mysql-bin.000003 | 11128001 | tecmint   |                  |
1 row in set (0.00 sec)

mysql> quit;
Please write down the File (mysql-bin.000003) and Position (11128001) values; they are required later on the Slave server. Next, apply a READ LOCK to the databases and export all databases, along with the Master’s binary log information, using the mysqldump command.
#  mysqldump -u root -p --all-databases --master-data > /root/dbdump.db
Once you’ve dumped all the databases, connect to mysql as root user again and unlock the tables.
mysql> UNLOCK TABLES;
mysql> quit;
Upload the database dump file to the Slave Server using the SCP command.
scp /root/dbdump.db root@
That’s it: the Master server is now configured. Let’s proceed to Phase II.

Phase II: Configure Slave Server for Replication

In Phase II, we install MySQL, set up replication and then verify it.
Install MySQL on the Slave Server
If you don’t have MySQL installed, install it using the YUM command.
# yum install mysql-server mysql
Configure MySQL on the Slave Server
Open the my.cnf configuration file with the VI editor.
# vi /etc/my.cnf
Add the following entries under the [mysqld] section, and don’t forget to replace the IP address of the Master server, tecmint with the database name and so on, to match what you would like to replicate from the Master.
server-id = 2
relay-log = /var/lib/mysql/mysql-relay-bin
relay-log-index = /var/lib/mysql/mysql-relay-bin.index
log-error = /var/lib/mysql/mysql.err
master-info-file = /var/lib/mysql/
relay-log-info-file = /var/lib/mysql/
log-bin = /var/lib/mysql/mysql-bin
Now import the dump file that we exported earlier and restart the MySQL service.
# mysql -u root -p < /root/dbdump.db
# /etc/init.d/mysqld restart
Log in to MySQL as root user and stop the slave. Then tell the slave where to find the Master log file, using the File (mysql-bin.000003) and Position (11128001) values that we wrote down from the SHOW MASTER STATUS; command on the Master. You must set MASTER_HOST to the IP address of the Master Server, and change the user and password accordingly.
# mysql -u root -p
mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_HOST='', MASTER_USER='slave_user', MASTER_PASSWORD='your_password', MASTER_LOG_FILE='mysql-bin.000003', MASTER_LOG_POS=11128001;
mysql> START SLAVE;
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_User: slave_user
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000003
          Read_Master_Log_Pos: 12345100
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 11381900
        Relay_Master_Log_File: mysql-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: tecmint
                   Last_Errno: 0
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 12345100
              Relay_Log_Space: 11382055
              Until_Condition: None
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
               Last_SQL_Errno: 0
1 row in set (0.00 sec)
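The two fields that matter most in this output are Slave_IO_Running and Slave_SQL_Running; both should read Yes. The sketch below checks them automatically. The slave_threads_ok helper is hypothetical and only parses text in the format shown above.

```shell
#!/bin/sh
# Read "SHOW SLAVE STATUS\G" output on stdin and report replication health.
# Returns 0 only when both replication threads report "Yes".
slave_threads_ok() {
    awk -F': *' '
        /Slave_IO_Running:/  { io  = $2 }
        /Slave_SQL_Running:/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# Usage (prompts for the MySQL root password):
#   mysql -u root -p -e 'SHOW SLAVE STATUS\G' | slave_threads_ok && echo "replication OK"
```

A non-zero exit status means at least one of the two threads is not running, in which case the Last_IO_Errno and Last_SQL_Errno fields are the next place to look.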

Verifying MySQL Replication on Master and Slave Server

It is important to confirm that replication is working correctly. On the Master server, create a table and insert some values into it.
On Master Server
mysql> create database tecmint;
mysql> use tecmint;
mysql> CREATE TABLE employee (c int);
mysql> INSERT INTO employee (c) VALUES (1);
mysql> SELECT * FROM employee;
|  c  |
|  1  |
1 row in set (0.00 sec)
On Slave Server
Verify the Slave by running the same query; it returns the same values as on the Master.
mysql> use tecmint;
mysql> SELECT * FROM employee;
|  c  |
|  1  |
1 row in set (0.00 sec)
That's it: you've configured MySQL Replication in a few simple steps. More information can be found in the MySQL Replication Guide.

How to Setup High-Availability Load Balancer with ‘HAProxy’ to Control Web Server Traffic

HAProxy stands for High Availability Proxy. It is a free and open source application written in the C programming language. HAProxy is used as a TCP/HTTP load balancer and for proxy solutions. Its most common use is to distribute the workload across multiple servers (for example web servers or database servers), thus improving the overall performance and reliability of the server environment.
This highly efficient and fast application is used by many well-known organizations, including but not limited to Twitter, Reddit, GitHub and Amazon. It is available for the Linux, BSD, Solaris and AIX platforms.
Install HAProxy Load Balancer in Linux
In this tutorial, we discuss setting up a high-availability load balancer using HAProxy to control the traffic of HTTP-based applications (web servers) by distributing requests across multiple servers.
For this article, we’re using the most recent stable release of HAProxy, version 1.5.10, released on December 31st 2014. We’re using CentOS 6.5 for this setup, but the instructions below also work on CentOS/RHEL/Fedora and Ubuntu/Debian distributions.

My Environment Setup

Here, our HAProxy load-balancer server has the IP address 192.168.0.125.
HAProxy Server Setup
Operating System : CentOS 6.5
IP Address  :
Hostname  :
Client Web Servers Setup
The other four machines are up and running with web servers such as Apache.
Web Server #1 : CentOS 6.5 [IP:] - [hostname:]
Web Server #2 : CentOS 6.5 [IP:] - [hostname:]
Web Server #3 : CentOS 6.5 [IP:] - [hostname:]
Web Server #4 : CentOS 6.5 [IP:] - [hostname:]

Step 1: Installing Apache on Client Machines

1. First, install Apache on all four servers and share any one site. To install Apache on all four servers, use the following command.
# yum install httpd  [On RedHat based Systems]
# apt-get install apache2 [On Debian based Systems]
2. After installing the Apache web server on all four client machines, verify that Apache is running on any one of the servers by accessing it via its IP address in a browser.
Check Apache Status

Step 2: Installing HAProxy Server

3. On most modern Linux distributions, HAProxy can be installed from the default base repository using the default package manager, yum or apt-get.
For example, to install HAProxy on RHEL/CentOS/Fedora and Debian/Ubuntu, run the following command. The openssl package is included too, because we’re going to set up HAProxy with SSL and non-SSL support.
# yum install haproxy openssl-devel [On RedHat based Systems]
# apt-get install haproxy  [On Debian based Systems]
Note: On Debian Wheezy 7.0, we need to enable the backports repository by adding a new file backports.list under the “/etc/apt/sources.list.d/” directory with the following content.
# echo "deb wheezy-backports main" >> /etc/apt/sources.list.d/backports.list
Next, update the repository database and install HAProxy.
# apt-get update
# apt-get install haproxy -t wheezy-backports

Step 3: Configure HAProxy Logs

4. Next, we need to enable the logging feature in HAProxy for future debugging. Open the main HAProxy configuration file ‘/etc/haproxy/haproxy.cfg‘ with your choice of editor.
# vim /etc/haproxy/haproxy.cfg
Next, follow the distro-specific instructions to configure the logging feature in HAProxy.
On RHEL/CentOS/Fedora
Under #Global settings, enable the following line.
log local2
On Ubuntu/Debian
Under #Global settings, replace the following lines,
log /dev/log        local0
log /dev/log        local1 notice 
with,
log local2
Enable HAProxy Logging
5. Next, we need to enable UDP syslog reception in the ‘/etc/rsyslog.conf‘ configuration file so that HAProxy logs can go to a separate file under the /var/log directory. Open your ‘rsyslog.conf‘ file with your choice of editor.
# vim /etc/rsyslog.conf
Uncomment ModLoad and UDPServerRun. Here our server will listen on UDP port 514 to collect the logs into syslog.
# Provides UDP syslog reception
$ModLoad imudp
$UDPServerRun 514
Configure HAProxy Logging
6. Next, we need to create a separate file ‘haproxy.conf‘ under the ‘/etc/rsyslog.d/‘ directory to configure separate log files.
# vim /etc/rsyslog.d/haproxy.conf
Append the following line to the newly created file.
local2.* /var/log/haproxy.log
HAProxy Logs
Finally, restart the rsyslog service to apply the changes.
# service rsyslog restart 

Step 4: Configuring HAProxy Global Settings

7. Now we need to set the default variables in ‘/etc/haproxy/haproxy.cfg‘ for HAProxy. Make the following changes under the defaults section; the timeouts for queue, connect, client and server and the maximum number of connections need to be defined here. Note that HAProxy reads a timeout value without a unit suffix as milliseconds.
I suggest you go through the HAProxy man pages and tweak these values as per your requirements.
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except
    option                  redispatch
    retries                 3
    timeout http-request    20
    timeout queue           86400
    timeout connect         86400
    timeout client          86400
    timeout server          86400
    timeout http-keep-alive 30
    timeout check           20
    maxconn                 50000
HAProxy Default Settings
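Because HAProxy treats a timeout without a unit suffix as milliseconds, it can help to list such lines before deciding whether a value like 86400 was meant as milliseconds or seconds. A small sketch; the unitless_timeouts helper is hypothetical and assumes one directive per line with no trailing comment.

```shell
#!/bin/sh
# Read haproxy.cfg text on stdin and print every "timeout" line whose
# value is a bare number (HAProxy reads such values as milliseconds).
unitless_timeouts() {
    awk '$1 == "timeout" && $NF ~ /^[0-9]+$/'
}

# Usage:
#   unitless_timeouts < /etc/haproxy/haproxy.cfg
```

Lines that already carry a suffix such as s, m or h are not printed.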
8. Then we need to define the front-end and back-end for the balancer in the ‘/etc/haproxy/haproxy.cfg‘ configuration file, as shown below. Make sure to replace the IP addresses, hostnames and HAProxy login credentials as per your requirements.
frontend LB
   reqadd X-Forwarded-Proto:\ http
   default_backend LB

backend LB
   mode http
   stats enable
   stats hide-version
   stats uri /stats
   stats realm Haproxy\ Statistics
   stats auth haproxy:redhat  # Credentials for HAProxy Statistic report page.
   balance roundrobin   # Load balancing will work in round-robin process.
   option httpchk
   option  httpclose
   option forwardfor
   cookie LB insert
   server web1-srv cookie web1-srv check  # backend server.
   server web2-srv cookie web2-srv check  # backend server.
   server web3-srv cookie web3-srv check  # backend server.
   server web4-srv check backup   # Backup fail-over server; activated if the three servers above fail.
HAProxy Global Configuration
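Each server line in the backend pairs a name with the address and port of a web server. As an illustration only, the sketch below builds such lines from a list; the addresses are placeholders from the 192.0.2.0/24 documentation range, not the web servers of this setup, and the backend_server_lines helper is hypothetical.

```shell
#!/bin/sh
# Read "name address:port" pairs on stdin and print HAProxy server lines
# with a per-server cookie and health checking enabled.
backend_server_lines() {
    while read -r name addr; do
        [ -n "$name" ] || continue
        printf '   server %s %s cookie %s check\n' "$name" "$addr" "$name"
    done
}

# Placeholder addresses from the 192.0.2.0/24 documentation range.
backend_server_lines <<'EOF'
web1-srv 192.0.2.21:80
web2-srv 192.0.2.22:80
web3-srv 192.0.2.23:80
EOF
```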
9. After adding the above settings, the load balancer’s statistics page can be accessed at the ‘/stats‘ URI with HTTP authentication, using the login name ‘haproxy‘ and password ‘redhat‘ as set in the configuration above; you can replace them with your own credentials.
10. After you’re done with the configuration, make sure to restart HAProxy and make it persistent at system startup on RedHat based systems.
# service haproxy restart
# chkconfig haproxy on
# chkconfig --list haproxy
Start HAProxy
Ubuntu/Debian users need to set the “ENABLED” option to “1” in the ‘/etc/default/haproxy‘ file.

Step 5: Verify HAProxy Load Balancer

11. Now it’s time to access the load balancer URL/IP and verify that the site loads. Create a file named index.html in the document root directory of all four web servers and add the following content to it.

<html>
<head>
  <title>Tecmint HAProxy Test Page</title>
</head>
<body>
<h1>My HAProxy Test Page</h1>
<p>Welcome to HA Proxy test page! There should be more here, but I don't know what to be write :p.</p>
<p>Made 11 January 2015 by Babin Lonston.</p>
</body>
</html>
12. After creating the ‘index.html‘ file, try to access the site and check whether the copied HTML file is served.
Verify HAProxy Load Balancer
The site has been successfully accessed.

Step 6: Verify Statistic of Load Balancer

13. To view the HAProxy statistics page, open the ‘/stats‘ URI on the load balancer. When asked for a username and password, provide haproxy/redhat.
HAProxy Statistics Login
HAProxy Statistics

Step 7: Enabling SSL in HAProxy

14. To enable SSL in HAProxy, you need to install the mod_ssl package for creating an SSL certificate for HAProxy.
On RHEL/CentOS/Fedora
To install mod_ssl, run the following command.
# yum install mod_ssl -y
On Ubuntu/Debian
By default on Ubuntu/Debian, SSL support comes standard with the Apache package. We just need to enable it.
# a2enmod ssl
After you’ve enabled SSL, restart the Apache server for the change to take effect.
# service apache2 restart
15. After restarting, navigate to the SSL directory and create an SSL certificate using the following commands.
# cd /etc/ssl/
# openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /etc/ssl/tecmint.key -out /etc/ssl/tecmint.crt
# cat tecmint.crt tecmint.key > tecmint.pem
Create SSL for HAProxy
SSL Certificate for HAProxy
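The combined tecmint.pem file has to contain both the certificate and the private key. A quick sanity check along those lines is sketched below; the pem_has_cert_and_key helper is hypothetical and only looks for the PEM header lines, it does not validate the key or certificate themselves.

```shell
#!/bin/sh
# Succeed only if the given PEM file contains both a certificate
# header and a private key header.
pem_has_cert_and_key() {
    grep -q -- '-----BEGIN CERTIFICATE-----' "$1" &&
    grep -q -- '-----BEGIN .*PRIVATE KEY-----' "$1"
}

# Usage:
#   pem_has_cert_and_key /etc/ssl/tecmint.pem && echo "PEM looks complete"
```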
16. Open the HAProxy configuration and add the SSL front-end as below.
# vim /etc/haproxy/haproxy.cfg 
Add the following configuration as a frontend.
frontend LBS
   bind ssl crt /etc/ssl/tecmint.pem
   reqadd X-Forwarded-Proto:\ https
   default_backend LB
17. Next, add the redirect rule to the backend configuration.
redirect scheme https if !{ ssl_fc }
Enable SSL on HAProxy
18. After making the above changes, make sure to restart the haproxy service.
# service haproxy restart
If the warning below appears while restarting, fix it by adding the following parameter to the global section of haproxy.cfg.
SSL HAProxy Error
tune.ssl.default-dh-param 2048
19. After restarting, try to access the site; it will now redirect to https.
Verify SSL HAProxy
SSL Enabled HAProxy
20. Next, check haproxy.log under the ‘/var/log/‘ directory.
# tail -f /var/log/haproxy.log
Check HAProxy Logs

Step 8: Open HAProxy Ports on Firewall

21. Open the ports for the web service and the UDP log reception port using the rules below.
On CentOS/RHEL 6
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -i eth0 -p udp --dport 514 -j ACCEPT
iptables -A INPUT -i eth0 -p tcp --dport 80 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A INPUT -i eth0 -p tcp --dport 443 -m state --state NEW,ESTABLISHED -j ACCEPT
On CentOS/RHEL 7 and Fedora 21
# firewall-cmd --permanent --zone=public --add-port=514/udp
# firewall-cmd --permanent --zone=public --add-port=80/tcp
# firewall-cmd --permanent --zone=public --add-port=443/tcp
# firewall-cmd --reload
On Debian/Ubuntu
Add the following lines to ‘/etc/iptables.up.rules‘ to enable the ports on the firewall.
-A INPUT -p udp --dport 514 -j ACCEPT
-A INPUT -p tcp --dport 80 -j ACCEPT
-A INPUT -p tcp --dport 443 -j ACCEPT


In this article, we’ve installed Apache on four servers and shared a website across them to spread the traffic load. I hope this article helps you set up a load balancer for web servers using HAProxy and make your applications more stable and available.