Thursday, July 9, 2015

How to Set Up Hadoop Multi-Node Cluster on CentOS 7/6

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
Our earlier article about Hadoop described how to set up a single-node cluster. This article will walk you step by step through installing and configuring a Hadoop multi-node cluster on CentOS/RHEL 6.

Setup Details (the IP addresses below are examples; substitute your servers' real addresses):

Hadoop Master: 192.168.1.15 ( hadoop-master )
Hadoop Slave : 192.168.1.16 ( hadoop-slave-1 )
Hadoop Slave : 192.168.1.17 ( hadoop-slave-2 )

Step 1. Install Java

Before installing Hadoop, make sure Java is installed on all nodes of the cluster.
# java -version

java version "1.7.0_75"
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
If you do not have Java installed, install it first before proceeding.

Step 2. Create User Account

Create a system user account on both the master and slave systems to use for the Hadoop installation.
# useradd hadoop
# passwd hadoop
Changing password for user hadoop.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

Step 3: Add FQDN Mapping

Edit the /etc/hosts file on the master and all slave servers and add the following entries (the IP addresses below are examples; substitute your servers' real addresses).
# vim /etc/hosts

192.168.1.15 hadoop-master
192.168.1.16 hadoop-slave-1
192.168.1.17 hadoop-slave-2
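If you prefer scripting this step, the mappings can be appended idempotently, so re-running the script never duplicates lines. This is a sketch with the example addresses used above (replace them with your own); HOSTS_FILE can be pointed at a scratch copy first to inspect the result before touching the real /etc/hosts:

```shell
#!/bin/sh
# Append cluster host mappings to a hosts file, skipping
# entries that are already present. Example IPs -- replace
# with your servers' real addresses.
HOSTS_FILE=${HOSTS_FILE:-/etc/hosts}

add_mapping() {
    # $1 = IP address, $2 = hostname; append only if absent
    grep -q "[[:space:]]$2\$" "$HOSTS_FILE" || echo "$1 $2" >> "$HOSTS_FILE"
}

add_mapping 192.168.1.15 hadoop-master
add_mapping 192.168.1.16 hadoop-slave-1
add_mapping 192.168.1.17 hadoop-slave-2
```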

Step 4. Configuring Key Based Login

The hadoop user must be able to ssh to all cluster nodes (including itself) without a password. Use the following commands to configure passwordless login between all cluster servers.
# su - hadoop
$ ssh-keygen -t rsa
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-master
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-1
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-2
$ chmod 0600 ~/.ssh/authorized_keys
$ exit
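The three ssh-copy-id invocations above can be wrapped in a single loop when you have more nodes. A sketch using the hostnames from the setup table; DRY_RUN=echo previews the commands, and clearing it actually pushes the key:

```shell
#!/bin/sh
# Copy the hadoop user's public key to every node in the cluster,
# including the master itself. With DRY_RUN=echo the commands are
# only printed; set DRY_RUN= (empty) to really run ssh-copy-id.
DRY_RUN="${DRY_RUN-echo}"
NODES="hadoop-master hadoop-slave-1 hadoop-slave-2"

for node in $NODES; do
    $DRY_RUN ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "hadoop@$node"
done
```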

Step 5. Download and Extract Hadoop Source

Download the latest available Hadoop release from the official site, on the hadoop-master server only.
# mkdir /opt/hadoop
# cd /opt/hadoop/
# wget https://archive.apache.org/dist/hadoop/core/hadoop-1.2.0/hadoop-1.2.0.tar.gz
# tar -xzf hadoop-1.2.0.tar.gz
# mv hadoop-1.2.0 hadoop
# chown -R hadoop /opt/hadoop
# cd /opt/hadoop/hadoop/

Step 6: Configure Hadoop

First, edit the Hadoop configuration files and make the following changes.
6.1 Edit core-site.xml
# vim conf/core-site.xml
Add the following inside the configuration tag (the value below is a typical setting for the hostnames used in this guide; adjust it to your environment):

<property>
  <name>fs.default.name</name>
  <value>hdfs://hadoop-master:9000/</value>
</property>

6.2 Edit hdfs-site.xml
# vim conf/hdfs-site.xml
Add the following inside the configuration tag (the paths and replication factor below are typical values for this two-slave layout; adjust them to your environment):

<property>
  <name>dfs.data.dir</name>
  <value>/opt/hadoop/hadoop/dfs/data</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/opt/hadoop/hadoop/dfs/name</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>

6.3 Edit mapred-site.xml
# vim conf/mapred-site.xml
Add the following inside the configuration tag (the JobTracker address assumes the hostnames used in this guide):

<property>
  <name>mapred.job.tracker</name>
  <value>hadoop-master:9001</value>
</property>

6.4 Edit hadoop-env.sh
# vim conf/hadoop-env.sh
export JAVA_HOME=/opt/jdk1.7.0_75
export HADOOP_CONF_DIR=/opt/hadoop/hadoop/conf
Set the JAVA_HOME path as per your system's Java installation.

Step 7: Copy Hadoop Source to Slave Servers

After updating the above configuration, we need to copy the source files to all slave servers.
# su - hadoop
$ cd /opt/hadoop
$ scp -r hadoop hadoop-slave-1:/opt/hadoop
$ scp -r hadoop hadoop-slave-2:/opt/hadoop
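With many slaves, the copy step above is easier as a loop. A sketch using the hostnames from this guide; it echoes the scp commands in dry-run mode so you can review them before setting DRY_RUN empty to execute:

```shell
#!/bin/sh
# Distribute the Hadoop directory to every slave listed below.
# DRY_RUN=echo previews the commands; set DRY_RUN= to run them.
DRY_RUN="${DRY_RUN-echo}"
SLAVES="hadoop-slave-1 hadoop-slave-2"
SRC=/opt/hadoop/hadoop

for host in $SLAVES; do
    $DRY_RUN scp -r "$SRC" "$host:/opt/hadoop"
done
```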

Step 8: Configure Hadoop on Master Server Only

Go to the Hadoop source folder on hadoop-master and make the following settings.
# su - hadoop
$ cd /opt/hadoop/hadoop
$ vim conf/masters
hadoop-master

$ vim conf/slaves
hadoop-slave-1
hadoop-slave-2

Format Name Node on Hadoop Master only
# su - hadoop
$ cd /opt/hadoop/hadoop
$ bin/hadoop namenode -format
13/07/13 10:58:07 INFO namenode.NameNode: STARTUP_MSG:
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hadoop-master/
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.0
STARTUP_MSG:   build = -r 1479473; compiled by 'hortonfo' on Mon May  6 06:59:37 UTC 2013
STARTUP_MSG:   java = 1.7.0_25
13/07/13 10:58:08 INFO util.GSet: Computing capacity for map BlocksMap
13/07/13 10:58:08 INFO util.GSet: VM type       = 32-bit
13/07/13 10:58:08 INFO util.GSet: 2.0% max memory = 1013645312
13/07/13 10:58:08 INFO util.GSet: capacity      = 2^22 = 4194304 entries
13/07/13 10:58:08 INFO util.GSet: recommended=4194304, actual=4194304
13/07/13 10:58:08 INFO namenode.FSNamesystem: fsOwner=hadoop
13/07/13 10:58:08 INFO namenode.FSNamesystem: supergroup=supergroup
13/07/13 10:58:08 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/07/13 10:58:08 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/07/13 10:58:08 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/07/13 10:58:08 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
13/07/13 10:58:08 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/07/13 10:58:08 INFO common.Storage: Image file of size 112 saved in 0 seconds.
13/07/13 10:58:08 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/opt/hadoop/hadoop/dfs/name/current/edits
13/07/13 10:58:08 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/opt/hadoop/hadoop/dfs/name/current/edits
13/07/13 10:58:08 INFO common.Storage: Storage directory /opt/hadoop/hadoop/dfs/name has been successfully formatted.
13/07/13 10:58:08 INFO namenode.NameNode: SHUTDOWN_MSG:
SHUTDOWN_MSG: Shutting down NameNode at hadoop-master/

Step 9: Start Hadoop Services

Use the following command to start all Hadoop services on hadoop-master.
$ bin/start-all.sh

How to Setup Hadoop 2.6.0 (Single Node Cluster) on CentOS/RHEL and Ubuntu

Apache Hadoop 2.6 brings significant improvements over the previous stable 2.X.Y releases, with many enhancements in HDFS and MapReduce. This how-to guide will help you install Hadoop 2.6 on CentOS/RHEL 7/6/5 and Ubuntu systems. This article does not cover the overall configuration of Hadoop; it includes only the basic configuration required to start working with Hadoop.

Step 1: Installing Java

Java is the primary requirement for running Hadoop on any system, so first verify that Java is installed using the following command.
# java -version 

java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
If you don't have Java installed on your system, install it first before proceeding.

Step 2: Creating Hadoop User

We recommend creating a normal (non-root) account for working with Hadoop. Create a system account using the following commands.
# useradd hadoop
# passwd hadoop
After creating the account, you also need to set up key-based ssh to the account itself. To do this, execute the following commands.
# su - hadoop
$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
Let's verify key-based login. The command below should not ask for a password, but the first time it will prompt to add the host's RSA key to the list of known hosts.
$ ssh localhost
$ exit

Step 3. Downloading Hadoop 2.6.0

Now download the Hadoop 2.6.0 archive using the command below. You can also select an alternate download mirror to increase download speed.
$ cd ~
$ wget https://archive.apache.org/dist/hadoop/core/hadoop-2.6.0/hadoop-2.6.0.tar.gz
$ tar xzf hadoop-2.6.0.tar.gz
$ mv hadoop-2.6.0 hadoop

Step 4. Configure Hadoop Pseudo-Distributed Mode

4.1. Setup Environment Variables

First we need to set the environment variables used by Hadoop. Edit the ~/.bashrc file and append the following values at the end of the file (at minimum HADOOP_HOME and the PATH additions; adjust the paths to your setup).
export HADOOP_HOME=/home/hadoop/hadoop
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
Now apply the changes in current running environment
$ source ~/.bashrc
Now edit the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file and set the JAVA_HOME environment variable:
export JAVA_HOME=/opt/jdk1.8.0_31/
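If you are unsure of the correct JAVA_HOME for your system, it can be derived from the java binary's resolved path by stripping the trailing /bin/java. A sketch using only POSIX parameter expansion; the path below is hard-coded to match the JDK used in this guide, but in practice you would obtain it with readlink:

```shell
#!/bin/sh
# Derive JAVA_HOME from the resolved path of the java binary.
# Here the path is hard-coded for illustration; on a live system use:
#   java_bin=$(readlink -f "$(which java)")
java_bin=/opt/jdk1.8.0_31/bin/java

# Strip the "/bin/java" suffix to get the JDK root directory.
JAVA_HOME=${java_bin%/bin/java}
echo "$JAVA_HOME"
```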

4.2. Edit Configuration Files

Hadoop has many configuration files, which need to be configured as per the requirements of your Hadoop infrastructure. Let's start with the configuration for a basic single-node cluster setup. First navigate to the location below:
$ cd $HADOOP_HOME/etc/hadoop

Edit core-site.xml and add the following inside the configuration tag (the values in this section are the standard pseudo-distributed settings; adjust them to your environment):

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>

Edit hdfs-site.xml:

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
</property>

Edit mapred-site.xml:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

Edit yarn-site.xml:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>

4.3. Format Namenode

Now format the namenode using the following command; check the output to make sure that the storage directory has been successfully formatted.
$ hdfs namenode -format
Sample output:
15/02/04 09:58:43 INFO namenode.NameNode: STARTUP_MSG:
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host =
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.6.0
15/02/04 09:58:57 INFO common.Storage: Storage directory /home/hadoop/hadoopdata/hdfs/namenode has been successfully formatted.
15/02/04 09:58:57 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
15/02/04 09:58:57 INFO util.ExitUtil: Exiting with status 0
15/02/04 09:58:57 INFO namenode.NameNode: SHUTDOWN_MSG:
SHUTDOWN_MSG: Shutting down NameNode at

Step 5. Start Hadoop Cluster

Let's start the Hadoop cluster using the scripts provided by Hadoop. Navigate to your Hadoop sbin directory and execute the scripts one by one.
$ cd $HADOOP_HOME/sbin/
Now run the start-dfs.sh script.
$ ./start-dfs.sh
Sample output:
15/02/04 10:00:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/hadoop/hadoop/logs/
localhost: starting datanode, logging to /home/hadoop/hadoop/logs/
Starting secondary namenodes []
The authenticity of host ' (' can't be established.
RSA key fingerprint is 3c:c4:f6:f1:72:d9:84:f9:71:73:4a:0d:55:2c:f9:43.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '' (RSA) to the list of known hosts.
starting secondarynamenode, logging to /home/hadoop/hadoop/logs/
15/02/04 10:01:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Now run the start-yarn.sh script.
$ ./start-yarn.sh
Sample output:
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop/logs/
localhost: starting nodemanager, logging to /home/hadoop/hadoop/logs/

Step 6. Access Hadoop Services in Browser

The Hadoop NameNode web UI listens on port 50070 by default. Access your server on port 50070 in your favorite web browser.
Now access port 8088 to get information about the cluster and all applications.
Access port 50090 for details about the secondary namenode.
Access port 50075 for details about the DataNode.
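The four default web UI ports above can be collected in one place for quick reference or probing. This sketch just prints the URLs (replace localhost with your hostname); once the daemons are up you could pipe each through curl to check reachability:

```shell
#!/bin/sh
# Print the default Hadoop web UI URLs for a single-node setup.
# Ports are the Hadoop 2.6 defaults described in this guide.
HOST=localhost
for entry in "NameNode:50070" "ResourceManager:8088" \
             "SecondaryNameNode:50090" "DataNode:50075"; do
    name=${entry%%:*}    # text before the colon
    port=${entry##*:}    # text after the colon
    echo "$name -> http://$HOST:$port/"
done
```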

Step 7. Test Hadoop Single Node Setup

7.1 – Create the required HDFS directories using the following commands.
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/hadoop
7.2 – Now copy all files from the local filesystem /var/log/httpd to the Hadoop distributed filesystem using the command below.
$ bin/hdfs dfs -put /var/log/httpd logs
7.3 – Now browse the Hadoop distributed filesystem by opening the NameNode web UI (port 50070) in your browser and using its file browser.
7.4 – Now copy the logs directory from the Hadoop distributed filesystem back to the local filesystem.
$ bin/hdfs dfs -get logs /tmp/logs
$ ls -l /tmp/logs/

How to Setup MySQL (Master-Slave) Replication in RHEL, CentOS, Fedora

The following tutorial aims to provide a simple step-by-step guide for setting up MySQL Master-Slave replication in RHEL 6.3/6.2/6.1/6/5.8, CentOS 6.3/6.2/6.1/6/5.8 and Fedora 17/16/15/14/13/12 using the latest MySQL version. This guide is written for CentOS 6.3, but also works with older Linux distributions running MySQL 5.x.
MySQL Master-Slave Replication in RedHat / CentOS / Fedora
MySQL replication is very useful for data security, fail-over, taking database backups from the slave, analytics, and more. We use the following setup to carry out the replication process; in your scenario it will be different.
  1. A working Linux OS such as CentOS 6.3, RedHat 6.3 or Fedora 17
  2. Master and Slave are CentOS 6.3 Linux servers.
  3. Master IP address: 192.168.1.1 (an example address; substitute your own).
  4. Slave IP address: 192.168.1.2 (an example address; substitute your own).
  5. Master and Slave are on the same LAN.
  6. Master and Slave have the same MySQL version installed.
  7. Master allows remote MySQL connections on port 3306.
We have two servers: the Master (192.168.1.1) and the Slave (192.168.1.2). We have divided the setup process into two phases to make things easier: in Phase I we configure the Master server, and in Phase II the Slave server. Let's start the replication setup process.

Phase I: Configure the Master Server for Replication

In Phase I, we will cover installing MySQL, setting up replication, and then verifying it.
Install MySQL on the Master Server
First, proceed with the MySQL installation using the YUM command. If you already have MySQL installed, you can skip this step.
# yum install mysql-server mysql
Configure MySQL on the Master Server
Open my.cnf configuration file with VI editor.
# vi /etc/my.cnf
Add the following entries under the [mysqld] section, and don't forget to replace tecmint with the name of the database you would like to replicate to the Slave.
server-id = 1
binlog-do-db = tecmint
relay-log = /var/lib/mysql/mysql-relay-bin
relay-log-index = /var/lib/mysql/mysql-relay-bin.index
log-error = /var/lib/mysql/mysql.err
master-info-file = /var/lib/mysql/
relay-log-info-file = /var/lib/mysql/
log-bin = /var/lib/mysql/mysql-bin
Restart the MySQL service.
# /etc/init.d/mysqld restart
Log into MySQL as the root user, create the slave user, and grant privileges for replication. Replace slave_user and your_password with your own user and password.
# mysql -u root -p
mysql> GRANT REPLICATION SLAVE ON *.* TO 'slave_user'@'%' IDENTIFIED BY 'your_password';
mysql> FLUSH PRIVILEGES;
mysql> SHOW MASTER STATUS;

| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
| mysql-bin.000003 | 11128001 | tecmint      |                  |
1 row in set (0.00 sec)

mysql> quit;
Please write down the File (mysql-bin.000003) and Position (11128001) values; we will need them later on the Slave server. Next, apply a READ LOCK to the databases and export all of them, together with the master's binlog information, using the mysqldump command.
#  mysqldump -u root -p --all-databases --master-data > /root/dbdump.db
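Because --master-data embeds the binary log coordinates in the dump itself as a CHANGE MASTER TO statement, you can also recover the File and Position values from the dump instead of copying them by hand. A sketch (the dump path is the one used in this guide):

```shell
#!/bin/sh
# Extract the binlog file and position recorded by --master-data
# from a mysqldump output file.
DUMP=${DUMP:-/root/dbdump.db}

grep -m1 '^CHANGE MASTER TO' "$DUMP" \
    | sed "s/.*MASTER_LOG_FILE='\([^']*\)', MASTER_LOG_POS=\([0-9]*\).*/file=\1 pos=\2/"
```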
Once you've dumped all the databases, connect to mysql as the root user again and unlock the tables.
mysql> UNLOCK TABLES;
mysql> quit;
Upload the database dump file to the Slave server (192.168.1.2 below is an example address; substitute your own) using the SCP command.
# scp /root/dbdump.db root@192.168.1.2:/root/
That's it, we have successfully configured the Master server; let's proceed to the Phase II section.

Phase II: Configure the Slave Server for Replication

In Phase II, we install MySQL, set up replication, and then verify it.
Install MySQL on the Slave Server
If you don't have MySQL installed, install it using the YUM command.
# yum install mysql-server mysql
Configure MySQL on the Slave Server
Open my.cnf configuration file with VI editor.
# vi /etc/my.cnf
Add the following entries under the [mysqld] section, and don't forget to replace tecmint with the name of the database you are replicating from the Master.
server-id = 2
replicate-do-db = tecmint
relay-log = /var/lib/mysql/mysql-relay-bin
relay-log-index = /var/lib/mysql/mysql-relay-bin.index
log-error = /var/lib/mysql/mysql.err
master-info-file = /var/lib/mysql/
relay-log-info-file = /var/lib/mysql/
log-bin = /var/lib/mysql/mysql-bin
Now import the dump file that we exported earlier, then restart the MySQL service.
# mysql -u root -p < /root/dbdump.db
# /etc/init.d/mysqld restart
Log into MySQL as the root user and stop the slave. Then tell the slave where to find the Master's log file, using the File (mysql-bin.000003) and Position (11128001) values we wrote down on the master with the SHOW MASTER STATUS; command. Replace 192.168.1.1 below with the IP address of your Master server, and change the user and password accordingly.
# mysql -u root -p
mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_HOST='192.168.1.1', MASTER_USER='slave_user', MASTER_PASSWORD='yourpassword', MASTER_LOG_FILE='mysql-bin.000003', MASTER_LOG_POS=11128001;
mysql> START SLAVE;
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_User: slave_user
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000003
          Read_Master_Log_Pos: 12345100
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 11381900
        Relay_Master_Log_File: mysql-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: tecmint
                   Last_Errno: 0
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 12345100
              Relay_Log_Space: 11382055
              Until_Condition: None
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
               Last_SQL_Errno: 0
1 row in set (0.00 sec)
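The fields that matter most in this output are Slave_IO_Running and Slave_SQL_Running (both must be Yes), along with Seconds_Behind_Master. A sketch that checks the two thread flags from a saved copy of the output read on stdin; on a live server you would pipe in mysql -e 'show slave status\G':

```shell
#!/bin/sh
# Report replication health from `show slave status\G` output on stdin.
# Prints OK when both replication threads are running, BROKEN otherwise.
awk -F': *' '
    /Slave_IO_Running:/  { io  = $2 }
    /Slave_SQL_Running:/ { sql = $2 }
    END { if (io == "Yes" && sql == "Yes") print "OK"; else print "BROKEN" }
'
```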

Verifying MySQL Replication on Master and Slave Server

It is very important to verify that replication is working correctly. On the Master server, create a table and insert some values into it.
On Master Server
mysql> create database tecmint;
mysql> use tecmint;
mysql> CREATE TABLE employee (c int);
mysql> INSERT INTO employee (c) VALUES (1);
mysql> SELECT * FROM employee;
|  c  |
|  1  |
1 row in set (0.00 sec)
On Slave Server
Verify on the Slave by running the same query; it should return the same values on the slave too.
mysql> use tecmint;
mysql> SELECT * FROM employee;
|  c  |
|  1  |
1 row in set (0.00 sec)
That's it! You have now configured MySQL replication in a few simple steps. More information can be found in the MySQL Replication Guide.