Thursday, July 9, 2015

How to Set Up Hadoop Multi-Node Cluster on CentOS 7/6

http://tecadmin.net/set-up-hadoop-multi-node-cluster-on-centos-redhat/

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
Our earlier article about Hadoop described how to set up a single-node cluster. This article will walk you step by step through installing and configuring a Hadoop multi-node cluster on CentOS/RHEL 7/6.

Setup Details:

Hadoop Master: 192.168.1.15 ( hadoop-master )
Hadoop Slave : 192.168.1.16 ( hadoop-slave-1 )
Hadoop Slave : 192.168.1.17 ( hadoop-slave-2 )

Step 1. Install Java

Before installing Hadoop, make sure Java is installed on all nodes of the cluster.
# java -version

java version "1.7.0_75"
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
If you do not have Java installed, use the following article to install it first.

Step 2. Create User Account

Create a system user account on both the master and slave systems to use for the Hadoop installation.
# useradd hadoop
# passwd hadoop
Changing password for user hadoop.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

Step 3: Add FQDN Mapping

Edit the /etc/hosts file on the master and all slave servers and add the following entries.
# vim /etc/hosts
192.168.1.15 hadoop-master
192.168.1.16 hadoop-slave-1
192.168.1.17 hadoop-slave-2
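These hostnames must resolve identically on every node. A quick way to verify resolution after editing the file is a small shell loop (a simple sketch; it assumes ICMP ping is permitted on your network):
# for host in hadoop-master hadoop-slave-1 hadoop-slave-2; do ping -c 1 $host >/dev/null && echo "$host OK" || echo "$host FAILED"; done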

Step 4. Configuring Key Based Login

The hadoop user must be able to ssh between cluster nodes (including to itself) without a password. Use the following commands to configure key-based login between all Hadoop cluster servers.
# su - hadoop
$ ssh-keygen -t rsa
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-master
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-1
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-2
$ chmod 0600 ~/.ssh/authorized_keys
$ exit
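Before moving on, it is worth confirming that the hadoop user can now reach every node without a password prompt. The loop below is a quick check (run as the hadoop user; the BatchMode option makes ssh fail instead of prompting, so a missing hostname in the output points to a key problem):
$ for host in hadoop-master hadoop-slave-1 hadoop-slave-2; do ssh -o BatchMode=yes $host hostname; done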

Step 5. Download and Extract Hadoop Source

Download the latest available Hadoop version from its official site, on the hadoop-master server only.
# mkdir /opt/hadoop
# cd /opt/hadoop/
# wget http://apache.mesi.com.ar/hadoop/common/hadoop-1.2.0/hadoop-1.2.0.tar.gz
# tar -xzf hadoop-1.2.0.tar.gz
# mv hadoop-1.2.0 hadoop
# chown -R hadoop /opt/hadoop
# cd /opt/hadoop/hadoop/

Step 6: Configure Hadoop

First, edit the Hadoop configuration files and make the following changes.
6.1 Edit core-site.xml
# vim conf/core-site.xml
# Add the following inside the <configuration> tag
<property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-master:9000/</value>
</property>
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
6.2 Edit hdfs-site.xml
# vim conf/hdfs-site.xml
# Add the following inside the <configuration> tag
<property>
    <name>dfs.data.dir</name>
    <value>/opt/hadoop/hadoop/dfs/name/data</value>
    <final>true</final>
</property>
<property>
    <name>dfs.name.dir</name>
    <value>/opt/hadoop/hadoop/dfs/name</value>
    <final>true</final>
</property>
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
6.3 Edit mapred-site.xml
# vim conf/mapred-site.xml
# Add the following inside the <configuration> tag
<property>
    <name>mapred.job.tracker</name>
    <value>hadoop-master:9001</value>
</property>
6.4 Edit hadoop-env.sh
# vim conf/hadoop-env.sh
export JAVA_HOME=/opt/jdk1.7.0_75
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
export HADOOP_CONF_DIR=/opt/hadoop/hadoop/conf
Set the JAVA_HOME path according to your system's Java installation.

Step 7: Copy Hadoop Source to Slave Servers

After updating the above configuration, we need to copy the Hadoop files to all slave servers.
# su - hadoop
$ cd /opt/hadoop
$ scp -r hadoop hadoop-slave-1:/opt/hadoop
$ scp -r hadoop hadoop-slave-2:/opt/hadoop
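With only two slaves the two scp commands above are fine; on a larger cluster, a small loop avoids the repetition (a sketch, assuming all slaves are named in /etc/hosts as above):
$ for host in hadoop-slave-1 hadoop-slave-2; do scp -r hadoop $host:/opt/hadoop; done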

Step 8: Configure Hadoop on Master Server Only

Go to the Hadoop folder on hadoop-master and configure the following settings.
# su - hadoop
$ cd /opt/hadoop/hadoop
$ vim conf/masters

hadoop-master
$ vim conf/slaves

hadoop-slave-1
hadoop-slave-2
Format the NameNode on hadoop-master only:
# su - hadoop
$ cd /opt/hadoop/hadoop
$ bin/hadoop namenode -format
13/07/13 10:58:07 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hadoop-master/192.168.1.15
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.0
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1479473; compiled by 'hortonfo' on Mon May  6 06:59:37 UTC 2013
STARTUP_MSG:   java = 1.7.0_25
************************************************************/
13/07/13 10:58:08 INFO util.GSet: Computing capacity for map BlocksMap
13/07/13 10:58:08 INFO util.GSet: VM type       = 32-bit
13/07/13 10:58:08 INFO util.GSet: 2.0% max memory = 1013645312
13/07/13 10:58:08 INFO util.GSet: capacity      = 2^22 = 4194304 entries
13/07/13 10:58:08 INFO util.GSet: recommended=4194304, actual=4194304
13/07/13 10:58:08 INFO namenode.FSNamesystem: fsOwner=hadoop
13/07/13 10:58:08 INFO namenode.FSNamesystem: supergroup=supergroup
13/07/13 10:58:08 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/07/13 10:58:08 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/07/13 10:58:08 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/07/13 10:58:08 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
13/07/13 10:58:08 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/07/13 10:58:08 INFO common.Storage: Image file of size 112 saved in 0 seconds.
13/07/13 10:58:08 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/opt/hadoop/hadoop/dfs/name/current/edits
13/07/13 10:58:08 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/opt/hadoop/hadoop/dfs/name/current/edits
13/07/13 10:58:08 INFO common.Storage: Storage directory /opt/hadoop/hadoop/dfs/name has been successfully formatted.
13/07/13 10:58:08 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop-master/192.168.1.15
************************************************************/

Step 9: Start Hadoop Services

Use the following command to start all Hadoop services on hadoop-master:
$ bin/start-all.sh
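Once the script finishes, you can confirm the daemons are up with the jps tool that ships with the JDK. On the master you should typically see NameNode, SecondaryNameNode and JobTracker, and on each slave DataNode and TaskTracker; the PIDs in the sample below are illustrative:
$ jps
4782 NameNode
4890 SecondaryNameNode
4957 JobTracker
5204 Jps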

How to Setup Hadoop 2.6.0 (Single Node Cluster) on CentOS/RHEL and Ubuntu

http://tecadmin.net/setup-hadoop-2-4-single-node-cluster-on-linux/

Apache Hadoop 2.6 brings significant improvements over the previous stable 2.X.Y releases, with many enhancements in HDFS and MapReduce. This how-to guide will help you install Hadoop 2.6 on CentOS/RHEL 7/6/5 and Ubuntu systems. This article does not cover the full configuration of Hadoop; it covers only the basic configuration required to start working with Hadoop.

Step 1: Installing Java

Java is the primary requirement for running Hadoop on any system, so make sure Java is installed on your system using the following command.
# java -version 

java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
If you don't have Java installed on your system, use one of the following links to install it first.

Step 2: Creating Hadoop User

We recommend creating a normal (non-root) account for running Hadoop. Create a system account using the following commands.
# useradd hadoop
# passwd hadoop
After creating the account, you also need to set up key-based ssh to the account itself. To do this, execute the following commands.
# su - hadoop
$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
Let's verify the key-based login. The command below should not ask for a password, but the first time it will prompt you to add the host's RSA key to the list of known hosts.
$ ssh localhost
$ exit

Step 3. Downloading Hadoop 2.6.0

Now download the Hadoop 2.6.0 archive file using the command below. You can also select an alternate download mirror to increase download speed.
$ cd ~
$ wget http://apache.claz.org/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
$ tar xzf hadoop-2.6.0.tar.gz
$ mv hadoop-2.6.0 hadoop

Step 4. Configure Hadoop Pseudo-Distributed Mode

4.1. Setup Environment Variables

First we need to set the environment variables used by Hadoop. Edit the ~/.bashrc file and append the following values at the end of the file.
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
Now apply the changes in the current running environment:
$ source ~/.bashrc
Now edit the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file and set the JAVA_HOME environment variable:
export JAVA_HOME=/opt/jdk1.8.0_31/
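If you are unsure where your JDK lives, you can resolve it from the java binary on the PATH; JAVA_HOME is then the JDK directory above bin (the output below is only an example):
$ readlink -f $(which java)
/opt/jdk1.8.0_31/bin/java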

4.2. Edit Configuration Files

Hadoop has many configuration files, which need to be configured according to the requirements of your Hadoop infrastructure. Let's start with the configuration for a basic Hadoop single-node cluster setup. First, navigate to the location below:
$ cd $HADOOP_HOME/etc/hadoop

Edit core-site.xml

<configuration>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
</configuration>

Edit hdfs-site.xml

<configuration>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

<property>
  <name>dfs.name.dir</name>
  <value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
</property>

<property>
  <name>dfs.data.dir</name>
  <value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
</property>
</configuration>

Edit mapred-site.xml

<configuration>
 <property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
 </property>
</configuration>

Edit yarn-site.xml

<configuration>
 <property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
 </property>
</configuration>
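A stray or unclosed tag in any of these files will keep the daemons from starting, so it can save time to confirm that each file is well-formed XML before continuing. If the libxml2 tools are installed (package libxml2-utils on Ubuntu), xmllint prints nothing on success and reports parse errors otherwise:
$ xmllint --noout core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml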
4.3. Format Namenode

Now format the namenode using the following command, and check the output to confirm that the storage directory was successfully formatted.
$ hdfs namenode -format
Sample output:
15/02/04 09:58:43 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = svr1.tecadmin.net/192.168.1.133
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.6.0
...
...
15/02/04 09:58:57 INFO common.Storage: Storage directory /home/hadoop/hadoopdata/hdfs/namenode has been successfully formatted.
15/02/04 09:58:57 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
15/02/04 09:58:57 INFO util.ExitUtil: Exiting with status 0
15/02/04 09:58:57 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at svr1.tecadmin.net/192.168.1.133
************************************************************/

Step 5. Start Hadoop Cluster

Let's start your Hadoop cluster using the scripts provided by Hadoop. Just navigate to your Hadoop sbin directory and execute the scripts one by one.
$ cd $HADOOP_HOME/sbin/
Now run the start-dfs.sh script.
$ start-dfs.sh
Sample output:
15/02/04 10:00:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-namenode-svr1.tecadmin.net.out
localhost: starting datanode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-datanode-svr1.tecadmin.net.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
RSA key fingerprint is 3c:c4:f6:f1:72:d9:84:f9:71:73:4a:0d:55:2c:f9:43.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (RSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-secondarynamenode-svr1.tecadmin.net.out
15/02/04 10:01:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Now run the start-yarn.sh script.
$ start-yarn.sh
Sample output:
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop/logs/yarn-hadoop-resourcemanager-svr1.tecadmin.net.out
localhost: starting nodemanager, logging to /home/hadoop/hadoop/logs/yarn-hadoop-nodemanager-svr1.tecadmin.net.out
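All five daemons of a single-node setup should now be running. The jps tool that ships with the JDK gives a quick overview; the PIDs in the sample below are illustrative:
$ jps
2961 NameNode
3078 DataNode
3262 SecondaryNameNode
3422 ResourceManager
3527 NodeManager
3611 Jps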

Step 6. Access Hadoop Services in Browser

The Hadoop NameNode web interface starts on port 50070 by default. Access your server on port 50070 in your favorite web browser.
http://svr1.tecadmin.net:50070/
Now access port 8088 to get information about the cluster and all applications.
http://svr1.tecadmin.net:8088/
Access port 50090 to get details about the secondary namenode.
http://svr1.tecadmin.net:50090/
Access port 50075 to get details about the DataNode.
http://svr1.tecadmin.net:50075/

Step 7. Test Hadoop Single Node Setup

7.1 – Make the required HDFS directories using the following commands.
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/hadoop
7.2 – Now copy all files from the local file system /var/log/httpd to the Hadoop distributed file system using the command below.
$ bin/hdfs dfs -put /var/log/httpd logs
7.3 – Now browse the Hadoop distributed file system by opening the URL below in a browser.
 http://svr1.tecadmin.net:50070/explorer.html#/user/hadoop/logs
7.4 – Now copy the logs directory from the Hadoop distributed file system back to the local file system.
$ bin/hdfs dfs -get logs /tmp/logs
$ ls -l /tmp/logs/
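To exercise MapReduce end to end, you can also run the wordcount example that ships with Hadoop against the files you just uploaded. The jar path below matches the default 2.6.0 layout (adjust the version suffix if yours differs), and the output directory must not already exist:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount logs logs-output
$ bin/hdfs dfs -cat logs-output/part-r-00000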

How to Setup MySQL (Master-Slave) Replication in RHEL, CentOS, Fedora

http://www.tecmint.com/how-to-setup-mysql-master-slave-replication-in-rhel-centos-fedora/

The following tutorial aims to provide a simple step-by-step guide for setting up MySQL (Master-Slave) Replication in RHEL 6.3/6.2/6.1/6/5.8, CentOS 6.3/6.2/6.1/6/5.8 and Fedora 17/16/15/14/13/12 using the latest MySQL version. This guide is written specifically for CentOS 6.3, but also works with older Linux distributions running MySQL 5.x.
MySQL Master-Slave Replication in RedHat / CentOS / Fedora
MySQL replication is very useful in terms of data security, fail-over, database backup from the slave, analytics, etc. We use the following setup to carry out the replication process; in your scenario it will be different.
  1. Working Linux OS like CentOS 6.3, RedHat 6.3 or Fedora 17
  2. Master and Slave are CentOS 6.3 Linux Servers.
  3. Master IP Address is: 192.168.1.1.
  4. Slave IP Address is: 192.168.1.2.
  5. Master and Slave are on the same LAN network.
  6. Master and Slave have the same MySQL version installed.
  7. Master allows remote MySQL connections on port 3306.
We have two servers: one is the Master with IP 192.168.1.1 and the other is the Slave at 192.168.1.2. We have divided the setup process into two phases to make things easier: in Phase I we configure the Master server, and in Phase II the Slave server. Let's start the replication setup process.

Phase I: Configure Master Server (192.168.1.1) for Replication

In Phase I, we will walk through the installation of MySQL, setting up replication and then verifying it.
Install MySQL on the Master Server
First, proceed with the MySQL installation using the yum command. If you already have MySQL installed, you can skip this step.
# yum install mysql-server mysql
Configure MySQL on the Master Server
Open the my.cnf configuration file with the vi editor.
# vi /etc/my.cnf
Add the following entries under the [mysqld] section, and don't forget to replace tecmint with the name of the database that you would like to replicate on the Slave.
server-id = 1
binlog-do-db=tecmint
relay-log = /var/lib/mysql/mysql-relay-bin
relay-log-index = /var/lib/mysql/mysql-relay-bin.index
log-error = /var/lib/mysql/mysql.err
master-info-file = /var/lib/mysql/mysql-master.info
relay-log-info-file = /var/lib/mysql/mysql-relay-log.info
log-bin = /var/lib/mysql/mysql-bin
Restart the MySQL service.
# /etc/init.d/mysqld restart
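After the restart, you can confirm that binary logging is enabled and the server id took effect (a quick sanity check before creating the replication user):
# mysql -u root -p -e "SHOW VARIABLES LIKE 'log_bin'; SHOW VARIABLES LIKE 'server_id';"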
Log in to MySQL as the root user, create the slave user and grant privileges for replication. Replace slave_user with your user and your_password with your password.
# mysql -u root -p
mysql> GRANT REPLICATION SLAVE ON *.* TO 'slave_user'@'%' IDENTIFIED BY 'your_password';
mysql> FLUSH PRIVILEGES;
mysql> FLUSH TABLES WITH READ LOCK;
mysql> SHOW MASTER STATUS;

+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000003 | 11128001 | tecmint      |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)

mysql> quit;
Please write down the File (mysql-bin.000003) and Position (11128001) values; we will need them later on the Slave server. Next, with the READ LOCK in place, export all the databases and the master binary log information with the mysqldump command.
#  mysqldump -u root -p --all-databases --master-data > /root/dbdump.db
Once you've dumped all the databases, connect to MySQL as the root user again and unlock the tables.
mysql> UNLOCK TABLES;
mysql> quit;
Upload the database dump file to the Slave server (192.168.1.2) using the scp command.
scp /root/dbdump.db root@192.168.1.2:/root/
That's it, we have successfully configured the Master server; let's proceed to the Phase II section.

Phase II: Configure Slave Server (192.168.1.2) for Replication

In Phase II, we install MySQL, set up replication and then verify it.
Install MySQL on the Slave Server
If you don't have MySQL installed, install it using the yum command.
# yum install mysql-server mysql
Configure MySQL on the Slave Server
Open the my.cnf configuration file with the vi editor.
# vi /etc/my.cnf
Add the following entries under the [mysqld] section; don't forget to replace the IP address with that of your Master server, tecmint with the database name you would like to replicate, and so on.
server-id = 2
master-host=192.168.1.1
master-connect-retry=60
master-user=slave_user
master-password=yourpassword
replicate-do-db=tecmint
relay-log = /var/lib/mysql/mysql-relay-bin
relay-log-index = /var/lib/mysql/mysql-relay-bin.index
log-error = /var/lib/mysql/mysql.err
master-info-file = /var/lib/mysql/mysql-master.info
relay-log-info-file = /var/lib/mysql/mysql-relay-log.info
log-bin = /var/lib/mysql/mysql-bin
Now import the dump file that we exported earlier and restart the MySQL service.
# mysql -u root -p < /root/dbdump.db
# /etc/init.d/mysqld restart
Log in to MySQL as the root user and stop the slave. Then tell the slave where to start reading the Master's log file, using the File (mysql-bin.000003) and Position (11128001) values we wrote down from the SHOW MASTER STATUS output on the master. You must change 192.168.1.1 to the IP address of the Master server, and change the user and password accordingly.
# mysql -u root -p
mysql> slave stop;
mysql> CHANGE MASTER TO MASTER_HOST='192.168.1.1', MASTER_USER='slave_user', MASTER_PASSWORD='yourpassword', MASTER_LOG_FILE='mysql-bin.000003', MASTER_LOG_POS=11128001;
mysql> slave start;
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.1
                  Master_User: slave_user
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000003
          Read_Master_Log_Pos: 12345100
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 11381900
        Relay_Master_Log_File: mysql-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: tecmint
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 12345100
              Relay_Log_Space: 11382055
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
1 row in set (0.00 sec)
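The fields to watch are Slave_IO_Running and Slave_SQL_Running, which must both say Yes, and Seconds_Behind_Master, which should stay near 0. For routine checks from the shell, a grep over the same statement keeps the output short:
# mysql -u root -p -e "SHOW SLAVE STATUS\G" | grep -E 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master'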

Verifying MySQL Replication on Master and Slave Server

It's very important to verify that replication is working perfectly. On the Master server, create a table and insert some values into it.
On Master Server
mysql> create database tecmint;
mysql> use tecmint;
mysql> CREATE TABLE employee (c int);
mysql> INSERT INTO employee (c) VALUES (1);
mysql> SELECT * FROM employee;
+------+
| c    |
+------+
|    1 |
+------+
1 row in set (0.00 sec)
On Slave Server
Verify the slave by running the same command; it will return the same values on the slave too.
mysql> use tecmint;
mysql> SELECT * FROM employee;
+------+
| c    |
+------+
|    1 |
+------+
1 row in set (0.00 sec)
That's it, you've finally configured MySQL Replication in a few simple steps. More information can be found in the MySQL Replication Guide.