

Table of Contents
- Installing Hadoop on Ubuntu Linux (Single Node Cluster)
- Installation Environment
- Step 1: Java-JDK Installation:
- Step 2: Install SSH
- Step 3: Configuring SSH
- Step 4: Testing SSH configuration
- Optional Step: Debugging SSH configuration
- Step 5: Download Hadoop
- Step 6: Create folder using mkdir command for saving hadoop installation files
- Step 7: Extract tar file content using tar command
- Step 8: Move folder content using mv command
- Step 9: Set permission using chmod command on Hadoop folder
- Step 10: Configuring Hadoop related environment variables
- Step 11: Run/Apply bashrc file changes
- Step 12: Create directories to save Hadoop data
- Step 13: Configure Hadoop core-site.xml file
- Step 14: Configure Hadoop hdfs-site.xml file
- Step 15: Configure Hadoop mapred-site.xml file
- Step 16: Configure hadoop yarn-site.xml file
- Step 17: Configure JAVA_HOME in hadoop-env.sh file
- Step 18: Format Hadoop namenode
- Step 19: Start Hadoop components using commands
- Step 20: Check whether Hadoop components are running or not using jps command
- Step 21: Verify Hadoop installation using pi example
Installing Hadoop on Ubuntu Linux (Single Node Cluster)
Nowadays, Big Data is a buzzword, and the prominent technology behind it is Hadoop.
It is a good skill to have on a developer's resume. To learn Hadoop, it is essential to have a single-node Hadoop cluster ready to experiment with.
So, in this article, we explain a way of installing Hadoop on Ubuntu Linux.
Installation Environment
I have tested this tutorial in the environment below.
OS: Ubuntu Linux 14.04 LTS (64-bit)
Hadoop: Hadoop 2.2.0
Prerequisites: Oracle Java 8
Step 1: Java-JDK Installation:
The primary requirement for a Hadoop installation is Java.
We can use Oracle JDK, OpenJDK, or IBM JDK as per our requirements.
The compatibility of Hadoop with each JDK flavour is documented on the Hadoop Wiki.
In one of our previous articles, I have provided a step-by-step guide to installing Java on Ubuntu Linux, so we skip the detailed Java installation here.
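Before moving on, it is worth confirming that Java is actually available on the machine; a quick check looks like this (the exact version string depends on which JDK you installed):
$ java -version
$ javac -version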
Step 2: Install SSH
SSH is required because the Hadoop start/stop scripts use it to log in to each node (including localhost in a single-node setup) and manage the namenode and datanode daemons.
If it is not already installed, we can install SSH, along with rsync, using the following commands:
$ sudo apt-get install ssh
$ sudo apt-get install rsync
Step 3: Configuring SSH
After installing SSH on localhost, we need to use the following command to generate an SSH key for the current user with an empty passphrase.
Hadoop uses SSH to access and manage its nodes, and we keep the passphrase empty because otherwise the user would have to enter it every time Hadoop connects to a node.
$ ssh-keygen -t rsa -P ""
Once the SSH key is generated, we need to execute the following command to add it to the list of authorized keys:
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
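Depending on your umask, passwordless login can still fail, because sshd can refuse keys stored in files that are group- or world-writable. If the test in the next step still prompts for a password, tightening the permissions usually helps (a precaution, not always required):
$ chmod 700 $HOME/.ssh
$ chmod 600 $HOME/.ssh/authorized_keys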
Step 4: Testing SSH configuration
Our SSH configuration for localhost is now complete, so we can test the setup using the command:
$ ssh localhost
Optional Step: Debugging SSH configuration
If all the steps so far completed successfully, SSH is configured and running on localhost.
If the SSH connection fails, enable verbose SSH debugging to get a detailed error message for troubleshooting:
$ ssh -vvv localhost
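Another common cause of a failed connection is that the SSH server itself is not running. On Ubuntu 14.04 you can check and, if needed, start it with the commands below (assuming the openssh-server package pulled in during Step 2 is installed):
$ sudo service ssh status
$ sudo service ssh start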
Step 5: Download Hadoop
Download Apache Hadoop from the Apache Download Mirrors page.
I have used a mirror link to download Hadoop 2.2.0.
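The exact mirror URL will vary; as an illustration, the 2.2.0 tarball can also be fetched from the Apache archive with wget:
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz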
Once the file is downloaded, we have to select an appropriate location in the system to keep the Hadoop installation.
I have chosen a path of the form /usr/local/hadoop/<Different_Hadoop_Versions>.
Step 6: Create folder using mkdir command for saving hadoop installation files
In order to save the Hadoop files, we have to create a folder under /usr/local.
I have created it using the command:
$ sudo mkdir /usr/local/hadoop
Step 7: Extract tar file content using tar command
Once the folder is created, we have to extract the archive downloaded from the Apache Hadoop website.
I have placed that file in /home/javadeveloperzone/Desktop, so from that directory I execute the following command to extract it:
$ tar -xvf hadoop-2.2.0.tar.gz
Step 8: Move folder content using mv command
Now we have to move the extracted folder to the location where we want to keep it (in our case /usr/local/hadoop/).
I have used the command below to move the extracted files to /usr/local/hadoop/:
$ sudo mv hadoop-2.2.0 /usr/local/hadoop
Step 9: Set permission using chmod command on Hadoop folder
Once the files are moved, it is time to set appropriate permissions on the folder.
I have used the following command, which makes the folder readable and writable by everyone; that is acceptable for a single-user learning setup.
$ sudo chmod -R 777 /usr/local/hadoop/hadoop-2.2.0/
Now the setup part is almost done, and we can start the configuration part.
Step 10: Configuring Hadoop related environment variables
In the configuration step, we have to set multiple Hadoop-related environment variables by adding them to the $HOME/.bashrc file.
I have used the gedit $HOME/.bashrc command to open the file; you can use any editor you like. Append the lines below to that file.
# Set Hadoop-related environment variables
export HADOOP_PREFIX=/usr/local/hadoop/hadoop-2.2.0
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.2.0
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop

# Native Path
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"

# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Once the modifications are completed, save that file.
Step 11: Run/Apply bashrc file changes
Now we have to execute the command below so that the new environment variables take effect in the current shell:
$ source $HOME/.bashrc
Step 12: Create directories to save Hadoop data
In the next step, we have to create three directories where Hadoop will store its namenode data, datanode data, and temporary data: name, data, and temp.
Since this is a single-node installation, there is no permission problem if the same user creates these directories under the /home/{USER} folder and supplies them as the namenode, datanode, and temp directories in the Hadoop configuration files.
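A minimal way to create them, assuming the same layout referenced in the configuration files below (replace javadeveloperzone with your own user name):
$ mkdir -p /home/javadeveloperzone/hadoop/hadoop-2.2.0/name
$ mkdir -p /home/javadeveloperzone/hadoop/hadoop-2.2.0/data
$ mkdir -p /home/javadeveloperzone/hadoop/hadoop-2.2.0/temp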
Step 13: Configure Hadoop core-site.xml file
Now we have to specify those directories in the Hadoop configuration files.
Open the core-site.xml file available under /usr/local/hadoop/hadoop-2.2.0/etc/hadoop and place the content below between the <configuration></configuration> tags.
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/javadeveloperzone/hadoop/hadoop-2.2.0/temp</value>
</property>
Step 14: Configure Hadoop hdfs-site.xml file
Open the hdfs-site.xml file available under /usr/local/hadoop/hadoop-2.2.0/etc/hadoop and place the content below between the <configuration></configuration> tags.
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/home/javadeveloperzone/hadoop/hadoop-2.2.0/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/home/javadeveloperzone/hadoop/hadoop-2.2.0/data</value>
</property>
Step 15: Configure Hadoop mapred-site.xml file
By default, the mapred-site.xml file ships as mapred-site.xml.template.
Rename (or copy) mapred-site.xml.template to mapred-site.xml.
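For example, from the Hadoop configuration directory:
$ cd /usr/local/hadoop/hadoop-2.2.0/etc/hadoop
$ cp mapred-site.xml.template mapred-site.xml
Copying keeps the original template around; mv works just as well if you prefer a true rename.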
Then open the mapred-site.xml file and place the content below between the <configuration></configuration> tags.
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
Step 16: Configure hadoop yarn-site.xml file
Open the yarn-site.xml file available under /usr/local/hadoop/hadoop-2.2.0/etc/hadoop and place the content below between the <configuration></configuration> tags.
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
Step 17: Configure JAVA_HOME in hadoop-env.sh file
Open the hadoop-env.sh file available under /usr/local/hadoop/hadoop-2.2.0/etc/hadoop and update the JAVA_HOME variable to point to your Java installation location.
I have updated it as follows:
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_45
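If you are not sure where your JDK lives, these commands usually reveal it (the reported path will differ from mine; strip the trailing /bin/java, and the /jre component if present, to get the value for JAVA_HOME):
$ readlink -f $(which java)
$ update-alternatives --list java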
Step 18: Format Hadoop namenode
Now that all the configurations are complete, we have to format the namenode using the command below. (In Hadoop 2.x the preferred form is hdfs namenode -format; the older hadoop namenode -format still works but prints a deprecation warning.)
$ hadoop namenode -format
Step 19: Start Hadoop components using commands
If the namenode is formatted without any errors, we can start Hadoop using the commands below:
$ start-dfs.sh
$ start-yarn.sh
Step 20: Check whether Hadoop components are running or not using jps command
If the above scripts run properly, they will start the Hadoop daemons listed below;
you can check them using the jps command.
$ jps
It will list the following processes:
9008 ResourceManager
8674 DataNode
8530 NameNode
8854 SecondaryNameNode
9308 NodeManager
10206 Jps
The process IDs will be different in every case.
Step 21: Verify Hadoop installation using pi example
If all of the above processes have started, our Hadoop installation is working, and we can verify it using the example code shipped with Hadoop.
Use the following command to run the pi example:
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 4 1000
You can use the URLs below for the HDFS (NameNode) web UI and the YARN ResourceManager (cluster) web UI, respectively:
http://localhost:50070/
http://localhost:8088
Your Hadoop single-node cluster is now running, and you can run Hadoop MapReduce programs on it. In this way, we can install Hadoop on Ubuntu Linux.
Happy Hadooping…
Related Links:
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html