Overview

In this article, we are going to cover one of the most import installation topics, i.e Installing Apache Spark on Ubuntu Linux.
Installing Apache Spark on Ubuntu Linux is a relatively simple procedure as compared to other Bigdata
tools.

Installation Environment & Software Prerequisites

OS       :  Ubuntu Linux(14.04 LTS) – 64bit
Java     :  Oracle JDK 1.8
Scala    : Scala-2.11.7
Spark   : Apache Spark 2.0.0-bin-hadoop2.6

Step 1: Java-JDK Installation:

In one of our previous article, We have provided step by step guide to Install Java in Ubuntu Linux, so we can skip this Java installation step here.

Step 2: Scala Installation:

For Scala installation, we need to select appropriate Scala version which is compatible with Apache Spark.
In our case we are interested in installing Spark 2.0.0, So we are going to install Scala-2.11.7.
In one of our previous article, we have explained steps for Installing Scala on Ubuntu Linux.

Step 3: Download Apache Spark:

Download Spark 2.2.0-prebuilt for Hadoop 2.6 from Apache Spark website.

Once the file is downloaded, we have to select appropriate location in the system to save the Spark files.
I have selected the path like, /usr/local/spark/<Different_Spark_Versions>.

Step 4: Create a folder using mkdir command to save Spark installation files

So, in order to save Spark files, we have to create a folder under /usr/local,

In order to create a folder, I have used following command,

$ sudo mkdir /usr/local/spark

Step 5: Extract tar file content using tar command

After creating a folder, we need to extract the file that we have downloaded from Apache Spark’s website.

I have downloaded and placed the file at,/home/javadeveloperzone/Desktop

So I have to execute the following command to extract the file,

$ tar -xvf spark-2.2.0-bin-hadoop2.6.tgz

Step 6: Move folder content using mv command

Now we have to move the extracted folder to the location at which we want to save (in our case /usr/local/spark/).

I have used below command to move the extracted files to./usr/local/spark/

$ sudo mv 2.0.0-bin-hadoop2.6 /usr/local/spark

Step 7: Set permission using chmod command on spark folder

Once the files are moved, it is time to get appropriate permissions on the folder,

I have used following command to set appropriate permission.

$ sudo chmod -R 777 /usr/local/spark/2.0.0-bin-hadoop2.6

Now the setup part is almost done, we will start the configuration part,

Step 8: Configuring Apache Spark related environment variable:

In the configuration step, we have to set SPARK_HOMEvariable. In order to set the variable, we have to add them in $HOME/.bashrcfile.

I have used gedit $HOME/.bashrccommand to open the file, you can use any of your favorite editors to open the file and append the below lines to that file.

#SPARK_HOME

export SPARK_HOME=/usr/local/spark/2.0.0-bin-hadoop2.6

export PATH=$PATH:$SPARK_HOME/bin

Once the modifications are completed, save that file.
After performing modifications, my .bashrcfile looks like,

Step 9: Run/Apply bashrc file changes

Now we have to execute the below command so that the SPARK_HOMEvariable will take effect,

$source $HOME/.bashrc

Step 10: Verify Spark installation by running a sample example

To verify spark installation, we are going to run a pre-built spark example. i.e SparkPi
you can run the example using following command.

$SPARK_HOME/bin/run-example SparkPi 4 10

After execution of SparkPi example, Pi’s values will be printed as shown in the following snapshot.

Installing-Spark-Ubuntu-Linux

Installing-Spark-Ubuntu-Linux

You may also use Spark Word Count example to verify the Spark installation.

Was this post helpful?
Let us know, if you liked the post. Only in this way, we can improve us.
Yes
No
Tags: ,

Leave a Reply

Your email address will not be published. Required fields are marked *