Table of Contents
- Installation Environment & Software Prerequisites
- Step 1: Java-JDK Installation
- Step 2: Scala Installation
- Step 3: Download Apache Spark
- Step 4: Create a folder using mkdir command to save Spark installation files
- Step 5: Extract tar file content using tar command
- Step 6: Move folder content using mv command
- Step 7: Set permission using chmod command on spark folder
- Step 8: Configuring Apache Spark related environment variable
- Step 9: Run/Apply bashrc file changes
- Step 10: Verify Spark installation by running a sample example
In this article, we are going to cover one of the most important installation topics, i.e., installing Apache Spark on Ubuntu Linux.
Installing Apache Spark on Ubuntu Linux is a relatively simple procedure compared to other Big Data frameworks.
Installation Environment & Software Prerequisites
OS : Ubuntu Linux (14.04 LTS) – 64-bit
Java : Oracle JDK 1.8
Scala : Scala-2.11.7
Spark : Apache Spark 2.0.0-bin-hadoop2.6
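Since the prerequisites call for a 64-bit OS, a quick way to confirm the machine's architecture before downloading 64-bit builds (the exact output varies by machine):

```shell
# Print the machine hardware name; a 64-bit x86 Ubuntu install reports x86_64.
uname -m
```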
Step 1: Java-JDK Installation:
In one of our previous articles, we have provided a step-by-step guide to install Java on Ubuntu Linux, so we can skip the Java installation steps here.
Step 2: Scala Installation:
For the Scala installation, we need to select a Scala version that is compatible with Apache Spark.
In our case we are interested in installing Spark 2.0.0, so we are going to install Scala-2.11.7.
In one of our previous articles, we have explained the steps for installing Scala on Ubuntu Linux.
Step 3: Download Apache Spark:
Download Spark 2.0.0, pre-built for Hadoop 2.6, from the Apache Spark website.
Once the file is downloaded, we have to select an appropriate location in the system to save the Spark files.
I have selected /usr/local/spark as the path.
Step 4: Create a folder using mkdir command to save Spark installation files
So, in order to save the Spark files, we have to create a folder under /usr/local.
To create the folder, I have used the following command,
$ sudo mkdir /usr/local/spark
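As an aside, mkdir with the -p flag (not used above) creates any missing parent directories and does not error if the folder already exists, which makes a setup script safe to re-run. A small sketch on a scratch path, not the real /usr/local:

```shell
# -p creates parents as needed and is a no-op if the directory already exists.
mkdir -p /tmp/spark-demo/local/spark
mkdir -p /tmp/spark-demo/local/spark   # safe to repeat
ls -d /tmp/spark-demo/local/spark
```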
Step 5: Extract tar file content using tar command
After creating the folder, we need to extract the tar file that we downloaded from the Apache Spark website.
From the directory where the downloaded file is saved, execute the following command to extract it,
$ tar -xvf spark-2.0.0-bin-hadoop2.6.tgz
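The tar flags used here are -x (extract), -v (verbose listing), and -f (archive file name). A self-contained sketch of the same extract step on a tiny dummy archive whose directory name mirrors the real tarball layout (the contents are placeholders):

```shell
# Build a tiny dummy archive that mimics the Spark tarball layout...
mkdir -p /tmp/tar-demo/spark-2.0.0-bin-hadoop2.6/bin
tar -cf /tmp/tar-demo/spark-2.0.0-bin-hadoop2.6.tgz -C /tmp/tar-demo spark-2.0.0-bin-hadoop2.6
# ...then extract it with the same flags as above.
cd /tmp/tar-demo && rm -rf spark-2.0.0-bin-hadoop2.6
tar -xvf spark-2.0.0-bin-hadoop2.6.tgz
ls -d spark-2.0.0-bin-hadoop2.6
```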
Step 6: Move folder content using mv command
Now we have to move the extracted folder to the location where we want to keep it (in our case, /usr/local/spark).
I have used the below command to move the extracted files,
$ sudo mv spark-2.0.0-bin-hadoop2.6 /usr/local/spark
Step 7: Set permission using chmod command on spark folder
Once the files are moved, it is time to set appropriate permissions on the folder.
I have used the following command to set the permissions,
$ sudo chmod -R 777 /usr/local/spark/spark-2.0.0-bin-hadoop2.6
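Note that chmod -R 777 makes the folder readable, writable, and executable by every user on the machine; on a shared box you may prefer taking ownership with chown and using 755 instead. A sketch of the stricter mode on a scratch directory, not the real install path:

```shell
# 755 = owner rwx, group/others r-x; shown on a scratch path.
mkdir -p /tmp/perm-demo/spark
chmod -R 755 /tmp/perm-demo/spark
stat -c '%a' /tmp/perm-demo/spark   # prints 755 (GNU coreutils stat)
```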
Step 8: Configuring Apache Spark related environment variable:
Now the setup part is almost done, and we will start the configuration part.
In the configuration step, we have to set the SPARK_HOME variable. In order to set the variable, we have to add it to the $HOME/.bashrc file.
I have used the gedit $HOME/.bashrc command to open the file; you can use any of your favorite editors to open the file and append the below lines to it.
#SPARK_HOME
export SPARK_HOME=/usr/local/spark/spark-2.0.0-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin
Once the modifications are completed, save the file.
After performing the modifications, my .bashrc file looks like the one shown in the snapshot.
Step 9: Run/Apply bashrc file changes
Now we have to execute the below command so that the SPARK_HOME variable will take effect in the current shell,
$ source $HOME/.bashrc
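The source command runs the file's contents in the current shell, so the exports become visible immediately without opening a new terminal. A self-contained sketch of the same mechanism using a temporary file (the real step sources $HOME/.bashrc instead):

```shell
# Write the export to a temp file and source it, mirroring the .bashrc step.
echo 'export SPARK_HOME=/usr/local/spark/spark-2.0.0-bin-hadoop2.6' > /tmp/spark-env-demo.sh
source /tmp/spark-env-demo.sh
echo "$SPARK_HOME"   # the variable is now set in this shell
```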
Step 10: Verify Spark installation by running a sample example
To verify the Spark installation, we are going to run a pre-built Spark example, i.e., SparkPi.
You can run the example using the following command,
$SPARK_HOME/bin/run-example SparkPi 4 10
After the SparkPi example finishes, an approximate value of Pi will be printed, as shown in the following snapshot.
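The job's console output includes a line of the form "Pi is roughly 3.14..." among the log messages. If you want to script the verification, you can grep for that line; a sketch against a simulated log (the actual estimate will vary from run to run):

```shell
# Simulated SparkPi output; the real job prints many log lines around it.
echo "Pi is roughly 3.140915" > /tmp/sparkpi-demo.log
# Pull out just the estimate line, as a scripted check might:
grep -o "Pi is roughly [0-9.]*" /tmp/sparkpi-demo.log
```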
You may also use the Spark Word Count example to verify the Spark installation.