Debug Hadoop Map Reduce Code

Table of Contents

1. Overview
2. Development Environment
3. Steps To Debug Code locally
4. Example
5. Build & Debug
6. Output
7. References
8. Source Code

1. Overview

Hadoop is a framework that allows distributed processing of large data sets across clusters of computers. A Hadoop job is normally submitted to a cluster for execution. Sometimes, as Big Data developers, we need to debug our logic. There are several ways to do this, such as adding job counters to track the required pieces of information, or printing error messages to the console or logs to see where things go wrong.

What if you could debug your Hadoop MapReduce job like ordinary code in your code editor? It is easier and more productive than the other approaches.

In this article, we discuss how to debug Hadoop MapReduce code in a local environment and write the output to the local file system. We use IntelliJ IDEA as the debugger.

2. Development Environment

Hadoop: 3.1.1

Java: Oracle JDK 1.8

IDE: IntelliJ IDEA 2018.3

3. Steps To Debug Code locally

3.1 Add hadoop-mapreduce-client-jobclient maven dependency

The first step in debugging Hadoop MapReduce code locally is to add the hadoop-mapreduce-client-jobclient Maven dependency to the project.

3.2 Set local file system

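Set the fs.defaultFS job configuration parameter to either local or file:///. Only one of the two lines is needed; Hadoop treats local as a deprecated alias for file:///.

conf.set("fs.defaultFS", "local");
// or, equivalently:
conf.set("fs.defaultFS", "file:///");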

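3.3 Set Number of mappers and reducers

The final step is to set the number of mappers and reducers to 1, so that the job launches only a single mapper and a single reducer. Note that the number of map tasks is ultimately derived from the input splits, so a small, single input file is the surest way to get exactly one mapper.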

conf.set("mapreduce.job.maps","1");
conf.set("mapreduce.job.reduces","1");

4. Example

Here is a complete Multiple Outputs example with local debugging enabled.

4.1 pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>HadoopMapReduceDebugExample</groupId>
    <artifactId>HadoopMapReduceDebugExample</artifactId>
    <version>1.0-SNAPSHOT</version>
    <description>Hadoop MapReduce Debug Example</description>
    <build>
        <finalName>HadoopMapReduceDebugExample</finalName>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <configuration>
                    <useSystemClassLoader>false</useSystemClassLoader>
                </configuration>
            </plugin>
        </plugins>
    </build>
    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-mapreduce-client-core -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>3.1.1</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.1.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
            <version>3.1.1</version>
        </dependency>
    </dependencies>
</project>

4.2 MultipleOutputsDebugDriver.java

package com.javadeveloperzone;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MultipleOutputsDebugDriver extends Configured implements Tool {

    public static final String OTHER = "OTHER";
    public static final String MUMBAI = "MUMBAI";
    public static final String DELHI = "DELHI";
    public static final String AHMEDABAD = "AHMEDABAD";
    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new Configuration(),
                new MultipleOutputsDebugDriver(), args);
        System.exit(exitCode);
    }
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("Please provide two arguments :");
            System.out.println("[ 1 ] Input dir path");
            System.out.println("[ 2 ] Output dir path");
            return -1;
        }
        // ToolRunner has already parsed the generic options into getConf(),
        // so args contains only the input and output paths.
        Configuration conf = getConf();
        Path input = new Path(args[0]);
        Path output = new Path(args[1]);
        // Use the local file system instead of HDFS.
        conf.set("fs.defaultFS", "local");
//        conf.set("fs.defaultFS", "file:///");
        // Ask for a single mapper and reducer; setNumReduceTasks(0) below
        // then turns this example into a map-only job.
        conf.set("mapreduce.job.maps","1");
        conf.set("mapreduce.job.reduces","1");
        Job job=Job.getInstance(conf,"Debug Hadoop MapReduce Code Example");
        job.setJarByClass(MultipleOutputsDebugDriver.class);
        job.setMapperClass(MultipleOutputsMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setNumReduceTasks(0);
        FileInputFormat.addInputPath(job, input);
        FileOutputFormat.setOutputPath(job, output);
        MultipleOutputs.addNamedOutput(job,"AHMEDABAD", TextOutputFormat.class,Text.class,Text.class);
        MultipleOutputs.addNamedOutput(job,"DELHI", TextOutputFormat.class,Text.class,Text.class);
        MultipleOutputs.addNamedOutput(job,"MUMBAI", TextOutputFormat.class,Text.class,Text.class);
        MultipleOutputs.addNamedOutput(job,"OTHER", TextOutputFormat.class,Text.class,Text.class);
        boolean success = job.waitForCompletion(true);
        return (success?0:1);
    }
}

4.3 MultipleOutputsMapper.java

Refer to our previous MultipleOutputsMapper example.

5. Build & Debug

Our sample code is ready. Set the Hadoop job input path and output path as command-line arguments (in IntelliJ IDEA, the Program arguments field of the run configuration).

"sample_input.txt" "HDFS/output"

Refer to our previous article for the sample input.
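One practical point when running the job repeatedly from the IDE: Hadoop's FileOutputFormat refuses to start a job whose output directory already exists, so the output path from the previous run has to be removed first. The following is a minimal sketch of a helper for that, using only the Java standard library. The class name OutputDirCleaner and the method deleteIfPresent are hypothetical names chosen for illustration and are not part of the original project.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of the original example project).
public class OutputDirCleaner {

    /**
     * Recursively deletes the given directory if it exists.
     * Returns true if something was deleted, false if the path was absent.
     */
    public static boolean deleteIfPresent(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return false;
        }
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parent directories.
            entries = walk.sorted(Comparator.reverseOrder())
                          .collect(Collectors.toList());
        }
        for (Path entry : entries) {
            Files.delete(entry);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // "HDFS/output" is the output argument used in this article.
        Path output = Paths.get(args.length > 0 ? args[0] : "HDFS/output");
        System.out.println(deleteIfPresent(output)
                ? "Removed stale output: " + output
                : "Nothing to remove at: " + output);
    }
}
```

Run it before each debug session, or call deleteIfPresent at the start of your own launcher. It is deliberately kept outside the driver so that the job code stays identical to what would run on a cluster.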

Now we can debug our Hadoop MapReduce code. Stepping through complex logic this way helps improve productivity.

Set breakpoints on the lines where you want to inspect the logic.

Click the Debug icon in your IntelliJ IDEA project to start debugging. If everything goes well, IntelliJ IDEA will pause the Hadoop MapReduce code at your first breakpoint.

6. Output

Here I have set two breakpoints in my project: one in the Driver class and one in the Mapper class. Refer to the screens below.

6.1 Debugger Screen of Driver class

6.2 Debugger Screen of Mapper class

6.3 Job Output locally

Once the Hadoop MapReduce job completes, the output is available in our local file system under the job output directory.
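To check the result without opening each file by hand, the output directory can be listed with plain Java. The sketch below assumes the job wrote to a local directory such as HDFS/output. It skips names starting with "_" (for example the _SUCCESS marker) and "." (the .crc checksum files the local file system writes next to each data file). The class name LocalOutputInspector is a hypothetical name for illustration. For a map-only job the data files carry an -m- part number, such as part-m-00000, and named outputs typically appear with their own prefix, depending on how the mapper writes them.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of the original example project).
public class LocalOutputInspector {

    /** Returns the visible data files in the output directory, sorted by name. */
    public static List<String> dataFiles(Path outputDir) throws IOException {
        try (Stream<Path> entries = Files.list(outputDir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    // Skip checksum side files (".name.crc") and markers such as "_SUCCESS".
                    .filter(name -> !name.startsWith(".") && !name.startsWith("_"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path outputDir = Paths.get(args.length > 0 ? args[0] : "HDFS/output");
        for (String name : dataFiles(outputDir)) {
            long lines;
            try (Stream<String> content = Files.lines(outputDir.resolve(name))) {
                lines = content.count();
            }
            System.out.println(name + ": " + lines + " record(s)");
        }
    }
}
```

Comparing the per-file record counts with what you saw while stepping through the mapper is a quick sanity check that each record was routed to the output you expected.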

7. References

8. Source Code

Hadoop-Map-Reduce-Debug-Example

You can also check our Git repository for Debug Hadoop Map Reduce Code and other useful examples.


Java Developer Zone

http://javadeveloperzone.com

JavaDeveloperZone is a group of innovative software developers. We are experienced in Java software development, Java web development, Big Data development, data analytics, and Artificial Intelligence development. Our contributions will help Java developers and make the development journey easier. Feel free to send us questions and suggestions; there is always room for improvement. Feel free to contact us for any software development services.