1. Overview

In this article, we will discuss how to parse and load large JSON files using the Jackson streaming API. Jackson helps improve application performance through its streaming API, which uses very little memory and has low CPU overhead. The JsonParser and JsonGenerator classes are used to read and write JSON content.
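
Since this article focuses on reading, here is a brief look at the writing side first: a minimal JsonGenerator sketch, assuming the same org.codehaus.jackson 1.9.x dependency used throughout this article (the class name JsonGeneratorDemo is only illustrative). It writes one small JSON object to a StringWriter.

package com.javadeveloperzone;

import java.io.StringWriter;

import org.codehaus.jackson.JsonFactory;
import org.codehaus.jackson.JsonGenerator;

public class JsonGeneratorDemo {
    public static void main(String[] args) throws Exception {
        StringWriter out = new StringWriter();
        JsonGenerator jsonGenerator = new JsonFactory().createJsonGenerator(out);
        jsonGenerator.writeStartObject();                  // {
        jsonGenerator.writeStringField("documentId", "1"); //   "documentId":"1"
        jsonGenerator.writeBooleanField("isParent", true); //   "isParent":true
        jsonGenerator.writeEndObject();                    // }
        jsonGenerator.close();                             //flush and release the generator
        System.out.println(out);                           //prints {"documentId":"1","isParent":true}
    }
}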

2. Jackson Streaming API

Jackson supports streaming processing, also called incremental processing. It uses very little memory and has low processing overhead. To read a JSON file we need to create a parser; parsers in the Jackson library are objects used to tokenize JSON content and step through it token by token.
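
To see what "tokenize" means in practice, the short sketch below prints every token Jackson produces for a tiny JSON snippet. It assumes the Jackson dependency shown in section 2.1; the class name TokenDemo is only illustrative.

package com.javadeveloperzone;

import org.codehaus.jackson.JsonFactory;
import org.codehaus.jackson.JsonParser;
import org.codehaus.jackson.JsonToken;

public class TokenDemo {
    public static void main(String[] args) throws Exception {
        String json = "{\"documentId\":\"1\",\"docLanguage\":[\"en\",\"fr\"]}";
        JsonParser jsonParser = new JsonFactory().createJsonParser(json);
        JsonToken jsonToken;
        while ((jsonToken = jsonParser.nextToken()) != null) { //null marks end of input
            System.out.println(jsonToken + " : " + jsonParser.getText());
        }
        jsonParser.close();
    }
}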

2.1 Jackson maven dependency

<dependency>
   <groupId>org.codehaus.jackson</groupId>
   <artifactId>jackson-core-asl</artifactId>
   <version>1.9.12</version>
</dependency>
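
Note that the org.codehaus.jackson coordinates belong to the legacy Jackson 1.x line, which matches the imports used in the example below. If you are on Jackson 2.x, the streaming classes live in the com.fasterxml.jackson.core:jackson-core module instead; a sketch of that dependency (the version number is only an example):

<dependency>
   <groupId>com.fasterxml.jackson.core</groupId>
   <artifactId>jackson-core</artifactId>
   <version>2.9.8</version>
</dependency>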

2.2 Create JSON Parser

The createJsonParser method of the JsonFactory class creates a JsonParser from an input file. Here is the code snippet:

JsonFactory jsonfactory = new JsonFactory();
JsonParser jsonParser = jsonfactory.createJsonParser(jsonFile);
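
JsonFactory also accepts other sources such as an InputStream or Reader, which is handy when the data does not sit in a plain file. Here is a minimal sketch, assuming the same 1.9.x dependency; the gzip file name is only illustrative.

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

import org.codehaus.jackson.JsonFactory;
import org.codehaus.jackson.JsonParser;

public class GzipJsonParserExample {
    public static void main(String[] args) throws Exception {
        JsonFactory jsonfactory = new JsonFactory();
        //keep a very large file compressed on disk and stream it while parsing
        InputStream inputStream = new GZIPInputStream(new FileInputStream("sample.json.gz"));
        JsonParser jsonParser = jsonfactory.createJsonParser(inputStream);
        System.out.println("First token : " + jsonParser.nextToken());
        jsonParser.close(); //also closes the underlying stream by default
    }
}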

3. Sample Input

I have a 1.5 GB JSON file which contains documents and their related metadata. The file contains around 2.5 million records, and I want to index this file in Elasticsearch and Solr to do some analytics. Here are a few records from the input file:

[
  {
    "documentId" : "1",
    "docType" : "pdf",
    "docAuthor" : "Java Developer Zone",
    "docTitle" : "Java Blog",
    "isParent" : true,
    "parentDocId" : 0,
    "docLanguage" : ["en","fr"]
  },
  {
    "documentId" : "2",
    "docType" : "pdf",
    "docAuthor" : "Java Developer Zone",
    "docTitle" : "Spring boot Blog",
    "isParent" : true,
    "parentDocId" : 0,
    "docLanguage" : ["en","fr"]
  },
  {
    "documentId" : "5",
    "docType" : "pdf",
    "docAuthor" : "Java Developer Zone",
    "docTitle" : "Solr Blog",
    "isParent" : false,
    "parentDocId" : 1,
    "docLanguage" : ["fr","slovak"]
  },
  {
    "documentId" : "8",
    "docType" : "pdf",
    "docAuthor" : "Java Developer Zone",
    "docTitle" : "Elastic Search Blog",
    "isParent" : false,
    "parentDocId" : 1,
    "docLanguage" : ["en","czech"]
  }
]

4. Example

4.1 JsonStreamingJacksonExample

package com.javadeveloperzone;


import org.codehaus.jackson.JsonFactory;
import org.codehaus.jackson.JsonParser;
import org.codehaus.jackson.JsonToken;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class JsonStreamingJacksonExample {

    public static final String DOC_LANGUAGE = "docLanguage";
    public static final String PARENT_DOC_ID = "parentDocId";
    public static final String DOC_TITLE = "docTitle";
    public static final String IS_PARENT = "isParent";
    public static final String DOC_AUTHOR = "docAuthor";
    public static final String DOC_TYPE = "docType";
    public static final String DOCUMENT_ID = "documentId";

    public static void main(String[] args) {
        JsonStreamingJacksonExample jsonStreamingJacksonExample = new JsonStreamingJacksonExample();
        String jsonFilePath = "H:\\Work\\Data\\sample.json";
        jsonStreamingJacksonExample.process(jsonFilePath);
    }

    public void process(String jsonFilePath){
        File jsonFile = new File(jsonFilePath);
        JsonFactory jsonfactory = new JsonFactory(); //init factory
        try {
            int numberOfRecords = 0;
            long startTime = System.currentTimeMillis(); //track total processing time
            JsonParser jsonParser = jsonfactory.createJsonParser(jsonFile); //create JSON parser
            Document document = new Document();
            JsonToken jsonToken = jsonParser.nextToken();
            while (jsonToken!= JsonToken.END_ARRAY){ //Iterate all elements of array
                String fieldname = jsonParser.getCurrentName(); //get current name of token
                if (DOCUMENT_ID.equals(fieldname)) {
                    jsonToken = jsonParser.nextToken(); //read next token
                    document.setDocumentId(Integer.parseInt(jsonParser.getText()));
                }

                if (DOC_TYPE.equals(fieldname)) {
                    jsonToken = jsonParser.nextToken();
                    document.setDocType(jsonParser.getText());
                }

                if (DOC_AUTHOR.equals(fieldname)) {
                    jsonToken = jsonParser.nextToken();
                    document.setDocAuthor(jsonParser.getText());
                }

                if (DOC_TITLE.equals(fieldname)) {
                    jsonToken = jsonParser.nextToken();
                    document.setDocTitle(jsonParser.getText());
                }
                if (IS_PARENT.equals(fieldname)) {
                    jsonToken = jsonParser.nextToken();
                    document.setParent(jsonParser.getBooleanValue());
                }
                if (PARENT_DOC_ID.equals(fieldname)) {
                    jsonToken = jsonParser.nextToken();
                    document.setParentDocId(jsonParser.getIntValue());
                }

                if (DOC_LANGUAGE.equals(fieldname)) {  //array type field
                    jsonToken = jsonParser.nextToken();
                    List<String> docLangs = new ArrayList<>(); //read all elements and store into list
                    while (jsonParser.nextToken() != JsonToken.END_ARRAY) {
                        docLangs.add(jsonParser.getText());
                    }
                    document.setDocLanguage(docLangs);
                }
                if(jsonToken==JsonToken.END_OBJECT){
                    //one complete record has been parsed: do some processing, indexing, saving in DB etc.
                    document = new Document();
                    numberOfRecords++;
                }
                jsonToken = jsonParser.nextToken();
            }

            System.out.println("Total Records Found : "+numberOfRecords);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
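
In a real pipeline, the "do some processing" comment above is the point where each fully parsed Document would be handed off for indexing. One possible way to batch documents before sending them to Elasticsearch or Solr is sketched below; indexBatch is a hypothetical placeholder, not part of Jackson or any search client API.

package com.javadeveloperzone;

import java.util.ArrayList;
import java.util.List;

public class DocumentBatcher {

    private static final int BATCH_SIZE = 1000;
    private final List<Document> batch = new ArrayList<>();

    //call this from the END_OBJECT branch instead of discarding the parsed document
    public void add(Document document) {
        batch.add(document);
        if (batch.size() >= BATCH_SIZE) {
            flush();
        }
    }

    //call once more after the parsing loop finishes to index the last partial batch
    public void flush() {
        if (!batch.isEmpty()) {
            indexBatch(batch);
            batch.clear();
        }
    }

    private void indexBatch(List<Document> documents) {
        //hypothetical placeholder: wire in the bulk API of your search client here
        System.out.println("Indexing " + documents.size() + " documents");
    }
}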

4.2 Document

package com.javadeveloperzone;

import java.util.List;

public class Document {
    int documentId,parentDocId;
    String docType,docAuthor,docTitle;
    boolean isParent;
    List<String> docLanguage;

    public int getDocumentId() {
        return documentId;
    }

    @Override
    public String toString() {
        return "Document{" +
                "documentId=" + documentId +
                ", parentDocId=" + parentDocId +
                ", docType='" + docType + '\'' +
                ", docAuthor='" + docAuthor + '\'' +
                ", docTitle='" + docTitle + '\'' +
                ", isParent=" + isParent +
                ", docLanguage=" + docLanguage +
                '}';
    }

    public void setDocumentId(int documentId) {
        this.documentId = documentId;
    }

    public int getParentDocId() {
        return parentDocId;
    }

    public void setParentDocId(int parentDocId) {
        this.parentDocId = parentDocId;
    }

    public String getDocType() {
        return docType;
    }

    public void setDocType(String docType) {
        this.docType = docType;
    }

    public String getDocAuthor() {
        return docAuthor;
    }

    public void setDocAuthor(String docAuthor) {
        this.docAuthor = docAuthor;
    }

    public String getDocTitle() {
        return docTitle;
    }

    public void setDocTitle(String docTitle) {
        this.docTitle = docTitle;
    }

    public boolean isParent() {
        return isParent;
    }

    public void setParent(boolean parent) {
        isParent = parent;
    }

    public List<String> getDocLanguage() {
        return docLanguage;
    }

    public void setDocLanguage(List<String> docLanguage) {
        this.docLanguage = docLanguage;
    }
}

4.3 Output

Total Records Found : 2587800
Total Time Taken : 280 secs

5. Conclusion

In this article, we have discussed the Jackson streaming API and used it to parse and process a large JSON file. The JsonFactory and JsonParser classes are used to read a JSON file as a stream.

6. Source Code

Parse Large Json File Jackson Example
