

1. Overview
In this article, we will discuss how to parse and load large JSON files using the Jackson streaming API. Jackson's streaming API can improve application performance because it uses very little memory and adds little CPU overhead. The JsonParser and JsonGenerator classes are used to read and write JSON content.
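The rest of this article focuses on reading, but for completeness here is a minimal sketch of the writing side with JsonGenerator (the GeneratorDemo class name and the field values are our own illustration, not part of the example project):

import java.io.StringWriter;
import org.codehaus.jackson.JsonFactory;
import org.codehaus.jackson.JsonGenerator;

public class GeneratorDemo {
    public static void main(String[] args) throws Exception {
        StringWriter out = new StringWriter();
        JsonGenerator generator = new JsonFactory().createJsonGenerator(out);
        generator.writeStartObject();                          // {
        generator.writeStringField("docTitle", "Java Blog");   // "docTitle":"Java Blog"
        generator.writeBooleanField("isParent", true);         // "isParent":true
        generator.writeEndObject();                            // }
        generator.close();                                     // flush output and release the generator
        System.out.println(out);  // {"docTitle":"Java Blog","isParent":true}
    }
}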
2. Jackson Streaming API
Jackson supports streaming processing, also called incremental processing. It uses very little memory and has low processing overhead. To read a JSON file we need to create a parser; parsers in the Jackson library are objects that tokenize JSON content and let us iterate over it one token at a time.
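To make the token model concrete, here is a minimal sketch (the TokenDemo class name and the inline JSON string are our own illustration):

import org.codehaus.jackson.JsonFactory;
import org.codehaus.jackson.JsonParser;
import org.codehaus.jackson.JsonToken;

public class TokenDemo {
    public static void main(String[] args) throws Exception {
        String json = "{\"docTitle\":\"Java Blog\",\"isParent\":true}";
        JsonParser parser = new JsonFactory().createJsonParser(json);
        JsonToken token;
        // nextToken() returns null once the input is exhausted
        while ((token = parser.nextToken()) != null) {
            // prints e.g. START_OBJECT, FIELD_NAME docTitle, VALUE_STRING Java Blog, ...
            System.out.println(token + " : " + parser.getText());
        }
        parser.close();
    }
}

Because only the current token is held in memory, the file is never loaded as a whole, which is what keeps the footprint small even for multi-gigabyte inputs.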
2.1 Jackson Maven Dependency
<dependency>
    <groupId>org.codehaus.jackson</groupId>
    <artifactId>jackson-xc</artifactId>
    <version>1.9.12</version>
</dependency>
2.2 Create JSON Parser
The createJsonParser method of the JsonFactory class creates a JsonParser from an input file. Here is the code snippet:
JsonFactory jsonfactory = new JsonFactory();
JsonParser jsonParser = jsonfactory.createJsonParser(jsonFile);
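Besides File, JsonFactory in Jackson 1.x also offers createJsonParser overloads for InputStream, Reader, String and URL sources, so the same streaming approach works for network responses and in-memory content as well.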
3. Sample Input
I have a 1.5 GB JSON file which contains documents and their related metadata. The file holds around 2.5 million records, and I want to index it in Elasticsearch and Solr to do some analytics. Here are a few records from the input file:
[ { "documentId" : "1", "docType" : "pdf", "docAuthor" : "Java Developer Zone", "docTitle" : "Java Blog", "isParent" : true, "parentDocId" : 0, "docLanguage" : ["en","fr"] }, { "documentId" : "2", "docType" : "pdf", "docAuthor" : "Java Developer Zone", "docTitle" : "Spring boot Blog", "isParent" : true, "parentDocId" : 0, "docLanguage" : ["en","fr"] }, { "documentId" : "5", "docType" : "pdf", "docAuthor" : "Java Developer Zone", "docTitle" : "Solr Blog", "isParent" : false, "parentDocId" : 1, "docLanguage" : ["fr","slovak"] }, { "documentId" : "8", "docType" : "pdf", "docAuthor" : "Java Developer Zone", "docTitle" : "Elastic Search Blog", "isParent" : false, "parentDocId" : 1, "docLanguage" : ["en","czech"] } ]
4. Example
4.1 JsonStreamingJacksonExample
package com.javadeveloperzone;

import org.codehaus.jackson.JsonFactory;
import org.codehaus.jackson.JsonParser;
import org.codehaus.jackson.JsonToken;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class JsonStreamingJacksonExample {

    public static final String DOC_LANGUAGE = "docLanguage";
    public static final String PARENT_DOC_ID = "parentDocId";
    public static final String DOC_TITLE = "docTitle";
    public static final String IS_PARENT = "isParent";
    public static final String DOC_AUTHOR = "docAuthor";
    public static final String DOC_TYPE = "docType";
    public static final String DOCUMENT_ID = "documentId";

    public static void main(String[] args) {
        JsonStreamingJacksonExample jsonStreamingJacksonExample = new JsonStreamingJacksonExample();
        String jsonFilePath = "H:\\Work\\Data\\sample.json";
        jsonStreamingJacksonExample.process(jsonFilePath);
    }

    public void process(String jsonFilePath) {
        File jsonFile = new File(jsonFilePath);
        JsonFactory jsonfactory = new JsonFactory();  // init factory
        try {
            int numberOfRecords = 0;
            JsonParser jsonParser = jsonfactory.createJsonParser(jsonFile);  // create JSON parser
            Document document = new Document();
            JsonToken jsonToken = jsonParser.nextToken();
            while (jsonToken != JsonToken.END_ARRAY) {  // iterate over all elements of the array
                String fieldname = jsonParser.getCurrentName();  // get current name of token
                if (DOCUMENT_ID.equals(fieldname)) {
                    jsonToken = jsonParser.nextToken();  // move to the field value
                    document.setDocumentId(Integer.parseInt(jsonParser.getText()));
                }
                if (DOC_TYPE.equals(fieldname)) {
                    jsonToken = jsonParser.nextToken();
                    document.setDocType(jsonParser.getText());
                }
                if (DOC_AUTHOR.equals(fieldname)) {
                    jsonToken = jsonParser.nextToken();
                    document.setDocAuthor(jsonParser.getText());
                }
                if (DOC_TITLE.equals(fieldname)) {
                    jsonToken = jsonParser.nextToken();
                    document.setDocTitle(jsonParser.getText());
                }
                if (IS_PARENT.equals(fieldname)) {
                    jsonToken = jsonParser.nextToken();
                    document.setParent(jsonParser.getBooleanValue());
                }
                if (PARENT_DOC_ID.equals(fieldname)) {
                    jsonToken = jsonParser.nextToken();
                    document.setParentDocId(jsonParser.getIntValue());
                }
                if (DOC_LANGUAGE.equals(fieldname)) {  // array type field
                    jsonToken = jsonParser.nextToken();
                    List<String> docLangs = new ArrayList<>();
                    // read all elements of the nested array and store them in a list
                    while (jsonParser.nextToken() != JsonToken.END_ARRAY) {
                        docLangs.add(jsonParser.getText());
                    }
                    document.setDocLanguage(docLangs);
                }
                if (jsonToken == JsonToken.END_OBJECT) {
                    // one full record has been read:
                    // do some processing, indexing, saving in DB etc.,
                    // then start a fresh Document for the next record
                    document = new Document();
                    numberOfRecords++;
                }
                jsonToken = jsonParser.nextToken();
            }
            System.out.println("Total Records Found : " + numberOfRecords);
            jsonParser.close();  // release parser resources
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
4.2 Document
package com.javadeveloperzone;

import java.util.List;

public class Document {

    int documentId, parentDocId;
    String docType, docAuthor, docTitle;
    boolean isParent;
    List<String> docLanguage;

    public int getDocumentId() {
        return documentId;
    }

    public void setDocumentId(int documentId) {
        this.documentId = documentId;
    }

    public int getParentDocId() {
        return parentDocId;
    }

    public void setParentDocId(int parentDocId) {
        this.parentDocId = parentDocId;
    }

    public String getDocType() {
        return docType;
    }

    public void setDocType(String docType) {
        this.docType = docType;
    }

    public String getDocAuthor() {
        return docAuthor;
    }

    public void setDocAuthor(String docAuthor) {
        this.docAuthor = docAuthor;
    }

    public String getDocTitle() {
        return docTitle;
    }

    public void setDocTitle(String docTitle) {
        this.docTitle = docTitle;
    }

    public boolean isParent() {
        return isParent;
    }

    public void setParent(boolean parent) {
        isParent = parent;
    }

    public List<String> getDocLanguage() {
        return docLanguage;
    }

    public void setDocLanguage(List<String> docLanguage) {
        this.docLanguage = docLanguage;
    }

    @Override
    public String toString() {
        return "Document{" +
                "documentId=" + documentId +
                ", parentDocId=" + parentDocId +
                ", docType='" + docType + '\'' +
                ", docAuthor='" + docAuthor + '\'' +
                ", docTitle='" + docTitle + '\'' +
                ", isParent=" + isParent +
                ", docLanguage=" + docLanguage +
                '}';
    }
}
4.3 Output
Total Records Found : 2587800
Total Time Taken : 280 secs
5. Conclusion
In this article, we discussed the Jackson streaming API and used it to parse and process a large JSON file. The JsonFactory and JsonParser classes are used to read a JSON file as a stream.
6. Source Code
Parse Large JSON File Jackson Example
You can also check our Git repository for the Parse Large JSON File Jackson Example and other useful examples.