1. Overview

In this article, we will discuss parse/load JSON file using GSON streaming API.  GSON Streaming api provide facility to read and write large json objects using JsonReader and JsonWriter classes which is available from GSON version 1.6 and above. This streaming approach is very useful in situations where it is not desirable to load a complete object model in memory to avoid out of memory exception when reading large JSON files.

2. GSON Streaming API

GSON streaming is the most powerful approach to process JSON. JsonReader and JsonWriter class is the heart of GSON streaming API. we can use GSON streaming api in a case where entire JSON file/ objects have not fit into memory or we do not have complete data at a time.

2.1 GSON maven dependency

<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.2.4</version>
</dependency>

2.2 Crate JsonReader

We need to pass Reader instance to JsonReader constructor. Here is the code snippets

InputStreamReader streamReader = new InputStreamReader(new FileInputStream(jsonFilePath), StandardCharsets.UTF_8);
JsonReader jsonReader1 = new JsonReader(streamReader);

3. Sample Input

I have 1.5GB of JSON file which contains document and it’s related metadata information. This file contains around 2.5M records and I want to index this file in Elastic Search and Solr to do some analytics. Here are few records from an input file.

[ 
    { 
      "documentId" : "1",
      "docType" : "pdf",
    "docAuthor" : "Java Developer Zone",
      "docTitle" : "Java Blog",
    "isParent" : true,
    "parentDocId" : 0,
      "docLanguage" : ["en","fr"]
    },
    { 
      "documentId" : "2",
      "docType" : "pdf",
    "docAuthor" : "Java Developer Zone",
      "docTitle" : "Spring boot Blog",
    "isParent" : true,
    "parentDocId" : 0,
      "docLanguage" : ["en","fr"]
    },
  { 
      "documentId" : "5",
      "docType" : "pdf",
    "docAuthor" : "Java Developer Zone",
      "docTitle" : "Solr Blog",
    "isParent" : false,
    "parentDocId" : 1,
      "docLanguage" : ["fr","slovak"]
    },
    { 
      "documentId" : "8",
      "docType" : "pdf",
    "docAuthor" : "Java Developer Zone",
      "docTitle" : "Elastic Search Blog",
    "isParent" : false,
    "parentDocId" : 1,
      "docLanguage" : ["en","czech"]
    }
]

4. Example

4.1 JsonStreamingGsonExample

package com.javadeveloperzone;

import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.stream.JsonReader;

import java.io.*;
import java.nio.charset.StandardCharsets;

public class JsonStreamingGsonExample {

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        JsonStreamingGsonExample jsonStreamingGsonExample = new JsonStreamingGsonExample();
        String jsonFilePath = "H:\\Work\\Data\\sample.json";
        jsonStreamingGsonExample.parse(jsonFilePath);
        System.out.println("Total Time Taken : "+(System.currentTimeMillis() - start)/1000 + " secs");
    }

    public void parse(String jsonFilePath){

        //create JsonReader object and pass it the json file,json source or json text.
        try(JsonReader jsonReader = new JsonReader(
                new InputStreamReader(
                        new FileInputStream(jsonFilePath), StandardCharsets.UTF_8))) {

            Gson gson = new GsonBuilder().create();

            jsonReader.beginArray(); //start of json array
            int numberOfRecords = 0;
            while (jsonReader.hasNext()){ //next json array element
                Document document = gson.fromJson(jsonReader, Document.class);
                //do something real
//                System.out.println(document);
                numberOfRecords++;
            }
            jsonReader.endArray();
            System.out.println("Total Records Found : "+numberOfRecords);
        }
        catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

}

4.2 Document

package com.javadeveloperzone;

import java.util.List;

public class Document {
    int documentId,parentDocId;
    String docType,docAuthor,docTitle;
    boolean isParent;
    List<String> docLanguage;

    public int getDocumentId() {
        return documentId;
    }

    @Override
    public String toString() {
        return "Document{" +
                "documentId=" + documentId +
                ", parentDocId=" + parentDocId +
                ", docType='" + docType + '\'' +
                ", docAuthor='" + docAuthor + '\'' +
                ", docTitle='" + docTitle + '\'' +
                ", isParent=" + isParent +
                ", docLanguage=" + docLanguage +
                '}';
    }

    public void setDocumentId(int documentId) {
        this.documentId = documentId;
    }

    public int getParentDocId() {
        return parentDocId;
    }

    public void setParentDocId(int parentDocId) {
        this.parentDocId = parentDocId;
    }

    public String getDocType() {
        return docType;
    }

    public void setDocType(String docType) {
        this.docType = docType;
    }

    public String getDocAuthor() {
        return docAuthor;
    }

    public void setDocAuthor(String docAuthor) {
        this.docAuthor = docAuthor;
    }

    public String getDocTitle() {
        return docTitle;
    }

    public void setDocTitle(String docTitle) {
        this.docTitle = docTitle;
    }

    public boolean isParent() {
        return isParent;
    }

    public void setParent(boolean parent) {
        isParent = parent;
    }

    public List<String> getDocLanguage() {
        return docLanguage;
    }

    public void setDocLanguage(List<String> docLanguage) {
        this.docLanguage = docLanguage;
    }
}

4.3 Output

Total Records Found : 2587800
Total Time Taken : 288 secs

5. Conclusion

In this article, we have discussed GSON Streaming API and parse large JSON file and process it. JsonReader class is used to read JSON file as a stream.

6. References

7. Source Code

Parse Large Json File GSON Example

Was this post helpful?
Let us know, if you liked the post. Only in this way, we can improve us.
Yes
No

Leave a Reply

Your email address will not be published. Required fields are marked *