Solr's DataImportHandler (DIH) lets you plug in custom transformers. In this article we discuss what a transformer is, why you might need one, and how to create and configure it.

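A transformer modifies each row that DIH fetches (for example, from a database query) before the row is sent to Solr for indexing. If you need custom processing that the built-in transformers don't provide, you can write a transformer of your own.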

Let us take an example use case. Suppose your schema has a field named “FULL_TEXT” of type “text_general”. The database stores only the file path, but we want to index the file's content. One solution is to write a ReadFileTransformer that reads the file content and passes it to Solr for indexing.

Step 1: Write ReadFileTransformer

To write a custom transformer for Solr, perform the following steps:

  1. Add the solr-dataimporthandler and slf4j libraries to the project classpath.
  2. Extend the Transformer class.
  3. Override its transformRow method.
  4. Write the logic that reads the file and puts its content into the row map.

package com.javadevzone;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.DataImporter;
import org.apache.solr.handler.dataimport.Transformer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;

/**
 * Created by JavaDeveloperZone on 7/27/2017.
 *
 * Replaces the value of every column marked readFile="true" in
 * data-config.xml (a file path) with the content of that file.
 */
public class ReadFileTransformer extends Transformer {
    private static final Logger LOGGER = LoggerFactory.getLogger(ReadFileTransformer.class);

    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
        List<Map<String, String>> fields = context.getAllEntityFields();
        for (Map<String, String> field : fields) {
            // Check if this field has readFile="true" specified in data-config.xml
            String readFile = field.get("readFile");
            if ("true".equals(readFile)) {
                String columnName = field.get(DataImporter.COLUMN);
                // Get this field's value (the file path) from the current row
                Object filePath = row.get(columnName);
                if (filePath != null) {
                    try {
                        Path path = Paths.get(filePath.toString());
                        if (Files.isRegularFile(path)) {
                            // Read the file and put its content back into the current row.
                            // Assumes the files are UTF-8 encoded.
                            byte[] fileContent = Files.readAllBytes(path);
                            row.put(columnName, new String(fileContent, StandardCharsets.UTF_8));
                        }
                    } catch (IOException | InvalidPathException e) {
                        // On failure the row keeps the original path value
                        LOGGER.error("Error while reading file {}", filePath, e);
                    }
                }
            }
        }
        return row;
    }
}
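Because Context is supplied by DIH at import time, the file-reading logic is awkward to try out on its own. As a rough sketch, the class below (ReadFileLogic is a hypothetical name, not part of Solr) repeats the same steps with the field definitions passed in as a plain list of attribute maps, so it runs with only the JDK. It assumes the files are UTF-8 encoded.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Solr-free version of the transformRow logic: field definitions
// are passed in directly instead of coming from DIH's Context.
public class ReadFileLogic {

    public static Map<String, Object> apply(Map<String, Object> row,
                                            List<Map<String, String>> fields) {
        for (Map<String, String> field : fields) {
            // Only columns marked readFile="true" are touched
            if (!"true".equals(field.get("readFile"))) {
                continue;
            }
            // "column" is the <field column="..."> attribute
            String columnName = field.get("column");
            Object filePath = row.get(columnName);
            if (filePath == null) {
                continue;
            }
            try {
                Path path = Paths.get(filePath.toString());
                if (Files.isRegularFile(path)) {
                    // Assumes UTF-8 encoded files
                    byte[] bytes = Files.readAllBytes(path);
                    row.put(columnName, new String(bytes, StandardCharsets.UTF_8));
                }
            } catch (IOException | InvalidPathException e) {
                // Unreadable path: keep the original value in the row
            }
        }
        return row;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("doc", ".txt");
        Files.write(tmp, "hello solr".getBytes(StandardCharsets.UTF_8));

        Map<String, Object> row = new HashMap<>();
        row.put("ID", 1);
        row.put("FULL_TEXT_1", tmp.toString());

        Map<String, String> field = new HashMap<>();
        field.put("column", "FULL_TEXT_1");
        field.put("readFile", "true");

        apply(row, Collections.singletonList(field));
        System.out.println(row.get("FULL_TEXT_1")); // prints: hello solr
        Files.delete(tmp);
    }
}
```

A path that does not point to a readable regular file is left untouched, so the column is indexed with the path itself rather than failing the whole import.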

Step 2: Configure ReadFileTransformer

Configure ReadFileTransformer in db-data-config.xml as shown below. Each column that holds a file path is marked with readFile="true".

<dataConfig>
    <dataSource name="jdbc-1" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/products" user="root" password="root@123" />
    <document name="products">
        <!-- transformer: fully qualified class name(s) of the transformer(s) applied to each row -->
        <entity name="item" dataSource="jdbc-1" query="select * from item"
                transformer="com.javadevzone.ReadFileTransformer">
            <field column="ID" name="ID" />
            <field column="FULL_TEXT_1" name="FULL_TEXT_1" readFile="true"/>
            <field column="FULL_TEXT_2" name="FULL_TEXT_2" readFile="true"/>
        </entity>
    </document>
</dataConfig>
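DIH has no built-in meaning for readFile; the attribute simply travels with the other attributes of each <field> element, which is how the transformer finds it through context.getAllEntityFields(). The sketch below approximates that list with the JDK's DOM parser, producing one attribute map per <field> that is a direct child of the named entity. EntityFieldsSketch is a hypothetical name, this is not DIH's actual parsing code, and it ignores nested entities.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch: collects every attribute of each <field> under one <entity>,
// approximating the List<Map<String, String>> that getAllEntityFields() returns.
public class EntityFieldsSketch {

    public static List<Map<String, String>> fieldsOf(String xml, String entityName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<Map<String, String>> result = new ArrayList<>();
        NodeList entities = doc.getElementsByTagName("entity");
        for (int i = 0; i < entities.getLength(); i++) {
            Element entity = (Element) entities.item(i);
            if (!entityName.equals(entity.getAttribute("name"))) {
                continue;
            }
            NodeList children = entity.getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                // Skip whitespace text nodes and anything that is not a <field>
                if (child.getNodeType() != Node.ELEMENT_NODE || !"field".equals(child.getNodeName())) {
                    continue;
                }
                // Custom attributes such as readFile are kept alongside column and name
                Map<String, String> attrs = new LinkedHashMap<>();
                NamedNodeMap attributes = child.getAttributes();
                for (int k = 0; k < attributes.getLength(); k++) {
                    Node attr = attributes.item(k);
                    attrs.put(attr.getNodeName(), attr.getNodeValue());
                }
                result.add(attrs);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<dataConfig><document name=\"products\">"
                + "<entity name=\"item\" query=\"select * from item\">"
                + "<field column=\"ID\" name=\"ID\"/>"
                + "<field column=\"FULL_TEXT_1\" name=\"FULL_TEXT_1\" readFile=\"true\"/>"
                + "</entity></document></dataConfig>";
        // One map per <field>; only FULL_TEXT_1 carries readFile=true
        for (Map<String, String> field : fieldsOf(xml, "item")) {
            System.out.println(field);
        }
    }
}
```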

Step 3: Add ReadFileTransformer to Solr's classpath

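Package the project as a JAR and copy it, along with any dependencies Solr does not already provide, into the lib directory of the Solr core (or a directory loaded through a <lib> directive in solrconfig.xml). Then reload the core so the new class is picked up.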
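Once the JAR is deployed and the core reloaded, a full import runs the transformer over every row. A minimal sketch of triggering it from Java, assuming Solr listens on localhost:8983, the core is named products and the DIH request handler is registered at /dataimport in solrconfig.xml (all three are assumptions; adjust them to your setup):

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

// Sketch only: host, port, core name ("products") and handler path ("/dataimport")
// are assumptions, not values taken from the article's setup.
public class DataImportTrigger {

    // Builds the DIH request URL: command=full-import re-reads every row,
    // clean=true clears the index first, commit=true commits when the import ends.
    public static String fullImportUrl(String solrBaseUrl, String core) {
        return solrBaseUrl + "/" + core + "/dataimport?command=full-import&clean=true&commit=true";
    }

    public static void main(String[] args) throws IOException {
        URL url = URI.create(fullImportUrl("http://localhost:8983/solr", "products")).toURL();
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setRequestMethod("GET");
            System.out.println("Solr responded with HTTP " + conn.getResponseCode());
        } finally {
            conn.disconnect();
        }
    }
}
```

The same handler also accepts command=status, which can be used to check whether the import has finished and how many documents were processed.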

Refer to the DIHCustomTransformer page of the Solr wiki for more details.
