Solr can index and search documents either case-sensitively or case-insensitively.

In this article we will discuss how to configure our schema to support case-insensitive indexing and searching.

Solr provides two filter factories to achieve this:

  1. solr.LowerCaseFilterFactory
  2. solr.UpperCaseFilterFactory

We can use either one of them in our analyzer chain to make matching case insensitive, as long as the same filter is applied at both index time and query time.

1. solr.LowerCaseFilterFactory

The lower case filter factory converts all tokens to lower case. Follow the steps below to index and search with LowerCaseFilterFactory.

Step 1.1: Create field type

Add solr.LowerCaseFilterFactory to our field type configuration.

<filter class="solr.LowerCaseFilterFactory"/>

FieldType Configuration

<fieldType name="text_gen_lower_case" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
</fieldType>
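
Because the index and query chains above are identical, the same behaviour can probably be had from a single analyzer element: when an analyzer has no type attribute, Solr uses it for both indexing and querying. A minimal sketch, assuming the same stopwords.txt file; the type name text_gen_lower_case_compact is made up for illustration.

```xml
<!-- Hypothetical compact variant: one analyzer shared by index time and query time -->
<fieldType name="text_gen_lower_case_compact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- ignoreCase="true" lets stop words match before the tokens are lower-cased -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Keeping the two chains separate, as in the configuration above, is still useful if they ever need to differ, for example when a filter should run only at query time.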

Step 1.2: Field configuration

Use the field type created above in our field definition.

<field name="FULL_TEXT_LOWER_CASE" type="text_gen_lower_case" indexed="true" stored="true"/>

Step 1.3: Index documents

Add some documents to our index. Here we have added them through a CSV update from the Solr dashboard. Refer to the sample data below.

id,FULL_TEXT_LOWER_CASE
1,THIS is exAMPle of LowerCase filter FactORY
2,Java developer zone solr blogs
3,JAVA DEVELOPER ZONE SOLR BLOGS
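
The same three documents could also be sent as a Solr XML update message rather than CSV. This is only a sketch of that alternative; it assumes the message is posted to the core's /update handler with an XML content type and followed by a commit.

```xml
<add>
  <!-- Alternative to the CSV upload: the same sample documents as an XML update message -->
  <doc>
    <field name="id">1</field>
    <field name="FULL_TEXT_LOWER_CASE">THIS is exAMPle of LowerCase filter FactORY</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="FULL_TEXT_LOWER_CASE">Java developer zone solr blogs</field>
  </doc>
  <doc>
    <field name="id">3</field>
    <field name="FULL_TEXT_LOWER_CASE">JAVA DEVELOPER ZONE SOLR BLOGS</field>
  </doc>
</add>
```

Either way, the stored value keeps its original case; only the indexed tokens are lower-cased.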

Step 1.4: Search documents

We have indexed text content in different cases. Run the query below to check whether Solr returns the expected results.

FULL_TEXT_LOWER_CASE:("JAVA DEVELOPER ZONE" AND "JAVA developer ZONE" AND "java developer zone")

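It returns two documents, with id 2 and 3: all three phrases are lower-cased to the same tokens at query time, and they match the lower-cased tokens indexed for those documents. Document 1 does not contain the phrase.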

2. solr.UpperCaseFilterFactory

The upper case filter factory converts all tokens to upper case. Follow the steps below to index and search with UpperCaseFilterFactory.

Step 2.1: Create field type

Add solr.UpperCaseFilterFactory to our field type configuration.

<filter class="solr.UpperCaseFilterFactory"/>

FieldType Configuration

<fieldType name="text_gen_upper_case" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.UpperCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.UpperCaseFilterFactory"/>
      </analyzer>
</fieldType>

Step 2.2: Field configuration

Use the field type created above in our field definition.

<field name="FULL_TEXT_UPPER_CASE" type="text_gen_upper_case" indexed="true" stored="true"/>
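
To compare both field types on the same text without indexing it twice, a copyField rule could feed one field from the other. This is an optional sketch and not part of the setup used in this article: with it in place, documents 1 to 3 would also become searchable through FULL_TEXT_UPPER_CASE, so the search results in this section would differ.

```xml
<!-- Optional: copy the raw text of the lower-case field into the upper-case field.
     The copy happens before analysis, so each field still applies its own filter chain. -->
<copyField source="FULL_TEXT_LOWER_CASE" dest="FULL_TEXT_UPPER_CASE"/>
```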

Step 2.3: Index documents

Add some documents to our index. Here we have added them through a CSV update from the Solr dashboard. Refer to the sample data below.

id,FULL_TEXT_UPPER_CASE
4,THIS is exAMPle of LowerCase filter FactORY
5,Java developer zone solr blogs
6,JAVA DEVELOPER ZONE SOLR BLOGS

Step 2.4: Search documents

We have indexed text content in different cases. Run the query below to check whether Solr returns the expected results.

FULL_TEXT_UPPER_CASE:("JAVA DEVELOPER ZONE" AND "JAVA developer ZONE" AND "java developer zone")

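It returns two documents, with id 5 and 6: all three phrases are upper-cased to the same tokens at query time, and they match the upper-cased tokens indexed for those documents. Document 4 does not contain the phrase.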

NOTE: In Unicode, this transformation may lose information when the upper case character represents more than one lower case character. Use this filter when you require uppercase tokens. Use the LowerCaseFilterFactory for general search matching.

Refer to the UpperCaseFilterFactory and LowerCaseFilterFactory documentation for more details.
