Configure stemming in Solr

Solr provides options to configure stemming at index time as well as at query time.

In this post we will discuss what stemming is, how to set up stemming on a field, and how it behaves.

Table of Contents

Basics of Stemming
Stemming Configuration
- Step 1 : Create field type or change existing one
  - FieldType configuration :
- Step 2 : Field configuration:
Solr Stemming algorithm implementations:

Basics of Stemming

“Stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form—generally a written word form.”

To quickly explain stemming in the context of Solr, let’s take an example. Consider that you have the following documents indexed in a field called FULL_TEXT within your Solr core:

this is testing of our passion
Site has been tested by QA team.
All test cases run successfully.

When stemming is not set up on the FULL_TEXT field (containing the documents above), a Solr query for the term “test” (essentially a search parameter of q=FULL_TEXT:test) will return only the 3rd document, because it is the only one containing the exact token “test”. If stemming is set up on the FULL_TEXT field, all or a subset of the 3 documents will be returned as part of the search result set. How many of these documents are returned with stemming enabled depends on the stemming algorithm being applied.
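
To make the example concrete, the three sample documents could be indexed with a Solr XML update message along the following lines. This is only a sketch: the id field and its values are assumptions added for illustration, and your schema may require other fields.

```xml
<!-- Sketch of an XML update message for the three sample documents.
     The "id" field and its values are assumptions, not part of the example above. -->
<add>
  <doc>
    <field name="id">1</field>
    <field name="FULL_TEXT">this is testing of our passion</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="FULL_TEXT">Site has been tested by QA team.</field>
  </doc>
  <doc>
    <field name="id">3</field>
    <field name="FULL_TEXT">All test cases run successfully.</field>
  </doc>
</add>
```

If a Porter stemmer is applied to FULL_TEXT at both index and query time, “testing”, “tested” and “test” all reduce to the stem “test”, so a query such as q=FULL_TEXT:test would be expected to match all three documents.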

Stemming Configuration

Step 1 : Create field type or change existing one

We need to add a filter called PorterStemFilterFactory to our field type definition to enable stemming while indexing and searching. More stemming filters are available; they are listed later in this post.

<filter class="solr.PorterStemFilterFactory"/>

FieldType configuration :

<fieldType name="text_gen_stem" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
</fieldType>

Step 2 : Field configuration:

We use the field type created above in our field definition.

<field name="FULL_TEXT" type="text_gen_stem" indexed="true" stored="true"/>

That’s it. To verify the stemming configuration, do the following in the Solr Admin UI:

Select the Solr core name from the drop-down list.
Click on Analysis.
Select the field name that we created earlier.
Enter text such as “testing” in Field Value (Query) and click on Analyse Values. With the Porter stemmer in place, the final token should be the stem “test”.

Solr Stemming algorithm implementations:

Solr supports several stemming algorithms, some more aggressive than others:
1. SnowballPorterFilterFactory
2. PorterStemFilterFactory
3. HunspellStemFilterFactory
4. KStemFilterFactory

Refer to HunspellStemFilter, KStemFilter, and SnowballPorterStemmerFilter for more details.
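
Switching to another stemmer mostly means replacing the stemming filter line in the analyzer. The fragment below is a rough sketch of what that could look like. The field type name text_gen_kstem and the Hunspell dictionary file names are assumptions for illustration; the stop word and synonym filters from the earlier field type are left out for brevity.

```xml
<!-- Hypothetical field type: same idea as text_gen_stem, but using KStem,
     which is generally a less aggressive stemmer than Porter. -->
<fieldType name="text_gen_kstem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Alternative stemming filter lines (use one in place of the KStem filter above). -->

<!-- Snowball stemmer; the language attribute selects the algorithm. -->
<filter class="solr.SnowballPorterFilterFactory" language="English"/>

<!-- Hunspell stemmer; the dictionary and affix file names are assumptions
     and must point to files present in your configuration. -->
<filter class="solr.HunspellStemFilterFactory" dictionary="en_GB.dic" affix="en_GB.aff" ignoreCase="true"/>
```

After changing the stemmer on an existing field, the documents generally need to be reindexed so that the indexed tokens match what the query analyzer produces.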


Tags: indexing, porter-stemmer, searching, solr, stemming

Java Developer Zone

http://javadeveloperzone.com

