Configuring stop words in Solr is easy. Most written text contains many function words, such as “this”, “that”, or “is”. They matter to the person reading the content because they help it flow cohesively, but they are not necessarily as important to someone searching your documents or web tutorials.
Stop words are generally handled in one of two ways: by ignoring them when they appear in the search query, or by removing them at indexing time.
By default, Solr enables this feature in many field types, such as text_general. However, the default stop words file does not contain any stop words.
Let’s look at a complete example of configuration and indexing with stop words, and at how it behaves.
Step 1 : Create field type or change existing one
We need to add a filter called StopFilterFactory to our field type definition to remove stop words during indexing and searching.
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
It has two configuration parameters, ignoreCase and words.
ignoreCase: Specifies case sensitivity. If its value is true, stop words are matched regardless of case.
words: Specifies the path of the stop words file. If only a file name is given, Solr looks for that file in the core’s conf directory.
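To make the effect of ignoreCase concrete, here is a minimal Python sketch of case-insensitive versus case-sensitive stop word matching. It is only an illustration of the idea, not Solr's implementation; the function name remove_stop_words and the sample tokens are assumptions made for this example.

```python
def remove_stop_words(tokens, stop_words, ignore_case=True):
    """Drop tokens that appear in stop_words (a rough stand-in for a stop filter)."""
    if ignore_case:
        # Compare everything in lower case, similar in spirit to ignoreCase="true".
        lowered = {w.lower() for w in stop_words}
        return [t for t in tokens if t.lower() not in lowered]
    # Exact, case-sensitive comparison, similar in spirit to ignoreCase="false".
    exact = set(stop_words)
    return [t for t in tokens if t not in exact]


stop_words = ["this", "is", "the", "that"]
tokens = ["This", "IS", "Solr"]

print(remove_stop_words(tokens, stop_words, ignore_case=True))   # ['Solr']
print(remove_stop_words(tokens, stop_words, ignore_case=False))  # ['This', 'IS', 'Solr']
```

With case-insensitive matching, “This” and “IS” are dropped even though the stop word list only contains lower-case entries.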
FieldType configuration :
<fieldType name="text_gen_stopword" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Step 2 : Field configuration:
We use the field type created above in our field definition.
<field name="FULL_TEXT" type="text_gen_stopword" indexed="true" stored="true"/>
Step 3 : Configure stop words:
Add the stop words below to your stopwords.txt file. There should be one stop word per line.
this
is
the
that
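The one-word-per-line layout can be illustrated with a small Python sketch that reads such a list into memory. This is not how Solr loads the file internally; the helper name parse_stop_words is hypothetical, and skipping blank lines and lines starting with # is an assumption about the file format made for this example.

```python
def parse_stop_words(text):
    """Return the stop words in text, one per line.

    Assumption for this sketch: blank lines and lines starting with '#'
    are treated as non-entries and skipped.
    """
    words = []
    for line in text.splitlines():
        word = line.strip()
        if word and not word.startswith("#"):
            words.append(word)
    return words


sample = "# sample stop words\nthis\nis\n\nthe\nthat\n"
print(parse_stop_words(sample))  # ['this', 'is', 'the', 'that']
```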
That’s it. Now, to verify our stop words configuration, do the following in the Solr Admin UI.
- Select the Solr core name from the drop-down list.
- Click on Analysis.
- Select the field name that we created earlier.
- Enter text like “this is example of solr stop word” and click on Analyse Values.
- You will see that the stop words have been removed by Solr’s StopFilterFactory class.
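As a rough cross-check of what the Analysis screen should show, the following Python sketch mimics the analyzer chain configured above: tokenize, remove stop words ignoring case, then lower-case. It is only an approximation under stated assumptions: the regular expression is a crude stand-in for StandardTokenizerFactory, and the analyze function is hypothetical, not a Solr API.

```python
import re

# The stop words from stopwords.txt in Step 3.
STOP_WORDS = {"this", "is", "the", "that"}


def analyze(text, stop_words=STOP_WORDS):
    """Approximate the tokenizer -> stop filter -> lower-case filter chain."""
    tokens = re.findall(r"\w+", text)  # crude tokenizer (assumption)
    kept = [t for t in tokens if t.lower() not in stop_words]  # like ignoreCase="true"
    return [t.lower() for t in kept]  # like LowerCaseFilterFactory


print(analyze("this is example of solr stop word"))
# ['example', 'of', 'solr', 'stop', 'word']
```

Only “this” and “is” are dropped from the sample text, because “of” is not in the stop words list configured in Step 3.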