Solr supports indexing and searching documents in many languages. This article shows how to index and search content in Hindi, one of the most widely spoken languages in India and an official language of the country.

Solr provides three filters for handling Hindi text:

  1. IndicNormalizationFilterFactory
  2. HindiNormalizationFilterFactory
  3. HindiStemFilterFactory

Let’s look at how to configure these filter factories and use them.

Step 1: Create FieldType

Create a custom fieldType and add the filter factories above, as shown below.

<fieldType name="text_hindi" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="hindi/synonyms.txt" ignoreCase="true" expand="true"/>
    <!-- Case-insensitive stop word removal.
      Position increments are always kept in Solr 5 and later (and by
      default before that), so enablePositionIncrements is omitted.
    -->
    <filter class="solr.StopFilterFactory" words="hindi/stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Normalize before stemming so the stemmer sees consistent spellings. -->
    <filter class="solr.IndicNormalizationFilterFactory"/>
    <filter class="solr.HindiNormalizationFilterFactory"/>
    <!-- HindiStemFilterFactory takes no "protected" attribute;
      protected words are marked by KeywordMarkerFilterFactory instead.
    -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="hindi/protwords.txt"/>
    <filter class="solr.HindiStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="hindi/stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.IndicNormalizationFilterFactory"/>
    <filter class="solr.HindiNormalizationFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="hindi/protwords.txt"/>
    <filter class="solr.HindiStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
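
One way to check what the text_hindi analyzer chain does to a piece of text is Solr's field analysis handler. The following is a minimal Python sketch, not part of the original setup. It assumes Solr is running at localhost:8983 and that the core is called hindi_core (a hypothetical name); adjust both to match your installation.

```python
import json
import urllib.parse
import urllib.request

# Assumptions (not from the article): host, port and core name.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "hindi_core"  # hypothetical core name


def build_analysis_url(field_type, text, base=SOLR_BASE, core=CORE):
    """Build a URL for the /analysis/field handler for one field type."""
    params = {
        "analysis.fieldtype": field_type,
        "analysis.fieldvalue": text,
        "wt": "json",
    }
    return f"{base}/{core}/analysis/field?{urllib.parse.urlencode(params)}"


def fetch_analysis(url):
    """Send the request and return Solr's JSON reply as a dict."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


# Only builds the URL; call fetch_analysis(url) against a running Solr.
url = build_analysis_url("text_hindi", "जावा डेवलपर")
```

Calling fetch_analysis(url) against a live core should return the tokens produced after each stage of the chain, which makes it easier to see the effect of the normalization and stemming filters.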

Step 2: Field Configuration

Now use the field type created above in a field definition.

<field name="FULL_TEXT" type="text_hindi" indexed="true" stored="true"/>

Step 3: Add documents

Add a document that contains Hindi content, such as “जावा डेवलपर ज़ोन बहुत अच्छे ब्लॉग लिखते हैं”. Here we upload the document using the document upload option in the Solr admin dashboard.

[Image: Solr Hindi document indexing]
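
The same document can also be sent to Solr's JSON update handler from a script instead of the dashboard. The sketch below is one possible approach under stated assumptions: the host, port, core name (hindi_core) and the id field are not from this article and are only illustrative.

```python
import json
import urllib.request

# Assumptions (not from the article): host, port and core name.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "hindi_core"  # hypothetical core name


def build_update_request(docs, base=SOLR_BASE, core=CORE):
    """Build a POST request that sends docs to /update and commits."""
    # ensure_ascii=False keeps the Devanagari text readable in the payload.
    body = json.dumps(docs, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{base}/{core}/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def send(req):
    """Execute the request and return Solr's JSON reply as a dict."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# "id" is an assumed unique key; FULL_TEXT is the field defined in Step 2.
doc = {"id": "1", "FULL_TEXT": "जावा डेवलपर ज़ोन बहुत अच्छे ब्लॉग लिखते हैं"}
request = build_update_request([doc])
```

Nothing is sent until send(request) is called, so the request can be inspected first. The commit=true parameter makes the document searchable immediately, which is convenient for a quick test but usually too aggressive for bulk indexing.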

Step 4: Search documents

That’s it. To check whether the document was indexed, run a query such as FULL_TEXT:"जावा डेवलपर". Solr returns one document, as shown below.

[Image: Solr Hindi document searching]
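
The phrase query can also be issued over HTTP against the select handler. This is a rough Python sketch, again assuming Solr at localhost:8983 and a hypothetical core named hindi_core; the helper names are illustrative only.

```python
import json
import urllib.parse
import urllib.request

# Assumptions (not from the article): host, port and core name.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "hindi_core"  # hypothetical core name


def build_select_url(field, phrase, base=SOLR_BASE, core=CORE):
    """Build a /select URL that runs a phrase query against one field."""
    # Escape backslashes and double quotes so the phrase stays one quoted unit.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    params = {"q": f'{field}:"{escaped}"', "wt": "json"}
    # urlencode percent-encodes the Devanagari text as UTF-8.
    return f"{base}/{core}/select?{urllib.parse.urlencode(params)}"


def count_matches(url):
    """Run the query and return the number of matching documents."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["response"]["numFound"]


# Only builds the URL; call count_matches(url) against a running Solr.
url = build_select_url("FULL_TEXT", "जावा डेवलपर")
```

With the single document from Step 3 indexed, count_matches(url) would be expected to report one match, in line with the result shown in the dashboard.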

Refer to Language Analysis, Stemming, Configure stop words, and Configure synonyms for more details.

 
