

Solr supports many languages for indexing and searching documents. In this article we discuss how indexing and searching work for Hindi, one of the most widely spoken languages in India and an official language of the country.
Solr provides three filters that handle Hindi well:
- IndicNormalizationFilterFactory
- HindiNormalizationFilterFactory
- HindiStemFilterFactory
Let's look at how to configure and use these filter factories.
Step 1: Create FieldType
Create a custom fieldType and add the filter factories listed above, as shown below.
<fieldType name="text_hindi" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="hindi/synonyms.txt" ignoreCase="true" expand="true"/>
    <!-- Case insensitive stop word removal.
         add enablePositionIncrements=true in both the index and query
         analyzers to leave a 'gap' for more accurate phrase queries. -->
    <filter class="solr.StopFilterFactory" words="hindi/stopwords.txt" ignoreCase="true" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.HindiStemFilterFactory" protected="hindi/protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.IndicNormalizationFilterFactory"/>
    <filter class="solr.HindiNormalizationFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="hindi/stopwords.txt" ignoreCase="true" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.HindiStemFilterFactory" protected="hindi/protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.IndicNormalizationFilterFactory"/>
    <filter class="solr.HindiNormalizationFilterFactory"/>
  </analyzer>
</fieldType>
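For comparison, the Hindi example in the Solr Reference Guide applies the two normalization filters before the stemmer, so the stemmer receives already-normalized tokens. A minimal sketch of a field type in that order is shown here. The name text_hindi_normalized is hypothetical, and it is worth checking in the admin dashboard's Analysis screen whether this ordering suits your data.

```xml
<!-- Sketch only: "text_hindi_normalized" is a hypothetical name, not part of the configuration above. -->
<fieldType name="text_hindi_normalized" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Normalize Unicode representations of text in Indic scripts -->
    <filter class="solr.IndicNormalizationFilterFactory"/>
    <!-- Normalize common Hindi spelling variations -->
    <filter class="solr.HindiNormalizationFilterFactory"/>
    <!-- Stem the normalized tokens -->
    <filter class="solr.HindiStemFilterFactory"/>
  </analyzer>
</fieldType>
```

With a single analyzer element and no type attribute, the same chain is used at both index and query time.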
Step 2: Field Configuration
Now use the field type created above in a field definition.
<field name="FULL_TEXT" type="text_hindi" indexed="true" stored="true"/>
Step 3: Add documents
Add documents that contain Hindi content, such as “जावा डेवलपर ज़ोन बहुत अच्छे ब्लॉग लिखते हैं”. Here we use the document upload screen in the Solr admin dashboard.
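As a rough sketch, the sample sentence could be submitted in Solr's XML update format, for example by pasting it into the document upload screen with the document type set to XML. The id field and its value are assumptions; your schema's uniqueKey may have a different name.

```xml
<add>
  <doc>
    <!-- "id" is an assumed uniqueKey field; adjust it to match your schema -->
    <field name="id">1</field>
    <!-- FULL_TEXT is the text_hindi field defined in Step 2 -->
    <field name="FULL_TEXT">जावा डेवलपर ज़ोन बहुत अच्छे ब्लॉग लिखते हैं</field>
  </doc>
</add>
```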
Step 4: Search documents
That's it. To test whether the document was indexed, run a query such as FULL_TEXT:"जावा डेवलपर". Solr should return the one matching document.
Refer to Language Analysis, Stemming, Configure stop words, and Configure synonyms for more details.