Solr supports regular expression search. The Solr/Lucene regular expression engine is not Perl-compatible; it supports a smaller set of operators.

This post covers some of the standard operators that Solr/Lucene supports and the basic usage of each.

Standard operators

Anchoring

Lucene’s patterns are always anchored, so there is no need to write ^ to mark the beginning or $ to mark the end. The pattern must match the entire string. For the string “solrdev”:

so.*     # match
solrde     # no match
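
The anchoring rule can be tried outside Solr. Below is a minimal Python sketch, assuming Python's re module as a stand-in: it is a different, Perl-style engine and not Lucene's, so only the basic operators from this post should be expected to behave alike. re.fullmatch requires the whole string to match, which imitates Lucene's implicit anchoring. The helper name lucene_style_match is hypothetical.

```python
import re


def lucene_style_match(pattern: str, text: str) -> bool:
    """Return True only if the pattern matches the entire text.

    Assumption: re.fullmatch stands in for Lucene's implicit anchoring.
    Python's engine is not Lucene's, so this is an approximation.
    """
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    text = "solrdev"
    print(lucene_style_match("so.*", text))    # pattern covers the whole string
    print(lucene_style_match("solrde", text))  # the trailing "v" is left over
    # An unanchored search accepts "solrde"; an anchored engine does not.
    print(re.search("solrde", text) is not None)
```

The contrast with re.search is the point: a pattern that would be found somewhere inside the string in a Perl-style engine is rejected when the whole string has to match.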

Allowed characters

Any Unicode characters may be used in the pattern, but certain characters are reserved and must be escaped. The standard reserved characters are:

. ? + * | { } [ ] ( ) " \

If you enable optional features (see below) then these characters may also be reserved:

# @ & < > ~

Any reserved character can be escaped with a backslash, for example “\*”. This includes the backslash itself: “\\”

Additionally, any characters (except double quotes) are interpreted literally when surrounded by double quotes:

java"@developer.com"
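
When the text to search for comes from user input, the escaping rules above can be applied programmatically. The following is a rough Python sketch, not part of Solr or any client library: the names escape_lucene_regex, quote_literal and the optional_features flag are hypothetical, and the reserved character sets are simply the two lists given above.

```python
# Reserved characters as listed in this post.
STANDARD_RESERVED = set('.?+*|{}[]()"\\')
OPTIONAL_RESERVED = set("#@&<>~")


def escape_lucene_regex(text: str, optional_features: bool = False) -> str:
    """Prefix every reserved character in text with a backslash.

    Hypothetical helper: set optional_features=True when the optional
    operators are enabled, so that # @ & < > ~ are escaped as well.
    """
    reserved = set(STANDARD_RESERVED)
    if optional_features:
        reserved |= OPTIONAL_RESERVED
    return "".join("\\" + ch if ch in reserved else ch for ch in text)


def quote_literal(text: str) -> str:
    """Wrap text in double quotes so it is read literally.

    Double quotes themselves cannot appear inside the quoted part.
    """
    if '"' in text:
        raise ValueError("quoted literals cannot contain a double quote")
    return '"' + text + '"'


if __name__ == "__main__":
    print(escape_lucene_regex("java@developer.com", optional_features=True))
    print("java" + quote_literal("@developer.com"))
```

Escaping character by character is the safer default for arbitrary input; quoting is shorter to read but cannot be used when the text itself contains a double quote.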

Match any character

The period “.” matches any single character. For the string “solrdev”:

solr...   # match
s.l.d.v   # match

One-or-more

The plus sign “+” repeats the preceding shortest pattern one or more times. For the string “sssooolllrrr”:

s+o+l+r+        # match
ss+oo+ll+rr+    # match
s+.+            # match
ss+oooo+        # no match

Zero-or-more

The asterisk “*” matches the preceding shortest pattern zero or more times. For the string “mmmnnn”:

m*n*        # match
m*n*o*      # match
.*nnn.*     # match
mmm*nnn*    # match

Zero-or-one

The question mark “?” makes the preceding shortest pattern optional: it matches zero times or once. For the string “yyyzzz”:

yyy?zzz?    # match
yyy?zzzz?   # match
.....?.?    # match
yy?zz?      # no match
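
The match / no match tables in the One-or-more, Zero-or-more and Zero-or-one sections can be checked mechanically. Here is a small table-driven Python sketch, again assuming re.fullmatch as a stand-in for Lucene's whole-string matching (Python's engine is not Lucene's, but +, * and ? behave the same way for these patterns). The names CASES and check_cases are hypothetical.

```python
import re

# (subject string, pattern, expected result), copied from the tables above.
CASES = [
    ("sssooolllrrr", "s+o+l+r+", True),
    ("sssooolllrrr", "ss+oo+ll+rr+", True),
    ("sssooolllrrr", "s+.+", True),
    ("sssooolllrrr", "ss+oooo+", False),
    ("mmmnnn", "m*n*", True),
    ("mmmnnn", "m*n*o*", True),
    ("mmmnnn", ".*nnn.*", True),
    ("mmmnnn", "mmm*nnn*", True),
    ("yyyzzz", "yyy?zzz?", True),
    ("yyyzzz", "yyy?zzzz?", True),
    ("yyyzzz", ".....?.?", True),
    ("yyyzzz", "yy?zz?", False),
]


def check_cases(cases):
    """Return the cases whose whole-string result differs from the table."""
    failures = []
    for text, pattern, expected in cases:
        actual = re.fullmatch(pattern, text) is not None
        if actual != expected:
            failures.append((text, pattern, expected, actual))
    return failures


if __name__ == "__main__":
    for failure in check_cases(CASES):
        print("unexpected result:", failure)
```

A table like this is also a convenient place to add your own patterns before trying them against a real index.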

Min-to-max

Curly brackets “{}” can be used to specify a minimum and (optionally) a maximum number of times the preceding shortest pattern can repeat. The allowed forms are:

{4}     # repeat exactly 4 times
{3,6}   # repeat at least 3 and at most 6 times
{2,}    # repeat at least 2 times

For the string “aaabbb”:

a{3}b{3}        # match
a{2,4}b{2,4}    # match
a{2,}b{2,}      # match
.{3}.{3}        # match
a{4}b{4}        # no match
a{4,6}b{4,6}    # no match
a{4,}b{4,}      # no match
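
The curly-bracket form also generalizes the three shorthand operators: “+” behaves like {1,}, “*” like {0,} and “?” like {0,1}. The Python sketch below illustrates this on a few sample strings. It uses re.fullmatch as a stand-in for Lucene's anchored matching, so treat the equivalence in Lucene itself as an assumption that this post does not state explicitly. The names accepts and agree_on_samples are hypothetical.

```python
import re

SAMPLES = ["", "a", "aa", "aaa", "aaaa", "b"]

# Shorthand operator on the left, curly-bracket form on the right.
EQUIVALENT_FORMS = [
    ("a+", "a{1,}"),
    ("a*", "a{0,}"),
    ("a?", "a{0,1}"),
]


def accepts(pattern: str, text: str) -> bool:
    """Whole-string match, imitating an always-anchored engine."""
    return re.fullmatch(pattern, text) is not None


def agree_on_samples(left: str, right: str, samples=SAMPLES) -> bool:
    """True if both patterns accept exactly the same sample strings."""
    return all(accepts(left, s) == accepts(right, s) for s in samples)


if __name__ == "__main__":
    for left, right in EQUIVALENT_FORMS:
        print(left, right, agree_on_samples(left, right))
```

Agreement on a handful of samples is only an illustration, not a proof, but it is a quick way to convince yourself how the bounds work.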

Continue with solr-regular-expression-part-2 to read part 2 of Solr regular expressions.
