Solr supports regular expression search. The Solr/Lucene regular expression engine is not Perl-compatible; it supports a smaller set of operators.

In the previous article, solr-regular-expression-part-1, we discussed some of the basic operators that Solr/Lucene supports.

Grouping

Parentheses “()” can be used to form sub-patterns. The quantifiers (+, *, ? and {min,max}) operate on the shortest preceding pattern, which can be a group. For the string “ababab”:

(ab)+       # match
ab(ab)+     # match
(..)+       # match
(...)+      # match
(ab)*       # match
abab(ab)?   # match
ab(ab)?     # no match
(ab){3}     # match
(ab){1,2}   # no match
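
Several of the examples above can be sanity-checked with Python's re module. This is a sketch that assumes grouping and repetition behave the same way in both engines for these simple patterns. The one Lucene-specific detail it reproduces is that a pattern must match the whole term, which is why it uses re.fullmatch rather than re.search.

```python
import re

# Lucene/Solr regexes must match the entire term, so re.fullmatch
# (not re.search) is the closest Python equivalent here.
TERM = "ababab"
PATTERNS = ["(ab)+", "ab(ab)+", "(..)+", "(ab)*",
            "abab(ab)?", "ab(ab)?", "(ab){3}", "(ab){1,2}"]


def check(term: str, patterns: list[str]) -> dict[str, bool]:
    """Return, for each pattern, whether it matches the whole term."""
    return {p: re.fullmatch(p, term) is not None for p in patterns}


for pattern, matched in check(TERM, PATTERNS).items():
    print(f"{pattern:<12}# {'match' if matched else 'no match'}")
```

The quantifiers bind to the group: (ab){1,2} can cover at most four characters, so it cannot match the six-character term.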

Alternation

The pipe symbol “|” acts as an OR operator: the match succeeds if the pattern on either the left-hand side or the right-hand side matches. Alternation applies to the longest pattern, not the shortest, so “aacc|bb” means “(aacc)|(bb)”, not “aac(c|b)b”. For the string “aabb”:

aabb|bbaa   # match
aacc|bb     # no match
aa(cc|bb)   # match
a+|b+       # no match
a+b+|b+a+   # match
a+(b|c)+    # match
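
The alternation examples can be checked the same way. As before, this is a sketch that assumes Python's alternation precedence agrees with Lucene's for these patterns and uses re.fullmatch to mimic whole-term matching.

```python
import re

TERM = "aabb"

# Alternation has the lowest precedence, so a group such as aa(cc|bb)
# is needed to limit the alternatives to part of the pattern.
EXAMPLES = ["aabb|bbaa", "aacc|bb", "aa(cc|bb)", "a+|b+", "a+b+|b+a+", "a+(b|c)+"]

results = {p: re.fullmatch(p, TERM) is not None for p in EXAMPLES}
for pattern, matched in results.items():
    print(f"{pattern:<12}# {'match' if matched else 'no match'}")
```

Note that a+|b+ fails because each alternative on its own must cover the whole term.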

Character classes

Ranges of potential characters may be represented as character classes by enclosing them in square brackets “[]”. A leading ^ negates the character class. The allowed forms are:

[abc]   # 'a' or 'b' or 'c'
[a-c]   # 'a' or 'b' or 'c'
[-abc]  # '-' or 'a' or 'b' or 'c'
[abc\-] # '-' or 'a' or 'b' or 'c'
[^abc]  # any character except 'a' or 'b' or 'c'
[^a-c]  # any character except 'a' or 'b' or 'c'
[^-abc]  # any character except '-' or 'a' or 'b' or 'c'
[^abc\-] # any character except '-' or 'a' or 'b' or 'c'

The dash “-” indicates a range of characters, unless it is the first character in the class (right after any leading ^) or it is escaped with a backslash.

For the string “abcd”:

ab[cd]+     # match
[a-d]+      # match
[^a-d]+     # no match
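
To see which characters each form accepts, the sketch below tests every class against the candidates “-abcd”. It assumes Python's bracket syntax agrees with Lucene's for these simple forms, including the literal dash when it comes first or is escaped.

```python
import re

CLASSES = ["[abc]", "[a-c]", "[-abc]", r"[abc\-]",
           "[^abc]", "[^a-c]", "[^-abc]", r"[^abc\-]"]
CANDIDATES = "-abcd"

# For each class, collect the candidate characters it accepts.
members = {
    cls: "".join(ch for ch in CANDIDATES if re.fullmatch(cls, ch))
    for cls in CLASSES
}
for cls, chars in members.items():
    print(f"{cls:<9}accepts {chars!r} out of {CANDIDATES!r}")

# The whole-term examples for the string "abcd".
abcd_results = {p: re.fullmatch(p, "abcd") is not None
                for p in ["ab[cd]+", "[a-d]+", "[^a-d]+"]}
print(abcd_results)
```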

Optional operators

The following operators are optional. They are available by default because the flags parameter defaults to ALL.

Complement

The complement is probably the most useful option. The shortest pattern that follows a tilde “~” is negated. For instance, “ab~cd” means:

Starts with a
Followed by b
Followed by a string of any length that is anything but c
Ends with d

For the string “abcdef”:

ab~df     # match
ab~cf     # match
ab~cdef   # no match
a~(cb)def # match
a~(bc)def # no match

Enabled with the COMPLEMENT or ALL flags.
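
Python's re module has no complement operator, so the sketch below uses a hypothetical helper, matches_with_complement, that handles only the shape prefix ~(X) suffix. It tries every way of splitting the term into three parts and accepts the term if the middle part (possibly empty) does not fully match X. The Lucene patterns are split into their three parts by hand, following the rule that “~” negates only the shortest following pattern.

```python
import re


def matches_with_complement(term: str, prefix: str, negated: str, suffix: str) -> bool:
    """Hypothetical emulation of the Lucene pattern  prefix ~(negated) suffix.

    The term matches if it splits into three parts where the first fully
    matches `prefix`, the last fully matches `suffix`, and the middle is any
    string (possibly empty) that does NOT fully match `negated`.
    """
    n = len(term)
    for i in range(n + 1):
        if re.fullmatch(prefix, term[:i]) is None:
            continue
        for j in range(i, n + 1):
            middle, rest = term[i:j], term[j:]
            if re.fullmatch(suffix, rest) and re.fullmatch(negated, middle) is None:
                return True
    return False


# "~" negates only the shortest following pattern, so in ab~cdef the
# negated part is just "c" and the suffix is "def".
cases = {
    "ab~df": ("ab", "d", "f"),
    "ab~cf": ("ab", "c", "f"),
    "ab~cdef": ("ab", "c", "def"),
    "a~(cb)def": ("a", "cb", "def"),
    "a~(bc)def": ("a", "bc", "def"),
}
results = {pat: matches_with_complement("abcdef", *parts) for pat, parts in cases.items()}
for pattern, matched in results.items():
    print(f"{pattern:<10}# {'match' if matched else 'no match'}")
```

The brute-force split is quadratic in the term length, which is fine for illustration; Lucene itself compiles the pattern into an automaton instead.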

Interval

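The interval option enables the use of numeric ranges, enclosed by angle brackets “<>”. When both bounds are written with the same number of digits, the number must have exactly that many digits (padded with leading zeros), which is why “solr<001-100>” does not match “solr90”. For the string “solr90”: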

solr<1-100>     # match
solr<01-100>    # match
solr<001-100>   # no match

Enabled with the INTERVAL or ALL flags.
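
The interval rule can be modelled with a short, hypothetical helper. It assumes the behaviour described above: equal-width bounds require exactly that many digits, while bounds of different widths accept the number with any number of leading zeros. The names interval_matches and solr_interval are illustrative only, not part of any Solr or Lucene API.

```python
import re


def interval_matches(digits: str, lo: str, hi: str) -> bool:
    """Hypothetical emulation of Lucene's <lo-hi> numeric interval.

    Assumption: if lo and hi have the same number of digits, the number
    must be written with exactly that many digits (zero-padded);
    otherwise leading zeros are ignored.
    """
    if re.fullmatch(r"[0-9]+", digits) is None:
        return False
    if len(lo) == len(hi) and len(digits) != len(lo):
        return False
    return int(lo) <= int(digits) <= int(hi)


def solr_interval(term: str, prefix: str, lo: str, hi: str) -> bool:
    """Emulate the whole-term pattern prefix<lo-hi>."""
    return term.startswith(prefix) and interval_matches(term[len(prefix):], lo, hi)


for lo in ["1", "01", "001"]:
    matched = solr_interval("solr90", "solr", lo, "100")
    print(f"solr<{lo}-100> # {'match' if matched else 'no match'}")
```

Under this model, “solr090” would match solr<001-100>, because the number is padded to the required three digits.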

Intersection

The ampersand “&” joins two patterns so that both of them have to match. For the string “aaabbb”:

aaa.+&.+bbb     # match
aaa&bbb         # no match

Using this feature usually means that you should rewrite your regular expression.
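
At the top level, intersection simply means that the whole term must match every operand. The sketch below models that with a hypothetical intersect helper built on re.fullmatch; it does not handle “&” nested inside a larger pattern.

```python
import re


def intersect(term: str, *patterns: str) -> bool:
    """Hypothetical emulation of top-level A&B: the whole term must match every pattern."""
    return all(re.fullmatch(p, term) is not None for p in patterns)


print(intersect("aaabbb", "aaa.+", ".+bbb"))  # aaa.+&.+bbb
print(intersect("aaabbb", "aaa", "bbb"))      # aaa&bbb

# Because "a" and "b" differ, the prefix aaa and the suffix bbb can never
# overlap, so aaa.+&.+bbb accepts the same terms as the simpler aaa.*bbb.
for term in ["aaabbb", "aaab", "aaaxbbb", "abbb", "aaabb"]:
    print(term, intersect(term, "aaa.+", ".+bbb"), re.fullmatch("aaa.*bbb", term) is not None)
```

The last loop illustrates the remark above: here the intersection can be rewritten as a plain concatenation.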

Any string

The at sign “@” matches any string in its entirety. This could be combined with the intersection and complement above to express “everything except”. For instance:

@&~(solr.+)      # anything except strings that start with "solr" followed by at least one more character
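
Since “@” accepts every string, intersecting it with a complement leaves exactly the terms that do not fully match the excluded pattern. The sketch below models that with a hypothetical everything_except helper; re.DOTALL is used on the assumption that Lucene's “.” matches any character, including a newline.

```python
import re


def everything_except(term: str, excluded: str) -> bool:
    """Hypothetical emulation of @&~(excluded).

    @ matches every string, so the intersection keeps exactly the
    terms that do not fully match `excluded`.
    """
    return re.fullmatch(excluded, term, flags=re.DOTALL) is None


for term in ["lucene", "solr", "solrcloud"]:
    print(term, everything_except(term, "solr.+"))
```

The term “solr” on its own is kept, because “.+” requires at least one character after the prefix; excluding it as well would need “solr.*”.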

Click solr-regular-expression-part-1 to read part 1 of this series on Solr regular expressions.
