A regular expression is a notation for specifying a set of strings, e.g., the set of all valid email addresses or the set of all binary strings with an even number of 1s.
There are five basic operations for creating regular expressions, summarized in the table below.
Operation                      Regular Expression   Yes              No
Concatenation                  aabaab               aabaab           every other string
Logical OR (Alternation)       aa | baab            aa, baab         every other string
Replication (Kleene closure)   ab*a                 aa, aba, abba    ε, ab, ababa
Grouping                       a(a|b)aab            aaaab, abaab     every other string
Wildcard                       a..a                 abba, abaa       aa, aaaaa
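A few entries of this table can be checked in Java. The following is a minimal sketch, not a definitive implementation: the class name BasicOperations and the helper matches are hypothetical, and the check relies on String.matches, which succeeds only when the entire string matches the pattern. Note that the spaces written around | in the text are for readability only; in a Java pattern they would be treated as literal characters, so they are omitted here.

```java
// Hypothetical class name, used only for illustration.
class BasicOperations {
    // True only if the WHOLE text matches the regular expression.
    static boolean matches(String regex, String text) {
        return text.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(matches("aabaab", "aabaab"));    // concatenation: true
        System.out.println(matches("aa|baab", "baab"));     // logical OR: true
        System.out.println(matches("ab*a", "abba"));        // replication: true
        System.out.println(matches("ab*a", "ababa"));       // replication: false
        System.out.println(matches("ab*a", ""));            // empty string: false
        System.out.println(matches("a(a|b)aab", "abaab"));  // grouping: true
        System.out.println(matches("a..a", "abaa"));        // wildcard: true
        System.out.println(matches("a..a", "aaaaa"));       // wildcard: false
    }
}
```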
- Concatenation: the simplest type of regular expression is formed by concatenating symbols one after the other, like aabaab. This regular expression matches only the single string aabaab. We can perform simple spell checking using concatenation alone. For example, we could form the regular expression niether and then check whether any word in a dictionary matches it. Presumably no word in the dictionary would match, and we would conclude that niether is misspelled.
- Logical OR: the logical OR operator enables us to choose from one of several possibilities. For example, the regular expression aa | baab matches exactly two strings: aa and baab. Many spam filters (e.g., SpamAssassin) work by searching for a long list of common spamming terms. They might form a regular expression such as AMAZING | GUARANTEE | Viagra. The logical OR operator enables us to specify many strings with a single regular expression. For example, if our phone number is 734-8527, we might like to know whether it spells out any word on the phonepad (2 = abc, 3 = def, 4 = ghi, 5 = jkl, 6 = mno, 7 = prs, 8 = tuv, 9 = wxy). The following regular expression specifies all of the 3^7 possible combinations: (p|r|s)(d|e|f)(g|h|i)(t|u|v)(j|k|l)(a|b|c)(p|r|s). It turns out that the only English word that matches is the word regular.
- Replication: the replication operator enables us to specify infinitely many possibilities. For example, the regular expression ab*a matches aa, aba, abba, abbba, and so forth. Note that 0 replications of b are permitted.
- Grouping: the grouping operator enables us to override the default precedence of the operators. The replication operator has the highest precedence, then concatenation, then logical OR. If we want to specify the set of strings a, aba, ababa, abababa, and so forth, we must write (ab)*a to indicate that the ab pattern must be replicated together; without the parentheses, ab*a would replicate only the b.
- Wildcard: the wildcard symbol matches exactly one occurrence of any single character. For example, a..a matches every four-character string that begins and ends with a.
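The phone number, spam filter, and grouping examples above can be sketched in Java as follows. This is an illustration under stated assumptions rather than a reference implementation: the class PhoneWords and the methods regexFor and looksLikeSpam are hypothetical names, the three-term spam pattern is just the short list from the text, and characters other than the digits 2 through 9 are assumed to carry no letters and are skipped.

```java
import java.util.regex.Pattern;

// Hypothetical class and method names, used only for illustration.
class PhoneWords {
    // Letters on each key of the phonepad described above (index = digit; no q or z).
    static final String[] KEYS = {"", "", "abc", "def", "ghi", "jkl", "mno", "prs", "tuv", "wxy"};

    // Logical OR over a short list of terms taken from the text.
    static final Pattern SPAM = Pattern.compile("AMAZING|GUARANTEE|Viagra");

    // Build one alternation group per digit, e.g. '7' becomes "(p|r|s)".
    // Assumption: characters other than 2-9 (such as '-') are skipped.
    static String regexFor(String phoneNumber) {
        StringBuilder regex = new StringBuilder();
        for (char c : phoneNumber.toCharArray()) {
            if (c < '2' || c > '9') continue;
            String letters = KEYS[c - '0'];
            regex.append('(');
            for (int i = 0; i < letters.length(); i++) {
                if (i > 0) regex.append('|');
                regex.append(letters.charAt(i));
            }
            regex.append(')');
        }
        return regex.toString();
    }

    // find() looks for a match anywhere in the message, whereas
    // String.matches would require the whole message to match.
    static boolean looksLikeSpam(String message) {
        return SPAM.matcher(message).find();
    }

    public static void main(String[] args) {
        String regex = regexFor("734-8527");
        System.out.println(regex);  // (p|r|s)(d|e|f)(g|h|i)(t|u|v)(j|k|l)(a|b|c)(p|r|s)
        System.out.println("regular".matches(regex));           // true
        System.out.println("neither".matches(regex));           // false
        System.out.println(looksLikeSpam("An AMAZING offer"));  // true
        System.out.println("ababa".matches("(ab)*a"));          // true: ab is replicated as a unit
        System.out.println("ababa".matches("ab*a"));            // false: only b is replicated
    }
}
```

To search for words spelled by a phone number, one would run every word of a dictionary through matches with the generated pattern; the dictionary itself is not shown here.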
The first four basic operations above (concatenation, logical OR, replication, grouping) are the theoretical minimum needed to describe regular expressions. Most programming environments support additional operations for convenience (including the wildcard operation), and Java is no exception. The table below includes some of the highlights.
Operation                       Java Regular Expression   Yes                   No
One or more                     a(bc)+de                  abcde, abcbcde        ade, abc
Once or not at all              a(bc)?de                  ade, abcde            abc, abcbcde
Character classes               [a-m]*                    blackmail, imbecile   above, below
Negation of character classes   [^aeiou]                  b, c                  a, e
Exactly N times                 [^aeiou]{6}               rhythm, syzygy        rhythms, allowed
Between M and N times           [a-z]{4,6}                spider, tiger         jellyfish, cow
Whitespace characters           [a-z\s]*hello             hello, say hello      Othello, 2hello
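The entries in this table can be tried out with a short Java program. The sketch below assumes the hypothetical class name ExtendedOperations and helper matches, and again uses String.matches, so a pattern must match the whole string. One detail worth noting: inside a Java string literal the backslash must be doubled, so the pattern [a-z\s]*hello is written as "[a-z\\s]*hello".

```java
// Hypothetical class name, used only for illustration.
class ExtendedOperations {
    // True only if the WHOLE text matches the regular expression.
    static boolean matches(String regex, String text) {
        return text.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(matches("a(bc)+de", "abcbcde"));          // one or more: true
        System.out.println(matches("a(bc)+de", "ade"));              // one or more: false
        System.out.println(matches("a(bc)?de", "ade"));              // once or not at all: true
        System.out.println(matches("a(bc)?de", "abcbcde"));          // once or not at all: false
        System.out.println(matches("[a-m]*", "blackmail"));          // character class: true
        System.out.println(matches("[a-m]*", "above"));              // character class: false
        System.out.println(matches("[^aeiou]", "b"));                // negated class: true
        System.out.println(matches("[^aeiou]{6}", "rhythm"));        // exactly 6 times: true
        System.out.println(matches("[^aeiou]{6}", "rhythms"));       // exactly 6 times: false
        System.out.println(matches("[a-z]{4,6}", "tiger"));          // between 4 and 6: true
        System.out.println(matches("[a-z]{4,6}", "cow"));            // between 4 and 6: false
        System.out.println(matches("[a-z\\s]*hello", "say hello"));  // whitespace: true
        System.out.println(matches("[a-z\\s]*hello", "Othello"));    // whitespace: false
    }
}
```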