A regular expression is a notation for specifying a set of strings, e.g., the set of all valid email addresses or the set of all binary strings with an even number of 1s.

There are five basic operations for creating regular expressions, summarized in the table below.

| Operation | Regular expression | Yes | No |
|---|---|---|---|
| Concatenation | `aabaab` | `aabaab` | every other string |
| Logical OR (alternation) | `aa \| baab` | `aa`, `baab` | every other string |
| Replication (Kleene closure) | `ab*a` | `aa`, `aba`, `abba` | ε, `ab`, `ababa` |
| Grouping | `a(a\|b)aab` | `aaaab`, `abaab` | every other string |
| Wildcard | `a..a` | `abba`, `abaa` | `aaaaaaa` |
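The rows of the table above can be checked mechanically. The following is a minimal sketch in Java, assuming the standard `java.util.regex.Pattern` class; the class name `BasicOperations` and the helper `matches` are illustrative names, not part of the original text. Note that Java treats whitespace in a pattern literally by default, so the alternation is written here without spaces around `|`.

```java
import java.util.regex.Pattern;

class BasicOperations {
    // Hypothetical helper: true if the ENTIRE input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Concatenation: matches only the single string aabaab.
        System.out.println(matches("aabaab", "aabaab"));    // true
        System.out.println(matches("aabaab", "aabaa"));     // false

        // Logical OR: matches exactly aa and baab.
        System.out.println(matches("aa|baab", "baab"));     // true
        System.out.println(matches("aa|baab", "aab"));      // false

        // Replication: zero or more copies of b between two a's.
        System.out.println(matches("ab*a", "aa"));          // true
        System.out.println(matches("ab*a", "abba"));        // true
        System.out.println(matches("ab*a", ""));            // false (the empty string)
        System.out.println(matches("ab*a", "ababa"));       // false

        // Grouping: the parentheses limit the OR to a single position.
        System.out.println(matches("a(a|b)aab", "abaab"));  // true

        // Wildcard: each . stands for exactly one character.
        System.out.println(matches("a..a", "abba"));        // true
        System.out.println(matches("a..a", "aaaaaaa"));     // false
    }
}
```

`Pattern.matches` tests the whole string, which corresponds to the set-membership view of regular expressions used here; searching for a match inside a longer string would use `Matcher.find` instead.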

*Concatenation*: the simplest type of regular expression is formed by concatenating a sequence of symbols, one after the other, like `aabaab`. This regular expression matches only the single string `aabaab`. We can perform simple spell checking by using the concatenation operation. For example, we could form the regular expression `niether` and then, for each word in a dictionary, check whether that word matches the regular expression. Presumably no word in the dictionary would match, and we would conclude that `niether` is misspelled.

*Logical OR*: the logical OR operator enables us to choose from one of several possibilities. For example, the regular expression `aa | baab` matches exactly two strings, `aa` and `baab`. Many spam filters (e.g., SpamAssassin) work by searching for a long list of common spamming terms. They might form a regular expression such as `AMAZING | GUARANTEE | Viagra`. The logical OR operator enables us to specify many strings with a single regular expression. For example, if our phone number is 734-8527, we might like to know whether it spells out any word on the phonepad (2 = abc, 3 = def, 4 = ghi, 5 = jkl, 6 = mno, 7 = prs, 8 = tuv, 9 = wxy). The following regular expression specifies all of the 3^7 = 2,187 possible combinations: `(p|r|s)(d|e|f)(g|h|i)(t|u|v)(j|k|l)(a|b|c)(p|r|s)`. It turns out that the only English word that matches is the word `regular`.

*Replication*: the replication operator enables us to specify infinitely many possibilities. For example, the regular expression `ab*a` matches `aa`, `aba`, `abba`, `abbba`, and so forth. Note that 0 replications of `b` are permitted.

*Grouping*: the grouping operator enables us to override the default precedence of the operators. The replication operator has the highest precedence, then concatenation, then logical OR. If we want to specify the set of strings `a`, `aba`, `ababa`, `abababa`, and so forth, we must write `(ab)*a` to indicate that the `ab` pattern must be replicated as a unit; without the parentheses, `ab*a` replicates only the `b`.

*Wildcard*: the wildcard symbol `.` matches exactly one occurrence of any single character.
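Two of the points above, operator precedence and the phone number example, can be tried out directly. The sketch below assumes Java's `java.util.regex.Pattern`; the class name `PhoneWords`, the method `toRegex`, and the four-word list are hypothetical stand-ins (a real check would read a full dictionary file). The keypad letters follow the mapping given in the text.

```java
import java.util.regex.Pattern;

class PhoneWords {
    // Letters for the digits 2 through 9, as listed in the text (no q or z).
    private static final String[] KEYS =
        {"abc", "def", "ghi", "jkl", "mno", "prs", "tuv", "wxy"};

    // Builds a regular expression such as (p|r|s)(d|e|f)... from a string of digits 2-9.
    static String toRegex(String digits) {
        StringBuilder sb = new StringBuilder();
        for (char d : digits.toCharArray()) {
            if (d < '2' || d > '9') {
                throw new IllegalArgumentException("digit has no letters: " + d);
            }
            String letters = KEYS[d - '2'];
            sb.append('(');
            for (int i = 0; i < letters.length(); i++) {
                if (i > 0) sb.append('|');   // logical OR between the letters of one key
                sb.append(letters.charAt(i));
            }
            sb.append(')');                  // grouping keeps each OR confined to one position
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Precedence: replication binds tighter than concatenation.
        System.out.println(Pattern.matches("(ab)*a", "ababa")); // true
        System.out.println(Pattern.matches("ab*a", "ababa"));   // false

        // Phone number 734-8527 from the text.
        String regex = toRegex("7348527");
        System.out.println(regex);

        // Hypothetical mini-dictionary, used here only for illustration.
        String[] words = {"regular", "neither", "pattern", "replica"};
        Pattern p = Pattern.compile(regex);
        for (String w : words) {
            if (p.matcher(w).matches()) {
                System.out.println(w);       // prints only: regular
            }
        }
    }
}
```

Compiling the pattern once with `Pattern.compile` and reusing it for every word avoids rebuilding the matcher machinery on each test, which matters when the word list is a full dictionary.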

The first four basic operations above (concatenation, logical OR, replication, grouping) are the theoretical minimum needed to describe regular expressions. Most programming environments support additional operations for convenience (including the wildcard operation), and Java is no exception. The table below includes some of the highlights.

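| Operation | Java regular expression | Yes | No |
|---|---|---|---|
| One or more | `a(bc)+de` | `abcde`, `abcbcde` | `ade`, `abc` |
| Once or not at all | `a(bc)?de` | `ade`, `abcde` | `abc`, `abcbcde` |
| Character classes | `[a-m]*` | `blackmail`, `imbecile` | `above`, `below` |
| Negation of character classes | `[^aeiou]` | `b`, `c` | `a`, `e` |
| Exactly N times | `[^aeiou]{6}` | `rhythm`, `syzygy` | `rhythms`, `allowed` |
| Between M and N times | `[a-z]{4,6}` | `spider`, `tiger` | `jellyfish`, `cow` |
| Whitespace characters | `[a-z\s]*hello` | `hello`, `say hello` | `Othello`, `2hello` |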
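As a quick check of the Java operations in the table above, here is a minimal sketch that tests one matching and one non-matching string per row. It assumes the standard `String`-based entry point `Pattern.matches`; the class name `JavaExtensions` and the helper `matches` are illustrative. Inside a Java string literal the backslash must be doubled, so the regular expression `[a-z\s]*hello` is written `"[a-z\\s]*hello"`.

```java
import java.util.regex.Pattern;

class JavaExtensions {
    // Hypothetical helper: true if the ENTIRE input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // One or more: (bc) must appear at least once.
        System.out.println(matches("a(bc)+de", "abcbcde"));       // true
        System.out.println(matches("a(bc)+de", "ade"));           // false

        // Once or not at all: (bc) may appear zero times or one time.
        System.out.println(matches("a(bc)?de", "ade"));           // true
        System.out.println(matches("a(bc)?de", "abcbcde"));       // false

        // Character class: only the letters a through m are allowed.
        System.out.println(matches("[a-m]*", "blackmail"));       // true
        System.out.println(matches("[a-m]*", "above"));           // false

        // Negated character class: any single character that is not a lowercase vowel.
        System.out.println(matches("[^aeiou]", "b"));             // true
        System.out.println(matches("[^aeiou]", "a"));             // false

        // Exactly N times: six non-vowel characters.
        System.out.println(matches("[^aeiou]{6}", "rhythm"));     // true
        System.out.println(matches("[^aeiou]{6}", "rhythms"));    // false

        // Between M and N times: four to six lowercase letters.
        System.out.println(matches("[a-z]{4,6}", "tiger"));       // true
        System.out.println(matches("[a-z]{4,6}", "jellyfish"));   // false

        // Whitespace: \s matches a space, so "say hello" is accepted.
        System.out.println(matches("[a-z\\s]*hello", "say hello")); // true
        System.out.println(matches("[a-z\\s]*hello", "Othello"));   // false (capital O)
    }
}
```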
